Things you didn't know about (Postgres) indexes

[-]

Safe-Pin-5472@reddit

Very well written article, explains in an understandable way with good examples. Like other people said, TIL WHERE and INCLUDE can be applied to indexes!

[-]

OzoneGrif@reddit

These are all very basic features... if you don't know that, it is kinda worrying. Ready the documentation of the tools you use!

[-]

applemasher@reddit

Wow, I never knew about partial indexes before. Very interesting.

[-]

MrChocodemon@reddit

>Me knowing nothing about (Postgres) indexes (indices) "Wow, I really didn't know that"

[-]

Ecksters@reddit

Didn't a recent Postgres version improve usage of partial matches on composite indexes?

[-]

brockvenom@reddit

Such a good read in this age if AI. Thanks for posting

[-]

IceIllustrious5529@reddit

Partial indexes can be a game-changer for query performance, especially when you know how to use them. Seems like a solid read for a quick refresher.

[-]

I’ve struggled to find a simple, clear explanation of Postgres indexes and how to use them to best effect for some time, thanks for the writeup! This is super useful, I’m saving, and I’m going to share with my team at work too.

[-]

Sillocan@reddit

Great write up! I think most people run into these issues at some point

[-]

hipsterdad_sf@reddit

Partial indexes are genuinely one of the most underused Postgres features. The classic example that saves teams real money: if you have a status column and 95% of rows are "completed" but you only ever query for "pending" or "failed" rows, a partial index with WHERE status != 'completed' gives you a tiny index that covers all your actual queries.

The INCLUDE clause for covering indexes is another one that more people should know about. Before Postgres 11, you'd either add columns to the index key (which affects ordering and bloats the tree) or accept an extra heap fetch. INCLUDE lets you store extra columns in the leaf pages without affecting the index structure. The performance difference on read heavy workloads can be dramatic because you avoid the heap lookup entirely.

One thing the article could have emphasized more: EXPLAIN ANALYZE is the only way to know if your index is actually being used. I've seen teams create elaborate multi column indexes that the planner completely ignores because the table is small enough for a sequential scan to be faster, or because the column selectivity doesn't justify an index lookup. Always verify with real data volumes, not with your dev database that has 50 rows.

[-]

programming-ModTeam@reddit

No content written mostly by an LLM. If you don't want to write it, we don't want to read it.

[-]

_meegoo_@reddit

EXPLAIN ANALYZE is the only way to know if your index is actually being used.

It's not the only way. You can also check statistics. pg_stat_all_indexes I think

[-]

hipsterdad_sf@reddit

Fair point, and good addition from the reply below too. pg_stat_all_indexes is actually better for production monitoring since you can track idx_scan counts over time and spot which indexes are gathering dust. And yeah, plain EXPLAIN without ANALYZE will show you the planned index usage. ANALYZE just confirms the actual execution matches. I overstated that.

[-]

gjionergqwebrlkbjg@reddit

And you don't have to use ANALYZE, it the plan itself is fully sufficient to tell if index is going to be used or not.

[-]

jmickeyd@reddit

Small caveat about covering indexes: due to the MVCC implementation, full tuple visibility data only exists in the table heap. Which means the table data has to be read anyway to make sure that the indexed row is visible to the current transaction. There is a trick though. When a VACUUM runs and clears out old rows, it sets a bit in a bitmap marking that page in the table as 100% visible, which causes the visibility check to be skipped, and a true index only lookup to occur. Unfortunately if there is any modification to that page, the whole page is removed from the bitmap and any rows on that page have to be checked for all reads until another VACUUM occurs. This means that based on your data patterns and autovacuum configuration, the covering index can be anywhere between huge performance win and worthless extra unused data clogging up your index.

[-]

FlyingRhenquest@reddit

Nice to see someone talking about database indexes and how they work. Back in the day the team would have a DBA to handle stuff like that and us ordinary application programmer wouldn't be allowed to touch the database like that. And in this day the team doesn't have a DBA and the database and index layout tends to be crap because no one ever got hands on experience setting all that stuff up.

I'm reasonably comfortable working with a database but I had to go out of the way and install a Postgres instance on my home computer in the 90s to get any hands on experience with that side of things.

[-]

lottspot@reddit

Thank you for taking the time to do something AI can't do. This is fascinating, and relevant to so many disciplines

[-]

NotTreeFiddy@reddit (OP)

I'm sure LLMs could do a decent job of summarizing the information, perhaps better than I have, given the right prompt. But in a world that's getting filled with more and more slop, I'm only too happy to add something else human written.

[-]

therealgaxbo@reddit

Write performance is not the reason to use include to create covering indexes. There will be very little difference in performance.

The real reasons are:

It allows datatypes without an appropriate operator class to be included (e.g. you can't just add a box column to a btree)
It allows you to include columns in a unique index without changing semantics, like create unique index on users(email) include (user_id)

[-]

NotTreeFiddy@reddit (OP)

Thanks for pointing this out. You're right. I was going to defend the point on performance, but it's definitely overstated.

I've updated the article with your correction and additional points, and I've credited you by linking back here - I hope that's okay.

[-]

markhemstead@reddit

Nicely written article! I had no idea about WHERE and INCLUDE either

[-]

gjionergqwebrlkbjg@reddit

It seems like very surface level things.

As an example - covering indexes don't work well in postgres due to MVCC. Indexes don't know if a tuple exists in the given transaction or not, you still have to look up the page in visibility maps and possibly look up the actual page contents.

[-]

NotTreeFiddy@reddit (OP)

I had the same exact reaction when I learned about it.
Thanks!

[-]

ThatAgainPlease@reddit

Partial, covering, and function indices are discussed. Examples seemed reasonable on a quick skim. If you don’t know about those things or want a reminder this is a good article.