Things you didn't know about (Postgres) indexes
Posted by NotTreeFiddy@reddit | programming | View on Reddit | 28 comments
Posted by NotTreeFiddy@reddit | programming | View on Reddit | 28 comments
Safe-Pin-5472@reddit
Very well written article, explains in an understandable way with good examples. Like other people said, TIL WHERE and INCLUDE can be applied to indexes!
OzoneGrif@reddit
These are all very basic features... if you don't know that, it is kinda worrying. Ready the documentation of the tools you use!
applemasher@reddit
Wow, I never knew about partial indexes before. Very interesting.
MrChocodemon@reddit
>Me knowing nothing about (Postgres) indexes (indices) "Wow, I really didn't know that"
Ecksters@reddit
Didn't a recent Postgres version improve usage of partial matches on composite indexes?
brockvenom@reddit
Such a good read in this age if AI. Thanks for posting
IceIllustrious5529@reddit
Partial indexes can be a game-changer for query performance, especially when you know how to use them. Seems like a solid read for a quick refresher.
kareesi@reddit
I’ve struggled to find a simple, clear explanation of Postgres indexes and how to use them to best effect for some time, thanks for the writeup! This is super useful, I’m saving, and I’m going to share with my team at work too.
Sillocan@reddit
Great write up! I think most people run into these issues at some point
hipsterdad_sf@reddit
Partial indexes are genuinely one of the most underused Postgres features. The classic example that saves teams real money: if you have a status column and 95% of rows are "completed" but you only ever query for "pending" or "failed" rows, a partial index with WHERE status != 'completed' gives you a tiny index that covers all your actual queries.
The INCLUDE clause for covering indexes is another one that more people should know about. Before Postgres 11, you'd either add columns to the index key (which affects ordering and bloats the tree) or accept an extra heap fetch. INCLUDE lets you store extra columns in the leaf pages without affecting the index structure. The performance difference on read heavy workloads can be dramatic because you avoid the heap lookup entirely.
One thing the article could have emphasized more: EXPLAIN ANALYZE is the only way to know if your index is actually being used. I've seen teams create elaborate multi column indexes that the planner completely ignores because the table is small enough for a sequential scan to be faster, or because the column selectivity doesn't justify an index lookup. Always verify with real data volumes, not with your dev database that has 50 rows.
programming-ModTeam@reddit
No content written mostly by an LLM. If you don't want to write it, we don't want to read it.
_meegoo_@reddit
It's not the only way. You can also check statistics.
pg_stat_all_indexesI thinkhipsterdad_sf@reddit
Fair point, and good addition from the reply below too. pg_stat_all_indexes is actually better for production monitoring since you can track idx_scan counts over time and spot which indexes are gathering dust. And yeah, plain EXPLAIN without ANALYZE will show you the planned index usage. ANALYZE just confirms the actual execution matches. I overstated that.
gjionergqwebrlkbjg@reddit
And you don't have to use ANALYZE, it the plan itself is fully sufficient to tell if index is going to be used or not.
jmickeyd@reddit
Small caveat about covering indexes: due to the MVCC implementation, full tuple visibility data only exists in the table heap. Which means the table data has to be read anyway to make sure that the indexed row is visible to the current transaction. There is a trick though. When a
VACUUMruns and clears out old rows, it sets a bit in a bitmap marking that page in the table as 100% visible, which causes the visibility check to be skipped, and a true index only lookup to occur. Unfortunately if there is any modification to that page, the whole page is removed from the bitmap and any rows on that page have to be checked for all reads until anotherVACUUMoccurs. This means that based on your data patterns and autovacuum configuration, the covering index can be anywhere between huge performance win and worthless extra unused data clogging up your index.FlyingRhenquest@reddit
Nice to see someone talking about database indexes and how they work. Back in the day the team would have a DBA to handle stuff like that and us ordinary application programmer wouldn't be allowed to touch the database like that. And in this day the team doesn't have a DBA and the database and index layout tends to be crap because no one ever got hands on experience setting all that stuff up.
I'm reasonably comfortable working with a database but I had to go out of the way and install a Postgres instance on my home computer in the 90s to get any hands on experience with that side of things.
lottspot@reddit
Thank you for taking the time to do something AI can't do. This is fascinating, and relevant to so many disciplines
NotTreeFiddy@reddit (OP)
I'm sure LLMs could do a decent job of summarizing the information, perhaps better than I have, given the right prompt. But in a world that's getting filled with more and more slop, I'm only too happy to add something else human written.
therealgaxbo@reddit
Write performance is not the reason to use
includeto create covering indexes. There will be very little difference in performance.The real reasons are:
It allows datatypes without an appropriate operator class to be included (e.g. you can't just add a
boxcolumn to a btree)It allows you to include columns in a unique index without changing semantics, like
create unique index on users(email) include (user_id)NotTreeFiddy@reddit (OP)
Thanks for pointing this out. You're right. I was going to defend the point on performance, but it's definitely overstated.
I've updated the article with your correction and additional points, and I've credited you by linking back here - I hope that's okay.
markhemstead@reddit
Nicely written article! I had no idea about WHERE and INCLUDE either
gjionergqwebrlkbjg@reddit
It seems like very surface level things.
As an example - covering indexes don't work well in postgres due to MVCC. Indexes don't know if a tuple exists in the given transaction or not, you still have to look up the page in visibility maps and possibly look up the actual page contents.
boysitisover@reddit
Someone tell Claude about this I don't wanna read
BuriedStPatrick@reddit
Great article, demystified a lot for me, very well written and clear.
Div64@reddit
Very interesting read and super clear. Thanks!
dagguh2@reddit
WHERE and INCLUDE in indexes, TIL! Well explained
NotTreeFiddy@reddit (OP)
I had the same exact reaction when I learned about it.
Thanks!
ThatAgainPlease@reddit
Partial, covering, and function indices are discussed. Examples seemed reasonable on a quick skim. If you don’t know about those things or want a reminder this is a good article.