I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total)
Posted by Inkbot_dev@reddit | programming | View on Reddit | 12 comments
CommanderKeynes@reddit
Good luck with this patch. It's very difficult to get stuff into postgres. I had a patch that has been sitting there for 2 years, multiple major contributors thought it was great, and was subsequently soft rejected by a committer for a decent reason. Very high standards. Worth it to be persistent. You get a coin if you are able to get your patch in. Pretty neat.
thisisjlw@reddit
Neat, this looks like it would work great for my usecase. I wanted to pre-crunch data for my data analytics software using a materialized view. Only I wanted to pre-crunch the results partitioned by `customer_id`.I ended up using a regular table with a `DELETE FROM table WHERE customer_id = ` + `INSERT` in order to re-crunch results for a given `customer_id`. This feature would save me from managing the delta myself and just run a materialized view refreshing on `customer_id`. Nice work!
TheBigLobotomy@reddit
This is awesome! This is half the reason my team uses timescaleDB, but we probably would have used this functionality instead of timescale if it were an option.
Nice
Inkbot_dev@reddit (OP)
Great to hear that type of feedback. It's been a personal pain point for me as well.
I hadn't touched C since I used it for a semester in college, but I figured I should try to pick up what I need to implement this. Wasn't too bad, especially with being able to have an LLM translate concepts that I know well (Java, Python) into their C counterparts, and just looking at how the existing codebase did things.
I have maybe 3 days of work on the actual implementation as well as iterating on some bugs found, and another couple days on writing emails, writing this blog post, etc, and another couple days on writing a benchmark that actually tests this feature out, and running it multiple times.
Just for anyone else wondering what type of time commitment was necessary for someone to contribute code to Postgres for the first time.
glenrhodes@reddit
This is the kind of patch that should just be in Postgres core. Full refresh on every REFRESH MATERIALIZED VIEW is one of those design decisions that made sense in 2000 and just hasn't aged well. Would love to see the incremental view maintenance work finally land properly.
Inkbot_dev@reddit (OP)
> one of those design decisions that made sense in 2000
The worst part is that mat views were a very late addition to PG in general, and they haven't really improved since. They were added in PG 9.3 (2013)
>This is the kind of patch that should just be in Postgres core
Yup, i've got it as part of the PG 20 commitfest, because I missed the deadline for PG 19: https://commitfest.postgresql.org/patch/6305/
Fingers crossed I can get it accepted.
hokkos@reddit
it a little bit more complex than that: https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
Inkbot_dev@reddit (OP)
Yeah, I've seen that page and waited on many of the prior attempts to get into core. That's why I'm not trying to implement full incremental view refresh as a feature (yet). I also don't view this as competition for those efforts. For the use cases they support, it will likely be faster to have whatever traditional IVM implementations gets into core handle keeping it up to date for you. This is just a building block that developers can use to keep their views updated, and they are responsible for actually making sure that the logic for that refresh is correct. At least for now. Not every type of materialized view will be able to use this effectively, but there are a huge number of workloads that could use it.
theIncredibleAlex@reddit
never ran into this problem, but this looks incredibly cool! elegant solution
Inkbot_dev@reddit (OP)
Really depends on the types of applications you are writing. I have been working on financial applications for the last 15 years, and they are very useful patterns for that type of application.
I've just always had to do it using regular tables and functions and triggers in the past because materialized views as they were implemented were useless for just about every use case I had.
I saw a live stream on YouTube of some postgres developers talking about the feature and trying to implement it about 6 months ago. They didn't get all that far in the live stream, and I just picked it up and tried my hand at implementing it.
Mysterious-Rent7233@reddit
I hope your patch gets accepted! That sounds like a big improvement!
AutoModerator@reddit
r/programming is not a place to post your project, get feedback, ask for help, or promote your startup.
Technical write-ups on what makes a project technically challenging, interesting, or educational are allowed and encouraged, but just a link to a GitHub page or a list of features is not allowed.
The technical write-up must be the focus of the post, not just a tickbox-checking exercise to get us to allow it. This is a technical subreddit.
We don't care what you built, we care how you build it.
I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.