Introducing pg_lake: Integrate Your Data Lakehouse with Postgres
Posted by craigkerstiens@reddit | programming | View on Reddit | 36 comments
Adventurous-Pin6443@reddit
This sub reminds me of a stand-up comic audition.
VictoryMotel@reddit
Does the data lake house have a data dock and a data speed boat for data skiing and data fishing? Is it in a data cove so there are less data waves?
mcel595@reddit
Data lake truly is a funny name for "throw all your trash in the pile, we will figure it out later".
azirale@reddit
While it is fun to meme on these terms, they fit the theme of existing terms. Moving and transforming data, getting it from a source to a destination, is a 'pipeline'. A constant flow of data is a 'stream'. A large storage to collect freeform data is a 'lake', and when it gets filthy it is a 'swamp'.
On the more traditional fully structured side you would have a 'warehouse' that orders, categorises, and structures all your data. Within that you may create 'datamarts' that are small target collections for easy consumption.
Bridging the 'lake' storage component into a 'warehouse' catalog and query engine gets you the portmanteau 'lakehouse'. The terms all have sensible connotations to people operating in the space.
FeepingCreature@reddit
Yes, the weird name that nobody takes seriously fits in well with a bunch of other names that also nobody takes seriously. There's one term in there that has serious use.
Ais3@reddit
what do u mean nobody takes them seriously? these are widely used terms in the industry
FeepingCreature@reddit
I think they're widely used among people who write marketing material and people who read marketing material. I don't think they're widely used among developers, though I could be wrong of course.
Ais3@reddit
i dunno what u are on about. im a developer and use concepts like streams and pipelines daily, and datalakes weekly
FeepingCreature@reddit
Sure, but streams and pipelines long predate 'datalakes' and have nothing directly to do with them.
Do you use that term in any relation other than a particular vendor who decided to use it for a particular product?
Ais3@reddit
who said that they’re directly related? datalake is just a new concept. and i mean database was coined by a guy from IBM, do u think that is just a marketing term?
HotlLava@reddit
Programmers in general don't have a lot of reasons to interact with data lakes and/or warehouses, it's more of an infrastructure/ops thing. But those who implement the storage backends for these lakes and warehouses will be familiar with the terms.
aykcak@reddit
I decided to look up what a data lake house is. I now have the opinion that it is a term for sugarcoating the mess big companies make when they have no idea or know-how to deal with the massive amounts of unstructured big data they keep collecting in hopes of it somehow leading them to make a profit.
lazazael@reddit
a lake house and the plot is worthy
Solokiller@reddit
Is there a data shark to jump?
wrosecrans@reddit
Data shark doo doo doo doo doo doo, data shark doo doo doo doo doo doooo.
Elegant-Sense-1948@reddit
Is the data shark the one you jump over or is it the data shark you jump in the back alley?
inotocracy@reddit
You missed a good opportunity to incorporate stream in there somewhere.
BlueGoliath@reddit
Do you ever get that feeling of Deja Vu?
MagicWishMonkey@reddit
I'll be honest, the first time I heard someone talking about a data lakehouse I thought they were bullshitting me. I really hate "big data".
VictoryMotel@reddit
It's as if there is a whole generation that has never heard of a filesystem on a network.
enricojr@reddit
It'd be nice if there were a data mart nearby, for easy shopping :-)
Somepotato@reddit
I've literally never heard anyone call a data lake a data lake house
azirale@reddit
A 'lakehouse' is when you are using data-warehousing-style structure and querying, but over data stored in a separate service that operates like a data lake.
Unlike a data lake you do have structure and controls around the data. Unlike a warehouse you have control of the data service and layout, and can access the data directly without having to go through the warehouse execution service itself.
Somepotato@reddit
Hm. We have a setup like that (we use postgres as our data lake as opposed to the typical distributed file store), so it is directly queryable, but it makes the transition to the warehouse a lot easier.
FenixR@reddit
It's supposed to combine the best of a Data Lake and a Data Warehouse into one structure or something.
Somepotato@reddit
Except they're distinct for very important reasons, rarely should they be in the same area.
echanuda@reddit
I’m not sure I trust your word here considering you didn’t know what a data lakehouse was until now lol
Somepotato@reddit
I mean anyone can come up with any term, but I work with terabytes of data in and out daily, so shrug.
StrangeRabbit1613@reddit
How’s the fishing at this lakehouse?
Nwallins@reddit
So...
lakehouse is an industry term that combines the sensibilities of a 'data warehouse' with a 'data lake'. https://www.databricks.com/glossary/data-lakehouse
BlueGoliath@reddit
Data Lakehouse lmao
elastic_psychiatrist@reddit
Seeing as literally zero of the other dozen commenters so far have made a substantive comment yet...
This is pretty cool. There's been lots happening with Postgres OLAP extensions recently, but this looks like the most end-to-end so far. Happy to see the Crunchy Data folks still building product from within Snowflake.
Now who's gonna take on the task of adding arrow-native data transfer for querying out of postgres (i.e. something like FlightSQL)?
gimpwiz@reddit
My data... what? lakehouse? I don't think I can afford one of those. I mean maybe somewhere deep in Montana but then getting to it will be a pain.
RoomyRoots@reddit
r/dataengineering
dlsspy@reddit
I’m a pretty big ducklake fan.
combinatorial_quest@reddit
... ... ...
I know it's not your fault OP, but that title is a crime!