Pandas feels clunky coming from R. What about Haskell?
Posted by m-chav@reddit | programming | View on Reddit | 12 comments
Buttleston@reddit
If you're starting fresh, start with Polars.
If you love R, use R. Tell me why you need to switch and maybe I can tell you what to switch to.
m-chav@reddit (OP)
I want transformations where things like type errors and variable name mistakes can be caught at compile time rather than hours into a pipeline. Spark Datasets in Scala provide this sort of assurance, but to add or remove columns you have to define new case classes. I’m not sure what else has solved that problem that far up the stack. The Polars lazy API does, but it mostly relies on runtime reflection, so an error further down your pipeline could still waste compute.
BroBroMate@reddit
Some feedback for future posts: I found your code colour scheme made it hard to read. There's not much contrast between the background and the text.
ritchie46@reddit
Polars verifies those things at query planning, before running the query, not hours of compute later.
You cannot do it at compile time, as schemas in files are often unknown until you read the file(s).
If you compile a new program for every file, you can do it.
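To make the plan-time check concrete, here is a minimal sketch in plain Python. It is a toy and not how Polars is implemented: `LazyFrame`, `PlanError`, `select`, `check`, and `collect` here are hypothetical names for illustration. The idea is that the schema is known once the file header has been read, so every column reference can be validated against it before any expensive work starts.

```python
from dataclasses import dataclass, field
from typing import Callable


class PlanError(Exception):
    """Raised when a plan refers to a column the schema does not have."""


@dataclass
class LazyFrame:
    # Hypothetical toy type: a known schema, a deferred loader, and queued steps.
    schema: dict
    load: Callable[[], list]
    steps: list = field(default_factory=list)

    def select(self, column: str) -> "LazyFrame":
        # Only record the step; nothing is loaded or computed yet.
        return LazyFrame(self.schema, self.load, self.steps + [column])

    def check(self) -> dict:
        # Walk the plan against the schema alone, without touching any data.
        schema = dict(self.schema)
        for column in self.steps:
            if column not in schema:
                raise PlanError(
                    f"unknown column {column!r}; available: {sorted(schema)}"
                )
            schema = {column: schema[column]}
        return schema

    def collect(self) -> list:
        # Validate the whole plan first, then run the expensive load.
        self.check()
        rows = self.load()
        for column in self.steps:
            rows = [{column: row[column]} for row in rows]
        return rows


calls = []


def expensive_load():
    # Stand-in for hours of compute; records that it was invoked.
    calls.append("load")
    return [{"price": 1.5, "qty": 2}]


frame = LazyFrame({"price": float, "qty": int}, expensive_load)

try:
    frame.select("prise").collect()  # typo in the column name
except PlanError as err:
    print(err)

print(calls)  # shows whether the expensive load ran
```

The misspelled column is rejected in `check`, so `expensive_load` is never called for the bad plan. The check still happens when the program runs, which is the distinction drawn above: it is early, but it is not compile time.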
m-chav@reddit (OP)
Agreed. Polars mostly solves this with query planning, and failures are caught really early. Does the Rust version have a similar metaprogramming system that allows you to move these checks to compile time? That way, if you have: do expensive compute #1, save results, then do expensive compute #2 (which depends on #1 but has an error), it fails during compilation instead of after computation #1.
ritchie46@reddit
No, it doesn't. DataFrames and Columns are type erased.
m-chav@reddit (OP)
I see. In the Haskell version I decided to make them type erased as well, but there is a typed veneer on top that does schema and column tracking, which you can opt into (outlined towards the end of the article). In principle such a thing could also wrap Polars.
Buttleston@reddit
Hah, I didn't even see there was an article and thought you were asking a question.
IanisVasilev@reddit
Polars may fix some shortcomings of pandas, but it is nowhere near a panacea, especially usability-wise.
edimaudo@reddit
You might want to take a look at Polars.
Ralwus@reddit
Other than groupby-apply, what feels clunky in pandas?
Able-Bridge-8037@reddit
e for a while, this finally made me comment