Pandas feels clunky coming from R. What about Haskell?

[-]

Buttleston@reddit

If you're starting fresh, start with Polars.

If you love R, use R. Tell me why you need to switch and maybe I can tell you what to switch to.

[-]

I want transformations where things like type errors and variable name mistakes can be caught at compile time rather than hours into a pipeline. Spark Datasets in Scala provide this sort of assurance, but to add and remove columns you have to define new case classes. I’m not sure what else has solved that problem that far up the stack. The Polars lazy API does, but it mostly relies on runtime reflection, so an error far down in your pipeline could still waste compute.

[-]

BroBroMate@reddit

Some feedback for future posts: I found your code colour scheme hard to read. There's not much contrast between the background and the text.

[-]

ritchie46@reddit

Polars verifies those things at query planning, before running the query, not hours into the computation.

You cannot do it at compile time, as schemas in files are often unknown until you read the file(s).

If you compile a new program for every file, you can do it.
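To make the planning-time check concrete, here is a rough toy sketch in plain Python. It is not Polars code and does not use the Polars API; `LazyQuery`, `SchemaError`, and `resolve_schema` are made-up names for illustration. The idea is that operations are only recorded, and the schema is walked through every recorded step before a single row is touched.

```python
class SchemaError(Exception):
    """Raised while planning, before any data is processed."""


class LazyQuery:
    """Hypothetical toy lazy query: records operations, runs them on collect()."""

    def __init__(self, schema, ops=()):
        self.schema = dict(schema)  # column name -> Python type
        self.ops = tuple(ops)       # recorded, not yet executed

    def select(self, *cols):
        return LazyQuery(self.schema, self.ops + (("select", cols),))

    def rename(self, old, new):
        return LazyQuery(self.schema, self.ops + (("rename", (old, new)),))

    def resolve_schema(self):
        """Walk the recorded steps using only the schema, never the data."""
        schema = dict(self.schema)
        for i, (op, args) in enumerate(self.ops):
            if op == "select":
                missing = [c for c in args if c not in schema]
                if missing:
                    raise SchemaError(f"step {i}: unknown column(s) {missing}")
                schema = {c: schema[c] for c in args}
            else:  # rename
                old, new = args
                if old not in schema:
                    raise SchemaError(f"step {i}: unknown column {old!r}")
                schema = {(new if k == old else k): v for k, v in schema.items()}
        return schema

    def collect(self, rows):
        self.resolve_schema()  # planning: fails here, before the loop below
        for op, args in self.ops:
            if op == "select":
                rows = [{c: r[c] for c in args} for r in rows]
            else:
                old, new = args
                rows = [{(new if k == old else k): v for k, v in r.items()}
                        for r in rows]
        return rows
```

With this sketch, `LazyQuery({"id": int, "price": float}).rename("price", "cost").select("price")` only fails when `collect` (or `resolve_schema`) is called, but it fails during the schema walk rather than partway through the data. The schema still has to come from somewhere at runtime, which is the point about files above.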

[-]

m-chav@reddit (OP)

Agreed. Polars mostly solves this with query planning, and failures are caught really early. Does the Rust version have a similar metaprogramming system that allows you to move these checks to compile time? That way, if you have: do expensive compute #1, save results, then do expensive compute #2 (depends on #1 but has an error), it fails during compilation instead of after compute #1?

[-]

ritchie46@reddit

No, it doesn't. DataFrames and Columns are type-erased.

[-]

m-chav@reddit (OP)

I see. In the Haskell version I decided to make them type-erased as well, but there is a typed veneer on top that does schema and column tracking, which you can opt into (outlined towards the end of the article). In principle, such a thing could also wrap Polars.
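For anyone wondering what a veneer over a type-erased frame looks like in shape, here is a minimal Python sketch. It is not the Haskell library's API and not Polars; `Frame` and `TypedFrame` are hypothetical names. Python also has no compile-time step, so the checks here run when the wrapper method is called, whereas the Haskell version described above pushes them to the type checker.

```python
class Frame:
    """Hypothetical type-erased frame: plain lists keyed by column name."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def select(self, *names):
        # A bad name only surfaces here, as a KeyError, when this runs.
        return Frame({n: self.columns[n] for n in names})

    def with_column(self, name, values):
        return Frame({**self.columns, name: list(values)})


class TypedFrame:
    """Opt-in wrapper that carries a schema alongside the untyped Frame."""

    def __init__(self, frame, schema):
        # Validate once that the data matches the declared schema.
        for name, dtype in schema.items():
            if name not in frame.columns:
                raise TypeError(f"schema names missing column {name!r}")
            if not all(isinstance(v, dtype) for v in frame.columns[name]):
                raise TypeError(f"column {name!r} is not all {dtype.__name__}")
        self.frame = frame
        self.schema = dict(schema)

    def select(self, *names):
        # Checked against the tracked schema before delegating.
        unknown = [n for n in names if n not in self.schema]
        if unknown:
            raise TypeError(f"unknown column(s): {unknown}")
        return TypedFrame(self.frame.select(*names),
                          {n: self.schema[n] for n in names})

    def with_column(self, name, dtype, values):
        # The constructor validates the new column against its declared type.
        return TypedFrame(self.frame.with_column(name, values),
                          {**self.schema, name: dtype})

    def untyped(self):
        """Drop back to the type-erased frame."""
        return self.frame
```

Code that doesn't care about schemas keeps using `Frame` directly; code that does wraps it once and gets the schema tracked through `select` and `with_column`.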

[-]