Polars code runs slower on 128-core EC2

Posted by Popular-Sand-3185@reddit | Python | View on Reddit | 61 comments

Disclaimer: I am not sure this post is appropriate for r/LearnPython since it's not a question of "how to do something in Python", rather I am looking for a lower-level discussion for why my Python application performs poorly on a significantly more powerful server. Hence I'm posting it here.

The problem:

I have a relatively complex data pipeline that is written in Polars. On my local machine with 12 cores, the pipeline finishes in about 1200ms. On my 128-core EC2, it takes 13000ms to complete. I have tried setting the POLARS_MAX_THREADS parameter to 12 on the EC2, and it's still slower.

I am using a TMPFS partition on both machines to read the data into the pipeline directly from RAM. Both my machine and the EC2 have DDR5 RAM so I think they should be comparable.

Anyone have any ideas why the pipeline would run much slower on the EC2?