Modules that perform JIT at runtime
Posted by ttoommxx@reddit | Python | 36 comments
I have been trying to develop high-performance functions in Python, and I am looking for packages that can compile blocks of code. I am aware of packages like Nuitka, mypyc, etc.; I have used them before and they work wonderfully (I especially like mypyc). However, I now need to develop code for a large code base, and we are restricted to pushing exclusively .py packages.
To work around this I used numba a little; it works really well, but it's extremely limited in its usage. I wonder whether there is any other package out there that lets you compile a function at runtime just by decorating it.
EducationalTie1946@reddit
Your best bets are JAX and numba. Additionally, using modules like numpy and multiprocessing/threads, and choosing the correct data types, will help you a lot. And if you are restricted to .py files only, you could make a separate module with mypyc-compiled functions, publish that on PyPI, or write a command that imports the GitHub repo with that code at runtime. This could technically be compliant in the eyes of the project requirements.
ttoommxx@reddit (OP)
We have PyPI routed to our local server; obviously I cannot install whatever I want from the internet.
I was using numba before; now I am going to try JAX. Numba works really well, but it's too specific.
EducationalTie1946@reddit
It isn't whatever you want. It's a GitHub project you would make yourself, publish on GitHub, and then download. It isn't some random repo.
ttoommxx@reddit (OP)
I mean I cannot download anything I want via pip; our local pip install searches exclusively among packages approved by the organization, and they would never approve something I publish. It would start a process that takes months, for each single update of such a package.
char101@reddit
Embed your .so files as strings and extract them at runtime?
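A minimal sketch of that idea, with a stand-in byte payload in place of a real compiled library (the `EMBEDDED_SO_B64` name and the stub bytes are illustrative, not from the thread):

```python
import base64
import os
import tempfile

# Stand-in payload: in a real setup this literal would hold the base64
# of an actual compiled .so (built elsewhere, e.g. with gcc or mypyc).
EMBEDDED_SO_B64 = base64.b64encode(b"\x7fELF-stub-not-a-real-library").decode()

def extract_library(b64_payload):
    """Decode the embedded payload and write it to a temp file."""
    raw = base64.b64decode(b64_payload)
    fd, path = tempfile.mkstemp(suffix=".so")
    with os.fdopen(fd, "wb") as f:
        f.write(raw)
    return path  # for a real .so you would then do: ctypes.CDLL(path)

path = extract_library(EMBEDDED_SO_B64)
print(open(path, "rb").read()[:4])  # b'\x7fELF'
```

Note the payload is tied to one architecture/platform, which is part of why this is a hack rather than a recommended packaging strategy.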
Crazy_Anywhere_4572@reddit
Do you know C or C++? If yes, then you can write C code and import it with ctypes.
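For illustration, a tiny ctypes call into the C runtime that is already linked into the interpreter; a real use case would load your own compiled library with `ctypes.CDLL("path/to/lib.so")` instead:

```python
import ctypes

# Load symbols from the C runtime already available to the process
# (POSIX-specific; on Windows you would pass a DLL path instead).
libc = ctypes.CDLL(None)

# Declare the C signature of abs() so ctypes converts arguments correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-7))  # 7
```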
ttoommxx@reddit (OP)
I did write a module in pure C before. The issue is that I cannot push .so files to the repo; I work for a big organization and everything needs to be a Python 3.9 script, for obvious reasons.
bronzewrath@reddit
If possible, try to update to Python 3.12. There have been lots of performance improvements in Python 3.11 and 3.12.
I have a script that I run every day which processes millions of CSV rows. I got almost a 2x speed improvement just by updating from 3.10 to 3.12.
caleb@reddit
For the best performance-vs-simplicity trade-off, Numba is by far your best option. It doesn't support all Python types, and perhaps that's what you meant by "limited", but for really high performance you're not going to create a lot of class instances anyway, regardless of what you use, and this is the same approach even in other programming languages with optimizing compilers: you structure your data in compact arrays of native types to exploit locality in the various CPU caches. Numba is exceptionally good at processing these, and can even automatically unroll some loops into SIMD instructions. Numba is also easy to use interactively, unlike many of the AOT compiler options, which is a significant advantage if your workflow involves a lot of interactivity.
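As a sketch of the kind of tight numeric kernel this describes (a hypothetical `dot_sum` function; with a plain-Python fallback in case numba is not installed):

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # hedge: numba may not be available
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit(cache=True)
def dot_sum(xs, ys):
    # Tight loop over compact float32 arrays; when compiled, numba can
    # turn loops like this into vectorized SIMD code.
    acc = 0.0
    for i in range(xs.shape[0]):
        acc += xs[i] * ys[i]
    return acc

xs = np.arange(4, dtype=np.float32)
ys = np.ones(4, dtype=np.float32)
print(dot_sum(xs, ys))  # 0 + 1 + 2 + 3 = 6.0
```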
ttoommxx@reddit (OP)
Yeah, I used numba before; I was looking for something as flexible as mypyc, even at the cost of slightly lower performance compared to numba. Any idea if there is another project out there like what I am looking for?
caleb@reddit
From what you've said, I think you already know all the options. It sounds like your employer will only allow .py files in the repo, and if you can't build wheels (say using mypyc) and put them on an internal registry, then a JIT really is probably your only option.
In my previous response, I was really trying to highlight data-oriented design as the optimal way to get performance as close to what the hardware allows. This is a language-agnostic perspective. In the Python world, numba (and numpy) certainly facilitate this, but you have to code for it. I don't know of any compilers that automatically convert class-based OOP code into vectorized calculation streams. Even in C++/Rust etc. you want to avoid pushing large structs through calculation pipelines, because the CPU cache needs too many updates.
ttoommxx@reddit (OP)
That's fair, thank you for clarifying! Will probably stick to Numba and try to break down things more and more
caleb@reddit
Good luck! Instead of thinking in terms of sequences of classes like `list[MyClass(a=1, b=2)]`, change that to something like `tuple[np.array, np.array]`, or even just an `np.array` with dim n×2, say. And then, instead of putting a calculation into a method on `MyClass`, rather write a numba-decorated function that receives your `np.array` values. It's not pretty, but it's fast. You can save even more memory by using 4-byte ints and 4-byte floats as the dtype for the arrays when possible, and this improves CPU cache efficiency even more. Sometimes `parallel=True` and `prange` are also applicable, and you get some multicore for free. You don't have to do this for all code, only when you need to crunch a bunch of data as fast as possible.
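A small sketch of that refactor, with hypothetical record data standing in for `MyClass` instances (names here are illustrative; with numba you would decorate the free function with `@njit`):

```python
import numpy as np

# Hypothetical records that would otherwise be a list of MyClass(a=..., b=...)
records = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]

# Structure-of-arrays layout: one compact array per field,
# using 4-byte floats for better cache efficiency.
a = np.array([r[0] for r in records], dtype=np.float32)
b = np.array([r[1] for r in records], dtype=np.float32)

# A free function over arrays replaces a method on MyClass.
def weighted_sum(a, b):
    return float(np.sum(a * b))

print(weighted_sum(a, b))  # 1*2 + 3*4 + 5*6 = 44.0
```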
ChurchillsLlama@reddit
Why is everyone using these compilers? I'm a data engineer, so my scope is of course limited, but I'm genuinely curious what the real-world use cases are. Maybe it'll help me up my game.
thuiop1@reddit
Mostly performance. But really, using numpy/pandas/polars/... will get you like 90% of the way there. Numba can help you scrape out that extra performance and do stuff like parallelize your code with little effort.
ttoommxx@reddit (OP)
Numba is a blessing; the improvement is incredible, and it makes it much easier to parallelize simple for loops
New-Watercress1717@reddit
Sadly, all Python 'jit' decorator packages only target numeric/scientific use cases.
numba/lpython/torchscript/jax (I recall there is one more, whose target use case is 3D rendering) are all numeric. If any of these supported more general Python, they would be slower than CPython in those cases. I recall reading that the reason numba can't optimize strings is that some CPython APIs are not public.
Sticking with mypyc is your best bet, assuming you don't want to write Cython code and want to keep writing Python. I know there is currently an attempt to give CPython a JIT, but so far it is not making Python any faster (according to the macro benchmarks). Maybe that attempt will give some 3rd-party folks better C APIs to write better JITs, who knows.
ttoommxx@reddit (OP)
Thank you!
reddisaurus@reddit
What are you trying to do? Current Numba can do almost everything except recursion, and it can't JIT third-party libraries. I use it extensively.
ttoommxx@reddit (OP)
I am optimizing part of our codebase, but we work with big objects that inevitably get passed here and there. In my experience, Numba seems fitted for running small functions that use numba-supported types only.
reddisaurus@reddit
Numba has a jitclass decorator, and other jitclass instances can be assigned inside one. You will need to define a static type for these objects, as there is nothing for free... or add methods that emit cleaner data structures.
denehoffman@reddit
How has nobody mentioned Jax yet? I guess it only applies to numeric calculations though
thuiop1@reddit
I came here to say this.
ttoommxx@reddit (OP)
Thank you! I am going to have a look at it now
EveningAd3467@reddit
You should. I did a lot with JAX and it is great!!!
Oenomaus_3575@reddit
Maybe Cython? It's not that hard, but not as fast as a real JIT
ttoommxx@reddit (OP)
It is a bit annoying to have to use a different syntax. Rather than Cython I would like to use mypyc and put everything in one external module. Is there a numba-like decorator for Cython that compiles a single function within a .py script?
Oenomaus_3575@reddit
Idk about a decorator, but Cython has been working on a pure-Python syntax where you basically only use (Cython) type annotations. So check that out.
ttoommxx@reddit (OP)
Will do :)
Barafu@reddit
Stupid limitations require stupid solutions. Use PyO3 + maturin to create a single-file Python wheel. Store the contents of the wheel file in a string literal inside your code, and write it out to a temp file before using it.
Obliterative_hippo@reddit
/u/ttoommxx do you need to support multiple architectures or platforms? If so, you can package your Python source in a tarball which is compiled at install time.
But if you only have one target in mind, say x86-64 Linux, then writing the compiled bytes to a temp file may be a feasible hack. Not one I would recommend but a jank solution is a solution.
EarthyFeet@reddit
Well, numba is one package that does something like that, so check it out.
ttoommxx@reddit (OP)
I did use numba, but it's a bit too restrictive. I often have to work with a blend of numpy objects and Python objects, and numba then becomes very hard to set up, and often just does not work at all.
bkx@reddit
Numba does this
FloxaY@reddit
good luck