Language is the bottleneck when hot loops, heavy serialization, and thread contention dominate; Python hits all three. Real case: a 30–40k rps JSON API where stdlib json + Pydantic ate >60% of CPU. Switching to orjson and msgspec halved CPU, then moving validation into a small Rust PyO3 module fixed tail latency; gRPC/Protobuf beat REST/JSON for cross-service chatter. For data, Polars + Arrow avoided Python loops; multiprocessing with shared_memory handled parallelism better than threads. With FastAPI and gRPC, I've also used DreamFactory to spin up DB-backed APIs without hand-rolled serializers. The fix is isolating hot paths and changing the runtime, not just the code structure.
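The "swap the library under the hot path" move described above can be sketched as a seam in the code: the service keeps one serialization entry point and changes the implementation behind it. A minimal sketch, assuming orjson is installed (the fallback keeps the stdlib); the `dumps` seam itself is a hypothetical name, not from the original post:

```python
import json

# One serialization seam for the whole service, so the hot path can be
# swapped (orjson here) without touching call sites.
try:
    import orjson  # third-party; typically much faster than stdlib json

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)
except ImportError:
    def dumps(obj) -> bytes:
        # Compact separators match orjson's output shape.
        return json.dumps(obj, separators=(",", ":")).encode()
```

Call sites use `dumps(payload)` and never import a serializer directly, so a later move to msgspec (or a Rust module) stays a one-file change.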
I got it from working on Instagram. Instagram is backed by Python on the Django framework. There’s an underlying connection to a blue server but most of it is written in Python, including a decent amount of front-end code using Bloks.
Here are a few naive, un-optimised, single-thread #8 programs, transliterated line-by-line in a literal style into different programming languages from the same original.
My company regularly sees substantial performance and density (needing fewer servers for the same number of requests) increases by simple translations of Python to Go services.
Any CPU-bound code that needs to be threaded. But that's not always the problem; most of the time the problem is memory allocation and deallocation, slow start times for mission-critical services, and more.
Parsing NeTEx XML files for EU transit information is pretty slow in Python. But it is one of the only languages that has a good enough XSD code-generation library available that works with the incredibly complicated XSDs of NeTEx.
Given how slow Python is, pretty much any non-trivial algorithm meets this criterion for a large n. There is a reason why libraries such as numpy are only wrappers over compiled code.
I mean 3.11 is pretty old, 3 years to be exact. Something I wonder is, while there are usually a lot of cpython changes every release, a few stdlib changes etc, it's not like the core language is changing a lot. So why are they 3 years behind exactly?
PyPy maintainer here. First, there are fewer than five of us, where CPython has dozens of contributors. Second, each version has many internal changes to the interpreter that we need to transpile from C into Python. And third, the stdlib changes often are in C extensions that also need to be transpiled to Python. For instance, 3.12 introduced deep changes to f-string parsing, new syntax around debugging, typing, and decorators, and more. All of this requires lots of work that is not very rewarding.
What are your impressions of why the CPython team is struggling to achieve much of a performance improvement with JIT compared to PyPy?
Obviously PyPy is much more mature, but still...progress on the CPython JIT is very slow. Are they just laying groundwork which will one day explode into rapid progress or is it going to be an extremely slow process over many years?
PyPy started around 2003 exploring JIT technologies and hit its stride around 2010, 7 years later. It takes quite a while to figure out how to write a JIT, and even longer to tune the heuristics (you could say PyPy never finished this part). Antonio Cuni recently wrote about some of the trickier parts in a presentation to the CPython core developer sprint https://antocuni.eu/2025/09/24/tracing-jits-in-the-real-world--cpython-core-dev-sprint/
PyPy is European based, and CPython has historically been USA based. Also, CPython preferred, for much of its history, simplicity over performance. This has changed lately. Antonio Cuni recently attended the CPython developer sprint https://antocuni.eu/2025/09/24/tracing-jits-in-the-real-world--cpython-core-dev-sprint/ and the new Py-Ni project is built on many of the ideas from HPy. The Python-based CPython REPL came from PyPy.
PyPy of course leans on CPython for the stdlib and for the language spec. And many of the changes CPython makes do take PyPy into account: for instance, there is a policy that any stdlib module implemented in C should have a pure-Python version as well. But the interpreters are wildly different, so not much can transfer directly from one to the other.
Thanks, that's great insight on the f-strings in particular. I do now recall that change being made (I think allowing recursive strings among other things?), but had totally forgotten about it.
Thank you for your outstanding work. I wish there was more support for PyPy. I flirted with it a few times, but for my data science use cases, it was better to use CPython with Numba.
Being three versions behind means three missed occasions to get a free perf upgrade, some nifty features, etc. Two of those (because 3.14 just came out) most users have probably already taken. They would have to give that up for a change of interpreter.
It can only run pure Python code, so any library implemented in C or another compiled language won't run on PyPy. There are other incompatibilities, but this one is the biggest reason.
They are also slower to release updates, so PyPy is always a few versions behind.
Also, a JIT needs the code to run a few times before it optimizes and compiles it, so small scripts that are executed once or twice and then killed actually run worse on PyPy than on normal Python.
Now they are bringing a JIT to standard Python, but doing it without breaking any compatibility, so it will take time to get there.
PyPy can use the CPython C API, but it is slower, since at every transition from Python to C the objects passed across must be recreated in C and synchronized. And the JIT cannot look inside C code. In many data applications Python is not the performance bottleneck: most of the processing time is spent in NumPy, PyTorch, or pandas, which all leverage C/C++ for the heavy lifting.
By far the biggest cost in executing python is that for any operation, you need to first figure out what that operation even means.
i.e. `a = b + c` actually executes something like:

- b is a dynamically typed object (a `PyObject *`)
- load the type of b from a pointer in the object; turns out b is an int
- from the int type object, load the add method pointer
- execute the add method, which does:
    - check what type c is
    - if c isn't an int, load the type object from c, and from that load its int method
    - execute that, to get an int object (or an exception)
    - do the actual integer addition (keeping in mind that integers in Python have arbitrary precision, so be sure to check for possible overflow etc.)
    - allocate a new PyObject for the result
    - return it
And then I'm still ignoring a lot of things, like `__radd__` also being a thing, the different type rules which might apply, and class resolution which might happen for subclasses.
And this is all done at runtime. Which is pretty slow, but also flexible.
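That fallback machinery (including `__radd__`) is observable from plain Python. A toy class, hypothetical and purely for illustration:

```python
class Meters:
    """Toy wrapper type to show Python's binary-operator dispatch."""
    def __init__(self, v):
        self.v = v

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Meters(self.v + other)
        return NotImplemented

    def __radd__(self, other):
        # Reached only after the left operand's __add__ gave up
        # (e.g. int.__add__ returned NotImplemented for a Meters arg).
        return self.__add__(other)

m = Meters(2)
left = (m + 3).v    # Meters.__add__ handles it directly
right = (3 + m).v   # int.__add__ fails first, then Meters.__radd__ runs
```

All of that probing happens on every single `+`, which is exactly the per-operation cost being described.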
Now what PyPy does is something akin to:

- hey, this method usually gets called with its arguments being int
- what if we compile a version of this function that at the start checks if its arguments are just of type int, and in that case skips all the type stuff at runtime and just does the addition
- and in case anything diverges from that path (like an overflow, or an error), we fall back to the slow interpreted version
It turns out that most time in any program is just spent in a small subset of functions, and those functions generally get called with arguments of the same type. So such an approach can end up saving a lot of time.
Now this is a very oversimplified explanation; the actual method by which PyPy accomplishes this effect is weird. Essentially PyPy isn't even the actual JIT compiler: it's a framework that takes an interpreter as input and creates a JIT compiler from that interpreter, which is then run over a Python interpreter to create the final thing.
name@PC:/mnt/d/Programming/Projects/Testing$ uv python upgrade 3.14
warning: `uv python upgrade` is experimental and may change without warning. Pass `--preview-features python-upgrade` to disable this warning
Installed Python 3.14.0 in 1.98s
+ cpython-3.14.0-linux-x86_64-gnu
name@PC:/mnt/d/Programming/Projects/Testing$ uv run --python 3.14 python -m venv pivenv
name@PC:/mnt/d/Programming/Projects/Testing$ ls pivenv/bin/
Activate.ps1 activate activate.csh activate.fish pip pip3 pip3.14 python python3 python3.14 𝜋thon
name@PC:/mnt/d/Programming/Projects/Testing$ pivenv/bin/𝜋thon
Python 3.14.0 (main, Oct 7 2025, 15:35:21) [Clang 20.1.4 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
Oh yeah! I never connected the dots; the architecture seems similar, otherwise it would be super hard to JIT compile a bunch of languages within the same engine.
At the time it was groundbreaking research, and RPython was based on the then-latest version of Python, 2.7. It led to a bunch of published papers around 2010.
A better question for me is "How quickly can I implement a solution?" Development speed matters a whole lot more than execution speed, 90% of the time.
When execution speed matters that much, use a different language, write the core loop in C, or change your algorithm (adding function caching or something).
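The "function caching" aside can be as small as a stdlib decorator, e.g. memoising a naive recursion:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Without the cache this recursion is exponential;
    # with it, each n is computed exactly once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Often a one-line change like this is a bigger win than any interpreter-level speedup.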
UnmaintainedDonkey@reddit
Still slow. Python has many things, but speed never was a key tenet.
kaen_@reddit
I don't think anyone with tight performance considerations is using CPython anyway
Anthony356@reddit
That simply isnt true unfortunately. Everyone likes to say this, but everyone also likes to use python as a middleman for everything.
For example, do you debug C/C++ code with GDB or LLDB? Congratz, the speed at which your debugger displays variables is tied to CPython's performance due to the visualizer scripts.
Ameisen@reddit
Debugger visualization doesn't have tight performance requirements.
Anthony356@reddit
Yes it does. Debugging sucks. Waiting for your debugger sucks even more.
For small frames and small objects, the time spent in the visualizer doesn't matter much. For large frames, or visualizations of, say, a vector containing 10,000+ elements, that wasted time starts to become an issue. Especially if any of that data needs to be recalculated on a per-step basis.
Is this an acceptable wait time for a visualization to you?
mr-figs@reddit
Regrettably I am for a game I'm working on
https://store.steampowered.com/app/3122220/Mr_Figs/
It was a bad choice about 4 years ago.
Looking back I'd choose Defold (game engine) or roll my own in C# or Haxe
DynamicHunter@reddit
Python is probably fine for basic 2D games without tons of animations, but I’m sure there is some library or even using LLMs you could convert the code if you wish. Dunno if it would be worth the effort tho
mr-figs@reddit
Yeah, it's mostly fine. There's a realtime rewind feature, though, that absolutely noms up any CPU it can find. I'm hoping to move to PyPy once pygame supports it, and then I should be in a better position.
qualia-assurance@reddit
Performance improvement is a recent initiative though. Lua shows that interpreted languages can be fast. It's just a matter of it not previously being a priority, partly because slow code could easily be implemented in C-based modules.
BrofessorOfLogic@reddit
Comparing Lua and Python is like comparing C and C++. They are on completely different levels of complexity, and they have completely different use cases. One is a smartphone, the other is a dumb phone.
qualia-assurance@reddit
Lmao, my little BrofessorOfLogic, I think you underestimate just how similar C and C++ actually are. Not just at a keyword/syntax level where you can write programs that will compile in both, but that the resulting code will genuinely be identical.
Additionally, I am not writing an opinion piece for some influencer to read out on stream. This is the stated objective of the Python Foundation: as of Python 3.11 they have been working towards improving Python's performance. It's why they are interested in removing the GIL and implementing a JIT.
They are doing this because one thing is Smort and the other is Dumdum.
BrofessorOfLogic@reddit
Damn what a condescending, pompous, and uneducated comment.
If you think C and C++ are the same just because they happen to have similar syntax on the basic stuff, then you clearly don't know what you are talking about.
qualia-assurance@reddit
It's not the basic stuff. It is the core fundamentals of the language specifications that are identical. Until some recent stuff in C23, C++ was essentially a superset of C. What you're mistaking for extra language complexity is C++'s standard libraries. But they aren't an increase in the complexity of the underlying code; they are quite literally compatible with C code.
https://stackoverflow.com/questions/2744181/how-to-call-c-function-from-c
This isn't because C++ has some special sandboxing feature to marshal C calls into a special form that C++ code can understand. It's because C++ is the same language with extra additions to handle classes. It's why C++ was originally called "C with classes". It's why Bjarne Stroustrup frequently mentions his close friendship with Dennis Ritchie and how he wants to maintain compatibility between the two languages wherever possible, because their interoperability is not only a pragmatic and practical benefit but also of sentimental value to him.
Putnam3145@reddit
C99 was when C++ stopped being a superset of C, if for nothing else than the restrict keyword, which is pretty useful on its own.
Ameisen@reddit
Though basically every C++ implementation supports `__restrict`.
Amazing-Royal-8319@reddit
Ehh, it’s not just a matter of it being a priority. The extremely dynamic nature of Python inherently makes many things difficult/impossible to optimize unless you are willing to tolerate significant breaking changes to the language. I think they learned the hard way that most people are not.
Don’t get me wrong, it’s great that people are trying to make Python faster, but I wouldn’t expect it to ever compete favorably against a language designed for performance without all the legacy code baggage Python has at this point.
I say this as someone whose favorite language is Python and who works primarily in Python (but has professional experience with many others).
chat-lu@reddit
It was the same deal with Javascript and it eventually got fast.
Python recently got a JIT which so far does not do much, but in time it could optimise based not on everything you *could* do with the afforded flexibility, but on what you actually *did* do.
A JIT will trace to figure out what patterns you are really using, then compile optimized code for that, and place a guard to ensure that its assumptions are not broken. If they happen to be, it'll give you back the slow code while it figures out something else.
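A toy sketch of that guard-and-fallback idea in plain Python (not a real JIT; `make_specialised` and the int-only fast path are invented for illustration):

```python
def make_specialised(generic):
    """Cache a per-type-pair fast path; fall back to `generic` otherwise."""
    cache = {}

    def wrapper(a, b):
        key = (type(a), type(b))   # the "guard": check the traced assumption
        fast = cache.get(key)
        if fast is None:
            if key == (int, int):
                # the "compiled" version: skip generic dispatch entirely
                fast = lambda x, y: x + y
            else:
                fast = generic     # guard failed: hand back the slow code
            cache[key] = fast
        return fast(a, b)

    return wrapper

def generic_add(a, b):
    return a + b                   # stands in for the slow interpreted path

add = make_specialised(generic_add)
```

A real tracing JIT records operations as they execute and emits machine code, but the shape is the same: specialise on what was actually seen, guard the assumption, deoptimise on mismatch.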
runevault@reddit
JavaScript got faster because Google spent an insane amount of money on top-end engineering talent to make V8 fast. Unless some megacorp decides to make a similar investment into Python, with community buy-in to let them do it, the odds of Python seeing similar improvements are low. (Assuming it is even possible: the reasons each language is slow likely do not map 1:1, so there's a risk the issues in Python are more problematic to solve, but it's impossible to be sure without the investment.)
axonxorz@reddit
Like, perhaps, Microsoft employing GVR and a whole team with the express goal of making Python faster?
inkjod@reddit
That's the most unfortunate example, ever!
That entire team got fired this year, and in a particularly ugly way.
axonxorz@reddit
Yes, I am aware of that event.
My point was that megacorps are sponsoring this stuff in general. Look at the Ruby world and Shopify for a preview of what's to come.
anders987@reddit
There have been several attempts at speeding up Python by various big companies. Google had Unladen Swallow, Dropbox had Pyston, Meta has Cinder, Microsoft had Faster CPython. As far as I know, only Cinder remains.
mr_birkenblatt@reddit
Because the actual bottlenecks don't show up in python code
UnmaintainedDonkey@reddit
Then why is numpy not written in python?
mr_birkenblatt@reddit
Okay, you completely misunderstood my comment...
Because numpy is not written in Python, the bottlenecks don't show up in Python: you only glue together library calls. The glue code doesn't bottleneck; the code in the libraries does.
anders987@reddit
Instagram runs on Django.
mr_birkenblatt@reddit
DB is written in C
qualia-assurance@reddit
That's the one. I couldn't find it amidst discussion about the recent 3.14 release. Hopefully they didn't get hit too hard by all the recent Microsoft layoffs.
anders987@reddit
It was cancelled four months ago.
https://www.linkedin.com/posts/mdboom_its-been-a-tough-couple-of-days-microsofts-activity-7328583333536268289-p4Lp/
qualia-assurance@reddit
NO, GOD! PLEASE, NO! NO! ... NO! NOOOOOOOOOO!
https://www.youtube.com/watch?v=umDr0mPuyQc
runevault@reddit
They put time and money in, but I never got the impression it rivaled what V8 got.
pjmlp@reddit
Usually people who think this have never used languages like Smalltalk, SELF, or Common Lisp, which are just as dynamic and have good compilation toolchains.
In the case of Smalltalk and SELF, it was their research that eventually led to the first JIT implementations in Java and JavaScript.
In Smalltalk, with its image-based model (and similarly in Lisps with the same approach), anything can change at any time, after a break into the debugger and a redo step.
Likewise, these languages have the same capability as Python to change any already-compiled code on the fly during execution, even without stepping into the debugger.
Amazing-Royal-8319@reddit
I’m not saying the language can’t be made faster, I’m saying that I don’t think it’s practical to do that without subtly breaking little things in broadly used libraries. It would be one thing if the language was designed with APIs that were conducive to this, Python just wasn’t.
If it is possible to fix this, it will be because a major company decides to invest as much into Python as Google did into JavaScript, and it will be a years-long, very uphill battle for modest gains. I'm not saying it's theoretically impossible, but it would be a LOT easier if you could make breaking changes to the language. But that would result in no one using it.
Another problem with all of this is that if you really care about performance this much, it’s almost certainly better bang for your buck to just switch to Go or some other more performant language and go back to relying on Python for the glue.
amroamroamro@reddit
not necessarily true
look at a language like MATLAB, which is kind of in the same weight class as python, with the same dynamic nature. when they introduced the JIT many years ago the improvement was very noticeable. all of a sudden you could write naive loops, and the JIT would run them at the same speeds as if you had written cleverly vectorized code, among many other performance improvements in cases like dynamically expanding lists, etc.
this was also evident when you compare matlab to octave, a compatible open source implementation (and yes i know octave is also working on its own jit)
SkoomaDentist@reddit
Real-world Matlab isn't extremely dynamic like Python is. 99% of things that affect speed are either doubles, vectors of doubles, or matrices of doubles. The JIT was massively beneficial because interpreted Matlab was extremely slow for scalar code. Now it's merely slow, but that ends up being good enough for the use cases where Matlab is the right tool. It's still an order of magnitude or two slower than e.g. straightforward C++ code for scalars.
amroamroamro@reddit
matlab is just as dynamic really, i wouldn't consider it an "easier case" than python
to be clear, i am not referring to the typical use-case of handling matrices of numbers (which matlab excels at anyway), the equivalent being python + numpy filling that same role, in which case both are acting more or less as wrappers for a linear algebra library implemented in native code (BLAS, LAPACK, etc)
i am focusing on the part where both are interpreted and dynamic languages, and how a proper JIT can drastically improve performance
typically the bottleneck part in matlab code comes from the overhead of function calls. idiomatic matlab tends to be vectorized so as to reduce method calls; think SIMD (single instruction multiple data) with one call to process entire vector of data at once
so the Test1 example timing really came down to the tight-loop with millions of function calls to foo1 and foo2, and less so about the assignment itself.
and this showed in the results: when the "new" JIT backend was enabled, it picked up on the hotspot and just-in-time optimized those calls. and to be clear, that blog post is from a decade ago, so the benchmark numbers shown are not exactly up-to-date; their jit has continued to improve since then
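The call-overhead effect described above is easy to see in plain Python too (this is an illustrative micro-benchmark, not numbers from the MATLAB post): one interpreter-level function call per element versus one call that processes the whole vector.

```python
import timeit

def square(x):
    return x * x

def square_all(xs):
    # One call for the whole vector; the per-element work is inlined.
    return [x * x for x in xs]

data = list(range(10_000))

# 10,000 Python-level calls per run vs. one call per run.
per_element = timeit.timeit(lambda: [square(x) for x in data], number=20)
one_call = timeit.timeit(lambda: square_all(data), number=20)
```

On CPython the per-element version is typically noticeably slower from call overhead alone; truly vectorised libraries (numpy, MATLAB builtins) push the loop into native code for a much bigger win.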
mr_birkenblatt@reddit
The very article disproves your statement since the pypy version is as fast as the node version
mr_birkenblatt@reddit
It's exactly the same with JavaScript. And JavaScript got fast. That's the whole point of a JIT
qualia-assurance@reddit
That's not what I mean. The Python Foundation themselves have stated that improving Python's performance is one of their immediate objectives. It has been a frequent subject at various conferences. There have been several initiatives towards this aim. Removing the GIL, adding JIT compilation, and many other such performance related system changes since Python 3.11.
LinkSea8324@reddit
Someone please summon Mike Pall, we're out of LuaJIT ammo and he's the only one manufacturing the bullets
UnmaintainedDonkey@reddit
Indeed. Python is still very hard to make fast, as it's probably one of the most dynamic languages out there. Compare to, say, PHP, which is basically a thin wrapper over C and was really, really slow for decades (it only recently got some perf improvements), or JavaScript, which is also highly dynamic but has multi-billion-dollar corporations pouring money into optimizing it as much as possible.
oronimbus@reddit
Relax
ltjbr@reddit
Speed is relative. Often something needs to be just fast enough to not be a bottleneck.
Rewrites can be risky and costly, switching languages even more so.
Speed improvements are valuable, even if there are faster options out there.
tilitatti@reddit
to a slug, indeed a tortoise does seem to go "insanely fast!". the concept of a cheetah (C++) would be orders of magnitude beyond, absurdly so.
UnmaintainedDonkey@reddit
Sure, but you need to compare to something, and "all we got" is the four main types of languages (I know there are hybrids): interpreted, compiled to bytecode (which usually comes with a GC), compiled to machine code with GC, and compiled to machine code without GC.
I know this is an oversimplification, but you get the gist.
DonaldStuck@reddit
Show me code where the programming language is the bottleneck instead of the code
frostbaka@reddit
You can have lots and lots of simple and fast business logic, but the compound effect will still cause latency issues. Also, deserialization/serialization is a huge bummer. You want a language/platform that minimizes its own memory/CPU overhead as much as possible to build huge, complex programs with large data interop. Python does not scale well for old monolith projects, sadly.
Key-Boat-7519@reddit
Language is the bottleneck when hot loops, heavy serialization, and thread contention dominate; Python nails all three. Real case: a 30–40k rps JSON API where json + Pydantic ate >60% CPU. Switching to orjson and msgspec halved CPU, then moving validation to a small Rust pyo3 module fixed tail latency; gRPC/Protobuf beat REST/JSON for cross-service chatter. For data, Polars + Arrow avoided Python loops; multiprocessing with shared_memory handled parallelism better than threads. With FastAPI and gRPC, I've also used DreamFactory to spin up DB-backed APIs without hand-rolled serializers. The fix is isolating hot paths and changing the runtime, not just the code structure.
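the "isolate the hot path" move can be as small as putting the serializer behind one function. a minimal sketch (my own, not the poster's code): the hypothetical `dumps` wrapper prefers orjson when installed and falls back to stdlib json, so the rest of the service never cares which one is in play:

```python
import json

try:
    import orjson  # third-party fast path, if available

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)
except ImportError:
    def dumps(obj) -> bytes:
        # stdlib fallback: same contract (bytes out), just slower
        return json.dumps(obj).encode("utf-8")

payload = {"user": 42, "tags": ["a", "b"]}
assert json.loads(dumps(payload)) == payload  # round-trips either way
```

the point is the seam, not the library: once the hot path is behind one function, swapping the runtime underneath (orjson, msgspec, a pyo3 module) doesn't touch callers.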
frostbaka@reddit
You switched it all to Rust and you're still saying the language is not a problem?
globalaf@reddit
Someone who does not work with services that scale to billions of users.
MisinformedGenius@reddit
I mean... Instagram is written in Python.
globalaf@reddit
I don't know where you got this from. I work at Meta and almost the entire backend infrastructure across the entire org is HHVM or C++.
MisinformedGenius@reddit
I got it from working on Instagram. Instagram is backed by Python on the Django framework. There’s an underlying connection to a blue server but most of it is written in Python, including a decent amount of front-end code using Bloks.
pojska@reddit
Sure - https://programming-language-benchmarks.vercel.app/
igouy@reddit
etc etc https://benchmarksgame-team.pages.debian.net/benchmarksgame/
igouy@reddit
Here are a few naive un-optimised single-thread #8 programs transliterated line-by-line literal style into different programming languages from the same original.
DonaldStuck@reddit
It's funny: this started with 20 upvotes and then went down the drain 😂
ketralnis@reddit (OP)
My company regularly sees substantial performance and density (needing fewer servers for the same number of requests) increases by simple translations of Python to Go services.
BlueGoliath@reddit
Wow, that's crazy.
Mysterious-Rent7233@reddit
How many examples do you want? I could give you thousands. If you re-implemented these in Python, the language would be the bottleneck:
https://github.com/torvalds/linux
https://github.com/pytorch/pytorch
https://github.com/mozilla-firefox/firefox
https://github.com/mooman219/fontdue
ZjY5MjFk@reddit
ugh, I just got the ick thinking about a Linux kernel rewritten in Python, lol
skarrrrrrr@reddit
Any CPU bound code that needs to be threaded. But that's not always the problem, most of the time the problem is memory allocation and deallocation, slow start times for mission critical services, and more.
space_keeper@reddit
People maybe don't know that memory is getting slower (access time vs. clock cycles), not faster, when you get into the nuts and bolts.
But writing cache-friendly code is not something that's really taught, and it's not really possible in a language where everything is a dictionary.
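the "everything is a dictionary" point is literal in CPython: by default, every instance attribute access is a string-keyed hash-table lookup (my own minimal illustration):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# instance attributes live in a per-object dict, so p.x is
# (conceptually) a hash lookup, not a load from a fixed offset
assert p.__dict__ == {"x": 1, "y": 2}
assert p.x == p.__dict__["x"]
```

`__slots__` removes the per-instance dict, but attribute lookup is still dynamic, which is what defeats the fixed-offset, cache-friendly layout a C struct gives you.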
danted002@reddit
ABS and ESC in cars.
Coffee_Ops@reddit
Any attempts to meaningfully improve this will involve C#, because powershell is slow.
lurgi@reddit
While it's possible to write slow code in any language, it's not always possible to write fast code.
YellowBunnyReddit@reddit
Ok, write slow code in 0 then.
HavicDev@reddit
Parsing NeTEx XML files for EU transit information is pretty slow in Python. But it is one of the only languages that has a good enough XSD codegen library available that works with the incredibly complicated XSDs of NeTEx.
CanadianTuero@reddit
I write tree search algorithms for my research, and you can get several orders of magnitude in increased speed by going to C++ (which is what I do)
msqrt@reddit
Anything you can do on a GPU; not all languages run there efficiently (or at all.)
CorrectProgrammer@reddit
Given how slow Python is, pretty much any non-trivial algorithm meets this criterion for a large n. There is a reason why libraries such as numpy are only wrappers over compiled code.
Keizojeizo@reddit
Seriously
TheCalming@reddit
With PyPy being as fast, what is blocking users from making it the de facto standard? Does it have compatibility issues?
ketralnis@reddit (OP)
Yes, it's not fully compatible with C extensions, which are a huge part of the Python library ecosystem
KarnuRarnu@reddit
I mean, 3.11 is pretty old, 3 years to be exact. Something I wonder is: while there are usually a lot of CPython changes every release, a few stdlib changes, etc., it's not like the core language is changing a lot. So why are they 3 years behind exactly?
pmatti@reddit
PyPy maintainer here. First, there are fewer than five of us, whereas CPython has dozens of contributors. Second, each version has many internal changes to the interpreter that we need to transpile from C into Python. And third, the stdlib changes often are in C extensions that also need to be transpiled to Python. For instance, 3.12 introduced deep changes to f-string parsing, new syntax around debugging, typing, and decorators, and more. All of this requires lots of work that is not very rewarding.
Mysterious-Rent7233@reddit
Thanks for your hard and often thankless work!
What are your impressions of why the CPython team is struggling to achieve much of a performance improvement with JIT compared to PyPy?
Obviously PyPy is much more mature, but still...progress on the CPython JIT is very slow. Are they just laying groundwork which will one day explode into rapid progress or is it going to be an extremely slow process over many years?
pmatti@reddit
PyPy started around 2003 exploring JIT technologies and hit its stride around 2010, 7 years later. It takes quite a while to figure out how to write a JIT, and even longer to tune the heuristics (you could say PyPy never finished this part). Antonio Cuni recently wrote about some of the trickier parts in a presentation to the CPython core developer sprint https://antocuni.eu/2025/09/24/tracing-jits-in-the-real-world--cpython-core-dev-sprint/
kloudrider@reddit
Why aren't CPython maintainers collaborating with you? pypy has insane speedups for pure python, and should be the default for Python
pmatti@reddit
PyPy is Europe-based, and CPython has historically been USA-based. Also, CPython preferred, for much of its history, simplicity over performance. This has changed lately. Antonio Cuni recently attended the CPython developer sprint https://antocuni.eu/2025/09/24/tracing-jits-in-the-real-world--cpython-core-dev-sprint/ and the new Py-Ni project is built on many of the ideas from HPy. The Python-based CPython REPL came from PyPy.
PyPy of course leans on CPython for the stdlib and for the language spec. And many of the changes CPython makes do take PyPy into account: for instance, there is a policy that any stdlib module based on C should have a pure-Python version as well. But the interpreters are wildly different, so not much can transfer directly from one to the other
kloudrider@reddit
Thank you.
KarnuRarnu@reddit
Thanks, that's great insight on the f-strings in particular. I do now recall that change being made (I think allowing recursive strings among other things?), but had totally forgotten about it.
QuickQuirk@reddit
Thanks for the insights. Appreciate the work, and I especially feel for you around doing the work that is often not rewarding.
-lq_pl-@reddit
Thank you for your outstanding work. I wish there was more support for PyPy. I flirted with it a few times, but for my data science use cases, it was better to use CPython with Numba.
polacy_do_pracy@reddit
3 years is not old
KarnuRarnu@reddit
It's three occasions to get a free perf upgrade, some nifty features, etc. Most users have probably already taken two of those (since 3.14 just came out). They would have to give that up for a change of interpreter
WJMazepas@reddit
It can only run pure Python code, so any library made with C or another compiled language won't run on PyPy. There are other incompatibilities, but this one is the biggest reason.
They also are slower to release updates, so a PyPy code is always some versions behind.
Also, a JIT has the issue of needing the code to run a few times before it optimizes and compiles it, so small scripts that are executed once or twice and then killed actually run worse on PyPy than on normal Python.
Now, they are bringing a JIT to standard Python, but doing it without breaking any compatibility, so it will take time to get there.
pmatti@reddit
PyPy can use the CPython C API, but it is slower, since at every transition from Python to C the objects passed across must be recreated in C and synchronized. And the JIT cannot look inside C code. In many data applications Python is not the performance bottleneck: most of the processing time is spent in NumPy, PyTorch, or pandas, which all leverage C/C++ for the heavy lifting.
Mysterious-Rent7233@reddit
That is definitely not true!
https://doc.pypy.org/en/latest/faq.html#do-c-extension-modules-work-with-pypy
BadlyCamouflagedKiwi@reddit
I don't think that's the case any more? Haven't used it for a while but they claim to support C extensions - maybe not quite completely and they are slower: https://doc.pypy.org/en/latest/faq.html#do-c-extension-modules-work-with-pypy
TheCalming@reddit
That makes sense. I don't know if there's anyone running Python without a dependency that calls C or Fortran.
chicknfly@reddit
Irrationally fast
BlueGrovyle@reddit
At last, Pithon has arrived.
Snoron@reddit
Wait, how the hell is pypy so fast!?
censored_username@reddit
Specializing JIT compiler.
By far the biggest cost in executing python is that for any operation, you need to first figure out what that operation even means.
i.e.
a = b + c
actually has to resolve, at runtime, the types of b and c and look up the right `__add__` method. And then I'm still ignoring a lot of things, like `__radd__` also being a thing, the different type rules which might apply, class resolution which might happen for subclasses.
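roughly, per the data model, something like this has to happen on every `+` (my own simplified sketch; it ignores the rule that a subclass's `__radd__` is tried before the left operand's `__add__`, among other details):

```python
def binary_add(b, c):
    # try the left operand's __add__ first (looked up on the type,
    # not the instance)
    if hasattr(type(b), "__add__"):
        result = type(b).__add__(b, c)
    else:
        result = NotImplemented
    if result is NotImplemented and hasattr(type(c), "__radd__"):
        # reflected operation on the right operand
        result = type(c).__radd__(c, b)
    if result is NotImplemented:
        raise TypeError(f"unsupported operand types: "
                        f"{type(b).__name__} and {type(c).__name__}")
    return result

assert binary_add(1, 2) == 3
assert binary_add("py", "py") == "pypy"
```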
And this is all done at runtime. Which is pretty slow, but also flexible.
Now what pypy does is something akin to:
hey this method usually gets called with its arguments being int.
What if we compile a version of this function that at the start checks if its arguments are just of type int, and in that case, just skips all the type stuff at runtime and just does the addition.
and in case anything diverges from that path (like an overflow, or an error), we fall back to the slow interpreted version.
It turns out that most time in any program is just spent in a small subset of functions, and those functions generally get called with arguments of the same type. So such an approach can end up saving a lot of time.
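the guard-plus-fallback idea can be mimicked at the Python level (a toy sketch of my own, not how pypy is actually implemented; a real JIT emits machine code for the fast path):

```python
def specialize_for_ints(generic):
    """Wrap a 2-arg function with an int/int fast path and a
    guarded fallback to the generic version."""
    def specialized(a, b):
        # guard: exactly int, not a subclass (subclasses may override +)
        if type(a) is int and type(b) is int:
            return a + b  # the path a JIT would compile to a machine add
        return generic(a, b)  # "deoptimize" back to the slow path
    return specialized

def generic_add(a, b):
    return a + b  # stands in for the full interpreted dispatch

add = specialize_for_ints(generic_add)
assert add(2, 3) == 5          # guard passes, fast path
assert add("a", "b") == "ab"   # guard fails, falls back
```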
Now this is a very oversimplified explanation; the actual method by which pypy accomplishes this effect is weird. Essentially pypy isn't even the JIT compiler itself: it's a framework that takes an interpreter as input and generates a JIT compiler from that interpreter, which is then run over a Python interpreter to create the final thing.
LinkSea8324@reddit
Luke Gorrie (u/lukego) has a nice video on (Lua/Raptor)JIT specialization (now unlisted)
https://www.youtube.com/watch?v=Kds7TUnWOvY
Side note, I was watching his videos when I was a student; now I'm working in AI and he works for Anthropic. funny
JanEric1@reddit
This part isn't true, there is no implicit conversion in integer addition.
What it does is try the `__add__` method of `b`, and if that returns (or raises?) `NotImplemented`, then it tries the `__radd__` method of `c`:

```python
class A:
    def __radd__(self, other) -> int:
        return other + 5
```
censored_username@reddit
You're correct, I remembered the conversion rules wrong.
JanEric1@reddit
Yeah, there was a talk at europython this year from a PyPy maintainer that went through exactly that.
Maybe-monad@reddit
JIT
coffeelibation@reddit
Pi thon
greebo42@reddit
Did they miss an opportunity to name it Pithon?
chat-lu@reddit
Adding a `pithon` executable was considered but rejected because of the long-term maintenance burden.
JanEric1@reddit
But there is a `πthon` just for this release iirc
chat-lu@reddit
Nope, this is all I have:
idle3, idle3.14, pip, pip3, pip3.14, pydoc3, pydoc3.14, python, python3, python3-config, python3.14, python3.14-config
But you can create the symlink yourself.
JanEric1@reddit
Aww ;(
I guess this got reverted/dropped then at some point
chat-lu@reddit
I guess it didn't work for me because I only looked at what uv gave me.
JanEric1@reddit
Yeah, just doing a "uv venv" also didn't work for me.
JanEric1@reddit
Also works in the proper released version:
romulof@reddit
PyPy is one of the weirdest projects ever.
It freaking works 🤣
pjmlp@reddit
See GraalVM, it grew out of research projects like MaximeVM and JikesRVM, with similar ideas.
romulof@reddit
Oh yeah! I never connected the dots; the architecture seems similar. Otherwise it would be super hard to JIT compile a bunch of languages within the same engine.
pmatti@reddit
At the time it was groundbreaking research, and RPython was based on the latest version of Python, 2.7. It led to a bunch of published papers around 2010.
csorfab@reddit
Jesus christ almighty
RealSharpNinja@reddit
Wait for 3.141, it will be more accurate.
wermaster1@reddit
Fast like a PI :-)
not_from_this_world@reddit
A well rounded version.
I'm heading out.
Surprised_Bunny_102@reddit
Infinite possibilities with this version.
-lq_pl-@reddit
Why is MacOS faster than Linux? I feel offended.
amroamroamro@reddit
two different laptops were used in the tests
romulof@reddit
It’s barely doing syscalls. Performance should be the same.
_xiphiaz@reddit
There is no attempt to have the hardware be equivalent, so comparison between these systems isn’t the point.
ketosoy@reddit
Have we all agreed to call the release pithon?
If not, can we?
aiij@reddit
We'd have to switch to TeX-style version numbering then.
dr_wtf@reddit
Mmmmmm 3.14 Pie (thon)
kalerne@reddit
Should have been Pithon for this release
Porkenstein@reddit
πthon
muntoo@reddit
πtuna
muntoo@reddit
*Pituna
LiberContrarion@reddit
Had to scroll FAR too far.
labbel987@reddit
So... PiThon?
TheDevilsAdvokaat@reddit
Python Pi ?
mkawick@reddit
Pi-thon
Sopel97@reddit
it's the last thing python should be marketing, yet it's pretty much the only one I ever see
sky3mia@reddit
π-thon
greg_d128@reddit
A better question for me is "How quickly can I solve a problem?". Development speed matters a whole lot more than execution speed 90% of the time.
When execution speed matters that much, use a different language, write the core loop in C, or change your algorithm (adding function caching or something).
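the "function caching" suggestion is one line in the stdlib (a sketch with the classic recursive fibonacci; without the cache this call tree is exponential):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # memoized: each n is computed once, so the call tree collapses
    # from exponential to linear
    return n if n < 2 else fib(n - 1) + fib(n - 2)

assert fib(30) == 832040
```

often a change like this, or a better algorithm, buys more than a rewrite in a faster language would.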