Speeding up PyTest by removing big libraries
Posted by kesor@reddit | Python | 24 comments
I've been working on a small project that uses "big" libraries, and it was extremely annoying to have pytest take 15–20 seconds to run 6 test cases that were not even doing anything.
Armed with the excellent PyInstrument, I went looking for the reason.
Turns out that biggish libraries are taking a lot of time to load, maybe because of the importlib method used by my pytest, or whatever.
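For readers who want to check import cost themselves, here is a minimal sketch that times a single import with the standard library only. It is not the PyInstrument workflow used in the post; `time_import` is a hypothetical helper name, and `json` is just a stand-in for whatever heavy package is suspected.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# 'json' is only a stand-in; substitute the suspected heavy library.
elapsed = time_import("json")
print(f"import json took {elapsed:.4f}s")
```

A module that is already in sys.modules returns almost instantly, so a measurement like this is only meaningful in a fresh interpreter. Running Python with the -X importtime option gives a per-module breakdown without any extra code.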
But I don't really need these libraries in the tests … so how about I remove them?
# tests/conftest.py
import sys
from unittest.mock import MagicMock

def pytest_sessionstart():
    sys.modules['networkx'] = MagicMock()
    sys.modules['transformers'] = MagicMock()
And yes, this worked wonders! Reduced the tests to less than 1 second from pytest command to results, usually less than half a second.
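Why this works, as a small sketch: the import system consults sys.modules first, so a name that is already registered there is returned as-is and nothing is loaded from disk. The module name `heavy_fake_lib` and the calls made on it are made up for illustration; no such package is assumed to exist.

```python
import sys
from unittest.mock import MagicMock

# 'heavy_fake_lib' is a hypothetical name; nothing by that name is installed.
sys.modules["heavy_fake_lib"] = MagicMock()

# The import statement finds the entry in sys.modules and returns the mock.
import heavy_fake_lib

# Any attribute access or call on the mock succeeds and returns another mock.
prediction = heavy_fake_lib.load_model("anything").predict([1, 2, 3])
print(type(heavy_fake_lib).__name__, type(prediction).__name__)
```

The flip side is that code which really depends on the library's behaviour will silently receive mocks instead of failing.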
I would have loved to remove sqlalchemy as well, but unfortunately sqlmodel is coupled with it so much it is inseparable from the models based on SQLModel.
Would love to hear your reaction to this kind of heresy.
Inside_Dimension5308@reddit
I can't even understand what you people are discussing. Are you writing unit tests or integration tests? We have 300 unit tests written for a service and they take less than 5s to run without doing any of the things you mentioned.
If you are writing integration tests, I may be wrong. It is better to profile your runtime using profilers to understand what is happening at the lower layers.
kesor@reddit (OP)
Unit tests. A unit test that tests just a single function, which is supposed to take 100ms to run from start to finish. But the function sits inside a file that has "import transformers" in it, and importing from that file makes pytest take 20 seconds instead of 100ms.
If you don't know what I'm talking about, you are not using Python and libraries.
dubious_capybara@reddit
Doesn't matter what type of tests: anything that loads large packages like matplotlib, numpy, or ML libraries typically takes seconds just to load the imports.
Inside_Dimension5308@reddit
That is where you are wrong. You should probably understand how unit tests are written. You are not testing the library but your code. No library should be loaded for unit tests. Everything should be mocked outside your code. Integration tests on the other hand might require libraries because you are not going to mock them and actually run it on actual models.
dubious_capybara@reddit
🙄 Nobody employed writes unit tests like that
Inside_Dimension5308@reddit
I mean I gave you the logic behind writing unit tests. You can disagree with it, doesn't change the facts.
dubious_capybara@reddit
It's not a fact, it's an opinion, and a fairly autistic one at that.
Inside_Dimension5308@reddit
Ok.
BossOfTheGame@reddit
Lazy imports could solve a lot of the startup speed problems.
Malcolmlisk@reddit
Can you explain further what you mean by lazy imports?
Improvotter@reddit
Meta also has Cinder, a CPython fork, with lazy imports (and more). Lazy imports were proposed for CPython I think but not accepted. It was also an example of a JIT compiler. It influenced some Python 3.12 and 3.13 changes.
latkde@reddit
Instead of importing a library at the top of the module, you can often import it inside the function that actually uses it.
This avoids importing the library until it's actually needed. Highly recommended if you have heavy-weight dependencies that you don't always need. This is almost always a performance improvement.
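The snippets from this comment did not survive, so the following is only a sketch of the function-level import pattern being described. The standard-library `statistics` module stands in for a heavy dependency, and both function names are hypothetical.

```python
def double(values):
    """Needs no heavy dependency, so calling it triggers no extra import."""
    return [v * 2 for v in values]


def mean_of(values):
    """The import cost is paid on the first call, not when this file is imported."""
    import statistics  # stand-in for a heavy library
    return statistics.mean(values)


print(double([1, 2]))
print(mean_of([1, 2, 3]))
```

Repeated calls stay cheap because the import statement finds the module in sys.modules after the first load.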
Contra-indications:
Specifically for type annotations, it's possible to import a module only for type-checkers, but not at runtime:
However, that will break if you perform some kind of reflection that has to evaluate the type annotations. Notably, this cannot work with Pydantic.
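A sketch of the type-checker-only import and of the reflection problem mentioned above, assuming string annotations. The standard-library `decimal` module stands in for a heavy library, and `describe` is a hypothetical function.

```python
import typing
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this branch never runs at runtime.
    import decimal  # stand-in for a heavy library


def describe(value: "decimal.Decimal") -> str:
    # The annotation is a string, so it is not evaluated when the function is defined.
    return f"value={value}"


print(describe(3))

# Anything that resolves the annotation at runtime fails, because the name
# 'decimal' was never actually imported.
try:
    typing.get_type_hints(describe)
except NameError as exc:
    print("reflection failed:", exc)
```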
frosty122@reddit
For that first contraindication, you can add a commented-out import statement at the top of your file as a stand-in for your lazy imports.
clitoreum@reddit
I'm not certain but I think they mean importing libraries later in the code, rather than all at the start. You can use
library = __import__("library-name")
too, although I'm not sure if there's any benefit.
BossOfTheGame@reddit
No benefit there. That effectively uses the same underlying import mechanism. See my other post for how you can define a module with lazy imports.
BossOfTheGame@reddit
Well, the other responses are fine, but I was thinking of
https://pypi.org/project/lazy-imports/
which utilizes the module-level __getattr__ to only import a library when you need it.
I've made a reasonably popular library that makes defining such a lazy __init__ file easy:
https://pypi.org/project/mkinit/
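A rough sketch of the module-level __getattr__ idea (PEP 562). This is not the actual implementation of either package linked above. `make_lazy_module` and the name `lazy_demo` are hypothetical, and the standard-library `colorsys` module stands in for a heavy dependency. In a real package, the `__getattr__` function would simply be defined at the top level of `__init__.py`.

```python
import importlib
import types


def make_lazy_module(name, lazy_submodules):
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr in lazy_submodules:
            loaded = importlib.import_module(lazy_submodules[attr])
            setattr(module, attr, loaded)  # cache so later lookups skip this hook
            return loaded
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", {"colors": "colorsys"})
# The import of colorsys is triggered by this attribute access.
hsv = lazy.colors.rgb_to_hsv(1.0, 0.0, 0.0)
print(hsv)
```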
kesor@reddit (OP)
They might solve the startup speed problems, but having the big library there will still include the time to load it at some point in the tests. So assuming I don't really want/need to test this library during my tests, would it really matter much if I lazy load or not lazy load it? Unless I mock it away (and then lazy/non-lazy doesn't matter) the time is still going to be there.
BossOfTheGame@reddit
With lazy imports, the cost is only paid if it is explicitly used, in which case - ideally - they would be needing it.
If you're able to mock it out and it doesn't cause errors, then they aren't explicitly using it, so lazy imports would address the issue.
kesor@reddit (OP)
I think my point in the OP was that I don't need, or want (even inadvertently), to use these libraries. Which is why mocking them out solves the problem. I don't need to busy myself with moving all the import statements into all the methods/functions that might need to use them, or with including magic libraries that do magic things like those described in PEP 690.
BossOfTheGame@reddit
I'm saying that pytest should use lazy imports. That would mean you only pay the penalty as a user if you use a functionality explicitly.
wineblood@reddit
Or just refactor the code that uses those into its own module and mock that in all the tests that don't use it directly.
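A sketch of what this suggestion could look like, using a wrapper class in place of a separate module so that the example fits in one file. `ModelGateway`, `headline`, and `heavy_summarizer_lib` are all hypothetical names; the heavy import is never executed here because the wrapper method is patched.

```python
from unittest.mock import patch


class ModelGateway:
    """Hypothetical wrapper: the only code that touches the heavy library."""

    def summarize(self, text):
        # 'heavy_summarizer_lib' is a made-up name standing in for an ML package.
        import heavy_summarizer_lib
        return heavy_summarizer_lib.summarize(text)


def headline(gateway, text):
    """Code under test: it depends on the wrapper, not on the library."""
    return gateway.summarize(text).upper()


# Replace the wrapper method for the duration of the test.
with patch.object(ModelGateway, "summarize", return_value="short summary") as fake:
    result = headline(ModelGateway(), "a very long article ...")

fake.assert_called_once_with("a very long article ...")
print(result)
```

Compared with stubbing sys.modules, the mock here sits at a boundary the project owns, so the real library stays importable for any test that does need it.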
kesor@reddit (OP)
Why?
How is this:
different, or better than this:
?
Ok_Expert2790@reddit
Side effects could be crazy tho if not careful
kesor@reddit (OP)
I wouldn't recommend doing this for a big project that already has hundreds of tests. Things will break, for sure. But starting out with the libraries removed, when you don't yet have tests, is I guess a way to enforce that no real LLM invocations happen during your unit tests (in the case of the transformers library).