Just published part 2 of my articles on Python Project Management and Packaging, illustrated with uv

Posted by ReinforcedKnowledge@reddit | Python | View on Reddit | 12 comments

Hey everyone,

Just finished the second part of my comprehensive guide on Python project management. This part covers both building packages and publishing.

Like the first article, the goal is to dig into the PEPs and specifications to understand what the standard is, why it came to be, and how. This is mostly covered in the build system section of the article.

The article: https://reinforcedknowledge.com/a-comprehensive-guide-to-python-project-management-and-packaging-concepts-illustrated-with-uv-part-2/

I have tried to implement some of your feedback. I worked a lot on the typos (I believe there aren't any but I may be wrong), and I tried to divide the article into three smaller articles:

- Just the high level overview: https://reinforcedknowledge.com/a-comprehensive-guide-to-python-project-management-and-packaging-part-2-high-level-overview/
- The deeper dive into the PEPs and specs for build systems: https://reinforcedknowledge.com/a-comprehensive-guide-to-python-project-management-and-packaging-part-2-source-trees-and-build-systems-interface/
- The deeper dive into PEPs and specs for package formats: https://reinforcedknowledge.com/a-comprehensive-guide-to-python-project-management-and-packaging-part-2-sdists-and-wheels/

In the parent article there are also two small sections about uv build and uv publish. I don't think they deserve a separate smaller article, and I included them for completeness, but anyone can just run uv help <command> and read about the command, and that would be much better. I did explain some small details that I believe not everyone knows, but I don't think it replaces your own reading of the docs for these commands.

In this part I tried to understand two things:

1- How the tooling works: what the standard is for the build backend, what it is for the build frontend, how they communicate, etc. I think it's the most valuable part of this article. There was a lot to cover: the build environment, how the PEP considered escape hatches, and how it thought of some use cases, like needing to override a build requirement. That's the part I enjoyed reading about and writing. I think it builds a deep understanding of how these tools work and interact with each other, and what you can expect from them as well.
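To give a feel for the backend/frontend split, here is a minimal sketch, not taken from the article. `ToyBackend`, `frontend_build_wheel` and the `toy_pkg` project are made-up names for illustration. A real backend is an importable module or object named in `[build-system]` of pyproject.toml, and a real frontend first sets up the build environment and installs the build requirements. Only `build_wheel` and the optional `get_requires_for_build_wheel` hook are shown; `build_sdist`, the other mandatory hook, is left out to keep it short.

```python
import base64
import hashlib
import tempfile
import zipfile
from pathlib import Path

# Hypothetical project metadata, used only for this illustration.
NAME, VERSION = "toy_pkg", "0.1.0"
DIST_INFO = f"{NAME}-{VERSION}.dist-info"


def _record_line(path, data):
    """Format one RECORD row: path, urlsafe-base64 sha256 (no padding), size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{path},sha256={digest.decode()},{len(data)}"


class ToyBackend:
    """Toy stand-in for a build backend exposing two of the standard hooks."""

    @staticmethod
    def get_requires_for_build_wheel(config_settings=None):
        # Optional hook: extra build requirements on top of build-system.requires.
        return []

    @staticmethod
    def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
        # Mandatory hook: write a .whl into wheel_directory, return its basename.
        files = {
            f"{NAME}/__init__.py": b"__version__ = '0.1.0'\n",
            f"{DIST_INFO}/METADATA": (
                f"Metadata-Version: 2.1\nName: {NAME}\nVersion: {VERSION}\n"
            ).encode(),
            f"{DIST_INFO}/WHEEL": (
                b"Wheel-Version: 1.0\nGenerator: toy-backend\n"
                b"Root-Is-Purelib: true\nTag: py3-none-any\n"
            ),
        }
        record = [_record_line(path, data) for path, data in files.items()]
        record.append(f"{DIST_INFO}/RECORD,,")  # RECORD lists itself without a hash
        files[f"{DIST_INFO}/RECORD"] = ("\n".join(record) + "\n").encode()

        basename = f"{NAME}-{VERSION}-py3-none-any.whl"
        with zipfile.ZipFile(Path(wheel_directory) / basename, "w") as whl:
            for path, data in files.items():
                whl.writestr(path, data)
        return basename


def frontend_build_wheel(backend, out_dir):
    """Roughly what a frontend does once the build environment is ready."""
    extra = backend.get_requires_for_build_wheel()  # a frontend would install these
    basename = backend.build_wheel(str(out_dir))
    return extra, basename


with tempfile.TemporaryDirectory() as tmp:
    extra_requires, wheel_name = frontend_build_wheel(ToyBackend, tmp)
    with zipfile.ZipFile(Path(tmp) / wheel_name) as whl:
        wheel_members = sorted(whl.namelist())

print(wheel_name)
print(wheel_members)
```

The point is only that the frontend never needs to know how the backend builds things. It calls the hooks with a target directory and gets a file name back.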

There are also two toy examples that I enjoyed explaining. The first is about editable installs and how, once installed in a project's environment, they differ from a regular install.
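One observable difference can be sketched like this. It assumes the installer records a direct_url.json file in the .dist-info directory (the direct URL origin specification, PEP 610) and marks editable installs there. The helper names and the example payloads are hypothetical, and this is not the example from the article.

```python
import json
from importlib import metadata


def is_editable_direct_url(direct_url_text):
    """Return True if a direct_url.json payload marks the install as editable."""
    if not direct_url_text:
        return False  # no direct_url.json recorded for this distribution
    info = json.loads(direct_url_text)
    return bool(info.get("dir_info", {}).get("editable", False))


def is_editable(dist_name):
    """Look up an installed distribution and inspect its direct_url.json, if any."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False
    # read_text returns None when the file is absent from the .dist-info directory.
    return is_editable_direct_url(dist.read_text("direct_url.json"))


# Hypothetical payloads, shaped like what an installer may record.
editable_payload = '{"url": "file:///home/me/proj", "dir_info": {"editable": true}}'
regular_payload = '{"url": "file:///home/me/proj", "dir_info": {}}'

print(is_editable_direct_url(editable_payload))
print(is_editable_direct_url(regular_payload))
```

Calling `is_editable("your-project")` inside the project's environment would then tell you which kind of install you have, provided the installer wrote that file.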

The second is customising the build process by going beyond the standard with custom hooks. A reader asked in a comment on the first part about integrating Pyarmor as part of their build process, so I took that to showcase custom hooks with the hatchling build backend, and drew some parallels with the specification.

2- What the package formats for Python projects are. I think for this part you can just read the high level overview and go read the specifications directly. Besides some subsections, like the ones explaining particular points about extracting the tarball or signing wheels, I don't think I'm bringing much here. You'll obviously learn about the contents of these package formats and how they're extracted / installed, but I copy-pasted a lot of the specification. The information can be provided directly without paraphrasing or writing prose about it. When needed, I do explain a little bit, like why installers must replace leading slashes in files when installing a wheel etc.

I hope you can learn something from this. If you don't want to read through the articles don't hesitate to ask a question in the comments or directly here on Reddit. I'll answer when I can and if I can 😅

I still don't think my style of writing is pleasurable or appealing to read but I enjoyed the learning, the understanding, and the writing.

And again, I'll always recommend reading the PEPs and specs yourself, especially the rejected ideas sections. There's a lot of insight to gain from them, I believe.

PerformanceSad5698@reddit

This is very useful, thx a lot!

ReinforcedKnowledge@reddit (OP)

Thank you!

notParticularlyAnony@reddit

Very cool stuff. I found out about it from the uv dev that I follow on Twitter: https://x.com/charliermarsh

Great work btw -- are you purposely doing it anonymously? This seems like something you could ... be less discreet about :)

ReinforcedKnowledge@reddit (OP)

Hey thank you a lot for your comment! That's very motivating!

I'm not being discreet on purpose, but I will stay that way on purpose hahaha.

If you look at the older posts, you'd see articles on papers in machine learning. When I started the blog, it was a way for me to retain information. I was reading a lot of papers and putting in a lot of effort, but after a year or two I'd forget a lot about them. Writing is one way to remember things well, and having it in a blog was a way to motivate myself to write and structure my thoughts. I never intended to share it, that's why if you look at my older posts on Reddit you won't find anything about those ML articles (I did ask questions that relate to them though). But as I kept doing that, my friends told me to share it on Reddit. And that's how I got here today. So I started my blog with a name that made sense to me. Why give it my name if it's not going to be shared or anything?

As for why I'd like to stay anonymous, I hadn't thought about it until I read your comment. This way, my articles are not tied to who I am. That should be the case whether I'm anonymous or not, since we're in a scientific field, but it's harder if you're not anonymous. I think it's easy to get defensive in the face of criticism if you put your name on it, while if I'm anonymous, it's just an internet persona. I'm sure to take all the feedback and criticism with no personal feelings and be guaranteed to improve from it 😁

To be honest I don't think I write because I know, I write because I want to know. So if you guys see anything wrong or faulty in what I write please tell me, don't leave me ignorant hahaha.

notParticularlyAnony@reddit

Very cool I get it.

ReinforcedKnowledge@reddit (OP)

Thank you!

not_a_novel_account@reddit

> A package is either purelib or platlib, can’t be both.

This isn't true

> A wheel with “Root-Is-Purelib: false” with all its files in {name}-{version}.data/purelib is equivalent to a wheel with “Root-Is-Purelib: true” with those same files in the root, and it is legal to have files in both the “purelib” and “platlib” categories.

Source: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#what-s-the-deal-with-purelib-vs-platlib
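To make the quoted rule concrete, here is a minimal sketch of how an installer could decide where each wheel member goes. The function name `install_category` and the `toy-1.0` file names are made up for illustration, and real installers do much more than this (RECORD checks, scripts, and so on).

```python
def install_category(archive_path, data_dir, root_is_purelib):
    """Return the install scheme key a wheel member would be unpacked into."""
    prefix = data_dir + "/"
    if archive_path.startswith(prefix):
        # Members under {name}-{version}.data/<key>/... go to the scheme path <key>.
        return archive_path[len(prefix):].split("/", 1)[0]
    # Everything else sits in the wheel root, whose target depends on the WHEEL file.
    return "purelib" if root_is_purelib else "platlib"


DATA_DIR = "toy-1.0.data"  # hypothetical {name}-{version}.data directory

# Same pure-Python file, shipped two equivalent ways:
in_root = install_category("toy/api.py", DATA_DIR, root_is_purelib=True)
in_data = install_category("toy-1.0.data/purelib/toy/api.py", DATA_DIR, root_is_purelib=False)

# A wheel with Root-Is-Purelib: false can hold both categories at once:
extension = install_category("toy/_speedups.so", DATA_DIR, root_is_purelib=False)

print(in_root, in_data, extension)
```

Both layouts send `toy/api.py` to purelib, while the compiled member in the root of the second wheel goes to platlib, so one wheel ends up with files in both categories.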

ReinforcedKnowledge@reddit (OP)

You're totally right, thank you for pointing that out and sorry for my mistake. I made the correct edits!

not_a_novel_account@reddit

Not all the files, if all the files are pure you just say Root-Is-Purelib: true.

But say you have a bunch of files that are 100% identical between the pure and plat versions of a wheel, a high level API that builds on top of lower level methods.

If those files are identical, you don't want them to be replicated in multiple places in the PYTHONPATH and possibly become out-of-sync / incorrectly versioned if someone does something silly like install both the pure-python and platform-specific versions of a package.

The way to avoid this problem, to force an error to occur if someone does something like that on a platform that separates purelib and platlib, is to ensure that these identical pure-python files are always installed to purelib for both the pure-python and platform-extension versions of the wheel.
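Whether a given interpreter actually separates the two locations can be checked with the standard library's sysconfig module. This is only a quick probe, and the variable names are arbitrary.

```python
import sysconfig

# Install locations for pure-Python files and for platform-specific files.
purelib = sysconfig.get_path("purelib")
platlib = sysconfig.get_path("platlib")

# On many setups (typical virtual environments included) the two paths are the
# same directory; some distributions split them, e.g. lib vs lib64.
separated = purelib != platlib

print(f"purelib: {purelib}")
print(f"platlib: {platlib}")
print(f"separated: {separated}")
```

When the two paths are identical, the duplication problem described above cannot show up on that machine, which is part of why it is easy to miss.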

ReinforcedKnowledge@reddit (OP)

Thank you for your answer.

I totally agree that if you have Root-Is-Purelib: false then you might have two folders, purelib and platlib (which I didn't understand at first but now I do).

OK, so the mention of "all files" was what threw me off, because it seemed weird to have a Python package where all the source code is Python while you still require platform-specific stuff.

How Root-Is-Purelib works when it's false is clear to me, I don't think the confusion comes from there. I mean, we can just look at how NumPy is structured. It was more the "all files" that confused me.

So if I understand your example well, you have two different wheels, right? One is pure Python while the other is platform-specific. An issue might arise if you install both wheels on a platform that separates purelib and platlib. So the idea is to put the pure code of the platform-specific wheel inside the purelib folder, while keeping the rest in the platlib folder. That way, whether you install the pure-Python or the platform wheel, the non-platform-specific code will always go into purelib.

not_a_novel_account@reddit

You've got it

ReinforcedKnowledge@reddit (OP)

Thank you a lot for your time and detailed explanations!