`safer`: a tiny utility to avoid partial writes to files and streams
Posted by HommeMusical@reddit | Python | View on Reddit | 34 comments
What My Project Does
In 2020, I broke a few configuration files, so I wrote something to help prevent breaking a lot the next time, and turned it into a little library: https://github.com/rec/safer
It's a drop-in replacement for open that only writes the file when everything has completed successfully, like this:
```
with safer.open(filename, 'w') as fp:
    fp.write('oops')
    raise ValueError

# File is untouched
```
By default, the data is cached in memory, but for large files, there's a flag to allow you to cache it as a file that is renamed when the operation is complete.
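The core idea, stripped of all the edge-case handling, can be sketched in stdlib Python. This is an illustrative sketch, not safer's actual code, and `cached_open` is a hypothetical name: buffer every write in memory and copy the buffer to the real file only if the block exits cleanly.

```python
import contextlib
import io
import os

@contextlib.contextmanager
def cached_open(filename):
    # Hypothetical sketch of the in-memory caching idea.
    buf = io.StringIO()
    yield buf                          # caller writes into the buffer
    # Only reached if the with-block raised no exception:
    with open(filename, "w") as fp:
        fp.write(buf.getvalue())

try:
    with cached_open("demo.txt") as fp:
        fp.write("oops")
        raise ValueError
except ValueError:
    pass

print(os.path.exists("demo.txt"))      # the file was never created
```

If the block raises, the exception propagates out of the `yield` and the real `open` is never reached, so the target file is untouched.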
You can also use it for file sockets and other streams:
```
try:
    with safer.writer(socket.send) as send:
        send_bytes_to_socket(send)
except Exception:
    # Nothing has been sent
    send_error_message_to_socket(socket.send)
```
Target Audience
This is a mature, production-quality library for any application where partial writes are possible. There is extensive testing and it handles some obscure edge cases.
It's tested on Linux, macOS and Windows and has been stable and essentially unchanged for years.
Comparison
There doesn't seem to be another utility preventing partial writes. There are multiple atomic file writers which solve a different problem, the best being this: https://github.com/untitaker/python-atomicwrites
Note
#noAI — no AI was used in the writing or maintenance of this program.
Wargazm@reddit
haha is this a thing now?
HommeMusical@reddit (OP)
I mean, AI didn't exist when I wrote it, so it's a bit like putting "Low Fat!" on Corn Flakes.
But yes, mainly because everyone complains about the quality of the AI slop showcases here.
dj_estrela@reddit
Agentic AI is making this obsolete really fast
HommeMusical@reddit (OP)
I would ask you to explain, except I'm entirely certain you would be unable to.
Go away.
dj_estrela@reddit
Seems I hit a sensitive nerve here
Please, learn something: https://realpython.com/courses/getting-started-claude-code/
HommeMusical@reddit (OP)
Please note that I was entirely correct: you were completely unable to explain your comment.
🤡
Hardly! Tell me - why is it that AI enthusiasts seem to always want to annoy others? Do you think this is sane, or the sort of thing that makes the world better?
You are not a person who is going to teach me anything of use, and there's nothing in that article I didn't know years ago.
Have you ever read any code written by AIs? Have you not noticed that they make heavy use of existing modules like this one?
Your combination of arrogance and ignorance is not felicitous. Please go away now.
BossOfTheGame@reddit
I'm not really sure what they meant by agentic coding making an existing module obsolete. But I wanted to comment about AI systems using modules like this. My experience is that they often underutilize existing libraries unless they are extremely mainstream. They seem to be biased towards stdlib-only implementations, which I suppose can have advantages. It does lower the dependency surface, but it also increases the amount of code that you have to trust has been implemented correctly. I often wish that agents would use third-party libraries more often.
That being said, I suppose others would view me as an AI enthusiast. I also think there's a lot of negative baggage because it's able to be used blindly - among other reasons. I often feel like people assign that baggage to me and then shit on me for it if I give a hint of positivity towards LLMs. I also think that people who are appalled by the sociological implications of LLMs and thus refuse to use them are doing themselves a disservice. LLMs are amplifying pre-existing issues, and I think pro-social-minded people could benefit by using them to find ways to solve or mitigate the problems.
If you haven't used them extensively, they do have a non trivial learning curve, and I think the shallowness of that curve has tricked people into thinking it doesn't exist. I also think they haven't been around long enough for anyone to have found and climbed the steep part of that curve yet.
HommeMusical@reddit (OP)
What about the fact that its supporters say that it's going to take all our jobs? That's negative baggage, surely.
The fact that many of the most important people in the field
BossOfTheGame@reddit
Yes, it's all negative baggage. There are too many people holding the entire topic in contempt because of the sociological issues it is intertwined with.
The environmental cost is on the order of magnitude of personal non-commute travel. It's real, it needs to be addressed. AI psychosis is a solvable problem.
For the power issue... I do feel somewhat powerless around it. I'm somewhat hopeful that open weight models will work to decentralize the power. Right now, I'm not happy with the centralization.
p(doom) is non-zero, but there is much more disagreement among professionals in the field: https://aiimpacts.org/wp-content/uploads/2024/01/EMBARGOED_-AI-Impacts-Survey-Release-Google-Docs.pdf
The "take our jobs" is a bit of a reduction. It's going to change the way we work, and what problems are important for us to spend our time on. That's not the bad thing. What's bad is that we have organized ourselves into a system that is willing to discard instead of support people. This was bad before AI; AI is exacerbating it, but it might also finally force us to change.
So yes, negative baggage exists, but that doesn't imply that all use is bad or that thoughtful people shouldn't engage with the technology. If the only people willing to use or shape these systems are centralized firms and bad actors, that seems more likely to worsen the power problem than solve it. $0.02
I'd be happy to discuss more.
dj_estrela@reddit
Obviously, you are not right.
You lost the argument when you went to a personal attack
BossOfTheGame@reddit
Honestly, as an outside observer, when you said "please, learn something", that's when the conversation derailed. And I'm an advocate for agentic coding.
HommeMusical@reddit (OP)
Because you're rude. Go away.
glenrhodes@reddit
Atomic writes via tmp file + rename have saved me more than once on long pipeline outputs. The edge case worth watching: NFS mounts where the rename isn't atomic either. You're just trading one race for another on some shared filesystems.
dairiki@reddit
Tangential Note:
`atomicwrites` is deprecated by its author. Its git repo has not seen any updates in four years. As far as I know, it still works, but the situation does not give warm fuzzies for use in new code.
Golle@reddit
Nice find.
I don't see the problem as a particularly advanced one either. If your program has a chance of crashing when it writes to the file, it is likely you are doing more in the code. Maybe do all the processing first and only write to the file when all data has been processed?
Or, just write to a new file while the program is running. If the write succeeds, remove the old file and rename the new file to the old name.
Neither of these solutions requires a third-party library.
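That second approach is the classic write-temp-then-rename pattern. A minimal stdlib sketch (function name and filenames here are illustrative), built on `os.replace`, which atomically replaces the target on both POSIX and Windows:

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a temp file in the same directory, then rename it over the
    # target: readers see either the old file or the new one, never a mix.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)      # clean up the temp file on any failure
        raise

atomic_write("config.txt", "key = value\n")
```

The temp file must live on the same filesystem as the target, otherwise `os.replace` falls back to a copy and the rename is no longer atomic.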
rachel_rig@reddit
A lot of tiny libs are really just paying down the boring edge cases once instead of every app half-reimplementing them. `write temp then rename` sounds simple right up until you want it to behave the same way across platforms and streams.
fireflash38@reddit
Why would it need to change?
bboe@reddit
It's a supply chain risk if the owner's PyPI account is compromised. It seems they previously did not believe MFA was worth enabling on their account: https://github.com/untitaker/python-atomicwrites/issues/61
fireflash38@reddit
That's a change.
Grintor@reddit
Good point. I wanted to point out here though that you can eliminate the supply chain risk using version pinning with hashes. Using hashes also takes care of the supply chain risk for if pypi itself is compromised, so it's worth doing anyway.
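Concretely, hash-checking mode looks something like this in a requirements file (the hash below is a placeholder, not a real digest; real values come from `pip hash` or the PyPI file listing):

```
# requirements.txt
atomicwrites==1.4.1 \
    --hash=sha256:0123abcd...   # placeholder; substitute the real digest
```

Installing with `pip install --require-hashes -r requirements.txt` then makes pip reject any artifact whose hash doesn't match, even if the file on PyPI has been swapped out.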
fiskfisk@reddit
The main issue with abandoned packages is that the author might not be aware if a trojaned replacement gets published after their account is taken over. While it won't be installed in your current project because of the hash, you might discover (through an upgrade or something like dependabot) that a new version has arrived and just install it .. and since nobody notices, maybe it survives out in the wild for a week or two or three.
The best thing would probably be for package systems like pypi to support a "this project has been abandoned, so no new versions can be published to its name".
__grumps__@reddit
How is an abandoned label going to help? People just `pip install` and forget.
fiskfisk@reddit
It means that anyone taking over the PyPI account of an inactive maintainer won't be able to publish a new version of the package, since it has been marked as archived and dead. We're protecting against unmaintained packages becoming attack vectors through account takeover.
It would also allow pip to say "eeeeh, nobody maintains this package any longer, use at your own risk" in a systematic way, including giving you the option of scanning your dependency tree for such packages.
bboe@reddit
That approach seems like a great idea.
__grumps__@reddit
2022 and someone complains about SOC2 and MFA. Ya, no fucking thanks, I got zerooooo fucking interest in the code. Why publish? TBH the package should be removed and blocked by PyPI; it's a disaster waiting to happen. Even more so since it's now been made clear on social media that it's ripe for the pickings. Bet this asshat's password is IamChump
Rainboltpoe@reddit
Because your customer has stupid security rules that forbid you from using dependencies that are no longer being actively maintained, and stupid business politics prevents you from getting an exception approved.
ultrathink-art@reddit
Corrupted state files from partial writes are sneaky — the crash happens during the write but the error surfaces on the next run, often in a completely unrelated place. I started using this pattern for config files in long-running automation after a partial write created a valid-looking-but-truncated JSON file that caused a baffling 'unexpected EOF' error 3 runs later.
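The failure mode is easy to demonstrate: truncate a JSON document at an arbitrary point and the error only surfaces when something later tries to parse it (the data here is made up):

```python
import json

full = json.dumps({"jobs": [1, 2, 3], "done": True})
truncated = full[: len(full) // 2]   # simulate a crash mid-write

try:
    json.loads(truncated)
except json.JSONDecodeError as e:
    # The write crashed long ago; the parse error surfaces only now.
    print("parse fails with:", e.msg)
```

Nothing about the truncated file looks wrong until a reader actually parses it, which is why the error shows up runs later and far from the code that caused it.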
latkde@reddit
Interesting. I'm not entirely sure I understand the benefits of this library? What does this library do that the following approach does not (aside from handling both binary and text streams)?
I'm not trying to diminish your effort, I'm trying to understand the tradeoffs of re-implementing something well-established versus adding yet another dependency.
There is, however, no link to test results on the GitHub page (I was trying to find test coverage data). There is a Travis CI configuration that claims to upload to Codecov, but the last results on both platforms are 4 years old. (Travis CI, Codecov).
ROFLLOLSTER@reddit
IIRC, writes over 4,096 bytes will be broken up into multiple writes, breaking atomicity. There's also the general fact that even a single write is not guaranteed to be atomic on Unix; some messy details here.
latkde@reddit
Absolutely, but OP's library is only about Python-level exception safety. It explicitly does not provide atomic writes.
OP's `safer` library is a bit more correct than my sketch in that it will perform multiple write() calls if necessary (unless the underlying stream is in nonblocking mode).
FiniteWarrior@reddit
So, I understand what it solves, and that it's shorter to write than the standard atomic pattern, along the lines of:
```
import os
import tempfile

fd, temp_path = tempfile.mkstemp(dir=os.path.dirname(my_file) or ".")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        memory_parser.write(f)
        f.flush()
    os.replace(temp_path, my_file)
except Exception:
    if os.path.exists(temp_path):
        os.remove(temp_path)
    raise
```
But why is importing your library for that any better?
BossOfTheGame@reddit
I've been using safer for years. I use it whenever I'm writing a system that writes large files. I love never having to deal with corrupted data. Process crashed? Great, there are no artifacts that would confuse other code into thinking that it worked when it didn't. It lets me use `exist` checks in pipeline systems and feel confident about it.
It's a great library. Thank you for writing and maintaining it.
HommeMusical@reddit (OP)
Well, you have fair made my day. <3
You might also like https://github.com/rec/tdir, which I end up using in almost every project in tests somewhere or other.
If you are ever in Rouen, France, drop in and we'll share a beverage or sustenance!
BossOfTheGame@reddit
My design philosophy around temporary directories and tests is to use an application cache subdirectory, e.g. `~/.cache/{appname}/tests/{testname}`, and I do this via passing explicit directory paths around. I never assume running in a cwd (I dislike software that requires you run it from a specific directory). And to do this I use ubelt (my utility lib that I take everywhere) and the pattern `dpath = ubelt.Path.appdir(appname, 'tests', testname).delete().ensuredir()`.
It's not the cleanest test paradigm, but it does make it a lot easier to inspect failures. I probably should have a post-test cleanup that just blows away `ubelt.Path.appdir(appname, 'tests')`, but I sort of just rely on CI to do that. It also prevents extra indentation in doctests, and even though xdoctest makes indentation less painful, it's still non-zero pain.
There's a fair bit of water between me and France, but if I'm in the area, I'll reach out.