Formatting an entire 25 million line codebase overnight: the rubyfmt story
Posted by BlondieCoder@reddit | programming | View on Reddit | 34 comments
oliver_extracts@reddit
the technical part (using .git-blame-ignore-revs) is the easy half. the harder problem for most teams is selling the overnight reformat internally. i've seen smaller reformats stall for months because someone's tool depends on the existing whitespace, or someone's mental model of the file is tied to specific line numbers, or there's a PR queue with hundreds of open PRs that would all need rebasing.
stripe doing this at 25M lines overnight means they probably had a coordination layer most teams underestimate. blog posts make it sound clean, the actual political work to get there is usually months.
tj-horner@reddit
That’s… hmm. I have many questions
oliver_extracts@reddit
haha yeah, hard to believe until you've worked with someone who does it. i've worked with engineers who refer to files by line numbers in their head, like "the bug is around line 240 in user_service.rb" and they'll navigate there directly without searching. reformat hits, line 240 is now something else, their mental map breaks and they're noticeably slower for a few weeks until they rebuild it.
more common with vim/emacs users who navigate by line number a lot. less of a thing for vscode-mouse-navigation people. but i've seen it block reformats more than once.
mahreow@reddit
That's dumb and anyone who blocks a reformat because of it is dead weight and should be fired
CherryLongjump1989@reddit
In my experience they will get upset if anyone but them tries to modify their file. Also just in my experience, they tend to be attracted to very small but critical niches that require lots of maintenance. But that also makes them more difficult to fire.
bloodwhore@reddit
This isn't someone you want in your company anyway. Get rid of them ASAP.
TheAlaskanMailman@reddit
Waitt… so people usually don’t have a mental image of the file and where everything is?
I use vim but editor should be irrelevant here.
How do you guys navigate to “that” part without knowing the shape of the file?
jxddk@reddit
Also a (Neo)vim user, I'm either navigating through semi-permanent marks (e.g. 'U drops me at the User class in most projects), or by LSP workspace symbols (e.g. fuzzy-finding "class Us"). I have a rough idea of what line number ranges are interesting but my mental model for the "shape" of the file is very much based on symbols rather than linebreaks.
CherryLongjump1989@reddit
Most projects have a User class?
ZorbaTHut@reddit
ctrl-f functionname
Or more likely, ctrl-shift-f functionname. Who needs line numbers when you can just go to the right code, even if it's been moved?
TheAlaskanMailman@reddit
Yeah. That’s the LSP’s job. But if you just want to go to a specific part directly, that’s what I’m talking about
BigHandLittleSlap@reddit
Ctrl-click.
ShinyHappyREM@reddit
And perhaps old oldschool BASIC programmers.
oliver_extracts@reddit
haha for them line numbers literally ARE the syntax. they figured this out decades before the rest of us.
franklindstallone@reddit
That’s someone you get rid of
obetu5432@reddit
the real /r/programminghorror was using ruby in the first place, in the year of our Lord 2010+16
_BreakingGood_@reddit
honestly it's hard to make that argument, this is one of the most reliable and robust saas services in existence. arguably the most reliable.
Freeky@reddit
With enough thrust even a brick is a viable aircraft. Stripe certainly brought the lbf.
qmunke@reddit
All dynamically typed languages end up having to invent type checking eventually because dynamic typing is a fundamentally unsound idea.
sidonay@reddit
Counterpoint: it’s cool to hate on web languages that work (PHP, ruby) 😎
(/s btw )
nNaz@reddit
As a rust dev I find ruby perf terrible but it’s an amazingly elegant and concise language to write in. If we ignore perf, Rails is an excellent MVC framework and a pleasure to develop with. Better devx and APIs than Django and far less boilerplate than fastapi and express.
paca-vaca@reddit
It's cool, but there's one thing I don't understand: why wasn't the whole codebase formatted in the first place? CI setup starts from day 1.
You install a rubocop plugin and run it on save, in a git pre-commit hook, and in CI as part of the linting process. That way you don't have to overwrite the whole codebase history by formatting.
totoro27@reddit
Legacy code base.
chucker23n@reddit
First: that's a fantasy.
And second: even if you do have all that set up from the start (management will rightfully ask if there aren't bigger fish to fry), you might want to change some formatting rules. Maybe the linter has new features. Maybe there's been enough rotation/attrition in the team that the new team members no longer agree with some of them.
sammymammy2@reddit
I don't get the whole ripper thing. Ripper is a Ruby library, right? So parsing is still done by executing Ruby code? Or does Ripper just go straight into the Ruby VM lexer and parser? Because that's what I'd do, run the Ruby VM and gobble up its parsing results, emitting it as pretty-printed code.
masklinn@reddit
That's basically the entirety of the "Rewriting in Rust" section.
Agreeable-Price8343@reddit
"linking a full Ruby VM into a Rust binary to walk its parse tree in memory isn't a normal thing to do. But it worked, and that was enough for now."
this is the right kind of pragmatism. the alternative is someone insisting on a pure rust ruby parser in 2018 and the project never shipping. do the ugly thing first, clean it up when the ecosystem catches up, which is exactly what happened with the prism migration
masklinn@reddit
TBF that is more or less the origin story of ripper (sans pure ruby, or never shipping), eventually it was merged into the stdlib and ended up integrated directly into the yacc file definition (https://github.com/ruby/ruby/blob/79f9f8326a34e499bb2d84d8282943188b1131bd/parse.y#L1519).
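A minimal sketch of what the Ripper discussion above refers to, assuming a stock CRuby with the bundled ripper library. The API is called from Ruby, but as the parse.y link suggests, the parsing itself is driven by the interpreter's own grammar rather than by a parser written in Ruby. The sample source string is made up for illustration.

```ruby
require "ripper"

# Hypothetical one-line program to inspect.
source = "x = 1 + 2"

# Token stream: each entry holds a [line, column] position, an event name
# such as :on_ident, and the token text (index 2).
tokens = Ripper.lex(source)
token_texts = tokens.map { |token| token[2] }

# Parse tree as nested arrays (an s-expression); nil if the source is invalid.
tree = Ripper.sexp(source)

puts token_texts.inspect
puts tree.inspect
```

Joining the token texts gives back the original source, whitespace included, which is the property a formatter needs in order to re-emit code without losing anything. The first element of the tree is the :program node.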
peripateticman2026@reddit
25 MLOC? Probably 24 MLOC over what's required.
revereddesecration@reddit
How does it affect the usability of the repo from a history perspective?
I’d be tempted to go deeper and rework the history of the repo such that the previous commits are formatted retroactively. That’s probably too large a job on a codebase of Stripe’s size though.
sephirostoy@reddit
You can list commits for blame to ignore in a .git-blame-ignore-revs file.
Schmittfried@reddit
I wish all tools honored it automatically.
Roang_zero1@reddit
Alternatively you can also ignore revs for tools like blame. I would check in a blame ignore file (see the git-blame documentation).
Some forges like github will also ignore these revs for their views then.
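A rough sketch of how the ignore file mentioned above might be maintained from a script. The helper names, the note text, and any SHA passed in are hypothetical. The file format (one full commit hash per line, with # comment lines allowed) and the blame.ignoreRevsFile setting are git's own. It assumes a SHA-1 repository, hence the 40 hex characters.

```ruby
# Conventional file name; git only reads it once it is configured (see below).
IGNORE_FILE = ".git-blame-ignore-revs"

# Build one entry: a "#" comment line followed by the full commit hash.
def ignore_revs_entry(sha, note)
  unless sha.match?(/\A\h{40}\z/)
    raise ArgumentError, "expected a full 40-character SHA"
  end
  "# #{note}\n#{sha}\n"
end

# Append the entry to the ignore file, creating the file if it is missing.
def record_reformat(sha, note, path: IGNORE_FILE)
  File.open(path, "a") { |file| file.write(ignore_revs_entry(sha, note)) }
end

# Tell `git blame` in this clone to skip the listed commits.
# This is a per-clone setting, so each developer runs it once.
def enable_for_local_blame(path: IGNORE_FILE)
  system("git", "config", "blame.ignoreRevsFile", path)
end
```

Usage might look like calling record_reformat with the full hash of the reformat commit and a short note, committing the file, then running enable_for_local_blame. For a one-off, `git blame --ignore-revs-file .git-blame-ignore-revs <file>` works without the config step.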