Git’s hidden simplicity: what’s behind every commit
Posted by Low-Strawberry7579@reddit | programming | View on Reddit | 150 comments
It’s time to learn some Git internals.
theillustratedlife@reddit
Git needs some UX help. Even after 15 years of using it, I'm still not sure when I need to type origin develop as opposed to origin/develop. I suspect someone pedantic wrote a command that always needs a remote, vs. one where "that just happens to be a branch on another device that we reference with origin/" or something similarly clever; but as a user, I just want to know what to type to refer to a thing and be done.
At the very least, they should change the commands that need "remote space branch" to also accept "remote slash branch" notation.
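A minimal sketch of the rule of thumb the replies in this thread converge on, run in a throwaway repository. Commands that talk to another repository (push, fetch, pull) take the remote and a branch name as two words. Commands that only read your own repository (log, merge, rebase, diff) take one ref, and origin/develop is such a ref. The bare repository server.git, the file, and the branch names are placeholders, not anything from the thread.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# A bare repository stands in for the server; "work" is your own repository.
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo one > file.txt && git add file.txt && git commit -q -m "first"
git branch -M develop

# Talking to another repository: remote and branch name are separate words.
git push -q origin develop    # update branch "develop" in the repository called "origin"
git fetch -q origin develop   # download it again; this refreshes origin/develop

# Reading your own repository: origin/develop is a single local ref.
git log --oneline -1 origin/develop
git merge -q origin/develop   # already up to date here; it only shows the syntax
git diff --stat develop origin/develop
```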
Blueson@reddit
origin/develop should only be a thing if you're working on a branch you received from your origin remote, as described in git remote --verbose. It is a representation of the branch develop on that remote as of when you last fetched (or pulled) it.
origin develop you only use when running something like git push origin branch, which in reality is git push <remote> <branch-name>, i.e. which remote you're pushing the specified branch to.
I think there's a lot of valid criticism to give git, but these things are pretty clear and should be expected to be learnt if you've been working with git for 15 years.
anzu_embroidery@reddit
90% of git criticism is people not understanding (or trying to understand) the model. This is perpetuated by every single “git for beginners” tutorial not explaining the model or even giving an incorrect model.
10% is legitimately weird and baffling
Blueson@reddit
I agree. I just get baffled when I see people claiming 10+ years of experience and complaining about some extremely basic concepts as if they are trying to understand the intricacies of general relativity.
I wouldn't mind having another tool that does it better than git, I think there might be some opinionated tools that work better.
But for most of us it's a tool we use on the daily. You don't need to understand all of it, but concepts required for you to work with the tool are really not that hard.
steveklabnik1@reddit
For whatever it's worth, I never found git annoying until I used jj. For me, it's just the same stuff I loved about git, but better. Your mileage may vary. I made my GitHub account in 2008, I'd been using git for a long, long time.
za419@reddit
origin/develop is a (remote-tracking) branch, namely your local copy of the develop branch on the remote origin as of the last time you fetched. origin develop, probably in the context of push, is an instruction to push to the develop branch of the remote origin. You're not talking about your local reference to that branch (origin/develop); you're talking about updating origin with develop.
It's not exactly simple, but it is consistent, and you shouldn't really need the latter much for most workflows if you use git push -u at some point, or otherwise tell git where you want to push to when you just say git push without arguments.
ThePantsThief@reddit
I suspect the way to fix this would be to make it so that the remote copy of origin and your local copy are somehow one and the same
za419@reddit
I don't even think that would be difficult, at least in one direction (a merge/update of origin/develop triggers an automatic push to origin develop).
This is kind of down to git being designed around all repos being equal, though: from git's point of view, your copy of the repository is no more or less authoritative than what's on origin. Automatic updates make a lot of sense for how people actually use git (your local copy is "your" clone, but the one on GitHub/GitLab/Bitbucket/etc. is "the repository"), but not so much for how git is designed (why would I want git to automatically try to push my code to Torvalds' machine just because I merged a branch?)
BlindTreeFrog@reddit
Only because I had to stop and think about which commands do what he means...
git push origin lbranch:rbranch
vs
git reset origin/branch
But I realized that was a horrible example unless you do the push like
git push origin branch
which I think works, but I never use it that way so I haven't tried.
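For the "I haven't tried" part, a small sketch against a throwaway bare repository shows both push spellings working, and reset taking a single local ref instead. The names server.git, lbranch, and rbranch are placeholders.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo hi > a.txt && git add a.txt && git commit -q -m "start"
git branch -M lbranch

# Explicit refspec: local "lbranch" updates remote branch "rbranch".
git push -q origin lbranch:rbranch

# Shorthand: with no ":<dst>", the remote branch gets the same name as the local one.
git push -q origin lbranch

# reset wants one ref that exists locally; pushing also updated origin/rbranch.
git reset -q origin/rbranch
git ls-remote --heads origin
```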
Which commands follow the origin branch pattern?
rdtsc@reddit
I frequently use that to push a branch different from the one I'm on. Why would I do that? For example:
Working on bar, which depends on foo. If I have to amend something in foo, I can do a rebase --update-refs, which also updates feature-foo, then push it.
BlindTreeFrog@reddit
Yeah but that's still pushing the local branch. I realize now that I made a mistake in my earlier comment.
Leaving off the remote branch just defaults the remote branch name to the local branch. The reason that I never do it that way is because I want to make sure the remote branch name is correct, and I often have a different branch name locally than I'm using remotely.
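Both cases from this exchange fit in one throwaway sketch, assuming placeholder branch names feature-foo, bar, and topic/bar. Naming a branch on the push command line pushes it even when it isn't checked out. A local:remote refspec lets the two names differ. The -u flag (za419's suggestion above) records the pairing as the branch's upstream, which git status and git pull then use.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo base > f.txt && git add f.txt && git commit -q -m "base"
git branch -M feature-foo
git checkout -q -b bar
echo more >> f.txt && git commit -q -am "work on bar"

# Still on "bar", but push a different local branch by naming it.
git push -q origin feature-foo

# Local and remote names differ; -u remembers the pairing as bar's upstream.
git push -q -u origin bar:topic/bar
git rev-parse --abbrev-ref 'bar@{upstream}'
```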
adv_namespace@reddit
I use the
all the time (to delete remote branches), or is there a more idiomatic way of doing this?
BlindTreeFrog@reddit
It isn't something that I do often, but I think I do this normally:
git push origin :<rbranch>
(push a blank branch to remote)
Checking online, that is one of the ways to do it. Added in git 1.5.0:
https://github.com/gitster/git/blob/master/Documentation/RelNotes/1.5.0.txt
https://stackoverflow.com/questions/2003505/how-do-i-delete-a-git-branch-locally-and-remotely
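A throwaway sketch of the two spellings mentioned above for deleting a remote branch: an empty source before the colon, and the more explicit --delete flag. The branch names old-a and old-b are placeholders.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo x > x.txt && git add x.txt && git commit -q -m "x"
git push -q origin HEAD:old-a HEAD:old-b   # create two remote branches

# Pushing "nothing" (empty <src>) into a remote branch deletes it.
git push -q origin :old-a

# The spelled-out equivalent.
git push -q origin --delete old-b

git ls-remote --heads origin   # prints nothing: both branches are gone
```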
philh@reddit
Huh, I've always had trouble remembering whether it's local:remote or vice versa, but l(eft/ocal):r(ight/emote) might be a helpful mnemonic.
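One caveat to the mnemonic: a refspec is always <src>:<dst>, so l:r reads local:remote for git push, but for git fetch the source is the remote, so it reads remote:local. The default fetch refspec that git remote add writes has the same shape. A throwaway sketch with placeholder branch names:

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo x > x.txt && git add x.txt && git commit -q -m "x"
git branch -M mine

# push: <src>:<dst> is local:remote
git push -q origin mine:theirs

# fetch: still <src>:<dst>, but now the source is the remote, so remote:local
git fetch -q origin theirs:copy-of-theirs

# The refspec git remote add configured: remote branches -> origin/<name> locally
git config --get remote.origin.fetch
```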
-Nicolai@reddit
Well, spaces always separate commands and arguments, while slashes always denote branching paths. As far as command line tools go, this is not subject to change.
magnomagna@reddit
develop is a ref that lives in your local repo. However, did you know that origin/develop is also just another ref that actually lives locally in your local repo?
develop is a local ref that tracks commits in your local repo, and its full local path is refs/heads/develop. origin/develop is also a ref that lives in your local repo, but it tracks commits that are on the remote repo named origin! Its full local path is refs/remotes/origin/develop.
You can, in fact, substitute develop and origin/develop with their full paths. The origin in origin/develop is, in fact, a namespace used to disambiguate the two paths. Here's how it works: when searching for a branch, git will search in the refs/heads/ folder first, and if the branch doesn't exist there it will then search in the refs/remotes/ folder. So if you execute a git subcommand and pass in a path such as develop, it will look for it at refs/heads/develop first. If it exists, git will use that location and won't go searching for refs/remotes/develop. Now, if you give the subcommand origin/develop, it will first search for it at refs/heads/origin/develop, but since remote-tracking branches do not live under refs/heads/, the first search fails (unless you're pathological and have made a local branch called "origin/develop"), and git then tries refs/remotes/origin/develop and succeeds.
There are actually 3 different locations that have higher precedence than refs/heads/. For reference, read https://git-scm.com/docs/gitrevisions .
Now, to answer your question of why we sometimes specify origin/develop and other times origin develop: the answer is simply to ask whether it makes sense to pass in the REF origin/develop, or whether it makes more sense to pass in the name of the remote repo and the name/path of the branch that lives on that remote repo as two arguments.
Take, for example, git push. If you execute git push origin/develop, it would NOT make sense at all because, as explained, origin/develop actually lives locally in your repo at refs/remotes/origin/develop, i.e. it is just a ref that exists in your local repo, just like develop is another ref that exists in your local repo. So calling git push origin/develop would imply "git, please push my changes to the LOCAL REF called origin/develop", which makes garbage sense.
That's why for that subcommand it makes more sense for you to specify the name of the remote repo and the path of the branch that lives on that remote repo, i.e. git push origin develop.
In summary, to make sense of when to use one argument, origin/develop, versus two arguments, origin develop, you have to think in terms of the context: the git subcommand that you want to use.
PowerApp101@reddit
-1 for ChatGPT answer
magnomagna@reddit
It's not chatgpt you idiot.
PowerApp101@reddit
Shrug still an AI answer
lgastako@reddit
It's almost certainly not. There are quite a few grammatical and rhetorical choices that I've never seen any LLM make.
magnomagna@reddit
Lmao
magnomagna@reddit
But thanks though... I'm kinda honoured 😁
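magnomagna's description of the lookup order can be checked directly. The gitrevisions page lists the candidates for a short name in order ($GIT_DIR/<refname>, refs/<refname>, refs/tags/<refname>, refs/heads/<refname>, refs/remotes/<refname>, refs/remotes/<refname>/HEAD), so a local branch that happens to be called origin/develop shadows the remote-tracking ref. A sketch in a throwaway repository with placeholder names:

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo one > f.txt && git add f.txt && git commit -q -m "one"
git branch -M develop
git push -q origin develop    # also creates refs/remotes/origin/develop locally

# Unambiguous: the short name expands to the remote-tracking ref.
git rev-parse --symbolic-full-name origin/develop   # refs/remotes/origin/develop

# The "pathological" case: a local branch literally named origin/develop,
# pointing at a newer commit than the remote-tracking ref.
echo two >> f.txt && git commit -q -am "two"
git branch origin/develop

# refs/heads/origin/develop now matches first; git warns that the short
# name is ambiguous and resolves it to the local branch.
git rev-parse origin/develop
# The full path still reaches the remote-tracking ref.
git rev-parse refs/remotes/origin/develop
```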
ScumbagLoneOne@reddit
If you can’t understand what can be explained in a single sentence, it’s on you, not on git.
hayt88@reddit
origin develop and origin/develop are 2 (or rather 3) very different things.
Why should git start treating them as the same? We just got away from stuff like that in git, where 2 different things had the same name.
Like checkout being split into switch and restore, etc.
This seems to me like a lack of knowledge here.
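The three things can be told apart in a throwaway repository (server.git, work, other, and develop are placeholders). origin is only a name for another repository's location. develop and origin/develop are both refs in your own repository. origin/develop is a snapshot that moves only when you fetch, here demonstrated by simulating a colleague's push from a second clone.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q work && cd work
git remote add origin ../server.git
echo one > f.txt && git add f.txt && git commit -q -m "one"
git branch -M develop
git push -q origin develop

# 1. "origin" is just a name for where another repository lives.
git remote -v

# 2 and 3. "develop" and "origin/develop" are both refs in *this* repository.
git show-ref

# Someone else pushes to the server (simulated with a second clone).
git clone -q -b develop ../server.git ../other
(cd ../other && echo two >> f.txt && git commit -q -am "two" && git push -q origin develop)

before=$(git rev-parse origin/develop)   # still the commit I pushed
git fetch -q origin
after=$(git rev-parse origin/develop)    # now the colleague's commit
```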
Ayjayz@reddit
When are you typing origin develop? Generally the only time you'd do that is when you're trying to push a new local branch to a remote, in which case I don't know how you could do anything different here. Git needs to know which remote you're trying to push to, and it needs to know what to push.
Great. There are 3 things here, and the way to refer to each of those three things is origin, develop, and origin/develop. Now you know.
case-o-nuts@reddit
The simplicity is certainly hidden.
etherealflaim@reddit
Yeah, this was my first thought too... In most systems you hide the complexity so they're simple to use. Git is complex to use so the simplicity can be hidden.
That said, reflog has saved me too many times to use anything else...
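The kind of save etherealflaim describes, sketched in a throwaway repository (file name and messages are placeholders): a commit thrown away with reset --hard is no longer on any branch, but the reflog still records where HEAD has been, so HEAD@{1} gets it back.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo && cd repo
echo v1 > notes.txt && git add notes.txt && git commit -q -m "v1"
echo v2 > notes.txt && git commit -q -am "v2"
important=$(git rev-parse HEAD)

# Oops: throw away the last commit.
git reset -q --hard HEAD~1

# Newest first: the reset, then the "v2" commit, then "v1".
git reflog -3

# HEAD@{1} means "where HEAD was one move ago", i.e. just before the reset.
git reset -q --hard 'HEAD@{1}'
cat notes.txt   # prints v2
```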
zrvwls@reddit
git stash -u should be among everyone's most-used daily commands. My programming life is basically split in 2: the dark times before I knew about it, and the post-caveman times where it has become an integral part of my development process.
agumonkey@reddit
Stash is very useful, but it seems like a symptom of a problem to me: how many people have a very long list of stashes that could have been a quick rebase-insert or a transient branch?
steveklabnik1@reddit
This is effectively how jj handles stashing; there's no separate 'stash' feature, and there are anonymous branches, so a stash is just an anonymous branch that you've left around for whatever reason, until you're ready to come back to it. It works really nicely. It's just slightly more annoying in git because you have to give those branches names.
BlindTreeFrog@reddit
git stash is basically why I hated git for the first year that I used it. It was far too easy to lose track of what changes were where, if you remembered what was there at all.
Branching and switching between branches takes some practice, but that's fine. And as long as I remember to commit frequently and keep commits small, the repo is easy to manage. So I got used to things, but never using stash again without a gun to my head is much of why.
silveryRain@reddit
Ditto. I made two git aliases that instead commit/uncommit my changes between my index and HEAD, that I use for the same purpose as stash.
So when I need to work on something else, I use git ww to save my changes as two commits (one for staged, one for unstaged changes) on top of my current branch, switch to something else, and when I get back I use git unww to undo them back into my index.
Glizzy_Cannon@reddit
Maybe it's VSCode's UI for stashing that helps me a lot, but I find stashing simpler. I can see why it would be more frustrating with raw git though
mpyne@reddit
It's fine with the raw git CLI, as long as you use it as intended. All it was ever meant to do was to let you quickly get to a clean working dir so you can switch to a different branch or pull cleanly into the current branch.
If you're trying to do more than that, it's probably better just to do a 'WIP' commit (or commits). But I've definitely found stash very useful as a low-friction way of quickly updating things, which is why I'm glad they've added things like --autostash to go with --rebase on git pull.
zrvwls@reddit
Agreed. Without VS Code's git UI, I would hate stashes so much and switch to using actual commits and branches. Stashes shine when paired with a git UI and when you keep your stash list consistently clean (<=2 at any point in time, flexing up to 10 but never for more than a day or two).
It basically allows me to avoid rebasing, merging, squashing, and all the headache of trying to figure out which code was committed when, and keep my changes in 1 patch. I hate over-documentation from a million little commits, so 1 commit message for all my stuff rather than lots along the way works a lot better for me.
BlindTreeFrog@reddit
Finding a decent GUI was the trick that helped me get used to git, yeah.
Right now I'm using it over SSH and X forwarding isn't a viable option, so I'm all CLI. It's fine, but it does make a few things more complicated.
Glizzy_Cannon@reddit
Transient branches would be fine if they weren't a pain to keep track of. Stashes with good stash messages are simpler imo
muntoo@reddit
Alternative to git stash:
Usage:
zrvwls@reddit
If by problem you mean not wanting to have a commit message, then I agree with you. If not... then I still agree with you, but I also feel attacked and will choose to believe a higher power intelligently designed us to be commitophobes in this way.
If you don't have to press next page at least twice when you do a git stash list, then you're probably not git stashing enough.
Orca-@reddit
Yeah, that's where I land with the stash. It's just another name for a commit, so why not just commit and rebase if that's what you want? Make a new branch and off you go.
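To make "a stash is just another name for a commit" concrete, here's a sketch in a throwaway repository (file name and message are placeholders). The stash entry is an ordinary commit object whose first parent is the commit you were on and whose second parent records the index. With -u, a third parent holds the untracked files.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo && cd repo
echo base > a.txt && git add a.txt && git commit -q -m "base"

echo staged > a.txt && git add a.txt   # a staged change...
echo unstaged >> a.txt                 # ...plus an unstaged one on top
git stash push -q -m "demo"

git cat-file -t 'stash@{0}'            # commit
git rev-parse 'stash@{0}^1' HEAD       # same hash twice: first parent is HEAD
git show 'stash@{0}^2:a.txt'           # the staged version, from the index commit
git show 'stash@{0}:a.txt'             # the working-tree version
```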
steveklabnik1@reddit
This is basically what jj does by default, it’s great.
PurepointDog@reddit
What's that do?
Kenny_log_n_s@reddit
Stashes all changes, including untracked files (new files that haven't been added to git yet).
You can later pop those changes out of the stash onto a new branch, or the same branch.
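A throwaway sketch of that round trip (file and branch names are placeholders). A plain git stash would leave the untracked file behind; with -u it goes into the stash too, and git stash branch later restores everything onto a fresh branch created at the commit the stash was made from.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo && cd repo
echo base > tracked.txt && git add tracked.txt && git commit -q -m "base"

echo edit >> tracked.txt   # modified tracked file
echo new > brand-new.txt   # untracked file

git stash push -u -q
after_stash=$(git status --porcelain)   # empty: nothing was left behind

# Later: restore onto a new branch (or "git stash pop" onto the current one).
git stash branch wip-branch >/dev/null 2>&1
```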
xXVareszXx@reddit
What would be the best approach for local dev changes to files that are managed by git but should not be committed?
Stashing them would break the dev env.
Kenny_log_n_s@reddit
Depends on the situation.
Is it changes to a file that you keep permanently modified locally and never want to push? If so, is it code, or config files?
xXVareszXx@reddit
Both.
Some are code files. It disables parts of the application that we are not working on so that we don't have to set it up locally.
But there are also conf files for local dev which are checked in, but whose local changes we don't commit, because not all teams use the same local dev setup.
Kenny_log_n_s@reddit
I would say this is a problem better managed with solutions other than git.
For example, disabling parts of the code could be done with an environment variable that is only ever activated in local environments.
For conf files, usually there is a way to have one central conf file that gets committed, and then a separate file with any overrides that is gitignored (e.g. docker-compose.yaml and docker-compose.override.yaml).
There are ways to stop git from picking up further changes to a file (check out the skip-worktree bit of git update-index), but I find it's usually a huge hassle, and a sign there's a better way to go about it.
silveryRain@reddit
git update-index --assume-unchanged
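A throwaway sketch of what that flag does (local.conf and its contents are placeholders). Once the bit is set, git status stops reporting the local edit, and git ls-files -v marks the file with a lowercase letter. The git update-index documentation cautions that neither assume-unchanged nor skip-worktree is a supported way to ignore changes to tracked files, which fits Kenny_log_n_s's preference for an untracked override file.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo && cd repo
echo "db_host=prod" > local.conf && git add local.conf && git commit -q -m "conf"

echo "db_host=localhost" > local.conf   # a local-only tweak
git status --porcelain                   # " M local.conf"

# Tell git to assume the file is unchanged and stop checking it.
git update-index --assume-unchanged local.conf
git status --porcelain                   # now prints nothing
git ls-files -v local.conf               # "h local.conf": lowercase marks the bit

# Undo later with: git update-index --no-assume-unchanged local.conf
```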
Null_Pointer_23@reddit
Oh my god I never knew there was a way to stash new files. Thank you
cmpthepirate@reddit
TIL -u; normally I have to git add all.
saint_marco@reddit
Stash is just a janky, hidden commit. Why not just make a commit and checkout a different branch?
rdtsc@reddit
Because that would be much more complicated. Compare:
with:
For anything that should live longer than a minute I agree, do a normal commit.
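A "normal commit" version of the stash workflow, sketched in the spirit of silveryRain's ww/unww aliases from earlier in the thread. Those aliases weren't posted, so the two functions below are a hypothetical reconstruction: park the index as one commit and the rest of the working tree as a second, then peel them off again so staged changes come back staged and everything else comes back unstaged or untracked.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo && cd repo
echo base > a.txt && git add a.txt && git commit -q -m "base"
base=$(git rev-parse HEAD)

# Hypothetical reconstruction; silveryRain's real aliases were not posted.
ww() {
  git commit -q --allow-empty --no-verify -m "WIP: staged"     # the index as-is
  git add -A
  git commit -q --allow-empty --no-verify -m "WIP: unstaged"   # everything else
}
unww() {
  git reset -q HEAD~1          # drop "unstaged"; its changes stay in the working tree
  git reset -q --soft HEAD~1   # drop "staged"; the index keeps its content
}

echo staged > a.txt && git add a.txt
echo scratch > notes.txt
ww
parked=$(git status --porcelain)   # empty: safe to switch branches now
unww
```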
elsjpq@reddit
Git tries to be an accurate model of what actually happens in development. Git is complex because development is complex.
I find that systems which more accurately reflect what actually happens have mental models that are easier to comprehend, since the translation layer between model and reality is simpler.
MrJohz@reddit
I disagree. Git is not a good model of development. It contains a fantastic underlying mechanism for creating and syncing repositories of chains of immutable filesystem snapshots, but everything else is a hodge-podge of different ideas from different people with very different approaches to development.
It has commits, which are snapshots of the filesystem, but it also has the stash, which is made up of commits, but secret commits that don't exist in your history, and it also has the index, which will be a commit and behaves kind of like a commit but isn't a commit yet. It has a branching commit structure, but it also has branches which are pointers to part of that branching commit structure (although branches don't necessarily need to branch). Creating a commit is always possible, but it will only be visible if you're currently checking out a branch, otherwise it ends up hidden. Commits are immutable snapshots, but you're also encouraged to mutate them through squashes and rebases to ensure a clean git history, which feels like modifying existing commits but is actually creating new commits that have no relationship to the old commits, making diffing a single branch over time significantly more complicated than it needs to be. The only mutable commit-like item in Git (the index) is handled completely differently from any other commands designed to (seemingly but not actually) mutate other commits. The whole UI is deeply modal (leaving aside the difference between checking out commits and checking out branches), with many actions putting the user into a new state where they have access to many of the same commands as normal, but where those commands now do subtly different things (see bisect or rebase). And while a lot of value is placed on not deleting data, the UI often exposes the more dangerous option first (e.g. --force vs --force-with-lease) or fails to differentiate between safe and dangerous actions (e.g. force-pushing a branch that contains only commits from the current user, and force-pushing a shared branch such as master/main).
To be clear, I think Git is great. Version control is really important, and Git gets a lot of the underlying concepts right in really important ways. It takes Google-scale repositories for major issues in those underlying concepts to show up, and that's a really impressive feat.
But the UI of Git, i.e. the model it uses to handle creating commits and managing branches, is poor, and contributes to a lot of bad development practices by making the almost-right way easy but the right way hard.
I really encourage you to have a look at Jujutsu/JJ, which is a VCS that works with multiple backends (including Git), but presents a much cleaner set of commands and concepts to the user.
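MrJohz's --force vs --force-with-lease example, sketched with a throwaway server and a simulated colleague (all names are placeholders). Without an explicit expected value, --force-with-lease compares the server's branch with your remote-tracking ref and refuses when someone else has pushed since your last fetch; plain --force overwrites regardless.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q --bare server.git
git init -q me && cd me
git remote add origin ../server.git
echo a > f.txt && git add f.txt && git commit -q -m "a"
git branch -M topic
git push -q origin topic

# A colleague pushes to the same branch; I haven't fetched since.
git clone -q -b topic ../server.git ../colleague
(cd ../colleague && echo b >> f.txt && git commit -q -am "b" && git push -q origin topic)
colleague=$(git --git-dir=../server.git rev-parse topic)

# I rewrite my local history, then try to overwrite the remote branch.
git commit -q --amend -m "a (reworded)"

# origin/topic (what I last saw) no longer matches the server, so this is refused.
if git push -q --force-with-lease origin topic 2>/dev/null; then
  lease_result=pushed
else
  lease_result=rejected
fi

# Plain --force overwrites without checking: the colleague's commit is dropped.
git push -q --force origin topic
```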
silveryRain@reddit
Tried JJ, and I couldn't stand the way it would pollute my git repo with tons of refs that would show up when viewing the full history graph. I'd have given it more of a chance if it didn't feel like a one-way ticket that tanks the usefulness of one of my most-used git commands.
MrJohz@reddit
Yeah, JJ makes a lot of commits that aren't visible, which can pollute the reflog. But I found that jj op log (history of the repo as a whole) and jj evolog (history of a single change) were so much more useful than the reflog that that wasn't a problem for me. But if you're used to using the reflog a lot, then I can see why that would be more irritating than helpful.
silveryRain@reddit
It's not the reflog that I mind, but git log --graph --all.
MrJohz@reddit
Why not jj log 'all()' in that case? This also shows the full history as a graph, but automatically hides the intermediate commits in any given change. Then if you need to look at those commits, you can do something like jj evolog -r xyz to see the specific commits that were included in a change.
I think the jj log default of only showing some of the commits is really useful 99% of the time, but it can be very surprising when people start using JJ and feel like they can't find a bunch of commits. But the all() revset shows, as I understand it, essentially the same thing as --all would for an imported repo (although the two views will diverge as JJ makes a lot more automatic commits).
steveklabnik1@reddit
There's also jj log -r .., which may or may not be easier to remember. I tend to use it because it's slightly shorter than 'all()'.
silveryRain@reddit
Didn't have the patience to figure it out. That solves it, thanks!
magnomagna@reddit
There's one thing that doesn't make sense to me about Jujutsu. Why does it make a commit when there are conflicts? Why would anyone want a broken commit? Maybe I understand it wrong, but it just makes no sense.
more_exercise@reddit
I'd make an argument in the abstract (not familiar with JJ) that having one commit represent the "naive" merge commit and a second "this is what the human decided to fix the issue with" is pretty reasonable.
I don't always remember how I resolved merge commits, and sometimes I have made bad decisions. Being able to look carefully at what was automatic, what was manual, and what the manual intervention was? That seems valuable.
magnomagna@reddit
You're talking about two separate commits, but the problem with JJ is that it will create a commit with the conflicts included unresolved, which makes zero sense, unless I understand it completely wrong.
steveklabnik1@reddit
I know you've had some good conversations about this already, I'm a bit late to this thread, but a thing that maybe can help:
jj has the ability to say "hey this commit has a conflict" stored natively inside of it. It's not like it commits the text with the conflict markers in it. It's a first-class feature. So it will tell you "hey this commit has a conflict" when you look at your log, etc. You won't accidentally try to build things on top of messed up commits.
more_exercise@reddit
I should clarify that a "naive" merge commit would be completely able to handle a conflict. It would not be able to resolve it. Yes, this commit would be nonsense, but at least it is honest about being nonsense, and the expectation is that an immediate child commit imposes sense: that's where the sense lives.
I've been bitten by a coworker human-naively resolving a merge commit by deleting another coworker's work, re-introducing a bug that had been resolved.
From a git-brain perspective: what if there were a way to mark the decisions that my coworker made in the merge commit separate from the algorithmic merge results? It wouldn't need to be a new commit in git-land, but additional information attached to the commit.
I agree that the entire git work flow gets hosed if we allow this weird intermediate state to be included in the git history. It would be a horrible idea. I'm talking about a hypothetical different tool. ("dude this brainfuck compiler writes horrible assembler")
more_exercise@reddit
I also consider it to be best practice to commit the output of a tool entirely as-is in a single commit, with subsequent human fixups as a separate step.
MrJohz@reddit
I think a lot of people explain this by saying you can resolve the conflict whenever you like, but then leave the "whenever you like" time scale very open, which feels confusing. You don't want broken commits, they're not useful, so you normally want to resolve them ASAP.
What Jujutsu's approach allows, though, is that when a conflict (or chain of conflicts) appears, you can still interact with the repository as normal while you're resolving it. For example, you can switch to a different branch or a different point in the history and explore what's going on there while you're rebasing. Or you can resolve the change, decide that's not what you want, undo the resolve, stash that resolution attempt, then try again without losing any data.
Recently I've just got back to work after an extended break, and there were a bunch of conflicts that showed up when I rebased some of my WIP-branches against the updated master branch. But firstly: I could rebase all my WIP branches at once without having to worry about which ones would produce conflicts. And secondly, once I'd done that rebase, I could decide in which branches it made sense to fix the conflicts, and which branches were better to abandon and start from scratch. And for the branches which I started from scratch, I could keep the conflicted branch around so I could use it as a reference when I needed to check how I'd done something before, and then delete those branches when I was finished.
magnomagna@reddit
I don't get it. Why do you have to create broken commit with unresolved conflicts in it just so then you could explore other branches to find the best branch to rebase onto? Makes no sense. You could find the best branch to rebase onto without creating a broken commit with git.
MrJohz@reddit
You're not looking at other branches to see which branch is best to rebase onto — you've already done the rebase! In the example I gave, you can look to see which branches have conflicts that are easy to resolve and where it'll be easier to resolve those conflicts and use the branch, or which branches have larger conflicts where rewriting from scratch might be an easier option.
Another way to think about it is this: in Git, when a rebase produces a conflict, the whole repository is in this semi-broken "rebase" state where the actions you can perform are very limited. In JJ, only the conflicted commit is in this semi-broken state, but the repository as a whole is never broken.
magnomagna@reddit
That's exactly what I'm confused about. The rebase even when there's unresolved conflicts will be successful, meaning it will create at least one commit with conflicts in them. How is that good? Your commit history now has an immutable commit with conflicts in them.
If you want to compare multiple rebases onto different branches, then sure, in this case, even with git, you'll have to do the same number of rebases and record the conflicts for each rebase. Even if JJ makes it easier for such a use case, it's just too niche to make it worth having broken immutable commits in the history.
MrJohz@reddit
JJ's commits all have a change ID, and the active commit for a given change ID can evolve over time. This creates the appearance of mutable changes, even though you're working with immutable commits.
So you might have a commit aaa1234, which points to change ID zyxwxyz. When you rebase that commit (including when the rebase creates a conflict), JJ creates a new commit, say bbb1234, pointing to the same change ID, and it will hide the old commit. (It still exists in the repository, but it won't be visible in the commit tree because we're no longer working with that commit.)
If bbb1234 has a conflict, then it will be marked in the commit tree so we can see that. We'll see that change zyxwxyz is currently pointing to commit bbb1234, which has a conflict. We can resolve the conflict with e.g. jj resolve -r zyxwxyz, which will create a new commit ccc1234, which again points to zyxwxyz, and it will again hide the old commit. It will also automatically rebase any commits after bbb1234 for us.
So you're correct that the rebase-with-conflict creates this quasi-useless immutable bad commit, but JJ also has these mutable changes. This gives us a way of referring to a commit that has been rebased several times, or maybe had conflicts resolved, without having to worry about what the current immutable commit hash is.
The above is the technically correct way of understanding what's going on, but most of the time a simpler explanation suffices: JJ doesn't use immutable commits, it uses mutable changes, and that means you can update a change by rebasing it or resolving conflicts in it without creating new hashes.
Also note that in JJ you can rebase multiple branches simultaneously, which is another case that makes commits-with-conflicts really useful. At my work, I often have multiple little PRs open, and when master updates, I can rebase all active branches onto latest master in a single command, immediately seeing where the conflicts are. This wouldn't be possible with Git — even if I had a script that ran multiple rebases one after another, I'd still only be able to resolve those rebases one at a time.
This all feels like a niche workflow, but I think that's because, if you're used to Git, you're used to Git's limitations. Whereas once you start using JJ, things that used to feel complex and niche suddenly start feeling really normal.
magnomagna@reddit
I mean, with git, you could also do the same thing that JJ does. You could just as easily git add -A and then git rebase --continue, which will create a broken commit, but yea that will also move the branch head, which can be easily solved by creating a dummy branch to rebase. But yea with JJ, I bet you don't have to go through all that hassle to do many rebases at once. Still very niche use case though.
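magnomagna's git-only workaround can be tried in a throwaway repository (branch, file, and message names are placeholders). The rebase does finish, but the result is an ordinary commit whose file contains the conflict markers as plain text. Nothing in the commit itself records that it is conflicted, or what the base and the two sides were, which is the gap MrJohz points to in the replies.

```shell
set -eu
# Throwaway identity so commits work without any global git config (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true   # accept default commit messages without opening an editor
cd "$(mktemp -d)"
git init -q repo && cd repo
echo base > f.txt && git add f.txt && git commit -q -m "base"
git branch -M main
git checkout -q -b feature
echo feature > f.txt && git commit -q -am "feature edit"
git checkout -q main
echo main > f.txt && git commit -q -am "main edit"
git checkout -q feature

# The rebase stops on a conflict (non-zero exit), which is expected here.
git rebase main >/dev/null 2>&1 || true

# Stage the file as-is, markers and all, and carry on.
git add -A
git rebase --continue >/dev/null 2>&1

# The rebased "feature edit" commit now holds literal conflict markers.
git show HEAD:f.txt
```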
MrJohz@reddit
git add -A doesn't add quite enough information to work here: you also need to know what was being rebased where in order to properly reconstruct the rebase when it gets resolved later. But in theory, yeah, you could add the relevant metadata to the git commit somehow and maybe write a little script to do all this automatically and then resolve the rebases manually. But you still wouldn't have the change IDs, which means it would still be difficult to refer to a commit before and after it has been rebased.
But to be clear, doing many rebases at once is not a particularly niche use case. It's something I do multiple times a week to keep my branches up-to-date because it's so easy and convenient. It would be niche in Git, sure, but with JJ, because this is such an easy and obvious operation, it's much more common.
magnomagna@reddit
The point is to commit all the conflicts as-is. So, git add -A.
MrJohz@reddit
But JJ doesn't "just" create a broken commit. It creates a commit that includes all the information about the rebase, so that later the rebase can be resumed. That's the really important difference here — JJ isn't just creating a bad commit for the sake of things, it's creating a commit that describes a conflict that can be fixed.
git add -A can't do that: the default conflict diff doesn't include enough information to do a proper three-way merge.
magnomagna@reddit
Well, another way is to merge --squash, as this will create the NET conflict. I'm actually now suspecting JJ actually does a squash merge.
martinvonz@reddit
I don't know what you mean by that but I'm pretty sure it's not correct. See here for how it actually works: https://jj-vcs.github.io/jj/latest/technical/conflicts/
magnomagna@reddit
You don't even know what a squash merge is? Then, how do you even know it's not correct? That's pretty bold of you.
The link you gave me doesn't describe how rebasing is implemented by JJ, which is what I was talking about. That link explains how JJ simplifies merge conflicts. That's a completely different topic from "how JJ implements rebasing".
martinvonz@reddit
I know what squash merge is. I just don't know what you mean by "I'm actually now suspecting JJ actually does squash merge.". JJ doesn't itself do squash merging implicitly anywhere. There's no jj rebase --squash option either (like Mercurial's hg rebase --collapse, which you could call a squash merge).
I thought this thread was about how JJ handles conflicts. That's why I shared the link. JJ rebases commits just like Git does, i.e. by doing a three-way merge of the trees and then recursively attempting to resolve conflicts in the trees. Was there confusion around that?
magnomagna@reddit
My point is about what happens behind the scenes, the implementation, not the interface. I don't care if JJ doesn't provide a squash merge command to the user. Since JJ creates commits when there are conflicts, really, the only way possible is to run merge --squash and then a commit for every single commit to be rebased.
I'm not really talking about conflicts. I'm talking about the implementation of JJ rebase.
martinvonz@reddit
That's what I tried to answer with the link I shared. There is no squash merge involved, at least not the way I think of it.
For context, I started the project, so I know pretty well how it's implemented. I'm afraid I don't understand your question well enough to answer it any better. Maybe there's a more specific question I can answer.
magnomagna@reddit
Well, like I said already, the link you shared isn't about the implementation of JJ rebase. I don't know how many times I have to repeat that. I don't know how else I am supposed to say "rebase implementation". You don't even seem to understand "implementation of rebase" and I kinda doubt you know what a squash merge is.
martinvonz@reddit
We use 3-way merge like Git does. That's still my best answer for how rebase is implemented. If you have a more specific question about it, I can try to answer that.
Maybe the confusion is because JJ handles conflicts in a very different way from Git. But you said you're not asking about conflicts, so that leaves me confused about what's unclear.
magnomagna@reddit
Everything is 3-way merge in git. Even cherry-pick is a 3-way merge (ort strategy specifically). That doesn't answer anything. I've been saying since the beginning that all I'm interested in is how the rebase is implemented in JJ, not the conflicts. Please, I don't think you know a thing about the internals.
martinvonz@reddit
You think I started the project and still don't know anything about the internals? That would be unusual, no? You can check the repo and see that I have a few thousand commits in it (plus about a thousand before it was open-sourced).
I can probably share a pointer to the code if you like, but just "the implementation of rebase" is too broad for me to be able to share something useful. (E.g. https://github.com/jj-vcs/jj/blob/8cd43d169fa1fd856025c7819c157c7f3178cc44/lib/src/rewrite.rs#L141-L149 doesn't seem all that useful.)
magnomagna@reddit
Well then, how hard is it to answer "how does JJ implement rebase"?
martinvonz@reddit
It's not hard. It's just a waste of time to write tons of details when I don't know what you're wondering about. As I've said many times already, I'm happy to answer more specific questions.
magnomagna@reddit
If it's a waste of time, instead of answering it, why have you wasted so much time answering nothing at all?? Are you sure you started the project?
The problem is simple. Say you're rebasing branch B onto A. In git, this means rebasing every single commit in A..B (I don't have to explain this notation cause you're an expert). However, since jj creates a commit even when there are conflicts, jj will create as many commits as there are in A..B even when every single one of them has conflicts in it. How is this done by JJ? Cause you can't just replay a commit on top of another commit that already has conflict markers in it, because the existing conflict may overlap with another conflict. So, the only way possible is to squash merge. This is my guess.
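To make the A..B notation concrete: it selects the commits reachable from B but not from A, i.e. the commits a rebase of B onto A would replay. A minimal Python sketch, assuming a toy history stored as a dict from (hypothetical) commit names to their parents; real Git walks its object database, and a real rebase also replays the selected commits in parent-first order, which this sketch does not attempt.

```python
from collections import deque

def ancestors(graph, start):
    """Every commit reachable from `start` by following parent links (start included)."""
    seen = set()
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(graph.get(commit, []))
    return seen

def commit_range(graph, a, b):
    """Git's A..B: commits reachable from B but not from A."""
    return ancestors(graph, b) - ancestors(graph, a)

# Hypothetical history: X is the fork point, A is on main, B1 and B2 are on the branch.
graph = {
    "X": [],
    "A": ["X"],
    "B1": ["X"],
    "B2": ["B1"],
}

print(sorted(commit_range(graph, "A", "B2")))  # ['B1', 'B2']
```

Note that X is excluded because it is reachable from A as well, which is why only B1 and B2 get replayed.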
martinvonz@reddit
Correct.
That's what the link I shared in my first reply is supposed to explain. I guess it didn't do a good job. As MrJohz said, we don't store conflict markers. Instead, we store the inputs to the conflict (see "Data model" in the linked doc).
When rebasing each commit in the chain, we do a 3-way merge just like Git. The main difference is that we allow the state of a commit to be in a conflicted state and we do some algebra on these conflicted states (see "Conflict simplification").
In your example, let's say A..B had commits B1 and B2 and that B1 was based on commit X. The state in the rebased commit B1 (let's call it B1') would then be B1'=A+(B1-X). If we cannot automatically resolve that merge, then we leave it as a conflicted state. The rebased commit B2 will then be B2'=B1'+(B2-B1)=(A+(B1-X))+(B2-B1)=A+(B2-X).
HTH
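The algebra described above can be checked mechanically. A rough Python sketch, assuming a conflicted state is modelled as a map from snapshot names to signed coefficients (+1 for an added side, -1 for a removed base); the helper names are hypothetical, and the sketch ignores the content-level merge jj attempts before leaving a conflict, so every non-trivial combination is simply treated as unresolved.

```python
def state(name):
    """A plain snapshot with coefficient +1."""
    return {name: 1}

def combine(*states):
    """Add signed states together, dropping terms that cancel to zero."""
    total = {}
    for s in states:
        for term, coeff in s.items():
            total[term] = total.get(term, 0) + coeff
    return {term: coeff for term, coeff in total.items() if coeff != 0}

def neg(s):
    """Flip the sign of every term (the '-X' part of a merge)."""
    return {term: -coeff for term, coeff in s.items()}

def rebase(new_base, commit, old_parent):
    """Rebased state = new_base + (commit - old_parent)."""
    return combine(new_base, commit, neg(old_parent))

def is_resolved(s):
    """In this toy model, a state is resolved when it is a single plain snapshot."""
    return len(s) == 1 and next(iter(s.values())) == 1

A, X, B1, B2 = state("A"), state("X"), state("B1"), state("B2")

B1_rebased = rebase(A, B1, X)            # A + B1 - X: left conflicted
B2_rebased = rebase(B1_rebased, B2, B1)  # the B1 terms cancel, leaving A + B2 - X
undo = rebase(X, B1, X)                  # rebasing back onto X gives plain B1 again
```

The cancellation in the second step is why B2' doesn't stack a second conflict on top of the first: B1 appears once with each sign and drops out.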
magnomagna@reddit
Thank you. Yeah, it wasn't obvious from the article alone that that's what happens with rebasing. After all, the article does not directly mention that rebase is a series of 3-way merges like you just told me now.
Another problem is that I have a hard time understanding why a 3-way merge between A, B, C with B as the base can be represented as A + (C - B), because it seems to suggest "apply the patch C - B on top of A", but that's not the 3-way merge as I understand it which is to compare the diff A - B with the diff C - B.
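One way to see that the two descriptions agree is a toy three-way merge over key/value maps instead of lines of text. In the sketch below (the function name merge3 and the data are hypothetical, and None stands for "key absent"), each key is decided by comparing both sides against the base: if only one side changed it, that change wins. Taking "their" change wherever ours is unchanged is exactly "apply C - B on top of A", and because the rule is symmetric, C + (A - B) gives the same result. Real line-based merges (Git's ort, diff3) first have to align hunks, which this ignores.

```python
def merge3(ours, base, theirs):
    """Key-by-key three-way merge. Returns (merged, conflicting_keys).

    None is used as the marker for "key absent", so values must not be None.
    """
    merged, conflicts = {}, []
    for key in sorted(set(ours) | set(base) | set(theirs)):
        o, b, t = ours.get(key), base.get(key), theirs.get(key)
        if o == t:        # both sides agree (unchanged, same edit, or both deleted)
            value = o
        elif o == b:      # only theirs changed this key: take their change
            value = t
        elif t == b:      # only ours changed this key: keep ours
            value = o
        else:             # both changed it differently: a genuine conflict
            conflicts.append(key)
            continue
        if value is not None:  # None means the key ends up deleted
            merged[key] = value
    return merged, conflicts

B = {"a": 1, "b": 2, "c": 3}   # base
A = {"a": 10, "b": 2, "c": 3}  # ours: edited a
C = {"a": 1, "b": 20}          # theirs: edited b, deleted c

print(merge3(A, B, C))  # ({'a': 10, 'b': 20}, [])
print(merge3(C, B, A))  # same result with the sides swapped
```

A conflict only appears when both diffs (A - B and C - B) touch the same key with different results, which is the "compare both diffs against the base" view.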
pihkal@reddit
Why are you concerned there's an immutable commit? It's not an issue in practice.
First, we need to distinguish between jj changes and jj commits. Think of a change as a chain of commits with a stable identifier, that always points to the most recent commit by default.
When you have a conflict, yes, there's a commit in the repo, but as soon as you fix it, you'll update the change's latest commit with the fixed version, and everything downstream is automatically rebased off that.
The process is usually something like jj new conflicted-id -> fix the changes -> jj squash, and then you never think about the commit with the conflict again.
Unlike git, where you have to address the conflict immediately or back out, jj lets you defer it until later. Great if your boss runs in while you're fixing a conflict and says "Can you make XYZ your immediate top priority?"
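The change/commit distinction can be pictured with a toy model. This is not jj's implementation; the class, its methods and the change IDs are hypothetical. It only shows the two behaviours described here: a stable change ID resolves to its latest commit, and rewriting a change gives its descendants new commits too, while the older commits are kept around.

```python
class ToyRepo:
    """Toy model of jj-style changes: a change ID maps to the commits it has pointed to."""

    def __init__(self):
        self.history = {}  # change ID -> list of commit IDs, oldest first
        self.parent = {}   # change ID -> parent change ID (or None)
        self._count = 0

    def _new_commit(self):
        self._count += 1
        return f"c{self._count}"

    def new_change(self, change_id, parent=None):
        self.parent[change_id] = parent
        self.history[change_id] = [self._new_commit()]

    def rewrite(self, change_id):
        """Fixing or squashing into a change creates a new commit under the same change ID."""
        self.history[change_id].append(self._new_commit())
        # Descendants are rebased onto the new commit, so they get new commits as well.
        for child, parent in self.parent.items():
            if parent == change_id:
                self.rewrite(child)

    def resolve(self, change_id):
        """Referring to a change by its ID means its latest commit."""
        return self.history[change_id][-1]

repo = ToyRepo()
repo.new_change("kpq")                # c1: say this one ended up conflicted
repo.new_change("zrt", parent="kpq")  # c2: a change stacked on top of it
repo.rewrite("kpq")                   # c3: conflict fixed; zrt is rebased to c4
```

The old c1 is still in kpq's history, which roughly matches the point that the conflicted commit keeps existing but normally isn't referred to again unless you go looking (e.g. with jj evolog).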
magnomagna@reddit
No, I didn't mean the immutability was an issue. I meant because it's immutable, you can't modify the same commit to get rid of the conflicts. You'll have to create a new commit in order to resolve the conflicts.
So, I was concerned that the commit history would be peppered with broken commits given how common it is to get rebase conflicts.
However, since you said the downstream will be rebased to the new commit that will be created once the conflicts are resolved, at least the old broken commit with conflicts will not be directly reachable (and I hope it's gc'd immediately). So, that's one thing I didn't know before about JJ.
Still, I don't know how deferring fixes works with JJ. That sounds interesting. I mean, you could do the same with git too, but you'll have to create a commit with your WIP changes or just stash them. How does deferring work in JJ exactly?
pihkal@reddit
Yes, technically the conflicting commits still exist unless GCed. (I don't know details about that.)
But 99.99% of the time you're looking at just the latest commit in a change, which is presumably one that has the conflict fixes. Anything that uses a change ID, by default uses the latest commit in it. So all the basic operations (log, squash, rebase, new, prev/next, etc) won't refer to those hidden conflicting commits. Only deep plumbing commands like op log and evolog will typically surface them.
I've had to go spelunking under the hood of a change for a specific commit maybe twice in a year and a half of using jj.
In jj, commits are labeled as conflicted until they're fixed, but they don't block anything. It's not like git where you enter a modal state that has to be completed, or canceled. You can use all the normal jj commands to go elsewhere in the tree, and come back to fix it whenever. No need to stash anything either, in jj, everything's a commit. (Really don't miss the git stash.)
Truth is, though, I don't usually defer fixes. If I've been working on something and get a conflict rebasing, I figure it's fresh in my mind, might as well do something about it now.
Sometimes if I squash farther back in history, it'll cause a conflict with older feature branches, and those I might let sit until I get back to that feature.
Even if you don't want to defer conflicts often, it's sometimes nice to have the option.
magnomagna@reddit
Yea, based on what you've described so far, I think the mental model for JJ is that commits are mutable. Very interesting. Thanks for explaining all that to me. Appreciate it 🙏
pihkal@reddit
Well, changes are mutable, despite having stable IDs, but the underlying commits technically aren't. I think the change/commit relationship part of jj could be better explained, honestly.
If you give it a go, hope you enjoy it. After a couple weeks of jj, I largely abandoned git forever.
I don't know if there are better tutorials now, but the ones I read when I got started were https://v5.chriskrycho.com/essays/jj-init/ and https://steveklabnik.github.io/jujutsu-tutorial/introduction/introduction.html
magnomagna@reddit
Nah, I don't plan on trying it out, but I am curious about how JJ is designed to simplify VCS workflows.
elsjpq@reddit
Those are certainly very valid complaints, and the UI can be quite awkward, but that is true of any old tool that aims to have good backwards compatibility. Personally though, I've found the fundamentals to be quite easy to learn, because it accurately models basically 100% of the things I'm already doing in development.
It's certainly not a pretty result, but I personally find that to be a strength of git; anything that anyone would ever want to do, sane or insane, is available in git. It's certainly better than the situation where you know exactly what you want, but the system is not capable of accommodating it because it's just slightly unusual.
There are lots of features of git that will probably not fit into your preferred workflow and that's ok. But I like that Git is complete in the sense that no matter what weird process you have, git has a mechanism to model that. Typically, any system that is nice and pretty is not general enough to model real world complexity.
MrJohz@reddit
The fundamentals are really easy to learn because the fundamentals aren't that complex. The problem is that the fundamentals will only take you so far. For example, most people don't include rebasing or other tools that help developers craft clean commits to be part of the fundamentals, but if you look at how projects like Linux or Git use Git, you'll see that they put a lot of value on clean commits because they're really useful for understanding how and why different components have changed over the years. But because doing that is unnecessarily hard in Git, most developers have settled on a "lots of WIP commits, then a big squash or merge commit at the end" approach. This works, but leaves a lot of unnecessary cruft in the history at the end.
I also disagree that having lots of features makes the tool more powerful. Rather, I think it's the other way around. One of the reasons for adding lots of new commands to Git is that the Git model doesn't really support a certain behaviour very well. But if you find a better starting model, you might be able to support all of Git's behaviours and more, without the proliferation of different, contradictory commands.
That's what I think Jujutsu does well. The model that's presented to the user is a lot simpler (e.g. there is no stash, and no named branches in the way Git has branches). But neither of those ideas need to be explicitly built into Jujutsu for it to be able to use them. For example, to stash changes, you create a new commit based on the parent commit — all the work you've done so far is automatically saved, and you can see in the logs that it's a WIP commit. You can even add descriptions and things as necessary. Similarly, if you want to start a new branch, you can directly create a commit in the place you want it. You don't have to create the branch first.
This model is simpler, because there's a smaller set of basic commands, but it is much more powerful: it makes complex commands like rebasing and complex merges way easier; it allows you to see how commits have evolved over time; it allows you to capture repository state much more easily; and so on.
uh_no_@reddit
git....isn't that old....
bastardoperator@reddit
The existence of 54 different Git GUIs suggests we're solving the wrong problem. Git's complexity isn't a UI issue, it's a conceptual model that doesn't naturally translate to point-and-click interfaces.
Git - GUI Clients
MrJohz@reddit
By "Git UI" I mean the user interface that Git presents, not necessarily the GUIs that build on top of that. So things like git add/git commit/git rebase etc.: how these commands behave and what they do.
My assertion is that the basic commands of Git don't really match how development actually works. Or rather, they match different styles of development at different levels of complexity, but often only partially and in ways that make it difficult to get a cohesive view of how Git works under the hood.
verrius@reddit
It's fun, cause even with all this complexity, it doesn't support basic functionality like locking a file or a directory. Pretty much at all. Simply because the only lock its author ever needed was on the entire repo, cause he doesn't give a shit how other people work. And he has the luxury of sitting in a position where it doesn't matter to him, and he can just force anyone who wants to interface with him to deal with it.
Orca-@reddit
Counterpoint: Mercurial with Evolve is easy to use because there's nothing special about using a DAG to represent commit history, Git just happened to win the mindshare war.
suckfail@reddit
As someone who spent most of their career using TFS, I really miss auto-merge. Git's behaviour on conflict resolution is just atrocious in comparison.
knome@reddit
what does TFS do differently in the face of conflict? I've always found git's conflict marking to be pretty straightforward. I know it has a couple of different strategies you can use, but I've never felt the need to swap off the default.
suckfail@reddit
If two people modify the same block of code or even the same line, TFS can usually reconcile it automatically and correctly.
Every time it happens to me in Git, it just shows both renditions of the code and I have to manually merge it.
elsjpq@reddit
have you tried diff-algorithm=histogram?
therealdan0@reddit
My only regret is I don’t have more upvotes to give to this comment. This should genuinely be at the top of every git lifehacks article.
suckfail@reddit
No, I don't even know what that is lol
lgastako@reddit
https://adamj.eu/tech/2024/01/18/git-improve-diff-histogram/
rysto32@reddit
Use a third-party merge tool like kdiff3 or meld or whatever the cool kids are using these days. Just make sure that it does a 3-way merge, not a 2-way merge.
knome@reddit
sounds pretty sophisticated. have you ever seen it run together code from different patches and create a subtle bug? the git default of flagging any changes that get too close always seemed pretty reasonable to me.
Global-Biscotti-8449@reddit
Mercurial with Evolve works well too. Using DAGs for commit history is common. Git just became more popular
timbar1234@reddit
So many interviews where someone's tried to trap me in complicated git scenarios and I've just had to reply... well, reflog exists and is actually quite easy if you have the courage.
"We don't use that here"
more_exercise@reddit
"there's a fire in the kitchen. How do I put it out?"
"have you considered using the fire extinguisher that is legally mandated to be installed at a handy location?"
"We don't use that here"
I respect interview questions like "let's re-implement this standard library function", but if you ask me a question like "re-order this string into the permutation that occurs next alphabetically" and I tell you that it's implemented in the standard library as std::next_permutation, and you say "we don't use that here"? Dude. Your question sucks.
agumonkey@reddit
git fits the "simple, not easy" motto to me
Bradnon@reddit
I've heard people say the same thing about kubernetes. It's intimidating to approach at first but when you begin to understand that it's simply a modular API with a finite list of practical effects, it's really simple to work with.
I wouldn't say either are complex to hide the simplicity, but it's a complex system built up of many repeated instances of simple parts.
They're by far my preferred systems to work with because "simple systems with hidden complexity" fail in ways that were never meant to be introspected, and suck to debug.
dcabines@reddit
I think of it like a transmission in a car. A manual transmission is a simple system, but harder to use while an automatic transmission is a way more complicated system, but is easier to use. git is like a manual transmission and I appreciate that.
RedditDistributions@reddit
“clear as mud”? My favorite coding mentor would always ask me this after 45mins of explaining something new to me
Probable_Foreigner@reddit
One day I hope someone makes a breakthrough in version control. The simplicity of SVN with the capability of Git would be the dream. Something that is simple but can have local commits before they are pushed to the server, and good branching support
starlulz@reddit
Git really isn't "complex," it's just incredibly flexible, which means there's no one way to use it "right" and some ways to make things go particularly wrong. I think what people are actually saying when they say they want a "simple" version control system is they want something with a single way to do things that can be easily documented for reference. Which would be ok, but it would be limiting. They'd use it for a while, realize its drawbacks, and then wish for some of that "complexity" to be able to overcome those drawbacks.
tl;dr: you don't get to have your cake and eat it too with your version control's "complexity"
Orca-@reddit
Git isn't complex, its user interface is just absolute trash. Their porcelain is anybody else's sewer.
nekizalb@reddit
There are a ton of interfaces for git though. I'm sure one of them out there could appeal to you. I hate the CLI model personally, and I was very used to TortoiseSVN, so TortoiseGit works well for me. Some of my coworkers use GitKraken, GitHub Desktop, or Visual Studio's built-in git. There are lots of options out there. Hopefully one can work for you.
Orca-@reddit
I’ve used TortoiseGit for years and it’s fine, but every time I have to use the command line I need to search for the exact syntax for what I want to do. With Hg, I used the command line more frequently than the GUI because it was so intuitive to use.
It’s not that I can’t use git, it’s that it is painful in stupid ways to use git that are not necessary.
rdtsc@reddit
Really curious what those are. Since there's nothing complicated about common day-to-day commands. It sounds more like you drop to the CLI for complicated stuff the GUI can't do. And complaining about that seems a bit unreasonable.
Intuitiveness for something you use many many times each day for years is IMO a bit overrated. You learn it, use it, and get used to it. This is similar to how frequent users don't really care or notice how an icon in a GUI looks. They know the position of the icon or its keyboard shortcut.
And for what it's worth, I've had the opposite experience with hg. I found it painful to use, it was opaque and inflexible. In the end I used hg-git so I don't have to deal with it.
Orca-@reddit
"Nothing complicated about day to day commands" -- except for the inconsistent syntax that needs to be memorized instead of simple, consistent syntax where if you know the verb, you know how to use it, and if you don't know how to use the verb the help makes it easy to use.
I used Mercurial for 5 years and Git for 7 and I still miss Mercurial.
It's not that I can't use it, it's that I don't like using it because I've used better tools.
Probable_Foreigner@reddit
I just find small frustrations with git which were simpler in svn. E.g. in svn I can always do "svn up" when I want. In git I need to stash my local changes. In svn, updating a sub-folder to a particular revision is also easy, but it's a pain in git. From there, if I want to get back to the most current version, I can just svn up again and be back where I was. Try doing that in git without getting a detached head or some other bs.
Ayjayz@reddit
git restore --source <ref> <sub-folder>
Doesn't seem very painful?
Probable_Foreigner@reddit
That doesn't quite do the same thing as svn up, since svn up will revert the versions of those files to the older ones, whereas this is more similar to "svn revert". The advantage here being that we can more easily see the diff against the old versions.
This also doesn't include what needs to happen if you have local changes.
You have to know 3 different commands and type out 6, whereas in svn you need to know 1 command and type it twice. It's much simpler.
Ayjayz@reddit
Ok well make that alias in git? I don't see the issue.
Probable_Foreigner@reddit
Tbh it's probably a skill issue but I don't know what an alias is. My brain is too unga bunga for all this git configuration stuff. Svn is just simple out of the box.
Ayjayz@reddit
https://git-scm.com/book/en/v2/Git-Basics-Git-Aliases
You need to read the whole book, but that's the chapter on aliases.
_Ashleigh@reddit
Not to mention why the hell are they doing a partial update? I moved my company from SVN to git, and before, a LOT of issues stemmed from developers doing partial updates
carrottread@reddit
There's Pijul (https://pijul.org/), but switching from git will probably never happen because it's already everywhere and everybody depends on it.
Nine99@reddit
Isn't the problem that it's impossible to cater to all demands? If two people change something, you don't know if those two changes interact badly or unexpectedly, and that's not even talking about two people changing the same thing.
MrJohz@reddit
I've already recommended it in another thread, but Jujutsu is great. It's got multiple backends, but the main one is Git, which means it's fully backwards compatible with existing Git repositories (you can even use it and Git at the same time on the same repository). And it's simple. It has a few relatively basic concepts, but they compose really nicely, which means once you've learned those basic concepts, you can do some really powerful things.
ydieb@reddit
It is so simple under the hood because they pushed all the complexity up to the user.
mpyne@reddit
That's great, that's exactly where I want it. I need to figure out wtf is going on either way, but having a clean way to know what the tool is going to do when I tell it to do something is by no means guaranteed. At least with git I can feel confident in understanding what it's trying to do and how it works so that when I understand what I want to accomplish, I know that it won't get lost in translation with git.
NotMyThrowaway6991@reddit
What is complex to you?
Most users could benefit from taking 5 minutes to read an article on "how does git rebase work"
WallyMetropolis@reddit
It's exactly the design you end up with when people keep saying, "just add a user option for that."
SikhGamer@reddit
I love how every now and then there's a Git blog that tries to convince you it is easy, simple, and awesome to use.
If that were true, these sites would not exist:
Conscious_Support176@reddit
IMO, the two main problems with git are that switch and restore got bundled into a poorly named command, and merge and rebase were split into separate commands.
The combination allows people to pretend to themselves that commit and merge is the equivalent of checkin, per their preferred way of thinking about source control, and complain that git is complicated when this approach lets them down as opposed to learning how to use the tool they are working with.
dontquestionmyaction@reddit
This is just a list of FAQs. Really not supporting your point.
Nijc0@reddit
I don't like this kind of argument:
If you have any complex CLI tool, a guide can always be helpful.
But yes, of course stuff could be quite a bit more intuitive.
thugcee@reddit
I stopped reading when the author admitted he thought hashes were randomised.
lachlanhunt@reddit
Admitting you didn’t know everything or that you made incorrect assumptions before you started researching a topic to find out the truth is not a sign of weakness.
thugcee@reddit
Of course admitting to incorrect assumptions is a very good thing. But making nonsensical assumptions (associating any version control system with randomisation) can discourage some people from being lectured by the person who made them. I can admit that I shouldn't have clicked the link in the first place. I should have known that the "hidden simplicity" can't be anything other than what is described in every git explanation for beginners. But judging from the popularity of the post, there still aren't enough of them.
Low-Strawberry7579@reddit (OP)
Alright, genius, I admitted that I really thought commits were somehow salted (like payloads in cryptography/public-key encryption), and I explained the reason for my misunderstanding: running git commit --amend without any edits changed my hash. I also explained in the article what actually happened.
Read on and judge the actual substance. Unless you’re a Git pro, then just move along ;)
more_exercise@reddit
Gonna pull spoilers in a sec.
My guess: commit and author dates are included in the hash. Amend updates only the commit date.
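The guess above can be checked by rebuilding a commit ID by hand. A commit object is plain text with tree, parent, author and committer lines (each person line ending in a Unix timestamp and time zone), then a blank line and the message; the ID is the SHA-1 of a "commit <size>\0" header followed by that text. The sketch below assumes a default SHA-1 repository and omits optional headers such as gpgsig; the name, email and timestamps are made up. Since git commit --amend keeps the author date by default but writes a fresh committer timestamp, the ID changes even when nothing else does.

```python
import hashlib

# ID of the empty tree: the SHA-1 of a tree object with no entries.
EMPTY_TREE = hashlib.sha1(b"tree 0\0").hexdigest()

def commit_sha(tree, parent, author, committer, message):
    """Compute a commit ID the way Git does for a SHA-1 repository (no signature headers)."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(body)}\0".encode()  # size is the byte length of the body
    return hashlib.sha1(header + body).hexdigest()

AUTHOR = "Jane Doe <jane@example.com> 1700000000 +0000"
original = commit_sha(EMPTY_TREE, None, AUTHOR, AUTHOR, "Initial commit")

# An amend with no edits: same tree, author line and message, but a later committer timestamp.
AMENDER = "Jane Doe <jane@example.com> 1700000060 +0000"
amended = commit_sha(EMPTY_TREE, None, AUTHOR, AMENDER, "Initial commit")

print(original != amended)  # True
```

Timestamps have one-second resolution, so an amend within the same second as the original commit can reproduce the same ID.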
hentai_proxy@reddit
It is remarkable how well Git hides its simplicity.