=pod

=for comment
DO NOT EDIT. This Pod was generated by Swim v0.1.48.
See http://github.com/ingydotnet/swim-pm#readme

=encoding utf8

=head1 Introducing Git Subrepos
|
|
|
|
There is a new git command called C<subrepo> that is meant to be a solid
alternative to the C<submodule> and C<subtree> commands. All 3 of these
commands allow you to include external repositories (pinned to specific
commits) in your main repository. This is an often needed feature for project
development under a source control system like Git. Unfortunately, the
C<submodule> command is severely lacking, and the C<subtree> command (an
attempt to make things better) is also very flawed. Fortunately, the
C<subrepo> command is here to save the day.

This article will discuss how the previous commands work, and where they go
wrong, while explaining how the new C<subrepo> command fixes the issues.

It should be noted that there are 3 distinct roles (ways people use repos)
involved in discussing this topic:

=over

=item * B<owner> - The primary author and repo owner

=item * B<collaborators> - Other developers who contribute to the repo

=item * B<users> - People who simply use the repo software

=back
|
|
|
|
=head2 Introducing C<subrepo>

While the main point is to show how subrepo addresses the shortcomings
of submodule and subtree, I'll start by giving a quick intro to the
subrepo command.

Let's say that you have a project repo called 'freebird' and you want to have
it include 2 other external repos, 'lynyrd' and 'skynyrd'. You would do the
following:

    git clone git@github.com:you/freebird
    cd freebird
    git subrepo clone git@github.com:you/lynyrd ext/lynyrd
    git subrepo clone git@github.com:you/skynyrd ext/skynyrd --branch=1975

What these commands do (at a high level) should be obvious. They "clone" (add)
the repos' content into the subdirectories you told them to. The details of
what is happening to your repo will be discussed later, but adding new
subrepos is easy. If you need to update the subrepos later:

    git subrepo pull ext/lynyrd
    git subrepo pull ext/skynyrd --branch=1976

The lynyrd subrepo is tracking the upstream master branch, and you've changed
the skynyrd subrepo to the 1976 branch. Since these subrepos are owned by
'you', you might want to change them in the context of your freebird repo.
When things are working, you can push the subrepo changes back:

    git subrepo push ext/lynyrd
    git subrepo push ext/skynyrd

Looks simple, right? It's supposed to be. The intent of C<subrepo> is to do
the right things, and to not cause problems.

Of course there's more to it under the hood, and that's what the rest of this
article is about.
|
|
|
|
=head2 Git Submodules

Submodules tend to receive a lot of bad press. Here's some of it:

=over

=item * L<http://ayende.com/blog/4746/the-problem-with-git-submodules>

=item * L<http://somethingsinistral.net/blog/git-submodules-are-probably-not-the-answer/>

=item * L<http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/>

=back

A quick recap of some of the good and bad things about submodules:

Good:

=over

=item * Use an external repo in a dedicated subdir of your project.

=item * Pin the external repo to a specific commit.

=item * The C<git-submodule> command is a core part of the Git project.

=back

Bad:

=over

=item * Users have to know a repo has submodules.

=item * Users have to get the submodules manually.

=item * Pulling a repo with submodules won't pull in the new submodule changes.

=item * A submodule will break if the referenced repo goes away.

=item * A submodule will break if a forced push removes the referenced commit.

=item * Can't use different submodules/commits per main project branch.

=item * Can't "try out" a submodule on an alternate branch.

=item * Main repo can be pushed upstream pointing to unpushed submodule commits.

=item * Command capability differs across Git versions.

=item * Often need to change the remote url to push submodule changes upstream.

=item * Removing or renaming a submodule requires many steps.

=back
|
|
|
|
Internally, submodules are a real mess. They give the strong impression of
being bolted on, well after Git was designed. Some commands are aware of the
existence of submodules (although usually half-heartedly), and many commands
are oblivious. For instance, the git-clone command has a C<--recursive> option
to clone all submodules, but it's not a default, so you still need to be aware
of the need. The git-checkout command does nothing with the submodules, even
if they are intended to differ across branches.

Let's talk a bit about how submodules are implemented in Git. Information
about them is stored in 3 different places (in the top level repo directory):

=over

=item * C<.gitmodules>

=item * C<.git/config>

=item * C<.git/modules> - The submodule repo's meta data (refs/objects)

=back

So some of the information lives in the repo history (.gitmodules), but other
info (.git/) is only known to the local repo.
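To make the tracked half of that split concrete, here is a minimal sketch.
It assumes a hypothetical submodule at C<ext/lynyrd> and writes a sample
C<.gitmodules> into a scratch directory. Because the file uses Git's config
syntax, C<git config --file> can query it directly:

```shell
set -eu
scratch=$(mktemp -d)   # throwaway directory; no real repo is touched
cd "$scratch"

# A sample .gitmodules (the path and url are hypothetical).
cat > .gitmodules <<'EOF'
[submodule "ext/lynyrd"]
    path = ext/lynyrd
    url = git@github.com:you/lynyrd
EOF

# The key has the form submodule.<name>.<variable>.
sub_path=$(git config --file .gitmodules --get submodule.ext/lynyrd.path)
sub_url=$(git config --file .gitmodules --get submodule.ext/lynyrd.url)
echo "tracked in history: $sub_path -> $sub_url"
```

The entries under C<.git/config> and C<.git/modules> have no such tracked
counterpart, which is why a fresh clone does not have them until the
submodules are initialized.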
|
|
|
|
In addition, submodules introduce a new low-level concept to the
commit/tree/blob graph. Normally a git tree object points to blob (file)
objects and more tree (directory) objects. Submodules have tree objects point
to B<commit> objects. While this seems clever and somewhat reasonable, it also
means that every other git command (which was built on the super clean Git
data model) has to be aware of this new possibility (and deal with it
appropriately).
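A tree entry that points at a commit can be reproduced by hand with stock
Git plumbing. The following is a sketch only: it works in a throwaway repo,
the C<ext/lynyrd> path is hypothetical, and it links to a commit in the same
repo purely for convenience (a real submodule entry names a commit in the
submodule's own history).

```shell
set -eu
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$scratch/repo"
cd "$scratch/repo"
git commit -q --allow-empty -m "any commit will do as a target"
target=$(git rev-parse HEAD)

# Mode 160000 marks a "gitlink": an index/tree entry whose object is a commit.
git update-index --add --cacheinfo 160000,"$target",ext/lynyrd
tree=$(git write-tree)

# List the tree recursively; the entry's type column reads "commit",
# where an ordinary file would read "blob".
entry=$(git ls-tree -r "$tree")
echo "$entry"
entry_mode=$(echo "$entry" | awk '{print $1}')
entry_type=$(echo "$entry" | awk '{print $2}')
```

Any tool that walks trees expecting only blobs and trees has to special-case
this third kind of entry.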
|
|
|
|
The point is that, while submodules are a real need, and a lot of work has
gone into making them work decently, they are essentially a kludge to the Git
model, and it is quite understandable why they haven't worked out as well as
people would expect.

NOTE: Submodules I<are> getting better with each release of Git, but it's
still an endless catch up game.

=head2 Git Subtrees

One day, someone decided to think differently. Instead of pointing to external
repos, why not just include them into the main repo (but also allow them to be
pulled and pushed separately as needed)?

At first this may feel like a wasteful approach. Why keep other repos
physically inside your main one? But if you think about it abstractly, what's
the difference? You want your users and collaborators to have all this code
because your project needs it. So why worry about how it happens? In the end,
the choice is yours, but I've grown very comfortable with this concept and
I'll try to justify it well. I should note that the first paragraph of the
C<submodule> doc suggests considering this alternative.

The big win here is that you can do this using the existing git model. Nothing
new is added. You are just adding commits to a history. You can do it
differently on every branch. You can merge branches sensibly.

The git-subtree command seems to have been inspired by Git's subtree merge
strategy, which it uses internally, and from which it possibly got its name.
A subtree merge allows you to take a completely separate Git history and make
it a subdirectory of your repo.

Adding a subtree was the easy part. All that needed to be done after that was
to figure out a way to pull upstream changes and push local ones back
upstream. And that's what the C<git-subtree> command does.

So what's the problem with git-subtree then?

Well, unfortunately, it drops a few balls. The main problems come down to an
overly complicated commandline UX, poor collaborator awareness, and a fragile
and messy implementation.
|
|
|
|
Good:

=over

=item * Use an external repo in a dedicated subdir of your project.

=item * Pin the external repo to a specific commit.

=item * Users get everything with a normal clone command.

=item * Users don't need to know that subtrees are involved.

=item * Can use different subtrees/commits per main project branch.

=item * Users don't need the subtree command. Only owners and collaborators.

=back

Bad:

=over

=item * The remote url and branch info is not saved (except in the history).

=item * Owners and collaborators have to enter the remote for every command.

=item * Collaborators aren't made aware that subtrees are involved.

=item * Pulled history is not squashed by default.

=item * Creates a messy historical view. (See below)

=item * Bash code is complicated.

=item * Only one test file, and it is currently failing.

=back
|
|
|
|
As you can see, subtree makes quite a few things better, but after trying it
for a while, the experience was more annoying than submodules. For example,
consider this usage:

    $ git subtree add --squash --prefix=foo git@github.com:my/thing mybranch
    # weeks go by…
    $ git subtree pull --squash --prefix=foo git@github.com:my/thing mybranch
    # time to push local subtree changes back upstream
    $ git subtree push --prefix=foo git@github.com:my/thing mybranch

The first thing you notice is the overly verbose syntax. It's justified in the
first command, but in the other 2 commands I really don't want to have to
remember what the remote and branch are that I'm using.

Moreover, my collaborators have no idea that subtrees are involved, let alone
where they came from.

Consider the equivalent subrepo commands:

    $ git subrepo clone git@github.com:my/thing foo -b mybranch
    $ git subrepo pull foo
    $ git subrepo push foo

Collaborators see a file called 'foo/.gitrepo', and know that the subdir is a
subrepo. The file contains all the information needed by future commands
applied to that subrepo.
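Since the marker is an ordinary file, a collaborator can spot every subrepo
with nothing more than a filesystem scan. The sketch below assumes a made-up
project layout with two subrepos under C<ext/>; it is a plain C<find>, not
something the C<subrepo> command itself is claimed to run.

```shell
set -eu
scratch=$(mktemp -d)
cd "$scratch"

# Fake project layout: two subrepo dirs and one ordinary dir (names hypothetical).
mkdir -p ext/lynyrd ext/skynyrd src
touch ext/lynyrd/.gitrepo ext/skynyrd/.gitrepo src/main.c

# Every directory that holds a .gitrepo file is a subrepo.
subrepos=$(find . -name .gitrepo -type f \
    | sed -e 's|^\./||' -e 's|/\.gitrepo$||' \
    | sort)
echo "$subrepos"
```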
|
|
|
|
=head2 Git Subrepos

Now is a good time to dive into the technical aspects of the C<subrepo>
command, but first let me explain how it came about.

As you may have surmised by now, I am the author of git-subrepo. I'd used
submodules on and off for years, and when I became aware of subtree I gave it
a try, but I quickly realized its problems. I decided maybe it could be
improved. I decided to write down my expected commandline usage and my ideals
of what it would and would not do. Then I set off to implement it. It's been a
long road, but what I ended up with was even better than what I wanted from
the start.
|
|
|
|
Let's review the Goods and Bads:

Good:

=over

=item * Use an external repo in a dedicated subdir of your project.

=item * Pin the external repo to a specific commit.

=item * Users get everything with a normal clone command.

=item * Users don't need to know that subrepos are involved.

=item * Can use different subrepos/commits per main project branch.

=item * Meta info is kept in an obvious place.

=item * Everyone knows when a subdir is a subrepo.

=item * Commandline UX is minimal and intuitive.

=item * Pulled history is always squashed out locally.

=item * Pushed history is kept intact.

=item * Creates a clean historical view. (See below)

=item * Bash code is very simple and easy to follow.

=item * Comprehensive test suite. Currently passing on travis:

=back

=for html
<a href="https://travis-ci.org/ingydotnet/git-subrepo"><img src="https://travis-ci.org/ingydotnet/git-subrepo.png" alt="git-subrepo"></a>

Bad:

=over

=item * --Subrepo is very new.-- (no longer true)

=item * --Not well tested in the wild.-- (no longer true)

=back
|
|
|
|
This review may seem somewhat slanted, but I honestly am not aware of any
"bad" points that I'm not disclosing. That said, I am sure time will reveal
bugs and shortcomings. Those can usually be fixed. Hopefully the B<model> is
correct, because that's harder to fix down the road.

OK. So how does it all work?

There are 3 main commands: clone/pull/push. Let's start with the clone
command. This is the easiest part. You give it a remote url, possibly a new
subdir to put it in, and possibly a remote branch to use. I say possibly,
because the command can guess the subdir name (just like the git-clone command
does), and the branch can be the upstream default branch.

Given this we do the following steps internally:

=over

=item * Fetch the remote content (for a specific refspec)

=item * Read the remote head tree into the index

=item * Checkout the index into the new subdir

=item * Create a new subrepo commit object for the subdir content

=item * Add a state file called .gitrepo to the new subrepo/subdir

=item * Amend the merge commit with this new file

=back
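The steps above can be approximated with stock Git plumbing. This is only a
sketch under assumptions, not the actual C<git-subrepo> implementation: it
builds a throwaway local upstream in a temp directory (all paths and the
C<ext/lib> subdir are made up), uses C<git read-tree --prefix> to stage and
check out the fetched tree, and records the state file in a single commit
rather than committing and then amending.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in "upstream" repo with one file and one commit.
git init -q "$work/upstream"
cd "$work/upstream"
echo "hello" > lib.txt
git add lib.txt
git commit -q -m "upstream commit"

# The main project repo.
git init -q "$work/main"
cd "$work/main"
echo "main" > README
git add README
git commit -q -m "Previous head commit of your repo"
parent=$(git rev-parse HEAD)

# 1. Fetch the remote content.
git fetch -q "$work/upstream" HEAD
upstream_commit=$(git rev-parse FETCH_HEAD)

# 2+3. Read the fetched tree into the index under ext/lib/ and check it out.
git read-tree --prefix=ext/lib/ -u FETCH_HEAD

# 4+5. Write a .gitrepo-style state file and commit everything at once.
git config --file ext/lib/.gitrepo subrepo.remote "$work/upstream"
git config --file ext/lib/.gitrepo subrepo.commit "$upstream_commit"
git config --file ext/lib/.gitrepo subrepo.parent "$parent"
git add ext/lib/.gitrepo
git commit -q -m "subrepo-style clone of upstream into ext/lib"

git log --oneline
```

The upstream history never enters the main branch: only its tree content is
staged, so the result is one new commit on top of the previous head.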
|
|
|
|
This process adds something like this to the top of your history:

    * 9b6ddc9 git subrepo clone git@github.com:you/foo.git foo/
    * 37c61a5 Previous head commit of your repo

The entire history has been squashed down into one commit, and placed on
top of your history. This is important as it keeps your history as clean
as possible. You don't need to have the subrepo history in your main
project, since it is immutably available elsewhere, and you have a pointer
to that place.

The new foo/.gitrepo file looks like this:

    [subrepo]
        remote = git@github.com:you/foo.git
        branch = master
        commit = 14c96c6931b41257b2d42b2edc67ddc659325823
        parent = 37c61a5a234f5dd6f5c2aec037509f50d3a79b8f
        cmdver = 0.1.0

It contains all the info needed now and later. Note that the repo url is the
generally pushable form, rather than the publicly readable (C<https://…>)
form. This is the best practice. Users of your repo don't need access to this
url, because the content is already in your repo. Only you and your
collaborators need this url to pull/push in the future.
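The listing above is in Git's own config syntax, so, assuming the file stays
in that format, its fields can be read back with C<git config --file>. A
minimal sketch that recreates the example file in a scratch directory:

```shell
set -eu
scratch=$(mktemp -d)
mkdir -p "$scratch/foo"
cd "$scratch"

# The state file from the text, copied verbatim.
cat > foo/.gitrepo <<'EOF'
[subrepo]
    remote = git@github.com:you/foo.git
    branch = master
    commit = 14c96c6931b41257b2d42b2edc67ddc659325823
    parent = 37c61a5a234f5dd6f5c2aec037509f50d3a79b8f
    cmdver = 0.1.0
EOF

# Look up individual fields by <section>.<key>.
remote=$(git config --file foo/.gitrepo subrepo.remote)
branch=$(git config --file foo/.gitrepo subrepo.branch)
commit=$(git config --file foo/.gitrepo subrepo.commit)
echo "foo tracks $branch of $remote at $commit"
```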
|
|
|
|
The next command is the pull command. Normally you just give it the subrepo's
subdir path (although you can change the branch with -b), and it will get the
other info from the subdir/.gitrepo file.

The pull command does these steps:

=over

=item * Fetch the upstream content

=item * Check if anything needs pulling

=item * Create a branch of local subrepo commits since last pull

=item * Rebase this branch onto the upstream commits

=item * Commit the HEAD of the rebased content

=item * Update/amend the .gitrepo file

=back
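The first two steps (fetch, then check whether anything needs pulling) come
down to comparing the commit recorded in C<.gitrepo> with the upstream head.
Here is a hedged sketch of that comparison using a throwaway local upstream;
the C<ext/lib> path and the variable names are assumptions, and the real
command may well decide this differently.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway upstream with two commits; we pretend the first was pulled earlier.
git init -q "$work/upstream"
cd "$work/upstream"
git commit -q --allow-empty -m "first"
pinned=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"

# Main repo whose state file still pins the first upstream commit.
git init -q "$work/main"
cd "$work/main"
mkdir -p ext/lib
git config --file ext/lib/.gitrepo subrepo.remote "$work/upstream"
git config --file ext/lib/.gitrepo subrepo.commit "$pinned"

# Fetch the upstream head and compare it with the recorded commit.
remote=$(git config --file ext/lib/.gitrepo subrepo.remote)
recorded=$(git config --file ext/lib/.gitrepo subrepo.commit)
git fetch -q "$remote" HEAD
latest=$(git rev-parse FETCH_HEAD)

if [ "$recorded" = "$latest" ]; then
    status="up to date"
else
    status="needs pull"
fi
echo "ext/lib: $status"
```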
|
|
|
|
=head3 Clean History

I've talked a bit about clean history but let me show you a comparison between
subrepo and subtree. Let's run this command sequence using both methods. Note
the differences in I<both> the command syntax required and the branch
history produced.

Subrepo first:

    $ git subrepo clone git@github.com:user/abc
    $ git subrepo clone git@github.com:user/def xyz
    $ git subrepo pull abc
    $ git subrepo pull xyz

The resulting history is:

    * b1f60cc subrepo pull xyz
    * 4fb0276 subrepo pull abc
    * bcef2a0 subrepo clone git@github.com:user/def xyz
    * bebf0db subrepo clone git@github.com:user/abc
    * 64eeaa6 (origin/master, origin/HEAD) O HAI FREND
|
|
|
|
Compare that to B<subtree>. This:

    $ git subtree add --squash --prefix=abc git@github.com:user/abc master
    $ git subtree add --squash --prefix=xyz git@github.com:user/def master
    $ git subtree pull --squash --prefix=abc git@github.com:user/abc master
    $ git subtree pull --squash --prefix=xyz git@github.com:user/def master

Produces this:

    * 739e45a (HEAD, master) Merge commit '5f563469d886d53e19cb908b3a64e4229f88a2d1'
    |\
    | * 5f56346 Squashed 'xyz/' changes from 08c7421..365409f
    * | 641f5e5 Merge commit '8d88e90ce5f653ed2e7608a71b8693a2174ea62a'
    |\ \
    | * | 8d88e90 Squashed 'abc/' changes from 08c7421..365409f
    * | | 1703ed2 Merge commit '0e091b672c4bbbbf6bc4f6694c475d127ffa21eb' as 'xyz'
    |\ \ \
    | | |/
    | |/|
    | * | 0e091b6 Squashed 'xyz/' content from commit 08c7421
    | /
    * | 07b77e7 Merge commit 'cd2b30a0229d931979ed4436b995875ec563faea' as 'abc'
    |\ \
    | |/
    | * cd2b30a Squashed 'abc/' content from commit 08c7421
    * 64eeaa6 (origin/master, origin/HEAD) O HAI FREND
|
|
|
|
This was from a minimal case. Subtree history (when viewed this way at least)
gets unreasonably ugly fast. Subrepo history, by contrast, always looks as
clean as shown.

The final command, push, basically just does the pull/rebase dance described
above, and pushes the resulting history back. It does not squash the
commits made locally, because it assumes that when you changed the local
subrepo, you wrote commit messages that were intended to eventually be
published back upstream.
|
|
|
|
=head2 Conflict Resolution

The commands described above can also be done "by hand". If something fails
during a pull or push (generally in the rebasing) then the command will tell
you what to do to finish up.

You might choose to do everything by hand, and use your own merging
strategies. This is perfectly reasonable. The C<subrepo> command offers a few
other helper commands to help you get the job done:

=over

=item * C<fetch> - Fetch the upstream and create a C<< subrepo/remote/<subdir> >> ref.

=item * C<branch> - Create a branch of local subdir commits since the last pull, called C<< subrepo/<subdir> >>.

=item * C<commit> - Commit a merged branch's HEAD back into your repo.

=item * C<status> - Show lots of useful info about the current state of the subrepos.

=item * C<clean> - Remove branches, refs and remotes created by subrepo commands.

=item * C<help> - Read the complete documentation!

=back
|
|
|
|
=head2 Conclusion

Hopefully by now, you see that submodules are a painful choice with a dubious
future, and that subtree, while a solid idea, has many usage issues.

Give C<subrepo> a try. It's painless, easily revertible, and just might be
what the doctor ordered.

=head2 Reference Links

=over

=item * L<http://longair.net/blog/2010/06/02/git-submodules-explained/>

=item * L<http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/>

=back

=cut
|