After the previous commit, `MergedTree` and `MergedTreeId` are almost
identical, with the only difference being that `MergedTree` is attached
to a `Store` instance. `MergedTreeId` is also equivalent to
`Merge<TreeId>`, since it is just a wrapper around it.
In the future, `MergedTree` might contain additional metadata like
conflict labels. Therefore, I replaced `MergedTreeId` with `MergedTree`
wherever I think it would be required to pass this additional metadata,
or where the additional methods provided by `MergedTree` would be
useful. In any remaining places, I replaced it with `Merge<TreeId>`.
I also renamed some of the `tree_id()` methods to `tree_ids()` for
consistency, since now they return a merge of individual tree IDs
instead of a single "merged tree ID". Similarly, `MergedTree` no longer
has an `id()` method, since tree IDs won't fully identify a `MergedTree`
once it contains additional metadata.
This is part of a series of changes to make most methods on index traits
(i.e. `ChangeIdIndex`, `MutableIndex`, `ReadonlyIndex`, `Index`)
fallible. This will enable networked implementations of these traits,
which may produce I/O errors during operation. See #7825 for more
information.
- Introduced a few more instances of the existing anti-pattern, `TODO:
indexing error shouldn't be a "BackendError"`. We're tracking this
known issue in #7849.
- Converted `MutableRepo::merge_view` to return a `RepoLoaderError`
instead of a `BackendError`. The only caller, `MutableRepo::merge`,
already returns a `RepoLoaderError`.
- Added three "`fallible_`" iterator helpers to reduce the amount of
noise at `is_ancestor` call sites due to the method now being
fallible. Using these helpers seem to produce code that's a little
more readable than using `process_results` from itertools. One
consideration in this trade-off is that these helpers do not
themselves return iterators: if we find that we need more support for
fallible combinators mid-"chain" of iterator combinators, we might
either want to use `process_results` only in those cases, or switch to
use of `process_results` across the board (in lieu of these new
helpers).
This is part of a series of changes to make most methods on index traits
(i.e. `ChangeIdIndex`, `MutableIndex`, `ReadonlyIndex`, `Index`)
fallible. This will enable networked implementations of these traits,
which may produce I/O errors during operation. See #7825 for more
information.
This is part of a series of changes to make most methods on index traits
(i.e. `ChangeIdIndex`, `MutableIndex`, `ReadonlyIndex`, `Index`)
fallible. This will enable networked implementations of these traits,
which can produce I/O errors during operation. See #7825 for more
information.
This is part of a series of changes to make most methods on index traits
(i.e. `ChangeIdIndex`, `MutableIndex`, `ReadonlyIndex`, `Index`)
fallible. This will enable networked implementations of these traits,
which can produce I/O errors during operation. See #7825 for more
information.
Commit::is_empty() and ::parent_tree() will query index first to avoid merging
parent trees.
iirc, the Google backend has an index of changed paths, so I made the added
function return Result although the default index will never fail. I also
considered adding matcher argument, but I don't have any use case right now. I
think the Matcher API would be too abstract to construct a database query. Maybe
we should use Revset/FilesetExpression types for that purpose.
Tests of the indexed contents will be added with the revset engine integration.
We don't have a public interface to get the indexed changed paths right now.
I'm going to add changed-path index, and the operation link file will store a
list of segment files and a starting commit position. Suppose the link file is
small, we wouldn't need our own serialization format.
This patch adds new directory for proto-based operation link files. We could
reuse the existing directory, but that would make debugging a bit harder.
Using a range for this is consistent with the filter for parent count,
which also uses a `Range<u32>`. This could also be useful to implement
an `nth_parent()` revset in the future.
`gen` is now a reserved keyword; this also matches the surrounding use
of the word `generation` anyway, so IMO it's clearer.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
- IndexPosition is renamed to GlobalCommitPosition because we have
LocalCommitPosition, and GlobalCommitPosition is no longer a public type.
- Private types such as ParentIndexPosition aren't renamed. I'll probably split
commit index modules instead.
- ReadonlyIndexLoadError isn't renamed because I'm not sure if we'll add a new
error type dedicated for new changed-paths index.
- IndexLevelStats is renamed, but IndexStats isn't because I'll probably add
stats of changed-paths index.
I tried to minimize this patch, but it seemed rather complicated than porting
most callers all at once. Remote management functions in git.rs are unchanged.
They'll be ported separately.
With this change, many non-template bookmark/remote name outputs should be
rendered in revset syntax.
Jujutsu's branches do not behave like Git branches, which is a major
hurdle for people adopting it from Git. They rather behave like
Mercurial's (hg) bookmarks.
We've had multiple discussions about it in the last ~1.5 years about this rename in the Discord,
where multiple people agreed that this _false_ familiarity does not help anyone. Initially we were
reluctant to do it but overtime, more and more users agreed that `bookmark` was a better for name
the current mechanism. This may be hard break for current `jj branch` users, but it will immensly
help Jujutsu's future, by defining it as our first own term. The `[experimental-moving-branches]`
config option is currently left alone, to force not another large config update for
users, since the last time this happened was when `jj log -T show` was removed, which immediately
resulted in breaking users and introduced soft deprecations.
This name change will also make it easier to introduce Topics (#3402) as _topological branches_
with a easier model.
This was mostly done via LSP, ripgrep and sed and a whole bunch of manual changes either from
me being lazy or thankfully pointed out by reviewers.
We had both `repo()` and `mut_repo()` on `Transaction` and I think it
was easy to get confused and think that the former returned a
`&ReadonlyRepo` but both of them actually return a reference to
`MutableRepo` (the latter obviously returns a mutable reference). I
hope that renaming to the more idiomatic `repo_mut()` will help
clarify.
We could instead have renamed them to `mut_repo()` and
`mut_repo_mut()` but that seemed unnecessarily long. It would better
match the `mut_repo` variables we typically use, though.
It's been more than 6 months since we added support for dynamically
selecting the working copy implementation. This patch drops support
for selecting the default implementation of that and other stores.
This helps to eliminate higher-ranked trait bounds from RevWalkRevset and
RevWalk combinators to be added. Since &CompositeIndex is now a real reference,
it can be passed to functions as index: &T.
The shortest change id prefix will become a few digits longer, but I think
that's acceptable. Entries included in the "revsets.short-prefixes" set are
unaffected.
The reachable set is calculated eagerly, but this is still faster as we no
longer need to sort the reachable entries by change id. The lazy version will
save another ~100ms in mid-size repos.
"jj log" without working copy snapshot:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
--ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 353.6 ms ± 11.9 ms [User: 266.7 ms, System: 87.0 ms]
Range (min … max): 329.0 ms … 365.6 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 271.3 ms ± 9.9 ms [User: 183.8 ms, System: 87.7 ms]
Range (min … max): 250.5 ms … 282.7 ms 20 runs
Relative speed comparison
1.99 ± 0.16 target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
1.53 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```
"jj status" with working copy snapshot (watchman enabled):
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
status --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 396.6 ms ± 10.1 ms [User: 300.7 ms, System: 94.0 ms]
Range (min … max): 373.6 ms … 408.0 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 318.6 ms ± 12.6 ms [User: 219.1 ms, System: 94.1 ms]
Range (min … max): 294.2 ms … 333.0 ms 20 runs
Relative speed comparison
1.85 ± 0.14 target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
1.48 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
```
This basically means that the change ids are interned. We'll implement binary
search over the sorted change ids table. The table could be sorted differently
for better cache locality, but it is in lexicographical order for simplicity.
With my testing, the cost of the id lookup isn't dominant.
Unlike the parent entries, the size of the per-id overflow items isn't saved.
That's s because the number of the same-change-id commits is either 1 or many.
It doesn't make sense to allocate 8 bytes for each change id. Instead, we'll
pay extra indirection cost to determine the size.
Since IdIndex sorts the entries by using .sort_unstable_by_key(), the order of
the same-key elements is undefined. Perhaps, it's stable for short arrays, and
the test passes because of that.
I'm going to introduce breaking changes in index format. Some of them will
affect the file size, so version number or signature won't be needed. However,
I think it's safer to detect the format change as early as possible.
I have no idea if embedded version number is the best way. Because segment
files are looked up through the operation links, the version number could be
stored there and/or the "segments" directory could be versioned. If we want to
support multiple format versions and clients, it might be better to split the
tables into data chunks (e.g. graph entries, commit id table, change id table),
and add per-chunk version/type tag. I choose the per-file version just because
it's simple and would be non-controversial.
As I'm going to introduce format change pretty soon, this patch doesn't
implement data migration. The existing index files will be deleted and new
files will be created from scratch.
Planned index format changes include:
1. remove unused "flags" field
2. inline commit parents up to two
3. add sorted change ids table
Since hidden commits can be looked up by remote_branches() revset for example,
reindexing should traverse ancestors from all named refs in addition to the
visible heads.
change_id_index() is only used by Readonly/MutableRepo, so we don't need an
abstraction at Index. evaluate_revset() is somewhat similar, but the callers
rely on &dyn Repo.
We current have `Revset::change_id_index()` for creating a
`ChangeIdIndex` for a given revset. I think it will be hard to make it
performant for general revsets, especially in very large repos and
with custom index implementations, like the one we have at Google. If
we instead restrict it to including all ancestors of a set of heads, I
think it will be much easier to implement. We only use
`Revset::change_id_index()` with revsets including all visible commits
today, so we won't lose any current functionality by making it more
restricted.
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.