`Store::tree_builder()` returns a `TreeBuilder`. Almost all callers
should be using the `MergedTreeBuilder` these days. This patch
therefore removes `tree_builder()` to reduce the risk of accidentally
using it.
In 1b1edc7a90, I missed the importance of this comment:
```
// Whenever we add an entry to `self.pending_trees`, we also add an Ok() entry
// to `self.items`.
```
The `self.items` entry was there to make sure that we wait for the
pending tree to be polled to completion, thus resulting in its entries
getting added to `self.items`. After my commit, we no longer always
add an entry to `items`, which meant that we can end up emitting
entries from a parent tree before entries in a child tree, such as
`foo/baz` before `foo/bar/qux` even though `baz` comes after `bar`.
This patch fixes the bug by instead checking in `self.pending_trees`
that there are no directories that we need to emit first. Thanks to
@yuja for the suggestion to do it this way instead.
The next patch will update the tests to catch regressions.
The `TestBackend` methods currently return their data immediately (on
the first poll), which means that if multiple futures are created and
then they're polled "concurrently", they will always return their data
in the order they're being polled. That leads to poor testing of
algortihms that poll futures concurrently, such as `TreeDiffStream`.
This patch makes `TestBackend` spawn async work to run in a tokio
runtime instead. That's enough to show a bug I introduced with my
recent refactoring of `TreeDiffStream`, except that it's also covered
up by the caching we do in `Store`. I'll fix the bug and update tests
to work around the caching next.
This slows down the jj-lib tests from 2.8 s to 3.1 s. I don't think
that matter much, given that the jj-cli tests takes > 30 s.
I tried to add a small `tokio::time::sleep()` (random up to 5 ms) but
that slowed down the property-based tests of the diff editor very
significantly (took over a minute). Maybe we could have two different
kinds of test backend or maybe make the sleep configurable in some
way. We can improve that later. The async-ness added in this patch is
sufficient for catching the diff-stream bug.
It should genenerally be better to prioritize polling trees in the
order we're going to emit their entries. For example, if we have
pending trees `zzz/` and `dir/aaa/`, it's better to poll the latter
even though we inserted the former first.
This also prepares for fixing a bug related to the order we emit. We
will then want to look up in `pending_trees` by key found in `items`.
If we add changed-files index, MutableIndex::add_commit() will have to update
both MutableIndexSegment of commits and MutableChangedFileIndexSegment or
something. It won't make much sense to add high-level .add_commit() function to
MutableIndexSegment.
It's surprising that a symbol expression may be resolved to multiple revisions,
and that's one of the reason we require all: modifier in some places. Let's make
a symbol resolution fail in that case so we can deprecate the all: syntax.
The new error hints are a bit less informative, but I don't want to implement
ad-hoc formatting for resolve_some_revsets_default_single(). The user will have
to review the graph anyway in order to resolve divergence/conflicts.
Closes#5632
This reduces the overhead when loading a Tree object. I was doing an experiment
on indexing changed paths per commit, and noticed that backend::Tree::set() had
significant cost comparable to zlib decompression of Git tree objects.
This makes it clearer that the base non-conflicting tree data is the same. Since
there should be no duplicated entries, we don't need to remove "absent" entry
from the tree data.
git_repo.shallow_commits() had measurable cost when reindexing. I don't think
read_commit() should strip off parents matching the shallow roots, but we'll
need some caching to guarantee stable outputs.
This introduces an async version of `get_root_tree()` which reads
trees that are part of a conflict concurrently. I haven't noticed a
need for that at Google or elsewhere but it could be significant.
I updated the only caller I could find that was already in an async
context.
For most callers, the special sorting of directories before paths for
directory->file transitions is not needed. This patch changes
`diff_stream()` to not do that, and instead adds a new method
specifically for that behavior. Only `local_working_copy` uses it.
This refactors `MergedTree::diff_stream()` so `TreeDiffIterator` and
`TreeDiffStreamImpl` no longer handle directory->file transitions
specially. That's instead handled by a stream adapter afterwards. To
allow the adapter to handle such transitions, the inner streams now
include tree entries if the other side of the diff is a non-tree. I
think this makes the code easier to follow. It also makes it easier to
add a version of `diff_stream()` that doesn't emit directories after
files. I think that's a more natural order for most use cases.
I timed this patch by running `jj diff --ignore-working-copy -s --from
v5.0 --to v6.0` in the Linux repo. I didn't see any significant
difference.
This makes `path_value()` and related functions more consistent with
how the merged tree is presented to the user in the working copy and
when showing a diff.
In this implementation, we assume that predecessor commits created by old jj are
reachable from at least one of the historical views. However, there are a couple
of commands which create transitive predecessors. For example, "jj squash" into
grandparent will rebase a rewritten source, so the pre-rebase source commit
won't be visible to any views. To work around the problem, all immediate
predecessors of historically visible commits are also preserved.
Note that this change should be considered forward-incompatible change. The
stored commits may have unreachable predecessors once we run "jj op abandon &&
jj util gc".
WalkPredecessors::flush_commits() doesn't need to guard against unreachable
commits. I was wondering whether values (or old ids) of op.commit_predecessors
map should be preserved, and I decided to keep both keys and values. It's nice
that we can get rid of index.has_id() calls when we drop support for legacy
commit.predecessors.
Since we have "reverse hex" functions, it's easy to implement the same set of
functions for "forward hex". I believe our implementation is slower than
highly-optimized versions provided by e.g. faster-hex, but we don't use hex
encoding/decoding where the performance matters.
Basically, these functions work in the same way as bookmarks()/tags(). They
restrict the namespace to search the specified symbol, and unmatched symbol
isn't an error. One major difference is that ambiguous prefix triggers an error.
That's because ambiguous prefix should logically select all matching entries,
whereas the underlying functions don't provide this behavior. It's also unclear
whether we would want to get all matching commits by commit_id(prefix:'').
#5632
These functions will be called from change_id/commit_id() predicates, which will
decode hex strings while parsing the revset expression.
AmbiguousCommit/ChangeIdPrefix errors now include lowercase prefix strings
instead of the user-specified texts, which I don't think would matter.
Since we have to implement our own hex -> u8 translation, it doesn't make much
sense to convert u8 back to hex char. Let's directly construct decoded bytes.
I'm going to reimplement to_forward_hex() as (&str) -> Vec<u8> function, and the
current HexPrefix::new() isn't compatible with odd-length decoded bytes.