Commit graph

4385 commits

Author SHA1 Message Date
Martin von Zweigbergk
94b99b2460 revset: add naive version of bisect() function
This adds a version of the `bisect()` revset that simply takes the
midpoint of the input set when iterated over. That's correct in linear
history and probably usually good enough in non-linear history too. We
can improve it later. I think it's valuable to have this building
block even in an imperfect state.
2025-07-27 13:31:26 +00:00
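A minimal sketch of the midpoint selection described in 94b99b2460, assuming the input set is already available as an ordered slice. The helper name `naive_bisect` and the slice-based signature are hypothetical and are not jj's actual revset API.

```rust
/// Hypothetical helper: pick the bisection candidate from an ordered list of
/// commits. `commits` is assumed to be in iteration order of the input set.
fn naive_bisect<T: Clone>(commits: &[T]) -> Option<T> {
    // The midpoint halves the search space in linear history. For an empty
    // slice, index 0 is out of range and `get` returns `None`.
    commits.get(commits.len() / 2).cloned()
}
```

In linear history this halves the remaining candidates on every step; in non-linear history the midpoint of the iteration order is only an approximation, as the commit message notes.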
phoebe
7d66d4f15e ssh-signing: allow inline key without ssh- prefix 2025-07-27 11:40:20 +00:00
adamnemecek
8a26df2897 cli lib: make use of Self consistent
Mostly done via `cargo clippy --fix -- -A clippy::all -W clippy::use_self`. Added a rule to clippy rules.
2025-07-27 00:12:02 +00:00
Scott Taylor
c151610099 revset_engine: skip evaluating reachable() predicate if unnecessary
Sometimes we have already found that a part of the graph is reachable,
so it's not necessary to continue evaluating the predicate on any
connected commits.

Benchmark results on Git repo compared to previous commit:

```
reachable(@, all())                 1.8% slower
reachable(@, v2.49.0..)             1.2% slower
reachable(author(peff), all())      93.4% faster
reachable(author(peff), v2.49.0..)  52.0% faster
```
2025-07-26 13:50:05 +00:00
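The idea in c151610099 can be sketched as follows, assuming the domain is given as an undirected adjacency list. The function name, the adjacency representation, and the `pred` callback are illustrative assumptions, not the revset engine's real types.

```rust
use std::collections::HashSet;

/// Returns every node connected (within `adj`) to some node matching `pred`.
/// `adj[n]` lists the neighbours of node `n` in both directions.
fn reachable(adj: &[Vec<usize>], mut pred: impl FnMut(usize) -> bool) -> HashSet<usize> {
    let mut found = HashSet::new();
    for start in 0..adj.len() {
        // Skip the (possibly expensive) predicate when this node is already
        // known to be reachable from an earlier match.
        if found.contains(&start) || !pred(start) {
            continue;
        }
        found.insert(start);
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            for &next in &adj[node] {
                if found.insert(next) {
                    stack.push(next);
                }
            }
        }
    }
    found
}
```

The saving depends on how many matching nodes share a connected component, which is consistent with the larger gains reported for `author(peff)` than for `@`.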
Scott Taylor
63d8372a2b revset_engine: evaluate X as predicate in reachable(X, Y)
Benchmark results on Git repo:

```
reachable(@, all())                 no significant change
reachable(@, v2.49.0..)             no significant change
reachable(author(peff), all())      2.0% faster
reachable(author(peff), v2.49.0..)  94.3% faster
```
2025-07-26 13:50:05 +00:00
Yuya Nishihara
1853c1e42e index: use bit set in all_heads_pos()
The performance doesn't matter here, but it seems better to reduce memory usage.
2025-07-26 02:24:47 +00:00
Yuya Nishihara
ea8e121149 index: use bit set to omit visited ancestors in is_ancestor_pos()
This seems good for both lucky and unlucky cases. If the ancestor is reachable,
DFS tends to finish in fewer steps. If the ancestor is unreachable, we'll
have to visit all candidates, so the visited set can be large.

jj bench is-ancestor --ignore-working-copy -R ~/mirrors/linux
```
group                       old                     new
-----                       ---                     ---
is-ancestor-v6.0-v6.10      2.45     93.0±0.46µs    1.00     37.9±0.19µs
is-ancestor-v6.0.1-v6.10    2.90     12.7±0.09ms    1.00      4.4±0.13ms
```
2025-07-26 02:24:47 +00:00
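A rough sketch of the technique in ea8e121149: a depth-first ancestor test that uses a bit set to avoid revisiting positions. The `BitSet` type, the `parents` table, and the assumption that parents always have smaller positions than their children are illustrative and do not reflect jj's actual index layout.

```rust
/// Minimal fixed-size bit set (illustrative; not jj's actual type).
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    fn new(len: usize) -> Self {
        Self { words: vec![0; len.div_ceil(64)] }
    }

    /// Sets bit `i` and returns true if it was previously unset.
    fn insert(&mut self, i: usize) -> bool {
        let (word, mask) = (i / 64, 1u64 << (i % 64));
        let fresh = (self.words[word] & mask) == 0;
        self.words[word] |= mask;
        fresh
    }
}

/// `parents[pos]` lists the parent positions of `pos`, each assumed to be
/// smaller than `pos`.
fn is_ancestor(parents: &[Vec<usize>], ancestor: usize, descendant: usize) -> bool {
    let mut visited = BitSet::new(parents.len());
    let mut stack = vec![descendant];
    while let Some(pos) = stack.pop() {
        if pos == ancestor {
            return true;
        }
        // Positions below `ancestor` cannot reach it; visited ones are skipped.
        if pos < ancestor || !visited.insert(pos) {
            continue;
        }
        stack.extend(parents[pos].iter().copied());
    }
    false
}
```

Without the visited set, a merge-heavy history can push the same position onto the stack many times, which is the unlucky case the commit message refers to.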
Yuya Nishihara
21cc09b401 index: remove unneeded entry lookup from is_ancestor_pos() 2025-07-26 02:24:47 +00:00
Martin von Zweigbergk
47868ddb2f rewrite: do reads concurrently in rebase_with_empty_behavior()
This updates `rebase_with_empty_behavior()` to read old and new parent
commits concurrently, and to read their trees (possibly including
merging parents) concurrently. It seems like it might make a
difference if these objects are not already cached somewhere.
2025-07-25 17:58:18 +00:00
Martin von Zweigbergk
c97a3df328 rewrite: make rebase_with_empty_behavior() async 2025-07-25 17:58:18 +00:00
Martin von Zweigbergk
e6e284dc01 commit: add function for reading all parents concurrently
This is just general work towards making code async. It's not
something I have a reason to believe makes a significant difference.
2025-07-25 17:58:18 +00:00
Martin von Zweigbergk
48ac6178ab rewrite: make merge_commit_trees() async
Just another step towards making our code base async.  This
unfortunately results in more `block_on()` calls than we had before.
2025-07-25 17:58:18 +00:00
Martin von Zweigbergk
8022a49d9b merged_tree: make resolve() and merge() async 2025-07-25 17:58:18 +00:00
Josh Steadmon
f584e4da68 merged_tree: fix clippy "needless_lifetimes" lint 2025-07-25 15:40:27 +00:00
Yuya Nishihara
3269c63bdf str_util: add is_match_bytes(), use it in diff_contains() revset
We don't have an intuitive way to search for non-UTF-8 strings with
diff_contains(), but this allows the user to search for UTF-8 patterns inside
files of arbitrary encoding (such as text logs.)

We could instead make .is_match() accept AsRef<[u8]>, but I think an explicit
bytes method is better. haystack is usually a string.
2025-07-25 10:44:34 +00:00
Yuya Nishihara
a8facad375 str_util: use bytes::Regex internally
We don't have a strong reason to disable unicode-incompatible patterns like
"(?-u)." This change will help fix bytes handling in diff_contains() revset.

According to the doc, the performance is on par with the unicode Regex.

https://docs.rs/regex/latest/regex/bytes/index.html#performance
2025-07-25 10:44:34 +00:00
Yuya Nishihara
ed931996d0 fileset: switch to globset
Unlike string patterns, backslash-escape syntax isn't forcibly enabled. Fileset
globs are constructed from platform-native path inputs.
2025-07-25 08:22:35 +00:00
Yuya Nishihara
182daa1dfa str_util: switch to globset
Since we already have globset in transitive dependencies, this change helps
reduce the amount of dependencies. Another reason is that globset provides a
function to convert glob to regex. This is nice because we use globs to match
against strings or internal repository paths instead of platform-native paths.

The GlobPattern wrapper is boxed because globset::Glob type is relatively big.
2025-07-25 08:22:35 +00:00
Yuya Nishihara
0c0582840f index: do not pre-allocate whole bit-set buffer
In mid-size repositories, it can be costly to allocate a zeroed buffer for the
entire history, so let's make the buffer grow by 4kB. I also tried something
like Vec<Vec<u64>>, but it wasn't as fast as the current version, probably
because of the added indirection. Vec<[u64; PAGE_SIZE]> was good, but wasn't so
different from Vec<u64>.

- jj-0: HashSet
- jj-1: BitSet, pre-allocated
- jj-2: BitSet, lazy (this patch)

mid-size repo:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
  'target/release-with-debug/{bin} -R ~/mirrors/linux --ignore-working-copy \
  log -r"tags(v6)|tags(v5)"'
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r"tags(v6)|tags(v5)"
  Time (mean ± σ):      3.359 s ±  0.086 s    [User: 2.891 s, System: 0.467 s]
  Range (min … max):    3.232 s …  3.460 s    10 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r"tags(v6)|tags(v5)"
  Time (mean ± σ):      4.097 s ±  0.075 s    [User: 3.609 s, System: 0.487 s]
  Range (min … max):    4.027 s …  4.288 s    10 runs

Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux --ignore-working-copy log -r"tags(v6)|tags(v5)"
  Time (mean ± σ):      1.678 s ±  0.023 s    [User: 1.224 s, System: 0.452 s]
  Range (min … max):    1.656 s …  1.739 s    10 runs

Relative speed comparison
        2.00 ±  0.06  target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r"tags(v6)|tags(v5)"
        2.44 ±  0.06  target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r"tags(v6)|tags(v5)"
        1.00          target/release-with-debug/jj-2 -R ~/mirrors/linux --ignore-working-copy log -r"tags(v6)|tags(v5)"
```

small repo:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
  'target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy \
  log -r"tags()"'
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
  Time (mean ± σ):      2.606 s ±  0.069 s    [User: 2.390 s, System: 0.216 s]
  Range (min … max):    2.480 s …  2.669 s    10 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
  Time (mean ± σ):      1.304 s ±  0.022 s    [User: 1.100 s, System: 0.203 s]
  Range (min … max):    1.274 s …  1.327 s    10 runs

Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
  Time (mean ± σ):      1.452 s ±  0.102 s    [User: 1.259 s, System: 0.193 s]
  Range (min … max):    1.327 s …  1.575 s    10 runs

Relative speed comparison
        2.00 ±  0.06  target/release-with-debug/jj-0 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
        1.00          target/release-with-debug/jj-1 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
        1.11 ±  0.08  target/release-with-debug/jj-2 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
```
2025-07-25 00:54:05 +00:00
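The lazy growth described in 0c0582840f, combined with the reverse position order from bbaaad0d1f, might look roughly like this. `LazyBitSet`, `CHUNK_WORDS`, and the method names are hypothetical; the sketch only assumes that traversal starts near `max_pos` and moves toward lower positions.

```rust
/// 4 kB worth of u64 words per growth step (assumed chunk size).
const CHUNK_WORDS: usize = 4096 / 8;

/// Bit set indexed downward from `max_pos`, so the buffer only needs to cover
/// the range of positions actually touched.
struct LazyBitSet {
    max_pos: usize,
    words: Vec<u64>,
}

impl LazyBitSet {
    fn new(max_pos: usize) -> Self {
        Self { max_pos, words: Vec::new() }
    }

    /// Maps a position (assumed `<= max_pos`) to a word index and bit mask.
    fn index(&self, pos: usize) -> (usize, u64) {
        let bit = self.max_pos - pos; // reverse order: max_pos maps to bit 0
        (bit / 64, 1u64 << (bit % 64))
    }

    fn set(&mut self, pos: usize) {
        let (word, mask) = self.index(pos);
        if word >= self.words.len() {
            // Grow to the next multiple of the chunk size that covers `word`.
            let new_len = (word / CHUNK_WORDS + 1) * CHUNK_WORDS;
            self.words.resize(new_len, 0);
        }
        self.words[word] |= mask;
    }

    fn get(&self, pos: usize) -> bool {
        let (word, mask) = self.index(pos);
        // Words that were never allocated are treated as all zeros.
        self.words.get(word).is_some_and(|w| (w & mask) != 0)
    }
}
```

Storing bits in reverse order means that a walk from recent commits toward older ones extends the buffer at its end, so a plain `Vec::resize` is enough and nothing is allocated for history that is never visited.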
Yuya Nishihara
bbaaad0d1f index: store bit set data in reverse position order
This will help grow the buffer on demand.
2025-07-25 00:54:05 +00:00
Yuya Nishihara
76016ae4e3 index: reuse input edges Vec in remove_transitive_edges()
Extra allocation cost wouldn't matter, but the new code looks simpler.
2025-07-25 00:39:56 +00:00
Yuya Nishihara
512ff14ae3 index: use bit set to deduplicate/remove transitive edges
With the experimental changed-path index, I noticed "jj log PATH" spends a fair
amount of time testing known/unwanted ancestor edges. Allocating BitSet
without a known lower bound can be wasteful, but it's still faster than using
HashSet or BTreeSet. I think we can split BitSet data into e.g. 4kB chunks to
mitigate the initial allocation cost.

```
% hyperfine --sort command --warmup 3 --runs 5 -L bin jj-0,jj-1 \
  'target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy log -r"tags()"'
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
  Time (mean ± σ):      2.709 s ±  0.035 s    [User: 2.494 s, System: 0.215 s]
  Range (min … max):    2.670 s …  2.747 s    5 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
  Time (mean ± σ):      1.322 s ±  0.023 s    [User: 1.121 s, System: 0.199 s]
  Range (min … max):    1.308 s …  1.363 s    5 runs

Relative speed comparison
        2.05 ±  0.05  target/release-with-debug/jj-0 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
        1.00          target/release-with-debug/jj-1 -R ~/mirrors/git --ignore-working-copy log -r"tags()"
```
2025-07-25 00:39:56 +00:00
Yuya Nishihara
00303356c8 index: extract basic bit set interface from AncestorsBitSet 2025-07-25 00:39:56 +00:00
Yuya Nishihara
8b35249d7d index: add helper to map between GlobalCommitPosition and bit set index
Since I'm going to add self.min/max_pos parameter, they aren't free functions.
2025-07-25 00:39:56 +00:00
Yuya Nishihara
3760fa58d3 index: extract AncestorsBitSet to new module
I'll add more primitive BitSet type there.
2025-07-25 00:39:56 +00:00
Martin von Zweigbergk
191369f06f merged_tree: make merge() take all args by value
This avoids unnecessary cloning. Most of the callers don't need copies
of the trees anyway.
2025-07-24 21:28:40 +00:00
Martin von Zweigbergk
f7e4264861 merged_tree: make resolve() take self by value
This avoids cloning any trees when the conflict is trivial. I think it
will also help when making `merge_trees()` concurrent and
non-recursive.
2025-07-24 21:28:40 +00:00
Jade Lovelace
1acd2a04f8 str_util: rename "matches" to "is_match"
This makes it clear that it doesn't give a list of matches.
2025-07-24 04:57:58 +00:00
Daniel Luz
f135aa3dec annotate: add iterator that tracks line numbers
This is useful for tools that wish to show the exact place where a
specific line was introduced, like gitk's "Show origin of this line".

Fixes #6809.
2025-07-22 20:29:05 +00:00
Daniel Luz
4b73df378b annotate: settle on calling the starting commit/file as "starting"
"original" is a term often associated with the commit that introduced a
certain line (hence its origin). To avoid any confusion, any variable
that relates to the starting point of the annotation process is changed
to consistently use "starting" instead.
2025-07-22 20:29:05 +00:00
Daniel Luz
f938e6b8f0 annotate: point to commit outside of domain when origin cannot be tracked
As a consequence, in case of error, rather than pointing at the last
commit where this line was seen, it points at the first commit where
the annotation process should continue if the domain were expanded.

This is better aligned with what `git blame` returns.
2025-07-22 20:29:05 +00:00
Kaiyi Li
9d817b6491 config: move fsmonitor settings out of core 2025-07-21 03:38:34 +00:00
Yuya Nishihara
e725c0e8e2 index: pass index to DefaultMutableIndex::incremental() instead of segment
This seems better since DefaultMutableIndex isn't an IndexSegment type.
2025-07-21 03:00:23 +00:00
Yuya Nishihara
9f951f6239 index: construct DefaultReadonlyIndex by load function
For the same reason as the previous patch. DefaultReadonlyIndex will hold both
commits and changed-paths index segments.
2025-07-21 03:00:23 +00:00
Yuya Nishihara
b4a9477271 index: make .save_mutable_index() return DefaultReadonlyIndex
I'll add changed-paths index segments, and it would be tedious to pass (commits,
changed_paths) segments pairs around. Since this function receives
DefaultMutableIndex, it makes sense that the return value is an Index, not an
IndexSegment.
2025-07-21 03:00:23 +00:00
Yuya Nishihara
561b9950e7 index: keep commit segment file name in hex-decoded form
I'm going to change the operation link/association file to store structured data
so that we can add a separate changed-paths index file. I think it makes sense
to use raw bytes there. It's also nice that the segment file id is typed.
2025-07-21 03:00:23 +00:00
Yuya Nishihara
e8bce6f14b index: rename remaining "commit" index types as prep for changed-paths segments
- IndexPosition is renamed to GlobalCommitPosition because we have
  LocalCommitPosition, and GlobalCommitPosition is no longer a public type.
- Private types such as ParentIndexPosition aren't renamed. I'll probably split
  commit index modules instead.
- ReadonlyIndexLoadError isn't renamed because I'm not sure if we'll add a new
  error type dedicated for new changed-paths index.
- IndexLevelStats is renamed, but IndexStats isn't because I'll probably add
  stats of changed-paths index.
2025-07-21 02:45:58 +00:00
Yuya Nishihara
2c96aaaba1 index: rename CompositeIndex to CompositeCommitIndex, pass around wrapper type
Since revset engine will use changed-paths index to evaluate files() predicate,
we need to pass &CompositeIndex wrapper around instead of &CompositeCommitIndex.

Index is now implemented for non-reference CompositeIndex type. This works
because the CompositeIndex type is now Sized, so the reference type can be
converted to a trait object.
2025-07-21 02:45:58 +00:00
Yuya Nishihara
ccd4e8cd07 index: introduce owned composite index type
This is ugly, but avoids CompositeIndex<'a> lifetime. If CompositeIndex had
a reference, trait bounds in the revset engine would become quite messy.

FWIW, I think a composite index type could be a pair of (Vec<Readonly>,
Option<Mutable>). I'm not going to reorganize the commit index at the moment,
but I assume the changed-paths index will be structured in that way.
Changed-paths index won't have data dependency between segments, and it will
have a commit offset field which only applies to the first segment. So it will
probably make sense to manage segments as an array, not as a linked list.

    commit offset
    segment #0:
      for each commit
        changed paths (interned)
      sstable of paths
    segment #1:
      for each commit
        changed paths (interned)
      sstable of paths
    ...
2025-07-21 02:45:58 +00:00
Yuya Nishihara
78c5d41b46 index: make both Arc<ReadonlyIndexSegment> and MutableIndexSegment implement Clone/Debug
This will help wrap Arc<ReadonlyIndexSegment> and MutableIndexSegment in an
enum. We won't clone MutableIndexSegment, but the Clone impl should be harmless.
2025-07-21 02:45:58 +00:00
Yuya Nishihara
ea7488c2de index: move Index impls of CompositeIndex to non-trait methods
I'll insert a wrapper type that holds the "commit" index and a new changed-paths
index. The Index trait will be implemented on that wrapper.

It's unclear whether the non-trait methods should be pub or pub(super), but that
shouldn't matter since the CompositeIndex type isn't public. I just copied
visibility of similar methods.
2025-07-21 02:45:58 +00:00
Martin von Zweigbergk
f27c55ab15 cleanup: switch to async closures in trivial cases
This replaces `|| async` with `async ||` since the latter is presumably
more idiomatic.

There may be other places where we can now use async closures but I
don't remember where they are and I don't know of a good way to
identify them.
2025-07-20 23:52:04 +00:00
Martin von Zweigbergk
721daef0b4 store: inline tree_builder() function to callers
`Store::tree_builder()` returns a `TreeBuilder`. Almost all callers
should be using the `MergedTreeBuilder` these days. This patch
therefore removes `tree_builder()` to reduce the risk of accidentally
using it.
2025-07-18 21:36:13 +00:00
Kaiyi Li
f1f1556731 local working copy: add support for EOL conversion 2025-07-17 15:36:28 +00:00
Kaiyi Li
74fb5a6096 working copy: pass UserSettings to WorkingCopyFactory
... so that later `TreeState` can query the EOL settings on
construction.
2025-07-17 15:36:28 +00:00
Martin von Zweigbergk
e982db8fd0 test_merged_tree: clear store caches before calling diff_stream()
This catches the bug introduced in 1b1edc7a90 (fixed in patch just
before this one).
2025-07-14 16:09:41 +00:00
Martin von Zweigbergk
25723d6956 merged_tree: finish polling tree before emitting path
In 1b1edc7a90, I missed the importance of this comment:

```
// Whenever we add an entry to `self.pending_trees`, we also add an Ok() entry
// to `self.items`.
```

The `self.items` entry was there to make sure that we wait for the
pending tree to be polled to completion, thus resulting in its entries
getting added to `self.items`. After my commit, we no longer always
add an entry to `items`, which meant that we can end up emitting
entries from a parent tree before entries in a child tree, such as
`foo/baz` before `foo/bar/qux` even though `baz` comes after `bar`.

This patch fixes the bug by instead checking in `self.pending_trees`
that there are no directories that we need to emit first. Thanks to
@yuja for the suggestion to do it this way instead.

The next patch will update the tests to catch regressions.
2025-07-14 16:09:41 +00:00
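The ordering check described in 25723d6956 can be sketched like this, assuming paths compare component by component and that both ready entries and pending directories are kept in ordered sets. The `Path` alias and the `path` and `next_emittable` helpers are hypothetical and much simpler than the real `TreeDiffStream`.

```rust
use std::collections::BTreeSet;

/// A path as a list of components, so ordering is component-wise.
type Path = Vec<&'static str>;

fn path(s: &'static str) -> Path {
    s.split('/').collect()
}

/// Returns the next entry that is safe to emit, or `None` if there is nothing
/// to emit yet or a pending directory that sorts earlier must be polled first.
fn next_emittable(items: &BTreeSet<Path>, pending_trees: &BTreeSet<Path>) -> Option<Path> {
    let first_item = items.first()?;
    match pending_trees.first() {
        // Entries of this directory would sort before `first_item`, so wait.
        Some(dir) if dir < first_item => None,
        _ => Some(first_item.clone()),
    }
}
```

With `foo/bar` still pending, `foo/baz` is held back; once the directory has been polled and `foo/bar/qux` has been added to the ready set, that entry is emitted first.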
Martin von Zweigbergk
5d35eadd4e tests: make TestBackend async for more realistic testing
The `TestBackend` methods currently return their data immediately (on
the first poll), which means that if multiple futures are created and
then they're polled "concurrently", they will always return their data
in the order they're being polled. That leads to poor testing of
algorithms that poll futures concurrently, such as `TreeDiffStream`.

This patch makes `TestBackend` spawn async work to run in a tokio
runtime instead. That's enough to show a bug I introduced with my
recent refactoring of `TreeDiffStream`, except that it's also covered
up by the caching we do in `Store`. I'll fix the bug and update tests
to work around the caching next.

This slows down the jj-lib tests from 2.8 s to 3.1 s. I don't think
that matters much, given that the jj-cli tests take > 30 s.

I tried to add a small `tokio::time::sleep()` (random up to 5 ms) but
that slowed down the property-based tests of the diff editor very
significantly (took over a minute). Maybe we could have two different
kinds of test backend or maybe make the sleep configurable in some
way. We can improve that later. The async-ness added in this patch is
sufficient for catching the diff-stream bug.
2025-07-14 16:09:41 +00:00
Martin von Zweigbergk
1a89ac8d53 merged_tree: poll trees in the order we're going to emit them
It should generally be better to prioritize polling trees in the
order we're going to emit their entries. For example, if we have
pending trees `zzz/` and `dir/aaa/`, it's better to poll the latter
even though we inserted the former first.

This also prepares for fixing a bug related to the order we emit. We
will then want to look up in `pending_trees` by key found in `items`.
2025-07-14 16:09:41 +00:00
Martin von Zweigbergk
ed6fa71835 merged_tree: avoid destructuring only to construct the same values 2025-07-14 16:09:41 +00:00