Since we have "reverse hex" functions, it's easy to implement the same set of
functions for "forward hex". I believe our implementation is slower than
highly-optimized versions provided by e.g. faster-hex, but we don't use hex
encoding/decoding where the performance matters.
This adds the proptest crate for property-based testing as well as the
proptest-state-machine crate as direct dev dependencies of jj-cli and as
dependencies of the internal testutils crate.
Within testutils, a `proptest` module provides a reference state
machine which models the working copy as a map from path to `DirEntry`.
Directories are not represented explicitly, but are implicit in the
ancestors of entries.
The possible transitions of this state machine are for now limited to
the creation of new files (including replacements of existing files
or directories) and a `Commit` operation which the SUT can use to
snapshot a reference state. Additional transitions (moving files,
modifying file contents incrementally, ...) and states (symlinks,
submodules, conflicts, ...) may be added in the future.
This reference state machine is then applied to the builtin merge-tool's
test suite:
- The initial state is always an empty root directory.
- The `Commit` operation creates `MergedTree` from the current state.
- Each step of the way, the same test logic as in the manual
`test_edit_diff_builtin*` tests is run to check that splitting off
none or all of the changes results in the left or right tree,
respectively. The "right" tree corresponds to the current state,
whereas the "left" tree refers to the last "committed" tree.
Co-authored-by: Waleed Khan <me@waleedkhan.name>
The `Backend::read_file()` method is async but it returns a `Box<dyn
Read>` and reading from that trait is blocking. That's fine with the
local Git backend but it can be slow for remote backends. For example,
our backend at Google reads file chunks 1 MiB at a time from the
server. What that means is that reading lots of small files
concurrently works fine since the whole file contents are returned by
the first `Read::read()` call (it was fetched when
`Backend::read_file()` was issued). However, when reading files that
are larger than one chunk, we end up blocking on the next
`Read::read()` call. I haven't verified that this actually is a
problem at Google, but fixing this blocking is something we should do
eventually anyway.
This patch makes `Backend::read_file()` return a `Pin<Box<dyn
AsyncRead>>` instead, so implementations can be async in the read part
too.
Since `AsyncRead` is not yet standardized, we have to choose between
the one from `futures` and the one from `tokio`. I went with the one
from `tokio`. I picked that because an earlier version of this patch
used `tokio::fs` for some reads. Then I realized that doing that means
that we have to use a tokio runtime, meaning that we can't safely keep
our existing `pollster::FutureExt::block_on()` calls. If we start
depending on tokio's specific runtime, I think we would first want to
remove all the `block_on()` calls. I'll leave that for later. I think
at this point, we could equally well use `futures::io::AsyncRead`, but
I also don't know if there's a reason to prefer that.
The gix feature "blocking-network-client" was configured in cd6141693
(cli/tests: add gitoxide helpers, 2025-02-03) most likely because
we needed to clone using gix in tests, but 071e724c1 (cli/tests:
move test_git_colocated_fetch_deleted_or_moved_bookmark to gitoxide,
2025-02-25) changed the test to clone by subprocessing out to system git
(not by using gitoxide, as a cursory read of the commit description may
indicate), meaning that we no longer need that feature.
Therefore, remove that feature.
These helpers are going to be needed to port the git2 code in the lib
tests to gitoxide. Since the cli tests already depend on testutils, this
helps with avoiding duplicating the code
Some backends, like the one we have at Google, can restrict access to
certain files. For such files, if they return a regular
`BackendError::ReadObject`, then that will terminate iteration in many
cases (e.g. when diffing or listing files). This patch adds a new
error variant for them to return instead, plus handling of such errors
in diff output and in the working copy.
In order to test the feature, I added a new commit backend that
returns the new `ReadAccessDenied` error when the caller tries to read
certain objects.
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
The VS Code "Better TOML" plugin (which I think most of our VS Code developers use?) doesn't support the `x.y = z` syntax at the top level, even though it's valid TOML.
This is also useful if we ever want to add additional properties in different sub-crates (although unlikely for the near future).
Summary: There's no need to go around specifying `rust-version` or `edition` or
`version` several times, now that we have a global workspace. Instead, inherit
workspace metadata from the top-level Cargo.toml file.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Iaf905445978ed2b3377239dcdb8a6c32
Summary: This moves all dependencies across the jj-lib and jj-cli crates into
the top-level Cargo file; with that, we can change each crate instead to just
inherit the workspace version, with the toggled features enabled, by setting
a dependency such as:
dep.workspace = true
in the relevant Cargo.toml file.
This doesn't actually change any of the build semantics (from what I can tell)
nor the lockfile, and seems to respond normally. There are more cleanups that
can follow.
Two notes:
- Dependabot seems to work fine, based on what I've seen in other repos.
- `cargo add` doesn't seem to know how to add packages to a top-level
`workspace.dependencies` field; instead you can `cargo add -p jj-cli`
and move the entries, at least.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I307827e5f15c0d8ea8e2a80ec793d3c7
Almost everyone calls the project "jj", and there seeems to be
consensus that we should rename the crates. I originally wanted the
crates to be called `jj` and `jj-lib`, but `jj` was already
taken. `jj-cli` is probably at least as good for it anyway.
Once we've published a 0.8.0 under the new names, we'll release 0.7.1
versions under the old names with pointers to the new crates names.
It's been about 10 weeks and 730 commits since 0.6.0, compared to
about 7 weeks and 350 commits between 0.5.0 and 0.6.0, so it's time
for a new release. There's been significant user-visible changes and
code-quality improvements. Thanks, everyone!
The `testutils` module should ideally not be part of the library
dependencies. Since they're used by the integration tests (and the CLI
tests), we need to move them to a separate crate to achieve that.