`test_init_load_non_utf8_path` and
`test_init_additional_workspace_non_utf8_path` now early-return on
strict UTF-8 filesystems because there's no way to report a test as
"skipped" at runtime.
Closes https://github.com/jj-vcs/jj/issues/8118
This adds support for tracking ignored and oversized files with `jj file track`.
Previously, `jj file track` would silently fail to track files that were ignored by
`.gitignore` or larger than `snapshot.max-new-file-size`. This commit introduces an
`--include-ignored` flag that allows users to explicitly track these files.
## Implementation
Added a `force_tracking_matcher` field to `SnapshotOptions` that overrides ignore rules
and size limits. When `--include-ignored` is specified, the file pattern matcher is
passed as `force_tracking_matcher`, allowing three checks in `FileSnapshotter` to bypass
their usual restrictions for directory ignores, file ignores, and file size limits.
## Tests
- `test_track_ignored_with_flag`: Verifies `.gitignore`d files can be tracked
- `test_track_large_file_with_flag`: Verifies oversized files can be tracked
- `test_track_ignored_directory`: Verifies ignored directories can be tracked recursively
# Checklist
If applicable:
- [ ] I have updated `CHANGELOG.md`
- [x] I have updated the documentation (`README.md`, `docs/`, `demos/`)
- [ ] I have updated the config schema (`cli/src/config-schema.json`)
- [x] I have added/updated tests to cover my changes
After the previous commit, `MergedTree` and `MergedTreeId` are almost
identical, with the only difference being that `MergedTree` is attached
to a `Store` instance. `MergedTreeId` is also equivalent to
`Merge<TreeId>`, since it is just a wrapper around it.
In the future, `MergedTree` might contain additional metadata like
conflict labels. Therefore, I replaced `MergedTreeId` with `MergedTree`
wherever I think it would be required to pass this additional metadata,
or where the additional methods provided by `MergedTree` would be
useful. In any remaining places, I replaced it with `Merge<TreeId>`.
I also renamed some of the `tree_id()` methods to `tree_ids()` for
consistency, since now they return a merge of individual tree IDs
instead of a single "merged tree ID". Similarly, `MergedTree` no longer
has an `id()` method, since tree IDs won't fully identify a `MergedTree`
once it contains additional metadata.
Currently, creating a `MergedTree` requires reading all of its root
trees from the store. However, this is often not actually required. For
instance, if the only reason to read the trees is to call
`MergedTree::merge`, and the merge is trivial, then there was no need to
read the trees. Changing `MergedTree` to only require a `Merge<TreeId>`
instead of a `Merge<Tree>` will make it possible to avoid reading trees
unnecessarily in these cases.
One benefit of this approach is that `Commit::tree` no longer requires
reading from the store, so it can be made synchronous and infallible,
which simplifies a lot of code.
A sibling team of my team sometimes runs into panics caused by cycles
in the commit graph. This patch removes the panics from `dag_walk` by
having all the callers pass in a `cycle_fn`. For now, the callers
panic instead.
I'm planning to try to add conflict labels to `MergedTree` and
`MergedTreeId`, and it will be easier to add them if both are structs
with similar methods. Since we don't support reading/writing legacy
conflicts anymore (as far as I'm aware), I think it should be safe to
delete the `MergedTreeId::Legacy` variant now.
`set_env` is, for various reasons, fundamentally unsafe on approximately
~all modern unicies, and seems like it will never ever be fixed. The
long and short of this is that it will result in segfaults or UB. Rust
2024 therefore marks this function (correctly) as `unsafe`.
The correct solution for 98% of use cases is just to use `envp` during
calls to `execv`, but for our simple cases here of making Git hermetic
there shouldn't be an issue, and a larger refactoring would be needed
for an alternative anyway.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
`Store::tree_builder()` returns a `TreeBuilder`. Almost all callers
should be using the `MergedTreeBuilder` these days. This patch
therefore removes `tree_builder()` to reduce the risk of accidentally
using it.
The `TestBackend` methods currently return their data immediately (on
the first poll), which means that if multiple futures are created and
then they're polled "concurrently", they will always return their data
in the order they're being polled. That leads to poor testing of
algortihms that poll futures concurrently, such as `TreeDiffStream`.
This patch makes `TestBackend` spawn async work to run in a tokio
runtime instead. That's enough to show a bug I introduced with my
recent refactoring of `TreeDiffStream`, except that it's also covered
up by the caching we do in `Store`. I'll fix the bug and update tests
to work around the caching next.
This slows down the jj-lib tests from 2.8 s to 3.1 s. I don't think
that matter much, given that the jj-cli tests takes > 30 s.
I tried to add a small `tokio::time::sleep()` (random up to 5 ms) but
that slowed down the property-based tests of the diff editor very
significantly (took over a minute). Maybe we could have two different
kinds of test backend or maybe make the sleep configurable in some
way. We can improve that later. The async-ness added in this patch is
sufficient for catching the diff-stream bug.
Since we have "reverse hex" functions, it's easy to implement the same set of
functions for "forward hex". I believe our implementation is slower than
highly-optimized versions provided by e.g. faster-hex, but we don't use hex
encoding/decoding where the performance matters.
This turns off the `clippy::cloned_ref_to_slice_refs` lint in some tests
and fixes it in others, for Rust 1.89+. This seems to make `cargo clippy
--workspace --all-targets --all-features` work in stable, beta, and
nightly (1.89).
This depends on the `rustversion` crate. Other than that, it's based on
Austin's https://github.com/jj-vcs/jj/pull/6705.
Co-authored-by: Austin Seipp <aseipp@pobox.com>
This adds the proptest crate for property-based testing as well as the
proptest-state-machine crate as direct dev dependencies of jj-cli and as
dependencies of the internal testutils crate.
Within testutils, a `proptest` module provides a reference state
machine which models the working copy as a map from path to `DirEntry`.
Directories are not represented explicitly, but are implicit in the
ancestors of entries.
The possible transitions of this state machine are for now limited to
the creation of new files (including replacements of existing files
or directories) and a `Commit` operation which the SUT can use to
snapshot a reference state. Additional transitions (moving files,
modifying file contents incrementally, ...) and states (symlinks,
submodules, conflicts, ...) may be added in the future.
This reference state machine is then applied to the builtin merge-tool's
test suite:
- The initial state is always an empty root directory.
- The `Commit` operation creates `MergedTree` from the current state.
- Each step of the way, the same test logic as in the manual
`test_edit_diff_builtin*` tests is run to check that splitting off
none or all of the changes results in the left or right tree,
respectively. The "right" tree corresponds to the current state,
whereas the "left" tree refers to the last "committed" tree.
Co-authored-by: Waleed Khan <me@waleedkhan.name>
This macro in the style of `assert_eq!` compares two trees based
on their `MergedTreeId`s. In case they do not compare equal, the
corresponding trees are dumped in the panic message.
Like `assert_eq!`, the macro accepts a custom format string which will
be included in the panic message.
Update `gix` to 0.72.1, and adapt to its breaking changes.
1. The signature of `gix::reference::iter::Platform::prefixed` changed
in a way that seems to confuse Rust compiler (and does confuse me).
2. `git_object::Tree::EntryMode` API changed; `entry.mode()` now has a
`value()` method.
3. Most significantly, the meaning and API of `gix::actor::SignatureRef`
changed.
## Details about `gix::actor::SignatureRef`
The API for `gix::actor::Signature` and `gix::actor::SignatureRef`
changedd. The latter now contains an unparsed string time field, while
the former still contains a parsed time. So, the conversions between
`gix::actor::SignatureRef` and either `gix::actor::Signature` or jj's
`Signature` types can now fail.
We use the epoch for the time if the timestamp is unreadable, like gix
did before.
Cc: https://github.com/GitoxideLabs/gitoxide/pull/1935,
https://github.com/GitoxideLabs/gitoxide/pull/2038
This patch adds a `TreeValue::File::copy_id` field. The copy ids are
always empty for now. I preserved the copy id where it was easy to do
so, plus in a few non-trivial cases. In other places, however, I made
the code use a new copy id. I added a `CopyId::placeholder()`
function for creating a new copy id where we need one. We should
eventually fix all callers to either preserve an existing copy or to
generate a new one.
This patch implements the methods only in the test backend. That
should enough for us to start implementing diffing and merging on top,
including tests for that functionality. Support in the Git backend can
come once we've seen that the model works.
The original form of `create_tree()` is limited to creating (valid
UTF-8) text files but cannot create binary files, executable
files, or symlinks. Dedicated helpers like `write_executable_file()` or
`write_symlink()` partially compensated for this, but required manually
assembling the tree in the test code.
This commit introduces `TestTreeBuilder` which provides an API to
successively add entries to a tree which can represent all of the above.
`TestTreeBuilder` can then create either a single `Tree`, or a resolved
`MergedTree`.
In addition to using `TestTreeBuilder` directly, `create_tree_with()`
and `create_single_tree_with()` accept a closure which receives a
`TestTreeBuilder`. This allows test code to quickly describe the tree
without requiring the a named builder at caller scope. Riffing off
the familiar function names should help in discovering the new builder
facilities. However, it is completely possible to use `TestTreeBuilder`
directly, if preferred.
This is mostly for consistency with `read_file()` at this point. I'm
not sure if we need this for Google in the near future.
For now, I wrapped the file-reading in `local_working_copy.rs` in a
new `BlockingAsyncReader`. We should switch to using async I/O in the
future.
The `Backend::read_file()` method is async but it returns a `Box<dyn
Read>` and reading from that trait is blocking. That's fine with the
local Git backend but it can be slow for remote backends. For example,
our backend at Google reads file chunks 1 MiB at a time from the
server. What that means is that reading lots of small files
concurrently works fine since the whole file contents are returned by
the first `Read::read()` call (it was fetched when
`Backend::read_file()` was issued). However, when reading files that
are larger than one chunk, we end up blocking on the next
`Read::read()` call. I haven't verified that this actually is a
problem at Google, but fixing this blocking is something we should do
eventually anyway.
This patch makes `Backend::read_file()` return a `Pin<Box<dyn
AsyncRead>>` instead, so implementations can be async in the read part
too.
Since `AsyncRead` is not yet standardized, we have to choose between
the one from `futures` and the one from `tokio`. I went with the one
from `tokio`. I picked that because an earlier version of this patch
used `tokio::fs` for some reads. Then I realized that doing that means
that we have to use a tokio runtime, meaning that we can't safely keep
our existing `pollster::FutureExt::block_on()` calls. If we start
depending on tokio's specific runtime, I think we would first want to
remove all the `block_on()` calls. I'll leave that for later. I think
at this point, we could equally well use `futures::io::AsyncRead`, but
I also don't know if there's a reason to prefer that.
A pattern has emerged where a integration tests check for the
availability of an external tool (`git`, `taplo`, `gpg`, ...) and skip
the test (by simply passing it) when it is not available. To check this,
the program is run with the `--version` flag.
Some tests require that the program be available at least when running
in CI, by calling `ensure_running_outside_ci` conditionally on the
outcome. The decision is up to each test, though, the utility merely
returns a `bool`.
The previous implementation of `assert_no_forgotten_test_files`
hard-coded the name of the `runner` integration test and required all
other source files to appear in matching `mod` declarations. Thus, this
approach cannot handle multiple integration tests.
However, additional integration tests may be desirable
- to support tests using a custom test harness (see upcoming commits)
- to balance the trade-off between test run time and compile time as
the test suite grows in the future.
The new implementation first uses `taplo` to parse the `[[test]]`
sections of the manifest to identify integration test main modules,
and then searches in those for `mod` declarations. This is then compared
to the list of source files in the tests directory. Like the previous
implementation, the new one does not attempt to recurse into submodules
or to handle directory-style modules; just like before it only treats
source files without a module declaration as an error and relies on the
compiler to complain about the other way around.
When `taplo` is not installed, the check is skipped unless it is running
in CI where we require `taplo` to be available.
We ran into a crash on our server at Google today because we
accidentally called `RepoPathBuf::from_internal_string()` with a
string starting with a '/', which resulted in a the assertion in that
function failing. This patch changes that constructor and its siblings
to return a `Result` instead.
Fix GHSA-794x-2rpg-rfgr.
`gix::Repository::work_dir` was renamed to `workdir` (though strangely
not the `gix::ThreadSafeRepository` version), and `lossy_config`
is now off by default in all configurations.
Previously, this was calling `git_repository_open()`,
which is equivalent to `git_repository_open_ext()` with
`flags = GIT_REPOSITORY_OPEN_NO_SEARCH` and `ceiling_dirs =
NULL`. This changes `ceiling_dirs` to an empty string, and adds
`GIT_REPOSITORY_OPEN_FROM_ENV` to `flags` when we’re in test code.
`GIT_REPOSITORY_OPEN_FROM_ENV` is used to respect the Git configuration
path environment variables, which is what we want for the test
hermeticity code. It works like this:
* `config_path_system` will use `$GIT_CONFIG_SYSTEM` because `use_env`
will be set.
* `config_path_global` will use `$GIT_CONFIG_GLOBAL` because `use_env`
will be set.
* `git_config__find_xdg` and `git_config__find_programdata` will find
impure system paths and load them even when `$GIT_CONFIG_GLOBAL` is
set, contrary to Git behaviour, so we need to set `$XDG_CONFIG_HOME`
and `$PROGAMDATA`.
It has a few other effects, which I will exhaustively enumerate to
show that they are benign:
* It respects `$GIT_WORK_TREE` and `$GIT_COMMON_DIR`. These would
already break our tests, I think, so we’re assuming they’re
not set. (Possibly we should set them explicitly.)
* When opening a repository, it will:
* Set the starting path for the search to `$GIT_DIR` if it’s
`NULL`, but we do set it, so no change.
* Initialize `ceiling_dirs` to `$GIT_CEILING_DIRECTORIES` if it’s
`NULL`, but we do set it, so no change.
* Respect `$GIT_DISCOVERY_ACROSS_FILESYSTEM` and set the
`GIT_REPOSITORY_OPEN_CROSS_FS` flag appropriately. However,
this is only checked on subsequent iterations of the loop in
`find_repo_traverse`, and we set `GIT_REPOSITORY_OPEN_NO_SEARCH`
which causes it to never enter a second iteration.
* Use `ceiling_dirs` in `find_ceiling_dir_offset`, but the result is
ignored when `GIT_REPOSITORY_OPEN_NO_SEARCH` is set, so changing
from `NULL` to the empty string doesn’t affect behaviour. (It
also would always return the same result for either value, anyway.)