4385 commits

Author SHA1 Message Date
Yuya Nishihara
adef815d1d tests: try both DOS and hashed NT short file names
For some unknown reason, a hashed 8.3 file name is chosen for ".jj" on GitHub
CI. A hashed ".git" short name is also added for consistency.
2024-11-07 13:38:04 +09:00
Yuya Nishihara
dedab69eaa local_working_copy: lstat() path to test file existence if creation failed
It appears that file creation fails for other unknown reasons on Windows CI.
2024-11-07 13:38:04 +09:00
Martin von Zweigbergk
c697ee7d80 tests: work around codespell suggesting dows->does 2024-11-07 13:38:04 +09:00
Yuya Nishihara
ded48ff6e7 local_working_copy: do not create file or write in directory named .jj or .git
I originally considered adding a deny-list-based implementation, but the Windows
compatibility rules are super confusing and I don't have a machine to find out
possible aliases. This patch instead adds directory equivalence tests.

In order to test file entity equivalence, we first need to create a file or
directory of the requested name. It's harmless to create an empty .jj or .git
directory, but materializing a .git file or symlink can temporarily set up an
RCE situation. That's why a new empty file is created to test the path validity.
We might want to add some optimization for safe names (e.g. ASCII, not
containing "git" or "jj", not containing "~", etc.)

That being said, I'm not quite sure whether .git/.jj in a subdirectory must be
checked. It's not safe to cd into the directory and run "jj", but the same
thing can be said of other tools such as "cargo". Perhaps our minimum
requirement is to protect our metadata (= the root .jj and .git) directories.

Despite the crate name (and internal use of std::fs::File),
same_file::is_same_file() can test equivalence of directories. This is
documented and tested, so I've removed my custom implementation, which was
slightly simpler but lacked Windows support.
2024-11-06 15:03:41 -08:00
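The directory equivalence test described in ded48ff6e7 can be illustrated with a minimal, Unix-only sketch that uses only the standard library. It compares device and inode numbers, which is one common way to decide whether two paths name the same entity. The function name `is_same_entity` is hypothetical, and this is neither jj's code nor the `same_file` crate's implementation (which also covers Windows).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if both paths refer to the same file or directory.
///
/// Hypothetical helper for illustration only. It follows symlinks and
/// relies on Unix device/inode numbers, so it does not work on Windows.
#[cfg(unix)]
pub fn is_same_entity(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;

    let meta_a = fs::metadata(a)?;
    let meta_b = fs::metadata(b)?;
    // Two paths name the same entity when both the device and the inode match.
    Ok(meta_a.dev() == meta_b.dev() && meta_a.ino() == meta_b.ino())
}
```

With a check like this, a path that aliases a protected directory (for example through "..") compares equal to it even though the path strings differ.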
Yuya Nishihara
eaafde7119 cargo: add same-file dependency 2024-11-06 15:03:41 -08:00
Yuya Nishihara
f10c5db739 local_working_copy: skip existing symlinks consistently
If a new file would overwrite an existing regular file, the file path is
skipped. It makes sense to apply the same rule to existing symlinks. Without
this patch, checkout would fail if an existing path was a dead symlink or a
symlink to a directory.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
24ccfda781 local_working_copy: do not try to remove old file traversing symlinks
I'm not sure if this was attackable before, but it should be better not to
try to remove files across symlinks.

The disk_path is now returned from create_parent_dirs() to clarify that the
path is identical.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
8540536ea2 local_working_copy: detect error of file removal earlier
This should be safer than relying on a file open error. It's scary to continue
processing if the file was a symlink.

I'll add a few more sanity checks to remove_old_file(), so it's extracted as a
function.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
1c30f3b3e8 repo_path: reject invalid path components by to_fs_path/name()
This addresses a simple path traversal attack.

I don't have a Windows machine, so the added Windows tests aren't checked
locally.
2024-11-06 15:03:41 -08:00
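As a rough illustration of the kind of check 1c30f3b3e8 describes, the sketch below rejects path components that could escape a base directory before building a filesystem path. The names `checked_fs_path` and `InvalidComponent` are hypothetical, not jj's actual API, and the sketch deliberately ignores Windows-specific cases such as drive prefixes and reserved names.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical error type for this sketch; carries the offending component.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidComponent(pub String);

/// Joins `base` with a slash-separated repository path, refusing components
/// that are empty, ".", "..", or that contain a platform path separator.
pub fn checked_fs_path(base: &Path, repo_path: &str) -> Result<PathBuf, InvalidComponent> {
    let mut result = base.to_path_buf();
    if repo_path.is_empty() {
        // The empty string stands for the repository root in this sketch.
        return Ok(result);
    }
    for component in repo_path.split('/') {
        let invalid = component.is_empty()
            || component == "."
            || component == ".."
            || component.chars().any(std::path::is_separator);
        if invalid {
            return Err(InvalidComponent(component.to_owned()));
        }
        result.push(component);
    }
    Ok(result)
}
```

Returning a dedicated error type rather than a generic one keeps the failure distinguishable in tests, which mirrors the reasoning given in 739bf8decf.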
Yuya Nishihara
739bf8decf repo_path: add stub for checked to_fs_path(), rename unchecked functions
I'm going to add a "checked" version of to_fs_path(), but not all callers can
be migrated to it. For example, an error message should be produced even if the
path is malformed.

This patch also adds error variants to propagate InvalidRepoPathError. They
don't use ::Other { .. } so the errors can be distinguished in tests.
2024-11-06 15:03:41 -08:00
Yuya Nishihara
e819cec305 revset: inline resolve/evaluate_programmatic() in tests
I'm going to replace the current .evaluate_programmatic() which does minimal
commit-ref resolution. The new .evaluate_programmatic() will be implemented on
a "resolved" expression.
2024-11-06 09:45:09 +09:00
Yuya Nishihara
0a73245b82 revset: move RevsetCommitRef::Root to RevsetExpression
For the same reason as the previous patch. It's nice if root() is considered
a "resolved" expression. With this change, most of the evaluate_programmatic()
callers won't have to do symbol resolution at all.
2024-11-04 09:20:46 +09:00
Yuya Nishihara
0e8f1ce579 revset: move RevsetCommitRef::VisibleHeads to RevsetExpression
I'm going to add RevsetExpression<State = Resolved|User> type parameter to
detect API misuse at compile time. VisibleHeads is similar to All, and appears
in generic expression substitution function where a concrete State type
shouldn't be known.
2024-11-04 09:20:46 +09:00
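The idea in 0e8f1ce579 of a State type parameter that catches API misuse at compile time can be sketched with the typestate pattern. Everything below (`Expr`, `User`, `Resolved`, `parse`, `resolve`, `evaluate`, `source`) is a hypothetical simplification for illustration and not the actual RevsetExpression API.

```rust
use std::marker::PhantomData;

/// Marker type: the expression may still contain unresolved user symbols.
pub struct User;
/// Marker type: all symbols have been resolved.
pub struct Resolved;

/// Hypothetical expression type tagged with its resolution state.
pub struct Expr<State> {
    text: String,
    _state: PhantomData<State>,
}

impl Expr<User> {
    /// Builds an unresolved expression from user input.
    pub fn parse(text: &str) -> Self {
        Expr { text: text.to_owned(), _state: PhantomData }
    }

    /// Consumes the user expression and returns a resolved one.
    /// A real implementation would look up symbols here.
    pub fn resolve(self) -> Expr<Resolved> {
        Expr { text: self.text, _state: PhantomData }
    }
}

impl<State> Expr<State> {
    /// State-agnostic helper, usable where the concrete state is not known.
    pub fn source(&self) -> &str {
        &self.text
    }
}

impl Expr<Resolved> {
    /// Evaluation is only defined for resolved expressions.
    pub fn evaluate(&self) -> String {
        format!("evaluated({})", self.text)
    }
}
```

In this sketch, calling `evaluate()` on the value returned by `parse()` without going through `resolve()` fails to compile, because the method exists only on `Expr<Resolved>`.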
Yuya Nishihara
e38f7b0594 revset: add RevsetExpression::present() as there's an external caller 2024-11-04 09:20:46 +09:00
Yuya Nishihara
a740eaeb86 revset: add convenient method that extracts symbol name from expression 2024-11-04 09:20:46 +09:00
Yuya Nishihara
12eb5c5515 revset: split "resolved" variant from RevsetExpression::AtOperation
This ensures that a symbol-resolved at_operation() expression won't be resolved
again when it's intersected with another expression, for example.

    # in CLI
    let expr1 = parse("at_operation(..)").resolve_user_symbol();
    # in library
    let expr2 = RevsetExpression::ancestors().intersection(&expr1);
    expr2.evaluate_programmatic()
2024-11-04 09:20:46 +09:00
Yuya Nishihara
f251f08ce7 diff: impl Clone, Debug for DiffHunkIterator
For consistency with the ranges iterator.
2024-11-02 10:09:10 +09:00
Yuya Nishihara
de2a8a579a diff: extract hunk_ranges() iterator
This will help construct file content based on diff hunks. For example, "jj
absorb" will first calculate annotation of the source parent (within mutable
ancestors), calculate diff, then "squash" hunks into ancestor commits of the
surrounding ranges.
2024-11-02 10:09:10 +09:00
Yuya Nishihara
9f1d2abd76 testutils: move global TestBackendData mapping to TestEnvironment
This unblocks the use of TestBackend in long-running processes such as fuzzer.
It should also be safer because TempDir doesn't guarantee that the path is never
reused.
2024-11-02 08:39:02 +09:00
Yuya Nishihara
7b5df93fe4 testutils: move default_store_factories() to TestEnvironment
It will capture the TestBackendData mapping.
2024-11-02 08:39:02 +09:00
Yuya Nishihara
d4786a3256 testutils: move load_repo_at_head() to TestEnvironment
It will depend on the TestBackendData mapping.
2024-11-02 08:39:02 +09:00
Yuya Nishihara
22f2393322 testutils: add stub TestEnvironment that will manage in-memory backend data
TestBackendData instances persist in memory right now, but they should be
discarded when the corresponding temp_dir gets dropped. The added struct will
manage the TestBackendData mapping.
2024-11-02 08:39:02 +09:00
Martin von Zweigbergk
30ab71d340 bookmarks: add support for git.auto-local-bookmark (to match docs)
We had documented that we support `git.auto-local-bookmark` but we
don't. The documentation has been incorrect since d9c68e08b1. This
patch fixes it by adding support for `git.auto-local-bookmark` with
fallback to the old/current `git.auto-local-branch`.
2024-10-30 08:01:02 -07:00
Yuya Nishihara
e464c0e607 annotate: rename AnnotateResults to FileAnnotation
The name "Results" was a bit misleading because Result<T, E> aliases are often
called FooResult.
2024-10-29 23:33:46 +09:00
Yuya Nishihara
ab10b7c0a0 annotate: do not collect result lines into Vec, return Iterator instead
We might want to calculate (commit_id, range) pairs of consecutive lines in
order to "absorb" changes, for example.

This should also be cheaper since Vec<u8> doesn't have to be allocated per line.
2024-10-29 23:33:46 +09:00
Yuya Nishihara
bd1024547d annotate: use sorted Vec<(usize, usize)> to propagate lines to ancestors
This isn't so complicated compared to the HashMap version, and we can handle
multiple (cur, orig1), (cur, orig2) pairs. It's also cheaper to access.
2024-10-29 14:57:57 +09:00
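To illustrate why a sorted Vec<(usize, usize)> can hold several pairs for the same current line and still be cheap to query, here is a minimal sketch of a lookup using binary search. The function name `originals_for` and the pair layout (current line, original line) are assumptions made for this example, not code from bd1024547d.

```rust
/// Returns all (current, original) pairs whose current line equals `cur`.
///
/// `line_map` must be sorted by the current line number. Because it is a
/// plain sorted slice, one current line may map to several original lines.
pub fn originals_for(line_map: &[(usize, usize)], cur: usize) -> &[(usize, usize)] {
    // First index whose current line is >= cur.
    let start = line_map.partition_point(|&(c, _)| c < cur);
    // First index whose current line is > cur.
    let end = line_map.partition_point(|&(c, _)| c <= cur);
    &line_map[start..end]
}
```

A HashMap<usize, usize> could store only one original line per current line, whereas the sorted slice returns every matching pair as a contiguous range.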
Yuya Nishihara
1fd628a0cf annotate: omit building intermediate same_line_map
Perhaps, get_same_line_map() could return an iterator, but implementing an
iterator to be "pull"-ed is much harder than writing a function to "push",
especially when lifetime is involved.
2024-10-29 14:57:57 +09:00
Yuya Nishihara
0eedc0cbae annotate: simply use Vec<_> for list of originating commit IDs
Since we're going to fill the whole list anyway, it doesn't make sense to keep
it as a sparse HashMap.
2024-10-29 14:57:57 +09:00
Yuya Nishihara
53af8a1fbc annotate: simplify condition when to exit early from process_commits() loop 2024-10-29 14:57:57 +09:00
Yuya Nishihara
89a0f46986 annotate: remove redundant .is_absent() test from get_file_contents()
Here we shouldn't care whether the file value is absent or a tree, for example.
2024-10-28 12:40:01 +09:00
Yuya Nishihara
d6026e46e9 annotate: inline process_files_in_commits()
The doc comment describes what the caller should do, not what the function would do.
2024-10-28 12:40:01 +09:00
Yuya Nishihara
db239536da annotate: inline mark_lines_from_original()
This function was short, and this change makes it clear that !.is_empty() was
redundant. The duplicated doc comment is also removed. I feel the inline comment
is easier to follow here.
2024-10-28 12:40:01 +09:00
Yuya Nishihara
52d842b8df annotate: use "let else" and "continue" where makes sense 2024-10-28 12:40:01 +09:00
Yuya Nishihara
7b9c90d8e6 annotate: remove unneeded commit object lookup 2024-10-28 12:40:01 +09:00
Yuya Nishihara
8249e9fee8 annotate: inline get_initial_commit_line_map()
It no longer makes sense to initialize Source line_map and build
HashMap<Commit, Source> in one function. Let's extract the line_map
initialization to a function instead.
2024-10-28 12:40:01 +09:00
Yuya Nishihara
84cc6e2c2f annotate: merge file line map and content cache into a single HashMap
Their lifetimes are identical. We can remove .unwrap()s that were needed in
order to re-borrow cached contents after loading.
2024-10-28 12:40:01 +09:00
Yuya Nishihara
b485881d50 tests: add basic tests for annotation function 2024-10-27 22:51:54 +09:00
Yuya Nishihara
f6db2426e8 annotate: impl Debug on AnnotateResults type, use BString for readability
I'm going to add snapshot tests. Debug isn't strictly needed, but it should
help printf debugging.
2024-10-27 22:51:54 +09:00
Yuya Nishihara
8c0ab6af7a local_working_copy: import std::io for short 2024-10-23 23:51:21 +09:00
Yuya Nishihara
4187847c60 revset: propagate errors from filter predicates
All intermediate nodes are changed to RevWalk of Result<IndexPosition, _> type
to pass BackendError around from filter predicates. Leaf ancestors/descendants
computation is unchanged, and mapped to Result at revset_engine layer. This is
simpler than converting all RevWalk impls to Result<_, _>.
2024-10-23 09:30:51 +09:00
Yuya Nishihara
e37378dc18 revset: implement RevWalk::map() and filter_map(), remove filter()
We'll need to propagate errors from the predicate function, so .filter() will
no longer be usable. .map() will be used in order to wrap infallible ancestry
lookup with Ok(_).

Some RevsetImpl methods are migrated to .map() as example.
2024-10-23 09:30:51 +09:00
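The pattern described in e37378dc18 and 4187847c60, wrapping an infallible source with Ok(_) and letting a fallible predicate pass its errors downstream, can be sketched with plain standard-library iterators. `BackendError` here is a hypothetical stand-in type and `filter_fallible` is a made-up helper; jj's RevWalk trait is not used.

```rust
/// Hypothetical stand-in for a backend error.
#[derive(Debug, PartialEq, Eq)]
pub struct BackendError(pub String);

/// Keeps positions for which `predicate` returns Ok(true) and forwards any
/// predicate error as an item instead of dropping it.
pub fn filter_fallible<I, P>(
    positions: I,
    mut predicate: P,
) -> impl Iterator<Item = Result<u32, BackendError>>
where
    I: Iterator<Item = u32>,
    P: FnMut(u32) -> Result<bool, BackendError>,
{
    positions
        // Wrap the infallible source so every later stage sees Result items.
        .map(Ok::<u32, BackendError>)
        .filter_map(move |item| match item {
            Ok(pos) => match predicate(pos) {
                Ok(true) => Some(Ok(pos)),
                Ok(false) => None,
                // A plain .filter() could not surface this error.
                Err(err) => Some(Err(err)),
            },
            Err(err) => Some(Err(err)),
        })
}
```

Collecting the output into a Result<Vec<_>, _> stops at the first error, which is one way a caller might consume such an iterator.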
Yuya Nishihara
ad676cdf7e revset: collect RevWalkBuilder arguments by caller
This will make error propagation simpler in the following patches. The input
iterators will be changed to Item = Result<IndexPosition, _>.
2024-10-23 09:30:51 +09:00
Yuya Nishihara
e6e5c7412c revset_graph: use VecDeque for look ahead buffer
I don't see a measurable performance difference, but VecDeque is theoretically
simpler than BTreeSet. The input is sorted, so we never do random insertion.
2024-10-23 09:30:51 +09:00
Yuya Nishihara
a38e59f447 revset: fix is_empty() doc, the logic was inverted 2024-10-23 09:30:51 +09:00
Benjamin Tan
0a38fdc9d3 rewrite: move_commits: add MoveCommitsTarget enum to specify roots or commits to move
This also allows some minor optimizations to be performed, such as
avoiding recomputation of the connected target set when
`MoveCommitsTarget::Roots` is used since the connected target set is
identical to the target set (all descendants of the roots).
2024-10-22 20:39:50 +08:00
Benjamin Tan
95b7a60979 rewrite: extract compute_internal_parents_within function 2024-10-22 20:39:50 +08:00
Benjamin Tan
9927a9856e rewrite: make MoveCommitStats derive Default 2024-10-22 20:39:50 +08:00
Yuya Nishihara
a493913000 revset: propagate evaluation errors from other Revset methods
is_empty() could also return Result<bool, _>, but I think the current definition
is also good. If an error occurred, revset.iter() would return at least one
item, so it's not empty.
2024-10-22 09:03:53 +09:00
Yuya Nishihara
825b6670b3 revset: move containing_fn() type alias to lib
If the return type were changed to Result<bool, _>, clippy would complain about
the type complexity.
2024-10-22 09:03:53 +09:00
Yuya Nishihara
ac08f995d8 graph: add GraphNode<N> type alias instead of Graph/NextNodeResult<N, E>
It can be used in more places, and the resulting types are short enough to
silence clippy.
2024-10-22 09:03:53 +09:00