The main reason for this change is that we now give variables different names
based on their types. This helps avoid confusion and makes intent clearer.
However, the type name `FileExecutableFlag` doesn't have a good shortening
(`file_exec_flag` is annoyingly long), so I also renamed the type to something
shorter, which makes the code more legible: easier to mentally parse and
quicker to type.
I removed `File` from the name both for length and because it doesn't really
help distinguish from the executable field in `TreeValue` (because that field
is nested under `TreeValue::File`). Instead, in the upcoming commits I update
comments to consistently use the terms 'on-disk' and 'in-repo' to respectively
refer to the fields in the `FileState` and `TreeValue` structs, which I find
is better for keeping the difference clear in my head.
I went with `Bit` in the new name just because I'm already changing it and I
prefer `exec_bit` slightly over `exec_flag` as the variable name.
The default patterns are still saved to and loaded from .git/config. Maybe we
can add default fetch patterns to jj's configuration, but I'm not sure whether
we should deprecate .git/config fallback.
This allows us to express negative refspecs in jj's syntax. We'll need something
like `git.fetch-tags = <patterns>` to fetch tags into remote views and merge
them accordingly (#7528). The default tag patterns can't be stored in
.git/config because Git processes remote tags differently.
If a submodule was created in a commit C on a remote repo, switching from any
commit after C to any commit before C (e.g. `jj new C-`) will result in jj
starting to track the files introduced in the submodule.
This issue has popped up very frequently for chromium developers, who
get issues when attempting to check out an older version of chromium.
Fixes #4349
Our detection of stale working copies has been based on the tree id for a
long time, at least since Feb 2022 (e098c01935) depending on how
you count. Since Sep 2022 (443e73f346), we keep the last operation
recorded in the working copy up to date. However, we don't update it
when the tree id matches. That's inconsistent, so I think we should
always keep it up to date. This patch fixes that.
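
The intended behavior can be sketched roughly as follows. The type and function names here (`WorkingCopyState`, `sync_with_op`) are hypothetical and are not jj's actual API; the sketch only illustrates recording the operation id even when the tree id is unchanged.

```rust
/// Hypothetical stand-in for the state recorded in the working copy.
#[derive(Debug, Clone, PartialEq, Eq)]
struct WorkingCopyState {
    tree_id: String,
    operation_id: String,
}

/// Brings the recorded state in line with a new operation.
/// Returns true if the tree changed and files on disk would need updating.
fn sync_with_op(state: &mut WorkingCopyState, new_tree_id: &str, new_op_id: &str) -> bool {
    let tree_changed = state.tree_id != new_tree_id;
    if tree_changed {
        state.tree_id = new_tree_id.to_owned();
    }
    // Always record the latest operation, even when the tree id matches.
    state.operation_id = new_op_id.to_owned();
    tree_changed
}
```

With this shape, an operation that leaves the tree untouched still updates the recorded operation id, so a later staleness check sees the newest operation.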
Thanks to @kevincliao for spotting this. We noticed this at Google
because it meant that we sometimes didn't notice the new operation id
in our distributed file system, which led to the machine creating
divergent operations. (The machine is supposed to detect operations
recorded in the operation log but this is sometimes flaky for
unrelated reasons.)
`test_init_load_non_utf8_path` and
`test_init_additional_workspace_non_utf8_path` now early-return on
strict UTF-8 filesystems because there's no way to report a test as
"skipped" at runtime.
Closes https://github.com/jj-vcs/jj/issues/8118
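
A minimal sketch of the early-return pattern described above. The helper names (`try_create_non_utf8_dir`, `check_non_utf8_path`) are hypothetical, and detecting a strict UTF-8 filesystem by attempting to create the directory is an assumption, not necessarily how the actual tests do it.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Tries to create a directory with a non-UTF-8 name under `base`.
/// Returns `None` if the filesystem refuses the name.
#[cfg(unix)]
fn try_create_non_utf8_dir(base: &Path) -> Option<PathBuf> {
    use std::ffi::OsString;
    use std::os::unix::ffi::OsStringExt;
    // 0xff is never valid in UTF-8.
    let name = OsString::from_vec(vec![b'a', 0xff, b'b']);
    let path = base.join(name);
    fs::create_dir(&path).ok().map(|()| path)
}

/// On other platforms this sketch simply treats the case as unsupported.
#[cfg(not(unix))]
fn try_create_non_utf8_dir(_base: &Path) -> Option<PathBuf> {
    None
}

/// Body of a hypothetical test: returns early instead of reporting "skipped".
fn check_non_utf8_path(base: &Path) {
    let Some(path) = try_create_non_utf8_dir(base) else {
        // There is no runtime "skipped" status, so the test just passes.
        eprintln!("non-UTF-8 paths are not supported here; returning early");
        return;
    };
    assert!(path.to_str().is_none());
    fs::remove_dir(&path).unwrap();
}
```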
After filing https://github.com/jj-vcs/jj/issues/7685 I ran some perf traces to try to understand just what was taking so long during these slow operations. The changes in this PR reduce clone time for my large repo from about 10 minutes to 4m30s.
You can see my thought process in the comments of the above issue, but to summarize:
During checkout we check files/directories being created to ensure that we are not attempting to write to a reserved directory (`.jj/`, `.git/`). `same_file::is_same_file()` is an expensive check that invokes _at least 4_ syscalls when called in a naive manner (`open()` and `close()` for each path -- plus possibly more for getting file info? I haven't counted).
There are a few optimization gaps here that are causing significant slowdowns. The following checklist reflects what I've optimized in this PR, and what still remains:
- [x] `create_parent_dirs` is called for each file/directory and, for each parent dir in the path, will **try to create it and check if the dir is an illegal name via `reject_reserved_existing_path()`**. There is no caching of directories which have already been created.
- [ ] `reject_reserved_existing` calls `same_file::is_same_file()` in a loop for all reserved names, but the path which _has maybe been created_ isn't going to change, so its handle could be cached.
- [ ] `can_create_new_file` attempts to create the file then just uses the result as an indicator of whether or not the file is created. However, since we _have a `File`_ that `File` can be directly converted to a `same_file::Handle` and avoid a syscall that currently occurs when converting the `Path` to a `same_file::Handle`.
- [ ] `can_create_new_file` deletes the file immediately after. There's probably an opportunity here to **not** delete the file and re-use it for file write operations.
- [ ] Say we have 1000 files in `foo/`. For each file that's written, `reject_reserved_existing` is going to make at least `RESERVED_DIR_NAMES.len()` syscalls (so at least `RESERVED_DIR_NAMES.len() * 1000` in total) constructing `foo/{reserved_dir_name}` paths, testing their existence, etc. Maybe `jj` might create this dir? But I don't think that should ever happen -- so why not cache the handle **if** it's created and use a lookup table in `reject_reserved_existing` to only conduct these types of checks if the handle is resolved? Or alternatively cache that the file _does not_ exist after the first check.
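
To illustrate the directory-caching idea from the first checklist item, here is a minimal sketch. `CheckedDirs` and `ensure_parents` are hypothetical names, and the `check` callback stands in for the expensive create-and-validate step; this is not the code in this PR.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical cache of parent directories that were already validated.
#[derive(Default)]
struct CheckedDirs {
    seen: HashSet<PathBuf>,
    /// Number of times the expensive check ran (for illustration only).
    expensive_checks: usize,
}

impl CheckedDirs {
    /// Runs `check` once for each not-yet-seen ancestor directory of `file`,
    /// outermost directory first.
    fn ensure_parents<E>(
        &mut self,
        file: &Path,
        mut check: impl FnMut(&Path) -> Result<(), E>,
    ) -> Result<(), E> {
        let Some(parent) = file.parent() else {
            return Ok(());
        };
        // Collect unseen ancestors, innermost first. A seen directory implies
        // that all of its ancestors were seen too, so we can stop there.
        let mut pending = Vec::new();
        for dir in parent.ancestors() {
            if dir.as_os_str().is_empty() || self.seen.contains(dir) {
                break;
            }
            pending.push(dir);
        }
        for dir in pending.into_iter().rev() {
            check(dir)?;
            self.expensive_checks += 1;
            self.seen.insert(dir.to_path_buf());
        }
        Ok(())
    }
}
```

Under this scheme, writing 1000 files into `foo/bar/` runs the expensive check twice (once for `foo`, once for `foo/bar`) rather than once per file and parent.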
Here are some perf traces of running a `jj git clone` of my large repo before:
Release: https://share.firefox.dev/4oiSTBw
Debug: https://share.firefox.dev/4qmJBX1
And after:
Release: https://share.firefox.dev/4nK66mH
Debug: https://share.firefox.dev/470W1ed
As a follow-up to #8115, this moves all references in the codebase to use the new website.
I didn't update the older CHANGELOG entries because I figured they're intended
to be immutable.
Glob patterns will be enabled by default globally. Since this will be a big
breaking change in revsets, this patch adds a config knob to turn the new
default on/off.
Deprecation warnings will be emitted for default "substring:" patterns. This
change will suppress them. Since "glob:" will be the new default, I made these
tests use "glob:" when both "exact:" and "glob:" work.
Tests for the revset filter functions aren't updated.
If the default is changed to "glob:", literal strings would be parsed by the
glob() function. It's still better to treat trivial strings as "exact" patterns.
str_util::is_glob_char() includes backslash unconditionally because we enable
backslash escapes in string patterns.
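
A rough sketch of that classification. The exact set of characters below is an assumption and the real `str_util::is_glob_char()` may differ; `PatternKind` and `classify` are hypothetical names used only for illustration.

```rust
/// Assumed set of glob meta characters. Backslash is included unconditionally
/// because backslash escapes are enabled in string patterns.
fn is_glob_char(c: char) -> bool {
    matches!(c, '*' | '?' | '[' | ']' | '{' | '}' | '\\')
}

#[derive(Debug, PartialEq, Eq)]
enum PatternKind {
    Exact,
    Glob,
}

/// Treats strings without any glob meta character as exact patterns.
fn classify(pattern: &str) -> PatternKind {
    if pattern.chars().any(is_glob_char) {
        PatternKind::Glob
    } else {
        PatternKind::Exact
    }
}
```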
The default patterns differ between revsets and command arguments, but they'll
be unified to "glob" later. For now, parse_string_expression() should be used
only for command arguments.
The new parse_program() will allow us to parse top-level string patterns with no
parentheses. This patch also replaces a few callers of the old parse_program().
When we drop support for the all: modifier syntax, parse_program_with_modifier()
will be replaced entirely.
This paves the way to deprecate `git.auto-local-bookmark` without
adding lots of deprecation warnings to test output snapshots.
The behavior of some tests is slightly changed, because
auto-track-bookmarks also tracks bookmarks that were created locally.
I think it just shows up in output snapshots as absent-tracked
bookmarks, without affecting what the test is about.
This configuration allows users to express a set of bookmarks that
should be automatically tracked when first encountered. This applies on
clone, fetch, create, and set.
Until now, the configuration values `git.push-new-bookmarks` and
`git.auto-local-bookmark` fulfilled parts of those use cases. However,
both options represent an "all or nothing" approach. By turning them on,
users risk tracking and pushing more bookmarks than desired.
By using a bookmark pattern, users can express that they want to
auto-track bookmarks that belong to them (e.g. `glob:my-name/*`).
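
As an illustration of pattern-based auto-tracking, here is a minimal sketch. Both `wildcard_match` and `should_auto_track` are hypothetical helpers; the matcher only supports `*` and is a simplification of real glob matching, and the `glob:` prefix is assumed to have been stripped already.

```rust
/// Minimal wildcard matcher supporting only `*` (any run of characters).
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` seen and the text index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            backtrack = Some((star_pi, star_ti + 1));
            pi = star_pi + 1;
            ti = star_ti + 1;
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Returns true if any configured pattern matches the bookmark name.
fn should_auto_track(patterns: &[&str], bookmark: &str) -> bool {
    patterns.iter().any(|p| wildcard_match(p, bookmark))
}
```

With the pattern `my-name/*`, a bookmark such as `my-name/feature` would be tracked automatically, while `other/feature` would not.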
I'm going to fix parsing of CLI string patterns to use revset parser, and it
would be annoying if inner quotes were required in addition to shell quotes:
    $ jj bookmark list 'glob:"push-*"'
There's also a plan to enable glob matching globally. This will mean that we'll
have to use either `subject(*foo*)` or `subject(substring:foo)` for substring
search.
https://github.com/jj-vcs/jj/issues/6971#issuecomment-3067038313
This adds support for tracking ignored and oversized files with `jj file track`.
Previously, `jj file track` would silently fail to track files that were ignored by
`.gitignore` or larger than `snapshot.max-new-file-size`. This commit introduces an
`--include-ignored` flag that allows users to explicitly track these files.
## Implementation
Added a `force_tracking_matcher` field to `SnapshotOptions` that overrides ignore rules
and size limits. When `--include-ignored` is specified, the file pattern matcher is
passed as `force_tracking_matcher`, allowing three checks in `FileSnapshotter` to bypass
their usual restrictions for directory ignores, file ignores, and file size limits.
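
The override logic can be sketched roughly like this. The struct below is a heavily simplified, hypothetical stand-in for `SnapshotOptions` (the real fields and matcher types differ), and `should_track` is an illustrative helper rather than the actual `FileSnapshotter` code.

```rust
/// Hypothetical, simplified stand-in for `SnapshotOptions`.
struct SnapshotOptions<'a> {
    /// Returns true if the path is excluded by ignore rules.
    is_ignored: &'a dyn Fn(&str) -> bool,
    /// Size limit for newly tracked files, in bytes.
    max_new_file_size: u64,
    /// Paths matched here bypass both the ignore rules and the size limit.
    force_tracking_matcher: &'a dyn Fn(&str) -> bool,
}

/// Decides whether a new file should be tracked.
fn should_track(opts: &SnapshotOptions<'_>, path: &str, size: u64) -> bool {
    if (opts.force_tracking_matcher)(path) {
        return true;
    }
    !(opts.is_ignored)(path) && size <= opts.max_new_file_size
}
```

Without `--include-ignored`, the forcing matcher would match nothing, so the usual restrictions apply unchanged.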
## Tests
- `test_track_ignored_with_flag`: Verifies `.gitignore`d files can be tracked
- `test_track_large_file_with_flag`: Verifies oversized files can be tracked
- `test_track_ignored_directory`: Verifies ignored directories can be tracked recursively
# Checklist
If applicable:
- [ ] I have updated `CHANGELOG.md`
- [x] I have updated the documentation (`README.md`, `docs/`, `demos/`)
- [ ] I have updated the config schema (`cli/src/config-schema.json`)
- [x] I have added/updated tests to cover my changes
Any keyword arguments given to the `coalesce()` and `concat()` functions
were being silently ignored because `FunctionCallNode.args` was being
accessed directly without checking `keyword_args` at all.
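
One plausible shape for such a check, as a sketch only. The struct is a simplified, hypothetical stand-in for `FunctionCallNode`, and `expect_positional_only` is an illustrative helper name; the actual fix and error message may look different.

```rust
/// Hypothetical, simplified stand-in for the parsed function call node.
struct FunctionCallNode {
    name: String,
    args: Vec<String>,
    keyword_args: Vec<(String, String)>,
}

/// Returns the positional arguments, rejecting any keyword argument instead
/// of silently dropping it.
fn expect_positional_only(node: &FunctionCallNode) -> Result<&[String], String> {
    if let Some((key, _)) = node.keyword_args.first() {
        return Err(format!(
            "Function `{}` doesn't accept keyword argument `{key}`",
            node.name
        ));
    }
    Ok(&node.args)
}
```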
Since file names don't usually include glob meta characters, it's probably okay
to enable globs by default. There's also a plan to change the default of string
patterns to globs.
The short forms "cwd:"/"root:" are still aliased to literal patterns. I don't
have a strong reason to rename these.
Closes #6971
"prefix-glob:" will be the default. I don't think these prefix glob patterns
are useful, but they are added for completeness.
It's odd that literal patterns do prefix matching by default whereas globs match
exactly. We can instead rename the existing "glob:" to "file-glob:" and add
prefix "glob:", but I'm not sure which is better.
This will be used as a default matcher for patterns including glob characters. I
don't think prefix matching on globs would be particularly useful, but the
default behavior should be compatible with PrefixMatcher.
Regarding the implementation, a glob regex is first translated to a prefix
pattern by replacing $ with (/|$). We can instead use low-level regex_automata
API to compute prefix matching, but that would be more involved.
GlobMatcher::visit() is slightly optimized as we know descendant paths always
match.
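
The `$` to `(/|$)` translation described above can be sketched at the string level as follows. `to_prefix_regex` is a hypothetical helper and the input regex in the usage note is only an assumed example of what a glob might compile to; the actual implementation may operate on a different representation.

```rust
/// Rewrites the trailing `$` anchor of a regex string into `(/|$)` so that
/// the pattern also matches any path below a matching directory.
fn to_prefix_regex(glob_regex: &str) -> String {
    let Some(head) = glob_regex.strip_suffix('$') else {
        return glob_regex.to_owned();
    };
    // A `$` preceded by an odd number of backslashes is an escaped literal,
    // not an anchor, so leave the pattern untouched in that case.
    let trailing_backslashes = head.chars().rev().take_while(|&c| c == '\\').count();
    if trailing_backslashes % 2 == 1 {
        return glob_regex.to_owned();
    }
    format!("{head}(/|$)")
}
```

For example, `^foo/[^/]*$` becomes `^foo/[^/]*(/|$)`, which accepts `foo/bar` as well as `foo/bar/baz`.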