language-servers/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-08-04 02:39:12 +00:00

Author	SHA1	Message	Date
Charlie Marsh	cc822082a7	Refactor `noqa` directive parsing away from regex-based implementation (#5554 ) ## Summary I'll write up a more detailed description tomorrow, but in short, this PR removes our regex-based implementation in favor of "manual" parsing. I tried a couple different implementations. In the benchmarks below: - `Directive/Regex` is our implementation on `main`. - `Directive/Find` just uses `text.find("noqa")`, which is insufficient, since it doesn't cover case-insensitive variants like `NOQA`, and doesn't handle multiple `noqa` matches in a single like, like ` # Here's a noqa comment # noqa: F401`. But it's kind of a baseline. - `Directive/Memchr` uses three `memchr` iterative finders (one for `noqa`, `NOQA`, and `NoQA`). - `Directive/AhoCorasick` is roughly the variant checked-in here. The raw results: ``` Directive/Regex/# noqa: F401 time: [273.69 ns 274.71 ns 276.03 ns] change: [+1.4467% +1.8979% +2.4243%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) low mild 8 (8.00%) high mild 4 (4.00%) high severe Directive/Find/# noqa: F401 time: [66.972 ns 67.048 ns 67.132 ns] change: [+2.8292% +2.9377% +3.0540%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) low severe 3 (3.00%) low mild 8 (8.00%) high mild 3 (3.00%) high severe Directive/AhoCorasick/# noqa: F401 time: [76.922 ns 77.189 ns 77.536 ns] change: [+0.4265% +0.6862% +0.9871%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe Directive/Memchr/# noqa: F401 time: [62.627 ns 62.654 ns 62.679 ns] change: [-0.1780% -0.0887% -0.0120%] (p = 0.03 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low severe 5 (5.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe Directive/Regex/# noqa: F401, F841 time: [321.83 ns 322.39 ns 322.93 ns] change: [+8602.4% +8623.5% +8644.5%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe Directive/Find/# noqa: F401, F841 time: [78.618 ns 78.758 ns 78.896 ns] change: [+1.6909% +1.8771% +2.0628%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild Directive/AhoCorasick/# noqa: F401, F841 time: [87.739 ns 88.057 ns 88.468 ns] change: [+0.1843% +0.4685% +0.7854%] (p = 0.00 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe Directive/Memchr/# noqa: F401, F841 time: [80.674 ns 80.774 ns 80.860 ns] change: [-0.7343% -0.5633% -0.4031%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) low severe 9 (9.00%) low mild 1 (1.00%) high mild Directive/Regex/# noqa time: [194.86 ns 195.93 ns 196.97 ns] change: [+11973% +12039% +12103%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) low mild 1 (1.00%) high mild Directive/Find/# noqa time: [25.327 ns 25.354 ns 25.383 ns] change: [+3.8524% +4.0267% +4.1845%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe Directive/AhoCorasick/# noqa time: [34.267 ns 34.368 ns 34.481 ns] change: [+0.5646% +0.8505% +1.1281%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild Directive/Memchr/# noqa time: [21.770 ns 21.818 ns 21.874 ns] change: [-0.0990% +0.1464% +0.4046%] (p = 0.26 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe Directive/Regex/# type: ignore # noqa: E501 time: [278.76 ns 279.69 ns 280.72 ns] change: [+7449.4% +7469.8% +7490.5%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe Directive/Find/# type: ignore # noqa: E501 time: [67.791 ns 67.976 ns 68.184 ns] change: [+2.8321% +3.1735% +3.5418%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Directive/AhoCorasick/# type: ignore # noqa: E501 time: [75.908 ns 76.055 ns 76.210 ns] change: [+0.9269% +1.1427% +1.3955%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe Directive/Memchr/# type: ignore # noqa: E501 time: [72.549 ns 72.723 ns 72.957 ns] change: [+1.5881% +1.9660% +2.3974%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 10 (10.00%) high mild 5 (5.00%) high severe Directive/Regex/# type: ignore # nosec time: [66.967 ns 67.075 ns 67.207 ns] change: [+1713.0% +1715.8% +1718.9%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe Directive/Find/# type: ignore # nosec time: [18.505 ns 18.548 ns 18.597 ns] change: [+1.3520% +1.6976% +2.0333%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild Directive/AhoCorasick/# type: ignore # nosec time: [16.162 ns 16.206 ns 16.252 ns] change: [+1.2919% +1.5587% +1.8430%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe Directive/Memchr/# type: ignore # nosec time: [39.192 ns 39.233 ns 39.276 ns] change: [+0.5164% +0.7456% +0.9790%] (p = 0.00 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe Directive/Regex/# some very long comment that # is interspersed with characters but # no directive time: [81.460 ns 81.578 ns 81.703 ns] change: [+2093.3% +2098.8% +2104.2%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) low mild 2 (2.00%) high mild Directive/Find/# some very long comment that # is interspersed with characters but # no directive time: [26.284 ns 26.331 ns 26.387 ns] change: [+0.7554% +1.1027% +1.3832%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Directive/AhoCorasick/# some very long comment that # is interspersed with characters but # no direc... time: [28.643 ns 28.714 ns 28.787 ns] change: [+1.3774% +1.6780% +2.0028%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild Directive/Memchr/# some very long comment that # is interspersed with characters but # no directive time: [55.766 ns 55.831 ns 55.897 ns] change: [+1.5802% +1.7476% +1.9021%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) low mild ``` While memchr is faster than aho-corasick in some of the common cases (like `# noqa: F401`), the latter is way, way faster when there _isn't_ a match (like 2x faster -- see the last two cases). Since most comments _aren't_ `noqa` comments, this felt like the right tradeoff. Note that all implementations are significantly faster than the regex version. (I know I originally reported a 10x speedup, but I ended up improving the regex version a bit in some prior PRs, so it got unintentionally faster via some refactors.) There's also one behavior change in here, which is that we now allow variable spaces, e.g., `#noqa` or `# noqa`. Previously, we required exactly one space. This thus closes #5177.	2023-07-06 16:03:10 +00:00
Charlie Marsh	634ed8975c	Add pip to the ecosystem-ci check (#5521 )	2023-07-05 02:06:21 +00:00
Micha Reiser	f7969cf23c	ecosystem: Run `git` command with no human interaction flag (#5435 )	2023-06-29 09:19:11 +02:00
konstin	03694ef649	More stability checker options (#5299 ) ## Summary This contains three changes: * repos in `check_ecosystem.py` are stored as `org:name` instead of `org/name` to create a flat directory layout * `check_ecosystem.py` performs a maximum of 50 parallel jobs at the same time to avoid consuming to much RAM * `check-formatter-stability` gets a new option `--multi-project` so it's possible to do `cargo run --bin ruff_dev -- check-formatter-stability --multi-project target/checkouts` With these three changes it becomes easy to check the formatter stability over a larger number of repositories. This is part of the integration of integrating formatter regressions checks into the ecosystem checks. ## Test Plan ```shell python scripts/check_ecosystem.py --checkouts target/checkouts --projects github_search.jsonl -v $(which true) $(which true) cargo run --bin ruff_dev -- check-formatter-stability --multi-project target/checkouts ```	2023-06-22 15:48:11 +00:00
Charlie Marsh	d99b3bf661	Add some projects to the ecosystem CI check (#5258 )	2023-06-21 12:42:58 -04:00
konstin	30e90838d0	Don't assume unique repo names in ecosystem checks (#4628 ) This fixes a bug where previously repositories with the same name would have been overwritten. I tested with `scripts/check_ecosystem.py -v --checkouts target/checkouts_main .venv/bin/ruff target/release/ruff` and ruff 0.0.267 that changes are shown. I confirmed with `scripts/ecosystem_all_check.sh check --select RUF008` (next PR) that the checkouts are now complete.	2023-05-24 16:26:12 +02:00
konstin	625849b846	Ecosystem CI: Optionally diff fixes (#4193 ) * Generate fixes when using --show-fixes Example command: `cargo run --bin ruff -- --no-cache --select F401 --show-source --show-fixes crates/ruff/resources/test/fixtures/pyflakes/F401_9.py` Before, `--show-fixes` was ignored: ``` crates/ruff/resources/test/fixtures/pyflakes/F401_9.py:4:22: F401 [] `foo.baz` imported but unused \| 4 \| __all__ = ("bar",) 5 \| from foo import bar, baz \| ^^^ F401 \| = help: Remove unused import: `foo.baz` Found 1 error. [] 1 potentially fixable with the --fix option. ``` After: ``` crates/ruff/resources/test/fixtures/pyflakes/F401_9.py:4:22: F401 [] `foo.baz` imported but unused \| 4 \| __all__ = ("bar",) 5 \| from foo import bar, baz \| ^^^ F401 \| = help: Remove unused import: `foo.baz` ℹ Suggested fix 1 1 \| """Test: late-binding of `__all__`.""" 2 2 \| 3 3 \| __all__ = ("bar",) 4 \|-from foo import bar, baz 4 \|+from foo import bar Found 1 error. [] 1 potentially fixable with the --fix option. ``` Also fixes git clone	2023-05-19 09:49:57 +00:00
Tyler Yep	01b372a75c	Implement `flake8-future-annotations` FA100 (#3979 )	2023-05-14 03:00:06 +00:00
konstin	6a52577630	Ecosystem CI: Allow storing checkouts locally (#4192 ) * Ecosystem CI: Allow storing checkouts locally This adds a --checkouts options to (re)use a local directory instead of checkouts into a tempdir * Fix missing path conversion	2023-05-11 17:36:44 +02:00
Calum Young	b76b4b6016	List rule changes in ecosystem (#4371 ) * Count changes for each rule * Handle case where rule matches were found in a line * List and sort by changes * Remove detail from rule changes * Add comment about leading : * Only print rule changes if rule changes are present * Use re.search and match group * Remove dict().items() * Use match group to extract rule code	2023-05-11 16:33:15 +02:00
konstin	454c6d9c2f	Extended ecosystem check with scraped data (#3858 )	2023-04-06 22:39:48 +00:00
Jonathan Plasse	7f3b748401	Fix Ruff pre-commit hook errors (#3706 )	2023-03-23 22:52:24 -04:00
Henry Schreiner	4bdb2dd362	ci(check_ecosystem): add PyPa/build (#3569 )	2023-03-18 19:09:22 -04:00
Henry Schreiner	53a4743631	ci: fix check_ecosystem (#3602 )	2023-03-18 19:03:08 -04:00
Charlie Marsh	16a350c731	Reduce usage of ALL in ecosystem CI (#3590 )	2023-03-18 13:13:09 -04:00
Henry Schreiner	d9ed0aae69	ci(check_ecosystem): add cibuildwheel (#3567 )	2023-03-16 22:34:56 -04:00
Henry Schreiner	bbc87b7177	ci(check_ecosystem): add scikit-build-core (#3563 )	2023-03-16 19:46:42 -04:00
Xuehai Pan	e99e1fae2b	ci: add `python/typeshed` to ecosystem check (#3559 )	2023-03-16 14:19:48 -04:00
Calum Young	6c576872d4	List changes for all ecosystem repos (#3461 )	2023-03-12 14:30:38 -04:00
Charlie Marsh	7062d1db16	Run ecosystem CI checks without --isolated (#3445 )	2023-03-10 19:03:51 -05:00
Samuel Cormier-Iijima	cfa2924664	Setup ecosystem CI (#3390 ) This PR sets up an "ecosystem" check as an optional part of the CI step for pull requests. The primary piece of this is a new script in `scripts/check_ecosystem.py` which takes two ruff binaries as input and compares their outputs against a corpus of open-source code in parallel. I used ruff's `text` reporting format and stdlib's `difflib` (rather than JSON output and jsondiffs) to avoid adding another dependency. There is a new ecosystem-comment workflow to add a comment to the PR (see [this link](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/) which explains why it needs to be done as a new workflow for security reasons).	2023-03-10 17:39:07 -05:00

21 commits