ruff/crates/ruff_python_ast/src
Thomas de Zeeuw e3c12764f8
Only use a single cache file per Python package (#5117)
## Summary

This changes the caching design from one cache file per source file to
one cache file per package. This greatly reduces the number of cache
files that are opened and written, while keeping roughly the same
(combined) size, since bincode is a very compact encoding.
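Here is a minimal sketch of what a per-package cache could look like. It assumes each package's cache is a map from a file path (relative to the package root) to that file's cached result. The whole map is persisted as a single file whose name comes from a hash of the package root. The names `PackageCache`, `FileCache`, and `cache_file`, and the hashing scheme, are hypothetical illustrations, not ruff's actual types. Serialization (bincode in this PR) is left out so the sketch needs only the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical cached lint result for one source file.
#[derive(Debug, Clone, PartialEq)]
struct FileCache {
    /// Modification time used to detect whether the file changed.
    last_modified_secs: u64,
    /// Rendered diagnostics (a stand-in for real message types).
    messages: Vec<String>,
}

/// Hypothetical per-package cache: every source file in the package shares
/// this one map, which would be persisted as a single cache file.
#[derive(Debug)]
struct PackageCache {
    package_root: PathBuf,
    /// Keyed by path relative to `package_root`.
    files: HashMap<PathBuf, FileCache>,
}

impl PackageCache {
    fn new(package_root: impl Into<PathBuf>) -> Self {
        Self {
            package_root: package_root.into(),
            files: HashMap::new(),
        }
    }

    /// Location of this package's single cache file, derived from a hash of
    /// the package root. `DefaultHasher` output is not guaranteed to be
    /// stable across Rust releases, so a real cache would want a stable hash.
    fn cache_file(&self, cache_dir: &Path) -> PathBuf {
        let mut hasher = DefaultHasher::new();
        self.package_root.hash(&mut hasher);
        cache_dir.join(format!("{:016x}", hasher.finish()))
    }

    /// Strips the package root so keys stay short and relocatable.
    fn relative<'a>(&self, file: &'a Path) -> &'a Path {
        file.strip_prefix(&self.package_root).unwrap_or(file)
    }

    fn insert(&mut self, file: &Path, entry: FileCache) {
        let key = self.relative(file).to_path_buf();
        self.files.insert(key, entry);
    }

    fn get(&self, file: &Path) -> Option<&FileCache> {
        self.files.get(self.relative(file))
    }
}
```

Every file in a package resolves to the same `cache_file` path. One file is opened and written per package rather than per source file.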

Below are some very unscientific performance tests, run against the
following projects/sources:

* small.py: a single 31-byte Python file with 2 errors.
* test.py: a single 43 KB Python file with 8 errors.
* fastapi: the FastAPI repository, 1134 files checked, 0 errors.

Source   | Files before | Files after | Size before | Size after
---------|--------------|-------------|-------------|-----------
small.py | 1            | 1           | 20 K        | 20 K
test.py  | 1            | 1           | 60 K        | 60 K
fastapi  | 1134         | 518         | 4.5 M       | 2.3 M

One question that might come up is why fastapi still has 518 cache files
rather than 1. That is because this uses the existing package
resolution, which treats examples, docs, etc. as separate from the "main"
source code (in the fastapi directory of the repo). In the future it
might be worth considering a one-cache-file-per-repository strategy.
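The grouping can be approximated with a hypothetical `package_root` helper. It assumes a file belongs to the outermost directory reachable through an unbroken chain of `__init__.py` files, and that files outside any package fall back to their own directory. This is a simplified stand-in for ruff's existing package resolution, not its implementation. In the hypothetical layout below, example files under a `docs_src` directory that has no `__init__.py` files each get their own cache key.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the outermost ancestor of `file` reachable through an
/// unbroken chain of directories that contain `__init__.py` (given here as
/// the `init_dirs` set, so the sketch does not touch the filesystem).
fn package_root(file: &Path, init_dirs: &HashSet<PathBuf>) -> Option<PathBuf> {
    let mut root = None;
    let mut current = file.parent();
    while let Some(dir) = current {
        if !init_dirs.contains(dir) {
            break;
        }
        root = Some(dir.to_path_buf());
        current = dir.parent();
    }
    root
}

/// Assumed fallback: a file outside any package is keyed by its own directory.
fn cache_key(file: &Path, init_dirs: &HashSet<PathBuf>) -> PathBuf {
    package_root(file, init_dirs)
        .or_else(|| file.parent().map(Path::to_path_buf))
        .unwrap_or_default()
}

/// Number of distinct cache keys, i.e. how many cache files these sources need.
fn count_cache_files(files: &[&Path], init_dirs: &HashSet<PathBuf>) -> usize {
    files
        .iter()
        .map(|f| cache_key(f, init_dirs))
        .collect::<HashSet<_>>()
        .len()
}
```

With `fastapi/` and `fastapi/security/` marked as packages, both library files share one key. The two tutorial files under `docs_src/` produce two more keys, for three cache files in total.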

This new design is not perfect and has a number of known issues.
First, like the old design, it doesn't remove the cache entry for a source
file that has been moved or removed until `ruff clean` is run.

Second, this currently uses a single coarse-grained mutex around every
mutation of the package cache (e.g. inserting a result). This could be (or
become) a bottleneck. Testing and improving this (if needed) is future work.
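Here is a minimal sketch of the locking pattern described above. It assumes the package cache is a `HashMap` behind a single `std::sync::Mutex`, and that worker threads lock it only to insert their result. `SharedPackageCache` and `lint_in_parallel` are hypothetical names. Scoped std threads (Rust 1.63+) stand in for whatever parallelism the linter actually uses.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::Mutex;
use std::thread;

/// Hypothetical package cache: one lock guards the whole map, so every
/// insert from every worker thread is serialized on this mutex.
#[derive(Default)]
struct SharedPackageCache {
    files: Mutex<HashMap<PathBuf, Vec<String>>>,
}

impl SharedPackageCache {
    fn insert(&self, path: PathBuf, diagnostics: Vec<String>) {
        // The lock is held only for the duration of this insert.
        self.files.lock().unwrap().insert(path, diagnostics);
    }
}

/// Lints each path on its own scoped thread and records the result.
fn lint_in_parallel(cache: &SharedPackageCache, paths: Vec<PathBuf>) {
    thread::scope(|s| {
        for path in paths {
            s.spawn(move || {
                // Placeholder "linting" runs outside the lock; only the
                // insert below contends on the mutex.
                let diagnostics = vec![format!("checked {}", path.display())];
                cache.insert(path, diagnostics);
            });
        }
    });
}
```

Because the lock covers only the insert, all threads contend at that single step. Whether this matters with many small files is what still needs to be measured.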

Third, the package caches are currently opened and stored in a sequential
loop; this could be done in parallel. This is also future work.
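Here is one way the sequential open/store loop could be parallelized, sketched with scoped std threads. `store_package_cache` is a hypothetical placeholder that only formats a label, rather than serializing and writing a cache file. A real version would more likely use a thread pool than one thread per package, since fastapi alone yields 518 packages.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for "serialize this package's cache and write it to
/// disk". It only formats a label so the sketch runs without touching disk.
fn store_package_cache(package_root: &Path) -> String {
    format!("stored {}", package_root.display())
}

/// Sequential version: one package after another.
fn store_all_sequential(packages: &[PathBuf]) -> Vec<String> {
    packages.iter().map(|p| store_package_cache(p)).collect()
}

/// Parallel version: one scoped thread per package. Joining the handles in
/// spawn order keeps the results in the same order as the input.
fn store_all_parallel(packages: &[PathBuf]) -> Vec<String> {
    thread::scope(|s| {
        let handles: Vec<_> = packages
            .iter()
            .map(|p| s.spawn(move || store_package_cache(p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("cache store thread panicked"))
            .collect()
    })
}
```

Both versions return the same results in the same order. Only the scheduling of the per-package work differs.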


## Test Plan

Run `ruff check` (with caching enabled) twice on any Python source code;
both runs should produce the same results.
2023-06-19 17:46:13 +02:00
source_code Only use a single cache file per Python package (#5117) 2023-06-19 17:46:13 +02:00
visitor Run rustfmt on nightly to clean up erroneous comments (#5106) 2023-06-15 00:19:05 +00:00
all.rs Allow re-assignments to __all__ (#4967) 2023-06-08 17:19:56 +00:00
call_path.rs Refactor range from Attributed to Nodes (#4422) 2023-05-16 06:36:32 +00:00
cast.rs Upgrade RustPython (#4900) 2023-06-08 05:53:14 +00:00
comparable.rs Fix erroneous kwarg reference (#5068) 2023-06-14 00:01:52 +00:00
docstrings.rs Move Python whitespace utilities into new ruff_python_whitespace crate (#4993) 2023-06-10 00:59:57 +00:00
function.rs Format Function definitions (#4951) 2023-06-08 16:07:33 +00:00
hashable.rs Create a rust_python_ast crate (#3370) 2023-03-07 15:18:40 +00:00
helpers.rs Use matches! for insecure hash rule (#5141) 2023-06-16 04:18:32 +00:00
identifier.rs Always use identifier ranges to store bindings (#5110) 2023-06-15 18:43:19 +00:00
imports.rs Run rustfmt on nightly to clean up erroneous comments (#5106) 2023-06-15 00:19:05 +00:00
lib.rs Always use identifier ranges to store bindings (#5110) 2023-06-15 18:43:19 +00:00
node.rs Fix a number of formatter errors from the cpython repository (#5089) 2023-06-15 11:24:14 +00:00
prelude.rs Improve ruff_parse_simple to find UTF-8 violations (#5008) 2023-06-12 12:10:23 -04:00
relocate.rs Refactor range from Attributed to Nodes (#4422) 2023-05-16 06:36:32 +00:00
statement_visitor.rs Refactor range from Attributed to Nodes (#4422) 2023-05-16 06:36:32 +00:00
str.rs Include f-string prefixes in quote-stripping utilities (#5039) 2023-06-12 18:25:47 -04:00
token_kind.rs Bring pycodestyle rules into full compatibility (on SciPy) (#4472) 2023-05-17 16:51:55 +00:00
types.rs Replace parents statement stack with a Nodes abstraction (#4233) 2023-05-06 16:12:41 +00:00
typing.rs Upgrade RustPython (#4747) 2023-05-31 08:26:35 +00:00
visitor.rs Upgrade RustPython (#4900) 2023-06-08 05:53:14 +00:00
whitespace.rs Move Python whitespace utilities into new ruff_python_whitespace crate (#4993) 2023-06-10 00:59:57 +00:00