Commit graph

17 commits

Author SHA1 Message Date
Micha Reiser
2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00
Zanie Blue
3000a47fe8
Include file permissions in key for cached files (#5901)
Reimplements https://github.com/astral-sh/ruff/pull/3104
Closes https://github.com/astral-sh/ruff/issues/5726

Note that we will generate the hash for a cache key twice in normal
operation. Once to check for the cached item and again to update the
cache. We could optimize this by generating the hash once in
`diagnostics::lint_file` and passing the `u64` into `get` and `update`.
We'd probably want to wrap it in a `CacheKeyHash` enum for type safety.

## Test plan

Unit tests for Windows and Unix.

Manual test with case from issue

```
❯ touch fake.py
❯ chmod +x fake.py
❯ ./target/debug/ruff --select EXE fake.py
fake.py:1:1: EXE002 The file is executable but no shebang is present
Found 1 error.
❯ chmod -x fake.py
❯ ./target/debug/ruff --select EXE fake.py
```
2023-07-25 17:06:47 +00:00
Thomas de Zeeuw
1c638264b2
Keep track of when files are last seen in the cache (#5214)
## Summary

And remove cached files that we haven't seen for a certain period of
time, currently 30 days.

For the last seen timestamp we actually use an `u64`, it's smaller on
disk than `SystemTime` (which size is OS dependent) and fits in an
`AtomicU64` which we can use to update it without locks.

## Test Plan

Added a new unit test, run by `cargo test`.
2023-06-23 15:40:35 +02:00
Charlie Marsh
6b8b318d6b
Use mod tests consistently (#5278)
As per the Rust documentation.
2023-06-22 01:50:28 +00:00
Thomas de Zeeuw
17f1ecd56e
Open cache files in parallel (#5120)
## Summary

Open cache files in parallel (again), brings the performance back to be roughly equal to the old implementation.

## Test Plan

Existing tests should keep working.
2023-06-20 17:43:09 +02:00
Thomas de Zeeuw
e3c12764f8
Only use a single cache file per Python package (#5117)
## Summary

This changes the caching design from one cache file per source file, to
one cache file per package. This greatly reduces the amount of cache
files that are opened and written, while maintaining roughly the same
(combined) size as bincode is very compact.

Below are some very much not scientific performance tests. It uses
projects/sources to check:

* small.py: single, 31 bytes Python file with 2 errors.
* test.py: single, 43k Python file with 8 errors.
* fastapi: FastAPI repo, 1134 files checked, 0 errors.

Source   | Before # files | After # files | Before size | After size
-------|-------|-------|-------|-------
small.py | 1              | 1             | 20 K        | 20 K
test.py  | 1              | 1             | 60 K        | 60 K
fastapi  | 1134           | 518           | 4.5 M       | 2.3 M

One question that might come up is why fastapi still has 518 cache files
and not 1? That is because this is using the existing package
resolution, which sees examples, docs, etc. as separate from the "main"
source code (in the fastapi directory in the repo). In this future it
might be worth consider switching to a one cache file per repo strategy.

This new design is not perfect and does have a number of known issues.
First, like the old design it doesn't remove the cache for a source file
that has been (re)moved until `ruff clean` is called.

Second, this currently uses a large mutex around the mutation of the
package cache (e.g. inserting result). This could be (or become) a
bottleneck. It's future work to test and improve this (if needed).

Third, currently the packages and opened and stored in a sequential
loop, this could be done parallel. This is also future work.


## Test Plan

Run `ruff check` (with caching enabled) twice on any Python source code
and it should produce the same results.
2023-06-19 17:46:13 +02:00
Charlie Marsh
19c4b7bee6
Rename ruff_python_semantic's Context struct to SemanticModel (#4565) 2023-05-22 02:35:03 +00:00
Jonathan Plasse
c10a4535b9
Disallow unreachable_pub (#4314) 2023-05-11 18:00:00 -04:00
Micha Reiser
8969ad5879
Always generate fixes (#4239) 2023-05-10 07:06:14 +00:00
Zanie Adkins
0801f14046
Refactor Fix and Edit API (#4198) 2023-05-08 11:57:03 +02:00
Micha Reiser
cab65b25da
Replace row/column based Location with byte-offsets. (#3931) 2023-04-26 18:11:02 +00:00
Micha Reiser
c33c9dc585
Introduce SourceFile to avoid cloning the message filename (#3904) 2023-04-11 08:28:55 +00:00
Micha Reiser
381203c084
Store source code on message (#3897) 2023-04-11 07:57:36 +00:00
Chris Chan
10504eb9ed
Generate ImportMap from module path to imported dependencies (#3243) 2023-04-04 03:31:37 +00:00
Micha Reiser
cdbe2ee496
refactor: Introduce CacheKey trait (#3323)
This PR introduces a new `CacheKey` trait for types that can be used as a cache key.

I'm not entirely sure if this is worth the "overhead", but I was surprised to find `HashableHashSet` and got scared when I looked at the time complexity of the `hash` function. These implementations must be extremely slow in hashed collections.

I then searched for usages and quickly realized that only the cache uses these `Hash` implementations, where performance is less sensitive.

This PR introduces a new `CacheKey` trait to communicate the difference between a hash and computing a key for the cache. The new trait can be implemented for types that don't implement `Hash` for performance reasons, and we can define additional constraints on the implementation:  For example, we'll want to enforce portability when we add remote caching support. Using a different trait further allows us not to implement it for types without stable identities (e.g. pointers) or use other implementations than the standard hash function.
2023-03-03 18:29:49 +00:00
Charlie Marsh
18800c6884
Include file permissions in cache key (#3104) 2023-02-21 18:20:06 -05:00
Micha Reiser
cd8be8c0be
refactor: Introduce crates folder (#2088)
This PR introduces a new `crates` directory and moves all "product" crates into that folder. 

Part of #2059.
2023-02-05 16:47:48 -05:00
Renamed from ruff_cli/src/cache.rs (Browse further)