To enable ruff_dev to automatically generate the rule Markdown tables in
the README the ruff library contained the following function:
Linter::codes() -> Vec<RuleSelector>
which was slightly changed to `fn prefixes(&self) -> Prefixes` in
9dc66b5a65 to enable ruff_dev to split
up the Markdown tables for linters that have multiple prefixes
(pycodestyle has E & W, Pylint has PLC, PLE, PLR & PLW).
The definition of this method was however largely redundant with the
#[prefix] macro attributes in the Linter enum, which are used to
derive the Linter::parse_code function, used by the --explain command.
This commit removes the redundant Linter::prefixes by introducing a
same-named method with a different signature to the RuleNamespace trait:
fn prefixes(&self) -> &'static [&'static str];
As well as implementing IntoIterator<Rule> for &Linter. We extend the
extisting RuleNamespace proc macro to automatically derive both
implementations from the Linter enum definition.
To support the previously mentioned Markdown table splitting we
introduce a very simple hand-written method to the Linter impl:
fn categories(&self) -> Option<&'static [LinterCategory]>;
ParseCode was a fitting name since the trait only contained a single
parse_code method ... since we now however want to introduce an
additional `prefixes` method RuleNamespace is more fitting.
Using Ident as the key type is inconvenient since creating an Ident
requires the specification of a Span, which isn't actually used by
the Hash implementation of Ident.
If a file doesn't have a `package`, then it must both be in a directory that lacks an `__init__.py`, and a directory that _isn't_ marked as a namespace package.
Closes#2075.
- optional `prefix` argument for `add_plugin.py`
- rules directory instead of `rules.rs`
- pathlib syntax
- fix test case where code was added instead of name
Example:
```
python scripts/add_plugin.py --url https://pypi.org/project/example/1.0.0/ example --prefix EXA
python scripts/add_rule.py --name SecondRule --code EXA002 --linter example
python scripts/add_rule.py --name FirstRule --code EXA001 --linter example
python scripts/add_rule.py --name ThirdRule --code EXA003 --linter example
```
Note that it breaks compatibility with 'old style' plugins (generation works fine, but namespaces need to be changed):
```
python scripts/add_rule.py --name DoTheThing --code PLC999 --linter pylint
```
## Summary
The problem: given a (row, column) number (e.g., for a token in the AST), we need to be able to map it to a precise byte index in the source code. A while ago, we moved to `ropey` for this, since it was faster in practice (mostly, I think, because it's able to defer indexing). However, at some threshold of accesses, it becomes faster to index the string in advance, as we're doing here.
## Benchmark
It looks like this is ~3.6% slower for the default rule set, but ~9.3% faster for `--select ALL`.
**I suspect there's a strategy that would be strictly faster in both cases**, based on deferring even more computation (right now, we lazily compute these offsets, but we do it for the entire file at once, even if we only need some slice at the top), or caching the `ropey` lookups in some way.
Before:

After:

## Alternatives
I tried tweaking the `Vec::with_capacity` hints, and even trying `Vec::with_capacity(str_indices::lines_crlf::count_breaks(contents))` to do a quick scan of the number of lines, but that turned out to be slower.
Add tests.
Ensure that these cases are caught by ICN001:
```python
from xml.dom import minidom
from xml.dom.minidom import parseString
```
with config:
```toml
[tool.ruff.flake8-import-conventions.extend-aliases]
"dask.dataframe" = "dd"
"xml.dom.minidom" = "md"
"xml.dom.minidom.parseString" = "pstr"
```
This _did_ fix https://github.com/charliermarsh/ruff/issues/1894, but was a little premature. `toml` doesn't actually depend on `toml-edit` yet, and `v0.5.11` was mostly about deprecations AFAICT. So upgrading might solve that issue, but could introduce other incompatibilities, and I'd like to minimize churn. I expect that `toml` will have a new release soon, so we can revert this revert.
Reverts charliermarsh/ruff#2040.
The idea is the same as #1867. Avoids emitting `SIM102` twice for the following code:
```python
if a:
if b:
if c:
d
```
```
resources/test/fixtures/flake8_simplify/SIM102.py:1:1: SIM102 Use a single `if` statement instead of nested `if` statements
resources/test/fixtures/flake8_simplify/SIM102.py:2:5: SIM102 Use a single `if` statement instead of nested `if` statements
```
This PR adds the scaffolding files for `flake8-type-checking`, along with the simplest rule (`empty-type-checking-block`), just as an example to get us started.
See: #1785.
543865c96b introduced
RuleCode::origin() -> RuleOrigin generation via a macro, while that
signature now has been renamed to Rule::origin() -> Linter we actually
want to get rid of it since rules and linters shouldn't be this tightly
coupled (since one rule can exist in multiple linters).
Another disadvantage of the previous approach was that the prefixes
had to be defined in ruff_macros/src/prefixes.rs, which was easy to
miss when defining new linters in src/*, case in point
INP001 => violations::ImplicitNamespacePackage has in the meantime been
added without ruff_macros/src/prefixes.rs being updated accordingly
which resulted in `ruff --explain INP001` mistakenly reporting that the
rule belongs to isort (since INP001 starts with the isort prefix "I").
The derive proc macro introduced in this commit requires every variant
to have at least one #[prefix = "..."], eliminating such mistakes.
More accurate since the enum also encompasses:
* ALL (which isn't a prefix at all)
* fully-qualified rule codes (which aren't prefixes unless you say
they're a prefix to the empty string but that's not intuitive)
"origin" was accurate since ruff rules are currently always modeled
after one origin (except the Ruff-specific rules).
Since we however want to introduce a many-to-many mapping between codes
and rules, the term "origin" no longer makes much sense. Rules usually
don't have multiple origins but one linter implements a rule first and
then others implement it later (often inspired from another linter).
But we don't actually care much about where a rule originates from when
mapping multiple rule codes to one rule implementation, so renaming
RuleOrigin to Linter is less confusing with the many-to-many system.
Tracking issue: https://github.com/charliermarsh/ruff/issues/2024
Implementation for EXE003, EXE004 and EXE005 of `flake8-executable`
(shebang should contain "python", not have whitespace before, and should be on the first line)
Please take in mind that this is my first rust contribution.
The remaining EXE-rules are a combination of shebang (`lines.rs`), file permissions (`fs.rs`) and if-conditions (`ast.rs`). I was not able to find other rules that have interactions/dependencies in them. Any advice on how this can be best implemented would be very welcome.
For autofixing `EXE005`, I had in mind to _move_ the shebang line to the top op the file. This could be achieved by a combination of `Fix::insert` and `Fix::delete` (multiple fixes per diagnostic), or by implementing a dedicated `Fix::move`, or perhaps in other ways. For now I've left it out, but keen on hearing what you think would be most consistent with the package, and pointer where to start (if at all).
---
If you care about another testimonial:
`ruff` not only helps staying on top of the many excellent flake8 plugins and other Python code quality tools that are available, it also applies them at baffling speed.
(Planning to implement it soon for github.com/pandas-profiling/pandas-profiling (as largest contributor) and github.com/ing-bank/popmon.)
Rule described here: https://www.flake8rules.com/rules/E101.html
I tried to follow contributing guidelines closely, I've never worked with Rust before. Stumbled across Ruff a few days ago and would like to use it in our project, but we use a bunch of flake8 rules that are not yet implemented in ruff, so I decided to give it a go.
Following up on #2018/#2019 discussion, this moves the readme's development-related bits to `CONTRIBUTING.md` to avoid duplication, and fixes up the commands accordingly 😄