Commit graph

8 commits

Author SHA1 Message Date
Charlie Marsh
6fffde72e7
Use memchr for string lexing (#9888)
## Summary

On `main`, string lexing consists of walking through the string
character-by-character to search for the closing quote (with some
nuance: we also need to skip escaped characters, and error if we see
newlines in non-triple-quoted strings). This PR rewrites `lex_string` to
instead use `memchr` to search for the closing quote, which is
significantly faster. On my machine, at least, the `globals.py`
benchmark (which contains a lot of docstrings) gets 40% faster...

```text
lexer/numpy/globals.py  time:   [3.6410 µs 3.6496 µs 3.6585 µs]
                        thrpt:  [806.53 MiB/s 808.49 MiB/s 810.41 MiB/s]
                 change:
                        time:   [-40.413% -40.185% -39.984%] (p = 0.00 < 0.05)
                        thrpt:  [+66.623% +67.181% +67.822%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
lexer/unicode/pypinyin.py
                        time:   [12.422 µs 12.445 µs 12.467 µs]
                        thrpt:  [337.03 MiB/s 337.65 MiB/s 338.27 MiB/s]
                 change:
                        time:   [-9.4213% -9.1930% -8.9586%] (p = 0.00 < 0.05)
                        thrpt:  [+9.8401% +10.124% +10.401%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
lexer/pydantic/types.py time:   [107.45 µs 107.50 µs 107.56 µs]
                        thrpt:  [237.11 MiB/s 237.24 MiB/s 237.35 MiB/s]
                 change:
                        time:   [-4.0108% -3.7005% -3.3787%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4968% +3.8427% +4.1784%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
lexer/numpy/ctypeslib.py
                        time:   [46.123 µs 46.165 µs 46.208 µs]
                        thrpt:  [360.36 MiB/s 360.69 MiB/s 361.01 MiB/s]
                 change:
                        time:   [-19.313% -18.996% -18.710%] (p = 0.00 < 0.05)
                        thrpt:  [+23.016% +23.451% +23.935%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
lexer/large/dataset.py  time:   [231.07 µs 231.19 µs 231.33 µs]
                        thrpt:  [175.87 MiB/s 175.97 MiB/s 176.06 MiB/s]
                 change:
                        time:   [-2.0437% -1.7663% -1.4922%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5148% +1.7981% +2.0864%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
```
2024-02-08 17:23:06 +00:00
Charlie Marsh
f0d43dafcf
Ignore trailing quotes for unclosed l-brace errors (#9388)
## Summary

Given:

```python
F"{"ڤ
```

We try to locate the "unclosed left brace" error by subtracting the
quote size from the lexer offset -- so we subtract 1 from the end of the
source, which puts us in the middle of a Unicode character. I don't
think we should try to adjust the offset in this way, since there can be
content _after_ the quote. For example, with the advent of PEP 701, this
string could reasonably be fixed as:

```python
F"{"ڤ"}"
````

Closes https://github.com/astral-sh/ruff/issues/9379.
2024-01-04 05:00:55 +00:00
Andrew Gallant
3ce145c476
release: switch to Cargo's default (#9031)
This sets `lto = "thin"` instead of using "fat" LTO, and sets
`codegen-units = 16`. These are the defaults for Cargo's `release`
profile, and I think it may give us faster iteration times, especially
when benchmarking. The point of this PR is to see what kind of impact
this has on benchmarks. It is expected that benchmarks may regress to
some extent.

I did some quick ad hoc experiments to quantify this change in compile
times. Namely, I ran:

    cargo build --profile release -p ruff_cli

Then I ran

touch crates/ruff_python_formatter/src/expression/string/docstring.rs

(because that's where i've been working lately) and re-ran

    cargo build --profile release -p ruff_cli

This last command is what I timed, since it reflects how much time one
has to wait between making a change and getting a compiled artifact.

Here are my results:

* With status quo `release` profile, build takes 77s
* with `release` but `lto = "thin"`, build takes 41s
* with `release`, but `lto = false`, build takes 19s
* with `release`, but `lto = false` **and** `codegen-units = 16`, build
takes 7s
* with `release`, but `lto = "thin"` **and** `codegen-units = 16`, build
takes 16s (i believe this is the default `release` configuration)

This PR represents the last option. It's not the fastest to compile, but
it's nearly a whole minute faster! The idea is that with `codegen-units
= 16`, we still make use of parallelism, but keep _some_ level of LTO on
to try and re-gain what we lose by increasing the number of codegen
units.
2023-12-15 08:19:35 -05:00
Carter Snook
e2b5c6ac5f
perf(parser): use memchr for lexing comments (#8193) 2023-10-27 02:07:43 +01:00
Dhruv Manilawala
e62e245c61
Add support for PEP 701 (#7376)
## Summary

This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the
other individual PRs. The separate PRs were created for logic separation
and code reviews. Refer to each pull request for a detail description on
the change.

Refer to the PR description for the list of pull requests within this PR.

## Test Plan

### Formatter ecosystem checks

Explanation for the change in ecosystem check:
https://github.com/astral-sh/ruff/pull/7597#issue-1908878183

#### `main`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1631 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```

#### `dhruv/pep-701`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76051 |              1789 |              1632 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```
2023-09-29 02:55:39 +00:00
Micha Reiser
8ce138760a
Emit LexError for dedent to incorrect level (#7638) 2023-09-25 11:45:44 +01:00
Dhruv Manilawala
4d49d5e845
Add eat_char2 for the lexer (#6968)
## Summary

This PR adds a new helper method on the `Cursor` called `eat_char2`
which is similar to `eat_char` but accepts 2 characters instead of 1. It'll
`bump` the cursor twice if both characters are found on lookahead.

## Test Plan

`cargo test`
2023-08-29 17:18:02 +05:30
Micha Reiser
40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00