ruff/scripts
Charlie Marsh 574c0e0105
Use match instead of phf for confusable lookup (#5953)
I don't know whether we want to make this change but here's some data...

Binary size:

- `main`: 30,384
- `charlie/match-phf`: 30,416

llvm-lines:

- `main`: 1,784,148
- `charlie/match-phf`: 1,789,877

llvm-lines and binary size are both unchanged (or change by < 5) when
moving from `u8` to `u32` return types, and even when moving to `char`
keys and values. I didn't expect this, but I'm not very knowledgeable on
this topic.

Performance:

```
Confusables/match/src   time:   [4.9102 µs 4.9352 µs 4.9777 µs]
                        change: [+1.7469% +2.2421% +2.8710%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
Confusables/match-with-skip/src
                        time:   [2.0676 µs 2.0945 µs 2.1317 µs]
                        change: [+0.9384% +1.6000% +2.3920%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
Confusables/phf/src     time:   [31.087 µs 31.188 µs 31.305 µs]
                        change: [+1.9262% +2.2188% +2.5496%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  6 (6.00%) high mild
  6 (6.00%) high severe
Confusables/phf-with-skip/src
                        time:   [2.0470 µs 2.0486 µs 2.0502 µs]
                        change: [-0.3093% -0.1446% +0.0106%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
```

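The `-with-skip` variants add our optimization, which first checks
whether the character is ASCII. Without the skip, `match` is roughly 6x
faster than PHF (about 4.9 µs vs. 31.2 µs), but it tends not to matter
since almost all source code is ASCII anyway: with the skip, the two are
nearly identical (about 2.09 µs vs. 2.05 µs).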
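
As a rough illustration of the two ideas above (a `match`-based lookup and
the ASCII skip), here is a minimal sketch. The function names and the three
table entries are hypothetical and chosen for illustration only. They are
not the generated table, and the real lookup may differ in key and value
types.

```rust
/// Hypothetical match-based lookup: maps a code point to the ASCII byte it
/// visually resembles. The three arms are illustrative, not the real table.
fn confusable_match(c: u32) -> Option<u8> {
    match c {
        0x0410 => Some(b'A'), // CYRILLIC CAPITAL LETTER A
        0x03BF => Some(b'o'), // GREEK SMALL LETTER OMICRON
        0x2212 => Some(b'-'), // MINUS SIGN
        _ => None,
    }
}

/// The "-with-skip" idea: return early for ASCII input so that the lookup
/// only runs for the (rare) non-ASCII characters in typical source code.
fn confusable_with_skip(c: char) -> Option<char> {
    if c.is_ascii() {
        return None;
    }
    confusable_match(c as u32).map(char::from)
}
```

With this shape, an ASCII character such as `'A'` never reaches the `match`,
which is consistent with the lookup strategy mattering little once the skip
is in place.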
2023-07-24 02:23:36 +00:00
benchmarks markdownlint: enforce 100 char max length (#4698) 2023-05-28 22:45:56 -04:00
_utils.py Use __future__ imports in scripts (#5301) 2023-06-22 11:40:16 -04:00
add_plugin.py Port over some fixes from #3747 (#5940) 2023-07-21 03:55:01 +00:00
add_rule.py Create snake_case file if linter is Pylint (#5948) 2023-07-21 22:13:43 -04:00
check_docs_formatted.py [perflint] Add PERF401 and PERF402 rules (#5298) 2023-07-03 04:03:09 +00:00
check_ecosystem.py Use permalinks in ecosystem diff references (#5704) 2023-07-12 01:26:37 -05:00
Dockerfile.ecosystem Remove outdated feature flag from Dockerfile.ecosystem (#4620) 2023-05-24 08:19:08 +00:00
ecosystem_all_check.py Update scripts/ecosystem_all_check.sh (#5737) 2023-07-13 15:25:22 +02:00
ecosystem_all_check.sh Update scripts/ecosystem_all_check.sh (#5737) 2023-07-13 15:25:22 +02:00
ecosystem_all_check_entrypoint.sh Make ecosystem all check more generic (#4629) 2023-05-24 16:26:23 +02:00
generate_known_standard_library.py Remove HashMap and HashSet for known-standard-library detection (#5345) 2023-06-23 19:59:03 +00:00
generate_mkdocs.py Move some MkDocs responsibilities around (#5542) 2023-07-05 22:06:01 +00:00
pyproject.toml Check for Any in other types for ANN401 (#5601) 2023-07-13 18:19:27 +05:30
transform_readme.py Use __future__ imports in scripts (#5301) 2023-06-22 11:40:16 -04:00
update_ambiguous_characters.py Use match instead of phf for confusable lookup (#5953) 2023-07-24 02:23:36 +00:00
update_schemastore.py Use SSH clones in update_schemastore.py (#5322) 2023-06-23 09:50:10 -04:00