Mirror of https://github.com/astral-sh/ruff.git (synced 2025-09-29 13:25:17 +00:00)

## Summary

This PR fixes a bug where the checker would request the tokens for an offset that is invalid w.r.t. the source code. Taking the source code from the linked issue as an example:

```py
relese_version :"0.0is 64"
```

This isn't really a valid type annotation, but that's what this PR is fixing: regardless of whether the annotation is valid or not, Ruff shouldn't panic.

The checker would visit the parsed type annotation (`0.0is 64`) and try to detect any violations. Certain rule logic requests the tokens for that range, but this would fail because the lexer only has a single `String` token for it in the original source code. This worked before because the lexer was invoked again for each rule's logic.

The solution is to store the parsed type annotation on the checker if it's in a typing context and use the tokens from that instead when it's available. This is enforced by creating a new API on the checker to get the tokens.

However, this means there are now two ways to get the tokens via the checker API. I want to restrict this in a follow-up PR (#11741) to only expose `tokens` and `comment_ranges` as methods and to restrict access to the parsed source code.

fixes: #11736

## Test Plan

- [x] Add a test case for the `F632` rule and update the snapshot
- [x] Check all affected rules
- [x] No ecosystem changes
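To make the fix described above concrete, here is a minimal sketch of the idea in Rust. The `Checker` and `Tokens` types below are illustrative placeholders, not ruff's actual internals; the point is that rule logic obtains tokens through a single accessor, which prefers the tokens of the separately parsed type annotation whenever one is stored on the checker and falls back to the tokens of the original source file otherwise.

```rust
// Illustrative only: `Checker` and `Tokens` are stand-ins, not ruff's real types.
struct Tokens;

struct Checker<'a> {
    /// Tokens lexed from the original source file; for a string annotation this
    /// range is covered by a single `String` token.
    source_tokens: &'a Tokens,
    /// Tokens of the re-parsed string type annotation, stored only while the
    /// checker is visiting such an annotation in a typing context.
    annotation_tokens: Option<&'a Tokens>,
}

impl<'a> Checker<'a> {
    /// Single entry point for rule logic that needs tokens: prefer the tokens of
    /// the re-parsed annotation so offsets line up with the AST nodes being visited.
    fn tokens(&self) -> &'a Tokens {
        self.annotation_tokens.unwrap_or(self.source_tokens)
    }
}

fn main() {
    let source = Tokens;
    let annotation = Tokens;
    let checker = Checker {
        source_tokens: &source,
        annotation_tokens: Some(&annotation),
    };
    // Rule logic goes through the accessor and gets the annotation's tokens here.
    let _tokens = checker.tokens();
}
```

With a single accessor like this, rule logic no longer has to re-lex the original source at an offset that only exists inside the annotation's own parse, which is what triggered the panic.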
86 lines · 2.1 KiB · Python
```py
# ASCII literals should be replaced by a bytes literal
"foo".encode("utf-8")  # b"foo"
"foo".encode("u8")  # b"foo"
"foo".encode()  # b"foo"
"foo".encode("UTF8")  # b"foo"
U"foo".encode("utf-8")  # b"foo"
"foo".encode(encoding="utf-8")  # b"foo"
"""
Lorem

Ipsum
""".encode(
    "utf-8"
)
(
    "Lorem "
    "Ipsum".encode()
)
(
    "Lorem "  # Comment
    "Ipsum".encode()  # Comment
)
(
    "Lorem " "Ipsum".encode()
)

# `encode` on variables should not be processed.
string = "hello there"
string.encode("utf-8")

bar = "bar"
f"foo{bar}".encode("utf-8")
encoding = "latin"
"foo".encode(encoding)
f"foo{bar}".encode(encoding)
f"{a=} {b=}".encode(
    "utf-8",
)

# `encode` with custom args and kwargs should not be processed.
"foo".encode("utf-8", errors="replace")
"foo".encode("utf-8", "replace")
"foo".encode(errors="replace")
"foo".encode(encoding="utf-8", errors="replace")

# `encode` with custom args and kwargs on unicode should not be processed.
"unicode text©".encode("utf-8", errors="replace")
"unicode text©".encode("utf-8", "replace")
"unicode text©".encode(errors="replace")
"unicode text©".encode(encoding="utf-8", errors="replace")

# Unicode literals should only be stripped of default encoding.
"unicode text©".encode("utf-8")  # "unicode text©".encode()
"unicode text©".encode()
"unicode text©".encode(encoding="UTF8")  # "unicode text©".encode()

r"foo\o".encode("utf-8")  # br"foo\o"
u"foo".encode("utf-8")  # b"foo"
R"foo\o".encode("utf-8")  # br"foo\o"
U"foo".encode("utf-8")  # b"foo"
print("foo".encode())  # print(b"foo")

# `encode` on parenthesized strings.
(
    "abc"
    "def"
).encode()

((
    "abc"
    "def"
)).encode()

(f"foo{bar}").encode("utf-8")
(f"foo{bar}").encode(encoding="utf-8")
("unicode text©").encode("utf-8")
("unicode text©").encode(encoding="utf-8")


# Regression test for: https://github.com/astral-sh/ruff/issues/7455#issuecomment-1722459882
def _match_ignore(line):
    input=stdin and'\n'.encode()or None

# Not a valid type annotation but this test shouldn't result in a panic.
# Refer: https://github.com/astral-sh/ruff/issues/11736
x: '"foo".encode("utf-8")'
```