ruff/crates/ruff_linter/resources/test/fixtures/pyupgrade/UP012.py
Dhruv Manilawala b021b5babe
Use Tokens from parsed type annotation or parsed source (#11740)
## Summary

This PR fixes a bug where the checker would request tokens at an offset
that is invalid with respect to the source code.

Taking the source code from the linked issue as an example:
```py
relese_version :"0.0is 64"
```

Now, this isn't really a valid type annotation, but that's exactly the
case this PR addresses: regardless of whether the annotation is valid,
Ruff shouldn't panic.

The checker would visit the parsed type annotation (`0.0is 64`) and try
to detect any violations. Certain rule logic requests the tokens for
that annotation, but the request would fail because, measured against
the original source code, the lexer only produced a single `String`
token for the entire annotation. This worked before because the lexer
was invoked afresh for each rule's logic.
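
To see the mismatch concretely, here's a minimal illustration using
CPython's `tokenize` module as a stand-in for Ruff's lexer (Ruff's lexer
is written in Rust; nothing below is a Ruff API):

```py
import io
import tokenize

# Lexing the whole file: the annotation is a single opaque STRING token,
# so there are no tokens *inside* it to look up by source offset.
source = "x: '\"foo\".encode(\"utf-8\")'\n"
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok.type, repr(tok.string))

# Lexing the annotation's contents separately yields its own stream of
# tokens, whose offsets are relative to the string, not the file.
annotation = '"foo".encode("utf-8")\n'
for tok in tokenize.generate_tokens(io.StringIO(annotation).readline):
    print(tok.type, repr(tok.string))
```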

The solution is to store the parsed type annotation on the checker while
it's in a typing context and, when it's available, use the tokens from
that instead. This is enforced by a new API on the checker for getting
the tokens.
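
As a conceptual sketch of that API (in Python with hypothetical names;
the real checker is implemented in Rust), the accessor prefers the
stored annotation's tokens and falls back to the parsed source:

```py
from dataclasses import dataclass


@dataclass
class Checker:
    # Hypothetical model of the fix, not Ruff's actual types or fields.
    source_tokens: list
    annotation_tokens: list | None = None  # set while in a typing context

    def tokens(self) -> list:
        # Rules go through this single accessor, so the token offsets
        # always match the snippet (annotation or source) being checked.
        if self.annotation_tokens is not None:
            return self.annotation_tokens
        return self.source_tokens
```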

But this means that there are now two ways to get the tokens via the
checker API. I want to restrict this in a follow-up PR (#11741) by
exposing only `tokens` and `comment_ranges` as methods and restricting
access to the parsed source code.

fixes: #11736 

## Test Plan

- [x] Add a test case for the `F632` rule and update the snapshot
- [x] Check all affected rules
- [x] No ecosystem changes
2024-06-05 07:50:33 +00:00

# ASCII literals should be replaced by a bytes literal
"foo".encode("utf-8") # b"foo"
"foo".encode("u8") # b"foo"
"foo".encode() # b"foo"
"foo".encode("UTF8") # b"foo"
U"foo".encode("utf-8") # b"foo"
"foo".encode(encoding="utf-8") # b"foo"
"""
Lorem
Ipsum
""".encode(
"utf-8"
)
(
"Lorem "
"Ipsum".encode()
)
(
"Lorem " # Comment
"Ipsum".encode() # Comment
)
(
"Lorem " "Ipsum".encode()
)
# `encode` on variables should not be processed.
string = "hello there"
string.encode("utf-8")
bar = "bar"
f"foo{bar}".encode("utf-8")
encoding = "latin"
"foo".encode(encoding)
f"foo{bar}".encode(encoding)
f"{a=} {b=}".encode(
"utf-8",
)
# `encode` with custom args and kwargs should not be processed.
"foo".encode("utf-8", errors="replace")
"foo".encode("utf-8", "replace")
"foo".encode(errors="replace")
"foo".encode(encoding="utf-8", errors="replace")
# `encode` with custom args and kwargs on unicode should not be processed.
"unicode text©".encode("utf-8", errors="replace")
"unicode text©".encode("utf-8", "replace")
"unicode text©".encode(errors="replace")
"unicode text©".encode(encoding="utf-8", errors="replace")
# Unicode literals should only be stripped of default encoding.
"unicode text©".encode("utf-8") # "unicode text©".encode()
"unicode text©".encode()
"unicode text©".encode(encoding="UTF8") # "unicode text©".encode()
r"foo\o".encode("utf-8") # br"foo\o"
u"foo".encode("utf-8") # b"foo"
R"foo\o".encode("utf-8") # br"foo\o"
U"foo".encode("utf-8") # b"foo"
print("foo".encode()) # print(b"foo")
# `encode` on parenthesized strings.
(
"abc"
"def"
).encode()
((
"abc"
"def"
)).encode()
(f"foo{bar}").encode("utf-8")
(f"foo{bar}").encode(encoding="utf-8")
("unicode text©").encode("utf-8")
("unicode text©").encode(encoding="utf-8")
# Regression test for: https://github.com/astral-sh/ruff/issues/7455#issuecomment-1722459882
def _match_ignore(line):
    input=stdin and'\n'.encode()or None
# Not a valid type annotation but this test shouldn't result in a panic.
# Refer: https://github.com/astral-sh/ruff/issues/11736
x: '"foo".encode("utf-8")'