ENH copyright-notice: check in the first 4096 bytes instead of 1024 (#11927)

<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary
related to https://github.com/astral-sh/ruff/issues/5306

The check right now only checks in the first 1024 bytes, and that's
really not enough when there's a docstring at the beginning of a file.

A more proper fix might be needed, which might be more complex (and I
don't have the `rust` skills to implement that). But this temporary
"fix" might enable more users to use this.

Context: We want to use this rule in
https://github.com/scikit-learn/scikit-learn/ and we got blocked because
of this hardcoded rule (which TBH took us quite a while to figure out
why it was failing since it's not documented).

## Test Plan

This is already kinda tested, modified the test for the new byte number.

<!-- How was it tested? -->
This commit is contained in:
Adrin Jalali 2024-06-18 18:04:34 +02:00 committed by GitHub
parent 1d73d60bd3
commit 2e7c3454e0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 34 additions and 2 deletions

View file

@ -295,6 +295,36 @@ import os
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Content Content Content Content Content Content Content Content Content Content
# Copyright 2023
"

View file

@ -8,6 +8,8 @@ use crate::settings::LinterSettings;
/// ## What it does
/// Checks for the absence of copyright notices within Python files.
///
/// Note that this check only searches within the first 4096 bytes of the file.
///
/// ## Why is this bad?
/// In some codebases, it's common to have a license header at the top of every
/// file. This rule ensures that the license header is present.
@ -31,8 +33,8 @@ pub(crate) fn missing_copyright_notice(
return None;
}
// Only search the first 1024 bytes in the file.
let contents = locator.up_to(locator.floor_char_boundary(TextSize::new(1024)));
// Only search the first 4096 bytes in the file.
let contents = locator.up_to(locator.floor_char_boundary(TextSize::new(4096)));
// Locate the copyright notice.
if let Some(match_) = settings.flake8_copyright.notice_rgx.find(contents) {