[red-knot] Add support for knot check <paths> (#16375)

## Summary

This PR adds support for an optional list of paths that should be
checked to `knot check`.

E.g. to only check the `src` directory

```sh
knot check src
```

The default is to check all files in the project but users can reduce
the included files by specifying one or multiple optional paths.

The main two challenges with adding this feature were:

* We now need to show an error when one of the provided paths doesn't
exist. That's why this PR now collects errors from the project file
indexing phase and adds them to the output diagnostics. The diagnostic
looks similar to ruffs (see CLI test)
* The CLI should pick up new files added to included folders. For
example, `knot check src --watch` should pick up new files that are
added to the `src` folder. This requires that we now filter the files
before adding them to the project. This is a good first step to
supporting `include` and `exclude`.


The PR makes two simplifications:

1. I didn't test the changes with case-insensitive file systems. We may
need to do some extra path normalization to support those well. See
https://github.com/astral-sh/ruff/issues/16400
2. Ideally, we'd accumulate the IO errors from the initial indexing
phase and subsequent incremental indexing operations. For example, we
should preserve the IO diagnostic for a non existing `test.py` if it was
specified as an explicit CLI argument until the file gets created and we
should show it again when the file gets deleted. However, this is
somewhat complicated because we'd need to track which files we revisited
(or were removed because the entire directory is gone). I considered
this too low a priority as it's worth dealing with right now.

The implementation doesn't support symlinks within the project but that
is the same as Ruff and is unchanged from before this PR.



Closes https://github.com/astral-sh/ruff/issues/14193

## Test Plan

Added CLI and file watching integration tests. Manually testing.
This commit is contained in:
Micha Reiser 2025-03-03 12:59:56 +00:00 committed by GitHub
parent 5d56c2e877
commit c4578162d5
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 1065 additions and 418 deletions

View file

@ -1,6 +1,7 @@
#![allow(clippy::ref_option)]
use crate::metadata::options::OptionDiagnostic;
use crate::walk::{ProjectFilesFilter, ProjectFilesWalker};
pub use db::{Db, ProjectDatabase};
use files::{Index, Indexed, IndexedFiles};
use metadata::settings::Settings;
@ -9,23 +10,23 @@ use red_knot_python_semantic::lint::{LintRegistry, LintRegistryBuilder, RuleSele
use red_knot_python_semantic::register_lints;
use red_knot_python_semantic::types::check_types;
use ruff_db::diagnostic::{Diagnostic, DiagnosticId, ParseDiagnostic, Severity, Span};
use ruff_db::files::{system_path_to_file, File};
use ruff_db::files::File;
use ruff_db::parsed::parsed_module;
use ruff_db::source::{source_text, SourceTextError};
use ruff_db::system::walk_directory::WalkState;
use ruff_db::system::{FileType, SystemPath};
use ruff_python_ast::PySourceType;
use rustc_hash::{FxBuildHasher, FxHashSet};
use ruff_db::system::{SystemPath, SystemPathBuf};
use rustc_hash::FxHashSet;
use salsa::Durability;
use salsa::Setter;
use std::borrow::Cow;
use std::sync::Arc;
use thiserror::Error;
pub mod combine;
mod db;
mod files;
pub mod metadata;
mod walk;
pub mod watch;
pub static DEFAULT_LINT_REGISTRY: std::sync::LazyLock<LintRegistry> =
@ -71,6 +72,30 @@ pub struct Project {
#[return_ref]
pub settings: Settings,
/// The paths that should be included when checking this project.
///
/// The default (when this list is empty) is to include all files in the project root
/// (that satisfy the configured include and exclude patterns).
/// However, it's sometimes desired to only check a subset of the project, e.g. to see
/// the diagnostics for a single file or a folder.
///
/// This list gets initialized by the paths passed to `knot check <paths>`
///
/// ## How is this different from `open_files`?
///
/// The `included_paths` is closely related to `open_files`. The only difference is that
/// `open_files` is already a resolved set of files whereas `included_paths` is only a list of paths
/// that are resolved to files by indexing them. The other difference is that
/// new files added to any directory in `included_paths` will be indexed and added to the project
/// whereas `open_files` needs to be updated manually (e.g. by the IDE).
///
/// In short, `open_files` is cheaper in contexts where the set of files is known, like
/// in an IDE when the user only wants to check the open tabs. This could be modeled
/// with `included_paths` too but it would require an explicit walk dir step that's simply unnecessary.
#[default]
#[return_ref]
included_paths_list: Vec<SystemPathBuf>,
/// Diagnostics that were generated when resolving the project settings.
#[return_ref]
settings_diagnostics: Vec<OptionDiagnostic>,
@ -106,6 +131,16 @@ impl Project {
self.settings(db).to_rules()
}
/// Returns `true` if `path` is both part of the project and included (see `included_paths_list`).
///
/// Unlike [Self::files], this method does not respect `.gitignore` files. It only checks
/// the project's include and exclude settings as well as the paths that were passed to `knot check <paths>`.
/// This means, that this method is an over-approximation of `Self::files` and may return `true` for paths
/// that won't be included when checking the project because they're ignored in a `.gitignore` file.
pub fn is_path_included(self, db: &dyn Db, path: &SystemPath) -> bool {
ProjectFilesFilter::from_project(db, self).is_included(path)
}
pub fn reload(self, db: &mut dyn Db, metadata: ProjectMetadata) {
tracing::debug!("Reloading project");
assert_eq!(self.root(db), metadata.root());
@ -140,6 +175,13 @@ impl Project {
diagnostic
}));
let files = ProjectFiles::new(db, self);
diagnostics.extend(files.diagnostics().iter().cloned().map(|diagnostic| {
let diagnostic: Box<dyn Diagnostic> = Box::new(diagnostic);
diagnostic
}));
let result = Arc::new(std::sync::Mutex::new(diagnostics));
let inner_result = Arc::clone(&result);
@ -147,7 +189,6 @@ impl Project {
let project_span = project_span.clone();
rayon::scope(move |scope| {
let files = ProjectFiles::new(&db, self);
for file in &files {
let result = inner_result.clone();
let db = db.clone();
@ -207,6 +248,30 @@ impl Project {
removed
}
pub fn set_included_paths(self, db: &mut dyn Db, paths: Vec<SystemPathBuf>) {
tracing::debug!("Setting included paths: {paths}", paths = paths.len());
self.set_included_paths_list(db).to(paths);
self.reload_files(db);
}
/// Returns the paths that should be checked.
///
/// The default is to check the entire project in which case this method returns
/// the project root. However, users can specify to only check specific sub-folders or
/// even files of a project by using `knot check <paths>`. In that case, this method
/// returns the provided absolute paths.
///
/// Note: The CLI doesn't prohibit users from specifying paths outside the project root.
/// This can be useful to check arbitrary files, but it isn't something we recommend.
/// We should try to support this use case but it's okay if there are some limitations around it.
fn included_paths_or_root(self, db: &dyn Db) -> &[SystemPathBuf] {
match &**self.included_paths_list(db) {
[] => std::slice::from_ref(&self.metadata(db).root),
paths => paths,
}
}
/// Returns the open files in the project or `None` if the entire project should be checked.
pub fn open_files(self, db: &dyn Db) -> Option<&FxHashSet<File>> {
self.open_fileset(db).as_deref()
@ -289,6 +354,17 @@ impl Project {
index.insert(file);
}
/// Replaces the diagnostics from indexing the project files with `diagnostics`.
///
/// This is a no-op if the project files haven't been indexed yet.
pub fn replace_index_diagnostics(self, db: &mut dyn Db, diagnostics: Vec<IOErrorDiagnostic>) {
let Some(mut index) = IndexedFiles::indexed_mut(db, self) else {
return;
};
index.set_diagnostics(diagnostics);
}
/// Returns the files belonging to this project.
pub fn files(self, db: &dyn Db) -> Indexed<'_> {
let files = self.file_set(db);
@ -296,12 +372,14 @@ impl Project {
let indexed = match files.get() {
Index::Lazy(vacant) => {
let _entered =
tracing::debug_span!("Project::index_files", package = %self.name(db))
tracing::debug_span!("Project::index_files", project = %self.name(db))
.entered();
let files = discover_project_files(db, self);
tracing::info!("Found {} files in project `{}`", files.len(), self.name(db));
vacant.set(files)
let walker = ProjectFilesWalker::new(db);
let (files, diagnostics) = walker.collect_set(db);
tracing::info!("Indexed {} file(s)", files.len());
vacant.set(files, diagnostics)
}
Index::Indexed(indexed) => indexed,
};
@ -327,8 +405,8 @@ fn check_file_impl(db: &dyn Db, file: File) -> Vec<Box<dyn Diagnostic>> {
if let Some(read_error) = source.read_error() {
diagnostics.push(Box::new(IOErrorDiagnostic {
file,
error: read_error.clone(),
file: Some(file),
error: read_error.clone().into(),
}));
return diagnostics;
}
@ -355,53 +433,6 @@ fn check_file_impl(db: &dyn Db, file: File) -> Vec<Box<dyn Diagnostic>> {
diagnostics
}
fn discover_project_files(db: &dyn Db, project: Project) -> FxHashSet<File> {
let paths = std::sync::Mutex::new(Vec::new());
db.system().walk_directory(project.root(db)).run(|| {
Box::new(|entry| {
match entry {
Ok(entry) => {
// Skip over any non python files to avoid creating too many entries in `Files`.
match entry.file_type() {
FileType::File => {
if entry
.path()
.extension()
.and_then(PySourceType::try_from_extension)
.is_some()
{
let mut paths = paths.lock().unwrap();
paths.push(entry.into_path());
}
}
FileType::Directory | FileType::Symlink => {}
}
}
Err(error) => {
// TODO Handle error
tracing::error!("Failed to walk path: {error}");
}
}
WalkState::Continue
})
});
let paths = paths.into_inner().unwrap();
let mut files = FxHashSet::with_capacity_and_hasher(paths.len(), FxBuildHasher);
for path in paths {
// If this returns `None`, then the file was deleted between the `walk_directory` call and now.
// We can ignore this.
if let Ok(file) = system_path_to_file(db.upcast(), &path) {
files.insert(file);
}
}
files
}
#[derive(Debug)]
enum ProjectFiles<'a> {
OpenFiles(&'a FxHashSet<File>),
@ -416,6 +447,13 @@ impl<'a> ProjectFiles<'a> {
ProjectFiles::Indexed(project.files(db))
}
}
fn diagnostics(&self) -> &[IOErrorDiagnostic] {
match self {
ProjectFiles::OpenFiles(_) => &[],
ProjectFiles::Indexed(indexed) => indexed.diagnostics(),
}
}
}
impl<'a> IntoIterator for &'a ProjectFiles<'a> {
@ -448,10 +486,10 @@ impl Iterator for ProjectFilesIter<'_> {
}
}
#[derive(Debug)]
#[derive(Debug, Clone)]
pub struct IOErrorDiagnostic {
file: File,
error: SourceTextError,
file: Option<File>,
error: IOErrorKind,
}
impl Diagnostic for IOErrorDiagnostic {
@ -464,7 +502,7 @@ impl Diagnostic for IOErrorDiagnostic {
}
fn span(&self) -> Option<Span> {
Some(Span::from(self.file))
self.file.map(Span::from)
}
fn severity(&self) -> Severity {
@ -472,6 +510,15 @@ impl Diagnostic for IOErrorDiagnostic {
}
}
#[derive(Error, Debug, Clone)]
enum IOErrorKind {
#[error(transparent)]
Walk(#[from] walk::WalkError),
#[error(transparent)]
SourceText(#[from] SourceTextError),
}
#[cfg(test)]
mod tests {
use crate::db::tests::TestDb;