uv/crates/pep508-rs
konsti d9dbb8a4af
Support conflicting URL in separate forks (#4435)
Downstack PR: #4481

## Introduction

We support forking the dependency resolution to support conflicting
registry requirements for different platforms, say on package range is
required for an older python version while a newer is required for newer
python versions, or dependencies that are different per platform. We
need to extend this support to direct URL requirements.

```toml
dependencies = [
  "iniconfig @ 62565a6e1c/iniconfig-2.0.0-py3-none-any.whl ; python_version >= '3.12'",
  "iniconfig @ b3c12c6d70/iniconfig-1.1.1-py2.py3-none-any.whl ; python_version < '3.12'"
]
```

This did not work because `Urls` was built on the assumption that there
is a single allowed URL per package. We collect all allowed URL ahead of
resolution by following direct URL dependencies (including path
dependencies) transitively, i.e. a registry distribution can't require a
URL.

## The same package can have Registry and URL requirements

Consider the following two cases:

requirements.in:
```text
werkzeug==2.0.0
werkzeug @ 960bb4017c/Werkzeug-2.0.0-py3-none-any.whl
```
pyproject.toml:
```toml
dependencies = [
  "iniconfig == 1.1.1 ; python_version < '3.12'",
  "iniconfig @ git+https://github.com/pytest-dev/iniconfig@93f5930e668c0d1ddf4597e38dd0dea4e2665e7a ; python_version >= '3.12'",
]
```

In the first case, we want the URL to override the registry dependency,
in the second case we want to fork and have one branch use the registry
and the other the URL. We have to know about this in
`PubGrubRequirement::from_registry_requirement`, but we only fork after
the current method.

Consider the following case too:

a:
```
c==1.0.0
b @ https://b.zip
```
b:
```
c @ https://c_new.zip ; python_version >= '3.12'",
c @ https://c_old.zip ; python_version < '3.12'",
```

When we convert the requirements of `a`, we can't know the url of `c`
yet. The solution is to remove the `Url` from `PubGrubPackage`: The
`Url` is redundant with `PackageName`, there can be only one url per
package name per fork. We now do the following: We track the urls from
requirements in `PubGrubDependency`. After forking, we call
`add_package_version_dependencies` where we apply override URLs, check
if the URL is allowed and check if the url is unique in this fork. When
we request a distribution, we ask the fork urls for the real URL. Since
we prioritize url dependencies over registry dependencies and skip
packages with `Urls` entries in pre-visiting, we know that when fetching
a package, we know if it has a url or not.

## URL conflicts

pyproject.toml (invalid):
```toml
dependencies = [
  "iniconfig @ e96292c7f7/iniconfig-1.1.0.tar.gz",
  "iniconfig @ b3c12c6d70/iniconfig-1.1.1-py2.py3-none-any.whl ; python_version < '3.12'",
  "iniconfig @ 62565a6e1c/iniconfig-2.0.0-py3-none-any.whl ; python_version >= '3.12'",
]
```

On the fork state, we keep `ForkUrls` that check for conflicts after
forking, rejecting the third case because we added two packages of the
same name with different URLs.

We need to flatten out the requirements before transformation into
pubgrub requirements to get the full list of other requirements which
may contain a URL, which was changed in a previous PR: #4430.

## Complex Example

a:
```toml
dependencies = [
  # Force a split
  "anyio==4.3.0 ; python_version >= '3.12'",
  "anyio==4.2.0 ; python_version < '3.12'",
  # Include URLs transitively
  "b"
]
```
b:
```toml
dependencies = [
  # Only one is used in each split.
  "b1 ; python_version < '3.12'",
  "b2 ; python_version >= '3.12'",
  "b3 ; python_version >= '3.12'",
]
```
b1:
```toml
dependencies = [
  "iniconfig @ b3c12c6d70/iniconfig-1.1.1-py2.py3-none-any.whl",
]
```
b2:
```toml
dependencies = [
  "iniconfig @ 62565a6e1c/iniconfig-2.0.0-py3-none-any.whl",
]
```
b3:
```toml
dependencies = [
  "iniconfig @ e96292c7f7/iniconfig-1.1.0.tar.gz",
]
```

In this example, all packages are url requirements (directory
requirements) and the root package is `a`. We first split on `a`, `b`
being in each split. In the first fork, we reach `b1`, the fork URLs are
empty, we insert the iniconfig 1.1.1 URL, and then we skip over `b2` and
`b3` since the mark is disjoint with the fork markers. In the second
fork, we skip over `b1`, visit `b2`, insert the iniconfig 2.0.0 URL into
the again empty fork URLs, then visit `b3` and try to insert the
iniconfig 1.1.0 URL. At this point we find a conflict for the iniconfig
URL and error.

## Closing

The git tests are slow, but they make the best example for different URL
types i could find.

Part of #3927. This PR does not handle `Locals` or pre-releases yet.
2024-06-26 13:58:23 +02:00
..
src Support conflicting URL in separate forks (#4435) 2024-06-26 13:58:23 +02:00
Cargo.lock Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00
Cargo.toml Enable workspace lint configuration in remaining crates (#4329) 2024-06-18 03:02:28 +00:00
License-Apache Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00
License-BSD Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00
Readme.md Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00

Dependency specifiers (PEP 508) in Rust

Crates.io PyPI

A library for python dependency specifiers, better known as PEP 508.

Usage

In Rust

use std::str::FromStr;
use pep508_rs::Requirement;

let marker = r#"requests [security,tests] >= 2.8.1, == 2.8.* ; python_version > "3.8""#;
let dependency_specification = Requirement::from_str(marker).unwrap();
assert_eq!(dependency_specification.name, "requests");
assert_eq!(dependency_specification.extras, Some(vec!["security".to_string(), "tests".to_string()]));

In Python

from pep508_rs import Requirement

requests = Requirement(
    'requests [security,tests] >= 2.8.1, == 2.8.* ; python_version > "3.8"'
)
assert requests.name == "requests"
assert requests.extras == ["security", "tests"]
assert [str(i) for i in requests.version_or_url] == [">= 2.8.1", "== 2.8.*"]

Python bindings are built with maturin, but you can also use the normal pip install .

Version and VersionSpecifier from pep440_rs are reexported to avoid type mismatches.

Markers

Markers allow you to install dependencies only in specific environments (python version, operating system, architecture, etc.) or when a specific feature is activated. E.g. you can say importlib-metadata ; python_version < "3.8" or itsdangerous (>=1.1.0) ; extra == 'security'. Unfortunately, the marker grammar has some oversights (e.g. https://github.com/pypa/packaging.python.org/pull/1181) and the design of comparisons (PEP 440 comparisons with lexicographic fallback) leads to confusing outcomes. This implementation tries to carefully validate everything and emit warnings whenever bogus comparisons with unintended semantics are made.

In python, warnings are by default sent to the normal python logging infrastructure:

from pep508_rs import Requirement, MarkerEnvironment

env = MarkerEnvironment.current()
assert not Requirement("numpy; extra == 'science'").evaluate_markers(env, [])
assert Requirement("numpy; extra == 'science'").evaluate_markers(env, ["science"])
assert not Requirement(
    "numpy; extra == 'science' and extra == 'arrays'"
).evaluate_markers(env, ["science"])
assert Requirement(
    "numpy; extra == 'science' or extra == 'arrays'"
).evaluate_markers(env, ["science"])
from pep508_rs import Requirement, MarkerEnvironment

env = MarkerEnvironment.current()
Requirement("numpy; python_version >= '3.9.'").evaluate_markers(env, [])
# This will log: 
# "Expected PEP 440 version to compare with python_version, found '3.9.', "
# "evaluating to false: Version `3.9.` doesn't match PEP 440 rules"