Add support for URL dependencies (#251)

## Summary

This PR adds support for resolving and installing dependencies via
direct URLs, like:

```
werkzeug @ 960bb4017c/Werkzeug-2.0.0-py3-none-any.whl
```

These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.

Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:

```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
    /// The distribution exists in a registry, like `PyPI`.
    Registry(PackageName, Version, File),
    /// The distribution exists at an arbitrary URL.
    Url(PackageName, Url),
}
```

In the resolver, we now allow packages to take on an extra, optional
`Url` field:

```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
    Root,
    Package(
        PackageName,
        Option<DistInfoName>,
        #[derivative(PartialEq = "ignore")]
        #[derivative(PartialOrd = "ignore")]
        #[derivative(Hash = "ignore")]
        Option<Url>,
    ),
}
```

However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:

```
flask==3.0.0
werkzeug @ 254c3e9b5f/werkzeug-3.0.1-py3-none-any.whl
```

There are a couple limitations in the current approach:

- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
This commit is contained in:
Charlie Marsh 2023-11-01 06:21:44 -07:00 committed by GitHub
parent fa9f8df396
commit 2652caa3e3
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
44 changed files with 1334 additions and 369 deletions

View file

@ -3,6 +3,7 @@ use std::str::FromStr;
use pep440_rs::Version;
use thiserror::Error;
use url::Url;
use platform_tags::Tags;
@ -134,6 +135,29 @@ impl WheelFilename {
}
}
impl TryFrom<&Url> for WheelFilename {
type Error = WheelFilenameError;
fn try_from(url: &Url) -> Result<Self, Self::Error> {
let filename = url
.path_segments()
.ok_or_else(|| {
WheelFilenameError::InvalidWheelFileName(
url.to_string(),
"URL must have a path".to_string(),
)
})?
.last()
.ok_or_else(|| {
WheelFilenameError::InvalidWheelFileName(
url.to_string(),
"URL must contain a filename".to_string(),
)
})?;
Self::from_str(filename)
}
}
#[derive(Error, Debug)]
pub enum WheelFilenameError {
#[error("The wheel filename \"{0}\" is invalid: {1}")]