Commit graph

102 commits

Author SHA1 Message Date
Andrew Gallant
63f7f65190
change global allocator to jemalloc (and mimalloc on Windows) (#399)
This copies the allocator configuration used in the Ruff project. In
particular, this gives us an instant 10% win when resolving the top 1K
PyPI packages:

    $ hyperfine \
"./target/profiling/puffin-dev-main resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null" \
"./target/profiling/puffin-dev resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null"
Benchmark 1: ./target/profiling/puffin-dev-main resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null
Time (mean ± σ): 974.2 ms ± 26.4 ms [User: 17503.3 ms, System: 2205.3
ms]
      Range (min … max):   943.5 ms … 1015.9 ms    10 runs

Benchmark 2: ./target/profiling/puffin-dev resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null
Time (mean ± σ): 883.1 ms ± 23.3 ms [User: 14626.1 ms, System: 2542.2
ms]
      Range (min … max):   849.5 ms … 916.9 ms    10 runs

    Summary
'./target/profiling/puffin-dev resolve-many --cache-dir
cache-docker-no-build --no-build pypi_top_8k_flat.txt --limit 1000 2>
/dev/null' ran
1.10 ± 0.04 times faster than './target/profiling/puffin-dev-main
resolve-many --cache-dir cache-docker-no-build --no-build
pypi_top_8k_flat.txt --limit 1000 2> /dev/null'

I was moved to do this because I noticed `malloc`/`free` taking up a
fairly sizeable percentage of time during light profiling.

As is becoming a pattern, it will be easier to review this
commit-by-commit.

Ref #396 (wouldn't call this issue fixed)

-----

I did also try adding a `smallvec` optimization to the
`Version::release` field, but it didn't bare any fruit. I still think
there is more to explore since the results I observed don't quite line
up with what I expect. (So probably either my mental model is off or my
measurement process is flawed.) You can see that attempt with a little
more explanation here:
f9528b4ecd

In the course of adding the `smallvec` optimization, I also shrunk the
`Version` fields from a `usize` to a `u32`. They should at least be a
fixed size integer since version numbers aren't used to index memory,
and I shrunk it to `u32` since it seems reasonable to assume that all
version numbers will be smaller than `2^32`.
2023-11-10 14:48:59 -05:00
konsti
d8408b1783
Add source to failing metadata parsing (#387)
Before:
```
cargo run --bin puffin-dev -q -- resolve-cli "transformers[accelerate, agents, all, audio, codecarbon, deepspeed, deepspeed-testing, dev, dev-tensorflow, dev-torch, docs, docs_specific, flax, flax-speech, ftfy, integrations, ja, modelcreation, onnx, onnxruntime, optuna, quality, ray, retrieval, sagemaker, sentencepiece, serving, sigopt, sklearn, speech, testing, tf, tf-cpu, tf-speech, timm, tokenizers, torch, torch-speech, torch-vision, torchhub, video, vision]"
puffin-dev failed
  Caused by: No solution found when resolving: transformers[accelerate,agents,all,audio,codecarbon,deepspeed,deepspeed-testing,dev,dev-tensorflow,dev-torch,docs,docs-specific,flax,flax-speech,ftfy,integrations,ja,modelcreation,onnx,onnxruntime,optuna,quality,ray,retrieval,sagemaker,sentencepiece,serving,sigopt,sklearn,speech,testing,tf,tf-cpu,tf-speech,timm,tokenizers,torch,torch-speech,torch-vision,torchhub,video,vision]
  Caused by: Not a valid package or extra name: ".none". Names must start and end with a letter or digit and may only contain -, _, ., and alphanumeric characters
```
After:
```
cargo run --bin puffin-dev -q -- resolve-cli "transformers[accelerate, agents, all, audio, codecarbon, deepspeed, deepspeed-testing, dev, dev-tensorflow, dev-torch, docs, docs_specific, flax, flax-speech, ftfy, integrations, ja, modelcreation, onnx, onnxruntime, optuna, quality, ray, retrieval, sagemaker, sentencepiece, serving, sigopt, sklearn, speech, testing, tf, tf-cpu, tf-speech, timm, tokenizers, torch, torch-speech, torch-vision, torchhub, video, vision]"
puffin-dev failed
  Caused by: No solution found when resolving: transformers[accelerate,agents,all,audio,codecarbon,deepspeed,deepspeed-testing,dev,dev-tensorflow,dev-torch,docs,docs-specific,flax,flax-speech,ftfy,integrations,ja,modelcreation,onnx,onnxruntime,optuna,quality,ray,retrieval,sagemaker,sentencepiece,serving,sigopt,sklearn,speech,testing,tf,tf-cpu,tf-speech,timm,tokenizers,torch,torch-speech,torch-vision,torchhub,video,vision]
  Caused by: Couldn't parse metadata in fastapi-0.10.1-py3-none-any.whl (97ac91cb7c/fastapi-0.10.1-py3-none-any.whl)
  Caused by: Not a valid package or extra name: ".none". Names must start and end with a letter or digit and may only contain -, _, ., and alphanumeric characters
```
2023-11-10 18:33:49 +00:00
Charlie Marsh
b3edf7c2b2
Delete any directories listed in the RECORD file (#394)
## Summary

It looks like, when you install `pip`, it includes a bunch of
`__pycache__` directories in the RECORD file (although these directories
don't exist until you run `pip`). Our uninstaller assumed that the
RECORD file only contained _files_.

Closes https://github.com/astral-sh/puffin/issues/389.
2023-11-10 18:17:52 +00:00
Charlie Marsh
6a15950cb5
Rename Distribution to Dist in all structs and traits (#384)
We tend to avoid abbreviations, but this one is just so long and
absolutely ubiquitous.
2023-11-10 14:55:11 +00:00
konsti
5cef40d87a
Add proper caching for pypi metadata fetching kinds (#368)
I intend this to become the main form of caching for puffin: You can
make http requests, you tranform the data to what you really need, you
have control over the cache key, and the cache is always json (or
anything else much faster we want to replace it with as long as it's
serde!)
2023-11-10 11:03:40 +00:00
Charlie Marsh
a148f9d0be
Refactor distribution types to adhere to a clear hierarchy (#369)
## Summary

This PR refactors our `RemoteDistribution` type such that it now follows
a clear hierarchy that matches the actual variants, and encodes the
differences between source and built distributions:

```rust
pub enum Distribution {
    Built(BuiltDistribution),
    Source(SourceDistribution),
}

pub enum BuiltDistribution {
    Registry(RegistryBuiltDistribution),
    DirectUrl(DirectUrlBuiltDistribution),
}

pub enum SourceDistribution {
    Registry(RegistrySourceDistribution),
    DirectUrl(DirectUrlSourceDistribution),
    Git(GitSourceDistribution),
}

/// A built distribution (wheel) that exists in a registry, like `PyPI`.
pub struct RegistryBuiltDistribution {
    pub name: PackageName,
    pub version: Version,
    pub file: File,
}

/// A built distribution (wheel) that exists at an arbitrary URL.
pub struct DirectUrlBuiltDistribution {
    pub name: PackageName,
    pub url: Url,
}

/// A source distribution that exists in a registry, like `PyPI`.
pub struct RegistrySourceDistribution {
    pub name: PackageName,
    pub version: Version,
    pub file: File,
}

/// A source distribution that exists at an arbitrary URL.
pub struct DirectUrlSourceDistribution {
    pub name: PackageName,
    pub url: Url,
}

/// A source distribution that exists in a Git repository.
pub struct GitSourceDistribution {
    pub name: PackageName,
    pub url: Url,
}
```

Most of the PR just stems downstream from this change. There are no
behavioral changes, so I'm largely relying on lint, tests, and the
compiler for correctness.
2023-11-10 02:45:41 +00:00
konsti
d407bbbee6
Special case missing header build errors (on linux) (#354)
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).

The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):

```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
 3020 | #include "graphviz/cgraph.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```

The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?

This is even harder to spot in pip's output, where it's 11 lines above
the last line:


![image](7a3d7279-e7b1-4511-ab22-d0a35be5e672)

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:


![image](4bbb8923-5a82-472f-ab1f-9e1471aa2896)

Scrolling up:


![image](89a2495a-e188-4288-b534-ad885ee08763)

The difference gets even clearer with a default ubuntu terminal with its
80 columns:


![image](49fb27bc-07c6-4b10-a1a1-30ec8e112438)

---

Note that the situation is better for a missing compiler, there i get:

```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
2023-11-08 15:26:39 +00:00
konsti
2ebe40b986
Add --no-build (#358)
By default, we will build source distributions for both resolving and
installing, running arbitrary code. `--no-build` adds an option to ban
this and only install from wheels, no source distributions or git builds
allowed. We also don't fetch these and instead report immediately.

I've heard from users for whom this is a requirement, i'm implementing
it now because it's helpful for testing.

I'm thinking about adding a shared `PuffinSharedArgs` struct so we don't
have to repeat each option everywhere.
2023-11-08 10:05:15 -05:00
Charlie Marsh
4fe583257e
Use a custom PubGrub error type to always show resolution report (#365)
Closes https://github.com/astral-sh/puffin/issues/356.

The example from the issue now renders as:

```
❯ cargo run --bin puffin-dev -q -- resolve-cli tensorflow-cpu-aws
puffin-dev failed
  Caused by: No solution found when resolving build dependencies for source distribution:
  Caused by: Because there is no available version for tensorflow-cpu-aws and root depends on tensorflow-cpu-aws, version solving failed.
```
2023-11-08 09:57:26 -05:00
Charlie Marsh
b0286a8939
Add user feedback when building source distributions in the resolver (#347)
It looks like Cargo, notice the bold green lines at the top (which
appear during the resolution, to indicate Git fetches and source
distribution builds):

<img width="868" alt="Screen Shot 2023-11-06 at 11 28 47 PM"
src="9647a480-7be7-41e9-b1d3-69faefd054ae">

<img width="868" alt="Screen Shot 2023-11-06 at 11 28 51 PM"
src="6bc491aa-5b51-4b37-9ee1-257f1bc1c049">

Closes https://github.com/astral-sh/puffin/issues/287 although we can do
a lot more here.
2023-11-07 14:17:31 +00:00
Charlie Marsh
2c32bc5a86
Respect direct URLs in puffin installer (#345)
We now write the `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.

A lot of TODOs, especially around cleaning up the `Source` abstraction
and its relationship to `DirectUrl`. I'm gonna keep working on these
today, but this works and makes the requirements clear.

Closes #332.
2023-11-07 09:11:27 -05:00
konsti
fbe28d3b7c
Fix mastodon-py dist-info handling (#336)
mastodon-py 1.5.1 uses a dot in its dist-info dir name, which we
previously didn't handle, causing home-assistant to fail. The new
implementation is based on
2f83540272/src/packaging/utils.py (L146-L172).

Part of #199

```
unzip -l  Mastodon.py-1.5.1-py2.py3-none-any.whl
Archive:  Mastodon.py-1.5.1-py2.py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   153929  2020-02-29 17:39   mastodon/Mastodon.py
     1029  2019-10-11 19:15   mastodon/__init__.py
     7357  2019-10-11 20:24   mastodon/streaming.py
       10  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/DESCRIPTION.rst
     1398  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/metadata.json
        9  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/top_level.txt
      110  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/WHEEL
     1543  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/METADATA
      753  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/RECORD
---------                     -------
   166138                     9 files
```
2023-11-07 12:36:11 +01:00
konsti
aac8ae997f
Rename source distribution build to source build (#334)
This is less verbose and better reflects that we're building both source
distributions and source trees passed into the function.
2023-11-07 03:55:23 +00:00
Zanie Blue
e952557bf1
Improve root message when version solving fails (#344)
Matching description at
https://github.com/dart-lang/pub/blob/master/doc/solver.md#linear-error-reporting
2023-11-06 20:07:50 +00:00
Zanie Blue
b0720ea5b2
Improve error message for dependencies with no versions available (#342)
Partially addresses https://github.com/astral-sh/puffin/issues/310
Addresses case at
https://github.com/astral-sh/puffin/issues/309#issuecomment-1793541558
Follow-up to #300 ensuring `PuffinExternal` is used consistently when
formatting messages

Example at
https://github.com/astral-sh/puffin/pull/342/files#diff-5c74a74ef34ef1d6e7453de8d2d19134813156e8b6a657e6b5ed71fda5a3a870
2023-11-06 14:04:29 -06:00
Zanie Blue
1748cfb522
Display dependency versions in pip-like format during solve failure (#346)
- Display `==` for exact version ranges
- Remove space between dependency and version range
2023-11-06 13:53:15 -06:00
Charlie Marsh
24e30e6557
Split puffin-package into requirements.txt parser and pypi-types (#341)
There are only two things left in this crate and they don't really have
anything to do with one another.
2023-11-06 18:19:49 +00:00
Charlie Marsh
d9bcfafa16
Write direct_url.json in wheel installer (#337)
## Summary

This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.

Part of https://github.com/astral-sh/puffin/issues/332.
2023-11-06 17:09:28 +00:00
konsti
b2439b24a1
Fetch wheel metadata by async range requests on the remote wheel (#301)
Use range requests and async zip to extract the METADATA file from a
remote wheel.

We currently only cache when the remote says the remote declares the
resource as immutable, see
https://github.com/06chaynes/http-cache/issues/57 and
https://github.com/baszalmstra/async_http_range_reader/pull/1 . The
cache is stored as json with the description omitted, this improve cache
deserialization performance.
2023-11-06 15:06:49 +01:00
Charlie Marsh
6d672b8951
Add source distribution support to pip-compile (#323)
## Summary

This is a first-pass at adding source distribution support to the
installer.

The previous installation flow was:

1. Come up with a plan.
1. Find a distribution (specific file) for every package that we'll need
to download.
1. Download those distributions.
1. Unzip them (since we assumed they were all wheels).
1. Install them into the virtual environment.

Now, Step (3) downloads both wheels and source distributions, and we
insert a step between Steps (3) and (4) to build any source
distributions into zipped wheels.

There are a bunch of TODOs, the most important (IMO) is that we
basically have two implementations of downloading and building, between
the stuff in `puffin_installer` and `puffin_resolver` (namely in
`crates/puffin-resolver/src/distribution`). I didn't attempt to clean
that up here -- it's already a problem, and it's related to the overall
problem we need to solve around unified caching and resource management.

Closes #243.
2023-11-06 08:22:36 -05:00
konsti
81f380b10e
Validate package and extra name (#290)
`PackageName` and `ExtraName` can now only be constructed from valid
names. They share the same rules, so i gave them the same
implementation. Constructors are split between `new` (owned) and
`from_str` (borrowed), with the owned version avoiding allocations.

Closes #279

---------

Co-authored-by: Zanie <contact@zanie.dev>
2023-11-06 10:04:31 +00:00
Charlie Marsh
1637f1c216
Add source distribution support to the DistributionFinder (#322)
## Summary

This just enables the `DistributionFinder` (previously known as the
`WheelFinder`) to select source distributions when there are no matching
wheels for a given platform. As a reminder, the `DistributionFinder` is
a simple resolver that doesn't look at any dependencies: it just takes a
set of pinned packages, and finds a distribution to install to satisfy
each requirement.
2023-11-06 00:16:04 -05:00
Charlie Marsh
d785ffdbff
Move Source abstraction into puffin-distribution (#321)
No code changes, but this will allow it to be shared between the
installer and the resolver.
2023-11-06 02:31:15 +00:00
Charlie Marsh
4b83d8e949
Require URL dependencies to be declared upfront (#319)
In the resolver, our current model for solving URL dependencies requires
that we visit the URL dependency _before_ the registry-based dependency.
This PR encodes a strict requirement that all URL dependencies be
declared upfront, either as requirements or constraints.

I wrote more about how it works and why it's necessary in documentation
[here](https://github.com/astral-sh/puffin/pull/319/files#diff-2b1c4f36af0c62a2b7bebeae9473ae083588f2a6b18a3ec52393a24266adecbbR20).
I think we could relax this constraint over time, but it requires a more
sophisticated model -- and for now, I just want something that's (1)
correct, (2) easy for us to reason about, and (3) easy for users to
reason about.

As additional motivation... allowing arbitrary URL dependencies anywhere
in the tree creates some really confusing situations in which I'm not
even sure what the right answers are. For example, assume you declare a
direct dependency on `Werkzeug==2.0.0`. You then depend on a version of
Flask that depends on a version of `Werkzeug` from some arbitrary URL.
You build the source distribution at that arbitrary URL, and it turns
out it _does_ build to a declared version of 2.0.0. What should happen?
(And if it resolves to a version that _isn't_ 2.0.0, what should happen
_then_?) I suspect different tools handle this differently, but it must
lead to a lot of "silent" failures. In my testing of Poetry, it seems
like Poetry just ignores the URL dependency, which seems wrong, but is
also a behavior we could implement in the future.

Closes https://github.com/astral-sh/puffin/issues/303.
Closes https://github.com/astral-sh/puffin/issues/284.
2023-11-05 17:09:58 +00:00
Charlie Marsh
a53188cac7
Avoid unnecessarily fetching non-marker-required first-party dependencies (#318)
E.g., given:

```
flask; python_version < '3.7'
requests
```

We shouldn't request the metadata for Flask when on Python versions 3.7
or later.
2023-11-04 17:03:43 +00:00
Charlie Marsh
051188dce0
Use separate representations for canonical repository vs. commit (#317)
Given `https://github.com/pypa/package.git#subdirectory=pkg_a` and
`https://github.com/pypa/package.git#subdirectory=pkg_b`, we want these
to map to the same shared _resource_ (for locking and cloning), but
different _packages_ (for determining whether the wheel already exists
in the cache). As such, we need two distinct concepts for "canonical
equality".

Closes #316.
2023-11-04 11:46:42 -04:00
Charlie Marsh
b589813e59
Enforce that built package name matches declared package name (#315)
Closes https://github.com/astral-sh/puffin/issues/306.
2023-11-03 22:58:12 +00:00
Charlie Marsh
643cf3b3aa
Unify subdirectory handling in source.rs (#314)
Avoids having to encode all the `git+` and `subdirectory=` logic in
multiple places.
2023-11-03 19:33:38 +00:00
Charlie Marsh
edce4ccb24
Add support for subdirectories in URL dependencies (#312)
Closes https://github.com/astral-sh/puffin/issues/307.
2023-11-03 15:28:38 -04:00
Charlie Marsh
aa9882eee8
Use locks to prevent concurrent accesses to the same Git repo (#304)
Ensures that if we need to access the same Git repo twice in a
resolution, we only have one handler to that repo at a time. (Otherwise,
`git2` panics.)
2023-11-03 16:33:14 +00:00
Charlie Marsh
fa1bbbbe08
Write fully-precise Git SHAs to pip-compile output (#299)
This PR adds a mechanism by which we can ensure that we _always_ try to
refresh Git dependencies when resolving; further, we now write the fully
resolved SHA to the "lockfile". However, nothing in the code _assumes_
we do this, so the installer will remain agnostic to this behavior.

The specific approach taken here is minimally invasive. Specifically,
when we try to fetch a source distribution, we check if it's a Git
dependency; if it is, we fetch, and return the exact SHA, which we then
map back to a new URL. In the resolver, we keep track of URL
"redirects", and then we use the redirect (1) for the actual source
distribution building, and (2) when writing back out to the lockfile. As
such, none of the types outside of the resolver change at all, since
we're just mapping `RemoteDistribution` to `RemoteDistribution`, but
swapping out the internal URLs.

There are some inefficiencies here since, e.g., we do the Git fetch,
send back the "precise" URL, then a moment later, do a Git checkout of
that URL (which will be _mostly_ a no-op -- since we have a full SHA, we
don't have to fetch anything, but we _do_ check back on disk to see if
the SHA is still checked out). A more efficient approach would be to
return the path to the checked-out revision when we do this conversion
to a "precise" URL, since we'd then only interact with the Git repo
exactly once. But this runs the risk that the checked-out SHA changes
between the time we make the "precise" URL and the time we build the
source distribution.

Closes #286.
2023-11-03 16:26:57 +00:00
Zanie Blue
addcfe533a
Implement custom resolution failure reporter to hide root package versions (#300)
Extends #295 
Closes #214 

Copies some of the implementations from `pubgrub::report` so we can
implement Puffin `PubGrubPackage` specific display when explaining
failed resolutions.

Here, we just drop the dummy version number if it's a
`PubGrubPackage::Root` package. In the future, we can further customize
reporting.
2023-11-03 10:47:01 -05:00
Zanie Blue
e1382cc747
Report project name instead of root when using pyproject.toml files (#295)
Part of https://github.com/astral-sh/puffin/issues/214

Adds a `project: Option<PackageName>` to the `Manifest`, `Resolver`, and
`RequirementsSpecification`.
To populate an optional `name` for `PubGubPackage::Root`.

I'll work on removing the version number next.

Should we consider using the parent directory name when a
`pyproject.toml` file is not present?
2023-11-03 10:22:10 -05:00
Charlie Marsh
a4002fe132
Make cache non-optional in most crates (#293)
This PR makes the cache non-optional in most of Puffin, which simplifies
the code, allows us to reuse the cache within a single command (even
with `--no-cache`), and also allows us to use the cache for disk storage
across an invocation.

I left the cache as optional for the `Virtualenv` and `InterpreterInfo`
abstractions, since those are generic enough that it seems nice to have
a non-cached version, but it's kind of arbitrary.
2023-11-02 13:40:20 -04:00
Charlie Marsh
a02bf2e415
Split source_distribution.rs into separate wheel and sdist fetchers (#291) 2023-11-02 16:04:51 +00:00
Charlie Marsh
62c474d880
Add support for Git dependencies (#283)
## Summary

This PR adds support for Git dependencies, like:

```
flask @ git+https://github.com/pallets/flask.git
```

Right now, they're only supported in the resolver (and not the
installer), since the installer doesn't yet support source distributions
at all.

The general approach here is based on Cargo's Git implementation.
Specifically, I adapted Cargo's
[`git`](23eb492cf9/src/cargo/sources/git/mod.rs)
module to perform the cloning, which is based on `libgit2`.

As compared to Cargo's implementation, I made the following changes:

- Removed any unnecessary code.
- Fixed any Clippy errors for our stricter ruleset.
- Removed the dependency on `curl`, in favor of `reqwest` which we use
elsewhere.
- Removed the ability to use `gix`. Cargo allows the use of `gix` as an
experimental flag, but it only supports a small subset of the
operations. When Cargo fully adopts `gix`, we should plan to do the
same.
- Removed Cargo's host key checking. We need to re-add this! I'll do it
shortly.
- Removed Cargo's progress bars. We should re-add this too, but we use
`indicatif` and Cargo had their own thing.

There are a few follow-ups to consider:

- Adding support in the installer.
- When we lock, we should write out the Git URL that includes the exact
SHA. This lets us cache in perpetuity and avoids dependencies changing
without re-locking.
- When we resolve, we should _always_ try to refresh Git dependencies.
(Right now, we skip if the wheel was already built.)

I'll work on the latter two in follow-up PRs.

Closes #202.
2023-11-02 15:14:55 +00:00
konsti
4adaa9a700
Wheel filename distribution package name (#278)
The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`

With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.

`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.

`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
2023-11-02 11:15:27 +00:00
Charlie Marsh
0c9e975f75
Rename distribution.rs to file.rs in puffin-resolver (#288) 2023-11-01 23:52:53 -04:00
Zanie Blue
b8ff32f6be
Respect markers on constraints (#282)
Closes #252
2023-11-01 20:20:32 -05:00
Charlie Marsh
2652caa3e3
Add support for URL dependencies (#251)
## Summary

This PR adds support for resolving and installing dependencies via
direct URLs, like:

```
werkzeug @ 960bb4017c/Werkzeug-2.0.0-py3-none-any.whl
```

These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.

Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:

```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
    /// The distribution exists in a registry, like `PyPI`.
    Registry(PackageName, Version, File),
    /// The distribution exists at an arbitrary URL.
    Url(PackageName, Url),
}
```

In the resolver, we now allow packages to take on an extra, optional
`Url` field:

```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
    Root,
    Package(
        PackageName,
        Option<DistInfoName>,
        #[derivative(PartialEq = "ignore")]
        #[derivative(PartialOrd = "ignore")]
        #[derivative(Hash = "ignore")]
        Option<Url>,
    ),
}
```

However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:

```
flask==3.0.0
werkzeug @ 254c3e9b5f/werkzeug-3.0.1-py3-none-any.whl
```

There are a couple limitations in the current approach:

- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
2023-11-01 09:21:44 -04:00
Charlie Marsh
079b685c8c
Use distributions for Reporter signatures (#266) 2023-11-01 03:19:13 +00:00
Charlie Marsh
bee1b0f5ad
Avoid re-parsing wheel filename in source distribution tree (#265) 2023-10-31 21:02:09 +00:00
Charlie Marsh
aff26f2301
Reuse distribution structs in Resolver's source_distribution.rs (#264) 2023-10-31 20:50:34 +00:00
Charlie Marsh
89dad0c9ad
Move distribution abstraction in shared crate (#258)
This also allows us to get rid of `PinnedPackage` _and_ to remove some
`Result<...>` types due to needless conversions between
otherwise-identical types.
2023-10-31 15:30:06 -04:00
Charlie Marsh
16aac834ee
Move PyPI-oriented types out of puffin-client crate (#255)
Just an internal change to avoid a dependency on `puffin-client` for
those crates that need access to PyPI-metadata types.
2023-10-31 17:10:23 +00:00
Charlie Marsh
1f059b30dd
Remove Box<Pin<...>> from process_request (#242) 2023-10-30 16:34:57 -04:00
konstin
1529def563 Implement mixed PEP 517 and setup.py build
There are packages such as DTLSSocket 0.1.16 that say
```toml
[build-system]
requires = ["Cython<3", "setuptools", "wheel"]
```
In this case we need to install requires PEP 517 style but then call setup.py in the
legacy way

Part of making home-assistant work
2023-10-30 19:11:52 +01:00
konsti
d47dc64974
Ignore self requirements (#233)
gps3 0.33.3 depends on itself, which we can ignore. I've also added the
home assistant requirements since it occurred when testing with this.
2023-10-30 17:13:52 +01:00
Charlie Marsh
0be20a41a4
Make version selection wheel-vs.-sdist-agnostic (#232)
Closes https://github.com/astral-sh/puffin/issues/231.
2023-10-30 11:21:10 -04:00
Charlie Marsh
e73d3f0ff8
Use bounded ranges rather than constructing manual ranges (#228)
I didn't realize this, but they made a bunch of improvements to how
PubGrub represents versions which lets us greatly simplify our own
PubGrub version wrapper
(https://github.com/pubgrub-rs/guide/pull/6/files).
2023-10-30 03:58:43 +00:00