Commit graph

76 commits

Author SHA1 Message Date
konsti
5820a9d937
Update dependencies (#794)
Pull in a bunch of updates so they get some testing before we announce
the project. textwrap 0.16 is blocked on miette updating, http 1.0 on
reqwest.
2024-01-05 11:40:12 -05:00
Andrew Gallant
d7c9b151fb
pep440: some minor refactoring, mostly around error types (#780)
This PR does a bit of refactoring to the pep440 crate, and in
particular around the erorr types. This PR is meant to be a precursor
to another PR that does some surgery (both in parsing and in `Version`
representation) that benefits somewhat from this refactoring.

As usual, please review commit-by-commit.
2024-01-04 12:28:36 -05:00
Charlie Marsh
b2230e7f4d
Make index URLs insensitive to trailing slashes (#771)
Closes https://github.com/astral-sh/puffin/issues/770.
2024-01-04 08:45:50 -05:00
konsti
26f597a787
Add spans to all significant tasks (#740)
I've tried to investigate puffin's performance wrt to builds and
parallelism in general, but found the previous instrumentation to
granular. I've tried to add spans to every function that either needs
noticeable io or cpu resources without creating duplication. This also
fixes some wrong tracing usage on async functions
(https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code)
and some spans that weren't actually entered.
2024-01-02 16:17:03 +00:00
Charlie Marsh
007f52bb4e
Add support for relative URLs in simple metadata responses (#721)
## Summary

This PR adds support for relative URLs in the simple JSON responses. We
already support relative URLs for HTML responses, but the handling has
been consolidated between the two. Similar to index URLs, we now store
the base alongside the metadata, and use the base when resolving the
URL.

Closes #455.

## Test Plan

`cargo test` (to test HTML indexes). Separately, I also ran `cargo run
-p puffin-cli -- pip-compile requirements.in -n
--index-url=http://localhost:3141/packages/pypi/+simple` on the
`zb/relative` branch with `packse` running, and forced both HTML and
JSON by limiting the `accept` header.
2023-12-27 08:53:21 -05:00
Charlie Marsh
ae83a74309
Review feedback for HTML indexes (#733)
See: https://github.com/astral-sh/puffin/pull/719
2023-12-26 21:57:20 +00:00
Charlie Marsh
bbe0246205
Change internal representation of CacheEntry to avoid allocations (#730)
Removes a TODO.
2023-12-26 02:10:30 +00:00
Charlie Marsh
188ab75769
Split File into internal and external type (#729)
## Summary

This PR makes the `pypi_types::File` a response-only type (i.e., a type
that's only used when deserializing over the wire), and adds a separate
internal `File` type. Right now, the representations are similar, but
already, we can avoid the "lenient" deserialization on our internal
`File` type, and avoid the special-casing of the property names that's
required in the JSON. Over time, we can evolve this representation
entirely separately from the representation we receive from PyPI and
other indexes.
2023-12-25 15:42:28 -05:00
Charlie Marsh
6ff21374dc
Split puffin-cache into Puffin-specific and generic utilities (#728)
This crate started off as generic caching utilities, but we started
adding a lot of Puffin-specific stuff (like the cache buckets
abstraction that knows about Git vs. direct URL vs. indexes and so on).
This PR moves the generic stuff into a new `cache-key` crate.
2023-12-25 14:38:56 +00:00
Charlie Marsh
343880820b
Un-escape HTML entities when decoding (#723)
I don't have a good testing strategy here (I'm manually testing against
`devpi` via `packse`), but the HTML index uses (e.g.)
`data-requires-python=">=3.8"`, so we need to decode.
2023-12-24 16:35:45 -05:00
Charlie Marsh
2d721a497e
Add a SimpleHttp abstraction similar to SimpleJson (#722)
Just an internal refactor to turn some standalone functions into
associated methods (and reduce the diff in the next PR).
2023-12-24 20:55:57 +00:00
Charlie Marsh
5bce699ee1
Add support for HTML indexes (#719)
## Summary

This PR adds support for HTML index responses (as with
`--index-url=https://download.pytorch.org/whl`).

Closes https://github.com/astral-sh/puffin/issues/412.
2023-12-24 16:04:00 +00:00
Zanie Blue
e705267dac
Fix fallback download when index does not support HTTP range requests (#702)
Otherwise, when a server does not support HTTP range requests we throw
an error instead of downloading without range requests.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2023-12-20 10:55:23 +00:00
Zanie Blue
ab15b08cbe
Perform 3 retries by default instead of 0 on failed index requests (#710)
As a user, I'd expect retries to occur by default.

We should also expose this via a setting probably.
2023-12-20 11:51:24 +01:00
Zanie Blue
12eedb1c12
Include Accept header specifying that we can only parse JSON responses (#701)
Otherwise, when an index does not support the query variable we get an
HTML response and a JSON parse error.
2023-12-19 12:22:53 -06:00
Zanie Blue
52ba65aa9c
Derive Debug for CachedClientError (#703)
Discovered while debugging https://github.com/astral-sh/puffin/pull/702
2023-12-19 12:22:39 -06:00
Charlie Marsh
3660d8a08e
Introduce separate traits for ahead-of-time and installed metadata (#692)
This is a pure refactor to follow-up #690, to separate the metadata that
we know upfront about distributions (like the version, for
registry-based distributions) vs. the metadata that requires building
(like the version, for URL-based distributions).
2023-12-18 22:37:45 +00:00
konsti
f4f67ebde0
Rebase: Uninstall existing non-editable versions when installing editable requirements bug (#682)
Separate branch for rebasing #677 onto main because i don't trust the
rebase enough to force push.

Closes #677.

---

If you install `black` from PyPI, then `-e ../black`, we need to
uninstall the existing `black`. This sounds simple, but that in turn
requires that we _know_ `-e ../black` maps to the package `black`, so
that we can mark it for uninstallation in the install plan. This, in
turn, means that we need to build editable dependencies prior to the
install plan.

This is just a bunch of reorganization to fix that specific bug
(installing multiple versions of `black` if you run through the above
workflow): we now run through the list of editables upfront, mark those
that are already installed, build those that aren't, and then ensure
that `InstallPlan` correctly removes those that need to be removed, etc.

Closes #676.

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-12-18 09:28:14 +00:00
konsti
f059c6e6a6
Support editable in pip-sync and pip-compile (#587)
Support `-e path/do/dir` in pip-sync and and pip-compile.
2023-12-16 22:37:34 +00:00
konsti
71964ec7a8
Switch to msgpack in the cached client (#662)
This gives a 1.23 speedup on transformers-extras. We could change to
msgpack for the entire cache if we want. I only tried this format and
postcard so far, where postcard was much slower (like 1.6s).

I don't actually want to merge it like this, i wanted to figure out the
ballpark of improvement for switching away from json.

```
hyperfine --warmup 3 --runs 10 "target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in" "target/profiling/branch pip-compile scripts/requirements/transformers-extras.in"
Benchmark 1: target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in
  Time (mean ± σ):     179.1 ms ±   4.8 ms    [User: 157.5 ms, System: 48.1 ms]
  Range (min … max):   174.9 ms … 188.1 ms    10 runs

Benchmark 2: target/profiling/branch pip-compile scripts/requirements/transformers-extras.in
  Time (mean ± σ):     221.1 ms ±   6.7 ms    [User: 208.1 ms, System: 46.5 ms]
  Range (min … max):   213.5 ms … 235.5 ms    10 runs

Summary
  target/profiling/puffin pip-compile --cache-dir cache-msgpack scripts/requirements/transformers-extras.in ran
    1.23 ± 0.05 times faster than target/profiling/branch pip-compile scripts/requirements/transformers-extras.in
```

Disadvantage: We can't manually look into the cache anymore to debug
things

- [ ] Check more formats, i currently only tested json, msgpack and
postcard, there should be other formats, too
- [x] Switch over `CachedByTimestamp` serialization (for the interpreter
caching)
- [x] Switch over error handling and make sure puffin is still resilient
to cache failure
2023-12-16 21:01:35 +00:00
Charlie Marsh
84093773ef
Store source distribution sources in the cache (#653)
## Summary

This PR modifies `source_dist.rs` to store source distributions (from
remote URLs) in the cache. The cache structure for registries now looks
like:

<img width="1053" alt="Screen Shot 2023-12-14 at 10 43 43 PM"
src="3c2dbf6b-5926-41f2-b69b-74031741aba8">

(I will update the docs prior to merging, if approved.)

The benefit here is that we can reuse the source distribution (avoid
download + unzipping it) if we need to build multiple wheels. In the
future, it will be even more relevant, since we'll need to reuse the
source distribution to support
https://github.com/astral-sh/puffin/issues/599.

I also included some misc. refactors to DRY up repeated operations and
add some more abstraction to `source_dist.rs`.
2023-12-15 17:19:33 +00:00
Charlie Marsh
ed8dfbfcf7
Preserve verbatim URLs (#639)
## Summary

This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.

This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpected_ environment variable to the
output file.
2023-12-14 15:03:39 +00:00
Charlie Marsh
0499fe0613
Fix incorrect unknown size marker in traces (#600)
It said `(unknown size)` for _all_ disk-based wheels.
2023-12-09 04:46:01 +00:00
Zanie Blue
ef7be9103c
Parse SimpleJson into categorized data in the client (#522)
Extends #517 with a suggestion from @konstin to parse the `SimpleJson`
into an intermediate type `SimpleMetadata(BTreeMap<Version,
VersionFiles>)` before converting to a `VersionMap`. This reduces the
number of times we need to parse the response. Additionally, we cache
the parsed response now instead of `SimpleJson`.

`VersionFiles` stores two vectors with
`WheelFilename`/`SourceDistFilename` and `File` tuples. These can be
iterated over together or separately. A new enum `DistFilename` was
added to capture the `SourceDistFilename` and `WheelFilename` variants
allowing iteration over both vectors.
2023-12-07 11:04:47 -06:00
Charlie Marsh
a825b2db06
Shard the registry cache by package (#583)
## Summary

This PR modifies the cache structure in a few ways. Most notably, we now
shard the set of registry wheels by package, and index them lazily when
computing the install plan.

This applies both to built wheels:

<img width="989" alt="Screen Shot 2023-12-06 at 4 42 19 PM"
src="0e8a306f-befd-4be9-a63e-2303389837bb">

And remote wheels:

<img width="836" alt="Screen Shot 2023-12-06 at 4 42 30 PM"
src="7fd908cd-dd86-475e-9779-07ed067b4a1a">

For other distributions, we now consistently cache using the package
name, which is really just for clarity and debuggability (we could
consider omitting these):

<img width="955" alt="Screen Shot 2023-12-06 at 4 58 30 PM"
src="3e8d0f99-df45-429a-9175-d57b54a72e56">

Obliquely closes https://github.com/astral-sh/puffin/issues/575.
2023-12-07 05:02:46 +00:00
Zanie Blue
2bb04771ce
Allow switching out the resolver's IO (#517)
I'm working off of @konstin's commit here to implement arbitrary unsat
test cases for the resolver.

The entirety of the resolver's io are two functions: Get the version map
for a package (PEP 440 version -> distribution) and get the metadata for
a distribution. A new trait `ResolverProvider` abstracts these two away and
allows replacing the real network requests e.g. with stored responses
(https://github.com/pradyunsg/pip-resolver-benchmarks/blob/main/scenarios/pyrax_198.json).

---------

Co-authored-by: konsti <konstin@mailbox.org>
2023-12-06 11:53:16 -06:00
Charlie Marsh
6f055ecf3b
Remove existing built wheels when building source distributions (#559)
This PR modifies the source distribution building to replace any
existing targets after building the new wheel. In some cases, the
existence of an existing target may be indicative of a bug, so we warn.
It's partially a workaround for some (but not all) of the errors in
https://github.com/astral-sh/puffin/issues/554.
2023-12-05 12:45:24 -05:00
Charlie Marsh
5fddcc362e
Improve error messages for 'file not found' case (#550)
Right now, if you specify a wheel that doesn't exist, you get: `no such
file or directory` with no additional context. Oops!
2023-12-04 22:01:51 +00:00
konsti
d5abd33813
Use atomic writes for the cache consistently (#546)
Ensure we're using atomic writes everywhere in our cache to avoid broken
cache records and error with parallel puffin actions
(https://github.com/astral-sh/puffin/pull/544#issuecomment-1838841581).

All json files that are written to the cache are written atomically and
the build wheels are written to temp dir and then moved atomically. I
didn't touch venv creation though, i don't think that's worth it since
python does not support atomic package installation through its design.
2023-12-04 12:02:01 -05:00
konsti
9806901a16
Consolidate wheel caches (#524)
After this change, two wheel caches remain: `built-wheels-v0` and
`wheels-v0`, docs screenshots below. Each contains both the wheel
metadata, cache policy and zip or unzipped wheels under the same name.

The zipped/unzipped strategy is as follows: In `pip-compile`, when we
build a wheel, we store it zipped. When `pip-sync` or a source dist
build in `pip-compile` need to install the wheel, we unzip it, remove
the file and replace it with the unzipped wheel.

This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus
`WheelCache`. The non-built wheel cache now considers index urls and the
url for url wheels.

I'm unsure if we need the `Unzipper` type, this could just be a
function.

I move `no_index` into `IndexUrls` and started using `IndexUrl` up to
the clap level.

I left a number of TODOs in the code, namely performing the actual
invalidation of unzipped wheels and making the `InstallPlan` understand
cache invalidation (i.e. uninstall wheels when their remote changed).


![image](c4d45979-485b-4954-848d-fd3347ee2510)
2023-12-01 20:16:33 +00:00
konsti
d89fbeb642
Migrate interpreter query to custom caching (#508)
This removes the last usage of cacache by replacing it with a custom,
flat json caching keyed by the digest of the executable path.


![image](8f777c4c-1f1b-4656-ba7b-002175270556)

A step towards #478. I've made `CachedByTimestamp<T>` generic over `T`
but intentionally not moved it to `puffin-cache` yet.
2023-11-28 17:14:59 +00:00
konsti
5435d44756
Introduce Cache, CacheBucket and CacheEntry (#507)
This is mostly a mechanical refactor that moves 80% of our code to the
same cache abstraction.

It introduces cache `Cache`, which abstracts away the path of the cache
and the temp dir drop and is passed throughout the codebase. To get a
specific cache bucket, you need to requests your `CacheBucket` from
`Cache`. `CacheBucket` is the centralizes the names of all cache
buckets, moving them away from the string constants spread throughout
the crates.

Specifically for working with the `CachedClient`, there is a
`CacheEntry`. I'm not sure yet if that is a strict improvement over
`cache_dir: PathBuf, cache_file: String`, i may have to rotate that
later.

The interpreter cache moved into `interpreter-v0`.

We can use the `CacheBucket` page to document the cache structure in
each bucket:


![image](b023fdfb-e34d-4c2d-8663-b5f73937a539)
2023-11-28 17:11:14 +00:00
konsti
8855f44b5f
Move simple index queries to CachedClient (#504)
Replaces the usage of `http-cache-reqwest` for simple index queries with
our custom cached client, removing `http-cache-reqwest` altogether.

The new cache paths are `<cache>/simple-v0/<index>/<package_name>.json`.
I could not test with a non-pypi index since i'm not aware of any other
json indices (jax and torch are both html indices).

In a future step, we can transform the response to be a
`HashMap<Version, {source_dists: Vec<(SourceDistFilename, File)>,
wheels: Vec<(WheeFilename, File)>}` (independent of python version, this
cache is used by all environments together). This should speed up cache
deserialization a bit, since we don't need to try source dist and wheel
anymore and drop incompatible dists, and it should make building the
`VersionMap` simpler. We can speed this up even further by splitting
into a version lists and the info for each version. I'm mentioning this
because deserialization was a major bottleneck in the rust part of the
old python prototype.

Fixes #481
2023-11-28 00:11:03 +00:00
Charlie Marsh
afda835544
Avoid clone for WheelMetadataCache (#500)
This doesn't need to own the underlying data which allows us to remove a
number of clones.
2023-11-25 23:33:59 +00:00
konsti
d54e780843
Source dist metadata refactor (#468)
## Summary and motivation

For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.

`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.

Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.

In a follow-up PR, I'll also clear built wheels when they've become
stale.

Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.

Closes #473

## Source dist cache structure 

Entries by supported sources:
 * `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename

(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`

Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```

The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
2023-11-24 17:47:58 +00:00
Charlie Marsh
443a0a9df2
Use a sparse Metadata 2.1 representation (#488)
This is an optimization to avoid parsing the entire Metadata 2.1 when we
only need a small subset of the fields.

Closes #175.
2023-11-22 13:25:35 +00:00
konsti
934e32ea98
Remove outdated todos (#476) 2023-11-21 13:57:40 +00:00
Charlie Marsh
f1aa70d9d3
Refactor distribution types to return Result (#470)
## Summary

A variety of small refactors to the distribution types crate to (1)
return `Result` if we find an invalid wheel, rather than treating it as
a source distribution with a `.whl` suffix, and (2) DRY up some repeated
code around URLs.
2023-11-20 23:08:54 +00:00
konsti
f0841cdb6e
Wheel metadata refactor (#462)
A consistent cache structure for remote wheel metadata:

 * `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json`
* `<wheel metadata
cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json`

The source dist caching will use a similar structure (#468).
2023-11-20 17:26:36 +01:00
Charlie Marsh
8decb29bad
Use a dedicated error type for puffin-distribution (#466) 2023-11-20 11:38:27 +00:00
konsti
46bb18f06e
Track file index (#452)
Track the index (or at least its url) where we got a file from across
the source code.

Fixes #448
2023-11-20 08:48:16 +00:00
konsti
751f7fa9c6
Improve PEP 691 compatibility (#428)
[PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly
different, more relaxed rules around file metadata. These changes are
now reflected in the `File` struct. This will make it easier to support
alternative indices.

I had expected that i need to introduce a separate type for that, so i'm
happy it's two `Option`s more and an alias.

Part of #412
2023-11-16 19:03:44 +01:00
konsti
81c9cd0d4a
Print url for bad json error (#409)
Split out from #382
2023-11-13 11:41:20 +00:00
konsti
d8408b1783
Add source to failing metadata parsing (#387)
Before:
```
cargo run --bin puffin-dev -q -- resolve-cli "transformers[accelerate, agents, all, audio, codecarbon, deepspeed, deepspeed-testing, dev, dev-tensorflow, dev-torch, docs, docs_specific, flax, flax-speech, ftfy, integrations, ja, modelcreation, onnx, onnxruntime, optuna, quality, ray, retrieval, sagemaker, sentencepiece, serving, sigopt, sklearn, speech, testing, tf, tf-cpu, tf-speech, timm, tokenizers, torch, torch-speech, torch-vision, torchhub, video, vision]"
puffin-dev failed
  Caused by: No solution found when resolving: transformers[accelerate,agents,all,audio,codecarbon,deepspeed,deepspeed-testing,dev,dev-tensorflow,dev-torch,docs,docs-specific,flax,flax-speech,ftfy,integrations,ja,modelcreation,onnx,onnxruntime,optuna,quality,ray,retrieval,sagemaker,sentencepiece,serving,sigopt,sklearn,speech,testing,tf,tf-cpu,tf-speech,timm,tokenizers,torch,torch-speech,torch-vision,torchhub,video,vision]
  Caused by: Not a valid package or extra name: ".none". Names must start and end with a letter or digit and may only contain -, _, ., and alphanumeric characters
```
After:
```
cargo run --bin puffin-dev -q -- resolve-cli "transformers[accelerate, agents, all, audio, codecarbon, deepspeed, deepspeed-testing, dev, dev-tensorflow, dev-torch, docs, docs_specific, flax, flax-speech, ftfy, integrations, ja, modelcreation, onnx, onnxruntime, optuna, quality, ray, retrieval, sagemaker, sentencepiece, serving, sigopt, sklearn, speech, testing, tf, tf-cpu, tf-speech, timm, tokenizers, torch, torch-speech, torch-vision, torchhub, video, vision]"
puffin-dev failed
  Caused by: No solution found when resolving: transformers[accelerate,agents,all,audio,codecarbon,deepspeed,deepspeed-testing,dev,dev-tensorflow,dev-torch,docs,docs-specific,flax,flax-speech,ftfy,integrations,ja,modelcreation,onnx,onnxruntime,optuna,quality,ray,retrieval,sagemaker,sentencepiece,serving,sigopt,sklearn,speech,testing,tf,tf-cpu,tf-speech,timm,tokenizers,torch,torch-speech,torch-vision,torchhub,video,vision]
  Caused by: Couldn't parse metadata in fastapi-0.10.1-py3-none-any.whl (97ac91cb7c/fastapi-0.10.1-py3-none-any.whl)
  Caused by: Not a valid package or extra name: ".none". Names must start and end with a letter or digit and may only contain -, _, ., and alphanumeric characters
```
2023-11-10 18:33:49 +00:00
konsti
5cef40d87a
Add proper caching for pypi metadata fetching kinds (#368)
I intend this to become the main form of caching for puffin: You can
make http requests, you tranform the data to what you really need, you
have control over the cache key, and the cache is always json (or
anything else much faster we want to replace it with as long as it's
serde!)
2023-11-10 11:03:40 +00:00
Charlie Marsh
a148f9d0be
Refactor distribution types to adhere to a clear hierarchy (#369)
## Summary

This PR refactors our `RemoteDistribution` type such that it now follows
a clear hierarchy that matches the actual variants, and encodes the
differences between source and built distributions:

```rust
pub enum Distribution {
    Built(BuiltDistribution),
    Source(SourceDistribution),
}

pub enum BuiltDistribution {
    Registry(RegistryBuiltDistribution),
    DirectUrl(DirectUrlBuiltDistribution),
}

pub enum SourceDistribution {
    Registry(RegistrySourceDistribution),
    DirectUrl(DirectUrlSourceDistribution),
    Git(GitSourceDistribution),
}

/// A built distribution (wheel) that exists in a registry, like `PyPI`.
pub struct RegistryBuiltDistribution {
    pub name: PackageName,
    pub version: Version,
    pub file: File,
}

/// A built distribution (wheel) that exists at an arbitrary URL.
pub struct DirectUrlBuiltDistribution {
    pub name: PackageName,
    pub url: Url,
}

/// A source distribution that exists in a registry, like `PyPI`.
pub struct RegistrySourceDistribution {
    pub name: PackageName,
    pub version: Version,
    pub file: File,
}

/// A source distribution that exists at an arbitrary URL.
pub struct DirectUrlSourceDistribution {
    pub name: PackageName,
    pub url: Url,
}

/// A source distribution that exists in a Git repository.
pub struct GitSourceDistribution {
    pub name: PackageName,
    pub url: Url,
}
```

Most of the PR just stems downstream from this change. There are no
behavioral changes, so I'm largely relying on lint, tests, and the
compiler for correctness.
2023-11-10 02:45:41 +00:00
konsti
fbe28d3b7c
Fix mastodon-py dist-info handling (#336)
mastodon-py 1.5.1 uses a dot in its dist-info dir name, which we
previously didn't handle, causing home-assistant to fail. The new
implementation is based on
2f83540272/src/packaging/utils.py (L146-L172).

Part of #199

```
unzip -l  Mastodon.py-1.5.1-py2.py3-none-any.whl
Archive:  Mastodon.py-1.5.1-py2.py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   153929  2020-02-29 17:39   mastodon/Mastodon.py
     1029  2019-10-11 19:15   mastodon/__init__.py
     7357  2019-10-11 20:24   mastodon/streaming.py
       10  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/DESCRIPTION.rst
     1398  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/metadata.json
        9  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/top_level.txt
      110  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/WHEEL
     1543  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/METADATA
      753  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/RECORD
---------                     -------
   166138                     9 files
```
2023-11-07 12:36:11 +01:00
Charlie Marsh
24e30e6557
Split puffin-package into requirements.txt parser and pypi-types (#341)
There are only two things left in this crate and they don't really have
anything to do with one another.
2023-11-06 18:19:49 +00:00
konsti
b2439b24a1
Fetch wheel metadata by async range requests on the remote wheel (#301)
Use range requests and async zip to extract the METADATA file from a
remote wheel.

We currently only cache when the remote says the remote declares the
resource as immutable, see
https://github.com/06chaynes/http-cache/issues/57 and
https://github.com/baszalmstra/async_http_range_reader/pull/1 . The
cache is stored as json with the description omitted, this improve cache
deserialization performance.
2023-11-06 15:06:49 +01:00
konsti
4adaa9a700
Wheel filename distribution package name (#278)
The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`

With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.

`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.

`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
2023-11-02 11:15:27 +00:00