This PR tweaks uv to support reading `requirements.txt` regardless of
whether it is encoded as UTF-8 or UTF-16. This is particularly relevant
on Windows where `uv pip freeze > requirements.txt` will likely write a
UTF-16 encoded `requirements.txt` file.
There is some discussion on #1666 where it's suggested that perhaps
we should explicitly not support this. I didn't see that until I
had already put this PR together, but even so, I think it's worth
considering this. UTF-16 is predominant on Windows. It is very easy
to produce a UTF-16 encoded file. Moreover, there is an easy and well
specified way to recognize and transcode UTF-16 encoded data to UTF-8.
I think the downside of this is that it could encourage the use UTF-16
encoded `requirements.txt` files *in addition* to UTF-8 encoded
files, and it would probably be nice to converge and standardize on
one encoding. One possible alternative to this PR is that we provide
a better error message. Another alternative is to ensure that a
`-o/--output` flag exists for all commands (neither `uv pip freeze` nor
`pip freeze` have such a flag) so that users can always write output
to a file without relying on their environment's piping behavior.
(Although this last alternative seems a little sad to me.)
It's also worth noting the [PEP-0508] doesn't seem to mention file
encoding at all. So I think from a "do the standards allow this"
perspective, this change is OK.
Finally, `pip` itself seems to work with UTF-16 encoded
`requirements.txt` files.
I think I personally overall lean towards supporting UTF-16 for
`requirements.txt` files. In part because I think it smoothes out the
UX a little bit, in part because there is no obvious specification
(that I'm aware of) that mandates that these files are UTF-8, and
finally in part because `pip` supports it too.
Fixes#1666, Fixes#2276
[PEP-0508]: https://peps.python.org/pep-0508/
## Summary
Allow using http(s) urls for constraints and requirements files handed
to the CLI, by handling paths starting with `http://` or `https://`
differently. This allows commands for such as: `uv pip install -c
https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.8.txt
requests`.
closes#1332
## Test Plan
Testing install using a `constraints.txt` file hosted on github in the
airflow repository:
fbdc2eba8e/crates/uv/tests/pip_install.rs (L1440-L1484)
## Advice Needed
- filesystem/http dispatch is implemented at a relatively low level (at
`crates/uv-fs/src/lib.rs#read_to_string`). Should I change some naming
here so it is obvious that the function is able to dispatch?
- I kept the CLI argument for -c and -r as a PathBuf, even though now it
is technically either a path or a url. We could either keep this as is
for now, or implement a new enum for this case? The enum could then
handle dispatch to files/http.
- Using another abstraction layer like
https://docs.rs/object_store/latest/object_store/ for the
files/urls/[s3] could work as well, though I ran into a bug during
testing which I couldn't debug
## Summary
This PR expands environment variables in `-r` and `-c` paths _within_
requirements files. We already do this for `@` URL references and
others.
Closes https://github.com/astral-sh/uv/issues/1473.
## Summary
Closes#2012
This changes `RequirementsTxtParserError::Parser` to take a line, column
instead of cursor location to improve reporting of parser errors. A new
function was added to compute the line and column based on the content
and cursor location when a parser error occurs for simplicity.
Given `uv pip compile .\requirements.txt` of below
```
numpy>=1,<2
--borken
tqdm
```
Before:
```
error: Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `.\requirements.txt` at position 14
```
After:
```
error: Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `.\requirements.txt` at position 2:3
```
Open Question: Do we want to support `line:column` for other types of
errors? I didn't look dig other potential error types where this might
be desired.
## Test Plan
New test was added to `requirements-txt` crate with this example.
## Summary
This also preserves the environment variables in the output file, e.g.:
```
Resolved 1 package in 216ms
# This file was autogenerated by uv via the following command:
# uv pip compile requirements.in --emit-index-url
--index-url https://test.pypi.org/${SUFFIX}
requests==2.5.4.1
```
I'm torn on whether that's correct or undesirable here.
Closes#2035.
Address a few pedantic lints
lints are separated into separate commits so they can be reviewed
individually.
I've not added enforcement for any of these lints, but that could be
added if desirable.
## Summary
The main change is that we need to have an explicit list of protocols we
_do_ support (like `https`), so that when we see a Windows absolute path
(`C:\...`), we don't treat the `C` as a protocol itself.
Closes https://github.com/astral-sh/uv/issues/1539.
## Summary
In a `requirements.txt` file, it turns out that the `-c` and `-r`
entries should be interpreted as relative to the file in which they're
declared, while the `-e` entries should be interpreted as relative to
the current working directory, no matter where they're defined.
Previously, we always used the current working directory; now, we use
the declaring file's directory for `-c` and `-r`.
Closes https://github.com/astral-sh/uv/issues/1367.
Closes https://github.com/astral-sh/uv/issues/1416.
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```
## Summary
For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...
This PR ensures that we remove the UNC prefix when setting the current
working directory.
Closes#1238.
## Test Plan
I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
## Summary
Like https://github.com/astral-sh/puffin/pull/1180, this PR adds logic
for `requirements.txt` parsing whereby if a requirement _looks like_ a
local requirements file or an editable directory, we prompt the user to
correct the error (typically, by adding `-r`).
## Summary
See: https://github.com/astral-sh/puffin/issues/1181.
## Test Plan
```
❯ cargo run -- pip install packse@../../zanieb/packse
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/puffin pip install 'packse@../../zanieb/packse'`
error: Distribution not found at: file:///Users/crmarsh/zanieb/packse
```
## Summary
This PR adds support for `--find-links`, `--index-url`, and
`--extra-index-url` arguments when specified in a `requirements.txt`.
It's a mostly-straightforward change. The only uncertain piece is what
to do when multiple files include these flags, and/or when we include
them on the CLI and in other files.
In general:
- If _anything_ specifies `--no-index`, we respect it.
- We combine all `--extra-index-url` and `--find-links` across all
sources, since those are just vectors.
- If we see multiple `--index-url` in requirements files, we error.
- We respect the `--index-url` from the command line over any provided
in a requirements file.
(`pip-compile` seems to just pick one semi-arbitrarily when multiple are
provided.)
Closes https://github.com/astral-sh/puffin/issues/1143.
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.
For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:
* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
allocate and create all necessary objects.
This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.
The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.
My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.
There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.
[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
## Summary
This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.
On other platforms, the implementation is a no-op (vs. `Display`).
I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.
Closes https://github.com/astral-sh/puffin/issues/1084.
## Summary
First batch of changes for windows support. Notable changes:
* Fixes all compile errors and added windows specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.
## Outlook
Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped
There are various reason for the remaining test failure:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up for `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
This PR attempts to fix a common footgun in `requirements.txt` files.
Previously, to provide a file, you had to use `package_name @
file:///Users/crmarsh/...` -- in other words, an absolute path.
Now, these requirements follow the exact same rules as editables, so you
can do:
```
package_name @ ./file.zip
```
And similar.
The way the parsing is setup, this is intentionally _not_ supported when
reading metadata -- only when parsing `requirements.txt` directly.
Closes#984.
## Summary
Based on user feedback. Calling it a "parse error" is misleading, since
this is really something we don't support, but that users can work
around.
## Summary
I don't know if this is actually a good change, but it tries to make the
editable install experience more consistent. Specifically, we now
support...
```
# Use a relative path with a `file://` prefix.
# Prior to this PR, we supported `file:../foo`, but not `file://../foo`, which felt inconsistent.
-e file://../foo
# Use environment variables with paths, not just URLs.
# Prior to this PR, we supported `file://${PROJECT_ROOT}/../foo`, but not the below.
-e ${PROJECT_ROOT}/../foo
```
Importantly, `-e file://../foo` is actually not supported by pip... `-e
file:../foo` _is_ supported though. We support both, as of this PR. Open
to feedback.
We now show the fully-resolved URL, rather than the URL as given by the
user, _everywhere_ except for the output resolution file (which should
retain relative paths, unexpanded environment variables, etc.).
Closes https://github.com/astral-sh/puffin/issues/687.
## Summary
This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.
This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpected_ environment variable to the
output file.
Currently, `dbg!` is hard to read because versions are verbose, showing
all optional fields, and we have a lot of versions. Changing debug
formatting to displaying the version number (which can be losslessly
converted to the struct and back) makes this more readable.
See e.g.
https://gist.github.com/konstin/38c0f32b109dffa73b3aa0ab86b9662b
**Before**
```text
version: Version {
epoch: 0,
release: [
1,
2,
3,
],
pre: None,
post: None,
dev: None,
local: None,
},
```
**After**
```text
version: "1.2.3",
```