Use ArcStr for marker values (#10453)

N.B. After fixing #10430, `ArcStr` became the fastest implementation,
although the overall gains shrank significantly, to 1-2%. See:
https://github.com/astral-sh/uv/pull/10453#issuecomment-2583344414.

## Summary

I tried out a variety of small string crates, but `Arc<str>`
outperformed them, giving a ~10% speed-up:

```console
❯ hyperfine "../arcstr lock" "../flexstr lock" "uv lock" "../arc lock" "../compact_str lock" --prepare "rm -f uv.lock" --min-runs 50 --warmup 20
Benchmark 1: ../arcstr lock
  Time (mean ± σ):     304.6 ms ±   2.3 ms    [User: 302.9 ms, System: 117.8 ms]
  Range (min … max):   299.0 ms … 311.3 ms    50 runs

Benchmark 2: ../flexstr lock
  Time (mean ± σ):     319.2 ms ±   1.7 ms    [User: 317.7 ms, System: 118.2 ms]
  Range (min … max):   316.8 ms … 323.3 ms    50 runs

Benchmark 3: uv lock
  Time (mean ± σ):     330.6 ms ±   1.5 ms    [User: 328.1 ms, System: 139.3 ms]
  Range (min … max):   326.6 ms … 334.2 ms    50 runs

Benchmark 4: ../arc lock
  Time (mean ± σ):     303.0 ms ±   1.2 ms    [User: 301.6 ms, System: 118.4 ms]
  Range (min … max):   300.3 ms … 305.3 ms    50 runs

Benchmark 5: ../compact_str lock
  Time (mean ± σ):     320.4 ms ±   2.0 ms    [User: 318.7 ms, System: 120.8 ms]
  Range (min … max):   317.3 ms … 326.7 ms    50 runs

Summary
  ../arc lock ran
    1.01 ± 0.01 times faster than ../arcstr lock
    1.05 ± 0.01 times faster than ../flexstr lock
    1.06 ± 0.01 times faster than ../compact_str lock
    1.09 ± 0.01 times faster than uv lock
```
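
The commit message does not say why a reference-counted string wins here. One plausible reason is that marker values are cloned far more often than they are created. The sketch below uses only the standard library's `Arc<str>` to show the difference in clone behavior; `MarkerValue` is a hypothetical stand-in, not a type from uv.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a marker value; not uv's actual type.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct MarkerValue(Arc<str>);

impl MarkerValue {
    /// Copies the bytes once into a shared, reference-counted buffer.
    fn new(value: &str) -> Self {
        Self(Arc::from(value))
    }

    fn as_str(&self) -> &str {
        &self.0
    }

    /// True if both values point at the same heap allocation.
    fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

/// Cloning a `String` copies the bytes into a fresh allocation, so the
/// two buffers never share an address while both are alive.
fn string_clone_shares_buffer(value: &str) -> bool {
    let owned = String::from(value);
    let copy = owned.clone();
    owned.as_ptr() == copy.as_ptr()
}
```

Calling `MarkerValue::new("win32").clone()` only bumps an atomic reference count, so the clone and the original share one buffer, whereas `string_clone_shares_buffer("win32")` returns `false`.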
Charlie Marsh 2025-01-10 15:15:12 -05:00 committed by GitHub
parent 7a21b713b4
commit 8420195aa7
12 changed files with 80 additions and 60 deletions

Cargo.lock (generated)

@ -5013,6 +5013,7 @@ name = "uv-distribution-types"
version = "0.0.1"
dependencies = [
"anyhow",
"arcstr",
"bitflags 2.6.0",
"fs-err 3.0.0",
"itertools 0.14.0",
@ -5272,6 +5273,7 @@ dependencies = [
name = "uv-pep508"
version = "0.6.0"
dependencies = [
"arcstr",
"boxcar",
"indexmap",
"insta",
@ -5508,6 +5510,7 @@ dependencies = [
name = "uv-resolver"
version = "0.0.1"
dependencies = [
"arcstr",
"clap",
"dashmap",
"either",


@ -29,6 +29,7 @@ uv-platform-tags = { workspace = true }
uv-pypi-types = { workspace = true }
anyhow = { workspace = true }
arcstr = { workspace = true }
bitflags = { workspace = true }
fs-err = { workspace = true }
itertools = { workspace = true }


@ -1,5 +1,6 @@
use std::fmt::{Display, Formatter};
use arcstr::ArcStr;
use tracing::debug;
use uv_distribution_filename::{BuildTag, WheelFilename};
@ -662,12 +663,12 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
let mut tag_marker = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
value: arcstr::literal!("win32"),
});
tag_marker.and(MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::PlatformMachine,
operator: MarkerOperator::Equal,
value: "x86".to_string(),
value: arcstr::literal!("x86"),
}));
marker.or(tag_marker);
}
@ -675,12 +676,12 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
let mut tag_marker = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
value: arcstr::literal!("win32"),
});
tag_marker.and(MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::PlatformMachine,
operator: MarkerOperator::Equal,
value: "x86_64".to_string(),
value: arcstr::literal!("x86_64"),
}));
marker.or(tag_marker);
}
@ -688,12 +689,12 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
let mut tag_marker = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
value: arcstr::literal!("win32"),
});
tag_marker.and(MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::PlatformMachine,
operator: MarkerOperator::Equal,
value: "arm64".to_string(),
value: arcstr::literal!("arm64"),
}));
marker.or(tag_marker);
}
@ -703,7 +704,7 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
let mut tag_marker = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "darwin".to_string(),
value: arcstr::literal!("darwin"),
});
// Parse the macOS version from the tag.
@ -787,7 +788,7 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
arch_marker.or(MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::PlatformMachine,
operator: MarkerOperator::Equal,
value: (*arch).to_string(),
value: ArcStr::from(*arch),
}));
}
tag_marker.and(arch_marker);
@ -800,7 +801,7 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
let mut tag_marker = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "linux".to_string(),
value: arcstr::literal!("linux"),
});
// Parse the architecture from the tag.
@ -866,7 +867,7 @@ pub fn implied_markers(filename: &WheelFilename) -> MarkerTree {
tag_marker.and(MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::PlatformMachine,
operator: MarkerOperator::Equal,
value: arch.to_string(),
value: ArcStr::from(arch),
}));
marker.or(tag_marker);


@ -23,6 +23,7 @@ uv-fs = { workspace = true }
uv-normalize = { workspace = true }
uv-pep440 = { workspace = true }
arcstr = { workspace = true}
boxcar = { workspace = true }
indexmap = { workspace = true }
itertools = { workspace = true }


@ -1334,17 +1334,17 @@ mod tests {
let mut b = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
value: arcstr::literal!("win32"),
});
let mut c = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "linux".to_string(),
value: arcstr::literal!("linux"),
});
let d = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::ImplementationName,
operator: MarkerOperator::Equal,
value: "cpython".to_string(),
value: arcstr::literal!("cpython"),
});
c.and(d);


@ -51,6 +51,7 @@ use std::ops::Bound;
use std::sync::Mutex;
use std::sync::MutexGuard;
use arcstr::ArcStr;
use itertools::{Either, Itertools};
use rustc_hash::FxHashMap;
use std::sync::LazyLock;
@ -287,28 +288,31 @@ impl InternerGuard<'_> {
// values in `exclusions`.
//
// See: https://discuss.python.org/t/clarify-usage-of-platform-system/70900
let (key, value) = match (key, value.as_str()) {
(MarkerValueString::PlatformSystem, "Windows") => {
(CanonicalMarkerValueString::SysPlatform, "win32".to_string())
}
let (key, value) = match (key, value.as_ref()) {
(MarkerValueString::PlatformSystem, "Windows") => (
CanonicalMarkerValueString::SysPlatform,
arcstr::literal!("win32"),
),
(MarkerValueString::PlatformSystem, "Darwin") => (
CanonicalMarkerValueString::SysPlatform,
"darwin".to_string(),
arcstr::literal!("darwin"),
),
(MarkerValueString::PlatformSystem, "Linux") => (
CanonicalMarkerValueString::SysPlatform,
arcstr::literal!("linux"),
),
(MarkerValueString::PlatformSystem, "AIX") => (
CanonicalMarkerValueString::SysPlatform,
arcstr::literal!("aix"),
),
(MarkerValueString::PlatformSystem, "Linux") => {
(CanonicalMarkerValueString::SysPlatform, "linux".to_string())
}
(MarkerValueString::PlatformSystem, "AIX") => {
(CanonicalMarkerValueString::SysPlatform, "aix".to_string())
}
(MarkerValueString::PlatformSystem, "Emscripten") => (
CanonicalMarkerValueString::SysPlatform,
"emscripten".to_string(),
arcstr::literal!("emscripten"),
),
// See: https://peps.python.org/pep-0738/#sys
(MarkerValueString::PlatformSystem, "Android") => (
CanonicalMarkerValueString::SysPlatform,
"android".to_string(),
arcstr::literal!("android"),
),
_ => (key.into(), value),
};
@ -869,48 +873,48 @@ impl InternerGuard<'_> {
MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "nt".to_string(),
value: arcstr::literal!("nt"),
},
MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "linux".to_string(),
value: arcstr::literal!("linux"),
},
),
(
MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "nt".to_string(),
value: arcstr::literal!("nt"),
},
MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "darwin".to_string(),
value: arcstr::literal!("darwin"),
},
),
(
MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "nt".to_string(),
value: arcstr::literal!("nt"),
},
MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "ios".to_string(),
value: arcstr::literal!("ios"),
},
),
(
MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "posix".to_string(),
value: arcstr::literal!("posix"),
},
MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
value: arcstr::literal!("win32"),
},
),
];
@ -950,12 +954,12 @@ impl InternerGuard<'_> {
MarkerExpression::String {
key: MarkerValueString::PlatformSystem,
operator: MarkerOperator::Equal,
value: platform_system.to_string(),
value: ArcStr::from(platform_system),
},
MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: sys_platform.to_string(),
value: ArcStr::from(sys_platform),
},
));
}
@ -996,13 +1000,13 @@ pub(crate) enum Variable {
/// string marker and value.
In {
key: CanonicalMarkerValueString,
value: String,
value: ArcStr,
},
/// A variable representing a `<value> in <key>` expression for a particular
/// string marker and value.
Contains {
key: CanonicalMarkerValueString,
value: String,
value: ArcStr,
},
/// A variable representing the existence or absence of a given extra.
///
@ -1128,7 +1132,7 @@ pub(crate) enum Edges {
// Invariant: All ranges are simple, meaning they can be represented by a bounded
// interval without gaps. Additionally, there are at least two edges in the set.
String {
edges: SmallVec<(Ranges<String>, NodeId)>,
edges: SmallVec<(Ranges<ArcStr>, NodeId)>,
},
// The edges of a boolean variable, representing the values `true` (the `high` child)
// and `false` (the `low` child).
@ -1158,8 +1162,8 @@ impl Edges {
///
/// This function will panic for the `In` and `Contains` marker operators, which
/// should be represented as separate boolean variables.
fn from_string(operator: MarkerOperator, value: String) -> Edges {
let range: Ranges<String> = match operator {
fn from_string(operator: MarkerOperator, value: ArcStr) -> Edges {
let range: Ranges<ArcStr> = match operator {
MarkerOperator::Equal => Ranges::singleton(value),
MarkerOperator::NotEqual => Ranges::singleton(value).complement(),
MarkerOperator::GreaterThan => Ranges::strictly_higher_than(value),
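
The `platform_system` to `sys_platform` canonicalization touched in this file can be sketched with the standard library alone. This is a simplified illustration, not uv's code: `Key` is a hypothetical enum that merges the roles of `MarkerValueString` and `CanonicalMarkerValueString`, and `Arc::from` allocates on every call, whereas `arcstr::literal!` in the real change builds a static value without allocating.

```rust
use std::sync::Arc;

/// Hypothetical, reduced set of marker keys for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Key {
    PlatformSystem,
    SysPlatform,
    OsName,
}

/// Rewrites well-known `platform_system` values to the equivalent
/// `sys_platform` value; every other pair is passed through untouched.
fn canonicalize(key: Key, value: Arc<str>) -> (Key, Arc<str>) {
    match (key, &*value) {
        (Key::PlatformSystem, "Windows") => (Key::SysPlatform, Arc::from("win32")),
        (Key::PlatformSystem, "Darwin") => (Key::SysPlatform, Arc::from("darwin")),
        (Key::PlatformSystem, "Linux") => (Key::SysPlatform, Arc::from("linux")),
        (Key::PlatformSystem, "AIX") => (Key::SysPlatform, Arc::from("aix")),
        (Key::PlatformSystem, "Emscripten") => (Key::SysPlatform, Arc::from("emscripten")),
        (Key::PlatformSystem, "Android") => (Key::SysPlatform, Arc::from("android")),
        // The fallback moves the original value through without copying it.
        _ => (key, value),
    }
}
```

Matching on `&*value` borrows the shared string as a `&str`, which is why the diff switches from `value.as_str()` to `value.as_ref()` once the value is no longer a `String`.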


@ -1,5 +1,5 @@
use arcstr::ArcStr;
use std::str::FromStr;
use uv_normalize::ExtraName;
use uv_pep440::{Version, VersionPattern, VersionSpecifier};
@ -92,7 +92,7 @@ pub(crate) fn parse_marker_value<T: Pep508Url>(
Some((start_pos, quotation_mark @ ('"' | '\''))) => {
cursor.next();
let (start, len) = cursor.take_while(|c| c != quotation_mark);
let value = cursor.slice(start, len).to_string();
let value = ArcStr::from(cursor.slice(start, len));
cursor.next_expect_char(quotation_mark, start_pos)?;
Ok(MarkerValue::QuotedString(value))
}
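
As a rough illustration of the parsing step changed above, the following sketch reads a quoted value from the front of a string and stores it as an `Arc<str>`. It is a minimal stand-in built on the standard library: `parse_quoted` is a hypothetical helper, and it does not reproduce uv's `Cursor` API or its error reporting.

```rust
use std::sync::Arc;

/// Hypothetical helper: if `input` starts with a single- or double-quoted
/// string, returns the unquoted value and the remaining input.
fn parse_quoted(input: &str) -> Option<(Arc<str>, &str)> {
    let mut chars = input.chars();
    // The opening quote decides which character closes the string.
    let quote = chars.next().filter(|c| *c == '"' || *c == '\'')?;
    let rest = chars.as_str();
    // An unterminated string yields `None` instead of an error message.
    let end = rest.find(quote)?;
    // The bytes are copied once here; later clones share this buffer.
    let value: Arc<str> = Arc::from(&rest[..end]);
    Some((value, &rest[end + quote.len_utf8()..]))
}
```

For example, `parse_quoted("'win32' and x")` yields the value `win32` and the remainder ` and x`.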


@ -1,12 +1,14 @@
use std::fmt;
use std::ops::Bound;
use arcstr::ArcStr;
use indexmap::IndexMap;
use itertools::Itertools;
use rustc_hash::FxBuildHasher;
use uv_pep440::{Version, VersionSpecifier};
use version_ranges::Ranges;
use uv_pep440::{Version, VersionSpecifier};
use crate::{ExtraOperator, MarkerExpression, MarkerOperator, MarkerTree, MarkerTreeKind};
/// Returns a simplified DNF expression for a given marker tree.
@ -131,7 +133,7 @@ fn collect_dnf(
let expr = MarkerExpression::String {
key: marker.key().into(),
value: marker.value().to_owned(),
value: ArcStr::from(marker.value()),
operator,
};
@ -150,7 +152,7 @@ fn collect_dnf(
let expr = MarkerExpression::String {
key: marker.key().into(),
value: marker.value().to_owned(),
value: ArcStr::from(marker.value()),
operator,
};


@ -3,6 +3,7 @@ use std::fmt::{self, Display, Formatter};
use std::ops::{Bound, Deref};
use std::str::FromStr;
use arcstr::ArcStr;
use itertools::Itertools;
use serde::{de, Deserialize, Deserializer, Serialize, Serializer};
use version_ranges::Ranges;
@ -129,7 +130,7 @@ pub enum MarkerValue {
/// `extra`. This one is special because it's a list and not env but user given
Extra,
/// Not a constant, but a user given quoted string with a value inside such as '3.8' or "windows"
QuotedString(String),
QuotedString(ArcStr),
}
impl FromStr for MarkerValue {
@ -272,8 +273,8 @@ impl MarkerOperator {
/// Returns the marker operator and value whose union represents the given range.
pub fn from_bounds(
bounds: (&Bound<String>, &Bound<String>),
) -> impl Iterator<Item = (MarkerOperator, String)> {
bounds: (&Bound<ArcStr>, &Bound<ArcStr>),
) -> impl Iterator<Item = (MarkerOperator, ArcStr)> {
let (b1, b2) = match bounds {
(Bound::Included(v1), Bound::Included(v2)) if v1 == v2 => {
(Some((MarkerOperator::Equal, v1.clone())), None)
@ -291,7 +292,7 @@ impl MarkerOperator {
}
/// Returns a value specifier representing the given lower bound.
pub fn from_lower_bound(bound: &Bound<String>) -> Option<(MarkerOperator, String)> {
pub fn from_lower_bound(bound: &Bound<ArcStr>) -> Option<(MarkerOperator, ArcStr)> {
match bound {
Bound::Included(value) => Some((MarkerOperator::GreaterEqual, value.clone())),
Bound::Excluded(value) => Some((MarkerOperator::GreaterThan, value.clone())),
@ -300,7 +301,7 @@ impl MarkerOperator {
}
/// Returns a value specifier representing the given upper bound.
pub fn from_upper_bound(bound: &Bound<String>) -> Option<(MarkerOperator, String)> {
pub fn from_upper_bound(bound: &Bound<ArcStr>) -> Option<(MarkerOperator, ArcStr)> {
match bound {
Bound::Included(value) => Some((MarkerOperator::LessEqual, value.clone())),
Bound::Excluded(value) => Some((MarkerOperator::LessThan, value.clone())),
@ -485,7 +486,7 @@ pub enum MarkerExpression {
String {
key: MarkerValueString,
operator: MarkerOperator,
value: String,
value: ArcStr,
},
/// `extra <extra op> '...'` or `'...' <extra op> extra`.
Extra {
@ -1383,7 +1384,7 @@ impl Ord for VersionMarkerTree<'_> {
pub struct StringMarkerTree<'a> {
id: NodeId,
key: CanonicalMarkerValueString,
map: &'a [(Ranges<String>, NodeId)],
map: &'a [(Ranges<ArcStr>, NodeId)],
}
impl StringMarkerTree<'_> {
@ -1393,7 +1394,7 @@ impl StringMarkerTree<'_> {
}
/// The edges of this node, corresponding to possible output ranges of the given variable.
pub fn children(&self) -> impl ExactSizeIterator<Item = (&Ranges<String>, MarkerTree)> {
pub fn children(&self) -> impl ExactSizeIterator<Item = (&Ranges<ArcStr>, MarkerTree)> {
self.map
.iter()
.map(|(range, node)| (range, MarkerTree(node.negate(self.id))))
@ -1418,7 +1419,7 @@ impl Ord for StringMarkerTree<'_> {
#[derive(PartialEq, Eq, Clone, Debug)]
pub struct InMarkerTree<'a> {
key: CanonicalMarkerValueString,
value: &'a str,
value: &'a ArcStr,
high: NodeId,
low: NodeId,
}
@ -1430,7 +1431,7 @@ impl InMarkerTree<'_> {
}
/// The value (RHS) for this expression.
pub fn value(&self) -> &str {
pub fn value(&self) -> &ArcStr {
self.value
}
@ -1654,6 +1655,7 @@ mod test {
use std::str::FromStr;
use insta::assert_snapshot;
use uv_normalize::ExtraName;
use uv_pep440::Version;
@ -2041,7 +2043,7 @@ mod test {
MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "nt".to_string(),
value: arcstr::literal!("nt")
}
);
}
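
The bound-to-operator helpers updated in this file clone the value out of each bound, which is where a cheap clone matters. The sketch below mirrors the visible parts of `from_bounds`, `from_lower_bound`, and `from_upper_bound` with `Arc<str>` from the standard library. `Op` is a hypothetical, reduced operator enum, and the fallback that combines the two helpers is an assumption, since that part of the function is not shown in the diff.

```rust
use std::ops::Bound;
use std::sync::Arc;

/// Hypothetical, reduced operator set for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Equal,
    GreaterEqual,
    GreaterThan,
    LessEqual,
    LessThan,
}

fn from_lower_bound(bound: &Bound<Arc<str>>) -> Option<(Op, Arc<str>)> {
    match bound {
        Bound::Included(value) => Some((Op::GreaterEqual, value.clone())),
        Bound::Excluded(value) => Some((Op::GreaterThan, value.clone())),
        Bound::Unbounded => None,
    }
}

fn from_upper_bound(bound: &Bound<Arc<str>>) -> Option<(Op, Arc<str>)> {
    match bound {
        Bound::Included(value) => Some((Op::LessEqual, value.clone())),
        Bound::Excluded(value) => Some((Op::LessThan, value.clone())),
        Bound::Unbounded => None,
    }
}

/// Returns the operators whose conjunction describes the given interval.
fn from_bounds(lower: &Bound<Arc<str>>, upper: &Bound<Arc<str>>) -> Vec<(Op, Arc<str>)> {
    // A closed interval with identical endpoints is a plain equality.
    if let (Bound::Included(v1), Bound::Included(v2)) = (lower, upper) {
        if v1 == v2 {
            return vec![(Op::Equal, v1.clone())];
        }
    }
    // Assumption: all other intervals combine the two one-sided helpers.
    from_lower_bound(lower)
        .into_iter()
        .chain(from_upper_bound(upper))
        .collect()
}
```

Each `value.clone()` here increments a reference count rather than copying the string, so the operators returned for a range share storage with the range itself.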


@ -38,6 +38,7 @@ uv-types = { workspace = true }
uv-warnings = { workspace = true }
uv-workspace = { workspace = true }
arcstr = { workspace = true }
clap = { workspace = true, features = ["derive"], optional = true }
dashmap = { workspace = true }
either = { workspace = true }


@ -7,6 +7,7 @@ use petgraph::{
Directed, Direction,
};
use rustc_hash::{FxBuildHasher, FxHashMap, FxHashSet};
use uv_configuration::{Constraints, Overrides};
use uv_distribution::Metadata;
use uv_distribution_types::{
@ -726,7 +727,7 @@ impl ResolverOutput {
MarkerExpression::String {
key: value_string.into(),
operator: MarkerOperator::Equal,
value: from_env.to_string(),
value: from_env.into(),
}
}
};


@ -1424,11 +1424,15 @@ impl<InstalledPackages: InstalledPackagesProvider> ResolverState<InstalledPackag
// macOS. But if _neither_ version supports Intel macOS, we'd rather use `sys_platform == 'darwin'`
// instead of `sys_platform == 'darwin' and platform_machine == 'arm64'`, since it's much
// simpler, and _neither_ version will succeed with Intel macOS anyway.
for sys_platform in &["darwin", "linux", "win32"] {
for value in [
arcstr::literal!("darwin"),
arcstr::literal!("linux"),
arcstr::literal!("win32"),
] {
let sys_platform = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: (*sys_platform).to_string(),
value,
});
if dist.implied_markers().is_disjoint(sys_platform)
&& !remainder.is_disjoint(sys_platform)