This commit writes out the Serde helpers for unsigned durations.
This is the pay off for adding the dedicated parsers/printers for a
`std::time::Duration`. Previously, we were converting to/from a
`SignedDuration`, which fundamentally limited the range of values we
could support.
Closes#380, Ref #298
I think when I originally wrote fractional formatting, I was thinking
about using precision values greater than `9`. But that never really
panned out. And I used signed values because I used signed values
everywhere.
I think this is the last change we need to be able to feasibly add
native `std::time::Duration` support.
... so that we can print `u64` values as-is without suspicious casts.
Unsigned integer printing is also simpler, so it's plausible that it's
faster too. Although I haven't been careful about benchmarking duration
printing.
In the next commit, we will finish removing the casts by tweaking
fractional printing.
This is continued progress toward natively supporting
`std::time::Duration`. I basically didn't want to duplicate all of the
code used to print a `SignedDuration` just so that we could print a
`std::time::Duration`. Since we handle the sign before/after most of the
meat of printing, we can actually lower a `SignedDuration` to a
`std::time::Duration` and focus most of our printing logic from that.
There are some papercuts in this commit though. Notably, some casts
between `u64` and `i64`, since Jiff's lowest level integer printing
routine requires a `u64`. We'll fix that in the next commit.
Thank goodness I decided to make `SignedDuration` mimic
`std::time::Duration` exactly (except for being signed). That made this
change super easy.
This only applies to parsing into a `SignedDuration` since `i64::MIN`
isn't supported by any units on `Span` (including nanoseconds, since it
only supports `(i64::MIN+1)..=i64::MAX` to make negations infallible).
We pull out the u64 prefix parser from the friendly duration and use
that. Since that does the parsing in a single pass and is overall
generally tighter, this also leads to a nice perf improvement on ISO
8601 duration parsing:
group master new
----- ------ ---
parse/iso8601_duration/long-time/duration/jiff 1.14 27.6±0.32ns ? ?/sec 1.00 24.2±0.24ns ? ?/sec
parse/iso8601_duration/long-time/span/jiff 1.21 76.6±0.17ns ? ?/sec 1.00 63.2±0.24ns ? ?/sec
parse/iso8601_duration/long/span/jiff 2.14 101.8±0.21ns ? ?/sec 1.00 47.4±0.24ns ? ?/sec
parse/iso8601_duration/medium/span/jiff 1.71 60.3±0.12ns ? ?/sec 1.00 35.2±0.12ns ? ?/sec
parse/iso8601_duration/short/duration/jiff 1.56 12.1±0.22ns ? ?/sec 1.00 7.8±0.13ns ? ?/sec
parse/iso8601_duration/short/span/jiff 1.50 47.7±0.12ns ? ?/sec 1.00 31.9±0.08ns ? ?/sec
parse/iso8601_duration/tiny/duration/jiff 1.17 6.7±0.10ns ? ?/sec 1.00 5.7±0.03ns ? ?/sec
parse/iso8601_duration/tiny/span/jiff 1.19 33.1±0.08ns ? ?/sec 1.00 27.8±0.10ns ? ?/sec
The results above are compared against current `master`
(`211a36d5ecddf1aeb9b00648d9d5e631a5420903`).
The "before" here reflects `master`
(`211a36d5ec`) and not the previous
commit:
group master new
----- ------ ---
parse/friendly/longest-time/duration/jiff 1.29 41.9±0.69ns ? ?/sec 1.00 32.4±0.42ns ? ?/sec
parse/friendly/short/duration/jiff 1.14 15.7±0.27ns ? ?/sec 1.00 13.8±0.30ns ? ?/sec
parse/friendly/tiny/duration/jiff 1.23 8.1±0.36ns ? ?/sec 1.00 6.6±0.11ns ? ?/sec
parse/iso8601_duration/long-time/duration/jiff 1.00 27.6±0.32ns ? ?/sec 1.06 29.3±0.27ns ? ?/sec
parse/iso8601_duration/short/duration/jiff 1.05 12.1±0.22ns ? ?/sec 1.00 11.5±0.20ns ? ?/sec
parse/iso8601_duration/tiny/duration/jiff 1.00 6.7±0.10ns ? ?/sec 1.15 7.7±0.09ns ? ?/sec
There are some minor regressions, but also some wins, particularly for
the friendly format.
I mostly focused on optimizing the "turn parsed values into a
`SignedDuration`" part. I used a data dependent optimization that
quickly detects whether we can avoid error handling entirely.
We could probably optimize the general case a bit more too. But this is
enough for now.
This introduces a new internal helper type, `DurationUnits`, that is
used to parse all duration types and formats. It centralizes the logic
and trims away a lot of fat. This should reduce code size (although I
haven't checked) and also improve perf. Currently, it does significantly
improve perf for parsing longer durations and especially for `Span`. It
does however slightly regress perf for parsing shorter `SignedDuration`
values.
However, I think there are a lot of perf improvement opportunities. I'll
put those in a subsequent commit.
(We still haven't implemented `std::time::Duration` yet. But
`DurationUnits` is clearly designed with it in mind. This is one helluva
yak shave!)
This also fixes a long-standing bug where we couldn't parse
`abs(i64::MIN) secs ago`. The `DurationUnits` design was specifically
motivated (in part) by this.
Specifically, the ranged integer used when parsing fractional
values. We'll want to do this in order to switch over to parsing
durations into a single shared unified type.
In particular, `NoUnits` and `NoUnits128`.
I dare say that this might be (the very) beginning of getting
away from ranged integers. The specific motivation here is that
we want to be able to parse `std::time::Duration`. And to do that,
we need to be able to parse a `u64`. But before this commit, we
were parsing into a `t::NoUnits`, which is an `i64`. And we don't
have unsigned ranged integers and I refused to go down that path.
So we need to use a `u64` because I refuse to use an `i128` any
time we want to parse an integer.
... and also a smattering of internal helpers.
The `SignedDuration::from_nanos_i128` constructor in particular
is analogous to `std::time::Duration::from_nanos_u128` coming to
a `std` near you. The rest are just a logical continuation.
This also adds some "get units with a remainder" methods that
I think will be useful in simplifying some of the duration
parsing code.
I think this is a more natural place for them. In particular,
I kept going to the top of the file to look for them and then
needing to scroll down through a bunch of docs.
This brings the code for parsing duration strings (ISO 8601 and
friendly) into `SignedDuration` or `Span` into more alignment.
There are still some annoying differences. And some bugs I
(re)discovered. They are marked as TODOs in this commit.
The main motivation here is to support parsing into
`std::time::Duration`. A third copy of this code was too unappealing to
me, so I'm trying to refactor the code to make it tighter and less
redundant.
This is a slight refactor to reduce a little duplication.
I don't mind duplication, but in this case, I was wondering why
this block was repeated and had to very carefully scrutinize the
code to follow it and make sure there weren't any differences. DRY
in this case I think makes it easier to understand.
I'm kind of surprised I hadn't added these yet. But I think among Jiff,
Chrono and `time`, Jiff is the only one to support them.
I'm going to be touching the implementation (in persuit of adding
support for parsing/printing `std::time::Duration` directly), so I want
to make sure I am at least not regressing perf.
This test is failing intermittently on macOS. Which means that the
initialization of the zoneinfo database is taking more than 500ms. I had
scrutinized the CI logs and timestamps, and it looks like it's just
random I/O delay when traversing the zoneinfo directory. Sigh.
See the comment in the code for explanation, but basically, this comes
from S 3.3.1 of RFC 9636:
> 3.3.1. All-Year Daylight Saving Time
>
> DST is considered to be in effect all year if its UT offset is less
> than (i.e., west of) that of standard time, and it starts January 1
> at 00:00 and ends December 31 at 24:00 minus the difference between
> standard and daylight saving time, leaving no room for standard time
> in the calendar. [POSIX] implies but does not explicitly state this,
> so it is spelled out here for clarity.
>
> Example: XXX3EDT4,0/0,J365/23
> This represents a time zone that is perpetually 4 hours west of UT
> and is abbreviated "EDT". The "XXX" is ignored.
These odd time zones can occur when compiling TZif files using `zic`
with rearguard semantics. In particular, this applies for times at or
after 2087 in `Africa/Casablanca`. Rearguard semantics are used by
Jiff's `jiff-tzdb` crate, but aren't always otherwise used in standard
`/usr/share/zoneinfo` installations. As a result, this bug's effects
cannot always be consistently observed.
Fixes#386
It looks like I only captured some of them. But there were seveal I
omitted.
(This is why I don't keep a CHANGELOG in most projects. It adds a lot of
maintenance overhead.)
Previously, we only had impls for `&Zoned`. These impls are a little
weird because `Zoned` isn't `Copy`. So using them will result in
consuming the `Zoned` value. But if that's okay, then having these impls
is a nice quality of life improvement instead of a papercut. In the
worst case, Rust programmers will need to add a `&` to avoid consuming
the `Zoned` value. But I think Rust programmers are pretty used to that.
Fixes#396
Previously, if `%s` was parsed and there was no offset (`%z`) or IANA
time zone identifier (`%Q`), then trying to pull a `Zoned` out of a
`BrokenDownTime` would fail. But arguably, this should work. Namely, the
`strptime` APIs are already very relaxed about returning a `Zoned` with
fixed offset time zones (where the normal `FromStr` APIs forbid it due
to footguns). And since we have a way of representing an unknown time
zone, we can do exactly that.
Specifically, all of the other higher level constructors on
`BrokenDownTime` already try their "hardest" to produce the
requested value. Notably, both `BrokenDownTime::to_date` and
`BrokenDownTime::to_time` go to some effort to achieve this.
However, `BrokenDownTime::to_timestamp` wasn't doing everything it
could. Namely, it was completely ignoring its internal `timestamp`
field. This is weird and inconsistent from how the rest of the
constructors worked.
I think this is overall more flexible as well. Because it means that
parsing `%s` is, at the level of the individual pieces of data inside a
`BrokenDownTime`, completely separate from everything else. So you don't
have any weird overwriting behavior at the lowest levels.
I started thinking about this in #428. While I don't think the
motivation in #428 is that compelling, it got me thinking more deeply
about how a `BrokenDownTime` deals with timestamps. I think we got to
this state because I bolted `%s` on to `fmt::strtime` and didn't think
deeply about it.
Fixes#428
Previously, these would fail if the year wasn't in the range
`1969-2068`. This was done because parsing fails for the same reason,
and it seemed like good sense to have parsing and formatting be
consistent with one another.
... however, this is the *only* formatting specifier that can yield an
error based on the specific values provided. In contrast, all other
errors are tied to either the validity of the formatting string or if
all relevant data is provided (i.e., you get an error if you use `%y`
and your `BrokenDownTime` doesn't have a way to get the year).
I don't think this aberration is worth it. I think it's better to be
able to say the two broad error categories and be able to stick to them.
This should also hopefully avoid accidental panics, since now the only
error conditions that result in, say, `Zoned::strftime` panicking are
limited to the format string itself.
Fixes#428
Previously, it was possible for lenient formatting to still error when
the formatting string contained invalid UTF-8. This commit alleviates
that error condition by writing a replacement character instead. This
required tweaking our UTF-8 decoding routines so that we could implement
the "substitution by maximal subparts" strategy for lossy decoding.
This now means that lenient formatting can only fail when writing to the
underlying writer fails.
I had been uncertain about how this should be implemented in the face of
truly aberrant time zone transitions (like 2011-12-30 being missing from
`Pacific/Apia`). But this particular method is incredibly useful _and_
is present on all of the other datetime types. So I think it's important
enough to add, even if it may have somewhat odd semantics in some cases.
In particular, if we implement `Zoned::series` like we do
`civil::DateTime::series`, then a time series over spans of `1 day`
starting on `2011-12-29` in `Pacific/Apia` will result in yielding
`2011-12-31` twice. This is because `2011-12-29 + 1 day` and
`2011-12-29 + 2 days` are equivalent (i.e., `2011-12-30`) in
`Pacific/Apia`. Indeed, this behavior is copied from Temporal which
behaves the same:
>>> zdt = Temporal.ZonedDateTime.from("2011-12-29[Pacific/Apia]")
Object { … }
>>> zdt.toString()
"2011-12-29T00:00:00-10:00[Pacific/Apia]"
>>> zdt.add({days: 1}).toString()
"2011-12-31T00:00:00+14:00[Pacific/Apia]"
>>> zdt.add({days: 2}).toString()
"2011-12-31T00:00:00+14:00[Pacific/Apia]"
I think yielding the same instant twice in a row is perhaps very odd and
unexpected. Instead, the iterator guarantees that each item yielded by
the iterator is strictly greater (or strictly less for negative spans)
than the one that precedes it. This fixes the `Pacific/Apia` case while
preserving the same kind of semantics we get with
`civil::DateTime::series`.
Specifically, document what happens when there is conflicting data
parsed with `%s` and, say, `%Y-%m-%d`.
This is a good example of why `strptime` is kinda fucked up, but on the
same token, somewhat pragmatic in its utility.
In https://github.com/rust-lang/rust/pull/138907, the `doc_auto_cfg`
feature was subsumed by `doc_cfg`. This does overall looks like we're on
a path toward stabilization, which is great.
One problem here though is that a bunch of crates use the `cfg`
`docsrs` to enable `doc_auto_cfg`. So if we enable it, then it causes
those crates to emit hard errors. This is overall very annoying, and
I don't know how to unfuck things. So I changed the `cfg` knob to
`docsrs_jiff` which, doesn't quite provide a guarantee, but gets us
closer to being masters of our own destiny.
See also a similar change made to `regex`:
https://github.com/rust-lang/regex/pull/1305
It doesn't make sense to distinguish between a duration and a span in
the "construct timestamp from seconds" benchmark, so just use a more
generic name here.
The code previously assumed that `%C` could never parse more than 2
digits. I believe this restriction was relaxed at some point, but the
assumption was never revisited in this part of the code.
This fixes the panic by forcefully restricting the absolute value of the
parsed century to be in the range `0..=99`.
Fixes#426
This can only happen when malformed TZif data is given. Generally
speaking, Jiff treats TZif data as untrusted and promises not to panic
when given malformed data. So this is something that ought to be fixed.
However, it's rare to use untrusted TZif data in practice.
In this specific case, we were assuming that the time zone designation
list was always terminated by NUL.
Fixes#423
I'm not sure why I didn't add this originally. There is some confusion
around what happens when the hour value is inconsistent with the
meridiem, but that could already happen when parsing. And in that case,
the meridiem takes precedence over the parsed hour value. So we do the
same when formatting too.
Ref https://github.com/tiffany352/rink-rs/pull/210#pullrequestreview-3063953337
PR #397
The first one here was pretty embarrassing: the parser panicked on the
empty string.
The others were a bit subtler and had to do with incorrect construction
of error messages. Arguably, more tests for the error cases here ought
to be added.
Fixes#407
serde_core has just been released. Crates that don't depend on serde derive macros should use it instead of serde, as it speeds up compile times by having the build for the crate itself be independent from serde-derive (in case it were to be enabled by some other crate). See the serde_core docs for more info.
On my Ryzen 5 5500U, building this repo with cargo build --release --features serde went from 10.83s to 8.76s.