I wasn't sure why `cargo package` wasn't picking up
`crates/jiff-static/shared`. So I switched to a traditional `src`
layout, which shouldn't be necessary. Indeed, that didn't fix things.
Turns out, I had a bunk `include` rule.
But I wanted to switch to the `src` scheme anyway, so leave that.
For whatever reason, these seem to take a hideously long time to run in
CI. They even take a long time to run locally, *relatively* speaking. In
core-only, `insta` doesn't support snapshotting at all, which is a huge
bummer. So we just tell insta to force the tests to pass and don't do
any updating. So these tests weren't really being run anyway.
I'm not sure what insta is doing here to be honest, and I don't really
understand why insta can't handle the core-only tests. I mean, I am
still importing the standard library when tests are run, even in
core-only mode. Maybe the insta macros assume the standard library
prelude is present or something? IDK.
... so that we can run each piece in its own job in CI.
This creates an obscene number of jobs, but I'm really hoping this cuts
down on the total wall clock time.
We are going to try and break `test` apart in order to speed up CI
builds. I don't want to pollute the root project directory with more
random test scripts, so let's tuck them away for now.
It is tempting to think of this method as just being a shortcut for
`zdt.timestamp().subsec_nanosecond()`, but it actually isn't! It's
returning the fractional seconds on the *civil* datetime, not the
timestamp. These are usually the same for times after the Unix epoch,
but can differ for times before it.
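To make the difference concrete, here is a minimal sketch using plain integers rather than Jiff's types. It assumes an instant is modeled as total nanoseconds since the Unix epoch, and both function names are hypothetical. Truncating division stands in for a timestamp-style fraction, which carries the sign of the instant. Euclidean division stands in for the civil clock fraction, which is never negative.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Timestamp-style fraction (hypothetical helper): truncates toward
/// zero, so it is negative for instants before the epoch.
fn timestamp_subsec(total_nanos: i64) -> i64 {
    total_nanos % NANOS_PER_SEC
}

/// Civil-style fraction (hypothetical helper): the nanoseconds past the
/// most recent whole second on the clock, so it is never negative.
fn civil_subsec(total_nanos: i64) -> i64 {
    total_nanos.rem_euclid(NANOS_PER_SEC)
}

fn main() {
    // A quarter second before the epoch: 1969-12-31T23:59:59.75Z.
    let before: i64 = -250_000_000;
    println!("{} vs {}", timestamp_subsec(before), civil_subsec(before));

    // A quarter second after the epoch: both fractions agree.
    let after: i64 = 250_000_000;
    println!("{} vs {}", timestamp_subsec(after), civil_subsec(after));
}
```

For the instant before the epoch, the two helpers disagree (-250,000,000 versus 750,000,000), while after the epoch they return the same value.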
While the original request in #283 asks for `subsec_millisecond()` and
`subsec_microsecond()` on `Zoned` in order to be consistent with
`Timestamp`, I'm going to pass on those for now. In particular, since
these would return the fractional second value from the *civil*
datetime, `subsec_millisecond()` would always be equivalent to
`millisecond()`. `subsec_microsecond()` wouldn't be the same as
`microsecond()` (just like `subsec_nanosecond()` isn't the same as
`nanosecond()`), but I find that this just overall adds to the confusion
of the methods here. And if you do need `subsec_microsecond()`, you can
just do `subsec_nanosecond() / 1_000`.
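The relationship between the fractional second and the individual unit fields can be sketched with plain integer arithmetic. This is only an illustration: `unit_fields` is a hypothetical helper, not part of Jiff, and it assumes a civil fractional second in the range `0..=999_999_999`.

```rust
/// Splits a civil fractional second (in nanoseconds) into its
/// millisecond, microsecond and nanosecond unit fields.
fn unit_fields(subsec_nanos: i32) -> (i32, i32, i32) {
    let millisecond = subsec_nanos / 1_000_000;
    let microsecond = (subsec_nanos / 1_000) % 1_000;
    let nanosecond = subsec_nanos % 1_000;
    (millisecond, microsecond, nanosecond)
}

fn main() {
    let subsec_nanos = 123_456_789;
    let (ms, us, ns) = unit_fields(subsec_nanos);

    // Whole milliseconds in the fraction equal the millisecond field.
    assert_eq!(subsec_nanos / 1_000_000, ms);
    // Whole microseconds in the fraction are NOT the microsecond field.
    assert_eq!(subsec_nanos / 1_000, 123_456);
    assert_eq!((ms, us, ns), (123, 456, 789));
}
```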
The reason that these additional methods make sense for `Timestamp` is
that `Timestamp` doesn't have a civil datetime. So there are no
individual `millisecond()` or `microsecond()` units. A `Timestamp` is
closer to a `SignedDuration` than a `civil::DateTime`.
Closes #283
This should help ensure that generated code doesn't get stale.
This is especially pertinent with the new `src/shared` module, which has
to be copied over to `crates/jiff-static/shared` any time a change is
made. Not all changes result in breakage (theoretical or otherwise), so
it's easy to forget to do.
This makes binary search for TZ lookups substantially faster.
This is yet another brutal refactor. Changing anything in POSIX time
zones or TZif handling is now a monster pain in the ass because all
of that code is shared in a very awkward way with `jiff-static`.
Ref #271
This is an easy win that uses 64-bit integers to represent a timestamp
instead of 96-bit integers. This is okay because this reflects what the
actual source IANA time zone database uses.
This makes the binary search lookup a fair bit faster.
Next I'd like to split `Transition` into three sequences: timestamps,
civil datetimes and the local type index. This should make them as
small as possible and further improve binary search lookups (I hope).
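As a rough sketch of what such a split could look like (not the layout Jiff actually uses), the search can run over a dense vector of 64-bit timestamps and then index into a parallel vector. The `Transitions` type and its fields are hypothetical, and the civil datetime sequence is omitted for brevity.

```rust
/// Hypothetical split layout: parallel vectors indexed by transition.
struct Transitions {
    /// Transition instants as Unix seconds, sorted ascending.
    timestamps: Vec<i64>,
    /// Index into a table of local time types, one per transition.
    type_indices: Vec<u8>,
}

impl Transitions {
    /// Returns the local type index in effect at `unix_seconds`, or
    /// `None` if the instant precedes the first transition.
    fn type_index_at(&self, unix_seconds: i64) -> Option<u8> {
        // Binary search: count the transitions at or before the query.
        let n = self.timestamps.partition_point(|&t| t <= unix_seconds);
        n.checked_sub(1).map(|i| self.type_indices[i])
    }
}

fn main() {
    let transitions = Transitions {
        timestamps: vec![0, 100, 200],
        type_indices: vec![0, 1, 0],
    };
    println!("{:?}", transitions.type_index_at(150));
}
```

The idea is that the binary search only touches the timestamp vector, so more keys fit in each cache line than when every element also carries a civil datetime and a type index.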
When enabled, this feature will "fatten" TZif data by adding more time
zone transitions. This corresponds to what tzdb's `zic` program does
when `-b fat` is given, except Jiff does it at runtime. If the TZif data
has already been fattened, then this has no effect.
The reason for this is that it smooths out performance differences in
time zone runtime lookups between pre-fattened TZif data and "slim"
TZif data. It is unpredictable whether `/usr/share/zoneinfo` is
actually fat or not, so this helps make performance more predictable
regardless of what the source TZif data looks like.
This uses about 25% more heap memory in my experiments. For a single
time zone, this is, in an absolute sense, likely insignificant. But if
you have thousands of time zones loaded into memory, it can add up. But
that's a somewhat niche use case. However, this can make binary sizes
bigger when the `jiff-static` proc macro is used.
So while unlikely to matter too much, the `tz-fat` feature can be
disabled if you want to prioritize memory usage and binary size.
Fixes #271
This was yet another absolutely brutal refactor. But in order to
"fatten" up TZif data after parsing, we need to be able to actually use
POSIX time zones in order to compute missing transitions. And in order
to do that, basically the entire POSIX time zone implementation needs to
be in `shared`. And that means no ranged integers. Which in turn means
implementing several datetime algorithms on just primitives.
This was just overall brutal, and I am getting very close to ripping
out ranged integers.
It looks like I never circled back around to fix the error message here
when I added the `SpanRelativeTo::days_are_24_hours()` functionality.
So fix that here.
It's hard to keep `SpanRelativeToKind`'s `Display` impl, because the
error message kinda needs to be rejiggered at a higher level.
Thankfully, this was only used in one place.
Instead of just free functions operating on tuples, we actually give
them names.
Ranged integers keep pissing me off. The fact that I have "primitive"
datetime types and "ranged" datetime types is just absolutely
infuriating and creating a lot of dissonance.
But at least the new composite stuff makes moving back-and-forth a
little easier now. Of course, the composite stuff is also write-once
and read-never. *heavy sigh*
In the next commit, we're going to start moving some more of our
datetime algorithms to `shared`.
The ultimate goal here is to have enough in `shared` that we can handle
POSIX time zones.
This takes some brutalness out of writing routines for converting
to and from composite types over ranged integers (like `civil::Date`).
This whole mess is a consequence of using ranged integers and
simultaneously implementing our low level datetime algorithms on
primitive integers instead of ranged integers. It's a fucking mess.
I think I am steadily marching toward ripping out ranged integers.
Sigh. Very unfortunate.
This was just really bugging me. And if we're going to move more of the
POSIX time zone implementation into `shared`, we might as well just bite
the bullet and do this too.
Now I believe the only parts of POSIX time zones that require `alloc`
are parsing the `:blah` implementation-defined strings for the `TZ`
environment variable and error messages. Not that it really matters
much I think.
Now that we eagerly reject unreasonable POSIX time zones, we can
simplify our type definitions. There's no more split between
`PosixTimeZone` and `ReasonablePosixTimeZone`. Everything is
just reasonable.
This makes the POSIX time zone parser reject strings like `EST5EDT`.
That is, a time zone with daylight saving time, but without an explicit
rule stating when daylight saving time becomes active/inactive.
We were already doing this, but more explicitly by calling
`PosixTimeZone::reasonable`, so there is no public API breakage here.
The only difference is that `EST5EDT` will be treated as an invalid
POSIX time zone string, and we will instead try to use it as an IANA
time zone identifier.
(Which, incidentally enough, actually exists. Odd, but I suppose
technically more correct than the current behavior of just rejecting it
outright.)
I did this because it makes the type definitions simpler. There was a
lot of cognitive energy on my part being devoted to parsing unreasonable
POSIX time zones successfully and only later asserting that they are
reasonable through a fallible API. But I don't think this was really
buying us anything, and we should just reject them outright.
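To illustrate the kind of string being rejected, here is a heavily simplified sketch. It is not Jiff's parser: `has_dst_without_rule` is a hypothetical helper that only handles unquoted alphabetic abbreviations, ignores the `<...>` quoted form, and performs no other validation.

```rust
/// Returns true if `s` appears to name a DST abbreviation but gives no
/// transition rule (hypothetical helper, simplified grammar).
fn has_dst_without_rule(s: &str) -> bool {
    // Skip the standard abbreviation (ASCII letters only in this sketch).
    let rest = s.trim_start_matches(|c: char| c.is_ascii_alphabetic());
    // Skip the standard offset, e.g. `5`, `-3` or `+05:30`.
    let rest = rest.trim_start_matches(|c: char| {
        c.is_ascii_digit() || matches!(c, '+' | '-' | ':')
    });
    // Whatever is left begins with the DST abbreviation. A rule, if
    // present, is introduced by a comma.
    !rest.is_empty() && !rest.contains(',')
}

fn main() {
    for tz in ["EST5EDT", "EST5EDT,M3.2.0,M11.1.0", "EST5"] {
        println!("{tz}: {}", has_dst_without_rule(tz));
    }
}
```

Under these assumptions, only `EST5EDT` is flagged. `EST5` has no DST at all, and `EST5EDT,M3.2.0,M11.1.0` supplies an explicit rule.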
Interestingly, PostgreSQL does support these "unreasonable" POSIX time
zones[1]:
> If a daylight-savings abbreviation is given but the transition
> rule field is omitted, the fallback behavior is to use the rule
> M3.2.0,M11.1.0, which corresponds to USA practice as of 2020 (that
> is, spring forward on the second Sunday of March, fall back on
> the first Sunday of November, both transitions occurring at 2AM
> prevailing time). Note that this rule does not give correct USA
> transition dates for years before 2007.
But POSIX has literally nothing to say about it[2], despite providing a
grammar that clearly makes the DST transition rule optional even when a
DST abbreviation is provided. Like it doesn't even mention that it's
unspecified, despite bloviating about how certain abbreviation lengths
lead to unspecified behavior. Why does POSIX suck so bad?
Anyway, it seems like there are really only two choices here. We could
either reject unreasonable time zones as invalid POSIX time zone
strings, or we could just "helpfully" assume a particular DST transition
rule. Jiff isn't legacy software (yet), so maybe don't try to be so
helpful that we assume one country's DST transition rules silently for
everyone in the world.
This commit does the bare minimum to reject these time zones.
The next commit will be the payoff.
[1]: https://www.postgresql.org/docs/current/datetime-posix-timezone-specs.html
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
This became superfluous with a prior refactor and can just be removed.
Previously, we weren't filling in default values eagerly, so this was
set to the default value when the DST offset was absent. But now we fill
in the default values eagerly, so this just is not needed any more.
The jobs that run `./test`, which test a bunch of different feature
combinations, take quite some time. I think a lot of that time is just
spent compiling the different feature combinations. But I suspect we are
also spending a lot of time running doc tests.
So this switches to Rust 2024 *just* for the jobs that run `./test`.
This also documents _why_ we do this and other relevant settings
involved in generating the bundled tzdb data.
To give a sense of how this changes things, consider this Rust program:
```rust
use jiff::{tz::TimeZoneDatabase, Timestamp};

fn main() -> anyhow::Result<()> {
    let winter: Timestamp = "2524-01-05T00Z".parse()?;
    let summer: Timestamp = "2524-07-05T00Z".parse()?;

    let tzdb = TimeZoneDatabase::from_dir("/usr/share/zoneinfo")?;
    let tz = tzdb.get("Europe/Dublin")?;
    let info = tz.to_offset_info(winter);
    println!("winter time from zoneinfo: {info:?}");
    let info = tz.to_offset_info(summer);
    println!("summer time from zoneinfo: {info:?}");

    let tzdb = TimeZoneDatabase::bundled();
    let tz = tzdb.get("Europe/Dublin")?;
    let info = tz.to_offset_info(winter);
    println!(" winter time from bundled: {info:?}");
    let info = tz.to_offset_info(summer);
    println!(" summer time from bundled: {info:?}");

    Ok(())
}
```
Before this PR, on my Arch Linux system, I get this output:
```
winter time from zoneinfo: TimeZoneOffsetInfo { offset: 00:00:00, dst: Yes, abbreviation: Borrowed("GMT") }
summer time from zoneinfo: TimeZoneOffsetInfo { offset: 01:00:00, dst: No, abbreviation: Borrowed("IST") }
 winter time from bundled: TimeZoneOffsetInfo { offset: 00:00:00, dst: Yes, abbreviation: Borrowed("GMT") }
 summer time from bundled: TimeZoneOffsetInfo { offset: 01:00:00, dst: No, abbreviation: Borrowed("IST") }
```
That is, the tzdb from `/usr/share/zoneinfo` on my system matches what
the tzdb from `jiff-tzdb` does. However, on my macOS system, I get
this output:
```
winter time from zoneinfo: TimeZoneOffsetInfo { offset: 00:00:00, dst: No, abbreviation: Borrowed("GMT") }
summer time from zoneinfo: TimeZoneOffsetInfo { offset: 01:00:00, dst: Yes, abbreviation: Borrowed("IST") }
 winter time from bundled: TimeZoneOffsetInfo { offset: 00:00:00, dst: Yes, abbreviation: Borrowed("GMT") }
 summer time from bundled: TimeZoneOffsetInfo { offset: 01:00:00, dst: No, abbreviation: Borrowed("IST") }
```
That's because `/usr/share/zoneinfo` on macOS (2025-02-27) uses
rearguard data. This PR makes `jiff-tzdb` match macOS, so that the
output of the above program _with_ this PR on my Linux system is now:
```
winter time from zoneinfo: TimeZoneOffsetInfo { offset: 00:00:00, dst: Yes, abbreviation: Borrowed("GMT") }
summer time from zoneinfo: TimeZoneOffsetInfo { offset: 01:00:00, dst: No, abbreviation: Borrowed("IST") }
 winter time from bundled: TimeZoneOffsetInfo { offset: 00:00:00, dst: No, abbreviation: Borrowed("GMT") }
 summer time from bundled: TimeZoneOffsetInfo { offset: 01:00:00, dst: Yes, abbreviation: Borrowed("IST") }
```
The reason that Jiff is switching to rearguard data is a bit subtle and has to
do with a difference in how the IANA Time Zone Database treats its internal
"daylight saving time" flag and what people in the "real world" consider
"daylight saving time." For example, in the standard distribution of the IANA
Time Zone Database, `Europe/Dublin` has its daylight saving time flag set to
_true_ during Winter and set to _false_ during Summer. The actual time shifts
are the same as, e.g., `Europe/London`, but which one is actually labeled
"daylight saving time" is not.
The IANA Time Zone Database does this for `Europe/Dublin`, presumably, because
_legally_, time during the Summer in Ireland is called `Irish Standard Time`,
and time during the Winter is called `Greenwich Mean Time`. These legal names
are reversed from what is typically the case, where "standard" time is during
the Winter and daylight saving time is during the Summer. The IANA Time Zone
Database implements this tweak in legal language via a "negative daylight
saving time offset." This is somewhat odd, and some consumers of the IANA Time
Zone Database cannot handle it. Thus, the rearguard format was born for,
seemingly, legacy programs.
Jiff can handle negative daylight saving time offsets just fine, but we use the
rearguard format anyway so that the underlying data more accurately reflects
on-the-ground reality for humans living in `Europe/Dublin`. In particular,
using the rearguard data enables [localization of time zone names] to be done
correctly.
Closes #258
[localization of time zone names]: https://github.com/BurntSushi/jiff/issues/258
Many entries in the tzdb are aliases of one another and thus have
identical TZif data. For these cases, we can reuse the same TZif data
instead of duplicating it.
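One way to picture the deduplication, as a sketch only: keep each distinct blob once and have every name refer to it by index. The `dedupe` function is hypothetical, and the byte strings in `main` are placeholders rather than real TZif data.

```rust
use std::collections::HashMap;

/// Stores each distinct blob once and maps every name to a blob index
/// (hypothetical helper, not Jiff's generator).
fn dedupe<'a>(
    entries: &[(&'a str, &'a [u8])],
) -> (Vec<&'a [u8]>, Vec<(&'a str, usize)>) {
    let mut blobs: Vec<&[u8]> = Vec::new();
    let mut seen: HashMap<&[u8], usize> = HashMap::new();
    let mut names = Vec::new();
    for &(name, data) in entries {
        // Reuse the index of identical data, or store it for the first time.
        let index = *seen.entry(data).or_insert_with(|| {
            blobs.push(data);
            blobs.len() - 1
        });
        names.push((name, index));
    }
    (blobs, names)
}

fn main() {
    // Placeholder payloads; the two aliases share identical bytes.
    let entries: [(&str, &[u8]); 3] = [
        ("America/New_York", &b"tzif-a"[..]),
        ("US/Eastern", &b"tzif-a"[..]),
        ("Europe/Dublin", &b"tzif-b"[..]),
    ];
    let (blobs, names) = dedupe(&entries);
    println!("{} blobs for {} names", blobs.len(), names.len());
}
```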
Ref #258, PR #259
Jiff did previously use some `unsafe`, but it was either trivial (easy
to understand UTF-8 validity elision) or not really checkable by miri
(ffi). But with the new pointer tagging representation of a `TimeZone`,
miri checking is more important.
I "tested" this by changing some of the pointer tagging code to do UB.
And miri caught it. So these tests should be covering what we want them
to cover.
Previously, a `TimeZone` was represented by, essentially, an
`Option<Arc<TimeZoneKind>>`. The `None` value was used to represent
the special `UTC` time zone and the niche created by `Arc` allowed the
`TimeZone` to use only a single word of memory. All time zone kinds
other than `UTC` had to allocate an `Arc`.
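The single-word claim is easy to check with a stand-in type. This is a sketch only: `Kind` and its variants are hypothetical and do not mirror Jiff's internal `TimeZoneKind`. The size equality relies on `Arc` holding a non-null pointer, which is how the standard library lays it out in practice.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Stand-in for the non-UTC time zone kinds (hypothetical variants).
#[allow(dead_code)]
enum Kind {
    Fixed(i32),
    Named(String),
}

fn main() {
    // `Arc` is never null, so `None` can reuse the null pointer value
    // and the `Option` wrapper costs no extra space.
    assert_eq!(size_of::<Option<Arc<Kind>>>(), size_of::<usize>());
    assert_eq!(size_of::<Arc<Kind>>(), size_of::<usize>());
}
```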
With #256, this representation has become agitated. In particular, to
support the use of TZif data in core-only environments, we need to
represent variable length data without dynamic memory allocation. And,
moreover, without the indirection of `Arc` that permits a `TimeZone`
to be a single word. Keeping `TimeZone` small is important because a
`Zoned` embeds a `TimeZone`, and a `Zoned` is already somewhat chonky.
So... we have no way to allocate to create indirection. But we want
`TimeZone` to stay small. What do we do? Well, despite time zone data
being variable length, it is usually invariant. And it is, in many
cases, acceptable to embed it into your binary. Hence why the previous
commits spent a bunch of effort doing exactly that: to make all data we
need organized and accessible as `static` data.
But, we still need a way to stuff this new Arc-less representation for
a time zone into our `TimeZone`. Well, we can do pointer tagging! This
is a little tricky to get right, but the recent strict provenance
stabilization (despite us needing a polyfill here for MSRV reasons) has
crystallized this to a point where I'm pretty comfortable with it. The
one hiccup here is that we can't actually soundly look at the address of
a `&'static` pointer in a `const` context. We work around that by making
the one `&'static` pointer correspond to the tag `0`, so that it doesn't
require any explicit tagging.
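For readers unfamiliar with the technique, here is a minimal sketch of pointer tagging where tag `0` is reserved for a `&'static` pointer. It is not Jiff's implementation: `Data`, `Tagged` and the tag constants are hypothetical, only two variants are modeled, and it assumes Rust 1.84 or newer for the stabilized strict provenance methods `addr` and `map_addr`.

```rust
use std::sync::Arc;

/// 8-byte alignment guarantees the low three address bits are zero.
#[repr(align(8))]
struct Data {
    name: &'static str,
}

const TAG_MASK: usize = 0b111;
const TAG_STATIC: usize = 0;
const TAG_ARC: usize = 1;

/// One word: a pointer whose low bits record what it points to.
struct Tagged {
    ptr: *const u8,
}

impl Tagged {
    /// Tag 0 needs no bit twiddling, so this can be a `const fn` that
    /// never inspects the address.
    const fn from_static(data: &'static Data) -> Tagged {
        Tagged { ptr: data as *const Data as *const u8 }
    }

    fn from_arc(data: Arc<Data>) -> Tagged {
        let raw = Arc::into_raw(data) as *const u8;
        Tagged { ptr: raw.map_addr(|a| a | TAG_ARC) }
    }

    fn tag(&self) -> usize {
        self.ptr.addr() & TAG_MASK
    }

    fn get(&self) -> &Data {
        let untagged = self.ptr.map_addr(|a| a & !TAG_MASK);
        // SAFETY: both constructors start from a valid, aligned `Data`
        // pointer that stays alive for as long as `self` does.
        unsafe { &*(untagged as *const Data) }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        if self.tag() == TAG_ARC {
            let untagged = self.ptr.map_addr(|a| a & !TAG_MASK);
            // SAFETY: the pointer came from `Arc::into_raw` in `from_arc`
            // and is released exactly once here.
            unsafe { drop(Arc::from_raw(untagged as *const Data)) };
        }
    }
}

fn main() {
    static STATIC_DATA: Data = Data { name: "static" };
    let a = Tagged::from_static(&STATIC_DATA);
    let b = Tagged::from_arc(Arc::new(Data { name: "heap" }));
    println!("{} (tag {}), {} (tag {})", a.get().name, a.tag(), b.get().name, b.tag());
}
```

A real implementation would also need `Clone` (incrementing the `Arc` count for tagged pointers) and careful `Send`/`Sync` reasoning, both omitted here.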
Moreover, we get other benefits. While `UTC` previously did not require
`Arc` shenanigans, the `Etc/Unknown` and fixed offset time zones did.
But they no longer do. This makes fixed offset time zones faster to work
with, since they don't require the `Arc`-clone/drop dance. More to the
point, time zones embedded with the proc macro don't require the
`Arc`-clone/drop dance either, which makes them faster in some
circumstances as well.
While Jiff did previously use `unsafe`, I think this is its first
helping of non-trivial `unsafe`. Previously, we only used it for FFI and
in one case for skipping a UTF-8 validity check that was pretty easy to
reason about. I did my best to follow std's strict provenance docs and
documented SAFETY conditions as best as I could.
This isn't quite done, but it does parse TZif and emits the correct
Jiff code to construct a `TimeZone` in a const context.
The main thing missing here is a fair bit of polish and a change
to the TimeZone internals to actually support this method of
construction in core-only environments without increasing the size
of `TimeZone` (i.e., pointer tagging).
See the module comments in `shared` for a bit more of an explanation
for why I ended up with this design. The summary is that this new
module will be copied to the proc macro, which will enable jiff to
depend on and re-export the proc macro.
This was a pretty gnarly refactor, because this required separating
the TZif and POSIX time zone parsers out from their internal data
types.