gh-100833: Remove 'volatile' qualifiers in fsum algorithm (#100845)

This PR removes the `volatile` qualifier on various intermediate quantities
in the `math.fsum` implementation, and updates the notes preceding the
algorithm accordingly (as well as fixing some of the exsting notes). See
the linked issue #100833 for discussion.
This commit is contained in:
Mark Dickinson 2023-01-08 19:40:15 +00:00 committed by GitHub
parent b139bcd892
commit 87d3bd0e02
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 22 additions and 21 deletions

View file

@ -0,0 +1 @@
Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers.

View file

@ -1358,30 +1358,30 @@ FUNC1(tanh, tanh, 0,
Dickinson's post at <http://bugs.python.org/file10357/msum4.py>. Dickinson's post at <http://bugs.python.org/file10357/msum4.py>.
See those links for more details, proofs and other references. See those links for more details, proofs and other references.
Note 1: IEEE 754R floating point semantics are assumed, Note 1: IEEE 754 floating-point semantics with a rounding mode of
but the current implementation does not re-establish special roundTiesToEven are assumed.
value semantics across iterations (i.e. handling -Inf + Inf).
Note 2: No provision is made for intermediate overflow handling; Note 2: No provision is made for intermediate overflow handling;
therefore, sum([1e+308, 1e-308, 1e+308]) returns 1e+308 while therefore, fsum([1e+308, -1e+308, 1e+308]) returns 1e+308 while
sum([1e+308, 1e+308, 1e-308]) raises an OverflowError due to the fsum([1e+308, 1e+308, -1e+308]) raises an OverflowError due to the
overflow of the first partial sum. overflow of the first partial sum.
Note 3: The intermediate values lo, yr, and hi are declared volatile so Note 3: The algorithm has two potential sources of fragility. First, C
aggressive compilers won't algebraically reduce lo to always be exactly 0.0. permits arithmetic operations on `double`s to be performed in an
Also, the volatile declaration forces the values to be stored in memory as intermediate format whose range and precision may be greater than those of
regular doubles instead of extended long precision (80-bit) values. This `double` (see for example C99 §5.2.4.2.2, paragraph 8). This can happen for
prevents double rounding because any addition or subtraction of two doubles example on machines using the now largely historical x87 FPUs. In this case,
can be resolved exactly into double-sized hi and lo values. As long as the `fsum` can produce incorrect results. If `FLT_EVAL_METHOD` is `0` or `1`, or
hi value gets forced into a double before yr and lo are computed, the extra `FLT_EVAL_METHOD` is `2` and `long double` is identical to `double`, then we
bits in downstream extended precision operations (x87 for example) will be should be safe from this source of errors. Second, an aggressively
exactly zero and therefore can be losslessly stored back into a double, optimizing compiler can re-associate operations so that (for example) the
thereby preventing double rounding. statement `yr = hi - x;` is treated as `yr = (x + y) - x` and then
re-associated as `yr = y + (x - x)`, giving `y = yr` and `lo = 0.0`. That
re-association would be in violation of the C standard, and should not occur
except possibly in the presence of unsafe optimizations (e.g., -ffast-math,
-fassociative-math). Such optimizations should be avoided for this module.
Note 4: A similar implementation is in Modules/cmathmodule.c. Note 4: The signature of math.fsum() differs from builtins.sum()
Be sure to update both when making changes.
Note 5: The signature of math.fsum() differs from builtins.sum()
because the start argument doesn't make sense in the context of because the start argument doesn't make sense in the context of
accurate summation. Since the partials table is collapsed before accurate summation. Since the partials table is collapsed before
returning a result, sum(seq2, start=sum(seq1)) may not equal the returning a result, sum(seq2, start=sum(seq1)) may not equal the
@ -1467,7 +1467,7 @@ math_fsum(PyObject *module, PyObject *seq)
Py_ssize_t i, j, n = 0, m = NUM_PARTIALS; Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
double x, y, t, ps[NUM_PARTIALS], *p = ps; double x, y, t, ps[NUM_PARTIALS], *p = ps;
double xsave, special_sum = 0.0, inf_sum = 0.0; double xsave, special_sum = 0.0, inf_sum = 0.0;
volatile double hi, yr, lo; double hi, yr, lo;
iter = PyObject_GetIter(seq); iter = PyObject_GetIter(seq);
if (iter == NULL) if (iter == NULL)