gh-100833: Remove 'volatile' qualifiers in fsum algorithm (#100845)

This PR removes the `volatile` qualifier on various intermediate quantities in the `math.fsum` implementation, and updates the notes preceding the algorithm accordingly (as well as fixing some of the exsting notes). See the linked issue #100833 for discussion.
2025-09-26 18:29:57 +00:00 · 2023-01-08 19:40:15 +00:00 · 2023-01-08 19:40:15 +00:00 · 87d3bd0e02
commit 87d3bd0e02
parent b139bcd892
2 changed files with 22 additions and 21 deletions
--- a/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst
+++ b/Misc/NEWS.d/next/Library/2023-01-08-12-10-17.gh-issue-100833.f6cT7E.rst
@ -0,0 +1 @@
 Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers.
--- a/Modules/mathmodule.c
+++ b/Modules/mathmodule.c
@ -1358,30 +1358,30 @@ FUNC1(tanh, tanh, 0,
   Dickinson's post at <http://bugs.python.org/file10357/msum4.py>.
   See those links for more details, proofs and other references.
-   Note 1: IEEE 754R floating point semantics are assumed,
+   Note 1: IEEE 754 floating-point semantics with a rounding mode of
-   but the current implementation does not re-establish special
+   roundTiesToEven are assumed.
   value semantics across iterations (i.e. handling -Inf + Inf).
-   Note 2:  No provision is made for intermediate overflow handling;
+   Note 2: No provision is made for intermediate overflow handling;
-   therefore, sum([1e+308, 1e-308, 1e+308]) returns 1e+308 while
+   therefore, fsum([1e+308, -1e+308, 1e+308]) returns 1e+308 while
-   sum([1e+308, 1e+308, 1e-308]) raises an OverflowError due to the
+   fsum([1e+308, 1e+308, -1e+308]) raises an OverflowError due to the
   overflow of the first partial sum.
-   Note 3: The intermediate values lo, yr, and hi are declared volatile so
+   Note 3: The algorithm has two potential sources of fragility. First, C
-   aggressive compilers won't algebraically reduce lo to always be exactly 0.0.
+   permits arithmetic operations on `double`s to be performed in an
-   Also, the volatile declaration forces the values to be stored in memory as
+   intermediate format whose range and precision may be greater than those of
-   regular doubles instead of extended long precision (80-bit) values.  This
+   `double` (see for example C99 §5.2.4.2.2, paragraph 8). This can happen for
-   prevents double rounding because any addition or subtraction of two doubles
+   example on machines using the now largely historical x87 FPUs. In this case,
-   can be resolved exactly into double-sized hi and lo values.  As long as the
+   `fsum` can produce incorrect results. If `FLT_EVAL_METHOD` is `0` or `1`, or
-   hi value gets forced into a double before yr and lo are computed, the extra
+   `FLT_EVAL_METHOD` is `2` and `long double` is identical to `double`, then we
-   bits in downstream extended precision operations (x87 for example) will be
+   should be safe from this source of errors. Second, an aggressively
-   exactly zero and therefore can be losslessly stored back into a double,
+   optimizing compiler can re-associate operations so that (for example) the
-   thereby preventing double rounding.
+   statement `yr = hi - x;` is treated as `yr = (x + y) - x` and then
   re-associated as `yr = y + (x - x)`, giving `y = yr` and `lo = 0.0`. That
   re-association would be in violation of the C standard, and should not occur
   except possibly in the presence of unsafe optimizations (e.g., -ffast-math,
   -fassociative-math). Such optimizations should be avoided for this module.
-   Note 4: A similar implementation is in Modules/cmathmodule.c.
+   Note 4: The signature of math.fsum() differs from builtins.sum()
   Be sure to update both when making changes.
   Note 5: The signature of math.fsum() differs from builtins.sum()
   because the start argument doesn't make sense in the context of
   accurate summation.  Since the partials table is collapsed before
   returning a result, sum(seq2, start=sum(seq1)) may not equal the
@ -1467,7 +1467,7 @@ math_fsum(PyObject *module, PyObject *seq)
    Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
    double x, y, t, ps[NUM_PARTIALS], *p = ps;
    double xsave, special_sum = 0.0, inf_sum = 0.0;
-    volatile double hi, yr, lo;
+    double hi, yr, lo;
    iter = PyObject_GetIter(seq);
    if (iter == NULL)
		`@ -0,0 +1 @@`
							Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers.