GH-77265: Document NaN handling in statistics functions that sort or count (#94676)

* Document NaN handling in functions that sort or count * Update Doc/library/statistics.rst Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com> * Update Doc/library/statistics.rst Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com> * Fix trailing whitespace and rewrap text Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
2025-09-27 02:39:58 +00:00 · 2022-07-10 02:40:27 -05:00 · 2022-07-10 02:40:27 -05:00 · ef61b259e3
commit ef61b259e3
parent 264b3ddfd5
1 changed files with 29 additions and 0 deletions
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@ -35,6 +35,35 @@ and implementation-dependent.  If your input data consists of mixed types,
 you may be able to use :func:`map` to ensure a consistent result, for
 example: ``map(float, input_data)``.
 Some datasets use ``NaN`` (not a number) values to represent missing data.
 Since NaNs have unusual comparison semantics, they cause surprising or
 undefined behaviors in the statistics functions that sort data or that count
 occurrences.  The functions affected are ``median()``, ``median_low()``,
 ``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
 ``quantiles()``.  The ``NaN`` values should be stripped before calling these
 functions::
    >>> from statistics import median
    >>> from math import isnan
    >>> from itertools import filterfalse
    >>> data = [20.7, float('NaN'),19.2, 18.3, float('NaN'), 14.4]
    >>> sorted(data)  # This has surprising behavior
    [20.7, nan, 14.4, 18.3, 19.2, nan]
    >>> median(data)  # This result is unexpected
    16.35
    >>> sum(map(isnan, data))    # Number of missing values
    2
    >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
    >>> clean
    [20.7, 19.2, 18.3, 14.4]
    >>> sorted(clean)  # Sorting now works as expected
    [14.4, 18.3, 19.2, 20.7]
    >>> median(clean)       # This result is now well defined
    18.75
 Averages and measures of central location
 -----------------------------------------