mirror of https://github.com/python/cpython.git (synced 2025-09-27 02:39:58 +00:00)
GH-77265: Document NaN handling in statistics functions that sort or count (#94676)
* Document NaN handling in functions that sort or count

* Update Doc/library/statistics.rst

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

* Update Doc/library/statistics.rst

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

* Fix trailing whitespace and rewrap text

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
parent 264b3ddfd5
commit ef61b259e3
1 changed file with 29 additions and 0 deletions
@@ -35,6 +35,35 @@ and implementation-dependent. If your input data consists of mixed types,
 you may be able to use :func:`map` to ensure a consistent result, for
 example: ``map(float, input_data)``.
 
+Some datasets use ``NaN`` (not a number) values to represent missing data.
+Since NaNs have unusual comparison semantics, they cause surprising or
+undefined behaviors in the statistics functions that sort data or that count
+occurrences. The functions affected are ``median()``, ``median_low()``,
+``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
+``quantiles()``. The ``NaN`` values should be stripped before calling these
+functions::
+
+    >>> from statistics import median
+    >>> from math import isnan
+    >>> from itertools import filterfalse
+
+    >>> data = [20.7, float('NaN'),19.2, 18.3, float('NaN'), 14.4]
+    >>> sorted(data)  # This has surprising behavior
+    [20.7, nan, 14.4, 18.3, 19.2, nan]
+    >>> median(data)  # This result is unexpected
+    16.35
+
+    >>> sum(map(isnan, data))  # Number of missing values
+    2
+    >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
+    >>> clean
+    [20.7, 19.2, 18.3, 14.4]
+    >>> sorted(clean)  # Sorting now works as expected
+    [14.4, 18.3, 19.2, 20.7]
+    >>> median(clean)  # This result is now well defined
+    18.75
+
+
 Averages and measures of central location
 -----------------------------------------
 
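The doctest in this change covers only the sorting case (``median()``). As a rough sketch of the counting case, and not part of this commit, the snippet below uses `collections.Counter` as a stand-in for counting by equality. It shows that separately created NaN objects are tallied as distinct values, and that stripping them first gives `multimode()` well-defined input. The helper name `strip_nans` and the sample `readings` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import filterfalse
from math import isnan
from statistics import multimode


def strip_nans(values):
    """Return a list of the values with every NaN removed (hypothetical helper)."""
    return list(filterfalse(isnan, values))


# Illustrative data: three separately created NaN objects mixed with real values.
readings = [2.0, float('nan'), 2.0, float('nan'), 3.0, float('nan')]

# NaN != NaN, so distinct NaN objects end up under separate keys
# when occurrences are counted by equality.
nan_tallies = [count for value, count in Counter(readings).items() if isnan(value)]

# Strip the NaNs before calling a counting function such as multimode().
clean = strip_nans(readings)
modes = multimode(clean)
```

Here `nan_tallies` holds one entry per NaN object rather than a single combined count, which is why the documentation recommends removing NaNs before calling the counting functions. A list that reuses one and the same NaN object may be tallied differently, since container lookups check identity before equality.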