bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)

Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
Tymoteusz Wołodźko 2021-04-25 13:45:09 +02:00 committed by GitHub
parent 172c0f2752
commit 09aa6f914d
GPG key ID: 4AEE18F83AFDEB23
6 changed files with 326 additions and 1 deletions


@@ -68,6 +68,17 @@ tends to deviate from the typical or average values.
:func:`variance` Sample variance of data.
======================= =============================================
Statistics for relations between two inputs
-------------------------------------------
These functions calculate statistics regarding relations between two inputs.
=========================  =====================================================
:func:`covariance`         Sample covariance for two variables.
:func:`correlation`        Pearson's correlation coefficient for two variables.
:func:`linear_regression`  Intercept and slope for simple linear regression.
=========================  =====================================================
Function details
----------------
@@ -566,6 +577,98 @@ However, for reading convenience, most of the examples show sorted sequences.
.. versionadded:: 3.8
.. function:: covariance(x, y, /)
Return the sample covariance of two inputs *x* and *y*. Covariance
is a measure of the joint variability of two inputs.
Both inputs must be of the same length (no less than two), otherwise
:exc:`StatisticsError` is raised.
Examples:
.. doctest::
>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> covariance(x, y)
0.75
>>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> covariance(x, z)
-7.5
>>> covariance(z, x)
-7.5
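As a rough illustration of what the values above measure, the sample covariance can be computed from its textbook definition: the sum of products of deviations from the two means, divided by *n* - 1. The helper name ``sample_covariance`` is hypothetical and is not part of the module; this is a sketch of the definition, not necessarily how :func:`covariance` is implemented.

```python
from statistics import mean


def sample_covariance(x, y):
    """Hypothetical helper (not part of the statistics module)."""
    n = len(x)
    if len(y) != n or n < 2:
        raise ValueError("inputs must have the same length, at least two")
    xbar, ybar = mean(x), mean(y)
    # Sum of products of deviations from the means, divided by n - 1
    # because this is a sample (not population) statistic.
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
z = [9, 8, 7, 6, 5, 4, 3, 2, 1]

cov_xy = sample_covariance(x, y)
cov_xz = sample_covariance(x, z)
print(cov_xy, cov_xz)  # 0.75 -7.5
```

The definition is symmetric in its arguments, which is why swapping the inputs in the example above gives the same result.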
.. versionadded:: 3.10
.. function:: correlation(x, y, /)
Return the `Pearson's correlation coefficient
<https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
for two inputs. Pearson's correlation coefficient *r* takes values
between -1 and +1. It measures the strength and direction of the linear
relationship, where +1 indicates a perfect positive linear relationship,
-1 a perfect negative linear relationship, and 0 no linear relationship.
Both inputs must be of the same length (no less than two) and must not
be constant; otherwise :exc:`StatisticsError` is raised.
Examples:
.. doctest::
>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> correlation(x, x)
1.0
>>> correlation(x, y)
-1.0
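One way to see why constant inputs are rejected is to write *r* out from its definition: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. A constant input makes that denominator zero. The helper ``pearson_r`` below is hypothetical and only sketches the definition; it raises a plain :exc:`ValueError`, whereas :func:`correlation` raises :exc:`StatisticsError`.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Hypothetical helper (not part of the statistics module)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length, at least two")
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # A constant input has zero spread, so r is undefined.
        raise ValueError("at least one of the inputs is constant")
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1]

r_xx = pearson_r(x, x)
r_xy = pearson_r(x, y)
print(r_xx, r_xy)  # 1.0 -1.0
```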
.. versionadded:: 3.10
.. function:: linear_regression(regressor, dependent_variable)
Return the intercept and slope of a `simple linear regression
<https://en.wikipedia.org/wiki/Simple_linear_regression>`_,
estimated using ordinary least squares. Simple linear
regression describes the relationship between *regressor* and
*dependent variable* in terms of a linear function:
*dependent_variable = intercept + slope \* regressor + noise*
where ``intercept`` and ``slope`` are the regression parameters that are
estimated, and the noise term is an unobserved random variable that accounts
for the variability of the data not explained by the linear regression
(it is equal to the difference between the actual values of the dependent
variable and the predictions).
Both inputs must be of the same length (no less than two), and the regressor
must not be constant; otherwise :exc:`StatisticsError` is raised.
For example, we can use the `release dates of the Monty
Python films <https://en.wikipedia.org/wiki/Monty_Python#Films>`_ to
model the cumulative number of Monty Python films produced, and then
predict how many films they would have made by the year 2019,
assuming that they had kept the pace.
.. doctest::
>>> year = [1971, 1975, 1979, 1982, 1983]
>>> films_total = [1, 2, 3, 4, 5]
>>> intercept, slope = linear_regression(year, films_total)
>>> round(intercept + slope * 2019)
16
We could also use it to "predict" how many Monty Python films existed when
Brian Cohen was born.
.. doctest::
>>> round(intercept + slope * 1)
-610
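For readers who want to see where the intercept and slope come from, the ordinary least squares estimates for a single regressor have a closed form: the slope is the sum of products of deviations divided by the sum of squared deviations of the regressor, and the intercept makes the line pass through the point of means. The helper ``ols_fit`` is hypothetical; it is a sketch of that closed form and not necessarily how :func:`linear_regression` is implemented.

```python
from statistics import mean


def ols_fit(regressor, dependent_variable):
    """Hypothetical helper (not part of the statistics module)."""
    xbar, ybar = mean(regressor), mean(dependent_variable)
    sxy = sum((x - xbar) * (y - ybar)
              for x, y in zip(regressor, dependent_variable))
    sxx = sum((x - xbar) ** 2 for x in regressor)
    if sxx == 0:
        raise ValueError("regressor is constant")
    slope = sxy / sxx
    # The fitted line passes through (xbar, ybar).
    intercept = ybar - slope * xbar
    return intercept, slope


year = [1971, 1975, 1979, 1982, 1983]
films_total = [1, 2, 3, 4, 5]

intercept, slope = ols_fit(year, films_total)

# The noise term for each observation: actual value minus prediction.
residuals = [y - (intercept + slope * x) for x, y in zip(year, films_total)]

print(round(intercept + slope * 2019))  # 16
```

With an intercept in the model, the residuals of a least squares fit sum to zero up to floating-point rounding, which is a quick sanity check on the estimates.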
.. versionadded:: 3.10
Exceptions
----------