mirror of
https://github.com/python/cpython.git
synced 2025-08-04 17:08:35 +00:00
bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)
Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
This commit is contained in:
parent
172c0f2752
commit
09aa6f914d
6 changed files with 326 additions and 1 deletions
@@ -68,6 +68,17 @@ tends to deviate from the typical or average values.
:func:`variance`         Sample variance of data.
=======================  =============================================

Statistics for relations between two inputs
-------------------------------------------

These functions calculate statistics regarding relations between two inputs.

=========================  =====================================================
:func:`covariance`         Sample covariance for two variables.
:func:`correlation`        Pearson's correlation coefficient for two variables.
:func:`linear_regression`  Intercept and slope for simple linear regression.
=========================  =====================================================


Function details
----------------

@@ -566,6 +577,98 @@ However, for reading convenience, most of the examples show sorted sequences.

   .. versionadded:: 3.8

.. function:: covariance(x, y, /)

   Return the sample covariance of two inputs *x* and *y*. Covariance
   is a measure of the joint variability of two inputs.

   Both inputs must be of the same length (at least two), otherwise
   :exc:`StatisticsError` is raised.

   Examples:

   .. doctest::

      >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
      >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
      >>> covariance(x, y)
      0.75
      >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
      >>> covariance(x, z)
      -7.5
      >>> covariance(z, x)
      -7.5

   .. versionadded:: 3.10
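
To make the definition concrete, here is a minimal from-scratch sketch of sample covariance with an ``n - 1`` denominator. The helper name ``sample_covariance`` and its use of :exc:`ValueError` are assumptions for illustration only; this is not the module's implementation, which raises :exc:`StatisticsError`.

```python
def sample_covariance(x, y):
    """Hypothetical helper: sample covariance of two equal-length inputs."""
    n = len(x)
    if len(y) != n:
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of each pair's deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Dividing by n - 1 gives the sample (not population) covariance.
    return sxy / (n - 1)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(sample_covariance(x, y))  # 0.75
print(sample_covariance(x, z))  # -7.5
```

The sketch reproduces the values in the doctest above and, like the documented function, is symmetric in its arguments.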

.. function:: correlation(x, y, /)

   Return the `Pearson's correlation coefficient
   <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
   for two inputs. Pearson's correlation coefficient *r* takes values
   between -1 and +1. It measures the strength and direction of the linear
   relationship, where +1 means a perfect positive linear relationship,
   -1 a perfect negative linear relationship, and 0 no linear relationship.

   Both inputs must be of the same length (at least two), and neither
   may be constant, otherwise :exc:`StatisticsError` is raised.

   Examples:

   .. doctest::

      >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
      >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
      >>> correlation(x, x)
      1.0
      >>> correlation(x, y)
      -1.0

   .. versionadded:: 3.10
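
The coefficient can be sketched as the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The helper ``pearson_r`` below is a hypothetical illustration under that textbook formula, not the module's code, and it raises :exc:`ValueError` where the documented function raises :exc:`StatisticsError`.

```python
import math


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r for two equal-length inputs."""
    n = len(x)
    if len(y) != n:
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # cross-deviations
    sxx = sum(a * a for a in dx)              # squared deviations of x
    syy = sum(b * b for b in dy)              # squared deviations of y
    if sxx == 0 or syy == 0:
        # A constant input has zero variance, so r is undefined.
        raise ValueError("at least one of the inputs is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(pearson_r(x, x))  # 1.0
print(pearson_r(x, y))  # -1.0
```

The explicit zero-variance check shows why constant inputs are rejected: the denominator would be zero.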

.. function:: linear_regression(regressor, dependent_variable)

   Return the intercept and slope of a `simple linear regression
   <https://en.wikipedia.org/wiki/Simple_linear_regression>`_,
   estimated using ordinary least squares. Simple linear
   regression describes the relationship between *regressor* and
   *dependent_variable* in terms of a linear function:

      *dependent_variable = intercept + slope \* regressor + noise*

   where ``intercept`` and ``slope`` are the regression parameters to be
   estimated, and the noise term is an unobserved random variable that
   accounts for the variability in the data not explained by the linear
   regression (it is equal to the difference between the predicted and the
   actual values of the dependent variable).

   Both inputs must be of the same length (at least two), and *regressor*
   must not be constant, otherwise :exc:`StatisticsError` is raised.

   For example, we can use the `release dates of the Monty
   Python films <https://en.wikipedia.org/wiki/Monty_Python#Films>`_ to
   model the cumulative number of Monty Python films produced, and then
   predict how many films they would have made by the year 2019,
   assuming that they had kept up the pace.

   .. doctest::

      >>> year = [1971, 1975, 1979, 1982, 1983]
      >>> films_total = [1, 2, 3, 4, 5]
      >>> intercept, slope = linear_regression(year, films_total)
      >>> round(intercept + slope * 2019)
      16

   We could also use it to "predict" how many Monty Python films existed when
   Brian Cohen was born.

   .. doctest::

      >>> round(intercept + slope * 1)
      -610

   .. versionadded:: 3.10
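
As a rough illustration of the ordinary least squares estimates described above, the slope can be computed as the sum of cross-deviations divided by the sum of squared deviations of the regressor, and the intercept as the mean of the dependent variable minus the slope times the mean of the regressor. The helper ``ols_fit`` is a hypothetical name used only for this sketch; it returns ``(intercept, slope)`` in the order documented in this commit and is not the module's implementation.

```python
def ols_fit(regressor, dependent_variable):
    """Hypothetical helper: (intercept, slope) by ordinary least squares."""
    n = len(regressor)
    if len(dependent_variable) != n:
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(regressor) / n
    mean_y = sum(dependent_variable) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(regressor, dependent_variable))
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    if sxx == 0:
        # A constant regressor gives no information about the slope.
        raise ValueError("regressor is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


year = [1971, 1975, 1979, 1982, 1983]
films_total = [1, 2, 3, 4, 5]
intercept, slope = ols_fit(year, films_total)
print(round(intercept + slope * 2019))  # 16

# The noise term corresponds to the residuals: actual minus predicted values.
residuals = [y - (intercept + slope * x) for x, y in zip(year, films_total)]
```

With an intercept in the model, the least squares residuals sum to zero up to floating-point rounding, which is a quick sanity check on a fit.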


Exceptions
----------
