mirror of https://github.com/python/cpython.git (synced 2025-09-25 17:59:57 +00:00)
[3.12] Add another example to the statistics docs (GH-107904) (#107941)
Add another example to the statistics docs (GH-107904)
(cherry picked from commit 2b6dc2accc)
Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
This commit is contained in:
parent ba8ab4e8ae
commit e8963a86ea
2 changed files with 57 additions and 0 deletions
Doc/library/kde_example.png (new binary file, 324 KiB, not shown)

Doc/library/statistics.rst:
@@ -922,6 +922,10 @@ of applications in statistics.

:class:`NormalDist` Examples and Recipes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Classic probability problems
****************************

:class:`NormalDist` readily solves classic probability problems.

For example, given `historical data for SAT exams
@@ -947,6 +951,10 @@ Find the `quartiles <https://en.wikipedia.org/wiki/Quartile>`_ and `deciles

>>> list(map(round, sat.quantiles(n=10)))
[810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]

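The ``sat`` object in the doctest above is defined in context elided from
this hunk. A minimal sketch of the kind of construction involved (the mean
and standard deviation below are assumptions chosen so that they reproduce
the deciles shown above, not a quotation of the documented setup):

    from statistics import NormalDist

    sat = NormalDist(1060, 195)
    print([round(q) for q in sat.quantiles(n=10)])
    # -> [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]
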
Monte Carlo inputs for simulations
**********************************

To estimate the distribution for a model that isn't easy to solve
analytically, :class:`NormalDist` can generate input samples for a `Monte
Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:

@@ -963,6 +971,9 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:

>>> quantiles(map(model, X, Y, Z))        # doctest: +SKIP
[1.4591308524824727, 1.8035946855390597, 2.175091447274739]

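The names ``model``, ``X``, ``Y``, and ``Z`` above come from a testcode
block that is elided from this hunk. A minimal sketch of the kind of setup
such a simulation needs (the model and the distribution parameters below
are illustrative assumptions, not the documented values):

    from statistics import NormalDist, quantiles

    def model(x, y, z):
        # A hypothetical quantity that depends on three uncertain inputs.
        return (x + y) / z

    n = 100_000
    X = NormalDist(10, 2.5).samples(n)
    Y = NormalDist(15, 1.75).samples(n)
    Z = NormalDist(50, 1.25).samples(n)
    print(quantiles(map(model, X, Y, Z)))    # quartiles of the simulated outputs
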
Approximating binomial distributions
************************************

Normal distributions can be used to approximate `Binomial
distributions <https://mathworld.wolfram.com/BinomialDistribution.html>`_
when the sample size is large and when the probability of a successful

@@ -1000,6 +1011,10 @@ probability that the Python room will stay within its capacity limits?

>>> mean(trial() <= k for i in range(10_000))
0.8398

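The simulation above estimates the probability empirically. The same
quantity can also be approximated analytically with the normal
approximation to the binomial distribution; a minimal sketch, where ``n``,
``p``, and ``k`` stand in for the trial count, per-trial probability, and
capacity from the elided setup (the numbers below are placeholders, not the
documented values):

    from math import sqrt
    from statistics import NormalDist

    n, p, k = 100, 0.75, 84            # placeholder values
    # Normal approximation with a continuity correction:
    # P(X <= k) is roughly Phi((k + 0.5 - n*p) / sqrt(n*p*(1 - p)))
    binom_approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    print(binom_approx.cdf(k + 0.5))
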
Naive bayesian classifier
*************************

Normal distributions commonly arise in machine learning problems.

Wikipedia has a `nice example of a Naive Bayesian Classifier

@@ -1054,6 +1069,48 @@ The final prediction goes to the largest posterior. This is known as the

'female'

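The ``'female'`` result above is the tail end of the worked example elided
from this hunk. As a rough sketch of the technique being described (the
training samples, priors, and query value below are invented for
illustration, not taken from the documented example):

    from statistics import NormalDist

    # Per-class feature distributions fitted from hypothetical training data.
    height_male = NormalDist.from_samples([70.1, 72.4, 68.9, 71.3])
    height_female = NormalDist.from_samples([63.2, 65.0, 61.8, 64.1])

    prior_male = prior_female = 0.5     # assumed equal class priors
    ht = 66.0                           # hypothetical observation to classify

    posterior_male = prior_male * height_male.pdf(ht)
    posterior_female = prior_female * height_female.pdf(ht)

    # The final prediction goes to the largest posterior.
    print('male' if posterior_male > posterior_female else 'female')
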
Kernel density estimation
*************************

It is possible to estimate a continuous probability density function
from a fixed number of discrete samples.

The basic idea is to smooth the data using `a kernel function such as a
normal distribution, triangular distribution, or uniform distribution
<https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use>`_.
The degree of smoothing is controlled by a single
parameter, ``h``, representing the variance of the kernel function.

.. testcode::

    import math

    def kde_normal(sample, h):
        "Create a continuous probability density function from a sample."
        # Smooth the sample with a normal distribution of variance h.
        kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
        n = len(sample)
        def pdf(x):
            return sum(kernel_h(x - x_i) for x_i in sample) / n
        return pdf

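A quick way to check that the recipe really returns a probability density
is to integrate it numerically; a minimal sanity-check sketch (not part of
the documented example; the sample and ``h`` values are arbitrary, and
``NormalDist`` must be imported when running outside the doc's test
environment):

    f = kde_normal([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], h=2.25)
    width = 0.01
    area = sum(f(i * width) * width for i in range(-2000, 2001))
    print(round(area, 3))    # approximately 1.0
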
`Wikipedia has an example
<https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
where we can use the ``kde_normal()`` recipe to generate and plot
a probability density function estimated from a small sample:

.. doctest::

    >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
    >>> f_hat = kde_normal(sample, h=2.25)
    >>> xarr = [i/100 for i in range(-750, 1100)]
    >>> yarr = [f_hat(x) for x in xarr]

The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:

.. image:: kde_example.png
    :alt: Scatter plot of the estimated probability density function.

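To reproduce a figure like the one referenced above locally, a minimal
plotting sketch (matplotlib is an assumption here; it is not part of the
documented recipe):

    import matplotlib.pyplot as plt

    plt.plot(xarr, yarr)                                  # estimated density curve
    plt.scatter(sample, [0.0] * len(sample), marker='x')  # original sample points
    plt.xlabel('x')
    plt.ylabel('estimated density')
    plt.show()
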
..
   # This modeline must appear within the last ten lines of the file.
   kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;