Fixed #26608 -- Added support for window expressions (OVER clause).

Thanks Josh Smeaton, Mariusz Felisiak, Sergey Fedoseev, Simon Charettes,
Adam Chainz/Johnson and Tim Graham for comments and reviews and Jamie
Cockburn for initial patch.
This commit is contained in:
Mads Jensen 2017-09-18 15:42:29 +02:00 committed by Tim Graham
parent da1ba03f1d
commit d549b88050
25 changed files with 1627 additions and 8 deletions

View file

@ -819,3 +819,132 @@ Usage example::
'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=<UTC>),
'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=<UTC>)
}
.. _window-functions:
Window functions
================
.. versionadded:: 2.0
There are a number of functions to use in a
:class:`~django.db.models.expressions.Window` expression for computing the rank
of elements or the :class:`Ntile` of some rows.
``CumeDist``
------------
.. class:: CumeDist(*expressions, **extra)
Calculates the cumulative distribution of a value within a window or partition.
The cumulative distribution is defined as the number of rows preceding or
peered with the current row divided by the total number of rows in the frame.
``DenseRank``
-------------
.. class:: DenseRank(*expressions, **extra)
Equivalent to :class:`Rank` but does not have gaps.
``FirstValue``
--------------
.. class:: FirstValue(expression, **extra)
Returns the value evaluated at the row that's the first row of the window
frame, or ``None`` if no such value exists.
``Lag``
-------
.. class:: Lag(expression, offset=1, default=None, **extra)
Calculates the value offset by ``offset``, and if no row exists there, returns
``default``.
``default`` must have the same type as the ``expression``, however, this is
only validated by the database and not in Python.
``LastValue``
-------------
.. class:: LastValue(expression, **extra)
Comparable to :class:`FirstValue`, it calculates the last value in a given
frame clause.
``Lead``
--------
.. class:: Lead(expression, offset=1, default=None, **extra)
Calculates the leading value in a given :ref:`frame <window-frames>`. Both
``offset`` and ``default`` are evaluated with respect to the current row.
``default`` must have the same type as the ``expression``, however, this is
only validated by the database and not in Python.
``NthValue``
------------
.. class:: NthValue(expression, nth=1, **extra)
Computes the row relative to the offset ``nth`` (must be a positive value)
within the window. Returns ``None`` if no row exists.
Some databases may handle a nonexistent nth-value differently. For example,
Oracle returns an empty string rather than ``None`` for character-based
expressions. Django doesn't do any conversions in these cases.
``Ntile``
---------
.. class:: Ntile(num_buckets=1, **extra)
Calculates a partition for each of the rows in the frame clause, distributing
numbers as evenly as possible between 1 and ``num_buckets``. If the rows don't
divide evenly into a number of buckets, one or more buckets will be represented
more frequently.
``PercentRank``
---------------
.. class:: PercentRank(*expressions, **extra)
Computes the percentile rank of the rows in the frame clause. This
computation is equivalent to evaluating::
(rank - 1) / (total rows - 1)
The following table explains the calculation for the percentile rank of a row:
===== ===== ==== ============ ============
Row # Value Rank Calculation Percent Rank
===== ===== ==== ============ ============
1 15 1 (1-1)/(7-1) 0.0000
2 20 2 (2-1)/(7-1) 0.1666
3 20 2 (2-1)/(7-1) 0.1666
4 20 2 (2-1)/(7-1) 0.1666
5 30 5 (5-1)/(7-1) 0.6666
6 30 5 (5-1)/(7-1) 0.6666
7 40 7 (7-1)/(7-1) 1.0000
===== ===== ==== ============ ============
``Rank``
--------
.. class:: Rank(*expressions, **extra)
Comparable to ``RowNumber``, this function ranks rows in the window. The
computed rank contains gaps. Use :class:`DenseRank` to compute rank without
gaps.
``RowNumber``
-------------
.. class:: RowNumber(*expressions, **extra)
Computes the row number according to the ordering of either the frame clause
or the ordering of the whole query if there is no partitioning of the
:ref:`window frame <window-frames>`.

View file

@ -353,6 +353,13 @@ The ``Aggregate`` API is as follows:
generated. Specifically, the ``function`` will be interpolated as the
``function`` placeholder within :attr:`template`. Defaults to ``None``.
.. attribute:: window_compatible
.. versionadded:: 2.0
Defaults to ``True`` since most aggregate functions can be used as the
source expression in :class:`~django.db.models.expressions.Window`.
The ``expression`` argument can be the name of a field on the model, or another
expression. It will be converted to a string and used as the ``expressions``
placeholder within the ``template``.
@ -649,6 +656,184 @@ should avoid them if possible.
force you to acknowledge that you're not interpolating your SQL with user
provided data.
Window functions
----------------
.. versionadded:: 2.0
Window functions provide a way to apply functions on partitions. Unlike a
normal aggregation function which computes a final result for each set defined
by the group by, window functions operate on :ref:`frames <window-frames>` and
partitions, and compute the result for each row.
You can specify multiple windows in the same query which in Django ORM would be
equivalent to including multiple expressions in a :doc:`QuerySet.annotate()
</topics/db/aggregation>` call. The ORM doesn't make use of named windows,
instead they are part of the selected columns.
.. class:: Window(expression, partition_by=None, order_by=None, frame=None, output_field=None)
.. attribute:: filterable
Defaults to ``False``. The SQL standard disallows referencing window
functions in the ``WHERE`` clause and Django raises an exception when
constructing a ``QuerySet`` that would do that.
.. attribute:: template
Defaults to ``%(expression)s OVER (%(window)s)'``. If only the
``expression`` argument is provided, the window clause will be blank.
The ``Window`` class is the main expression for an ``OVER`` clause.
The ``expression`` argument is either a :ref:`window function
<window-functions>`, an :ref:`aggregate function <aggregation-functions>`, or
an expression that's compatible in a window clause.
The ``partition_by`` argument is a list of expressions (column names should be
wrapped in an ``F``-object) that control the partitioning of the rows.
Partitioning narrows which rows are used to compute the result set.
The ``output_field`` is specified either as an argument or by the expression.
The ``order_by`` argument accepts a sequence of expressions on which you can
call :meth:`~django.db.models.Expression.asc` and
:meth:`~django.db.models.Expression.desc`. The ordering controls the order in
which the expression is applied. For example, if you sum over the rows in a
partition, the first result is just the value of the first row, the second is
the sum of first and second row.
The ``frame`` parameter specifies which other rows that should be used in the
computation. See :ref:`window-frames` for details.
For example, to annotate each movie with the average rating for the movies by
the same studio in the same genre and release year::
>>> from django.db.models import Avg, ExtractYear, F, Window
>>> Movie.objects.annotate(
>>> avg_rating=Window(
>>> expression=Avg('rating'),
>>> partition_by=[F('studio'), F('genre')],
>>> order_by=ExtractYear('released').asc(),
>>> ),
>>> )
This makes it easy to check if a movie is rated better or worse than its peers.
You may want to apply multiple expressions over the same window, i.e., the
same partition and frame. For example, you could modify the previous example
to also include the best and worst rating in each movie's group (same studio,
genre, and release year) by using three window functions in the same query. The
partition and ordering from the previous example is extracted into a dictionary
to reduce repetition::
>>> from django.db.models import Avg, ExtractYear, F, Max, Min, Window
>>> window = {
>>> 'partition': [F('studio'), F('genre')],
>>> 'order_by': ExtractYear('released').asc(),
>>> }
>>> Movie.objects.annotate(
>>> avg_rating=Window(
>>> expression=Avg('rating'), **window,
>>> ),
>>> best=Window(
>>> expression=Max('rating'), **window,
>>> ),
>>> worst=Window(
>>> expression=Min('rating'), **window,
>>> ),
>>> )
Among Django's built-in database backends, MySQL 8.0.2+, PostgreSQL, and Oracle
support window expressions. Support for different window expression features
varies among the different databases. For example, the options in
:meth:`~django.db.models.Expression.asc` and
:meth:`~django.db.models.Expression.desc` may not be supported. Consult the
documentation for your database as needed.
.. _window-frames:
Frames
~~~~~~
For a window frame, you can choose either a range-based sequence of rows or an
ordinary sequence of rows.
.. class:: ValueRange(start=None, end=None)
.. attribute:: frame_type
This attribute is set to ``'RANGE'``.
PostgreSQL has limited support for ``ValueRange`` and only supports use of
the standard start and end points, such as ``CURRENT ROW`` and ``UNBOUNDED
FOLLOWING``.
.. class:: RowRange(start=None, end=None)
.. attribute:: frame_type
This attribute is set to ``'ROWS'``.
Both classes return SQL with the template::
%(frame_type)s BETWEEN %(start)s AND %(end)s
Frames narrow the rows that are used for computing the result. They shift from
some start point to some specified end point. Frames can be used with and
without partitions, but it's often a good idea to specify an ordering of the
window to ensure a deterministic result. In a frame, a peer in a frame is a row
with an equivalent value, or all rows if an ordering clause isn't present.
The default starting point for a frame is ``UNBOUNDED PRECEDING`` which is the
first row of the partition. The end point is always explicitly included in the
SQL generated by the ORM and is by default ``UNBOUNDED FOLLOWING``. The default
frame includes all rows from the partition to the last row in the set.
The accepted values for the ``start`` and ``end`` arguments are ``None``, an
integer, or zero. A negative integer for ``start`` results in ``N preceding``,
while ``None`` yields ``UNBOUNDED PRECEDING``. For both ``start`` and ``end``,
zero will return ``CURRENT ROW``. Positive integers are accepted for ``end``.
There's a difference in what ``CURRENT ROW`` includes. When specified in
``ROWS`` mode, the frame starts or ends with the current row. When specified in
``RANGE`` mode, the frame starts or ends at the first or last peer according to
the ordering clause. Thus, ``RANGE CURRENT ROW`` evaluates the expression for
rows which have the same value specified by the ordering. Because the template
includes both the ``start`` and ``end`` points, this may be expressed with::
ValueRange(start=0, end=0)
If a movie's "peers" are described as movies released by the same studio in the
same genre in the same year, this ``RowRange`` example annotates each movie
with the average rating of a movie's two prior and two following peers::
>>> from django.db.models import Avg, ExtractYear, F, RowRange, Window
>>> Movie.objects.annotate(
>>> avg_rating=Window(
>>> expression=Avg('rating'),
>>> partition_by=[F('studio'), F('genre')],
>>> order_by=ExtractYear('released').asc(),
>>> frame=RowRange(start=-2, end=2),
>>> ),
>>> )
If the database supports it, you can specify the start and end points based on
values of an expression in the partition. If the ``released`` field of the
``Movie`` model stores the release month of each movies, this ``ValueRange``
example annotates each movie with the average rating of a movie's peers
released between twelve months before and twelve months after the each movie.
>>> from django.db.models import Avg, ExpressionList, F, ValueRange, Window
>>> Movie.objects.annotate(
>>> avg_rating=Window(
>>> expression=Avg('rating'),
>>> partition_by=[F('studio'), F('genre')],
>>> order_by=F('released').asc(),
>>> frame=ValueRange(start=-12, end=12),
>>> ),
>>> )
.. currentmodule:: django.db.models
Technical Information
@ -677,6 +862,30 @@ calling the appropriate methods on the wrapped expression.
Tells Django that this expression contains an aggregate and that a
``GROUP BY`` clause needs to be added to the query.
.. attribute:: contains_over_clause
.. versionadded:: 2.0
Tells Django that this expression contains a
:class:`~django.db.models.expressions.Window` expression. It's used,
for example, to disallow window function expressions in queries that
modify data. Defaults to ``True``.
.. attribute:: filterable
.. versionadded:: 2.0
Tells Django that this expression can be referenced in
:meth:`.QuerySet.filter`. Defaults to ``True``.
.. attribute:: window_compatible
.. versionadded:: 2.0
Tells Django that this expression can be used as the source expression
in :class:`~django.db.models.expressions.Window`. Defaults to
``False``.
.. method:: resolve_expression(query=None, allow_joins=True, reuse=None, summarize=False, for_save=False)
Provides the chance to do any pre-processing or validation of

View file

@ -1681,6 +1681,11 @@ raised if ``select_for_update()`` is used in autocommit mode.
``select_for_update()`` you should use
:class:`~django.test.TransactionTestCase`.
.. admonition:: Certain expressions may not be supported
PostgreSQL doesn't support ``select_for_update()`` with
:class:`~django.db.models.expressions.Window` expressions.
.. versionchanged:: 1.11
The ``skip_locked`` argument was added.

View file

@ -52,6 +52,14 @@ Mobile-friendly ``contrib.admin``
The admin is now responsive and supports all major mobile devices.
Older browser may experience varying levels of graceful degradation.
Window expressions
------------------
The new :class:`~django.db.models.expressions.Window` expression allows
adding an ``OVER`` clause to querysets. You can use :ref:`window functions
<window-functions>` and :ref:`aggregate functions <aggregation-functions>` in
the expression.
Minor features
--------------
@ -404,6 +412,11 @@ backends.
requires that the arguments to ``OF`` be columns rather than tables, set
``DatabaseFeatures.select_for_update_of_column = True``.
* To enable support for :class:`~django.db.models.expressions.Window`
expressions, set ``DatabaseFeatures.supports_over_clause`` to ``True``. You
may need to customize the ``DatabaseOperations.window_start_rows_start_end()``
and/or ``window_start_range_start_end()`` methods.
* Third-party database backends should add a
``DatabaseOperations.cast_char_field_without_max_length`` attribute with the
database data type that will be used in the