mirror of
https://github.com/django/django.git
synced 2025-08-03 10:34:04 +00:00
Fixed #26608 -- Added support for window expressions (OVER clause).
Thanks Josh Smeaton, Mariusz Felisiak, Sergey Fedoseev, Simon Charettes, Adam Chainz/Johnson and Tim Graham for comments and reviews and Jamie Cockburn for initial patch.
This commit is contained in:
parent
da1ba03f1d
commit
d549b88050
25 changed files with 1627 additions and 8 deletions
|
@ -819,3 +819,132 @@ Usage example::
|
|||
'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=<UTC>),
|
||||
'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=<UTC>)
|
||||
}
|
||||
|
||||
.. _window-functions:
|
||||
|
||||
Window functions
|
||||
================
|
||||
|
||||
.. versionadded:: 2.0
|
||||
|
||||
There are a number of functions to use in a
|
||||
:class:`~django.db.models.expressions.Window` expression for computing the rank
|
||||
of elements or the :class:`Ntile` of some rows.
|
||||
|
||||
``CumeDist``
|
||||
------------
|
||||
|
||||
.. class:: CumeDist(*expressions, **extra)
|
||||
|
||||
Calculates the cumulative distribution of a value within a window or partition.
|
||||
The cumulative distribution is defined as the number of rows preceding or
|
||||
peered with the current row divided by the total number of rows in the frame.
|
||||
|
||||
``DenseRank``
|
||||
-------------
|
||||
|
||||
.. class:: DenseRank(*expressions, **extra)
|
||||
|
||||
Equivalent to :class:`Rank` but does not have gaps.
|
||||
|
||||
``FirstValue``
|
||||
--------------
|
||||
|
||||
.. class:: FirstValue(expression, **extra)
|
||||
|
||||
Returns the value evaluated at the row that's the first row of the window
|
||||
frame, or ``None`` if no such value exists.
|
||||
|
||||
``Lag``
|
||||
-------
|
||||
|
||||
.. class:: Lag(expression, offset=1, default=None, **extra)
|
||||
|
||||
Calculates the value offset by ``offset``, and if no row exists there, returns
|
||||
``default``.
|
||||
|
||||
``default`` must have the same type as the ``expression``, however, this is
|
||||
only validated by the database and not in Python.
|
||||
|
||||
``LastValue``
|
||||
-------------
|
||||
|
||||
.. class:: LastValue(expression, **extra)
|
||||
|
||||
Comparable to :class:`FirstValue`, it calculates the last value in a given
|
||||
frame clause.
|
||||
|
||||
``Lead``
|
||||
--------
|
||||
|
||||
.. class:: Lead(expression, offset=1, default=None, **extra)
|
||||
|
||||
Calculates the leading value in a given :ref:`frame <window-frames>`. Both
|
||||
``offset`` and ``default`` are evaluated with respect to the current row.
|
||||
|
||||
``default`` must have the same type as the ``expression``, however, this is
|
||||
only validated by the database and not in Python.
|
||||
|
||||
``NthValue``
|
||||
------------
|
||||
|
||||
.. class:: NthValue(expression, nth=1, **extra)
|
||||
|
||||
Computes the row relative to the offset ``nth`` (must be a positive value)
|
||||
within the window. Returns ``None`` if no row exists.
|
||||
|
||||
Some databases may handle a nonexistent nth-value differently. For example,
|
||||
Oracle returns an empty string rather than ``None`` for character-based
|
||||
expressions. Django doesn't do any conversions in these cases.
|
||||
|
||||
``Ntile``
|
||||
---------
|
||||
|
||||
.. class:: Ntile(num_buckets=1, **extra)
|
||||
|
||||
Calculates a partition for each of the rows in the frame clause, distributing
|
||||
numbers as evenly as possible between 1 and ``num_buckets``. If the rows don't
|
||||
divide evenly into a number of buckets, one or more buckets will be represented
|
||||
more frequently.
|
||||
|
||||
``PercentRank``
|
||||
---------------
|
||||
|
||||
.. class:: PercentRank(*expressions, **extra)
|
||||
|
||||
Computes the percentile rank of the rows in the frame clause. This
|
||||
computation is equivalent to evaluating::
|
||||
|
||||
(rank - 1) / (total rows - 1)
|
||||
|
||||
The following table explains the calculation for the percentile rank of a row:
|
||||
|
||||
===== ===== ==== ============ ============
|
||||
Row # Value Rank Calculation Percent Rank
|
||||
===== ===== ==== ============ ============
|
||||
1 15 1 (1-1)/(7-1) 0.0000
|
||||
2 20 2 (2-1)/(7-1) 0.1666
|
||||
3 20 2 (2-1)/(7-1) 0.1666
|
||||
4 20 2 (2-1)/(7-1) 0.1666
|
||||
5 30 5 (5-1)/(7-1) 0.6666
|
||||
6 30 5 (5-1)/(7-1) 0.6666
|
||||
7 40 7 (7-1)/(7-1) 1.0000
|
||||
===== ===== ==== ============ ============
|
||||
|
||||
``Rank``
|
||||
--------
|
||||
|
||||
.. class:: Rank(*expressions, **extra)
|
||||
|
||||
Comparable to ``RowNumber``, this function ranks rows in the window. The
|
||||
computed rank contains gaps. Use :class:`DenseRank` to compute rank without
|
||||
gaps.
|
||||
|
||||
``RowNumber``
|
||||
-------------
|
||||
|
||||
.. class:: RowNumber(*expressions, **extra)
|
||||
|
||||
Computes the row number according to the ordering of either the frame clause
|
||||
or the ordering of the whole query if there is no partitioning of the
|
||||
:ref:`window frame <window-frames>`.
|
||||
|
|
|
@ -353,6 +353,13 @@ The ``Aggregate`` API is as follows:
|
|||
generated. Specifically, the ``function`` will be interpolated as the
|
||||
``function`` placeholder within :attr:`template`. Defaults to ``None``.
|
||||
|
||||
.. attribute:: window_compatible
|
||||
|
||||
.. versionadded:: 2.0
|
||||
|
||||
Defaults to ``True`` since most aggregate functions can be used as the
|
||||
source expression in :class:`~django.db.models.expressions.Window`.
|
||||
|
||||
The ``expression`` argument can be the name of a field on the model, or another
|
||||
expression. It will be converted to a string and used as the ``expressions``
|
||||
placeholder within the ``template``.
|
||||
|
@ -649,6 +656,184 @@ should avoid them if possible.
|
|||
force you to acknowledge that you're not interpolating your SQL with user
|
||||
provided data.
|
||||
|
||||
Window functions
|
||||
----------------
|
||||
|
||||
.. versionadded:: 2.0
|
||||
|
||||
Window functions provide a way to apply functions on partitions. Unlike a
|
||||
normal aggregation function which computes a final result for each set defined
|
||||
by the group by, window functions operate on :ref:`frames <window-frames>` and
|
||||
partitions, and compute the result for each row.
|
||||
|
||||
You can specify multiple windows in the same query which in Django ORM would be
|
||||
equivalent to including multiple expressions in a :doc:`QuerySet.annotate()
|
||||
</topics/db/aggregation>` call. The ORM doesn't make use of named windows,
|
||||
instead they are part of the selected columns.
|
||||
|
||||
.. class:: Window(expression, partition_by=None, order_by=None, frame=None, output_field=None)
|
||||
|
||||
.. attribute:: filterable
|
||||
|
||||
Defaults to ``False``. The SQL standard disallows referencing window
|
||||
functions in the ``WHERE`` clause and Django raises an exception when
|
||||
constructing a ``QuerySet`` that would do that.
|
||||
|
||||
.. attribute:: template
|
||||
|
||||
Defaults to ``%(expression)s OVER (%(window)s)'``. If only the
|
||||
``expression`` argument is provided, the window clause will be blank.
|
||||
|
||||
The ``Window`` class is the main expression for an ``OVER`` clause.
|
||||
|
||||
The ``expression`` argument is either a :ref:`window function
|
||||
<window-functions>`, an :ref:`aggregate function <aggregation-functions>`, or
|
||||
an expression that's compatible in a window clause.
|
||||
|
||||
The ``partition_by`` argument is a list of expressions (column names should be
|
||||
wrapped in an ``F``-object) that control the partitioning of the rows.
|
||||
Partitioning narrows which rows are used to compute the result set.
|
||||
|
||||
The ``output_field`` is specified either as an argument or by the expression.
|
||||
|
||||
The ``order_by`` argument accepts a sequence of expressions on which you can
|
||||
call :meth:`~django.db.models.Expression.asc` and
|
||||
:meth:`~django.db.models.Expression.desc`. The ordering controls the order in
|
||||
which the expression is applied. For example, if you sum over the rows in a
|
||||
partition, the first result is just the value of the first row, the second is
|
||||
the sum of first and second row.
|
||||
|
||||
The ``frame`` parameter specifies which other rows that should be used in the
|
||||
computation. See :ref:`window-frames` for details.
|
||||
|
||||
For example, to annotate each movie with the average rating for the movies by
|
||||
the same studio in the same genre and release year::
|
||||
|
||||
>>> from django.db.models import Avg, ExtractYear, F, Window
|
||||
>>> Movie.objects.annotate(
|
||||
>>> avg_rating=Window(
|
||||
>>> expression=Avg('rating'),
|
||||
>>> partition_by=[F('studio'), F('genre')],
|
||||
>>> order_by=ExtractYear('released').asc(),
|
||||
>>> ),
|
||||
>>> )
|
||||
|
||||
This makes it easy to check if a movie is rated better or worse than its peers.
|
||||
|
||||
You may want to apply multiple expressions over the same window, i.e., the
|
||||
same partition and frame. For example, you could modify the previous example
|
||||
to also include the best and worst rating in each movie's group (same studio,
|
||||
genre, and release year) by using three window functions in the same query. The
|
||||
partition and ordering from the previous example is extracted into a dictionary
|
||||
to reduce repetition::
|
||||
|
||||
>>> from django.db.models import Avg, ExtractYear, F, Max, Min, Window
|
||||
>>> window = {
|
||||
>>> 'partition': [F('studio'), F('genre')],
|
||||
>>> 'order_by': ExtractYear('released').asc(),
|
||||
>>> }
|
||||
>>> Movie.objects.annotate(
|
||||
>>> avg_rating=Window(
|
||||
>>> expression=Avg('rating'), **window,
|
||||
>>> ),
|
||||
>>> best=Window(
|
||||
>>> expression=Max('rating'), **window,
|
||||
>>> ),
|
||||
>>> worst=Window(
|
||||
>>> expression=Min('rating'), **window,
|
||||
>>> ),
|
||||
>>> )
|
||||
|
||||
Among Django's built-in database backends, MySQL 8.0.2+, PostgreSQL, and Oracle
|
||||
support window expressions. Support for different window expression features
|
||||
varies among the different databases. For example, the options in
|
||||
:meth:`~django.db.models.Expression.asc` and
|
||||
:meth:`~django.db.models.Expression.desc` may not be supported. Consult the
|
||||
documentation for your database as needed.
|
||||
|
||||
.. _window-frames:
|
||||
|
||||
Frames
|
||||
~~~~~~
|
||||
|
||||
For a window frame, you can choose either a range-based sequence of rows or an
|
||||
ordinary sequence of rows.
|
||||
|
||||
.. class:: ValueRange(start=None, end=None)
|
||||
|
||||
.. attribute:: frame_type
|
||||
|
||||
This attribute is set to ``'RANGE'``.
|
||||
|
||||
PostgreSQL has limited support for ``ValueRange`` and only supports use of
|
||||
the standard start and end points, such as ``CURRENT ROW`` and ``UNBOUNDED
|
||||
FOLLOWING``.
|
||||
|
||||
.. class:: RowRange(start=None, end=None)
|
||||
|
||||
.. attribute:: frame_type
|
||||
|
||||
This attribute is set to ``'ROWS'``.
|
||||
|
||||
Both classes return SQL with the template::
|
||||
|
||||
%(frame_type)s BETWEEN %(start)s AND %(end)s
|
||||
|
||||
Frames narrow the rows that are used for computing the result. They shift from
|
||||
some start point to some specified end point. Frames can be used with and
|
||||
without partitions, but it's often a good idea to specify an ordering of the
|
||||
window to ensure a deterministic result. In a frame, a peer in a frame is a row
|
||||
with an equivalent value, or all rows if an ordering clause isn't present.
|
||||
|
||||
The default starting point for a frame is ``UNBOUNDED PRECEDING`` which is the
|
||||
first row of the partition. The end point is always explicitly included in the
|
||||
SQL generated by the ORM and is by default ``UNBOUNDED FOLLOWING``. The default
|
||||
frame includes all rows from the partition to the last row in the set.
|
||||
|
||||
The accepted values for the ``start`` and ``end`` arguments are ``None``, an
|
||||
integer, or zero. A negative integer for ``start`` results in ``N preceding``,
|
||||
while ``None`` yields ``UNBOUNDED PRECEDING``. For both ``start`` and ``end``,
|
||||
zero will return ``CURRENT ROW``. Positive integers are accepted for ``end``.
|
||||
|
||||
There's a difference in what ``CURRENT ROW`` includes. When specified in
|
||||
``ROWS`` mode, the frame starts or ends with the current row. When specified in
|
||||
``RANGE`` mode, the frame starts or ends at the first or last peer according to
|
||||
the ordering clause. Thus, ``RANGE CURRENT ROW`` evaluates the expression for
|
||||
rows which have the same value specified by the ordering. Because the template
|
||||
includes both the ``start`` and ``end`` points, this may be expressed with::
|
||||
|
||||
ValueRange(start=0, end=0)
|
||||
|
||||
If a movie's "peers" are described as movies released by the same studio in the
|
||||
same genre in the same year, this ``RowRange`` example annotates each movie
|
||||
with the average rating of a movie's two prior and two following peers::
|
||||
|
||||
>>> from django.db.models import Avg, ExtractYear, F, RowRange, Window
|
||||
>>> Movie.objects.annotate(
|
||||
>>> avg_rating=Window(
|
||||
>>> expression=Avg('rating'),
|
||||
>>> partition_by=[F('studio'), F('genre')],
|
||||
>>> order_by=ExtractYear('released').asc(),
|
||||
>>> frame=RowRange(start=-2, end=2),
|
||||
>>> ),
|
||||
>>> )
|
||||
|
||||
If the database supports it, you can specify the start and end points based on
|
||||
values of an expression in the partition. If the ``released`` field of the
|
||||
``Movie`` model stores the release month of each movies, this ``ValueRange``
|
||||
example annotates each movie with the average rating of a movie's peers
|
||||
released between twelve months before and twelve months after the each movie.
|
||||
|
||||
>>> from django.db.models import Avg, ExpressionList, F, ValueRange, Window
|
||||
>>> Movie.objects.annotate(
|
||||
>>> avg_rating=Window(
|
||||
>>> expression=Avg('rating'),
|
||||
>>> partition_by=[F('studio'), F('genre')],
|
||||
>>> order_by=F('released').asc(),
|
||||
>>> frame=ValueRange(start=-12, end=12),
|
||||
>>> ),
|
||||
>>> )
|
||||
|
||||
.. currentmodule:: django.db.models
|
||||
|
||||
Technical Information
|
||||
|
@ -677,6 +862,30 @@ calling the appropriate methods on the wrapped expression.
|
|||
Tells Django that this expression contains an aggregate and that a
|
||||
``GROUP BY`` clause needs to be added to the query.
|
||||
|
||||
.. attribute:: contains_over_clause
|
||||
|
||||
.. versionadded:: 2.0
|
||||
|
||||
Tells Django that this expression contains a
|
||||
:class:`~django.db.models.expressions.Window` expression. It's used,
|
||||
for example, to disallow window function expressions in queries that
|
||||
modify data. Defaults to ``True``.
|
||||
|
||||
.. attribute:: filterable
|
||||
|
||||
.. versionadded:: 2.0
|
||||
|
||||
Tells Django that this expression can be referenced in
|
||||
:meth:`.QuerySet.filter`. Defaults to ``True``.
|
||||
|
||||
.. attribute:: window_compatible
|
||||
|
||||
.. versionadded:: 2.0
|
||||
|
||||
Tells Django that this expression can be used as the source expression
|
||||
in :class:`~django.db.models.expressions.Window`. Defaults to
|
||||
``False``.
|
||||
|
||||
.. method:: resolve_expression(query=None, allow_joins=True, reuse=None, summarize=False, for_save=False)
|
||||
|
||||
Provides the chance to do any pre-processing or validation of
|
||||
|
|
|
@ -1681,6 +1681,11 @@ raised if ``select_for_update()`` is used in autocommit mode.
|
|||
``select_for_update()`` you should use
|
||||
:class:`~django.test.TransactionTestCase`.
|
||||
|
||||
.. admonition:: Certain expressions may not be supported
|
||||
|
||||
PostgreSQL doesn't support ``select_for_update()`` with
|
||||
:class:`~django.db.models.expressions.Window` expressions.
|
||||
|
||||
.. versionchanged:: 1.11
|
||||
|
||||
The ``skip_locked`` argument was added.
|
||||
|
|
|
@ -52,6 +52,14 @@ Mobile-friendly ``contrib.admin``
|
|||
The admin is now responsive and supports all major mobile devices.
|
||||
Older browser may experience varying levels of graceful degradation.
|
||||
|
||||
Window expressions
|
||||
------------------
|
||||
|
||||
The new :class:`~django.db.models.expressions.Window` expression allows
|
||||
adding an ``OVER`` clause to querysets. You can use :ref:`window functions
|
||||
<window-functions>` and :ref:`aggregate functions <aggregation-functions>` in
|
||||
the expression.
|
||||
|
||||
Minor features
|
||||
--------------
|
||||
|
||||
|
@ -404,6 +412,11 @@ backends.
|
|||
requires that the arguments to ``OF`` be columns rather than tables, set
|
||||
``DatabaseFeatures.select_for_update_of_column = True``.
|
||||
|
||||
* To enable support for :class:`~django.db.models.expressions.Window`
|
||||
expressions, set ``DatabaseFeatures.supports_over_clause`` to ``True``. You
|
||||
may need to customize the ``DatabaseOperations.window_start_rows_start_end()``
|
||||
and/or ``window_start_range_start_end()`` methods.
|
||||
|
||||
* Third-party database backends should add a
|
||||
``DatabaseOperations.cast_char_field_without_max_length`` attribute with the
|
||||
database data type that will be used in the
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue