Fixed #17003 - prefetch_related should support foreign keys/one-to-one

Support for `GenericForeignKey` is also included.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@16939 bcc190cf-cafb-0310-a4f2-bffc1f526a37
Luke Plant 2011-10-07 16:05:53 +00:00
parent 672f2db24a
commit 052a011ee6
8 changed files with 366 additions and 119 deletions

@ -696,14 +696,26 @@ prefetch_related
.. versionadded:: 1.4
Returns a ``QuerySet`` that will automatically retrieve, in a single batch,
related many-to-many and many-to-one objects for each of the specified lookups.
related objects for each of the specified lookups.
This is similar to ``select_related`` for the 'many related objects' case, but
note that ``prefetch_related`` causes a separate query to be issued for each set
of related objects that you request, unlike ``select_related`` which modifies
the original query with joins in order to get the related objects. With
``prefetch_related``, the additional queries are done as soon as the QuerySet
begins to be evaluated.
This has a similar purpose to ``select_related``, in that both are designed to
stop the deluge of database queries that is caused by accessing related objects,
but the strategy is quite different.
``select_related`` works by creating a SQL join and including the fields of the
related object in the SELECT statement. For this reason, ``select_related`` gets
the related objects in the same database query. However, to avoid the much
larger result set that would result from joining across a 'many' relationship,
``select_related`` is limited to single-valued relationships - foreign key and
one-to-one.
``prefetch_related``, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to prefetch
many-to-many and many-to-one objects, which cannot be done using
``select_related``, in addition to the foreign key and one-to-one relationships
that are supported by ``select_related``. It also supports prefetching of
:class:`~django.contrib.contenttypes.generic.GenericRelation` and
:class:`~django.contrib.contenttypes.generic.GenericForeignKey`.
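As a rough illustration of what "joining in Python" means, the sketch below
uses plain in-memory data instead of a real database. The table contents and
the helper names (``fetch_pizzas``, ``fetch_toppings_for``,
``prefetch_toppings``) are hypothetical and are not part of Django; the sketch
only shows the general idea of one extra lookup followed by grouping.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two database tables.
PIZZAS = [{"id": 1, "name": "Hawaiian"}, {"id": 2, "name": "Seafood"}]
# (pizza_id, topping_name) rows, as a many-to-many lookup might return them.
PIZZA_TOPPINGS = [
    (1, "ham"),
    (1, "pineapple"),
    (2, "prawns"),
    (2, "smoked salmon"),
]


def fetch_pizzas():
    """Stand-in for the primary query."""
    return [dict(row) for row in PIZZAS]


def fetch_toppings_for(pizza_ids):
    """Stand-in for the single extra query covering all requested pizzas."""
    wanted = set(pizza_ids)
    return [(pid, name) for pid, name in PIZZA_TOPPINGS if pid in wanted]


def prefetch_toppings(pizzas):
    """Group related rows by parent id and attach them to each parent."""
    by_pizza = defaultdict(list)
    for pid, name in fetch_toppings_for(p["id"] for p in pizzas):
        by_pizza[pid].append(name)
    for pizza in pizzas:
        # Parents with no related rows get an empty list.
        pizza["toppings"] = by_pizza.get(pizza["id"], [])
    return pizzas


pizzas = prefetch_toppings(fetch_pizzas())
```

However many pizzas there are, the sketch performs two lookups in total: one
for the pizzas and one for all of their toppings.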
For example, suppose you have these models::
@ -733,14 +745,17 @@ All the relevant toppings will be fetched in a single query, and used to make
``QuerySets`` that have a pre-filled cache of the relevant results. These
``QuerySets`` are then used in the ``self.toppings.all()`` calls.
Please note that use of ``prefetch_related`` will mean that the additional
queries run will **always** be executed - even if you never use the related
objects - and it always fully populates the result cache on the primary
``QuerySet`` (which can sometimes be avoided in other cases).
The additional queries are executed after the QuerySet has begun to be evaluated
and the primary query has been executed. Note that the result cache of the
primary QuerySet and all specified related objects will then be fully loaded
into memory, which is often avoided in other cases - even after a query has been
executed in the database, QuerySet normally tries to make use of chunking when
reading from the database, to avoid loading all objects into memory before you
need them.
Also remember that, as always with QuerySets, any subsequent chained methods
will ignore previously cached results, and retrieve data using a fresh database
query. So, if you write the following:
which imply a different database query will ignore previously cached results,
and retrieve data using a fresh database query. So, if you write the following:
>>> pizzas = Pizza.objects.prefetch_related('toppings')
>>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]
@ -749,12 +764,6 @@ query. So, if you write the following:
you - in fact it hurts performance, since you have done a database query that
you haven't used. So use this feature with caution!
The lookups that must be supplied to this method can be any attributes on the
model instances which represent related queries that return multiple
objects. This includes attributes representing the 'many' side of ``ForeignKey``
relationships, forward and reverse ``ManyToManyField`` attributes, and also any
``GenericRelations``.
You can also use the normal join syntax to do related fields of related
fields. Suppose we have an additional model to the example above::
@ -770,24 +779,40 @@ This will prefetch all pizzas belonging to restaurants, and all toppings
belonging to those pizzas. This will result in a total of 3 database queries -
one for the restaurants, one for the pizzas, and one for the toppings.
>>> Restaurant.objects.select_related('best_pizza').prefetch_related('best_pizza__toppings')
>>> Restaurant.objects.prefetch_related('best_pizza__toppings')
This will fetch the best pizza and all the toppings for the best pizza for each
restaurant. This will be done in 2 database queries - one for the restaurants
and 'best pizzas' combined (achieved through use of ``select_related``), and one
for the toppings.
restaurant. This will be done in 3 database queries - one for the restaurants,
one for the 'best pizzas', and one for the toppings.
Chaining ``prefetch_related`` calls will accumulate the fields that should have
this behavior applied. To clear any ``prefetch_related`` behavior, pass ``None``
as a parameter::
Of course, the ``best_pizza`` relationship could also be fetched using
``select_related`` to reduce the query count to 2:
>>> Restaurant.objects.select_related('best_pizza').prefetch_related('best_pizza__toppings')
Since the prefetch is executed after the main query (which includes the joins
needed by ``select_related``), it is able to detect that the ``best_pizza``
objects have already been fetched, and it will skip fetching them again.
Chaining ``prefetch_related`` calls will accumulate the lookups that are
prefetched. To clear any ``prefetch_related`` behavior, pass ``None`` as a
parameter::
>>> non_prefetched = qs.prefetch_related(None)
One difference when using ``prefetch_related`` is that, in some circumstances,
objects created by a query can be shared between the different objects that they
are related to i.e. a single Python model instance can appear at more than one
point in the tree of objects that are returned. Normally this behavior will not
be a problem, and will in fact save both memory and CPU time.
One difference to note when using ``prefetch_related`` is that objects created
by a query can be shared between the different objects that they are related to,
i.e. a single Python model instance can appear at more than one point in the
tree of objects that are returned. This will normally happen with foreign key
relationships. Typically this behavior will not be a problem, and will in fact
save both memory and CPU time.
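The sharing described above can be pictured with a small plain-Python sketch.
The ``Author`` and ``Book`` classes and the ``prefetch_authors`` helper are
hypothetical illustrations, not Django code; the sketch assumes that one
instance is built per distinct primary key and then attached to every object
that refers to it.

```python
class Author:
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


class Book:
    def __init__(self, title, author_id):
        self.title = title
        self.author_id = author_id
        self.author = None  # filled in by the prefetch step


# Hypothetical rows that a single "authors" lookup might return.
AUTHOR_ROWS = {1: "Charlotte", 2: "Emily"}


def prefetch_authors(books):
    """Attach one shared Author instance per distinct author_id."""
    needed = {book.author_id for book in books}
    instances = {pk: Author(pk, AUTHOR_ROWS[pk]) for pk in needed}
    for book in books:
        book.author = instances[book.author_id]
    return books


books = prefetch_authors(
    [Book("Jane Eyre", 1), Book("Villette", 1), Book("Wuthering Heights", 2)]
)
# Both books by author 1 point at the very same Python object.
same_instance = books[0].author is books[1].author
```

One consequence of this design is that a change made through one reference is
visible through the other, because only one object exists.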
While ``prefetch_related`` supports prefetching ``GenericForeignKey``
relationships, the number of queries will depend on the data. Since a
``GenericForeignKey`` can reference data in multiple tables, one query per table
referenced is needed, rather than one query for all the items. There could be
additional queries on the ``ContentType`` table if the relevant rows have not
already been fetched.
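To see why the query count depends on the data, the sketch below groups
generic references by the table they point to. The function name
``plan_generic_prefetch`` and the ``(table, object_id)`` pairs are hypothetical
and do not reflect Django's internals; the sketch only counts how many
per-table lookups would be needed under that assumption.

```python
from collections import defaultdict


def plan_generic_prefetch(references):
    """Return {table: sorted ids}; each key stands for one lookup."""
    ids_by_table = defaultdict(set)
    for table, object_id in references:
        ids_by_table[table].add(object_id)
    return {table: sorted(ids) for table, ids in ids_by_table.items()}


# Four references that point into two different tables.
references = [("bookmark", 1), ("note", 7), ("bookmark", 2), ("bookmark", 1)]
plan = plan_generic_prefetch(references)
```

Here the four references need two lookups, one per referenced table. If every
reference pointed at the same table, a single lookup would be enough.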
extra
~~~~~

@ -66,15 +66,18 @@ information.
``QuerySet.prefetch_related``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analogous to :meth:`~django.db.models.query.QuerySet.select_related` but for
many-to-many relationships,
Similar to :meth:`~django.db.models.query.QuerySet.select_related` but with a
different strategy and broader scope,
:meth:`~django.db.models.query.QuerySet.prefetch_related` has been added to
:class:`~django.db.models.query.QuerySet`. This method returns a new ``QuerySet``
that will prefetch in a single batch each of the specified related lookups as
soon as it begins to be evaluated (e.g. by iterating over it). This enables you
to fix many instances of a very common performance problem, in which your code
ends up doing O(n) database queries (or worse) if objects on your primary
``QuerySet`` each have many related objects that you also need.
:class:`~django.db.models.query.QuerySet`. This method returns a new
``QuerySet`` that will prefetch in a single batch each of the specified related
lookups as soon as it begins to be evaluated. Unlike ``select_related``, it does
the joins in Python, not in the database, and supports many-to-many
relationships, :class:`~django.contrib.contenttypes.generic.GenericForeignKey`
and more. This enables you to fix many instances of a very common performance
problem, in which your code ends up doing O(n) database queries (or worse) if
objects on your primary ``QuerySet`` each have many related objects that you
also need.
HTML5
~~~~~