Fixed #7052 -- Added support for natural keys in serialization.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@11863 bcc190cf-cafb-0310-a4f2-bffc1f526a37
Russell Keith-Magee 2009-12-14 12:39:20 +00:00
parent 44b9076bbe
commit 35cc439228
20 changed files with 927 additions and 37 deletions


@ -234,6 +234,17 @@ name to ``dumpdata``, the dumped output will be restricted to that model,
rather than the entire application. You can also mix application names and
model names.

.. django-admin-option:: --natural

.. versionadded:: 1.2

Use :ref:`natural keys <topics-serialization-natural-keys>` to represent
any foreign key and many-to-many relationship with a model that provides
a natural key definition. If you are dumping ``contrib.auth`` ``Permission``
objects or ``contrib.contenttypes`` ``ContentType`` objects, you should
probably be using this flag.

flush
-----
@ -701,7 +712,7 @@ information.
.. versionadded:: 1.2
Use the ``--failfast`` option to stop running tests and report the failure
immediately after a test fails.
testserver <fixture fixture ...>


@ -267,3 +267,13 @@ include %}`` tags).
As a side effect, it is now much easier to support non-Django template
languages. For more details, see the :ref:`notes on supporting
non-Django template languages<topic-template-alternate-language>`.
Natural keys in fixtures
------------------------
Fixtures can refer to remote objects using
:ref:`topics-serialization-natural-keys`. This lookup scheme is an
alternative to the normal primary-key based object references in a
fixture, improving readability, and resolving problems referring to
objects whose primary key value may not be predictable or known.


@ -154,10 +154,10 @@ to install third-party Python modules:
.. _PyYAML: http://www.pyyaml.org/
Notes for specific serialization formats
----------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
json
~~~~
^^^^
If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
serializer, you must pass ``ensure_ascii=False`` as a parameter to the
@ -191,3 +191,191 @@ them. Something like this will work::
.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html
.. _topics-serialization-natural-keys:
Natural keys
------------
The default serialization strategy for foreign keys and many-to-many
relations is to serialize the value of the primary key(s) of the
objects in the relation. This strategy works well for most types of
object, but it can cause difficulty in some circumstances.
Consider the case of a list of objects that have a foreign key on
:class:`ContentType`. If you're going to serialize an object that
refers to a content type, you need to have a way to refer to that
content type. Content types are automatically created by Django as
part of the database synchronization process, so you don't need to
include content types in a fixture or other serialized data. As a
result, the primary key of any given content type isn't easy to
predict: it will depend on how and when :djadmin:`syncdb` was
executed to create the content types.
There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.
Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is for these reasons that Django provides `natural keys`. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.
Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.
However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            # Return the single matching object, not a queryset.
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.
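
The lookup described above can be pictured with a plain-Python sketch. This is not Django's implementation: ``PersonStore`` and ``resolve_author`` are hypothetical stand-ins for a manager and for the deserializer's reference handling, and the sketch assumes that a list or tuple value signals a natural key while anything else is a primary key.

```python
# Plain-Python stand-in for a model manager; not Django's implementation.
class PersonStore:
    def __init__(self):
        self._rows = {}  # pk -> (first_name, last_name)

    def add(self, pk, first_name, last_name):
        self._rows[pk] = (first_name, last_name)

    def get_by_natural_key(self, first_name, last_name):
        """Return the pk of the single row matching the natural key."""
        matches = [pk for pk, key in self._rows.items()
                   if key == (first_name, last_name)]
        if len(matches) != 1:
            raise LookupError("expected exactly one match, found %d" % len(matches))
        return matches[0]


def resolve_author(store, value):
    """Accept either a plain pk or a natural key given as a list/tuple."""
    if isinstance(value, (list, tuple)):
        return store.get_by_natural_key(*value)
    return value


people = PersonStore()
people.add(42, "Douglas", "Adams")
author_pk = resolve_author(people, ["Douglas", "Adams"])
```

Because the natural key must identify exactly one object, the sketch raises an error when zero or several rows match instead of guessing.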
Serialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So how do you get Django to emit a natural key when serializing an object?
Firstly, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        def natural_key(self):
            return (self.first_name, self.last_name)

Then, when you call ``serializers.serialize()``, you provide a
``use_natural_keys=True`` argument::

    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.
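
As a rough illustration of that rule, the following plain-Python sketch emits either the primary key or the natural key for a reference. It does not use Django's serializers: ``serialize_reference`` and ``serialize_book`` are hypothetical helpers, and the ``Person`` class here is an ordinary class standing in for the model.

```python
import json


class Person:
    """Ordinary class standing in for the Person model."""

    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name

    def natural_key(self):
        return (self.first_name, self.last_name)


def serialize_reference(obj, use_natural_keys=False):
    """Emit the natural key if requested and available, else the pk."""
    if use_natural_keys and hasattr(obj, "natural_key"):
        return list(obj.natural_key())
    return obj.pk


def serialize_book(pk, name, author, use_natural_keys=False):
    """Build a fixture-style record for a book (hypothetical helper)."""
    record = {
        "pk": pk,
        "model": "store.book",
        "fields": {
            "name": name,
            "author": serialize_reference(author, use_natural_keys),
        },
    }
    return json.dumps(record, indent=2)


adams = Person(42, "Douglas", "Adams")
with_pk = json.loads(serialize_book(1, "Mostly Harmless", adams))
with_natural = json.loads(
    serialize_book(1, "Mostly Harmless", adams, use_natural_keys=True)
)
```

Objects without a ``natural_key()`` method fall back to the primary key even when the flag is set, which matches the behaviour described above for types that do not define the method.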
If you are using :djadmin:`dumpdata` to generate serialized data, you
use the ``--natural`` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

Dependencies during serialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since natural keys rely on database lookups to resolve references, it
is important that data exists before it is referenced. You can't make
a `forward reference` with natural keys - the data you are referencing
must exist before you include a natural key reference to that data.
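
The ordering constraint can be shown with a small stand-alone sketch. It is an assumption-laden model of loading, not Django's loader: ``load`` is a hypothetical function that processes records strictly in order and treats each record as a ``(model, natural_key, reference)`` triple, where the reference is itself a ``(model, natural_key)`` pair or ``None``.

```python
def load(records):
    """Load records in order; fail on a reference to an object not yet seen."""
    seen = set()  # (model, natural_key) pairs loaded so far
    for model, key, ref in records:
        if ref is not None and ref not in seen:
            raise LookupError(
                "%s %r refers to %r before it exists" % (model, key, ref)
            )
        seen.add((model, key))
    return seen


person = ("person", ("Douglas", "Adams"), None)
book = ("book", ("Mostly Harmless",), ("person", ("Douglas", "Adams")))

# Referenced object first: every lookup succeeds.
loaded = load([person, book])

# Referencing object first: the lookup happens before the person exists.
try:
    load([book, person])
    forward_reference_ok = True
except LookupError:
    forward_reference_ok = False
```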
To accommodate this limitation, calls to :djadmin:`dumpdata` that use
the :djadminopt:`--natural` option will serialize any model with a
``natural_key()`` method before serializing standard primary key objects.
However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.
To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.
For example, consider the ``Permission`` model in ``contrib.auth``.
The following is a simplified version of the ``Permission`` model::

    class Permission(models.Model):
        name = models.CharField(max_length=50)
        content_type = models.ForeignKey(ContentType)
        codename = models.CharField(max_length=100)
        # ...
        def natural_key(self):
            return (self.codename,) + self.content_type.natural_key()

The natural key for a ``Permission`` is a combination of the codename for
the ``Permission``, and the ``ContentType`` to which the ``Permission``
applies. This means that ``ContentType`` must be serialized before
``Permission``. To define this dependency, we add one extra line::

    class Permission(models.Model):
        # ...
        def natural_key(self):
            return (self.codename,) + self.content_type.natural_key()
        natural_key.dependencies = ['contenttypes.contenttype']

This definition ensures that ``ContentType`` models are serialized before
``Permission`` models. In turn, any object referencing ``Permission`` will
be serialized after both ``ContentType`` and ``Permission``.
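
One way to picture how declared dependencies translate into an output order is a depth-first topological sort. The sketch below is only an illustration under that assumption and is not the ordering code this commit adds to ``dumpdata``; ``sort_by_dependencies`` and the ``myapp.grant`` label are hypothetical.

```python
def sort_by_dependencies(dependencies):
    """Order model labels so each appears after everything it depends on.

    `dependencies` maps a model label to the list of labels it depends on.
    """
    ordered = []
    visiting = set()  # labels on the current depth-first path

    def visit(label):
        if label in ordered:
            return
        if label in visiting:
            raise ValueError("circular dependency involving %r" % label)
        visiting.add(label)
        for dep in dependencies.get(label, []):
            visit(dep)
        visiting.discard(label)
        ordered.append(label)

    for label in dependencies:
        visit(label)
    return ordered


deps = {
    "myapp.grant": ["auth.permission"],  # hypothetical model referencing Permission
    "auth.permission": ["contenttypes.contenttype"],
    "contenttypes.contenttype": [],
}
order = sort_by_dependencies(deps)
```

With these inputs, ``ContentType`` comes first, then ``Permission``, then the model that references ``Permission``. A cycle in the declared dependencies cannot be ordered at all, so the sketch raises an error in that case.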