mirror of https://github.com/python/cpython.git (synced 2025-11-03 19:34:08 +00:00)
	svn+ssh://pythondev@svn.python.org/python/trunk
........
  r53624 | peter.astrand | 2007-02-02 20:06:36 +0100 (Fri, 02 Feb 2007) | 1 line
  We had several if statements checking the value of a fd. This is unsafe, since valid fds might be zero. We should check for not None instead.
........
  r53635 | kurt.kaiser | 2007-02-05 07:03:18 +0100 (Mon, 05 Feb 2007) | 2 lines
  Add 'raw' support to configHandler. Patch 1650174 Tal Einat.
........
  r53641 | kurt.kaiser | 2007-02-06 00:02:16 +0100 (Tue, 06 Feb 2007) | 5 lines
  1. Calltips now 'handle' tuples in the argument list (display '<tuple>' :)
     Suggested solution by Christos Georgiou, Bug 791968.
  2. Clean up tests, were not failing when they should have been.
  4. Remove some camelcase and an unneeded try/except block.
........
  r53644 | kurt.kaiser | 2007-02-06 04:21:40 +0100 (Tue, 06 Feb 2007) | 2 lines
  Clean up ModifiedInterpreter.runcode() structure
........
  r53646 | peter.astrand | 2007-02-06 16:37:50 +0100 (Tue, 06 Feb 2007) | 1 line
  Applied patch 1124861.3.patch to solve bug #1124861: Automatically create pipes on Windows, if GetStdHandle fails. Will backport.
........
  r53648 | lars.gustaebel | 2007-02-06 19:38:13 +0100 (Tue, 06 Feb 2007) | 4 lines
  Patch #1652681: create nonexistent files in append mode and
  allow appending to empty files.
........
  r53649 | kurt.kaiser | 2007-02-06 20:09:43 +0100 (Tue, 06 Feb 2007) | 4 lines
  Updated patch (CodeContext.061217.patch) to
  [ 1362975 ] CodeContext - Improved text indentation
  Tal Einat 16Dec06
........
  r53650 | kurt.kaiser | 2007-02-06 20:21:19 +0100 (Tue, 06 Feb 2007) | 2 lines
  narrow exception per [ 1540849 ] except too broad
........
  r53653 | kurt.kaiser | 2007-02-07 04:39:41 +0100 (Wed, 07 Feb 2007) | 4 lines
  [ 1621265 ] Auto-completion list placement
  Move AC window below input line unless not enough space, then put it above.
  Patch: Tal Einat
........
  r53654 | kurt.kaiser | 2007-02-07 09:07:13 +0100 (Wed, 07 Feb 2007) | 2 lines
  Handle AttributeError during calltip lookup
........
  r53656 | raymond.hettinger | 2007-02-07 21:08:22 +0100 (Wed, 07 Feb 2007) | 3 lines
  SF #1615701:  make d.update(m) honor __getitem__() and keys() in dict subclasses
........
  r53658 | raymond.hettinger | 2007-02-07 22:04:20 +0100 (Wed, 07 Feb 2007) | 1 line
  SF: 1397711 Set docs conflated immutable and hashable
........
  r53660 | raymond.hettinger | 2007-02-07 22:42:17 +0100 (Wed, 07 Feb 2007) | 1 line
  Check for a common user error with defaultdict().
........
  r53662 | raymond.hettinger | 2007-02-07 23:24:07 +0100 (Wed, 07 Feb 2007) | 1 line
  Bug #1575169: operator.isSequenceType() now returns False for subclasses of dict.
........
  r53664 | raymond.hettinger | 2007-02-08 00:49:03 +0100 (Thu, 08 Feb 2007) | 1 line
  Silence compiler warning
........
  r53666 | raymond.hettinger | 2007-02-08 01:07:32 +0100 (Thu, 08 Feb 2007) | 1 line
  Do not let overflows in enumerate() and count() pass silently.
........
  r53668 | raymond.hettinger | 2007-02-08 01:50:39 +0100 (Thu, 08 Feb 2007) | 1 line
  Bypass set specific optimizations for set and frozenset subclasses.
........
  r53670 | raymond.hettinger | 2007-02-08 02:42:35 +0100 (Thu, 08 Feb 2007) | 1 line
  Fix docstring bug
........
  r53671 | martin.v.loewis | 2007-02-08 10:13:36 +0100 (Thu, 08 Feb 2007) | 3 lines
  Bug #1653736: Complain about keyword arguments to time.isoformat.
  Will backport to 2.5.
........
  r53679 | kurt.kaiser | 2007-02-08 23:58:18 +0100 (Thu, 08 Feb 2007) | 6 lines
  Corrected some bugs in AutoComplete.  Also, Page Up/Down in ACW implemented;
  mouse and cursor selection in ACWindow implemented; double Tab inserts current
  selection and closes ACW (similar to double-click and Return); scroll wheel now
  works in ACW.  Added AutoComplete instructions to IDLE Help.
........
  r53689 | martin.v.loewis | 2007-02-09 13:19:32 +0100 (Fri, 09 Feb 2007) | 3 lines
  Bug #1653736: Properly discard third argument to slot_nb_inplace_power.
  Will backport.
........
  r53691 | martin.v.loewis | 2007-02-09 13:36:48 +0100 (Fri, 09 Feb 2007) | 4 lines
  Bug #1600860: Search for shared python library in LIBDIR, not
  lib/python/config, on "linux" and "gnu" systems.
  Will backport.
........
  r53693 | martin.v.loewis | 2007-02-09 13:58:49 +0100 (Fri, 09 Feb 2007) | 2 lines
  Update broken link. Will backport to 2.5.
........
  r53697 | georg.brandl | 2007-02-09 19:48:41 +0100 (Fri, 09 Feb 2007) | 2 lines
  Bug #1656078: typo in in profile docs.
........
  r53731 | brett.cannon | 2007-02-11 06:36:00 +0100 (Sun, 11 Feb 2007) | 3 lines
  Change a very minor inconsistency (that is purely cosmetic) in the AST
  definition.
........
  r53735 | skip.montanaro | 2007-02-11 19:24:37 +0100 (Sun, 11 Feb 2007) | 1 line
  fix trace.py --ignore-dir
........
  r53741 | brett.cannon | 2007-02-11 20:44:41 +0100 (Sun, 11 Feb 2007) | 3 lines
  Check in changed Python-ast.c from a cosmetic change to Python.asdl (in
  r53731).
........
  r53751 | brett.cannon | 2007-02-12 04:51:02 +0100 (Mon, 12 Feb 2007) | 5 lines
  Modify Parser/asdl_c.py so that the __version__ number for Python/Python-ast.c
  is specified at the top of the file.  Also add a note that Python/Python-ast.c
  needs to be committed separately after a change to the AST grammar to capture
  the revision number of the change (which is what __version__ is set to).
........
  r53752 | lars.gustaebel | 2007-02-12 10:25:53 +0100 (Mon, 12 Feb 2007) | 3 lines
  Bug #1656581: Point out that external file objects are supposed to be
  at position 0.
........
  r53754 | martin.v.loewis | 2007-02-12 13:21:10 +0100 (Mon, 12 Feb 2007) | 3 lines
  Patch 1463026: Support default namespace in XMLGenerator.
  Fixes #847665. Will backport.
........
  r53757 | armin.rigo | 2007-02-12 17:23:24 +0100 (Mon, 12 Feb 2007) | 4 lines
  Fix the line to what is my guess at the original author's meaning.
  (The line has no effect anyway, but is present because it's
  customary call the base class __init__).
........
  r53763 | martin.v.loewis | 2007-02-13 09:34:45 +0100 (Tue, 13 Feb 2007) | 3 lines
  Patch #685268: Consider a package's __path__ in imputil.
  Will backport.
........
  r53765 | martin.v.loewis | 2007-02-13 10:49:38 +0100 (Tue, 13 Feb 2007) | 2 lines
  Patch #698833: Support file decryption in zipfile.
........
  r53766 | martin.v.loewis | 2007-02-13 11:10:39 +0100 (Tue, 13 Feb 2007) | 3 lines
  Patch #1517891: Make 'a' create the file if it doesn't exist.
  Fixes #1514451.
........
  r53767 | martin.v.loewis | 2007-02-13 13:08:24 +0100 (Tue, 13 Feb 2007) | 3 lines
  Bug #1658794: Remove extraneous 'this'.
  Will backport to 2.5.
........
  r53769 | martin.v.loewis | 2007-02-13 13:14:19 +0100 (Tue, 13 Feb 2007) | 3 lines
  Patch #1657276: Make NETLINK_DNRTMSG conditional.
  Will backport.
........
  r53771 | lars.gustaebel | 2007-02-13 17:09:24 +0100 (Tue, 13 Feb 2007) | 4 lines
  Patch #1647484: Renamed GzipFile's filename attribute to name. The
  filename attribute is still accessible as a property that emits a
  DeprecationWarning.
........
  r53772 | lars.gustaebel | 2007-02-13 17:24:00 +0100 (Tue, 13 Feb 2007) | 3 lines
  Strip the '.gz' extension from the filename that is written to the
  gzip header.
........
  r53774 | martin.v.loewis | 2007-02-14 11:07:37 +0100 (Wed, 14 Feb 2007) | 2 lines
  Patch #1432399: Add HCI sockets.
........
  r53775 | martin.v.loewis | 2007-02-14 12:30:07 +0100 (Wed, 14 Feb 2007) | 2 lines
  Update 1432399 to removal of _BT_SOCKADDR_MEMB.
........
  r53776 | martin.v.loewis | 2007-02-14 12:30:56 +0100 (Wed, 14 Feb 2007) | 3 lines
  Ignore directory time stamps when considering
  whether to rerun libffi configure.
........
  r53778 | lars.gustaebel | 2007-02-14 15:45:12 +0100 (Wed, 14 Feb 2007) | 4 lines
  A missing binary mode in AppendTest caused failures in Windows
  Buildbot.
........
  r53782 | martin.v.loewis | 2007-02-15 10:51:35 +0100 (Thu, 15 Feb 2007) | 2 lines
  Patch #1397848: add the reasoning behind no-resize-on-shrinkage.
........
  r53783 | georg.brandl | 2007-02-15 11:37:59 +0100 (Thu, 15 Feb 2007) | 2 lines
  Make functools.wraps() docs a bit clearer.
........
  r53785 | georg.brandl | 2007-02-15 12:29:04 +0100 (Thu, 15 Feb 2007) | 2 lines
  Patch #1494140: Add documentation for the new struct.Struct object.
........
  r53787 | georg.brandl | 2007-02-15 12:29:55 +0100 (Thu, 15 Feb 2007) | 2 lines
  Add missing \versionadded.
........
  r53800 | brett.cannon | 2007-02-15 23:54:39 +0100 (Thu, 15 Feb 2007) | 11 lines
  Update the encoding package's search function to use absolute imports when
  calling __import__.  This helps make the expected search locations for encoding
  modules be more explicit.
  One could use an explicit value for __path__ when making the call to __import__
  to force the exact location searched for encodings.  This would give the most
  strict search path possible if one is worried about malicious code being
  imported.  The unfortunate side-effect of that is that if __path__ was modified
  on 'encodings' on purpose in a safe way it would not be picked up in future
  __import__ calls.
........
  r53801 | brett.cannon | 2007-02-16 20:33:01 +0100 (Fri, 16 Feb 2007) | 2 lines
  Make the __import__ call in encodings.__init__ absolute with a level 0 call.
........
  r53809 | vinay.sajip | 2007-02-16 23:36:24 +0100 (Fri, 16 Feb 2007) | 1 line
  Minor fix for currentframe (SF #1652788).
........
  r53818 | raymond.hettinger | 2007-02-19 03:03:19 +0100 (Mon, 19 Feb 2007) | 3 lines
  Extend work on revision 52962:  Eliminate redundant calls to PyObject_Hash().
........
  r53820 | raymond.hettinger | 2007-02-19 05:08:43 +0100 (Mon, 19 Feb 2007) | 1 line
  Add merge() function to heapq.
........
  r53821 | raymond.hettinger | 2007-02-19 06:28:28 +0100 (Mon, 19 Feb 2007) | 1 line
  Add tie-breaker count to preserve sort stability.
........
  r53822 | raymond.hettinger | 2007-02-19 07:59:32 +0100 (Mon, 19 Feb 2007) | 1 line
  Use C heapreplace() instead of slower _siftup() in pure python.
........
  r53823 | raymond.hettinger | 2007-02-19 08:30:21 +0100 (Mon, 19 Feb 2007) | 1 line
  Add test for merge stability
........
  r53824 | raymond.hettinger | 2007-02-19 10:14:10 +0100 (Mon, 19 Feb 2007) | 1 line
  Provide an example of defaultdict with non-zero constant factory function.
........
  r53825 | lars.gustaebel | 2007-02-19 10:54:47 +0100 (Mon, 19 Feb 2007) | 2 lines
  Moved misplaced news item.
........
  r53826 | martin.v.loewis | 2007-02-19 11:55:19 +0100 (Mon, 19 Feb 2007) | 3 lines
  Patch #1490190: posixmodule now includes os.chflags() and os.lchflags()
  functions on platforms where the underlying system calls are available.
........
  r53827 | raymond.hettinger | 2007-02-19 19:15:04 +0100 (Mon, 19 Feb 2007) | 1 line
  Fixup docstrings for merge().
........
  r53829 | raymond.hettinger | 2007-02-19 21:44:04 +0100 (Mon, 19 Feb 2007) | 1 line
  Fixup set/dict interoperability.
........
  r53837 | raymond.hettinger | 2007-02-21 06:20:38 +0100 (Wed, 21 Feb 2007) | 1 line
  Add itertools.izip_longest().
........
  r53838 | raymond.hettinger | 2007-02-21 18:22:05 +0100 (Wed, 21 Feb 2007) | 1 line
  Remove filler struct item and fix leak.
........
		
	
			
		
			
				
	
	
		
NOTES ON OPTIMIZING DICTIONARIES
================================


Principal Use Cases for Dictionaries
------------------------------------

Passing keyword arguments
    Typically, one read and one write for 1 to 3 elements.
    Occurs frequently in normal Python code.

Class method lookup
    Dictionaries vary in size with 8 to 16 elements being common.
    Usually written once with many lookups.
    When base classes are used, there are many failed lookups
        followed by a lookup in a base class.

Instance attribute lookup and Global variables
    Dictionaries vary in size.  4 to 10 elements are common.
    Both reads and writes are common.

Builtins
    Frequent reads.  Almost never written.
    Size 126 interned strings (as of Py2.3b1).
    A few keys are accessed much more frequently than others.

Uniquification
    Dictionaries of any size.  Bulk of work is in creation.
    Repeated writes to a smaller set of keys.
    Single read of each key.
    Some use cases have two consecutive accesses to the same key.

    * Removing duplicates from a sequence.
        dict.fromkeys(seqn).keys()

    * Counting elements in a sequence.
        for e in seqn:
          d[e] = d.get(e, 0) + 1

    * Accumulating references in a dictionary of lists:

        for pagenumber, page in enumerate(pages):
          for word in page:
            d.setdefault(word, []).append(pagenumber)

    Note, the second example is a use case characterized by a get and set
    to the same key.  There are similar use cases with a __contains__
    followed by a get, set, or del to the same key.  Part of the
    justification for d.setdefault is combining the two lookups into one.

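The difference between a paired get/set and a single setdefault can be made
visible with a small sketch.  CountingKey and hashes_used are hypothetical
helpers introduced only for this illustration; the number of __hash__ calls
is used as a rough proxy for the number of dictionary lookups.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def hashes_used(operation):
    """Run operation() and report how many times a CountingKey was hashed."""
    CountingKey.hash_calls = 0
    operation()
    return CountingKey.hash_calls


key = CountingKey("word")
counts = {}
index = {}


def get_then_set():
    # Counting idiom: one lookup for the get, a second for the store.
    counts[key] = counts.get(key, 0) + 1


def single_setdefault():
    # Accumulating idiom: setdefault finds or creates the entry in one step.
    index.setdefault(key, []).append(1)


paired = hashes_used(get_then_set)
combined = hashes_used(single_setdefault)
print(paired, combined)
```

String keys cache their hash value, so for them the saving shows up as fewer
table probes rather than fewer hash computations.
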
Membership Testing
    Dictionaries of any size.  Created once and then rarely changes.
    Single write to each key.
    Many calls to __contains__() or has_key().
    Similar access patterns occur with replacement dictionaries
        such as with the % formatting operator.

Dynamic Mappings
    Characterized by deletions interspersed with adds and replacements.
    Performance benefits greatly from the re-use of dummy entries.


Data Layout (assuming a 32-bit box with 64 bytes per cache line)
----------------------------------------------------------------

Small dicts (8 entries) are attached to the dictobject structure
and the whole group nearly fills two consecutive cache lines.

Larger dicts use the first half of the dictobject structure (one cache
line) and a separate, contiguous block of entries (at 12 bytes each
for a total of 5.333 entries per cache line).


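The figures above follow from simple arithmetic, sketched here under the
section's stated assumptions (32-bit pointers, 64-byte cache lines).  The
constant names are made up for this example; the three-word entry layout
(hash, key pointer, value pointer) is what gives 12 bytes per entry.

```python
from fractions import Fraction

CACHE_LINE_BYTES = 64      # assumption stated in the section heading
WORD_BYTES = 4             # 32-bit box
ENTRY_BYTES = 3 * WORD_BYTES   # hash + key pointer + value pointer = 12

# How many entries share one cache line.
entries_per_line = Fraction(CACHE_LINE_BYTES, ENTRY_BYTES)

# Cache lines covered by the 8-entry table of a small dict.
small_table_bytes = 8 * ENTRY_BYTES
small_table_lines = Fraction(small_table_bytes, CACHE_LINE_BYTES)

print(float(entries_per_line))    # about 5.333 entries per line
print(float(small_table_lines))   # 1.5 lines for the entry table alone
```

The remaining half line of the second cache line is what the rest of the
dictobject header nearly fills.
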
Tunable Dictionary Parameters
-----------------------------

* PyDict_MINSIZE.  Currently set to 8.
    Must be a power of two.  New dicts have to zero-out every cell.
    Each additional 8 consumes 1.5 cache lines.  Increasing improves
    the sparseness of small dictionaries but costs time to read in
    the additional cache lines if they are not already in cache.
    That case is common when keyword arguments are passed.

* Maximum dictionary load in PyDict_SetItem.  Currently set to 2/3.
    Increasing this ratio makes dictionaries more dense resulting
    in more collisions.  Decreasing it improves sparseness at the
    expense of spreading entries over more cache lines and at the
    cost of total memory consumed.

    The load test occurs in highly time sensitive code.  Efforts
    to make the test more complex (for example, varying the load
    for different sizes) have degraded performance.

* Growth rate upon hitting maximum load.  Currently set to *2.
    Raising this to *4 results in half the number of resizes,
    less effort to resize, better sparseness for some (but not
    all) dict sizes, and potentially doubles memory consumption
    depending on the size of the dictionary.  Setting to *4
    eliminates every other resize step.

* Maximum sparseness (minimum dictionary load).  What percentage
    of entries can be unused before the dictionary shrinks to
    free up memory and speed up iteration?  (The current CPython
    code does not represent this parameter directly.)

* Shrinkage rate upon exceeding maximum sparseness.  The current
    CPython code never even checks sparseness when deleting a
    key.  When a new key is added, it resizes based on the number
    of active keys, so that the addition may trigger shrinkage
    rather than growth.

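The interaction between the 2/3 load limit and the growth rate can be
explored with a toy model.  This is a simplified sketch, not the CPython
resize code: count_resizes is a hypothetical function that only tracks the
table size, assumes no deletions, and picks the smallest power of two
larger than used * growth on each resize.

```python
MINSIZE = 8


def count_resizes(n_keys, growth):
    """Insert n_keys distinct keys; return (number of resizes, final size)."""
    size = MINSIZE
    resizes = 0
    for used in range(1, n_keys + 1):
        # Load test: resize once the table is at least 2/3 full.
        if used * 3 >= size * 2:
            target = used * growth
            size = MINSIZE
            while size <= target:
                size <<= 1
            resizes += 1
    return resizes, size


for growth in (2, 4):
    print(growth, count_resizes(1000, growth))
```

In this model, inserting 1000 keys takes 8 resizes with a growth rate of *2
and 4 with *4, which matches the "every other resize step" remark.  Other
key counts end with different final table sizes for the two rates, which is
where the extra memory consumption comes from.
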
Tune-ups should be measured across a broad range of applications and
use cases.  A change to any parameter will help in some situations and
hurt in others.  The key is to find settings that help the most common
cases and do the least damage to the less common cases.  Results will
vary dramatically depending on the exact number of keys, whether the
keys are all strings, whether reads or writes dominate, and the exact
hash values of the keys (some sets of values have fewer collisions than
others).  Any one test or benchmark is likely to prove misleading.

While making a dictionary more sparse reduces collisions, it impairs
iteration and key listing.  Those methods loop over every potential
entry.  Doubling the size of a dictionary results in twice as many
non-overlapping memory accesses for keys(), items(), values(),
__iter__(), iterkeys(), iteritems(), itervalues(), and update().
Also, every dictionary iterates at least twice, once for the memset()
when it is created and once by dealloc().

Dictionary operations involving only a single key can be O(1) unless
resizing is possible.  By checking for a resize only when the
dictionary can grow (and may *require* resizing), other operations
remain O(1), and the odds of resize thrashing or memory fragmentation
are reduced.  In particular, an algorithm that empties a dictionary
by repeatedly invoking .pop will see no resizing, which might
not be necessary at all because the dictionary is eventually
discarded entirely.


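The "no resizing on .pop" behaviour can be observed from Python.  The sketch
below assumes a CPython interpreter, where sys.getsizeof() on a dict
includes its entry table; the reported sizes are an implementation detail
and other interpreters may behave differently.

```python
import sys

d = dict.fromkeys(range(1000))
size_full = sys.getsizeof(d)

# Empty the dictionary one key at a time.  No resize check happens on
# deletion, so the table keeps the size it had when it was full.
for k in range(1000):
    d.pop(k)

size_after_pops = sys.getsizeof(d)
size_fresh = sys.getsizeof({})

print(len(d), size_full, size_after_pops, size_fresh)
```

If the emptied dictionary is about to be discarded, skipping the shrink
saves work; if it is kept around, the memory is only reclaimed by a later
insertion that triggers a resize or by building a new dictionary.
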
Results of Cache Locality Experiments
-------------------------------------

When an entry is retrieved from memory, 4.333 adjacent entries are also
retrieved into a cache line.  Since accessing items in cache is *much*
cheaper than a cache miss, an enticing idea is to probe the adjacent
entries as a first step in collision resolution.  Unfortunately, the
introduction of any regularity into collision searches results in more
collisions than the current random chaining approach.

Exploiting cache locality at the expense of additional collisions fails
to pay off when the entries are already loaded in cache (the expense
is paid with no compensating benefit).  This occurs in small dictionaries
where the whole dictionary fits into a pair of cache lines.  It also
occurs frequently in large dictionaries which have a common access pattern
where some keys are accessed much more frequently than others.  The
more popular entries *and* their collision chains tend to remain in cache.

To exploit cache locality, change the collision resolution section
in lookdict() and lookdict_string().  Set i^=1 at the top of the
loop and move the i = (i << 2) + i + perturb + 1 update to an unrolled
version of the loop.

This optimization strategy can be leveraged in several ways:

* If the dictionary is kept sparse (through the tunable parameters),
  then the occurrence of additional collisions is lessened.

* If lookdict() and lookdict_string() are specialized for small dicts
  and for large dicts, then the versions for large dicts can be given
  an alternate search strategy without increasing collisions in small dicts
  which already have the maximum benefit of cache locality.

* If the use case for a dictionary is known to have a random key
  access pattern (as opposed to a more common pattern with a Zipf's law
  distribution), then there will be more benefit for large dictionaries
  because any given key is no more likely than another to already be
  in cache.

* In use cases with paired accesses to the same key, the second access
  is always in cache and gets no benefit from efforts to further improve
  cache locality.

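The probe recurrence named above can be traced in a few lines of Python.
This is a sketch rather than the C code: probe_sequence and with_neighbours
are hypothetical names, the hash is assumed to be non-negative (the C code
treats it as unsigned), and with_neighbours is only one possible reading of
the i^=1 suggestion, namely probing the adjacent slot before jumping on.

```python
PERTURB_SHIFT = 5   # value used by dictobject.c


def probe_sequence(hash_value, mask, count):
    """Return the first `count` slots visited by the standard recurrence."""
    perturb = hash_value
    i = hash_value & mask
    slots = [i]
    while len(slots) < count:
        i = (i << 2) + i + perturb + 1
        perturb >>= PERTURB_SHIFT
        slots.append(i & mask)
    return slots


def with_neighbours(slots):
    """After each slot, also visit its neighbour (slot ^ 1), which is
    likely to sit in the same cache line."""
    order = []
    for slot in slots:
        order.extend((slot, slot ^ 1))
    return order


print(probe_sequence(0, 7, 8))
print(with_neighbours([0, 6]))
```

Once perturb has been shifted down to zero, the recurrence reduces to
i = 5*i + 1, which visits every slot of a power-of-two table.  Probing
neighbours first adds regularity, and that regularity is what produced the
extra collisions reported in the experiments.
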
Optimizing the Search of Small Dictionaries
-------------------------------------------

If lookdict() and lookdict_string() are specialized for smaller dictionaries,
then a custom search approach can be implemented that exploits the small
search space and cache locality.

* The simplest example is a linear search of contiguous entries.  This is
  simple to implement, guaranteed to terminate rapidly, never searches
  the same entry twice, and precludes the need to check for dummy entries.

* A more advanced example is a self-organizing search so that the most
  frequently accessed entries get probed first.  The organization
  adapts if the access pattern changes over time.  Treaps are ideally
  suited for self-organization with the most common entries at the
  top of the heap and a rapid binary search pattern.  Most probes and
  results are located near the top of the tree, allowing them to fit
  in one or two cache lines.

* Also, small dictionaries may be made more dense, perhaps filling all
  eight cells to take the maximum advantage of two cache lines.


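The first two ideas can be combined in a toy Python model.  SmallLinearDict
is a hypothetical class written for this illustration; it uses a plain
linear scan and the simple "transpose" heuristic (swap a found entry with
its predecessor) in place of the treap mentioned above.

```python
class SmallLinearDict:
    """Toy mapping: contiguous entries, linear search, self-organizing."""

    def __init__(self):
        self._entries = []          # list of [key, value] pairs

    def __setitem__(self, key, value):
        for entry in self._entries:
            if entry[0] == key:
                entry[1] = value
                return
        self._entries.append([key, value])

    def __getitem__(self, key):
        entries = self._entries
        for position, entry in enumerate(entries):
            if entry[0] == key:
                if position:
                    # Move the hit one step toward the front so that
                    # frequently used keys are probed earlier next time.
                    entries[position - 1], entries[position] = (
                        entry, entries[position - 1])
                return entry[1]
        raise KeyError(key)

    def probe_order(self):
        return [entry[0] for entry in self._entries]


names = SmallLinearDict()
names["a"] = 1
names["b"] = 2
names["c"] = 3
names["c"]
names["c"]
print(names.probe_order())
```

There are no dummy entries to test for, and each lookup touches every entry
at most once.  A deletion would simply remove the pair from the list.
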
Strategy Pattern
----------------

Consider allowing the user to set the tunable parameters or to select a
particular search method.  Since some dictionary use cases have known
sizes and access patterns, the user may be able to provide useful hints.

1) For example, if membership testing or lookups dominate runtime and memory
   is not at a premium, the user may benefit from setting the maximum load
   ratio at 5% or 10% instead of the usual 66.7%.  This will sharply
   curtail the number of collisions but will increase iteration time.
   The builtin namespace is a prime example of a dictionary that can
   benefit from being highly sparse.

2) Dictionary creation time can be shortened in cases where the ultimate
   size of the dictionary is known in advance.  The dictionary can be
   pre-sized so that no resize operations are required during creation.
   Not only does this save resizes, but the key insertion will go
   more quickly because the first half of the keys will be inserted into
   a more sparse environment than before.  The preconditions for this
   strategy arise whenever a dictionary is created from a key or item
   sequence and the number of *unique* keys is known.

3) If the key space is large and the access pattern is known to be random,
   then search strategies exploiting cache locality can be fruitful.
   The preconditions for this strategy arise in simulations and
   numerical analysis.

4) If the keys are fixed and the access pattern strongly favors some of
   the keys, then the entries can be stored contiguously and accessed
   with a linear search or treap.  This exploits knowledge of the data,
   cache locality, and a simplified search routine.  It also eliminates
   the need to test for dummy entries on each probe.  The preconditions
   for this strategy arise in symbol tables and in the builtin dictionary.


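The trade-off in item 1 can be put into numbers.  The helper below is a
hypothetical sketch that assumes power-of-two table sizes starting at 8 and
12-byte entries (the 32-bit layout described earlier); it reports how large
a table is needed to hold a given number of keys under a maximum load
ratio.

```python
ENTRY_BYTES = 12   # 32-bit layout assumed earlier in these notes


def slots_needed(n_keys, load_num, load_den):
    """Smallest power-of-two table with n_keys / size <= load_num / load_den."""
    size = 8
    while n_keys * load_den > size * load_num:
        size <<= 1
    return size


# The builtin namespace held 126 keys as of Py2.3b1.
for label, num, den in (("2/3", 2, 3), ("10%", 1, 10), ("5%", 1, 20)):
    size = slots_needed(126, num, den)
    print(label, size, size * ENTRY_BYTES)
```

For 126 keys this gives 256 slots at the usual 2/3 limit, 2048 slots at 10%,
and 4096 slots at 5%.  Every one of those slots is visited during iteration,
which is the cost side of the reduced collision rate.
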
Readonly Dictionaries
---------------------

Some dictionary use cases pass through a build stage and then move to a
more heavily exercised lookup stage with no further changes to the
dictionary.

An idea that emerged on python-dev is to be able to convert a dictionary
to a read-only state.  This can help prevent programming errors and also
provide knowledge that can be exploited for lookup optimization.

The dictionary can be immediately rebuilt (eliminating dummy entries),
resized (to an appropriate level of sparseness), and the keys can be
jostled (to minimize collisions).  The lookdict() routine can then
eliminate the test for dummy entries (saving about 1/4 of the time
spent in the collision resolution loop).

An additional possibility is to insert links into the empty spaces
so that dictionary iteration can proceed in len(d) steps instead of
(mp->mask + 1) steps.  Alternatively, a separate tuple of keys can be
kept just for iteration.


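The error-prevention half of this idea can be approximated in pure Python.
The sketch below is not the proposed C-level change and gives none of the
lookup speedups; it uses types.MappingProxyType from the standard library
for the read-only view, and the hypothetical helper freeze() also keeps the
separate tuple of keys mentioned above.

```python
from types import MappingProxyType


def freeze(mapping):
    """Return (read-only view, tuple of keys) for the finished mapping."""
    # Work on a private copy so later changes to the original dict
    # cannot show through the read-only view.
    table = dict(mapping)
    keys = tuple(table)             # iteration can use this instead
    return MappingProxyType(table), keys


view, keys = freeze({"len": 1, "max": 2})

try:
    view["min"] = 3                 # any write attempt is rejected
except TypeError as exc:
    print("rejected:", exc)

print(view["len"], keys)
```

Iterating over the tuple takes len(d) steps regardless of how sparse the
underlying table is.
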
Caching Lookups
---------------

The idea is to exploit key access patterns by anticipating future lookups
based on previous lookups.

The simplest incarnation is to save the most recently accessed entry.
This gives optimal performance for use cases where every get is followed
by a set or del to the same key.
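
A most-recently-accessed cache can be modelled in Python as follows.  This
is a sketch under simplifying assumptions, not a proposal for the C code:
LastEntryCache is a hypothetical wrapper, values live in one-element lists
so that a cached entry can be updated in place, deletion is left out, and a
get of a missing key leaves a placeholder entry behind.

```python
_MISSING = object()


class LastEntryCache:
    """Dict wrapper that remembers the entry touched by the last access."""

    def __init__(self):
        self._boxes = {}            # key -> one-element list with the value
        self._last_key = _MISSING
        self._last_box = None
        self.table_lookups = 0      # times the real dict was consulted

    def _box_for(self, key):
        if self._last_key is not _MISSING and key == self._last_key:
            return self._last_box   # cache hit: no table lookup needed
        self.table_lookups += 1
        box = self._boxes.setdefault(key, [_MISSING])
        self._last_key, self._last_box = key, box
        return box

    def get(self, key, default=None):
        value = self._box_for(key)[0]
        return default if value is _MISSING else value

    def set(self, key, value):
        self._box_for(key)[0] = value


tally = LastEntryCache()
for e in "aabb":
    tally.set(e, tally.get(e, 0) + 1)
print(tally.table_lookups)
```

In the counting loop each get is followed by a set to the same key, so the
set always hits the cached entry.  Here the four get/set pairs cost two
table lookups instead of eight.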