.. mirror of https://github.com/python/cpython.git, synced 2025-10-26 16:27:06 +00:00

======================
Design and History FAQ
======================

Why does Python use indentation for grouping of statements?
-----------------------------------------------------------

Guido van Rossum believes that using indentation for grouping is extremely
elegant and contributes a lot to the clarity of the average Python program.
Most people learn to love this feature after a while.

Since there are no begin/end brackets, there cannot be a disagreement between
grouping perceived by the parser and the human reader.  Occasionally C
programmers will encounter a fragment of code like this::

   if (x <= y)
           x++;
           y--;
   z++;

Only the ``x++`` statement is executed if the condition is true, but the
indentation leads you to believe otherwise.  Even experienced C programmers will
sometimes stare at it a long time wondering why ``y`` is being decremented even
for ``x > y``.

Because there are no begin/end brackets, Python is much less prone to
coding-style conflicts.  In C there are many different ways to place the braces.
If you're used to reading and writing code that uses one style, you will feel at
least slightly uneasy when reading (or being required to write) another style.

Many coding styles place begin/end brackets on a line by themselves.  This makes
programs considerably longer and wastes valuable screen space, making it harder
to get a good overview of a program.  Ideally, a function should fit on one
screen (say, 20-30 lines).  20 lines of Python can do a lot more work than 20
lines of C.  This is not solely due to the lack of begin/end brackets -- the
lack of declarations and the high-level data types are also responsible -- but
the indentation-based syntax certainly helps.


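
For comparison, here is a minimal Python sketch of the C fragment above, wrapped
in a function so that it can be run; the name ``step`` is a hypothetical choice
made only for this illustration.  Both indented statements belong to the ``if``,
so the grouping a reader sees is exactly the grouping the interpreter uses.

.. code-block:: python

   # step() is a hypothetical helper used only to illustrate grouping.
   def step(x, y, z):
       if x <= y:
           # Both statements are indented under the if, so both are
           # conditional; the layout cannot disagree with the meaning.
           x += 1
           y -= 1
       # Back at the outer indentation level: this line always runs.
       z += 1
       return x, y, z

   print(step(1, 2, 0))   # (2, 1, 1)
   print(step(3, 2, 0))   # (3, 2, 1)

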
Why am I getting strange results with simple arithmetic operations?
-------------------------------------------------------------------

See the next question.


Why are floating point calculations so inaccurate?
--------------------------------------------------

People are often very surprised by results like this::

   >>> 1.2 - 1.0
   0.19999999999999996

and think it is a bug in Python.  It's not.  This has nothing to do with Python,
but with how the underlying C platform handles floating point numbers, and
ultimately with the inaccuracies introduced when writing down numbers as a
string of a fixed number of digits.

The internal representation of floating point numbers uses a fixed number of
binary digits to represent a decimal number.  Some decimal numbers can't be
represented exactly in binary, resulting in small roundoff errors.

In decimal math, there are many numbers that can't be represented with a fixed
number of decimal digits, e.g. 1/3 = 0.3333333333...

In base 2, 1/2 = 0.1, 1/4 = 0.01, 1/8 = 0.001, etc.  0.2 equals 2/10, which
equals 1/5, resulting in the repeating binary fraction 0.001100110011001...

Python floats are C doubles, which on almost all platforms are IEEE 754 values
with 53 bits of precision, so the digits are cut off at some point, and the
result of ``1.2 - 1.0`` is 0.19999999999999996 in decimal, not 0.2.

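
To see the exact value that is actually stored, the standard ``decimal`` and
``fractions`` modules can convert a float without any rounding (constructing
them directly from a float requires Python 2.7 or later).  The variable names in
this short sketch are illustrative only.

.. code-block:: python

   from decimal import Decimal
   from fractions import Fraction

   x = 1.2 - 1.0

   # Decimal(float) converts the stored binary value exactly, digit for digit.
   exact = Decimal(x)

   # Fraction(float) shows the same value as a ratio whose denominator is a
   # power of two, i.e. a finite binary fraction.
   ratio = Fraction(x)

   print(exact)
   print(ratio)
   print(exact < Decimal("0.2"))   # True: the stored value falls just short of 0.2

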
A floating point number's ``repr()`` prints as many digits as are necessary to
make ``eval(repr(f)) == f`` true for any float f.  The ``str()`` function prints
fewer digits, and this often results in the more sensible number that was
probably intended::

   >>> 1.1 - 0.9
   0.20000000000000007
   >>> print 1.1 - 0.9
   0.2

One of the consequences of this is that it is error-prone to compare the result
of some computation to a float with ``==``.  Tiny inaccuracies may mean that
``==`` fails.  Instead, you have to check that the difference between the two
numbers is less than a certain threshold::

   epsilon = 0.0000000000001  # Tiny allowed error
   expected_result = 0.4

   # computation() stands for whatever produced the float being checked
   if expected_result-epsilon <= computation() <= expected_result+epsilon:
       ...

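
A fixed ``epsilon`` only suits results of a known magnitude.  A common
refinement, sketched below with a hypothetical helper named ``approx_equal``,
scales the allowed error with the size of the numbers being compared; Python
3.5 and later provide the same rule, with the same defaults, as
``math.isclose()``.

.. code-block:: python

   # approx_equal() is a hypothetical helper written for this sketch.
   def approx_equal(a, b, rel_tol=1e-9, abs_tol=0.0):
       """Return True if a and b agree within a relative or absolute tolerance."""
       # The relative part scales with the magnitude of the inputs, so one
       # tolerance works for values near 1e-6 and near 1e6; abs_tol covers
       # comparisons against zero, where a relative tolerance is useless.
       return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

   total = 0.1 + 0.2
   print(total == 0.3)               # False: the roundoff errors do not cancel
   print(approx_equal(total, 0.3))   # True

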
Please see the chapter on :ref:`floating point arithmetic <tut-fp-issues>` in
the Python tutorial for more information.


Why are Python strings immutable?
---------------------------------

There are several advantages.

One is performance: knowing that a string is immutable means we can allocate
space for it at creation time, and the storage requirements are fixed and
unchanging.  This is also one of the reasons for the distinction between tuples
and lists.

Another advantage is that strings in Python are considered as "elemental" as
numbers.  No amount of activity will change the value 8 to anything else, and in
Python, no amount of activity will change the string "eight" to anything else.


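
A short sketch of what this means in practice (the variable names are
illustrative): item assignment on a string raises ``TypeError``, and methods
such as ``replace()`` return a new string instead of altering the original.

.. code-block:: python

   s = "eight"

   # Item assignment is not supported: a string cannot be changed in place.
   try:
       s[0] = "E"
   except TypeError:
       in_place_failed = True
   else:
       in_place_failed = False

   # "Changing" a string really means building a new string object;
   # the original value is left exactly as it was.
   t = s.replace("e", "E", 1)
   print(s)   # eight
   print(t)   # Eight

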
.. _why-self:

Why must 'self' be used explicitly in method definitions and calls?
-------------------------------------------------------------------

The idea was borrowed from Modula-3.  It turns out to be very useful, for a
variety of reasons.

First, it's more obvious that you are using a method or instance attribute
instead of a local variable.  Reading ``self.x`` or ``self.meth()`` makes it
absolutely clear that an instance variable or method is used even if you don't
know the class definition by heart.  In C++, you can sort of tell by the lack of
a local variable declaration (assuming globals are rare or easily recognizable)
-- but in Python, there are no local variable declarations, so you'd have to
look up the class definition to be sure.  Some C++ and Java coding standards
call for instance attributes to have an ``m_`` prefix, so this explicitness is
still useful in those languages, too.

Second, it means that no special syntax is necessary if you want to explicitly
reference or call the method from a particular class.  In C++, if you want to
use a method from a base class which is overridden in a derived class, you have
to use the ``::`` operator -- in Python you can write
``baseclass.methodname(self, <argument list>)``.  This is particularly useful
for :meth:`__init__` methods, and in general in cases where a derived class
method wants to extend the base class method of the same name and thus has to
call the base class method somehow.

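
A minimal sketch of such an explicit base-class call; ``Counter`` and
``LoggingCounter`` are hypothetical classes invented for this example.

.. code-block:: python

   # Counter and LoggingCounter are hypothetical classes for illustration.
   class Counter(object):
       def __init__(self):
           self.count = 0

       def add(self, n):
           self.count += n


   class LoggingCounter(Counter):
       def __init__(self):
           # Run the base class initializer explicitly, passing self by hand.
           Counter.__init__(self)
           self.log = []

       def add(self, n):
           # Extend, rather than replace, the base class method of the same name.
           self.log.append(n)
           Counter.add(self, n)


   c = LoggingCounter()
   c.add(2)
   c.add(3)
   # c.count is now 5 and c.log is [2, 3]

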
Finally, for instance variables it solves a syntactic problem with assignment:
since local variables in Python are (by definition!) those variables to which a
value is assigned in a function body (and that aren't explicitly declared
global), there has to be some way to tell the interpreter that an assignment was
meant to assign to an instance variable instead of to a local variable, and it
should preferably be syntactic (for efficiency reasons).  C++ does this through
declarations, but Python doesn't have declarations and it would be a pity having
to introduce them just for this purpose.  Using the explicit ``self.var`` solves
this nicely.  Similarly, for using instance variables, having to write
``self.var`` means that references to unqualified names inside a method don't
have to search the instance's attribute dictionaries.  To put it another way,
local variables and instance variables live in two different namespaces, and
you need to tell Python which namespace to use.


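
The assignment problem is easy to see in a small, hypothetical ``Point`` class:
without the ``self.`` prefix, an assignment inside a method simply creates a
local variable and leaves the instance untouched.

.. code-block:: python

   # Point is a hypothetical class used only for this illustration.
   class Point(object):
       def __init__(self, x):
           self.x = x

       def reset_wrong(self):
           # Binds a new local variable named x; the instance is unchanged.
           x = 0

       def reset(self):
           # The self. prefix tells Python to assign to the instance attribute.
           self.x = 0


   p = Point(5)
   p.reset_wrong()
   print(p.x)   # 5
   p.reset()
   print(p.x)   # 0

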
Why can't I use an assignment in an expression?
-----------------------------------------------

Many people used to C or Perl complain that they want to use this C idiom:

.. code-block:: c

   while (line = readline(f)) {
       // do something with line
   }

where in Python you're forced to write this::

   while True:
       line = f.readline()
       if not line:
           break
       ... # do something with line

The reason for not allowing assignment in Python expressions is a common,
hard-to-find bug in those other languages, caused by this construct:

.. code-block:: c

    if (x = 0) {
        // error handling
    }
    else {
        // code that only works for nonzero x
    }

The error is a simple typo: ``x = 0``, which assigns 0 to the variable ``x``,
was written while the comparison ``x == 0`` is certainly what was intended.

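
In Python this typo cannot slip through, because assignment is a statement:
``if x = 0:`` is rejected by the compiler before any code runs.  The helper
``compiles`` below is a hypothetical name for this sketch; it only uses the
built-in ``compile()``.

.. code-block:: python

   # compiles() is a hypothetical helper wrapping the built-in compile().
   def compiles(source):
       """Return True if source is syntactically valid Python, else False."""
       try:
           compile(source, "<example>", "exec")
       except SyntaxError:
           return False
       return True

   # The C typo, transcribed into Python, is rejected outright...
   print(compiles("if x = 0:\n    pass\n"))    # False
   # ...while the intended comparison is accepted.
   print(compiles("if x == 0:\n    pass\n"))   # True

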
Many alternatives have been proposed.  Most are hacks that save some typing but
use arbitrary or cryptic syntax or keywords, and fail the simple criterion for
language change proposals: it should intuitively suggest the proper meaning to a
human reader who has not yet been introduced to the construct.

An interesting phenomenon is that most experienced Python programmers recognize
the ``while True`` idiom and don't seem to be missing the assignment in
expression construct much; it's only newcomers who express a strong desire to
add this to the language.

There's an alternative way of spelling this that seems attractive but is
generally less robust than the ``while True`` solution::

   line = f.readline()
   while line:
       ... # do something with line...
       line = f.readline()

The problem with this is that if you change your mind about exactly how you get
the next line (e.g. you want to change it into ``sys.stdin.readline()``) you
have to remember to change two places in your program -- the second occurrence
is hidden at the bottom of the loop.

The best approach is to use iterators, making it possible to loop through
objects using the ``for`` statement.  For example, Python file objects support
the iterator protocol, so you can simply write::

   for line in f:
       ... # do something with line...


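
A sketch of the iterator-based alternatives, written for Python 3
(``io.StringIO`` stands in for a real file, and the function names are
hypothetical).  Besides iterating over the file directly, the two-argument form
``iter(callable, sentinel)`` turns any "get the next item" call into an
iterator, so that call is written in only one place.

.. code-block:: python

   import io

   # lines_via_for() and lines_via_iter() are hypothetical helper names.
   def lines_via_for(f):
       # File-like objects are iterable; each step yields the next line.
       return [line for line in f]

   def lines_via_iter(f):
       # iter(callable, sentinel) keeps calling f.readline() until it returns
       # the sentinel '' (end of file), so the "how do I get the next line"
       # expression appears exactly once.
       return [line for line in iter(f.readline, "")]

   text = "first\nsecond\nthird\n"
   print(lines_via_for(io.StringIO(text)))
   print(lines_via_iter(io.StringIO(text)))

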
Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?
----------------------------------------------------------------------------------------------------------------

The major reason is history.  Functions were used for those operations that were
generic for a group of types and which were intended to work even for objects
that didn't have methods at all (e.g. tuples).  It is also convenient to have a
function that can readily be applied to an amorphous collection of objects when
you use the functional features of Python (``map()``, ``apply()`` et al).

In fact, implementing ``len()``, ``max()``, ``min()`` as built-in functions is
actually less code than implementing them as methods for each type.  One can
quibble about individual cases but it's a part of Python, and it's too late to
make such fundamental changes now.  The functions have to remain to avoid
massive code breakage.

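
The convenience of a generic function is easy to demonstrate: ``len()`` can be
mapped over a mixed collection, and a user-defined class joins in simply by
defining the ``__len__`` special method.  ``Playlist`` is a hypothetical class
used only for this sketch.

.. code-block:: python

   class Playlist(object):
       """Hypothetical container used only to illustrate the len() protocol."""

       def __init__(self, songs):
           self.songs = list(songs)

       def __len__(self):
           # len(obj) calls obj.__len__() behind the scenes.
           return len(self.songs)


   things = ["abc", (1, 2), {"key": "value"}, Playlist(["a", "b", "c", "d"])]

   # One generic function applies to every object in the collection.
   sizes = list(map(len, things))
   print(sizes)   # [3, 2, 1, 4]

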
.. XXX talk about protocols?

.. note::

   For string operations, Python has moved from external functions (the
   ``string`` module) to methods.  However, ``len()`` is still a function.


Why is join() a string method instead of a list or tuple method?
----------------------------------------------------------------

Strings became much more like other standard types starting in Python 1.6, when
methods were added which give the same functionality that has always been
available using the functions of the string module.  Most of these new methods
have been widely accepted, but the one which appears to make some programmers
feel uncomfortable is::

   ", ".join(['1', '2', '4', '8', '16'])

which gives the result::

   "1, 2, 4, 8, 16"

There are two common arguments against this usage.

The first runs along the lines of: "It looks really ugly using a method of a
string literal (string constant)", to which the answer is that it might, but a
string literal is just a fixed value.  If the methods are to be allowed on names
bound to strings, there is no logical reason to make them unavailable on
literals.

The second objection is typically cast as: "I am really telling a sequence to
join its members together with a string constant".  Sadly, you aren't.  For some
reason there seems to be much less difficulty with having :meth:`~str.split` as
a string method, since in that case it is easy to see that ::

   "1, 2, 4, 8, 16".split(", ")

is an instruction to a string literal to return the substrings delimited by the
given separator (or, by default, arbitrary runs of white space).  In this case a
Unicode string returns a list of Unicode strings, an ASCII string returns a list
of ASCII strings, and everyone is happy.

:meth:`~str.join` is a string method because in using it you are telling the
separator string to iterate over a sequence of strings and insert itself between
adjacent elements.  This method can be used with any iterable whose items are
strings, including instances of any new classes you might define yourself.

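
A quick sketch of these points (the variable names are illustrative):
``join()`` accepts a generator expression as well as a list, ``split()`` undoes
it, and items that are not strings must be converted first.

.. code-block:: python

   # join() accepts any iterable of strings, including a generator expression.
   powers = ", ".join(str(2 ** i) for i in range(5))
   print(powers)                # 1, 2, 4, 8, 16

   # split() is the inverse operation: the separator is passed as the argument.
   print(powers.split(", "))    # ['1', '2', '4', '8', '16']

   # join() does not call str() on its items; non-strings raise TypeError.
   try:
       ", ".join([1, 2, 4])
   except TypeError:
       print("join() needs strings")

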
Because this is a string method it can work for Unicode strings as well as plain
ASCII strings.  If ``join()`` were a method of the sequence types then the
sequence types would have to decide which type of string to return depending on
the type of the separator.

.. XXX remove next paragraph eventually

If none of these arguments persuade you, then for the moment you can continue to
use the ``join()`` function from the string module, which allows you to write ::

   string.join(['1', '2', '4', '8', '16'], ", ")


How fast are exceptions?
------------------------

A try/except block is extremely efficient if no exceptions are raised.  Actually
catching an exception is expensive.  In versions of Python prior to 2.0 it was
common to use this idiom::

   try:
       value = mydict[key]
   except KeyError:
       mydict[key] = getvalue(key)
       value = mydict[key]

This only made sense when you expected the dict to have the key almost all the
time.  If that wasn't the case, you coded it like this::

   if mydict.has_key(key):
       value = mydict[key]
   else:
       mydict[key] = getvalue(key)
       value = mydict[key]

.. note::

   In Python 2.0 and higher, you can code this as ``value =
   mydict.setdefault(key, getvalue(key))``, but only if the ``getvalue()``
   call is cheap, because it is evaluated in all cases, even when the key is
   already present.


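
``setdefault()`` receives its default as an already-evaluated argument, so
``getvalue(key)`` runs even when the key is present.  A call counter makes this
visible; ``getvalue``, ``lookup_setdefault`` and ``lookup_lazy`` are
hypothetical helpers written for this sketch.

.. code-block:: python

   calls = []

   def getvalue(key):
       # Hypothetical expensive computation; it records every call it receives.
       calls.append(key)
       return key * 2

   def lookup_setdefault(mydict, key):
       # getvalue(key) is evaluated before setdefault() runs, hit or miss.
       return mydict.setdefault(key, getvalue(key))

   def lookup_lazy(mydict, key):
       # getvalue(key) only runs when the key is actually missing.
       if key not in mydict:
           mydict[key] = getvalue(key)
       return mydict[key]

   d = {}
   lookup_setdefault(d, 1)
   lookup_setdefault(d, 1)
   print(len(calls))   # 2: computed again although the second lookup was a hit

   del calls[:]
   d = {}
   lookup_lazy(d, 1)
   lookup_lazy(d, 1)
   print(len(calls))   # 1

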
Why isn't there a switch or case statement in Python?
-----------------------------------------------------

You can do this easily enough with a sequence of ``if... elif... elif... else``.
There have been some proposals for switch statement syntax, but there is no
consensus (yet) on whether and how to do range tests.  See :pep:`275` for
complete details and the current status.

For cases where you need to choose from a very large number of possibilities,
you can create a dictionary mapping case values to functions to call.  For
example::

   def function_1():
       ...

   def function_2():
       ...

   functions = {'a': function_1,
                'b': function_2,
                'c': self.method_1}  # bound methods work too; add more cases as needed

   func = functions[value]
   func()

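
A runnable sketch of the dictionary approach; the handler functions and the
``OPERATIONS`` table are hypothetical names.  Using ``dict.get()`` with a
fallback function gives the equivalent of a ``default`` branch.

.. code-block:: python

   # All names below are hypothetical, chosen for this illustration.
   def handle_add(a, b):
       return a + b

   def handle_sub(a, b):
       return a - b

   def handle_unknown(a, b):
       raise ValueError("unsupported operation")

   # Map each "case" value to the function that handles it.
   OPERATIONS = {"+": handle_add, "-": handle_sub}

   def calculate(op, a, b):
       # dict.get() supplies a fallback, playing the role of a "default:" branch.
       func = OPERATIONS.get(op, handle_unknown)
       return func(a, b)

   print(calculate("+", 2, 3))   # 5
   print(calculate("-", 2, 3))   # -1

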
For calling methods on objects, you can simplify yet further by using the
:func:`getattr` built-in to retrieve methods with a particular name::

   def visit_a(self):
       ...
   ...

   def dispatch(self, value):
       method_name = 'visit_' + str(value)
       method = getattr(self, method_name)
       method()

It's suggested that you use a prefix for the method names, such as ``visit_`` in
this example.  Without such a prefix, if values are coming from an untrusted
source, an attacker would be able to call any method on your object.


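
A fuller sketch of ``getattr()``-based dispatch, using a hypothetical
``CommandHandler`` class.  Passing a default to ``getattr()`` lets unknown
values raise a clear error, and the ``visit_`` prefix keeps values such as
``__init__`` from reaching other methods.

.. code-block:: python

   class CommandHandler(object):
       """Hypothetical handler; each visit_* method handles one command."""

       def visit_start(self):
           return "starting"

       def visit_stop(self):
           return "stopping"

       def dispatch(self, value):
           method_name = "visit_" + str(value)
           # The visit_ prefix restricts dispatch to methods meant for it, so
           # a value such as "__init__" looks up "visit___init__" and fails.
           method = getattr(self, method_name, None)
           if method is None:
               raise ValueError("no handler for %r" % (value,))
           return method()


   handler = CommandHandler()
   print(handler.dispatch("start"))   # starting

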
Can't you emulate threads in the interpreter instead of relying on an OS-specific thread implementation?
--------------------------------------------------------------------------------------------------------

Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for
each Python stack frame.  Also, extensions can call back into Python at almost
random moments.  Therefore, a complete threads implementation requires thread
support for C.

Answer 2: Fortunately, there is `Stackless Python <http://www.stackless.com>`_,
which has a completely redesigned interpreter loop that avoids the C stack.
It's still experimental but looks very promising.  Although it is binary
compatible with standard Python, it's still unclear whether Stackless will make
it into the core -- maybe it's just too revolutionary.


Why can't lambda forms contain statements?
------------------------------------------

Python lambda forms cannot contain statements because Python's syntactic
framework can't handle statements nested inside expressions.  However, in
Python, this is not a serious problem.  Unlike lambda forms in other languages,
where they add functionality, Python lambdas are only a shorthand notation if
you're too lazy to define a function.

Functions are already first class objects in Python, and can be declared in a
local scope.  Therefore the only advantage of using a lambda form instead of a
locally-defined function is that you don't need to invent a name for the
function -- but that's just a local variable to which the function object (which
is exactly the same type of object that a lambda form yields) is assigned!


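
A minimal sketch of the difference (``square`` and ``safe_inverse`` are
illustrative names): the lambda is limited to one expression, the ``def`` can
hold statements such as ``try``, and both produce the same type of object.

.. code-block:: python

   # A lambda is limited to a single expression...
   square = lambda x: x * x

   # ...whereas a locally defined function can contain statements such as try.
   def safe_inverse(x):
       try:
           return 1.0 / x
       except ZeroDivisionError:
           return None

   # Both are plain function objects; only the recorded name differs.
   print(type(square) is type(safe_inverse))   # True
   print(square.__name__)                      # <lambda>

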
Can Python be compiled to machine code, C or some other language?
-----------------------------------------------------------------

Not easily.  Python's high level data types, dynamic typing of objects and
run-time invocation of the interpreter (using :func:`eval` or :keyword:`exec`)
together mean that a "compiled" Python program would probably consist mostly of
calls into the Python run-time system, even for seemingly simple operations like
``x+1``.

Several projects described in the Python newsgroup or at past `Python
conferences <http://python.org/community/workshops/>`_ have shown that this
approach is feasible, although the speedups reached so far are only modest
(e.g. 2x).  Jython uses the same strategy for compiling to Java bytecode.  (Jim
Hugunin has demonstrated that in combination with whole-program analysis,
speedups of 1000x are feasible for small demo programs.  See the proceedings
from the `1997 Python conference
<http://python.org/workshops/1997-10/proceedings/>`_ for more information.)

Internally, Python source code is always translated into a bytecode
representation, and this bytecode is then executed by the Python virtual
machine.  In order to avoid the overhead of repeatedly parsing and translating
modules that rarely change, this bytecode is written into a file whose name
ends in ".pyc" whenever a module is parsed.  When the corresponding .py file is
changed, it is parsed and translated again and the .pyc file is rewritten.

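
The built-in ``compile()`` function exposes the translation step directly, and
the :mod:`dis` module can display the resulting bytecode.  A minimal sketch
(the variable names are illustrative):

.. code-block:: python

   import dis

   # Translate source text into a code object (bytecode plus metadata)...
   code = compile("x + 1", "<example>", "eval")

   # ...which the virtual machine then executes.
   result = eval(code, {"x": 41})
   print(result)   # 42

   # dis lists the bytecode instructions; their exact names vary by version.
   dis.dis(code)

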
There is no performance difference once the .pyc file has been loaded, as the
bytecode read from the .pyc file is exactly the same as the bytecode created by
direct translation.  The only difference is that loading code from a .pyc file
is faster than parsing and translating a .py file, so the presence of
precompiled .pyc files improves the start-up time of Python scripts.  If
desired, the Lib/compileall.py module can be used to create valid .pyc files for
a given set of modules.

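
For a single file, the :mod:`py_compile` module does what ``compileall`` does
for whole directories (``python -m compileall <directory>`` byte-compiles a
tree).  The sketch below writes a hypothetical module, ``example_mod.py``, to a
temporary directory and compiles it; note that the location Python itself uses
for cached .pyc files differs between versions (Python 3.2 and later use
``__pycache__`` directories).

.. code-block:: python

   import os
   import py_compile
   import tempfile

   # Write a hypothetical module into a fresh temporary directory.
   workdir = tempfile.mkdtemp()
   source_path = os.path.join(workdir, "example_mod.py")
   with open(source_path, "w") as f:
       f.write("def add_one(x):\n    return x + 1\n")

   # Byte-compile it ahead of time; cfile chooses where the .pyc file goes.
   pyc_path = os.path.join(workdir, "example_mod.pyc")
   py_compile.compile(source_path, cfile=pyc_path, doraise=True)

   print(os.path.exists(pyc_path))   # True

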
Note that the main script executed by Python, even if its filename ends in .py,
is not compiled to a .pyc file.  It is compiled to bytecode, but the bytecode is
not saved to a file.  Usually main scripts are quite short, so this doesn't cost
much speed.

.. XXX check which of these projects are still alive

There are also several programs which make it easier to intermingle Python and C
code in various ways to increase performance.  See, for example, `Psyco
<http://psyco.sourceforge.net/>`_, `Pyrex
<http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/>`_, `PyInline
<http://pyinline.sourceforge.net/>`_, `Py2Cmod
<http://sourceforge.net/projects/py2cmod/>`_, and `Weave
<http://www.scipy.org/Weave>`_.


How does Python manage memory?
------------------------------

The details of Python memory management depend on the implementation.  The
standard C implementation of Python uses reference counting to detect
inaccessible objects, and another mechanism to collect reference cycles,
periodically executing a cycle detection algorithm which looks for inaccessible
cycles and deletes the objects involved.  The :mod:`gc` module provides
functions to perform a garbage collection, obtain debugging statistics, and tune
the collector's parameters.

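
A small CPython-oriented sketch of the cycle collector at work: two
hypothetical ``Node`` objects refer to each other, so reference counting alone
never frees them, while :func:`gc.collect` does.  A weak reference observes the
object without keeping it alive.

.. code-block:: python

   import gc
   import weakref

   # Node is a hypothetical class used only to build a reference cycle.
   class Node(object):
       pass

   def make_cycle():
       a = Node()
       b = Node()
       a.partner = b
       b.partner = a            # a and b now refer to each other
       return weakref.ref(a)    # a weak reference does not keep a alive

   ref = make_cycle()
   # Each object still holds a reference to the other, so their reference
   # counts never reach zero; the cycle detector finds and frees them.
   gc.collect()
   print(ref() is None)   # True once the cycle has been collected

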
Jython relies on the Java runtime so the JVM's garbage collector is used.  This
difference can cause some subtle porting problems if your Python code depends on
the behavior of the reference counting implementation.

.. XXX relevant for Python 2.6?

Sometimes objects get stuck in tracebacks temporarily and hence are not
deallocated when you might expect.  Clear the tracebacks with::

   import sys
   sys.exc_clear()
   sys.exc_traceback = sys.last_traceback = None

Tracebacks are used for reporting errors, implementing debuggers and related
things.  They contain a portion of the program state extracted during the
handling of an exception (usually the most recent exception).

In the absence of circularities and tracebacks, Python programs do not need to
manage memory explicitly.

Why doesn't Python use a more traditional garbage collection scheme?  For one
thing, this is not a C standard feature and hence it's not portable.  (Yes, we
know about the Boehm GC library.  It has bits of assembler code for *most*
common platforms, not for all of them, and although it is mostly transparent, it
isn't completely transparent; patches are required to get Python to work with
it.)

Traditional GC also becomes a problem when Python is embedded into other
applications.  While in a standalone Python it's fine to replace the standard
malloc() and free() with versions provided by the GC library, an application
embedding Python may want to have its *own* substitute for malloc() and free(),
and may not want Python's.  Right now, Python works with anything that
implements malloc() and free() properly.

In Jython, the following code (which is fine in CPython) will probably run out
of file descriptors long before it runs out of memory::

   for file in very_long_list_of_files:
       f = open(file)
       c = f.read(1)

Using CPython's reference counting and destructor scheme, each new assignment to
f closes the previous file.  Using a traditional GC, this is not guaranteed.  If
you want to write code that will work with any Python implementation, you should
explicitly close the file or use the :keyword:`with` statement; this will work
regardless of GC::

   for file in very_long_list_of_files:
       with open(file) as f:
           c = f.read(1)


Why isn't all memory freed when Python exits?
---------------------------------------------

Objects referenced from the global namespaces of Python modules are not always
deallocated when Python exits.  This may happen if there are circular
references.  There are also certain bits of memory that are allocated by the C
library that are impossible to free (e.g. a tool like Purify will complain about
these).  Python is, however, aggressive about cleaning up memory on exit and
does try to destroy every single object.

If you want to force Python to delete certain things at exit, use the
:mod:`atexit` module to run a function that will force those deletions.


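
A minimal sketch of the :mod:`atexit` approach; ``release_resources`` and
``_open_resources`` are hypothetical names.  Registered functions run on normal
interpreter exit, in reverse order of registration, but not when the process is
killed by a signal or ends via ``os._exit()``.

.. code-block:: python

   import atexit
   import io

   # Hypothetical registry of objects that should be cleaned up at exit.
   _open_resources = []

   def release_resources():
       # Close anything still registered when the interpreter shuts down.
       while _open_resources:
           _open_resources.pop().close()

   atexit.register(release_resources)

   # Example resource; io.StringIO stands in for a real file or connection.
   _open_resources.append(io.StringIO())

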
Why are there separate tuple and list data types?
-------------------------------------------------

Lists and tuples, while similar in many respects, are generally used in
fundamentally different ways.  Tuples can be thought of as being similar to
Pascal records or C structs; they're small collections of related data which may
be of different types which are operated on as a group.  For example, a
Cartesian coordinate is appropriately represented as a tuple of two or three
numbers.

Lists, on the other hand, are more like arrays in other languages.  They tend to
hold a varying number of objects all of which have the same type and which are
operated on one-by-one.  For example, ``os.listdir('.')`` returns a list of
strings representing the files in the current directory.  Functions which
operate on this output would generally not break if you added another file or
two to the directory.

Tuples are immutable, meaning that once a tuple has been created, you can't
replace any of its elements with a new value.  Lists are mutable, meaning that
you can always change a list's elements.  Only immutable elements can be used as
dictionary keys, and hence only tuples and not lists can be used as keys (and
only tuples whose items are themselves immutable).


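
A short sketch of the record-like use of tuples and of the key restriction (the
variable names are illustrative): a coordinate tuple works as a dictionary key,
while a list, or a tuple containing a list, is rejected as unhashable.

.. code-block:: python

   # A tuple groups a fixed set of related fields, much like a record.
   point = (3, 4)

   # Tuples of immutable items are hashable, so they work as dictionary keys.
   labels = {point: "target"}
   print(labels[(3, 4)])   # target

   # Lists are mutable and unhashable, so Python refuses them as keys.
   try:
       labels[[3, 4]] = "oops"
   except TypeError:
       print("lists cannot be dictionary keys")

   # A tuple that contains a list is unhashable too.
   try:
       hash((3, [4]))
   except TypeError:
       print("neither can tuples that contain lists")

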
How are lists implemented?
--------------------------

Python's lists are really variable-length arrays, not Lisp-style linked lists.
The implementation uses a contiguous array of references to other objects, and
keeps a pointer to this array and the array's length in a list head structure.

This makes indexing a list ``a[i]`` an operation whose cost is independent of
the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized.  Some
cleverness is applied to improve the performance of appending items repeatedly;
when the array must be grown, some extra space is allocated so the next few
times don't require an actual resize.


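
The over-allocation can be observed, approximately, with ``sys.getsizeof()``,
which in CPython reports a list's allocated capacity rather than just its
length.  The helper ``count_resizes`` is a hypothetical sketch, and the exact
counts depend on the Python version and platform.

.. code-block:: python

   import sys

   # count_resizes() is a hypothetical helper; results are CPython-specific.
   def count_resizes(n):
       """Append n items; count how often the list's reported size changes."""
       items = []
       resizes = 0
       last_size = sys.getsizeof(items)
       for i in range(n):
           items.append(i)
           size = sys.getsizeof(items)
           if size != last_size:
               # The underlying array was reallocated with room to spare.
               resizes += 1
               last_size = size
       return resizes

   # Far fewer reallocations than appends; the exact count varies by version.
   print(count_resizes(1000))

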
How are dictionaries implemented?
---------------------------------

Python's dictionaries are implemented as resizable hash tables.  Compared to
B-trees, this gives better performance for lookup (the most common operation by
far) under most circumstances, and the implementation is simpler.

Dictionaries work by computing a hash code for each key stored in the dictionary
using the :func:`hash` built-in function.  The hash code varies widely depending
on the key; for example, "Python" could hash to -539294296 while "python", a
string that differs by a single bit, could hash to 1142331976 (the exact values
depend on the platform and Python version).  The hash code is then used to
calculate a location in an internal array where the value will be stored.
Assuming that you're storing keys that all have different hash values, this
means that dictionaries take constant time -- O(1), in computer science notation
-- to retrieve a key.  It also means that no sorted order of the keys is
maintained, and traversing the array as ``.keys()`` and ``.items()`` do will
output the dictionary's content in some arbitrary jumbled order.


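
To make the description concrete, here is a deliberately simplified hash table
in pure Python.  It is a sketch only: ``TinyHashTable`` is a hypothetical class,
and its linear probing and two-thirds load limit are illustrative choices rather
than CPython's actual probe sequence or tuning.

.. code-block:: python

   class TinyHashTable(object):
       """Toy open-addressing hash table; NOT CPython's actual algorithm."""

       def __init__(self, capacity=8):
           # capacity must be a power of two so that "& mask" acts as "% size".
           self._slots = [None] * capacity   # each slot: None or (key, value)
           self._count = 0

       def _find_slot(self, key):
           mask = len(self._slots) - 1
           i = hash(key) & mask              # the hash picks the starting slot
           while True:
               entry = self._slots[i]
               if entry is None or entry[0] == key:
                   return i
               i = (i + 1) & mask            # collision: try the next slot

       def __setitem__(self, key, value):
           # Grow before the table is two-thirds full, so empty slots always
           # remain and probe sequences stay short.
           if (self._count + 1) * 3 >= len(self._slots) * 2:
               self._resize(len(self._slots) * 2)
           i = self._find_slot(key)
           if self._slots[i] is None:
               self._count += 1
           self._slots[i] = (key, value)

       def __getitem__(self, key):
           entry = self._slots[self._find_slot(key)]
           if entry is None:
               raise KeyError(key)
           return entry[1]

       def _resize(self, capacity):
           # Re-insert every entry, since slot positions depend on the size.
           old_entries = [e for e in self._slots if e is not None]
           self._slots = [None] * capacity
           self._count = 0
           for key, value in old_entries:
               self[key] = value


   table = TinyHashTable()
   for word in ["Python", "python", "spam", "eggs"]:
       table[word] = len(word)
   print(table["spam"])   # 4

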
| Why must dictionary keys be immutable?
 | |
| --------------------------------------
 | |
| 
 | |
| The hash table implementation of dictionaries uses a hash value calculated from
 | |
| the key value to find the key.  If the key were a mutable object, its value
 | |
| could change, and thus its hash could also change.  But since whoever changes
 | |
| the key object can't tell that it was being used as a dictionary key, it can't
 | |
| move the entry around in the dictionary.  Then, when you try to look up the same
 | |
| object in the dictionary it won't be found because its hash value is different.
 | |
| If you tried to look up the old value it wouldn't be found either, because the
 | |
| value of the object found in that hash bin would be different.
 | |
| 
 | |
| If you want a dictionary indexed with a list, simply convert the list to a tuple
 | |
| first; the function ``tuple(L)`` creates a tuple with the same entries as the
 | |
| list ``L``.  Tuples are immutable and can therefore be used as dictionary keys.
 | |

Some unacceptable solutions that have been proposed:

- Hash lists by their address (object ID).  This doesn't work because if you
  construct a new list with the same value it won't be found; e.g.::

     mydict = {[1, 2]: '12'}
     print(mydict[[1, 2]])

  Under such a scheme, this would raise a :exc:`KeyError` exception because the
  id of the ``[1, 2]`` used in the second line differs from that in the first
  line.  In other words, dictionary keys should be compared using ``==``, not
  using :keyword:`is`.

- Make a copy when using a list as a key.  This doesn't work because the list,
  being a mutable object, could contain a reference to itself, and then the
  copying code would run into an infinite loop.

- Allow lists as keys but tell the user not to modify them.  This would allow a
  class of hard-to-track bugs in programs when you forget the rule or modify a
  list by accident.  It also invalidates an important invariant of
  dictionaries: every value in ``d.keys()`` is usable as a key of the
  dictionary.

- Mark lists as read-only once they are used as a dictionary key.  The problem
  is that it's not just the top-level object that could change its value; you
  could use a tuple containing a list as a key.  Entering anything as a key into
  a dictionary would require marking all objects reachable from there as
  read-only, and again, self-referential objects could cause an infinite loop.

There is a trick to get around this if you need to, but use it at your own risk:
You can wrap a mutable structure inside a class instance which has both a
:meth:`__eq__` and a :meth:`__hash__` method.  You must then make sure that the
hash value for all such wrapper objects that reside in a dictionary (or other
hash-based structure) remains fixed while the object is in the dictionary (or
other structure). ::

   class ListWrapper:
       def __init__(self, the_list):
           self.the_list = the_list

       def __eq__(self, other):
           if not isinstance(other, ListWrapper):
               return NotImplemented  # let Python try the other operand
           return self.the_list == other.the_list

       def __hash__(self):
           l = self.the_list
           result = 98767 - len(l)*555
           for i, el in enumerate(l):
               try:
                   result = result + (hash(el) % 9999999) * 1001 + i
               except Exception:
                   # el is unhashable, so only its position contributes
                   result = (result % 7777777) + i * 333
           return result

Note that the hash computation is complicated by the possibility that some
members of the list may be unhashable and also by the possibility of arithmetic
overflow.

Furthermore it must always be the case that if ``o1 == o2`` (i.e.,
``o1.__eq__(o2) is True``) then ``hash(o1) == hash(o2)`` (i.e.,
``o1.__hash__() == o2.__hash__()``), regardless of whether the object is in a
dictionary or not.  If you fail to meet these restrictions, dictionaries and
other hash-based structures will misbehave.

In the case of ``ListWrapper``, whenever the wrapper object is in a dictionary
the wrapped list must not change to avoid anomalies.  Don't do this unless you
are prepared to think hard about the requirements and the consequences of not
meeting them correctly.  Consider yourself warned.
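
The following sketch shows the kind of anomaly this warning is about.  It uses
a simplified, hypothetical ``TupleHashWrapper`` that hashes
``tuple(self.the_list)`` instead of the formula above, which keeps the behaviour
easy to follow.  After the wrapped list is mutated, looking up the old value no
longer finds the entry, even though the entry is still stored::

   class TupleHashWrapper:
       """Hypothetical wrapper that hashes a snapshot of the wrapped list.

       Assumes every element of the list is hashable.
       """

       def __init__(self, the_list):
           self.the_list = the_list

       def __eq__(self, other):
           if not isinstance(other, TupleHashWrapper):
               return NotImplemented
           return self.the_list == other.the_list

       def __hash__(self):
           return hash(tuple(self.the_list))


   data = [1, 2]
   d = {TupleHashWrapper(data): 'value'}
   print(TupleHashWrapper([1, 2]) in d)  # True: equal hash and equal contents

   data.append(3)  # mutate the list while its wrapper is a dictionary key
   # The entry was stored under the hash of [1, 2], but the stored key now
   # compares equal only to [1, 2, 3], so looking up the old value fails:
   print(TupleHashWrapper([1, 2]) in d)  # False
   print(len(d))                         # 1: the entry is still stored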


Why doesn't list.sort() return the sorted list?
-----------------------------------------------

In situations where performance matters, making a copy of the list just to sort
it would be wasteful.  Therefore, :meth:`list.sort` sorts the list in place.  In
order to remind you of that fact, it does not return the sorted list.  This way,
you won't be fooled into accidentally overwriting a list when you need a sorted
copy but also need to keep the unsorted version around.

If you want a new sorted list instead, use the built-in :func:`sorted` function
(added in Python 2.4).  This function creates a new list from a provided
iterable, sorts it and returns it.  For example, here's how to iterate over the
keys of a dictionary in sorted order::

   for key in sorted(mydict):
       ...  # do whatever with mydict[key]...
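
A short sketch of the difference, with ``L`` as an example list.
:meth:`list.sort` mutates the list and returns ``None``, while :func:`sorted`
leaves its argument untouched and returns a new list::

   L = [3, 1, 2]

   copy = sorted(L)     # new sorted list; L is unchanged
   print(copy, L)       # [1, 2, 3] [3, 1, 2]

   result = L.sort()    # sorts L in place
   print(result, L)     # None [1, 2, 3]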


How do you specify and enforce an interface spec in Python?
-----------------------------------------------------------

An interface specification for a module as provided by languages such as C++ and
Java describes the prototypes for the methods and functions of the module.  Many
feel that compile-time enforcement of interface specifications helps in the
construction of large programs.

Python 2.6 added an :mod:`abc` module that lets you define Abstract Base Classes
(ABCs).  You can then use :func:`isinstance` and :func:`issubclass` to check
whether an instance or a class implements a particular ABC.  The
:mod:`collections.abc` module defines a set of useful ABCs such as
:class:`~collections.abc.Iterable`, :class:`~collections.abc.Container`, and
:class:`~collections.abc.MutableMapping`.
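
A minimal sketch of both uses.  ``Shape``, ``Square`` and ``Incomplete`` are
hypothetical names.  An ABC built with :class:`abc.ABC` and
:func:`abc.abstractmethod` refuses to instantiate a subclass that doesn't
implement the required method.  :func:`isinstance` also works against the
ready-made ABCs in :mod:`collections.abc`::

   from abc import ABC, abstractmethod
   from collections.abc import Iterable, MutableMapping


   class Shape(ABC):  # hypothetical interface
       @abstractmethod
       def area(self):
           """Return the area of the shape."""


   class Square(Shape):
       def __init__(self, side):
           self.side = side

       def area(self):
           return self.side ** 2


   class Incomplete(Shape):  # forgets to implement area()
       pass


   print(Square(3).area())                # 9
   print(isinstance(Square(3), Shape))    # True
   print(isinstance([1, 2], Iterable))    # True
   print(isinstance({}, MutableMapping))  # True

   try:
       Incomplete()
   except TypeError as exc:
       print("TypeError:", exc)

Note that this check happens when ``Incomplete()`` is instantiated, not at
compile time.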

For Python, many of the advantages of interface specifications can be obtained
by an appropriate test discipline for components.  There is also a tool,
PyChecker, which can be used to find problems due to subclassing.

A good test suite for a module can both provide a regression test and serve as a
module interface specification and a set of examples.  Many Python modules can
be run as a script to provide a simple "self test."  Even modules which use
complex external interfaces can often be tested in isolation using trivial
"stub" emulations of the external interface.  The :mod:`doctest` and
:mod:`unittest` modules or third-party test frameworks can be used to construct
exhaustive test suites that exercise every line of code in a module.

An appropriate testing discipline can help build large complex applications in
Python as well as having interface specifications would.  In fact, it can be
better because an interface specification cannot test certain properties of a
program.  For example, the :meth:`append` method is expected to add new elements
to the end of some internal list; an interface specification cannot test that
your :meth:`append` implementation will actually do this correctly, but it's
trivial to check this property in a test suite.
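
For instance, a test for that property might look like the following sketch.
``Stack`` is a hypothetical class whose ``append()`` is supposed to add items at
the end of an internal list::

   import unittest


   class Stack:
       """Hypothetical container whose append() should add to the end."""

       def __init__(self):
           self._items = []

       def append(self, item):
           self._items.append(item)

       def items(self):
           return list(self._items)


   class StackTest(unittest.TestCase):
       def test_append_adds_to_end(self):
           s = Stack()
           s.append(1)
           s.append(2)
           self.assertEqual(s.items(), [1, 2])  # newest element is last

If such a test is saved in a module, you can run it with ``python -m unittest``
followed by the module name.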

Writing test suites is very helpful, and you might want to design your code with
an eye to making it easily tested.  One increasingly popular technique,
test-driven development, calls for writing parts of the test suite first,
before you write any of the actual code.  Of course Python allows you to be
sloppy and not write test cases at all.


Why are default values shared between objects?
----------------------------------------------

This type of bug commonly bites neophyte programmers.  Consider this function::

   def foo(mydict={}):  # Danger: shared reference to one dict for all calls
       ... compute something ...
       mydict[key] = value
       return mydict

The first time you call this function, ``mydict`` contains a single item.  The
second time, ``mydict`` contains two items because when ``foo()`` begins
executing, ``mydict`` starts out with an item already in it.

It is often expected that a function call creates new objects for default
values.  This is not what happens.  Default values are created exactly once,
when the function is defined.  If that object is changed, like the dictionary in
this example, subsequent calls to the function will refer to this changed
object.
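
The effect is easy to observe with a runnable variant of the function above.
``add_entry`` is a hypothetical name, and ``add_entry.__defaults__`` exposes the
single default object that every call shares::

   def add_entry(key, value, mydict={}):  # the default dict is created once
       mydict[key] = value
       return mydict


   first = add_entry('a', 1)
   second = add_entry('b', 2)     # no dict passed: reuses the same default object

   print(first is second)         # True: both calls returned the one shared dict
   print(second)                  # {'a': 1, 'b': 2}
   print(add_entry.__defaults__)  # ({'a': 1, 'b': 2},)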

By definition, immutable objects such as numbers, strings, tuples, and ``None``
are safe from change.  Changes to mutable objects such as dictionaries, lists,
and class instances can lead to confusion.

Because of this feature, it is good programming practice to not use mutable
objects as default values.  Instead, use ``None`` as the default value and
inside the function, check if the parameter is ``None`` and create a new
list/dictionary/whatever if it is.  For example, don't write::

   def foo(mydict={}):
       ...

but::

   def foo(mydict=None):
       if mydict is None:
           mydict = {}  # create a new dict for local namespace

This feature can be useful.  When you have a function that's time-consuming to
compute, a common technique is to cache the parameters and the resulting value
of each call to the function, and return the cached value if the same value is
requested again.  This is called "memoizing", and can be implemented like this::

   # _cache is keyword-only, so callers can't fill it in positionally by accident
   def expensive(arg1, arg2, *, _cache={}):
       if (arg1, arg2) in _cache:
           return _cache[(arg1, arg2)]

       # Calculate the value
       result = ... expensive computation ...
       _cache[(arg1, arg2)] = result           # Store result in the cache
       return result

You could use a global variable containing a dictionary instead of the default
value; it's a matter of taste.
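
Here is a runnable sketch of the memoizing pattern.  A trivial ``arg1 ** arg2``
stands in for the expensive computation, and the ``calls`` counter exists only to
show that the cache is hit.  ``_cache`` is keyword-only here so that it can't be
passed positionally by accident.  The standard library's
:func:`functools.lru_cache` decorator implements the same idea::

   import functools

   calls = 0  # counts how often the real computation runs


   def expensive(arg1, arg2, *, _cache={}):
       global calls
       if (arg1, arg2) in _cache:
           return _cache[(arg1, arg2)]
       calls += 1
       result = arg1 ** arg2          # stand-in for an expensive computation
       _cache[(arg1, arg2)] = result  # store result in the shared default dict
       return result


   @functools.lru_cache(maxsize=None)
   def expensive_lru(arg1, arg2):
       return arg1 ** arg2            # same stand-in, cached by the decorator


   print(expensive(2, 10), expensive(2, 10), calls)  # 1024 1024 1

   expensive_lru(2, 10)
   expensive_lru(2, 10)
   print(expensive_lru.cache_info().hits)            # 1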


Why is there no goto?
---------------------

You can use exceptions to provide a "structured goto" that even works across
function calls.  Many feel that exceptions can conveniently emulate all
reasonable uses of the "go" or "goto" constructs of C, Fortran, and other
languages.  For example::

   class label(Exception): pass  # declare a label

   try:
       ...
       if condition: raise label()  # goto label
       ...
   except label:  # where to goto
       pass
   ...

This doesn't allow you to jump into the middle of a loop, but that's usually
considered an abuse of goto anyway.  Use sparingly.
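
As a concrete, runnable illustration, an exception can act as a labelled jump out
of two nested loops at once.  The ``Found`` exception and the ``find_pair``
function are hypothetical names::

   class Found(Exception):
       """Used as a 'label': raising it jumps to the matching except clause."""


   def find_pair(numbers, target):
       try:
           for i, a in enumerate(numbers):
               for b in numbers[i + 1:]:
                   if a + b == target:
                       raise Found((a, b))  # "goto" out of both loops
       except Found as jump:
           return jump.args[0]  # the label: execution resumes here
       return None


   print(find_pair([1, 4, 6, 9], 10))  # (1, 9)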


Why can't raw strings (r-strings) end with a backslash?
-------------------------------------------------------

More precisely, they can't end with an odd number of backslashes: the unpaired
backslash at the end escapes the closing quote character, leaving an
unterminated string.

Raw strings were designed to ease creating input for processors (chiefly regular
expression engines) that want to do their own backslash escape processing.  Such
processors consider an unmatched trailing backslash to be an error anyway, so
raw strings disallow that.  In return, they allow you to pass on the string
quote character by escaping it with a backslash.  These rules work well when
r-strings are used for their intended purpose.

If you're trying to build Windows pathnames, note that all Windows system calls
accept forward slashes too::

   f = open("/mydir/file.txt")  # works fine!

If you're trying to build a pathname for a DOS command, try e.g. one of ::

   dir = r"\this\is\my\dos\dir" "\\"
   dir = r"\this\is\my\dos\dir\ "[:-1]
   dir = "\\this\\is\\my\\dos\\dir\\"
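
A quick sketch that checks the three spellings above produce the same string.
It also passes :func:`compile` some source text containing a raw string that
ends in a single backslash, so the resulting error can be caught at run time::

   forms = [
       r"\this\is\my\dos\dir" "\\",    # raw string + separately escaped backslash
       r"\this\is\my\dos\dir\ "[:-1],  # trailing space sliced off
       "\\this\\is\\my\\dos\\dir\\",   # ordinary string, every backslash doubled
   ]
   print(len(set(forms)) == 1, forms[0])  # True \this\is\my\dos\dir\

   # Source text of a raw string literal whose last character is one backslash.
   source = 'r"\\this\\is\\my\\dos\\dir\\"'
   try:
       compile(source, "<example>", "eval")
   except SyntaxError as exc:
       print("SyntaxError:", exc.msg)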


Why doesn't Python have a "with" statement for attribute assignments?
---------------------------------------------------------------------

Python has a :keyword:`with` statement that wraps the execution of a block,
calling code on the entrance and exit from the block.  Some languages have a
construct that looks like this::

   with obj:
       a = 1               # equivalent to obj.a = 1
       total = total + 1   # obj.total = obj.total + 1

In Python, such a construct would be ambiguous.

Other languages, such as Object Pascal, Delphi, and C++, use static types, so
it's possible to know, in an unambiguous way, what member is being assigned
to.  This is the main point of static typing: the compiler *always* knows the
scope of every variable at compile time.

Python uses dynamic types.  It is impossible to know in advance which attribute
will be referenced at runtime.  Member attributes may be added or removed from
objects on the fly.  This makes it impossible to know, from a simple reading,
what attribute is being referenced: a local one, a global one, or a member
attribute?

For instance, take the following incomplete snippet::

   def foo(a):
       with a:
           print(x)

The snippet assumes that ``a`` must have a member attribute called ``x``.
However, there is nothing in Python that tells the interpreter this.  What
should happen if ``a`` is, let us say, an integer?  If there is a global
variable named ``x``, will it be used inside the :keyword:`with` block?  As you
see, the dynamic nature of Python makes such choices much harder.

The primary benefit of "with" and similar language features (reduction of code
volume) can, however, easily be achieved in Python by assignment.  Instead of::

   function(args).mydict[index][index].a = 21
   function(args).mydict[index][index].b = 42
   function(args).mydict[index][index].c = 63

write this::

   ref = function(args).mydict[index][index]
   ref.a = 21
   ref.b = 42
   ref.c = 63

This also has the side effect of increasing execution speed because name
bindings are resolved at run-time in Python, and the second version only needs
to perform the resolution once.


Why are colons required for the if/while/def/class statements?
--------------------------------------------------------------

The colon is required primarily to enhance readability (one of the results of
the experimental ABC language).  Consider this::

   if a == b
       print(a)

versus ::

   if a == b:
       print(a)

Notice how the second one is slightly easier to read.  Notice further how a
colon sets off the example in this FAQ answer; it's a standard usage in English.

Another minor reason is that the colon makes it easier for editors with syntax
highlighting; they can look for colons to decide when indentation needs to be
increased instead of having to do a more elaborate parsing of the program text.


Why does Python allow commas at the end of lists and tuples?
------------------------------------------------------------

Python lets you add a trailing comma at the end of lists, tuples, and
dictionaries::

   [1, 2, 3,]
   ('a', 'b', 'c',)
   d = {
       "A": [1, 5],
       "B": [6, 7],  # last trailing comma is optional but good style
   }


There are several reasons to allow this.

When you have a literal value for a list, tuple, or dictionary spread across
multiple lines, it's easier to add more elements because you don't have to
remember to add a comma to the previous line.  The lines can also be sorted in
your editor without creating a syntax error.

Accidentally omitting the comma can lead to errors that are hard to diagnose.
For example::

   x = [
     "fee",
     "fie"
     "foo",
     "fum"
   ]

This list looks like it has four elements, but it actually contains three:
"fee", "fiefoo" and "fum", because adjacent string literals are concatenated.
Always adding the comma avoids this source of error.

Allowing the trailing comma may also make programmatic code generation easier.
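
As a sketch of that last point, a code generator can write every element the
same way, comma included, without special-casing the final one, and the result
is still valid Python.  ``emit_list`` is a hypothetical helper::

   def emit_list(name, items):
       """Generate source code for a list literal, one element per line."""
       lines = [f"{name} = ["]
       for item in items:
           lines.append(f"    {item!r},")  # every element gets a trailing comma
       lines.append("]")
       return "\n".join(lines)


   source = emit_list("x", ["fee", "fie", "foo", "fum"])
   print(source)

   namespace = {}
   exec(source, namespace)  # the generated code runs as-is
   print(namespace["x"])    # ['fee', 'fie', 'foo', 'fum']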
