Move the 2.6 reST doc tree in place.

2025-08-04 17:08:35 +00:00 · 2007-08-15 14:28:01 +00:00 · 2007-08-15 14:28:01 +00:00 · 8ec7f65613
commit 8ec7f65613
parent f56181ff53
445 changed files with 136056 additions and 0 deletions
--- a/Doc/library/string.rst
+++ b/Doc/library/string.rst
@@ -0,0 +1,494 @@
+
+:mod:`string` --- Common string operations
+==========================================
+
+.. module:: string
+   :synopsis: Common string operations.
+
+
+.. index:: module: re
+
+The :mod:`string` module contains a number of useful constants and
+classes, as well as some deprecated legacy functions that are also
+available as methods on strings. In addition, Python's built-in string
+classes support the sequence type methods described in the
+:ref:`typesseq` section, and also the string-specific methods described
+in the :ref:`string-methods` section. To output formatted strings, use
+template strings or the ``%`` operator described in the
+:ref:`string-formatting` section. Also, see the :mod:`re` module for
+string functions based on regular expressions.
+
+
+String constants
+----------------
+
+The constants defined in this module are:
+
+
+.. data:: ascii_letters
+
+   The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
+   constants described below.  This value is not locale-dependent.
+
+
+.. data:: ascii_lowercase
+
+   The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``.  This value is not
+   locale-dependent and will not change.
+
+
+.. data:: ascii_uppercase
+
+   The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``.  This value is not
+   locale-dependent and will not change.
+
+
+.. data:: digits
+
+   The string ``'0123456789'``.
+
+
+.. data:: hexdigits
+
+   The string ``'0123456789abcdefABCDEF'``.
+
+
+.. data:: letters
+
+   The concatenation of the strings :const:`lowercase` and :const:`uppercase`
+   described below.  The specific value is locale-dependent, and will be updated
+   when :func:`locale.setlocale` is called.
+
+
+.. data:: lowercase
+
+   A string containing all the characters that are considered lowercase letters.
+   On most systems this is the string ``'abcdefghijklmnopqrstuvwxyz'``.  Do not
+   change its definition --- the effect on the routines :func:`upper` and
+   :func:`swapcase` is undefined.  The specific value is locale-dependent, and will
+   be updated when :func:`locale.setlocale` is called.
+
+
+.. data:: octdigits
+
+   The string ``'01234567'``.
+
+
+.. data:: punctuation
+
+   String of ASCII characters which are considered punctuation characters in the
+   ``C`` locale.
+
+
+.. data:: printable
+
+   String of characters which are considered printable.  This is a combination of
+   :const:`digits`, :const:`letters`, :const:`punctuation`, and
+   :const:`whitespace`.
+
+
+.. data:: uppercase
+
+   A string containing all the characters that are considered uppercase letters.
+   On most systems this is the string ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``.  Do not
+   change its definition --- the effect on the routines :func:`lower` and
+   :func:`swapcase` is undefined.  The specific value is locale-dependent, and will
+   be updated when :func:`locale.setlocale` is called.
+
+
+.. data:: whitespace
+
+   A string containing all characters that are considered whitespace. On most
+   systems this includes the characters space, tab, linefeed, return, formfeed, and
+   vertical tab.  Do not change its definition --- the effect on the routines
+   :func:`strip` and :func:`split` is undefined.
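+
+The constants are ordinary strings, so membership tests work on them directly.
+The following is a minimal sketch; the helper names ``is_simple_identifier``
+and ``is_hex`` are hypothetical and not part of the module.
+
```python
import string

def is_simple_identifier(name):
    """Hypothetical helper: True if name uses only ASCII letters, digits
    and underscores, and does not start with a digit."""
    if not name or name[0] in string.digits:
        return False
    allowed = string.ascii_letters + string.digits + '_'
    return all(ch in allowed for ch in name)

def is_hex(text):
    """Hypothetical helper: True if text is non-empty and consists only
    of characters from string.hexdigits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)
```
+
+The locale-independent ``ascii_*`` constants are used here so that the result
+does not change when :func:`locale.setlocale` is called.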
+
+
+Template strings
+----------------
+
+Templates provide simpler string substitutions as described in :pep:`292`.
+Instead of the normal ``%``\ -based substitutions, Templates support ``$``\
+-based substitutions, using the following rules:
+
+* ``$$`` is an escape; it is replaced with a single ``$``.
+
+* ``$identifier`` names a substitution placeholder matching a mapping key of
+  ``"identifier"``.  By default, ``"identifier"`` must spell a Python
+  identifier.  The first non-identifier character after the ``$`` character
+  terminates this placeholder specification.
+
+* ``${identifier}`` is equivalent to ``$identifier``.  It is required when valid
+  identifier characters follow the placeholder but are not part of the
+  placeholder, such as ``"${noun}ification"``.
+
+Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
+being raised.
+
+.. versionadded:: 2.4
+
+The :mod:`string` module provides a :class:`Template` class that implements
+these rules.  The methods of :class:`Template` are:
+
+
+.. class:: Template(template)
+
+   The constructor takes a single argument which is the template string.
+
+
+.. method:: Template.substitute(mapping[, **kws])
+
+   Performs the template substitution, returning a new string.  *mapping* is any
+   dictionary-like object with keys that match the placeholders in the template.
+   Alternatively, you can provide keyword arguments, where the keywords are the
+   placeholders.  When both *mapping* and *kws* are given and there are duplicates,
+   the placeholders from *kws* take precedence.
+
+
+.. method:: Template.safe_substitute(mapping[, **kws])
+
+   Like :meth:`substitute`, except that if placeholders are missing from *mapping*
+   and *kws*, instead of raising a :exc:`KeyError` exception, the original
+   placeholder will appear in the resulting string intact.  Also, unlike with
+   :meth:`substitute`, any other appearances of the ``$`` will simply return ``$``
+   instead of raising :exc:`ValueError`.
+
+   While other exceptions may still occur, this method is called "safe" because
+   it always tries to return a usable string instead of raising an
+   exception.  In another sense, :meth:`safe_substitute` may be anything other than
+   safe, since it will silently ignore malformed templates containing dangling
+   delimiters, unmatched braces, or placeholders that are not valid Python
+   identifiers.
+
+:class:`Template` instances also provide one public data attribute:
+
+
+.. attribute:: Template.template
+
+   This is the object passed to the constructor's *template* argument.  In general,
+   you shouldn't change it, but read-only access is not enforced.
+
+Here is an example of how to use a Template::
+
+   >>> from string import Template
+   >>> s = Template('$who likes $what')
+   >>> s.substitute(who='tim', what='kung pao')
+   'tim likes kung pao'
+   >>> d = dict(who='tim')
+   >>> Template('Give $who $100').substitute(d)
+   Traceback (most recent call last):
+   [...]
+   ValueError: Invalid placeholder in string: line 1, col 10
+   >>> Template('$who likes $what').substitute(d)
+   Traceback (most recent call last):
+   [...]
+   KeyError: 'what'
+   >>> Template('$who likes $what').safe_substitute(d)
+   'tim likes $what'
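+
+The remaining rules (the ``$$`` escape, braced placeholders, and keyword
+arguments overriding the mapping) can be sketched as follows.  The template
+texts and values are made up for illustration.
+
```python
from string import Template

# Braces are needed because "ification" could otherwise be read as part
# of the placeholder name; "$$" produces a literal dollar sign.
t = Template('${noun}ification costs $$$price')
result = t.substitute(noun='class', price=5)
# result == 'classification costs $5'

# When a mapping and keyword arguments both supply a placeholder,
# the keyword argument wins.
defaults = {'who': 'tim', 'what': 'kung pao'}
overridden = Template('$who likes $what').substitute(defaults, what='pizza')
# overridden == 'tim likes pizza'
```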
+
+Advanced usage: you can derive subclasses of :class:`Template` to customize the
+placeholder syntax, delimiter character, or the entire regular expression used
+to parse template strings.  To do this, you can override these class attributes:
+
+* *delimiter* -- This is the literal string describing a placeholder introducing
+  delimiter.  The default value is ``$``.  Note that this should *not* be a regular
+  expression, as the implementation will call :func:`re.escape` on this string as
+  needed.
+
+* *idpattern* -- This is the regular expression describing the pattern for
+  non-braced placeholders (the braces will be added automatically as
+  appropriate).  The default value is the regular expression
+  ``[_a-z][_a-z0-9]*``.
+
+Alternatively, you can provide the entire regular expression pattern by
+overriding the class attribute *pattern*.  If you do this, the value must be a
+regular expression object with four named capturing groups.  The capturing
+groups correspond to the rules given above, along with the invalid placeholder
+rule:
+
+* *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the
+  default pattern.
+
+* *named* -- This group matches the unbraced placeholder name; it should not
+  include the delimiter in the capturing group.
+
+* *braced* -- This group matches the brace enclosed placeholder name; it should
+  not include either the delimiter or braces in the capturing group.
+
+* *invalid* -- This group matches any other delimiter pattern (usually a single
+  delimiter), and it should appear last in the regular expression.
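+
+A minimal sketch of the two simpler customizations described above, overriding
+*delimiter* and *idpattern*.  The subclass names ``PercentTemplate`` and
+``DottedTemplate`` and the dotted-name pattern are assumptions made for this
+example only.
+
```python
from string import Template

class PercentTemplate(Template):
    # A literal string, not a regular expression.
    delimiter = '%'

class DottedTemplate(Template):
    # Hypothetical pattern: identifiers optionally joined by dots.
    idpattern = r'[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*'

percent = PercentTemplate('%who owes %%5').substitute(who='tim')
# percent == 'tim owes %5'

dotted = DottedTemplate('$user.name logged in').substitute({'user.name': 'tim'})
# dotted == 'tim logged in'
```
+
+Placeholders containing dots are not valid keyword names, so the second
+template is filled from a mapping rather than keyword arguments.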
+
+
+String functions
+----------------
+
+The following functions are available to operate on string and Unicode objects.
+They are not available as string methods.
+
+
+.. function:: capwords(s)
+
+   Split the argument into words using :func:`split`, capitalize each word using
+   :func:`capitalize`, and join the capitalized words using :func:`join`.  Note
+   that this replaces runs of whitespace characters by a single space, and removes
+   leading and trailing whitespace.
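+
+For example (the input strings are arbitrary), the whitespace normalization
+described above looks like this:
+
```python
import string

# Runs of whitespace collapse to one space; the ends are trimmed.
title = string.capwords('  the   quick brown\tfox ')
# title == 'The Quick Brown Fox'

# capitalize() also lowercases the rest of each word.
mixed = string.capwords('hELLO wORLD')
# mixed == 'Hello World'
```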
+
+
+.. function:: maketrans(from, to)
+
+   Return a translation table suitable for passing to :func:`translate`, that will
+   map each character in *from* into the character at the same position in *to*;
+   *from* and *to* must have the same length.
+
+   .. warning::
+
+      Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
+      arguments; in some locales, these don't have the same length.  For case
+      conversions, always use :func:`lower` and :func:`upper`.
+
+
+Deprecated string functions
+---------------------------
+
+The following functions are also defined as methods of string and
+Unicode objects; see section :ref:`string-methods` for more information on
+those.  You should consider these functions as deprecated, although they will
+not be removed until Python 3.0.  The functions defined in this module are:
+
+
+.. function:: atof(s)
+
+   .. deprecated:: 2.0
+      Use the :func:`float` built-in function.
+
+   .. index:: builtin: float
+
+   Convert a string to a floating point number.  The string must have the standard
+   syntax for a floating point literal in Python, optionally preceded by a sign
+   (``+`` or ``-``).  Note that this behaves identically to the built-in function
+   :func:`float` when passed a string.
+
+   .. note::
+
+      .. index::
+         single: NaN
+         single: Infinity
+
+      When passing in a string, values for NaN and Infinity may be returned, depending
+      on the underlying C library.  The specific set of strings accepted which cause
+      these values to be returned depends entirely on the C library and is known to
+      vary.
+
+
+.. function:: atoi(s[, base])
+
+   .. deprecated:: 2.0
+      Use the :func:`int` built-in function.
+
+   .. index:: builtin: eval
+
+   Convert string *s* to an integer in the given *base*.  The string must consist
+   of one or more digits, optionally preceded by a sign (``+`` or ``-``).  The
+   *base* defaults to 10.  If it is 0, a default base is chosen depending on the
+   leading characters of the string (after stripping the sign): ``0x`` or ``0X``
+   means 16, ``0`` means 8, anything else means 10.  If *base* is 16, a leading
+   ``0x`` or ``0X`` is always accepted, though not required.  This behaves
+   identically to the built-in function :func:`int` when passed a string.  (Also
+   note: for a more flexible interpretation of numeric literals, use the built-in
+   function :func:`eval`.)
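+
+Since the deprecation note points to :func:`int`, the base handling described
+above can be sketched with the built-in directly.  The values are illustrative;
+octal prefixes are left out because their spelling differs between Python
+versions.
+
```python
# Base defaults to 10; a leading sign is accepted.
decimal = int('42')          # 42
negative = int('-17')        # -17

# With base 16 the "0x" prefix is accepted but not required.
plain_hex = int('ff', 16)        # 255
prefixed_hex = int('0xff', 16)   # 255

# With base 0 the prefix selects the base.
guessed = int('0x1f', 0)     # 31

binary = int('101', 2)       # 5
```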
+
+
+.. function:: atol(s[, base])
+
+   .. deprecated:: 2.0
+      Use the :func:`long` built-in function.
+
+   .. index:: builtin: long
+
+   Convert string *s* to a long integer in the given *base*. The string must
+   consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
+   The *base* argument has the same meaning as for :func:`atoi`.  A trailing ``l``
+   or ``L`` is not allowed, except if the base is 0.  Note that when invoked
+   without *base* or with *base* set to 10, this behaves identically to the built-in
+   function :func:`long` when passed a string.
+
+
+.. function:: capitalize(word)
+
+   Return a copy of *word* with only its first character capitalized.
+
+
+.. function:: expandtabs(s[, tabsize])
+
+   Expand tabs in a string replacing them by one or more spaces, depending on the
+   current column and the given tab size.  The column number is reset to zero after
+   each newline occurring in the string. This doesn't understand other non-printing
+   characters or escape sequences.  The tab size defaults to 8.
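+
+A short sketch of the column arithmetic, written with the equivalent string
+method; the sample strings are arbitrary.
+
```python
# Default tab size is 8: "a" occupies column 0, so 7 spaces are added.
default_tabs = 'a\tb'.expandtabs()

# With a tab size of 4, "ab" fills columns 0-1, so 2 spaces are added.
narrow_tabs = 'ab\tc'.expandtabs(4)

# The column counter restarts after each newline.
multi_line = 'a\tb\n\tc'.expandtabs(4)
```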
+
+
+.. function:: find(s, sub[, start[, end]])
+
+   Return the lowest index in *s* where the substring *sub* is found such that
+   *sub* is wholly contained in ``s[start:end]``.  Return ``-1`` on failure.
+   Defaults for *start* and *end* and interpretation of negative values are the same
+   as for slices.
+
+
+.. function:: rfind(s, sub[, start[, end]])
+
+   Like :func:`find` but find the highest index.
+
+
+.. function:: index(s, sub[, start[, end]])
+
+   Like :func:`find` but raise :exc:`ValueError` when the substring is not found.
+
+
+.. function:: rindex(s, sub[, start[, end]])
+
+   Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.
+
+
+.. function:: count(s, sub[, start[, end]])
+
+   Return the number of (non-overlapping) occurrences of substring *sub* in string
+   ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative
+   values are the same as for slices.
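+
+The search functions above differ mainly in direction and in how they report
+failure.  A minimal sketch using the equivalent string methods (the sample
+string is arbitrary):
+
```python
s = 'spam, eggs, spam'

first = s.find('spam')           # lowest index: 0
last = s.rfind('spam')           # highest index: 12
after_first = s.find('spam', 1)  # search limited to s[1:]: 12
missing = s.find('ham')          # -1 signals failure

try:
    s.index('ham')               # index() raises instead of returning -1
except ValueError:
    index_failed = True
else:
    index_failed = False

occurrences = s.count('spam')         # 2
non_overlapping = 'aaaa'.count('aa')  # 2, since matches do not overlap
```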
+
+
+.. function:: lower(s)
+
+   Return a copy of *s*, but with upper case letters converted to lower case.
+
+
+.. function:: split(s[, sep[, maxsplit]])
+
+   Return a list of the words of the string *s*.  If the optional second argument
+   *sep* is absent or ``None``, the words are separated by arbitrary strings of
+   whitespace characters (space, tab, newline, return, formfeed).  If the second
+   argument *sep* is present and not ``None``, it specifies a string to be used as
+   the word separator.  The returned list will then have one more item than the
+   number of non-overlapping occurrences of the separator in the string.  The
+   optional third argument *maxsplit* defaults to 0.  If it is nonzero, at most
+   *maxsplit* number of splits occur, and the remainder of the string is returned
+   as the final element of the list (thus, the list will have at most
+   ``maxsplit+1`` elements).
+
+   The behavior of split on an empty string depends on the value of *sep*. If *sep*
+   is not specified, or specified as ``None``, the result will be an empty list.
+   If *sep* is specified as any string, the result will be a list containing one
+   element which is an empty string.
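+
+The cases above can be sketched with the equivalent string method; the inputs
+are illustrative only.
+
```python
# No separator: split on runs of whitespace.
words = 'a b   c'.split()             # ['a', 'b', 'c']

# Explicit separator: three separators give four items,
# including the empty string between the adjacent commas.
fields = 'a,b,,c'.split(',')          # ['a', 'b', '', 'c']

# At most one split; the remainder is the final element.
limited = 'a,b,c'.split(',', 1)       # ['a', 'b,c']

# Empty input: the result depends on whether sep is given.
empty_default = ''.split()            # []
empty_with_sep = ''.split(',')        # ['']
```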
+
+
+.. function:: rsplit(s[, sep[, maxsplit]])
+
+   Return a list of the words of the string *s*, scanning *s* from the end.  To all
+   intents and purposes, the resulting list of words is the same as returned by
+   :func:`split`, except when the optional third argument *maxsplit* is explicitly
+   specified and nonzero.  When *maxsplit* is nonzero, at most *maxsplit* number of
+   splits -- the *rightmost* ones -- occur, and the remainder of the string is
+   returned as the first element of the list (thus, the list will have at most
+   ``maxsplit+1`` elements).
+
+   .. versionadded:: 2.4
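+
+A small sketch contrasting the two directions, using the string methods and a
+made-up file name:
+
```python
name = 'archive.tar.gz'

# One split from the left versus one split from the right.
from_left = name.split('.', 1)     # ['archive', 'tar.gz']
from_right = name.rsplit('.', 1)   # ['archive.tar', 'gz']

# Without maxsplit both give the same list.
same = name.split('.') == name.rsplit('.')
```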
+
+
+.. function:: splitfields(s[, sep[, maxsplit]])
+
+   This function behaves identically to :func:`split`.  (In the past, :func:`split`
+   was only used with one argument, while :func:`splitfields` was only used with
+   two arguments.)
+
+
+.. function:: join(words[, sep])
+
+   Concatenate a list or tuple of words with intervening occurrences of *sep*.
+   The default value for *sep* is a single space character.  For any non-empty
+   separator string *sep*, ``string.join(string.split(s, sep), sep)`` equals *s*.
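+
+The round trip can be checked with the equivalent string methods.  This is a
+sketch with arbitrary sample data; note that splitting on whitespace without
+an explicit separator does not preserve the original spacing.
+
```python
s = 'a,b,,c'

# Splitting and re-joining on the same explicit separator restores s.
round_trip = ','.join(s.split(','))

# Whitespace splitting collapses runs of spaces, so the original
# spacing is lost when the words are joined again.
spaced = 'a  b'
collapsed = ' '.join(spaced.split())
```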
+
+
+.. function:: joinfields(words[, sep])
+
+   This function behaves identically to :func:`join`.  (In the past, :func:`join`
+   was only used with one argument, while :func:`joinfields` was only used with two
+   arguments.) Note that there is no :meth:`joinfields` method on string objects;
+   use the :meth:`join` method instead.
+
+
+.. function:: lstrip(s[, chars])
+
+   Return a copy of the string with leading characters removed.  If *chars* is
+   omitted or ``None``, whitespace characters are removed.  If given and not
+   ``None``, *chars* must be a string; the characters in the string will be
+   stripped from the beginning of *s*.
+
+   .. versionchanged:: 2.2.3
+      The *chars* parameter was added.  The *chars* parameter cannot be passed in
+      earlier 2.2 versions.
+
+
+.. function:: rstrip(s[, chars])
+
+   Return a copy of the string with trailing characters removed.  If *chars* is
+   omitted or ``None``, whitespace characters are removed.  If given and not
+   ``None``, *chars* must be a string; the characters in the string will be
+   stripped from the end of *s*.
+
+   .. versionchanged:: 2.2.3
+      The *chars* parameter was added.  The *chars* parameter cannot be passed in
+      earlier 2.2 versions.
+
+
+.. function:: strip(s[, chars])
+
+   Return a copy of the string with leading and trailing characters removed.  If
+   *chars* is omitted or ``None``, whitespace characters are removed.  If given and
+   not ``None``, *chars* must be a string; the characters in the string will be
+   stripped from both ends of *s*.
+
+   .. versionchanged:: 2.2.3
+      The *chars* parameter was added.  The *chars* parameter cannot be passed in
+      earlier 2.2 versions.
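+
+The *chars* argument of the three stripping functions is treated as a set of
+characters, not as a prefix or suffix.  A minimal sketch using the equivalent
+string methods and arbitrary sample strings:
+
```python
padded = '  spam  '
both = padded.strip()      # 'spam'
left = padded.lstrip()     # 'spam  '
right = padded.rstrip()    # '  spam'

# Every leading and trailing character found in chars is removed,
# stopping at the first character that is not in the set.
host = 'www.example.com'.strip('cmowz.')   # 'example'
one_side = 'xxspamxx'.lstrip('x')          # 'spamxx'
```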
+
+
+.. function:: swapcase(s)
+
+   Return a copy of *s*, but with lower case letters converted to upper case and
+   vice versa.
+
+
+.. function:: translate(s, table[, deletechars])
+
+   Delete all characters from *s* that are in *deletechars* (if present), and then
+   translate the characters using *table*, which must be a 256-character string
+   giving the translation for each character value, indexed by its ordinal.  If
+   *table* is ``None``, then only the character deletion step is performed.
+
+
+.. function:: upper(s)
+
+   Return a copy of *s*, but with lower case letters converted to upper case.
+
+
+.. function:: ljust(s, width)
+              rjust(s, width)
+              center(s, width)
+
+   These functions respectively left-justify, right-justify and center a string in
+   a field of given width.  They return a string that is at least *width*
+   characters wide, created by padding the string *s* with spaces on the right,
+   left or both sides until the given width is reached.  The string is never truncated.
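+
+A brief sketch with the equivalent string methods; the widths are arbitrary.
+
```python
left = 'ab'.ljust(5)        # padded on the right
right = 'ab'.rjust(5)       # padded on the left
middle = 'ab'.center(6)     # padded on both sides

# A string longer than the requested width is returned unchanged.
untouched = 'abcdef'.ljust(3)
```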
+
+
+.. function:: zfill(s, width)
+
+   Pad a numeric string on the left with zero digits until the given width is
+   reached.  Strings starting with a sign are handled correctly.
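+
+For illustration, using the equivalent string method: the zeros are inserted
+after a leading sign, and the sign counts towards the width.
+
```python
plain = '42'.zfill(5)        # '00042'
signed = '-42'.zfill(5)      # '-0042'
already_wide = '12345'.zfill(3)   # '12345', never truncated
```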
+
+
+.. function:: replace(str, old, new[, maxreplace])
+
+   Return a copy of string *str* with all occurrences of substring *old* replaced
+   by *new*.  If the optional argument *maxreplace* is given, the first
+   *maxreplace* occurrences are replaced.
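+
+A short sketch of the optional count, written with the equivalent string
+method and an arbitrary sample string:
+
```python
text = 'a-b-c-d'

every = text.replace('-', '+')         # 'a+b+c+d'
first_two = text.replace('-', '+', 2)  # 'a+b+c-d'
```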
+