mirror of
https://github.com/python/cpython.git
synced 2025-11-03 11:23:31 +00:00
gh-99146 struct module documentation should have more predictable examples/warnings (GH-99141)
* nail down a couple examples to have more predictable output
* update a number of things, but this is really just a stash...
* added an applications section to describe typical uses for native and machine-independent formats
* make sure all format strings use a format prefix character
* responding to comments from @gpshead. Not likely finished yet.
* This got more involved than I expected...
* respond to several PR comments
* a lot of wordsmithing
* try and be more consistent in use of ``x`` vs ``'x'``
* expand examples a bit
* update the "see also" to be more up-to-date
* original examples relied on import * so present all examples as if
* reformat based on @gpshead comment (missed before)
* responding to comments
* missed this
* one more suggested edit
* wordsmithing
parent
5d41833cc0
commit
22d91c16bb
1 changed files with 204 additions and 74 deletions
|
|
@ -12,21 +12,25 @@
|
||||||
|
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
This module performs conversions between Python values and C structs represented
|
This module converts between Python values and C structs represented
|
||||||
as Python :class:`bytes` objects. This can be used in handling binary data
|
as Python :class:`bytes` objects. Compact :ref:`format strings <struct-format-strings>`
|
||||||
stored in files or from network connections, among other sources. It uses
|
describe the intended conversions to/from Python values.
|
||||||
:ref:`struct-format-strings` as compact descriptions of the layout of the C
|
The module's functions and objects can be used for two largely
|
||||||
structs and the intended conversion to/from Python values.
|
distinct applications: data exchange with external sources (files or
|
||||||
|
network connections), or data transfer between the Python application
|
||||||
|
and the C layer.
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
By default, the result of packing a given C struct includes pad bytes in
|
When no prefix character is given, native mode is the default. It
|
||||||
order to maintain proper alignment for the C types involved; similarly,
|
packs or unpacks data based on the platform and compiler on which
|
||||||
alignment is taken into account when unpacking. This behavior is chosen so
|
the Python interpreter was built.
|
||||||
that the bytes of a packed struct correspond exactly to the layout in memory
|
The result of packing a given C struct includes pad bytes which
|
||||||
of the corresponding C struct. To handle platform-independent data formats
|
maintain proper alignment for the C types involved; similarly,
|
||||||
or omit implicit pad bytes, use ``standard`` size and alignment instead of
|
alignment is taken into account when unpacking. In contrast, when
|
||||||
``native`` size and alignment: see :ref:`struct-alignment` for details.
|
communicating data between external sources, the programmer is
|
||||||
|
responsible for defining byte ordering and padding between elements.
|
||||||
|
See :ref:`struct-alignment` for details.
|
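As a small illustration of the note above, the following sketch compares the size of the same two fields in native mode and in standard mode. The native figure depends on the platform and compiler (8 is typical where ``int`` is aligned on four bytes), so only the standard size is stated as a fact here.

```python
import struct

# Native mode ('@', also the default): alignment padding may be inserted
# between the char and the int, depending on the platform.
native_size = struct.calcsize('@ci')

# Standard mode ('='): native byte order, fixed sizes, no padding.
standard_size = struct.calcsize('=ci')   # 1-byte char + 4-byte int = 5

print(native_size, standard_size)
```

Any difference between the two numbers is the implicit padding that native mode adds.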
||||||
|
|
||||||
Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
|
Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
|
||||||
argument. This refers to objects that implement the :ref:`bufferobjects` and
|
argument. This refers to objects that implement the :ref:`bufferobjects` and
|
||||||
|
|
@ -102,10 +106,13 @@ The module defines the following exception and functions:
|
||||||
Format Strings
|
Format Strings
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
Format strings are the mechanism used to specify the expected layout when
|
Format strings describe the data layout when
|
||||||
packing and unpacking data. They are built up from :ref:`format-characters`,
|
packing and unpacking data. They are built up from :ref:`format characters<format-characters>`,
|
||||||
which specify the type of data being packed/unpacked. In addition, there are
|
which specify the type of data being packed/unpacked. In addition,
|
||||||
special characters for controlling the :ref:`struct-alignment`.
|
special characters control the :ref:`byte order, size and alignment<struct-alignment>`.
|
||||||
|
Each format string consists of an optional prefix character which
|
||||||
|
describes the overall properties of the data and one or more format
|
||||||
|
characters which describe the actual data values and padding.
|
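The effect of the prefix character can be seen by packing one value with different prefixes. This is a minimal sketch; the value ``0x0102`` is chosen only so that the two bytes are easy to tell apart.

```python
import struct

value = 0x0102

little = struct.pack('<H', value)    # b'\x02\x01'
big = struct.pack('>H', value)       # b'\x01\x02'
network = struct.pack('!H', value)   # network order is big-endian

print(little, big, network)
```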
||||||
|
|
||||||
|
|
||||||
.. _struct-alignment:
|
.. _struct-alignment:
|
||||||
|
|
@ -116,6 +123,11 @@ Byte Order, Size, and Alignment
|
||||||
By default, C types are represented in the machine's native format and byte
|
By default, C types are represented in the machine's native format and byte
|
||||||
order, and properly aligned by skipping pad bytes if necessary (according to the
|
order, and properly aligned by skipping pad bytes if necessary (according to the
|
||||||
rules used by the C compiler).
|
rules used by the C compiler).
|
||||||
|
This behavior is chosen so
|
||||||
|
that the bytes of a packed struct correspond exactly to the memory layout
|
||||||
|
of the corresponding C struct.
|
||||||
|
Whether to use native byte ordering
|
||||||
|
and padding or standard formats depends on the application.
|
||||||
|
|
||||||
.. index::
|
.. index::
|
||||||
single: @ (at); in struct format strings
|
single: @ (at); in struct format strings
|
||||||
|
|
@ -144,12 +156,10 @@ following table:
|
||||||
|
|
||||||
If the first character is not one of these, ``'@'`` is assumed.
|
If the first character is not one of these, ``'@'`` is assumed.
|
||||||
|
|
||||||
Native byte order is big-endian or little-endian, depending on the host
|
Native byte order is big-endian or little-endian, depending on the
|
||||||
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
|
host system. For example, Intel x86, AMD64 (x86-64), and Apple M1 are
|
||||||
IBM z and most legacy architectures are big-endian;
|
little-endian; IBM z and many legacy architectures are big-endian.
|
||||||
and ARM, RISC-V and IBM Power feature switchable endianness
|
Use :data:`sys.byteorder` to check the endianness of your system.
|
||||||
(bi-endian, though the former two are nearly always little-endian in practice).
|
|
||||||
Use ``sys.byteorder`` to check the endianness of your system.
|
|
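One way to relate :data:`sys.byteorder` to the prefix characters is sketched below. It uses the ``'='`` prefix (native byte order, standard sizes) so that the comparison does not depend on native type sizes; the variable names are illustrative only.

```python
import struct
import sys

# '=' means native byte order with standard sizes, so it matches
# exactly one of the explicit prefixes on any given machine.
explicit_prefix = '<' if sys.byteorder == 'little' else '>'

native_bytes = struct.pack('=I', 1)
explicit_bytes = struct.pack(explicit_prefix + 'I', 1)

print(sys.byteorder, native_bytes == explicit_bytes)
```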
||||||
|
|
||||||
Native size and alignment are determined using the C compiler's
|
Native size and alignment are determined using the C compiler's
|
||||||
``sizeof`` expression. This is always combined with native byte order.
|
``sizeof`` expression. This is always combined with native byte order.
|
||||||
|
|
@ -231,9 +241,9 @@ platform-dependent.
|
||||||
+--------+--------------------------+--------------------+----------------+------------+
|
+--------+--------------------------+--------------------+----------------+------------+
|
||||||
| ``d`` | :c:expr:`double` | float | 8 | \(4) |
|
| ``d`` | :c:expr:`double` | float | 8 | \(4) |
|
||||||
+--------+--------------------------+--------------------+----------------+------------+
|
+--------+--------------------------+--------------------+----------------+------------+
|
||||||
| ``s`` | :c:expr:`char[]` | bytes | | |
|
| ``s`` | :c:expr:`char[]` | bytes | | \(9) |
|
||||||
+--------+--------------------------+--------------------+----------------+------------+
|
+--------+--------------------------+--------------------+----------------+------------+
|
||||||
| ``p`` | :c:expr:`char[]` | bytes | | |
|
| ``p`` | :c:expr:`char[]` | bytes | | \(8) |
|
||||||
+--------+--------------------------+--------------------+----------------+------------+
|
+--------+--------------------------+--------------------+----------------+------------+
|
||||||
| ``P`` | :c:expr:`void \*` | integer | | \(5) |
|
| ``P`` | :c:expr:`void \*` | integer | | \(5) |
|
||||||
+--------+--------------------------+--------------------+----------------+------------+
|
+--------+--------------------------+--------------------+----------------+------------+
|
||||||
|
|
@ -292,8 +302,33 @@ Notes:
|
||||||
format <half precision format_>`_ for more information.
|
format <half precision format_>`_ for more information.
|
||||||
|
|
||||||
(7)
|
(7)
|
||||||
For padding, ``x`` inserts null bytes.
|
When packing, ``'x'`` inserts one NUL byte.
|
||||||
|
|
||||||
|
(8)
|
||||||
|
The ``'p'`` format character encodes a "Pascal string", meaning a short
|
||||||
|
variable-length string stored in a *fixed number of bytes*, given by the count.
|
||||||
|
The first byte stored is the length of the string, or 255, whichever is
|
||||||
|
smaller. The bytes of the string follow. If the string passed in to
|
||||||
|
:func:`pack` is too long (longer than the count minus 1), only the leading
|
||||||
|
``count-1`` bytes of the string are stored. If the string is shorter than
|
||||||
|
``count-1``, it is padded with null bytes so that exactly count bytes in all
|
||||||
|
are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
|
||||||
|
``count`` bytes, but that the string returned can never contain more than 255
|
||||||
|
bytes.
|
||||||
|
|
||||||
|
(9)
|
||||||
|
For the ``'s'`` format character, the count is interpreted as the length of the
|
||||||
|
bytes, not a repeat count like for the other format characters; for example,
|
||||||
|
``'10s'`` means a single 10-byte string mapping to or from a single
|
||||||
|
Python byte string, while ``'10c'`` means 10
|
||||||
|
separate one byte character elements (e.g., ``cccccccccc``) mapping
|
||||||
|
to or from ten different Python byte objects. (See :ref:`struct-examples`
|
||||||
|
for a concrete demonstration of the difference.)
|
||||||
|
If a count is not given, it defaults to 1. For packing, the string is
|
||||||
|
truncated or padded with null bytes as appropriate to make it fit. For
|
||||||
|
unpacking, the resulting bytes object always has exactly the specified number
|
||||||
|
of bytes. As a special case, ``'0s'`` means a single, empty string (while
|
||||||
|
``'0c'`` means 0 characters).
|
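Notes (8) and (9) can be checked with a short sketch. The ``'<'`` prefix is used only to keep the output independent of the platform; the byte strings are arbitrary sample data.

```python
import struct

# 'p': a length byte followed by the data, padded to the count.
short_pascal = struct.pack('<5p', b'ab')            # b'\x02ab\x00\x00'
long_pascal = struct.pack('<3p', b'abcdef')         # b'\x02ab' (truncated)
(recovered,) = struct.unpack('<5p', short_pascal)   # b'ab'

# 's': the count is a length, giving one bytes object.
padded = struct.pack('<5s', b'ab')                  # b'ab\x00\x00\x00'
one_item = struct.unpack('<3s', b'abc')             # (b'abc',)

# 'c': the count is a repeat count, giving one object per byte.
three_items = struct.unpack('<3c', b'abc')          # (b'a', b'b', b'c')
```

Note that unpacking ``'s'`` keeps the trailing null bytes, while unpacking ``'p'`` uses the stored length byte to recover the original data.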
||||||
|
|
||||||
A format character may be preceded by an integral repeat count. For example,
|
A format character may be preceded by an integral repeat count. For example,
|
||||||
the format string ``'4h'`` means exactly the same as ``'hhhh'``.
|
the format string ``'4h'`` means exactly the same as ``'hhhh'``.
|
||||||
|
|
@ -301,15 +336,6 @@ the format string ``'4h'`` means exactly the same as ``'hhhh'``.
|
||||||
Whitespace characters between formats are ignored; a count and its format must
|
Whitespace characters between formats are ignored; a count and its format must
|
||||||
not contain whitespace though.
|
not contain whitespace though.
|
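The repeat count and whitespace rules above can be illustrated as follows (a minimal sketch with arbitrary values):

```python
import struct

with_count = struct.pack('<4h', 1, 2, 3, 4)
spelled_out = struct.pack('<hhhh', 1, 2, 3, 4)
with_spaces = struct.pack('<hh hh', 1, 2, 3, 4)   # whitespace between formats

print(with_count == spelled_out == with_spaces)
```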
||||||
|
|
||||||
For the ``'s'`` format character, the count is interpreted as the length of the
|
|
||||||
bytes, not a repeat count like for the other format characters; for example,
|
|
||||||
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
|
|
||||||
If a count is not given, it defaults to 1. For packing, the string is
|
|
||||||
truncated or padded with null bytes as appropriate to make it fit. For
|
|
||||||
unpacking, the resulting bytes object always has exactly the specified number
|
|
||||||
of bytes. As a special case, ``'0s'`` means a single, empty string (while
|
|
||||||
``'0c'`` means 0 characters).
|
|
||||||
|
|
||||||
When packing a value ``x`` using one of the integer formats (``'b'``,
|
When packing a value ``x`` using one of the integer formats (``'b'``,
|
||||||
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
|
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
|
||||||
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
|
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
|
||||||
|
|
@ -319,17 +345,6 @@ then :exc:`struct.error` is raised.
|
||||||
Previously, some of the integer formats wrapped out-of-range values and
|
Previously, some of the integer formats wrapped out-of-range values and
|
||||||
raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.
|
raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.
|
||||||
|
|
||||||
The ``'p'`` format character encodes a "Pascal string", meaning a short
|
|
||||||
variable-length string stored in a *fixed number of bytes*, given by the count.
|
|
||||||
The first byte stored is the length of the string, or 255, whichever is
|
|
||||||
smaller. The bytes of the string follow. If the string passed in to
|
|
||||||
:func:`pack` is too long (longer than the count minus 1), only the leading
|
|
||||||
``count-1`` bytes of the string are stored. If the string is shorter than
|
|
||||||
``count-1``, it is padded with null bytes so that exactly count bytes in all
|
|
||||||
are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
|
|
||||||
``count`` bytes, but that the string returned can never contain more than 255
|
|
||||||
bytes.
|
|
||||||
|
|
||||||
.. index:: single: ? (question mark); in struct format strings
|
.. index:: single: ? (question mark); in struct format strings
|
||||||
|
|
||||||
For the ``'?'`` format character, the return value is either :const:`True` or
|
For the ``'?'`` format character, the return value is either :const:`True` or
|
||||||
|
|
@ -345,18 +360,36 @@ Examples
|
||||||
^^^^^^^^
|
^^^^^^^^
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
All examples assume a native byte order, size, and alignment with a
|
Native byte order examples (designated by the ``'@'`` format prefix or
|
||||||
big-endian machine.
|
lack of any prefix character) may not match what the reader's
|
||||||
|
machine produces as
|
||||||
|
that depends on the platform and compiler.
|
||||||
|
|
||||||
A basic example of packing/unpacking three integers::
|
Pack and unpack integers of three different sizes, using big-endian
|
||||||
|
ordering::
|
||||||
|
|
||||||
>>> from struct import *
|
>>> from struct import *
|
||||||
>>> pack('hhl', 1, 2, 3)
|
>>> pack(">bhl", 1, 2, 3)
|
||||||
b'\x00\x01\x00\x02\x00\x00\x00\x03'
|
b'\x01\x00\x02\x00\x00\x00\x03'
|
||||||
>>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
|
>>> unpack('>bhl', b'\x01\x00\x02\x00\x00\x00\x03')
|
||||||
(1, 2, 3)
|
(1, 2, 3)
|
||||||
>>> calcsize('hhl')
|
>>> calcsize('>bhl')
|
||||||
8
|
7
|
||||||
|
|
||||||
|
Attempt to pack an integer which is too large for the defined field::
|
||||||
|
|
||||||
|
>>> pack(">h", 99999)
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
struct.error: 'h' format requires -32768 <= number <= 32767
|
||||||
|
|
||||||
|
Demonstrate the difference between ``'s'`` and ``'c'`` format
|
||||||
|
characters::
|
||||||
|
|
||||||
|
>>> pack("@ccc", b'1', b'2', b'3')
|
||||||
|
b'123'
|
||||||
|
>>> pack("@3s", b'123')
|
||||||
|
b'123'
|
||||||
|
|
||||||
Unpacked fields can be named by assigning them to variables or by wrapping
|
Unpacked fields can be named by assigning them to variables or by wrapping
|
||||||
the result in a named tuple::
|
the result in a named tuple::
|
||||||
|
|
@ -369,35 +402,132 @@ the result in a named tuple::
|
||||||
>>> Student._make(unpack('<10sHHb', record))
|
>>> Student._make(unpack('<10sHHb', record))
|
||||||
Student(name=b'raymond ', serialnum=4658, school=264, gradelevel=8)
|
Student(name=b'raymond ', serialnum=4658, school=264, gradelevel=8)
|
||||||
|
|
||||||
The ordering of format characters may have an impact on size since the padding
|
The ordering of format characters may have an impact on size in native
|
||||||
needed to satisfy alignment requirements is different::
|
mode since padding is implicit. In standard mode, the user is
|
||||||
|
responsible for inserting any desired padding.
|
||||||
|
Note in
|
||||||
|
the first ``pack`` call below that three NUL bytes were added after the
|
||||||
|
packed ``'#'`` to align the following integer on a four-byte boundary.
|
||||||
|
In this example, the output was produced on a little-endian machine::
|
||||||
|
|
||||||
>>> pack('ci', b'*', 0x12131415)
|
>>> pack('@ci', b'#', 0x12131415)
|
||||||
b'*\x00\x00\x00\x12\x13\x14\x15'
|
b'#\x00\x00\x00\x15\x14\x13\x12'
|
||||||
>>> pack('ic', 0x12131415, b'*')
|
>>> pack('@ic', 0x12131415, b'#')
|
||||||
b'\x12\x13\x14\x15*'
|
b'\x15\x14\x13\x12#'
|
||||||
>>> calcsize('ci')
|
>>> calcsize('@ci')
|
||||||
8
|
8
|
||||||
>>> calcsize('ic')
|
>>> calcsize('@ic')
|
||||||
5
|
5
|
||||||
|
|
||||||
The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
|
The following format ``'llh0l'`` results in two pad bytes being added
|
||||||
longs are aligned on 4-byte boundaries::
|
at the end, assuming the platform's longs are aligned on 4-byte boundaries::
|
||||||
|
|
||||||
>>> pack('llh0l', 1, 2, 3)
|
>>> pack('@llh0l', 1, 2, 3)
|
||||||
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'
|
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'
|
||||||
|
|
||||||
This only works when native size and alignment are in effect; standard size and
|
|
||||||
alignment does not enforce any alignment.
|
|
||||||
|
|
||||||
|
|
||||||
.. seealso::
|
.. seealso::
|
||||||
|
|
||||||
Module :mod:`array`
|
Module :mod:`array`
|
||||||
Packed binary storage of homogeneous data.
|
Packed binary storage of homogeneous data.
|
||||||
|
|
||||||
Module :mod:`xdrlib`
|
Module :mod:`json`
|
||||||
Packing and unpacking of XDR data.
|
JSON encoder and decoder.
|
||||||
|
|
||||||
|
Module :mod:`pickle`
|
||||||
|
Python object serialization.
|
||||||
|
|
||||||
|
|
||||||
|
.. _applications:
|
||||||
|
|
||||||
|
Applications
|
||||||
|
------------
|
||||||
|
|
||||||
|
Two main applications for the :mod:`struct` module exist: data
|
||||||
|
interchange between Python and C code within an application or another
|
||||||
|
application compiled using the same compiler (:ref:`native formats<struct-native-formats>`), and
|
||||||
|
data interchange between applications using agreed upon data layout
|
||||||
|
(:ref:`standard formats<struct-standard-formats>`). Generally speaking, the format strings
|
||||||
|
constructed for these two domains are distinct.
|
||||||
|
|
||||||
|
|
||||||
|
.. _struct-native-formats:
|
||||||
|
|
||||||
|
Native Formats
|
||||||
|
^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
When constructing format strings which mimic native layouts, the
|
||||||
|
compiler and machine architecture determine byte ordering and padding.
|
||||||
|
In such cases, the ``@`` format character should be used to specify
|
||||||
|
native byte ordering and data sizes. Internal pad bytes are normally inserted
|
||||||
|
automatically. It is possible that a zero-repeat format code will be
|
||||||
|
needed at the end of a format string to round up to the correct
|
||||||
|
byte boundary for proper alignment of consecutive chunks of data.
|
||||||
|
|
||||||
|
Consider these two simple examples (on a 64-bit, little-endian
|
||||||
|
machine)::
|
||||||
|
|
||||||
|
>>> calcsize('@lhl')
|
||||||
|
24
|
||||||
|
>>> calcsize('@llh')
|
||||||
|
18
|
||||||
|
|
||||||
|
Data is not padded to an 8-byte boundary at the end of the second
|
||||||
|
format string without the use of extra padding. A zero-repeat format
|
||||||
|
code solves that problem::
|
||||||
|
|
||||||
|
>>> calcsize('@llh0l')
|
||||||
|
24
|
||||||
|
|
||||||
|
The ``'x'`` format code can be used to specify the padding explicitly, but for
|
||||||
|
native formats it is better to use a zero-repeat format like ``'0l'``.
|
||||||
|
|
||||||
|
By default, native byte ordering and alignment is used, but it is
|
||||||
|
better to be explicit and use the ``'@'`` prefix character.
|
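The amount of trailing padding added by a zero-repeat code can be measured rather than assumed. The sketch below prints platform-dependent numbers, so no particular output is claimed.

```python
import struct

unpadded = struct.calcsize('@llh')
padded = struct.calcsize('@llh0l')   # '0l' aligns the end as if a long followed

# Number of trailing pad bytes added by the zero-repeat code
# (depends on the platform's size and alignment for long).
trailing_pad = padded - unpadded

print(unpadded, padded, trailing_pad)
```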
||||||
|
|
||||||
|
|
||||||
|
.. _struct-standard-formats:
|
||||||
|
|
||||||
|
Standard Formats
|
||||||
|
^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
When exchanging data beyond your process such as networking or storage,
|
||||||
|
be precise. Specify the exact byte order, size, and alignment. Do
|
||||||
|
not assume they match the native order of a particular machine.
|
||||||
|
For example, network byte order is big-endian, while many popular CPUs
|
||||||
|
are little-endian. By defining this explicitly, the user need not
|
||||||
|
care about the specifics of the platform their code is running on.
|
||||||
|
The first character should typically be ``<`` or ``>``
|
||||||
|
(or ``!``). Padding is the responsibility of the programmer. The
|
||||||
|
zero-repeat format character won't work. Instead, the user must
|
||||||
|
explicitly add ``'x'`` pad bytes where needed. Revisiting the
|
||||||
|
examples from the previous section, we have::
|
||||||
|
|
||||||
|
>>> calcsize('<qh6xq')
|
||||||
|
24
|
||||||
|
>>> pack('<qh6xq', 1, 2, 3) == pack('@lhl', 1, 2, 3)
|
||||||
|
True
|
||||||
|
>>> calcsize('@llh')
|
||||||
|
18
|
||||||
|
>>> pack('@llh', 1, 2, 3) == pack('<qqh', 1, 2, 3)
|
||||||
|
True
|
||||||
|
>>> calcsize('<qqh6x')
|
||||||
|
24
|
||||||
|
>>> calcsize('@llh0l')
|
||||||
|
24
|
||||||
|
>>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
|
||||||
|
True
|
||||||
|
|
||||||
|
The above results (executed on a 64-bit machine) aren't guaranteed to
|
||||||
|
match when executed on different machines. For example, the examples
|
||||||
|
below were executed on a 32-bit machine::
|
||||||
|
|
||||||
|
>>> calcsize('<qqh6x')
|
||||||
|
24
|
||||||
|
>>> calcsize('@llh0l')
|
||||||
|
12
|
||||||
|
>>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
|
||||||
|
False
|
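Because standard formats have fixed sizes and no implicit alignment, the following sketch should give the same results on any machine. The header layout (message type, flags, payload length) and the name ``HEADER_FORMAT`` are hypothetical and serve only as an example of an agreed-upon data layout.

```python
import struct

# Standard mode: a zero-repeat code adds no padding.
no_effect = struct.calcsize('<llh0l')     # 4 + 4 + 2 = 10

# Explicit 'x' pad bytes are needed instead.
with_pad = struct.calcsize('<llh2x')      # 10 + 2 = 12

# Hypothetical 8-byte record header in network (big-endian) byte order:
# a 2-byte message type, a 2-byte flags field and a 4-byte payload length.
HEADER_FORMAT = '!HHI'
header = struct.pack(HEADER_FORMAT, 1, 0, 512)
msg_type, flags, length = struct.unpack(HEADER_FORMAT, header)

print(no_effect, with_pad, header)
```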
||||||
|
|
||||||
|
|
||||||
.. _struct-objects:
|
.. _struct-objects:
|
||||||
|
|
@ -411,9 +541,9 @@ The :mod:`struct` module also defines the following type:
|
||||||
.. class:: Struct(format)
|
.. class:: Struct(format)
|
||||||
|
|
||||||
Return a new Struct object which writes and reads binary data according to
|
Return a new Struct object which writes and reads binary data according to
|
||||||
the format string *format*. Creating a Struct object once and calling its
|
the format string *format*. Creating a ``Struct`` object once and calling its
|
||||||
methods is more efficient than calling the :mod:`struct` functions with the
|
methods is more efficient than calling module-level functions with the
|
||||||
same format since the format string only needs to be compiled once.
|
same format since the format string is only compiled once.
|
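A minimal sketch of reusing one ``Struct`` object for many records follows. The record layout (two unsigned 16-bit values) and the variable names are assumptions made for illustration.

```python
import struct

# Hypothetical record layout: two unsigned 16-bit values, little-endian.
point = struct.Struct('<HH')

buffer = b''.join(point.pack(x, y) for x, y in [(1, 2), (3, 4), (5, 6)])

# iter_unpack() walks the buffer in steps of point.size bytes.
points = list(point.iter_unpack(buffer))

print(point.size, points)
```

The format string is parsed once when ``point`` is created, and every later ``pack`` or ``iter_unpack`` call reuses that result.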
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
|
|
|
||||||