gh-132983: Introduce _zstd bindings module (GH-133027)

* Add _zstd module for https://peps.python.org/pep-0784/

This commit introduces the `_zstd` module, with bindings to libzstd from
the pyzstd project. It also includes the unix build system configuration.
Windows build system support will be integrated independently as it
depends on integration with cpython-source-deps.

* Add _zstd to modules

* Fix path for compression.zstd module

* Ignore _zstd module like _io

* Expand module state macros to improve code quality

Also removes module state references from the classes in the _zstd
module and instead uses PyType_GetModuleState()

* Remove backticks suggested in review

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>

* Use critical sections to lock object state

This should avoid races and deadlocks.

* Remove compress/decompress and mark module as not reliant on the GIL

The `compress`/`decompress` functions will be moved to Python code for simplicity.
C implementations can always be re-added in the future.

Also, mark _zstd as not requiring the GIL.

* Lift critical section to avoid clang warning

* Respond to comments by picnixz

* Call out pyzstd explicitly in license description

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

* Use a much more robust implementation...

... for `get_zstd_state_from_type`

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Use PyList_GetItemRef for thread safety purposes

* Use a macro for the minimum supported version

* remove const from primivite types

* Use PyMem_New in another spot

* Simplify error handling in _get_frame_size

* Another simplification of error handling in get_frame_info

* Rename _module_state to mod_state

* Rewrite comment explaining the context of the code

* Add link to pyzstd

* Add TODO about refactoring dict training code

* Use PyModule_AddObjectRef over PyModule_AddObject

PyModule_AddObject is soft-deprecated, so we should use PyModule_AddObjectRef

* Check result of OutputBufferGrow

* Simplify return logic in `add_constant_to_type`

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Ignore return value of _zstd_clear()

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Remove redundant comments

* Remove __reduce__ from ZstdDict

We should instead document that to pickle a dictionary a user should use
the `.dict_content` attribute.

* Use PyUnicode_FromFormat instead of a buffer

* Don't use C constants/types in error messages

* Make error messages easier to understand for Python users

* Lower minimum required version 1.4.0

* Use casts and make slot function signatures correct

* Be consistent with CPython on const usage

* Make else clauses in line with PEP 7

* Fix over-indented blocks in argument clinic

* Add critical section around ZSTD_DCtx_setParameter

* Add a TODO about refactoring critical sections

* Use Py_UNREACHABLE

* Move bytes operations out of Py_BEGIN_ALLOW_THREADS

* Add TODO about ensuring a lock is held

* Remove asserts that may not be correct

* Add TODO to make ZstdDict and others GC objects

* Make objects GC tracked

* Remove unused include

* Fix some memory issues

* Fix refleaks on module and in ZstdDict

* Update configure to check for ZDICT_finalizeDictionary

* Properly check version in configure

* exit(1) if check fails

* Use AC_RUN_IFELSE

* Use a define() to re-use version check

* Actually properly set _zstd module status based on version

---------

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
This commit is contained in:
Emma Smith 2025-05-03 18:29:55 -07:00 committed by GitHub
parent 2bc8365231
commit 3b4333583f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
23 changed files with 4804 additions and 1 deletions

104
Modules/_zstd/buffer.h Normal file
View file

@ -0,0 +1,104 @@
/*
Low level interface to Meta's zstd library for use in the compression.zstd
Python module.
*/
#include "_zstdmodule.h"
#include "pycore_blocks_output_buffer.h"
/* Blocks output buffer wrapper code */
/* Initialize the buffer, and grow the buffer.
Return 0 on success
Return -1 on failure */
static inline int
_OutputBuffer_InitAndGrow(_BlocksOutputBuffer *buffer, ZSTD_outBuffer *ob,
Py_ssize_t max_length)
{
/* Ensure .list was set to NULL */
assert(buffer->list == NULL);
Py_ssize_t res = _BlocksOutputBuffer_InitAndGrow(buffer, max_length, &ob->dst);
if (res < 0) {
return -1;
}
ob->size = (size_t) res;
ob->pos = 0;
return 0;
}
/* Initialize the buffer, with an initial size.
init_size: the initial size.
Return 0 on success
Return -1 on failure */
static inline int
_OutputBuffer_InitWithSize(_BlocksOutputBuffer *buffer, ZSTD_outBuffer *ob,
Py_ssize_t max_length,
Py_ssize_t init_size)
{
Py_ssize_t block_size;
/* Ensure .list was set to NULL */
assert(buffer->list == NULL);
/* Get block size */
if (0 <= max_length && max_length < init_size) {
block_size = max_length;
}
else {
block_size = init_size;
}
Py_ssize_t res = _BlocksOutputBuffer_InitWithSize(buffer, block_size, &ob->dst);
if (res < 0) {
return -1;
}
// Set max_length, InitWithSize doesn't do this
buffer->max_length = max_length;
ob->size = (size_t) res;
ob->pos = 0;
return 0;
}
/* Grow the buffer.
Return 0 on success
Return -1 on failure */
static inline int
_OutputBuffer_Grow(_BlocksOutputBuffer *buffer, ZSTD_outBuffer *ob)
{
assert(ob->pos == ob->size);
Py_ssize_t res = _BlocksOutputBuffer_Grow(buffer, &ob->dst, 0);
if (res < 0) {
return -1;
}
ob->size = (size_t) res;
ob->pos = 0;
return 0;
}
/* Finish the buffer.
Return a bytes object on success
Return NULL on failure */
static inline PyObject *
_OutputBuffer_Finish(_BlocksOutputBuffer *buffer, ZSTD_outBuffer *ob)
{
return _BlocksOutputBuffer_Finish(buffer, ob->size - ob->pos);
}
/* Clean up the buffer */
static inline void
_OutputBuffer_OnError(_BlocksOutputBuffer *buffer)
{
_BlocksOutputBuffer_OnError(buffer);
}
/* Whether the output data has reached max_length.
The avail_out must be 0, please check it before calling. */
static inline int
_OutputBuffer_ReachedMaxLength(_BlocksOutputBuffer *buffer, ZSTD_outBuffer *ob)
{
/* Ensure (data size == allocated size) */
assert(ob->pos == ob->size);
return buffer->allocated == buffer->max_length;
}