mirror of
https://github.com/python/cpython.git
synced 2025-07-07 19:35:27 +00:00
gh-134004: Added the reorganize() methods to dbm.sqlite, dbm.dumb and shelve (GH-134028)
They are similar to the same named method in dbm.gnu.
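For orientation, here is a minimal usage sketch of the new method on a `dbm.dumb` database. It assumes `reorganize()` is only present on interpreters that include this commit, so the call is guarded with `hasattr`; the function name `reorganize_demo`, the key names and the record sizes are illustrative and not part of the change.

```python
import dbm.dumb
import os
import tempfile


def reorganize_demo():
    """Return (size_before, size_after) of the .dat file around a reorganize call."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example")
        keys = [f"key{i}".encode("ascii") for i in range(100)]

        # Fill a fresh database, then close it so the size on disk is final.
        with dbm.dumb.open(path, "n") as db:
            for key in keys:
                db[key] = b"x" * 10_000
        size_before = os.path.getsize(path + ".dat")

        # Delete half of the entries, then compact if the method exists.
        with dbm.dumb.open(path, "c") as db:
            for key in keys[:50]:
                del db[key]
            if hasattr(db, "reorganize"):  # assumption: added by this commit
                db.reorganize()
        size_after = os.path.getsize(path + ".dat")
        return size_before, size_after
```

On an interpreter without the method, the data file keeps its size after the deletions; with it, the file should shrink.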
parent
b595237166
commit
f806463e16
9 changed files with 172 additions and 6 deletions
|
@ -15,10 +15,16 @@
|
||||||
* :mod:`dbm.ndbm`
|
* :mod:`dbm.ndbm`
|
||||||
|
|
||||||
If none of these modules are installed, the
|
If none of these modules are installed, the
|
||||||
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
|
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
|
||||||
is a `third party interface <https://www.jcea.es/programacion/pybsddb.htm>`_ to
|
is a `third party interface <https://www.jcea.es/programacion/pybsddb.htm>`_ to
|
||||||
the Oracle Berkeley DB.
|
the Oracle Berkeley DB.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
None of the underlying modules will automatically shrink the disk space used by
|
||||||
|
the database file. However, :mod:`dbm.sqlite3`, :mod:`dbm.gnu` and :mod:`dbm.dumb`
|
||||||
|
provide a :meth:`!reorganize` method that can be used for this purpose.
|
||||||
|
|
||||||
|
|
||||||
.. exception:: error
|
.. exception:: error
|
||||||
|
|
||||||
A tuple containing the exceptions that can be raised by each of the supported
|
A tuple containing the exceptions that can be raised by each of the supported
|
||||||
|
@ -186,6 +192,17 @@ or any other SQLite browser, including the SQLite CLI.
|
||||||
The Unix file access mode of the file (default: octal ``0o666``),
|
The Unix file access mode of the file (default: octal ``0o666``),
|
||||||
used only when the database has to be created.
|
used only when the database has to be created.
|
||||||
|
|
||||||
|
.. method:: sqlite3.reorganize()
|
||||||
|
|
||||||
|
If you have carried out a lot of deletions and would like to shrink the space
|
||||||
|
used on disk, this method will reorganize the database; otherwise, deleted file
|
||||||
|
space will be kept and reused as new (key, value) pairs are added.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
While reorganizing, up to twice the size of the original database is required
|
||||||
|
in free disk space. However, be aware that this factor changes for each :mod:`dbm` submodule.
|
||||||
|
|
||||||
|
.. versionadded:: next
|
||||||
|
|
||||||
:mod:`dbm.gnu` --- GNU database manager
|
:mod:`dbm.gnu` --- GNU database manager
|
||||||
---------------------------------------
|
---------------------------------------
|
||||||
|
@ -284,6 +301,10 @@ functionality like crash tolerance.
|
||||||
reorganization; otherwise, deleted file space will be kept and reused as new
|
reorganization; otherwise, deleted file space will be kept and reused as new
|
||||||
(key, value) pairs are added.
|
(key, value) pairs are added.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
While reorganizing, as much as the full size of the original database is required
|
||||||
|
in free disk space. However, be aware that this factor changes for each :mod:`dbm` submodule.
|
||||||
|
|
||||||
.. method:: gdbm.sync()
|
.. method:: gdbm.sync()
|
||||||
|
|
||||||
When the database has been opened in fast mode, this method forces any
|
When the database has been opened in fast mode, this method forces any
|
||||||
|
@ -438,6 +459,11 @@ The :mod:`!dbm.dumb` module defines the following:
|
||||||
with a sufficiently large/complex entry due to stack depth limitations in
|
with a sufficiently large/complex entry due to stack depth limitations in
|
||||||
Python's AST compiler.
|
Python's AST compiler.
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
:mod:`dbm.dumb` does not support concurrent read/write access. (Multiple
|
||||||
|
simultaneous read accesses are safe.) When a program has the database open
|
||||||
|
for writing, no other program should have it open for reading or writing.
|
||||||
|
|
||||||
.. versionchanged:: 3.5
|
.. versionchanged:: 3.5
|
||||||
:func:`~dbm.dumb.open` always creates a new database when *flag* is ``'n'``.
|
:func:`~dbm.dumb.open` always creates a new database when *flag* is ``'n'``.
|
||||||
|
|
||||||
|
@ -460,3 +486,15 @@ The :mod:`!dbm.dumb` module defines the following:
|
||||||
.. method:: dumbdbm.close()
|
.. method:: dumbdbm.close()
|
||||||
|
|
||||||
Close the database.
|
Close the database.
|
||||||
|
|
||||||
|
.. method:: dumbdbm.reorganize()
|
||||||
|
|
||||||
|
If you have carried out a lot of deletions and would like to shrink the space
|
||||||
|
used on disk, this method will reorganize the database; otherwise, deleted file
|
||||||
|
space will not be reused.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
While reorganizing, no additional free disk space is required. However, be aware
|
||||||
|
that this factor changes for each :mod:`dbm` submodule.
|
||||||
|
|
||||||
|
.. versionadded:: next
|
||||||
|
|
|
@ -75,8 +75,15 @@ Two additional methods are supported:
|
||||||
|
|
||||||
Write back all entries in the cache if the shelf was opened with *writeback*
|
Write back all entries in the cache if the shelf was opened with *writeback*
|
||||||
set to :const:`True`. Also empty the cache and synchronize the persistent
|
set to :const:`True`. Also empty the cache and synchronize the persistent
|
||||||
dictionary on disk, if feasible. This is called automatically when the shelf
|
dictionary on disk, if feasible. This is called automatically when
|
||||||
is closed with :meth:`close`.
|
:meth:`reorganize` is called or the shelf is closed with :meth:`close`.
|
||||||
|
|
||||||
|
.. method:: Shelf.reorganize()
|
||||||
|
|
||||||
|
Calls :meth:`sync` and attempts to shrink space used on disk by removing empty
|
||||||
|
space resulting from deletions.
|
||||||
|
|
||||||
|
.. versionadded:: next
|
||||||
|
|
||||||
.. method:: Shelf.close()
|
.. method:: Shelf.close()
|
||||||
|
|
||||||
|
@ -116,6 +123,11 @@ Restrictions
|
||||||
* On macOS :mod:`dbm.ndbm` can silently corrupt the database file on updates,
|
* On macOS :mod:`dbm.ndbm` can silently corrupt the database file on updates,
|
||||||
which can cause hard crashes when trying to read from the database.
|
which can cause hard crashes when trying to read from the database.
|
||||||
|
|
||||||
|
* :meth:`Shelf.reorganize` may not be available for all database packages and
|
||||||
|
may temporarily increase resource usage (especially disk space) when called.
|
||||||
|
Additionally, it will never run automatically and instead needs to be called
|
||||||
|
explicitly.
|
||||||
|
|
||||||
|
|
||||||
.. class:: Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')
|
.. class:: Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')
|
||||||
|
|
||||||
|
|
|
@ -89,6 +89,14 @@ New modules
|
||||||
Improved modules
|
Improved modules
|
||||||
================
|
================
|
||||||
|
|
||||||
|
dbm
|
||||||
|
---
|
||||||
|
|
||||||
|
* Added new :meth:`!reorganize` methods to :mod:`dbm.dumb` and :mod:`dbm.sqlite3`
|
||||||
|
which allow recovering unused free space previously occupied by deleted entries.
|
||||||
|
(Contributed by Andrea Oliveri in :gh:`134004`.)
|
||||||
|
|
||||||
|
|
||||||
difflib
|
difflib
|
||||||
-------
|
-------
|
||||||
|
|
||||||
|
@ -96,6 +104,15 @@ difflib
|
||||||
class, and migrated the output to the HTML5 standard.
|
class, and migrated the output to the HTML5 standard.
|
||||||
(Contributed by Jiahao Li in :gh:`134580`.)
|
(Contributed by Jiahao Li in :gh:`134580`.)
|
||||||
|
|
||||||
|
|
||||||
|
shelve
|
||||||
|
------
|
||||||
|
|
||||||
|
* Added new :meth:`!reorganize` method to :mod:`shelve` used to recover unused free
|
||||||
|
space previously occupied by deleted entries.
|
||||||
|
(Contributed by Andrea Oliveri in :gh:`134004`.)
|
||||||
|
|
||||||
|
|
||||||
ssl
|
ssl
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
@ -9,7 +9,7 @@ XXX TO DO:
|
||||||
- seems to contain a bug when updating...
|
- seems to contain a bug when updating...
|
||||||
|
|
||||||
- reclaim free space (currently, space once occupied by deleted or expanded
|
- reclaim free space (currently, space once occupied by deleted or expanded
|
||||||
items is never reused)
|
items is not reused unless .reorganize() is called)
|
||||||
|
|
||||||
- support concurrent access (currently, if two processes take turns making
|
- support concurrent access (currently, if two processes take turns making
|
||||||
updates, they can mess up the index)
|
updates, they can mess up the index)
|
||||||
|
@ -17,8 +17,6 @@ updates, they can mess up the index)
|
||||||
- support efficient access to large databases (currently, the whole index
|
- support efficient access to large databases (currently, the whole index
|
||||||
is read when the database is opened, and some updates rewrite the whole index)
|
is read when the database is opened, and some updates rewrite the whole index)
|
||||||
|
|
||||||
- support opening for read-only (flag = 'm')
|
|
||||||
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import ast as _ast
|
import ast as _ast
|
||||||
|
@ -289,6 +287,34 @@ class _Database(collections.abc.MutableMapping):
|
||||||
def __exit__(self, *args):
|
def __exit__(self, *args):
|
||||||
self.close()
|
self.close()
|
||||||
|
|
||||||
|
def reorganize(self):
|
||||||
|
if self._readonly:
|
||||||
|
raise error('The database is opened for reading only')
|
||||||
|
self._verify_open()
|
||||||
|
# Ensure all changes are committed before reorganizing.
|
||||||
|
self._commit()
|
||||||
|
# Open file in r+ to allow changing in-place.
|
||||||
|
with _io.open(self._datfile, 'rb+') as f:
|
||||||
|
reorganize_pos = 0
|
||||||
|
|
||||||
|
# Iterate over existing keys, sorted by starting byte.
|
||||||
|
for key in sorted(self._index, key=lambda k: self._index[k][0]):
|
||||||
|
pos, siz = self._index[key]
|
||||||
|
f.seek(pos)
|
||||||
|
val = f.read(siz)
|
||||||
|
|
||||||
|
f.seek(reorganize_pos)
|
||||||
|
f.write(val)
|
||||||
|
self._index[key] = (reorganize_pos, siz)
|
||||||
|
|
||||||
|
blocks_occupied = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
|
||||||
|
reorganize_pos += blocks_occupied * _BLOCKSIZE
|
||||||
|
|
||||||
|
f.truncate(reorganize_pos)
|
||||||
|
# Commit changes to index, which were not in-place.
|
||||||
|
self._commit()
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
def open(file, flag='c', mode=0o666):
|
def open(file, flag='c', mode=0o666):
|
||||||
"""Open the database file, filename, and return corresponding object.
|
"""Open the database file, filename, and return corresponding object.
|
||||||
|
|
|
@ -15,6 +15,7 @@ LOOKUP_KEY = "SELECT value FROM Dict WHERE key = CAST(? AS BLOB)"
|
||||||
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
|
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
|
||||||
DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
|
DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
|
||||||
ITER_KEYS = "SELECT key FROM Dict"
|
ITER_KEYS = "SELECT key FROM Dict"
|
||||||
|
REORGANIZE = "VACUUM"
|
||||||
|
|
||||||
|
|
||||||
class error(OSError):
|
class error(OSError):
|
||||||
|
@ -122,6 +123,9 @@ class _Database(MutableMapping):
|
||||||
def __exit__(self, *args):
|
def __exit__(self, *args):
|
||||||
self.close()
|
self.close()
|
||||||
|
|
||||||
|
def reorganize(self):
|
||||||
|
self._execute(REORGANIZE)
|
||||||
|
|
||||||
|
|
||||||
def open(filename, /, flag="r", mode=0o666):
|
def open(filename, /, flag="r", mode=0o666):
|
||||||
"""Open a dbm.sqlite3 database and return the dbm object.
|
"""Open a dbm.sqlite3 database and return the dbm object.
|
||||||
|
|
|
@ -171,6 +171,11 @@ class Shelf(collections.abc.MutableMapping):
|
||||||
if hasattr(self.dict, 'sync'):
|
if hasattr(self.dict, 'sync'):
|
||||||
self.dict.sync()
|
self.dict.sync()
|
||||||
|
|
||||||
|
def reorganize(self):
|
||||||
|
self.sync()
|
||||||
|
if hasattr(self.dict, 'reorganize'):
|
||||||
|
self.dict.reorganize()
|
||||||
|
|
||||||
|
|
||||||
class BsdDbShelf(Shelf):
|
class BsdDbShelf(Shelf):
|
||||||
"""Shelf implementation using the "BSD" db interface.
|
"""Shelf implementation using the "BSD" db interface.
|
||||||
|
|
|
@ -135,6 +135,67 @@ class AnyDBMTestCase:
|
||||||
assert(f[key] == b"Python:")
|
assert(f[key] == b"Python:")
|
||||||
f.close()
|
f.close()
|
||||||
|
|
||||||
|
def test_anydbm_readonly_reorganize(self):
|
||||||
|
self.init_db()
|
||||||
|
with dbm.open(_fname, 'r') as d:
|
||||||
|
# Early stopping.
|
||||||
|
if not hasattr(d, 'reorganize'):
|
||||||
|
self.skipTest("method reorganize not available in this dbm submodule")
|
||||||
|
|
||||||
|
self.assertRaises(dbm.error, d.reorganize)
|
||||||
|
|
||||||
|
def test_anydbm_reorganize_not_changed_content(self):
|
||||||
|
self.init_db()
|
||||||
|
with dbm.open(_fname, 'c') as d:
|
||||||
|
# Early stopping.
|
||||||
|
if not hasattr(d, 'reorganize'):
|
||||||
|
self.skipTest("method reorganize not available in this dbm submodule")
|
||||||
|
|
||||||
|
keys_before = sorted(d.keys())
|
||||||
|
values_before = [d[k] for k in keys_before]
|
||||||
|
d.reorganize()
|
||||||
|
keys_after = sorted(d.keys())
|
||||||
|
values_after = [d[k] for k in keys_after]
|
||||||
|
self.assertEqual(keys_before, keys_after)
|
||||||
|
self.assertEqual(values_before, values_after)
|
||||||
|
|
||||||
|
def test_anydbm_reorganize_decreased_size(self):
|
||||||
|
|
||||||
|
def _calculate_db_size(db_path):
|
||||||
|
if os.path.isfile(db_path):
|
||||||
|
return os.path.getsize(db_path)
|
||||||
|
total_size = 0
|
||||||
|
for root, _, filenames in os.walk(db_path):
|
||||||
|
for filename in filenames:
|
||||||
|
file_path = os.path.join(root, filename)
|
||||||
|
total_size += os.path.getsize(file_path)
|
||||||
|
return total_size
|
||||||
|
|
||||||
|
# This test requires relatively large databases to reliably show difference in size before and after reorganizing.
|
||||||
|
with dbm.open(_fname, 'n') as f:
|
||||||
|
# Early stopping.
|
||||||
|
if not hasattr(f, 'reorganize'):
|
||||||
|
self.skipTest("method reorganize not available in this dbm submodule")
|
||||||
|
|
||||||
|
for k in self._dict:
|
||||||
|
f[k.encode('ascii')] = self._dict[k] * 100000
|
||||||
|
db_keys = list(f.keys())
|
||||||
|
|
||||||
|
# Make sure to calculate size of database only after file is closed to ensure file contents are flushed to disk.
|
||||||
|
size_before = _calculate_db_size(os.path.dirname(_fname))
|
||||||
|
|
||||||
|
# Delete some elements from the start of the database.
|
||||||
|
keys_to_delete = db_keys[:len(db_keys) // 2]
|
||||||
|
with dbm.open(_fname, 'c') as f:
|
||||||
|
for k in keys_to_delete:
|
||||||
|
del f[k]
|
||||||
|
f.reorganize()
|
||||||
|
|
||||||
|
# Make sure to calculate size of database only after file is closed to ensure file contents are flushed to disk.
|
||||||
|
size_after = _calculate_db_size(os.path.dirname(_fname))
|
||||||
|
|
||||||
|
self.assertLess(size_after, size_before)
|
||||||
|
|
||||||
def test_open_with_bytes(self):
|
def test_open_with_bytes(self):
|
||||||
dbm.open(os.fsencode(_fname), "c").close()
|
dbm.open(os.fsencode(_fname), "c").close()
|
||||||
|
|
||||||
|
|
|
@ -1365,6 +1365,7 @@ Milan Oberkirch
|
||||||
Pascal Oberndoerfer
|
Pascal Oberndoerfer
|
||||||
Géry Ogam
|
Géry Ogam
|
||||||
Seonkyo Ok
|
Seonkyo Ok
|
||||||
|
Andrea Oliveri
|
||||||
Jeffrey Ollie
|
Jeffrey Ollie
|
||||||
Adam Olsen
|
Adam Olsen
|
||||||
Bryan Olson
|
Bryan Olson
|
||||||
|
|
|
@ -0,0 +1,2 @@
|
||||||
|
:mod:`shelve` as well as underlying :mod:`!dbm.dumb` and :mod:`!dbm.sqlite3` now have :meth:`!reorganize` methods to
|
||||||
|
recover unused free space previously occupied by deleted entries.
|
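The in-place compaction loop that this commit adds to `dbm.dumb` can be sketched in isolation on an in-memory buffer. This is a simplified illustration, not the module's code: the name `compact`, the `BLOCKSIZE` value of 512 and the use of `io.BytesIO` are assumptions made for the example, and it presumes records start on block boundaries and do not overlap.

```python
import io

BLOCKSIZE = 512  # assumed alignment for this sketch


def compact(buf, index, blocksize=BLOCKSIZE):
    """Slide live records toward the start of buf and truncate the tail.

    index maps key -> (pos, size) and is updated in place.
    Returns the new end offset.
    """
    write_pos = 0
    # Visit records in order of their current offset, so a record is never
    # overwritten before it has been read.
    for key in sorted(index, key=lambda k: index[k][0]):
        pos, size = index[key]
        buf.seek(pos)
        data = buf.read(size)

        buf.seek(write_pos)
        buf.write(data)
        index[key] = (write_pos, size)

        # Each record occupies a whole number of blocks.
        blocks = (size + blocksize - 1) // blocksize
        write_pos += blocks * blocksize

    buf.truncate(write_pos)
    return write_pos


# Three records, one per block; "b" has been deleted from the index only.
demo_buf = io.BytesIO(
    b"aaa".ljust(BLOCKSIZE, b"\0") + b"bbbb".ljust(BLOCKSIZE, b"\0") + b"cc"
)
demo_index = {"a": (0, 3), "c": (2 * BLOCKSIZE, 2)}
demo_end = compact(demo_buf, demo_index)
```

Because records are processed by ascending offset and each write position is never past the record's current position, no extra disk space is needed, which matches the note in the `dbm.dumb` documentation hunk.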