Commit graph

41 commits

Author SHA1 Message Date
Andrew Zipperer
a6b610a94b
docs: typo: tiny grammar change: "pointed by" -> "pointed to by" (#118411)
* docs: tiny grammar change: "pointed by" -> "pointed to by"

This commit uses "file pointed to by" to replace "file pointed by" in
 - doc for shutil.copytree
 - docstring for shutil.copytree
 - docstring _abc.PathBase.open
 - docstring for pathlib.Path.open
 - doc for os.copy_file_range
 - doc for os.splice

The docs use "file pointed to by" more frequently than
"file pointed by". So, this commit replaces the uses of
"file pointed by" in order to make the uses consistent
through the docs.

```bash
$ grep -ri 'pointed to by' cpython/
```
yields more results than
```bash
$ grep -ri 'pointed by' cpython/
```

Separately:

There are two occurrences of "tree pointed by":
 - cpython/Doc/library/xml.etree.elementtree.rst for
     `xml.etree.ElementInclude.include`
 - cpython/Lib/xml/etree/ElementInclude.py for `include`

For those uses of "tree pointed by", I expect "tree pointed to by"
instead. However, I found enough uses online of (a) "tree pointed by"
rather than (b) "tree pointed to by" to convince me that (a) is in
common use.

So, this commit does not replace those occurrences of "tree pointed by"
to "tree pointed to by". But I will replace them if a reviewer
believes it is correct to replace them.

* docs: typo: "exists and executable" -> "exists and is executable"

---------

Co-authored-by: Andrew-Zipperer <atzipperer@gmail.com>
2024-05-02 05:37:12 +00:00
Barney Gale
a74f117dab
GH-115060: Speed up pathlib.Path.glob() by omitting initial stat() (#117831)
Since 6258844c, paths that might not exist can be fed into pathlib's
globbing implementation, which will call `os.scandir()` / `os.lstat()` only
when strictly necessary. This allows us to drop an initial `self.is_dir()`
call, which saves a `stat()`.

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2024-04-14 00:08:03 +01:00
Barney Gale
0eb52f5f26
GH-115060: Speed up pathlib.Path.glob() by not scanning literal parts (#117732)
Don't bother calling `os.scandir()` to scan for literal pattern segments,
like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call
through to the next selector with `exists=False`, which signals that the
path might not exist. Subsequent selectors will call `os.scandir()` or
`os.lstat()` to filter out missing paths as needed.
2024-04-12 22:19:21 +01:00
Barney Gale
0cc71bde00
GH-117586: Speed up pathlib.Path.walk() by working with strings (#117726)
Move `pathlib.Path.walk()` implementation into `glob._Globber`. The new
`glob._Globber.walk()` classmethod works with strings internally, which is
a little faster than generating `Path` objects and keeping them normalized.
The `pathlib.Path.walk()` method converts the strings back to path objects.

In the private pathlib ABCs, our existing subclass of `_Globber` ensures
that `PathBase` instances are used throughout.

Follow-up to #117589.
2024-04-11 01:26:53 +01:00
Barney Gale
6258844c27
GH-117586: Speed up pathlib.Path.glob() by working with strings (#117589)
Move pathlib globbing implementation into a new private class: `glob._Globber`. This class implements fast string-based globbing. It's called by `pathlib.Path.glob()`, which then converts strings back to path objects.

In the private pathlib ABCs, add a `pathlib._abc.Globber` subclass that works with `PathBase` objects rather than strings, and calls user-defined path methods like `PathBase.stat()` rather than `os.stat()`.

This sets the stage for two more improvements:

- GH-115060: Query non-wildcard segments with `lstat()`
- GH-116380: Unify `pathlib` and `glob` implementations of globbing.

No change to the implementations of `glob.glob()` and `glob.iglob()`.
2024-04-10 20:43:07 +01:00
Barney Gale
6150bb2412
GH-77609: Add recurse_symlinks argument to pathlib.Path.glob() (#117311)
Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows:

    follow_symlinks  recurse_symlinks
    ===============  ================
    False            N/A
    None             False
    True             True

We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells.

This makes the API a easier to grok by eliminating `None` as an option.

No news blurb as `follow_symlinks` was new in 3.13.
2024-04-05 18:51:54 +00:00
Barney Gale
752e18389e
GH-114575: Rename PurePath.pathmod to PurePath.parser (#116513)
And rename the private base class from `PathModuleBase` to `ParserBase`.
2024-03-31 19:14:48 +01:00
Barney Gale
1dce0073da
pathlib ABCs: follow all symlinks in PathBase.glob() (#116293)
Switch the default value of *follow_symlinks* from `None` to `True` in
`pathlib._abc.PathBase.glob()` and `rglob()`. This speeds up recursive
globbing.

No change to the public pathlib classes.
2024-03-04 02:26:33 +00:00
Barney Gale
e3dedeae7a
GH-114610: Fix pathlib.PurePath.with_stem('') handling of file extensions (#114612)
Raise `ValueError` if `with_stem('')` is called on a path with a file
extension. Paths may only have an empty stem if they also have an empty
suffix.
2024-02-24 19:37:03 +00:00
Barney Gale
6f93b4df92
GH-115060: Speed up pathlib.Path.glob() by removing redundant regex matching (#115061)
When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object. Also skip compiling a pattern when expanding a `*` wildcard segment.
2024-02-10 18:12:34 +00:00
Barney Gale
1b1f8398d0
GH-106747: Make pathlib ABC globbing more consistent with glob.glob() (#115056)
When expanding `**` wildcards, ensure we add a trailing slash to the
topmost directory path. This matches `glob.glob()` behaviour:

    >>> glob.glob('dirA/**', recursive=True)
    ['dirA/', 'dirA/dirB', 'dirA/dirB/dirC']

This does not affect `pathlib.Path.glob()`, because trailing slashes aren't
supported in pathlib proper.
2024-02-06 02:48:18 +00:00
Barney Gale
574291963f
pathlib ABCs: drop partial, broken, untested support for bytes paths. (#114777)
Methods like `full_match()`, `glob()`, etc, are difficult to make work with
byte paths, and it's not worth the effort. This patch makes `PurePathBase`
raise `TypeError` when given non-`str` path segments.
2024-01-31 00:59:33 +00:00
Barney Gale
1667c28686
pathlib ABCs: raise UnsupportedOperation directly. (#114776)
Raise `UnsupportedOperation` directly, rather than via an `_unsupported()`
helper, to give human readers and IDEs/typecheckers/etc a bigger hint that
these methods are abstract.
2024-01-31 00:38:01 +00:00
Barney Gale
809eed4805
GH-114610: Fix pathlib._abc.PurePathBase.with_suffix('.ext') handling of stems (#114613)
Raise `ValueError` if `with_suffix('.ext')` is called on a path without a
stem. Paths may only have a non-empty suffix if they also have a non-empty
stem.

ABC-only bugfix; no effect on public classes.
2024-01-30 14:25:16 +00:00
Barney Gale
823a38a960
GH-79634: Speed up pathlib globbing by removing joinpath() call. (#114623)
Remove `self.joinpath('')` call that should have been removed in 6313cdde.

This makes `PathBase.glob('')` yield itself *without* adding a trailing slash. It's hard to say whether this is more or less correct, but at least everything else is faster, and there's no behaviour change in the public classes where empty glob patterns are disallowed.
2024-01-27 19:59:51 +00:00
Barney Gale
b69548a0f5
GH-73435: Add pathlib.PurePath.full_match() (#114350)
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
2024-01-26 01:12:46 +00:00
Barney Gale
1e610fb05f
GH-113225: Speed up pathlib.Path.walk(top_down=False) (#113693)
Use `_make_child_entry()` rather than `_make_child_relpath()` to retrieve
path objects for directories to visit. This saves the allocation of one
path object per directory in user subclasses of `PathBase`, and avoids a
second loop.

This trick does not apply when walking top-down, because users can affect
the walk by modifying *dirnames* in-place.

A side effect of this change is that, in bottom-up mode, subdirectories of
each directory are visited in reverse order, and that this order doesn't
match that of the names in *dirnames*. I suspect this is fine as the
order is arbitrary anyway.
2024-01-20 03:06:00 +00:00
Barney Gale
6313cdde58
GH-79634: Accept path-like objects as pathlib glob patterns. (#114017)
Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`)

While we're in the area:

- Allow empty glob patterns in `PathBase` (but not `Path`)
- Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory.
- Simplify and speed up handling of rare patterns involving both `**` and `..` segments.
2024-01-20 02:10:25 +00:00
Barney Gale
4de4e654e5
Replace pathlib._abc.PathModuleBase.splitroot() with splitdrive() (#114065)
This allows users of the `pathlib-abc` PyPI package to use `posixpath` or
`ntpath` as a path module in versions of Python lacking
`os.path.splitroot()` (3.11 and before).
2024-01-14 23:06:04 +00:00
Barney Gale
ca6cf56330
Add pathlib._abc.PathModuleBase (#113893)
Path modules provide a subset of the `os.path` API, specifically those
functions needed to provide `PurePathBase` functionality. Each
`PurePathBase` subclass references its path module via a `pathmod` class
attribute.

This commit adds a new `PathModuleBase` class, which provides abstract
methods that unconditionally raise `UnsupportedOperation`. An instance of
this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`.
As a result, `PurePathBase` is no longer POSIX-y by default, and
all its methods raise `UnsupportedOperation` courtesy of `pathmod`.

Users who subclass `PurePathBase` or `PathBase` should choose the path
syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their
own subclass of `PathModuleBase`, as circumstances demand.
2024-01-14 21:49:53 +00:00
Barney Gale
21f83efd10
Add module docstring for pathlib._abc. (#113691) 2024-01-13 08:47:00 +00:00
Barney Gale
f20b151a1c
pathlib ABCs: add _raw_path property (#113976)
It's wrong for the `PurePathBase` methods to rely so much on `__str__()`.
Instead, they should treat the raw path(s) as opaque objects and leave the
details to `pathmod`.

This commit adds a `PurePathBase._raw_path` property and uses it through
many of the other ABC methods. These methods are all redefined in
`PurePath` and `Path`, so this has no effect on the public classes.
2024-01-13 08:03:21 +00:00
Barney Gale
e4ff131e01
GH-44626, GH-105476: Fix ntpath.isabs() handling of part-absolute paths (#113829)
On Windows, `os.path.isabs()` now returns `False` when given a path that
starts with exactly one (back)slash. This is more compatible with other
functions in `os.path`, and with Microsoft's own documentation.

Also adjust `pathlib.PureWindowsPath.is_absolute()` to call
`ntpath.isabs()`, which corrects its handling of partial UNC/device paths
like `//foo`.

Co-authored-by: Jon Foster <jon@jon-foster.co.uk>
2024-01-13 07:36:05 +00:00
Barney Gale
5d8a3e74b5
pathlib ABCs: Require one or more initialiser arguments (#113885)
Refuse to guess what a user means when they initialise a pathlib ABC
without any positional arguments. In mainline pathlib it's normalised to
`.`, but in the ABCs this guess isn't appropriate; for example, the path
type may not represent the current directory as `.`, or may have no concept
of a "current directory" at all.
2024-01-10 01:12:58 +00:00
Barney Gale
beb80d11ec
GH-113528: Deoptimise pathlib._abc.PurePathBase (#113559)
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.

With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`

Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
2024-01-09 23:52:15 +00:00
Barney Gale
cdca0ce0ad
GH-113528: Deoptimise pathlib._abc.PurePathBase.relative_to() (again) (#113882)
Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in 3375dfe and a15a773.

In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in *walk_up* mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects.

Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure.
2024-01-09 23:04:14 +00:00
Barney Gale
5c7bd0e398
GH-113528: Deoptimise pathlib._abc.PurePathBase.parts (#113883)
Implement `parts` using `_stack`, which itself calls `pathmod.split()`
repeatedly. This avoids use of `_tail`, which will be moved to `PurePath`
shortly.
2024-01-09 22:46:50 +00:00
Barney Gale
1092cfb201
GH-113528: Deoptimise pathlib._abc.PathBase.resolve() (#113782)
Replace use of `_from_parsed_parts()` with `with_segments()` in
`resolve()`.

No effect on `Path.resolve()`, which uses `os.path.realpath()`.
2024-01-09 19:50:23 +00:00
Barney Gale
9100fc407e
GH-113528: Deoptimise pathlib._abc.PathBase._make_child_relpath() (#113532)
Call straight through to `joinpath()` in `PathBase._make_child_relpath()`.
Move optimised/caching code to `pathlib.Path._make_child_relpath()`
2024-01-09 19:11:17 +00:00
Barney Gale
b3dba18eab
GH-113528: Speed up pathlib ABC tests. (#113788)
- Add `__slots__` to dummy path classes.
- Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
- Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
2024-01-08 19:31:52 +00:00
Barney Gale
a15a7735e6
GH-113528: Deoptimise pathlib._abc.PurePathBase.relative_to() (#113529)
Replace use of `_from_parsed_parts()` with `with_segments()` in
`PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root`
and `_tail_cached` slots into `PurePath.relative_to()`.
2024-01-06 21:37:38 +00:00
Barney Gale
37bd893a22
GH-113528: Deoptimise pathlib._abc.PurePathBase.parent (#113530)
Replace use of `_from_parsed_parts()` with `with_segments()`, and move
assignments to `_drv`, `_root`, _tail_cached` and `_str` slots into
`PurePath`.
2024-01-06 21:17:51 +00:00
Barney Gale
1e914ad89d
GH-113528: Deoptimise pathlib._abc.PurePathBase.name (#113531)
Replace usage of `_from_parsed_parts()` with `with_segments()` in
`with_name()`, and take a similar approach in `name` for consistency's
sake.
2024-01-06 20:50:25 +00:00
Barney Gale
3375dfed40
GH-113568: Stop raising deprecation warnings from pathlib ABCs (#113757) 2024-01-05 22:56:04 +00:00
Barney Gale
3c4e972d6d
GH-113568: Stop raising auditing events from pathlib ABCs (#113571)
Raise auditing events in `pathlib.Path.glob()`, `rglob()` and `walk()`,
but not in `pathlib._abc.PathBase` methods. Also move generation of a
deprecation warning into `pathlib.Path` so it gets the right stack level.
2024-01-05 21:41:19 +00:00
Barney Gale
b664d91599
GH-113225: Speed up pathlib._abc.PathBase.glob() (#113556)
`PathBase._scandir()` is implemented using `iterdir()`, so we can use its
results directly, rather than passing them through `_make_child_relpath()`.
2023-12-28 22:23:01 +00:00
Barney Gale
1b19d73768
GH-110109: pathlib ABCs: drop use of warnings._deprecated() (#113419)
The `pathlib._abc` module will be made available as a PyPI backport
supporting Python 3.8+. The `warnings._deprecated()` function was only
added last year, and it's private from an external package perspective, so
here we switch to `warnings.warn()` instead.
2023-12-27 15:40:03 +00:00
Barney Gale
f8b6e171ad
GH-110109: pathlib ABCs: drop use of io.text_encoding() (#113417)
Do not use the locale-specific default encoding in `PathBase.read_text()`
and `write_text()`. Locale settings shouldn't influence the operation of
these base classes, which are intended mostly for implementing rich paths
on *nonlocal* filesystems.
2023-12-27 15:32:35 +00:00
Barney Gale
a0d3d3ec9d
GH-110109: pathlib ABCs: do not vary path syntax by host OS. (#113219)
Change the value of `pathlib._abc.PurePathBase.pathmod` from `os.path` to
`posixpath`.

User subclasses of `PurePathBase` and `PathBase` previously used the host
OS's path syntax, e.g. backslashes as separators on Windows. This is wrong
in most use cases, and likely to catch developers out unless they test on
both Windows and non-Windows machines.

In this patch we change the default to POSIX syntax, regardless of OS. This
is somewhat arguable (why not make all aspects of syntax abstract and
individually configurable?) but an improvement all the same.

This change has no effect on `PurePath`, `Path`, nor their subclasses. Only
private APIs are affected.
2023-12-22 18:09:50 +00:00
Barney Gale
237e2cff00
GH-110109: Fix misleading pathlib._abc.PurePathBase repr (#113376)
`PurePathBase.__repr__()` produces a string like `MyPath('/foo')`. This
repr is incorrect/misleading when a subclass's `__init__()` method is
customized, which I expect to be the very common.

This commit moves the `__repr__()` method to `PurePath`, leaving
`PurePathBase` with the default `object` repr.

No user-facing changes because the `pathlib._abc` module remains private.
2023-12-22 15:11:16 +00:00
Barney Gale
a98e7a8112
GH-110109: Move pathlib ABCs to new pathlib._abc module. (#112881)
Move `_PurePathBase` and `_PathBase` to a new `pathlib._abc` module, and
drop the underscores from the class names.

Tests are mostly left alone in this commit, but they'll be similarly split
in a subsequent commit.

The `pathlib._abc` module will be published as an independent PyPI package
(similar to how `zipfile._path` is published as `zipp`), to be refined
and stabilised prior to its possible addition to the standard library.
2023-12-09 16:07:40 +01:00
Renamed from Lib/pathlib.py (Browse further)