mirror of
https://github.com/python/cpython.git
synced 2025-07-19 09:15:34 +00:00
gh-135953: Implement sampling tool under profile.sample (#135998)
Implement a statistical sampling profiler that can profile external Python processes by PID. Uses the _remote_debugging module and converts the results to pstats-compatible format for analysis. Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
parent 35e2c35970
commit 59acdba820
15 changed files with 3319 additions and 74 deletions
@@ -4,7 +4,7 @@
 The Python Profilers
 ********************
 
-**Source code:** :source:`Lib/profile.py` and :source:`Lib/pstats.py`
+**Source code:** :source:`Lib/profile.py`, :source:`Lib/pstats.py`, and :source:`Lib/profile/sample.py`
 
 --------------
 
@@ -14,23 +14,32 @@ Introduction to the profilers
 =============================
 
 .. index::
+   single: statistical profiling
+   single: profiling, statistical
    single: deterministic profiling
    single: profiling, deterministic
 
-:mod:`cProfile` and :mod:`profile` provide :dfn:`deterministic profiling` of
+Python provides both :dfn:`statistical profiling` and :dfn:`deterministic profiling` of
 Python programs. A :dfn:`profile` is a set of statistics that describes how
 often and for how long various parts of the program executed. These statistics
 can be formatted into reports via the :mod:`pstats` module.
 
-The Python standard library provides two different implementations of the same
-profiling interface:
+The Python standard library provides three different profiling implementations:
 
-1. :mod:`cProfile` is recommended for most users; it's a C extension with
+**Statistical Profiler:**
+
+1. :mod:`profile.sample` provides statistical profiling of running Python processes
+   using periodic stack sampling. It can attach to any running Python process without
+   requiring code modification or restart, making it ideal for production debugging.
+
+**Deterministic Profilers:**
+
+2. :mod:`cProfile` is recommended for development and testing; it's a C extension with
    reasonable overhead that makes it suitable for profiling long-running
    programs. Based on :mod:`lsprof`, contributed by Brett Rosen and Ted
    Czotter.
 
-2. :mod:`profile`, a pure Python module whose interface is imitated by
+3. :mod:`profile`, a pure Python module whose interface is imitated by
    :mod:`cProfile`, but which adds significant overhead to profiled programs.
    If you're trying to extend the profiler in some way, the task might be easier
    with this module. Originally designed and written by Jim Roskind.
@@ -44,6 +53,77 @@ profiling interface:
 but not for C-level functions, and so the C code would seem faster than any
 Python one.
 
+**Profiler Comparison:**
+
++-------------------+----------------------+----------------------+----------------------+
+| Feature           | Statistical          | Deterministic        | Deterministic        |
+|                   | (``profile.sample``) | (``cProfile``)       | (``profile``)        |
++===================+======================+======================+======================+
+| **Target**        | Running process      | Code you run         | Code you run         |
++-------------------+----------------------+----------------------+----------------------+
+| **Overhead**      | Virtually none       | Moderate             | High                 |
++-------------------+----------------------+----------------------+----------------------+
+| **Accuracy**      | Statistical approx.  | Exact call counts    | Exact call counts    |
++-------------------+----------------------+----------------------+----------------------+
+| **Setup**         | Attach to any PID    | Instrument code      | Instrument code      |
++-------------------+----------------------+----------------------+----------------------+
+| **Use Case**      | Production debugging | Development/testing  | Profiler extension   |
++-------------------+----------------------+----------------------+----------------------+
+| **Implementation**| C extension          | C extension          | Pure Python          |
++-------------------+----------------------+----------------------+----------------------+
+
+.. note::
+
+   The statistical profiler (:mod:`profile.sample`) is recommended for most production
+   use cases due to its extremely low overhead and ability to profile running processes
+   without modification. It can attach to any Python process and collect performance
+   data with minimal impact on execution speed, making it ideal for debugging
+   performance issues in live applications.
+
+
+.. _statistical-profiling:
+
+What Is Statistical Profiling?
+==============================
+
+:dfn:`Statistical profiling` works by periodically interrupting a running
+program to capture its current call stack. Rather than monitoring every
+function entry and exit like deterministic profilers, it takes snapshots at
+regular intervals to build a statistical picture of where the program spends
+its time.
+
+The sampling profiler uses process memory reading (via system calls like
+``process_vm_readv`` on Linux, ``vm_read`` on macOS, and ``ReadProcessMemory`` on
+Windows) to attach to a running Python process and extract stack trace
+information without requiring any code modification or restart of the target
+process. This approach provides several key advantages over traditional
+profiling methods.
+
+The fundamental principle is that if a function appears frequently in the
+collected stack samples, it is likely consuming significant CPU time. By
+analyzing thousands of samples, the profiler can accurately estimate the
+relative time spent in different parts of the program. The statistical nature
+means that while individual measurements may vary, the aggregate results
+converge to represent the true performance characteristics of the application.
+
+Since statistical profiling operates externally to the target process, it
+introduces virtually no overhead to the running program. The profiler process
+runs separately and reads the target process memory without interrupting its
+execution. This makes it suitable for profiling production systems where
+performance impact must be minimized.
+
+The accuracy of statistical profiling improves with the number of samples
+collected. Short-lived functions may be missed or underrepresented, while
+long-running functions will be captured proportionally to their execution time.
+This characteristic makes statistical profiling particularly effective for
+identifying the most significant performance bottlenecks rather than providing
+exhaustive coverage of all function calls.
+
+Statistical profiling excels at answering questions like "which functions
+consume the most CPU time?" and "where should I focus optimization efforts?"
+rather than "exactly how many times was this function called?" The trade-off
+between precision and practicality makes it an invaluable tool for performance
+analysis in real-world applications.
+
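To make the estimation principle concrete, here is a minimal illustrative sketch of how per-function sample counts become time estimates; the stacks, function names, and sampling interval below are invented for the example:

   # Illustrative sketch only: estimate per-function time from stack samples.
   # Each sample is a call stack with the currently executing function last.
   from collections import Counter

   SAMPLE_INTERVAL_SEC = 100 / 1_000_000  # assume the default 100 microsecond interval

   samples = [
       ("main", "process_data"),
       ("main", "process_data"),
       ("main", "parse"),
   ]

   direct = Counter(stack[-1] for stack in samples)                   # directly executing
   cumulative = Counter(f for stack in samples for f in set(stack))   # anywhere on the stack

   for func in cumulative:
       tottime = direct[func] * SAMPLE_INTERVAL_SEC
       cumtime = cumulative[func] * SAMPLE_INTERVAL_SEC
       print(f"{func}: tottime~{tottime:.6f}s cumtime~{cumtime:.6f}s")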
 .. _profile-instant:
 
@@ -54,6 +134,18 @@ This section is provided for users that "don't want to read the manual." It
 provides a very brief overview, and allows a user to rapidly perform profiling
 on an existing application.
 
+**Statistical Profiling (Recommended for Production):**
+
+To profile an existing running process::
+
+   python -m profile.sample 1234
+
+To profile with custom settings::
+
+   python -m profile.sample -i 50 -d 30 1234
+
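If you need a process to try the profiler on, any long-running Python program will do; the following throwaway busy loop (the ``busy.py`` name is just an illustration) prints its PID so you can attach to it:

   # busy.py -- illustrative target process for the sampling profiler.
   # Run it, note the printed PID, then: python -m profile.sample <PID>
   import os

   def spin():
       total = 0
       for i in range(10**8):
           total += i * i
       return total

   if __name__ == "__main__":
       print(f"PID: {os.getpid()}")
       while True:
           spin()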
+**Deterministic Profiling (Development/Testing):**
+
 To profile a function that takes a single argument, you can do::
 
    import cProfile
@@ -121,8 +213,208 @@ results to a file by specifying a filename to the :func:`run` function::
 The :class:`pstats.Stats` class reads profile results from a file and formats
 them in various ways.
 
+.. _sampling-profiler-cli:
+
+Statistical Profiler Command Line Interface
+===========================================
+
+.. program:: profile.sample
+
+The :mod:`profile.sample` module can be invoked as a script to profile running processes::
+
+   python -m profile.sample [options] PID
+
+**Basic Usage Examples:**
+
+Profile process 1234 for 10 seconds with default settings::
+
+   python -m profile.sample 1234
+
+Profile with custom interval and duration, save to file::
+
+   python -m profile.sample -i 50 -d 30 -o profile.stats 1234
+
+Generate collapsed stacks to use with tools like `flamegraph.pl
+<https://github.com/brendangregg/FlameGraph>`_::
+
+   python -m profile.sample --collapsed 1234
+
+Profile all threads, sort by total time::
+
+   python -m profile.sample -a --sort-tottime 1234
+
+Profile with real-time sampling statistics::
+
+   python -m profile.sample --realtime-stats 1234
+
+**Command Line Options:**
+
+.. option:: PID
+
+   Process ID of the Python process to profile (required)
+
+.. option:: -i, --interval INTERVAL
+
+   Sampling interval in microseconds (default: 100)
+
+.. option:: -d, --duration DURATION
+
+   Sampling duration in seconds (default: 10)
+
+.. option:: -a, --all-threads
+
+   Sample all threads in the process instead of just the main thread
+
+.. option:: --realtime-stats
+
+   Print real-time sampling statistics during profiling
+
+.. option:: --pstats
+
+   Generate pstats output (default)
+
+.. option:: --collapsed
+
+   Generate collapsed stack traces for flamegraphs
+
+.. option:: -o, --outfile OUTFILE
+
+   Save output to a file
+
+**Sorting Options (pstats format only):**
+
+.. option:: --sort-nsamples
+
+   Sort by number of direct samples
+
+.. option:: --sort-tottime
+
+   Sort by total time
+
+.. option:: --sort-cumtime
+
+   Sort by cumulative time (default)
+
+.. option:: --sort-sample-pct
+
+   Sort by sample percentage
+
+.. option:: --sort-cumul-pct
+
+   Sort by cumulative sample percentage
+
+.. option:: --sort-nsamples-cumul
+
+   Sort by cumulative samples
+
+.. option:: --sort-name
+
+   Sort by function name
+
+.. option:: -l, --limit LIMIT
+
+   Limit the number of rows in the output (default: 15)
+
+.. option:: --no-summary
+
+   Disable the summary section in the output
+
+**Understanding Statistical Profile Output:**
+
+The statistical profiler produces output similar to deterministic profilers but with different column meanings::
+
+   Profile Stats:
+      nsamples  sample%  tottime (ms)  cumul%  cumtime (ms)  filename:lineno(function)
+         45/67     12.5        23.450    18.6        56.780  mymodule.py:42(process_data)
+         23/23      6.4        15.230     6.4        15.230  <built-in>:0(len)
+
+**Column Meanings:**
+
+- **nsamples**: ``direct/cumulative`` - Times function was directly executing / on call stack
+- **sample%**: Percentage of total samples where function was directly executing
+- **tottime**: Estimated time spent directly in this function
+- **cumul%**: Percentage of samples where function was anywhere on call stack
+- **cumtime**: Estimated cumulative time including called functions
+- **filename:lineno(function)**: Location and name of the function
+
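The collapsed output is plain text with one line per unique stack followed by its sample count, which is the input format expected by ``flamegraph.pl``. A typical pipeline, assuming the script has been downloaded from the FlameGraph repository linked above, looks like::

   python -m profile.sample --collapsed -o profile.collapsed 1234
   ./flamegraph.pl profile.collapsed > profile.svg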
 .. _profile-cli:
 
+:mod:`profile.sample` Module Reference
+======================================
+
+.. module:: profile.sample
+   :synopsis: Python statistical profiler.
+
+This section documents the programmatic interface for the :mod:`profile.sample` module.
+For command-line usage, see :ref:`sampling-profiler-cli`. For conceptual information
+about statistical profiling, see :ref:`statistical-profiling`.
+
+.. function:: sample(pid, *, sort=2, sample_interval_usec=100, duration_sec=10, filename=None, all_threads=False, limit=None, show_summary=True, output_format="pstats", realtime_stats=False)
+
+   Sample a Python process and generate profiling data.
+
+   This is the main entry point for statistical profiling. It creates a
+   :class:`SampleProfiler`, collects stack traces from the target process, and
+   outputs the results in the specified format.
+
+   :param int pid: Process ID of the target Python process
+   :param int sort: Sort order for pstats output (default: 2 for cumulative time)
+   :param int sample_interval_usec: Sampling interval in microseconds (default: 100)
+   :param int duration_sec: Duration to sample in seconds (default: 10)
+   :param str filename: Output filename (None for stdout/default naming)
+   :param bool all_threads: Whether to sample all threads (default: False)
+   :param int limit: Maximum number of functions to display (default: None)
+   :param bool show_summary: Whether to show summary statistics (default: True)
+   :param str output_format: Output format - 'pstats' or 'collapsed' (default: 'pstats')
+   :param bool realtime_stats: Whether to display real-time statistics (default: False)
+
+   :raises ValueError: If output_format is not 'pstats' or 'collapsed'
+
+   Examples::
+
+      # Basic usage - profile process 1234 for 10 seconds
+      import profile.sample
+      profile.sample.sample(1234)
+
+      # Profile with custom settings
+      profile.sample.sample(1234, duration_sec=30, sample_interval_usec=50, all_threads=True)
+
+      # Generate collapsed stack traces for flamegraph.pl
+      profile.sample.sample(1234, output_format='collapsed', filename='profile.collapsed')
+
+.. class:: SampleProfiler(pid, sample_interval_usec, all_threads)
+
+   Low-level API for the statistical profiler.
+
+   This profiler uses periodic stack sampling to collect performance data
+   from running Python processes with minimal overhead. It can attach to
+   any Python process by PID and collect stack traces at regular intervals.
+
+   :param int pid: Process ID of the target Python process
+   :param int sample_interval_usec: Sampling interval in microseconds
+   :param bool all_threads: Whether to sample all threads or just the main thread
+
+   .. method:: sample(collector, duration_sec=10)
+
+      Sample the target process for the specified duration.
+
+      Collects stack traces from the target process at regular intervals
+      and passes them to the provided collector for processing.
+
+      :param collector: Object that implements ``collect()`` method to process stack traces
+      :param int duration_sec: Duration to sample in seconds (default: 10)
+
+      The method tracks sampling statistics and can display real-time
+      information if ``realtime_stats`` is enabled.
+
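For finer control than :func:`sample` offers, the pieces can be wired together directly. A minimal sketch using the classes added in this commit (the PID and output filename are placeholders):

   # Illustrative: drive the low-level API directly instead of sample().
   from profile.sample import SampleProfiler
   from profile.pstats_collector import PstatsCollector

   interval_usec = 100
   profiler = SampleProfiler(pid=1234, sample_interval_usec=interval_usec,
                             all_threads=False)
   collector = PstatsCollector(sample_interval_usec=interval_usec)

   profiler.sample(collector, duration_sec=5)   # collect for 5 seconds
   collector.export("profile.stats")            # marshal dump, pstats-compatible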
+.. seealso::
+
+   :ref:`sampling-profiler-cli`
+      Command-line interface documentation for the statistical profiler.
+
+Deterministic Profiler Command Line Interface
+=============================================
+
 .. program:: cProfile
 
 The files :mod:`cProfile` and :mod:`profile` can also be invoked as a script to
@@ -564,7 +856,7 @@ What Is Deterministic Profiling?
 call*, *function return*, and *exception* events are monitored, and precise
 timings are made for the intervals between these events (during which time the
 user's code is executing). In contrast, :dfn:`statistical profiling` (which is
-not done by this module) randomly samples the effective instruction pointer, and
+provided by the :mod:`profile.sample` module) periodically samples the effective instruction pointer, and
 deduces where time is being spent. The latter technique traditionally involves
 less overhead (as the code does not need to be instrumented), but provides only
 relative indications of where time is being spent.
@@ -70,6 +70,103 @@ Summary --- release highlights
 New features
 ============
 
+.. _whatsnew315-sampling-profiler:
+
+High frequency statistical sampling profiler
+--------------------------------------------
+
+A new statistical sampling profiler has been added to the :mod:`profile` module as
+:mod:`profile.sample`. This profiler enables low-overhead performance analysis of
+running Python processes without requiring code modification or process restart.
+
+Unlike deterministic profilers (:mod:`cProfile` and :mod:`profile`) that instrument
+every function call, the sampling profiler periodically captures stack traces from
+running processes. This approach provides virtually zero overhead while achieving
+sampling rates of **up to 200,000 Hz**, making it the fastest sampling profiler
+available for Python (at the time of its contribution) and ideal for debugging
+performance issues in production environments.
+
+Key features include:
+
+* **Zero-overhead profiling**: Attach to any running Python process without
+  affecting its performance
+* **No code modification required**: Profile existing applications without restart
+* **Real-time statistics**: Monitor sampling quality during data collection
+* **Multiple output formats**: Generate both detailed statistics and flamegraph data
+* **Thread-aware profiling**: Option to profile all threads or just the main thread
+
+Profile process 1234 for 10 seconds with default settings::
+
+   python -m profile.sample 1234
+
+Profile with custom interval and duration, save to file::
+
+   python -m profile.sample -i 50 -d 30 -o profile.stats 1234
+
+Generate collapsed stacks for flamegraph::
+
+   python -m profile.sample --collapsed 1234
+
+Profile all threads and sort by total time::
+
+   python -m profile.sample -a --sort-tottime 1234
+
+The profiler generates statistical estimates of where time is spent::
+
+   Real-time sampling stats: Mean: 100261.5Hz (9.97µs) Min: 86333.4Hz (11.58µs) Max: 118807.2Hz (8.42µs) Samples: 400001
+   Captured 498841 samples in 5.00 seconds
+   Sample rate: 99768.04 samples/sec
+   Error rate: 0.72%
+
+   Profile Stats:
+          nsamples  sample%  tottime (s)  cumul%  cumtime (s)  filename:lineno(function)
+         43/418858      0.0        0.000    87.9        4.189  case.py:667(TestCase.run)
+       3293/418812      0.7        0.033    87.9        4.188  case.py:613(TestCase._callTestMethod)
+     158562/158562     33.3        1.586    33.3        1.586  test_compile.py:725(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
+     129553/129553     27.2        1.296    27.2        1.296  ast.py:46(parse)
+          0/128129      0.0        0.000    26.9        1.281  test_ast.py:884(AST_Tests.test_ast_recursion_limit.<locals>.check_limit)
+           7/67446      0.0        0.000    14.2        0.674  test_compile.py:729(TestSpecifics.test_compiler_recursion_limit)
+           6/60380      0.0        0.000    12.7        0.604  test_ast.py:888(AST_Tests.test_ast_recursion_limit)
+           3/50020      0.0        0.000    10.5        0.500  test_compile.py:727(TestSpecifics.test_compiler_recursion_limit)
+           1/38011      0.0        0.000     8.0        0.380  test_ast.py:886(AST_Tests.test_ast_recursion_limit)
+           1/25076      0.0        0.000     5.3        0.251  test_compile.py:728(TestSpecifics.test_compiler_recursion_limit)
+       22361/22362      4.7        0.224     4.7        0.224  test_compile.py:1368(TestSpecifics.test_big_dict_literal)
+           4/18008      0.0        0.000     3.8        0.180  test_ast.py:889(AST_Tests.test_ast_recursion_limit)
+          11/17696      0.0        0.000     3.7        0.177  subprocess.py:1038(Popen.__init__)
+       16968/16968      3.6        0.170     3.6        0.170  subprocess.py:1900(Popen._execute_child)
+           2/16941      0.0        0.000     3.6        0.169  test_compile.py:730(TestSpecifics.test_compiler_recursion_limit)
+
+   Legend:
+     nsamples: Direct/Cumulative samples (direct executing / on call stack)
+     sample%: Percentage of total samples this function was directly executing
+     tottime: Estimated total time spent directly in this function
+     cumul%: Percentage of total samples when this function was on the call stack
+     cumtime: Estimated cumulative time (including time in called functions)
+     filename:lineno(function): Function location and name
+
+   Summary of Interesting Functions:
+
+   Functions with Highest Direct/Cumulative Ratio (Hot Spots):
+     1.000 direct/cumulative ratio, 33.3% direct samples: test_compile.py:(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
+     1.000 direct/cumulative ratio, 27.2% direct samples: ast.py:(parse)
+     1.000 direct/cumulative ratio, 3.6% direct samples: subprocess.py:(Popen._execute_child)
+
+   Functions with Highest Call Frequency (Indirect Calls):
+     418815 indirect calls, 87.9% total stack presence: case.py:(TestCase.run)
+     415519 indirect calls, 87.9% total stack presence: case.py:(TestCase._callTestMethod)
+     159470 indirect calls, 33.5% total stack presence: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
+
+   Functions with Highest Call Magnification (Cumulative/Direct):
+     12267.9x call magnification, 159470 indirect calls from 13 direct: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
+     10581.7x call magnification, 116388 indirect calls from 11 direct: test_ast.py:(AST_Tests.test_ast_recursion_limit)
+     9740.9x call magnification, 418815 indirect calls from 43 direct: case.py:(TestCase.run)
+
+The profiler automatically identifies performance bottlenecks through statistical
+analysis, highlighting functions with high CPU usage and call frequency patterns.
+
+This capability is particularly valuable for debugging performance issues in
+production systems where traditional profiling approaches would be too intrusive.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953`.)
+
+
 Other language changes
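The time columns in this output are plain arithmetic over sample counts: each direct sample contributes one sampling interval of estimated time (the same computation done by ``PstatsCollector.create_stats`` later in this commit). An illustrative check against the ``check_limit`` row above, assuming the 10 microsecond interval implied by the ~100 kHz rate shown:

   sample_interval_sec = 10 / 1_000_000         # inferred from the ~100 kHz rate
   direct_samples = 158562                      # nsamples (direct) for check_limit
   print(direct_samples * sample_interval_sec)  # 1.58562 -> reported as 1.586 s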
Lib/profile/__init__.py (new file, 6 lines)

@@ -0,0 +1,6 @@
+from .profile import run
+from .profile import runctx
+from .profile import Profile
+from .profile import _Utils
+
+__all__ = ['run', 'runctx', 'Profile']
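These re-exports keep the long-standing module-level API importable from the new package layout, so existing code keeps working unchanged, for example:

   import profile

   # Same entry point as the old flat Lib/profile.py module.
   profile.run("sum(i * i for i in range(100000))")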
Lib/profile/__main__.py (new file, 69 lines)

@@ -0,0 +1,69 @@
+import io
+import importlib.machinery
+import os
+import sys
+from optparse import OptionParser
+
+from .profile import runctx
+
+
+def main():
+    usage = "profile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ..."
+    parser = OptionParser(usage=usage)
+    parser.allow_interspersed_args = False
+    parser.add_option('-o', '--outfile', dest="outfile",
+        help="Save stats to <outfile>", default=None)
+    parser.add_option('-m', dest="module", action="store_true",
+        help="Profile a library module.", default=False)
+    parser.add_option('-s', '--sort', dest="sort",
+        help="Sort order when printing to stdout, based on pstats.Stats class",
+        default=-1)
+
+    if not sys.argv[1:]:
+        parser.print_usage()
+        sys.exit(2)
+
+    (options, args) = parser.parse_args()
+    sys.argv[:] = args
+
+    # The script that we're profiling may chdir, so capture the absolute path
+    # to the output file at startup.
+    if options.outfile is not None:
+        options.outfile = os.path.abspath(options.outfile)
+
+    if len(args) > 0:
+        if options.module:
+            import runpy
+            code = "run_module(modname, run_name='__main__')"
+            globs = {
+                'run_module': runpy.run_module,
+                'modname': args[0]
+            }
+        else:
+            progname = args[0]
+            sys.path.insert(0, os.path.dirname(progname))
+            with io.open_code(progname) as fp:
+                code = compile(fp.read(), progname, 'exec')
+            spec = importlib.machinery.ModuleSpec(name='__main__', loader=None,
+                                                  origin=progname)
+            globs = {
+                '__spec__': spec,
+                '__file__': spec.origin,
+                '__name__': spec.name,
+                '__package__': None,
+                '__cached__': None,
+            }
+        try:
+            runctx(code, globs, None, options.outfile, options.sort)
+        except BrokenPipeError as exc:
+            # Prevent "Exception ignored" during interpreter shutdown.
+            sys.stdout = None
+            sys.exit(exc.errno)
+    else:
+        parser.print_usage()
+    return parser
+
+# When invoked as main program, invoke the profiler on a script
+if __name__ == '__main__':
+    main()
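This preserves the familiar command-line entry point under the package layout, e.g. (``myscript.py`` standing in for your own script)::

   python -m profile -o out.stats myscript.py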
Lib/profile/collector.py (new file, 11 lines)

@@ -0,0 +1,11 @@
+from abc import ABC, abstractmethod
+
+
+class Collector(ABC):
+    @abstractmethod
+    def collect(self, stack_frames):
+        """Collect profiling data from stack frames."""
+
+    @abstractmethod
+    def export(self, filename):
+        """Export collected data to a file."""
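Any object implementing this interface can be handed to ``SampleProfiler.sample()``. As a hypothetical illustration, a collector that only counts how often each leaf function was directly executing:

   # Hypothetical Collector subclass: count directly-executing leaf functions.
   import collections

   from profile.collector import Collector

   class LeafCountCollector(Collector):
       def __init__(self):
           self.counts = collections.Counter()

       def collect(self, stack_frames):
           # stack_frames is an iterable of (thread_id, frames) pairs, with the
           # topmost (currently executing) frame first, as consumed by
           # PstatsCollector below.
           for _thread_id, frames in stack_frames:
               if frames:
                   top = frames[0]
                   self.counts[(top.filename, top.lineno, top.funcname)] += 1

       def export(self, filename):
           with open(filename, "w") as f:
               for (fname, lineno, func), n in self.counts.most_common():
                   f.write(f"{n}\t{fname}:{lineno}({func})\n")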
@@ -550,66 +550,3 @@ class Profile:
         return mean
 
 #****************************************************************************
-
-def main():
-    import os
-    from optparse import OptionParser
-
-    usage = "profile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ..."
-    parser = OptionParser(usage=usage)
-    parser.allow_interspersed_args = False
-    parser.add_option('-o', '--outfile', dest="outfile",
-        help="Save stats to <outfile>", default=None)
-    parser.add_option('-m', dest="module", action="store_true",
-        help="Profile a library module.", default=False)
-    parser.add_option('-s', '--sort', dest="sort",
-        help="Sort order when printing to stdout, based on pstats.Stats class",
-        default=-1)
-
-    if not sys.argv[1:]:
-        parser.print_usage()
-        sys.exit(2)
-
-    (options, args) = parser.parse_args()
-    sys.argv[:] = args
-
-    # The script that we're profiling may chdir, so capture the absolute path
-    # to the output file at startup.
-    if options.outfile is not None:
-        options.outfile = os.path.abspath(options.outfile)
-
-    if len(args) > 0:
-        if options.module:
-            import runpy
-            code = "run_module(modname, run_name='__main__')"
-            globs = {
-                'run_module': runpy.run_module,
-                'modname': args[0]
-            }
-        else:
-            progname = args[0]
-            sys.path.insert(0, os.path.dirname(progname))
-            with io.open_code(progname) as fp:
-                code = compile(fp.read(), progname, 'exec')
-            spec = importlib.machinery.ModuleSpec(name='__main__', loader=None,
-                                                  origin=progname)
-            globs = {
-                '__spec__': spec,
-                '__file__': spec.origin,
-                '__name__': spec.name,
-                '__package__': None,
-                '__cached__': None,
-            }
-        try:
-            runctx(code, globs, None, options.outfile, options.sort)
-        except BrokenPipeError as exc:
-            # Prevent "Exception ignored" during interpreter shutdown.
-            sys.stdout = None
-            sys.exit(exc.errno)
-    else:
-        parser.print_usage()
-    return parser
-
-# When invoked as main program, invoke the profiler on a script
-if __name__ == '__main__':
-    main()
Lib/profile/pstats_collector.py (new file, 81 lines)

@@ -0,0 +1,81 @@
+import collections
+import marshal
+
+from .collector import Collector
+
+
+class PstatsCollector(Collector):
+    def __init__(self, sample_interval_usec):
+        self.result = collections.defaultdict(
+            lambda: dict(total_rec_calls=0, direct_calls=0, cumulative_calls=0)
+        )
+        self.stats = {}
+        self.sample_interval_usec = sample_interval_usec
+        self.callers = collections.defaultdict(
+            lambda: collections.defaultdict(int)
+        )
+
+    def collect(self, stack_frames):
+        for thread_id, frames in stack_frames:
+            if not frames:
+                continue
+
+            # Process each frame in the stack to track cumulative calls
+            for frame in frames:
+                location = (frame.filename, frame.lineno, frame.funcname)
+                self.result[location]["cumulative_calls"] += 1
+
+            # The top frame gets counted as an inline call (directly executing)
+            top_frame = frames[0]
+            top_location = (
+                top_frame.filename,
+                top_frame.lineno,
+                top_frame.funcname,
+            )
+
+            self.result[top_location]["direct_calls"] += 1
+
+            # Track caller-callee relationships for call graph
+            for i in range(1, len(frames)):
+                callee_frame = frames[i - 1]
+                caller_frame = frames[i]
+
+                callee = (
+                    callee_frame.filename,
+                    callee_frame.lineno,
+                    callee_frame.funcname,
+                )
+                caller = (
+                    caller_frame.filename,
+                    caller_frame.lineno,
+                    caller_frame.funcname,
+                )
+
+                self.callers[callee][caller] += 1
+
+    def export(self, filename):
+        self.create_stats()
+        self._dump_stats(filename)
+
+    def _dump_stats(self, file):
+        stats_with_marker = dict(self.stats)
+        stats_with_marker[("__sampled__",)] = True
+        with open(file, "wb") as f:
+            marshal.dump(stats_with_marker, f)
+
+    # Needed for compatibility with pstats.Stats
+    def create_stats(self):
+        sample_interval_sec = self.sample_interval_usec / 1_000_000
+        callers = {}
+        for fname, call_counts in self.result.items():
+            total = call_counts["direct_calls"] * sample_interval_sec
+            cumulative_calls = call_counts["cumulative_calls"]
+            cumulative = cumulative_calls * sample_interval_sec
+            callers = dict(self.callers.get(fname, {}))
+            self.stats[fname] = (
+                call_counts["direct_calls"],  # cc = direct calls for sample percentage
+                cumulative_calls,  # nc = cumulative calls for cumulative percentage
+                total,
+                cumulative,
+                callers,
+            )
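A small illustration of the counting scheme above: feeding the collector two identical fake samples, where ``FakeFrame`` is a stand-in for the frame objects produced by ``_remote_debugging``:

   from collections import namedtuple
   from profile.pstats_collector import PstatsCollector

   FakeFrame = namedtuple("FakeFrame", "filename lineno funcname")
   inner = FakeFrame("app.py", 10, "inner")   # leaf, directly executing
   outer = FakeFrame("app.py", 42, "outer")   # its caller

   collector = PstatsCollector(sample_interval_usec=100)
   collector.collect([(1, [inner, outer])])
   collector.collect([(1, [inner, outer])])
   collector.create_stats()

   # (direct, cumulative, tottime, cumtime, callers) per (file, line, func)
   print(collector.stats[("app.py", 10, "inner")])  # (2, 2, 0.0002, 0.0002, {...})
   print(collector.stats[("app.py", 42, "outer")])  # (0, 2, 0.0, 0.0002, {...})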
730
Lib/profile/sample.py
Normal file
730
Lib/profile/sample.py
Normal file
|
@ -0,0 +1,730 @@
|
||||||
|
import argparse
|
||||||
|
import _remote_debugging
|
||||||
|
import os
|
||||||
|
import pstats
|
||||||
|
import statistics
|
||||||
|
import sys
|
||||||
|
import sysconfig
|
||||||
|
import time
|
||||||
|
from collections import deque
|
||||||
|
from _colorize import ANSIColors
|
||||||
|
|
||||||
|
from .pstats_collector import PstatsCollector
|
||||||
|
from .stack_collector import CollapsedStackCollector
|
||||||
|
|
||||||
|
FREE_THREADED_BUILD = sysconfig.get_config_var("Py_GIL_DISABLED") is not None
|
||||||
|
|
||||||
|
class SampleProfiler:
|
||||||
|
def __init__(self, pid, sample_interval_usec, all_threads):
|
||||||
|
self.pid = pid
|
||||||
|
self.sample_interval_usec = sample_interval_usec
|
||||||
|
self.all_threads = all_threads
|
||||||
|
if FREE_THREADED_BUILD:
|
||||||
|
self.unwinder = _remote_debugging.RemoteUnwinder(
|
||||||
|
self.pid, all_threads=self.all_threads
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
only_active_threads = bool(self.all_threads)
|
||||||
|
self.unwinder = _remote_debugging.RemoteUnwinder(
|
||||||
|
self.pid, only_active_thread=only_active_threads
|
||||||
|
)
|
||||||
|
# Track sample intervals and total sample count
|
||||||
|
self.sample_intervals = deque(maxlen=100)
|
||||||
|
self.total_samples = 0
|
||||||
|
self.realtime_stats = False
|
||||||
|
|
||||||
|
def sample(self, collector, duration_sec=10):
|
||||||
|
sample_interval_sec = self.sample_interval_usec / 1_000_000
|
||||||
|
running_time = 0
|
||||||
|
num_samples = 0
|
||||||
|
errors = 0
|
||||||
|
start_time = next_time = time.perf_counter()
|
||||||
|
last_sample_time = start_time
|
||||||
|
realtime_update_interval = 1.0 # Update every second
|
||||||
|
last_realtime_update = start_time
|
||||||
|
|
||||||
|
while running_time < duration_sec:
|
||||||
|
current_time = time.perf_counter()
|
||||||
|
if next_time < current_time:
|
||||||
|
try:
|
||||||
|
stack_frames = self.unwinder.get_stack_trace()
|
||||||
|
collector.collect(stack_frames)
|
||||||
|
except ProcessLookupError:
|
||||||
|
break
|
||||||
|
except (RuntimeError, UnicodeDecodeError, MemoryError, OSError):
|
||||||
|
errors += 1
|
||||||
|
except Exception as e:
|
||||||
|
if not self._is_process_running():
|
||||||
|
break
|
||||||
|
raise e from None
|
||||||
|
|
||||||
|
# Track actual sampling intervals for real-time stats
|
||||||
|
if num_samples > 0:
|
||||||
|
actual_interval = current_time - last_sample_time
|
||||||
|
self.sample_intervals.append(
|
||||||
|
1.0 / actual_interval
|
||||||
|
) # Convert to Hz
|
||||||
|
self.total_samples += 1
|
||||||
|
|
||||||
|
# Print real-time statistics if enabled
|
||||||
|
if (
|
||||||
|
self.realtime_stats
|
||||||
|
and (current_time - last_realtime_update)
|
||||||
|
>= realtime_update_interval
|
||||||
|
):
|
||||||
|
self._print_realtime_stats()
|
||||||
|
last_realtime_update = current_time
|
||||||
|
|
||||||
|
last_sample_time = current_time
|
||||||
|
num_samples += 1
|
||||||
|
next_time += sample_interval_sec
|
||||||
|
|
||||||
|
running_time = time.perf_counter() - start_time
|
||||||
|
|
||||||
|
# Clear real-time stats line if it was being displayed
|
||||||
|
if self.realtime_stats and len(self.sample_intervals) > 0:
|
||||||
|
print() # Add newline after real-time stats
|
||||||
|
|
||||||
|
print(f"Captured {num_samples} samples in {running_time:.2f} seconds")
|
||||||
|
print(f"Sample rate: {num_samples / running_time:.2f} samples/sec")
|
||||||
|
print(f"Error rate: {(errors / num_samples) * 100:.2f}%")
|
||||||
|
|
||||||
|
expected_samples = int(duration_sec / sample_interval_sec)
|
||||||
|
if num_samples < expected_samples:
|
||||||
|
print(
|
||||||
|
f"Warning: missed {expected_samples - num_samples} samples "
|
||||||
|
f"from the expected total of {expected_samples} "
|
||||||
|
f"({(expected_samples - num_samples) / expected_samples * 100:.2f}%)"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _is_process_running(self):
|
||||||
|
if sys.platform == "linux" or sys.platform == "darwin":
|
||||||
|
try:
|
||||||
|
os.kill(self.pid, 0)
|
||||||
|
return True
|
||||||
|
except ProcessLookupError:
|
||||||
|
return False
|
||||||
|
elif sys.platform == "win32":
|
||||||
|
try:
|
||||||
|
_remote_debugging.RemoteUnwinder(self.pid)
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
raise ValueError(f"Unsupported platform: {sys.platform}")
|
||||||
|
|
||||||
|
def _print_realtime_stats(self):
|
||||||
|
"""Print real-time sampling statistics."""
|
||||||
|
if len(self.sample_intervals) < 2:
|
||||||
|
return
|
||||||
|
|
||||||
|
# Calculate statistics on the Hz values (deque automatically maintains rolling window)
|
||||||
|
hz_values = list(self.sample_intervals)
|
||||||
|
mean_hz = statistics.mean(hz_values)
|
||||||
|
min_hz = min(hz_values)
|
||||||
|
max_hz = max(hz_values)
|
||||||
|
|
||||||
|
# Calculate microseconds per sample for all metrics (1/Hz * 1,000,000)
|
||||||
|
mean_us_per_sample = (1.0 / mean_hz) * 1_000_000 if mean_hz > 0 else 0
|
||||||
|
min_us_per_sample = (
|
||||||
|
(1.0 / max_hz) * 1_000_000 if max_hz > 0 else 0
|
||||||
|
) # Min time = Max Hz
|
||||||
|
max_us_per_sample = (
|
||||||
|
(1.0 / min_hz) * 1_000_000 if min_hz > 0 else 0
|
||||||
|
) # Max time = Min Hz
|
||||||
|
|
||||||
|
# Clear line and print stats
|
||||||
|
print(
|
||||||
|
f"\r\033[K{ANSIColors.BOLD_BLUE}Real-time sampling stats:{ANSIColors.RESET} "
|
||||||
|
f"{ANSIColors.YELLOW}Mean: {mean_hz:.1f}Hz ({mean_us_per_sample:.2f}µs){ANSIColors.RESET} "
|
||||||
|
f"{ANSIColors.GREEN}Min: {min_hz:.1f}Hz ({max_us_per_sample:.2f}µs){ANSIColors.RESET} "
|
||||||
|
f"{ANSIColors.RED}Max: {max_hz:.1f}Hz ({min_us_per_sample:.2f}µs){ANSIColors.RESET} "
|
||||||
|
f"{ANSIColors.CYAN}Samples: {self.total_samples}{ANSIColors.RESET}",
|
||||||
|
end="",
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _determine_best_unit(max_value):
|
||||||
|
"""Determine the best unit (s, ms, μs) and scale factor for a maximum value."""
|
||||||
|
if max_value >= 1.0:
|
||||||
|
return "s", 1.0
|
||||||
|
elif max_value >= 0.001:
|
||||||
|
return "ms", 1000.0
|
||||||
|
else:
|
||||||
|
return "μs", 1000000.0
|
||||||
|
|
||||||
|
|
||||||
|
def print_sampled_stats(
|
||||||
|
stats, sort=-1, limit=None, show_summary=True, sample_interval_usec=100
|
||||||
|
):
|
||||||
|
# Get the stats data
|
||||||
|
stats_list = []
|
||||||
|
for func, (
|
||||||
|
direct_calls,
|
||||||
|
cumulative_calls,
|
||||||
|
total_time,
|
||||||
|
cumulative_time,
|
||||||
|
callers,
|
||||||
|
) in stats.stats.items():
|
||||||
|
stats_list.append(
|
||||||
|
(
|
||||||
|
func,
|
||||||
|
direct_calls,
|
||||||
|
cumulative_calls,
|
||||||
|
total_time,
|
||||||
|
cumulative_time,
|
||||||
|
callers,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Calculate total samples for percentage calculations (using direct_calls)
|
||||||
|
total_samples = sum(
|
||||||
|
direct_calls for _, direct_calls, _, _, _, _ in stats_list
|
||||||
|
)
|
||||||
|
|
||||||
|
# Sort based on the requested field
|
||||||
|
sort_field = sort
|
||||||
|
if sort_field == -1: # stdname
|
||||||
|
stats_list.sort(key=lambda x: str(x[0]))
|
||||||
|
elif sort_field == 0: # nsamples (direct samples)
|
||||||
|
stats_list.sort(key=lambda x: x[1], reverse=True) # direct_calls
|
||||||
|
elif sort_field == 1: # tottime
|
||||||
|
stats_list.sort(key=lambda x: x[3], reverse=True) # total_time
|
||||||
|
elif sort_field == 2: # cumtime
|
||||||
|
stats_list.sort(key=lambda x: x[4], reverse=True) # cumulative_time
|
||||||
|
elif sort_field == 3: # sample%
|
||||||
|
stats_list.sort(
|
||||||
|
key=lambda x: (x[1] / total_samples * 100)
|
||||||
|
if total_samples > 0
|
||||||
|
else 0,
|
||||||
|
reverse=True, # direct_calls percentage
|
||||||
|
)
|
||||||
|
elif sort_field == 4: # cumul%
|
||||||
|
stats_list.sort(
|
||||||
|
key=lambda x: (x[2] / total_samples * 100)
|
||||||
|
if total_samples > 0
|
||||||
|
else 0,
|
||||||
|
reverse=True, # cumulative_calls percentage
|
||||||
|
)
|
||||||
|
elif sort_field == 5: # nsamples (cumulative samples)
|
||||||
|
stats_list.sort(key=lambda x: x[2], reverse=True) # cumulative_calls
|
||||||
|
|
||||||
|
# Apply limit if specified
|
||||||
|
if limit is not None:
|
||||||
|
stats_list = stats_list[:limit]
|
||||||
|
|
||||||
|
# Determine the best unit for time columns based on maximum values
|
||||||
|
max_total_time = max(
|
||||||
|
(total_time for _, _, _, total_time, _, _ in stats_list), default=0
|
||||||
|
)
|
||||||
|
max_cumulative_time = max(
|
||||||
|
(cumulative_time for _, _, _, _, cumulative_time, _ in stats_list),
|
||||||
|
default=0,
|
||||||
|
)
|
||||||
|
|
||||||
|
total_time_unit, total_time_scale = _determine_best_unit(max_total_time)
|
||||||
|
cumulative_time_unit, cumulative_time_scale = _determine_best_unit(
|
||||||
|
max_cumulative_time
|
||||||
|
)
|
||||||
|
|
||||||
|
# Define column widths for consistent alignment
|
||||||
|
col_widths = {
|
||||||
|
"nsamples": 15, # "nsamples" column (inline/cumulative format)
|
||||||
|
"sample_pct": 8, # "sample%" column
|
||||||
|
"tottime": max(12, len(f"tottime ({total_time_unit})")),
|
||||||
|
"cum_pct": 8, # "cumul%" column
|
||||||
|
"cumtime": max(12, len(f"cumtime ({cumulative_time_unit})")),
|
||||||
|
}
|
||||||
|
|
||||||
|
# Print header with colors and proper alignment
|
||||||
|
print(f"{ANSIColors.BOLD_BLUE}Profile Stats:{ANSIColors.RESET}")
|
||||||
|
|
||||||
|
header_nsamples = f"{ANSIColors.BOLD_BLUE}{'nsamples':>{col_widths['nsamples']}}{ANSIColors.RESET}"
|
||||||
|
header_sample_pct = f"{ANSIColors.BOLD_BLUE}{'sample%':>{col_widths['sample_pct']}}{ANSIColors.RESET}"
|
||||||
|
header_tottime = f"{ANSIColors.BOLD_BLUE}{f'tottime ({total_time_unit})':>{col_widths['tottime']}}{ANSIColors.RESET}"
|
||||||
|
header_cum_pct = f"{ANSIColors.BOLD_BLUE}{'cumul%':>{col_widths['cum_pct']}}{ANSIColors.RESET}"
|
||||||
|
header_cumtime = f"{ANSIColors.BOLD_BLUE}{f'cumtime ({cumulative_time_unit})':>{col_widths['cumtime']}}{ANSIColors.RESET}"
|
||||||
|
header_filename = (
|
||||||
|
f"{ANSIColors.BOLD_BLUE}filename:lineno(function){ANSIColors.RESET}"
|
||||||
|
)
|
||||||
|
|
||||||
|
print(
|
||||||
|
f"{header_nsamples} {header_sample_pct} {header_tottime} {header_cum_pct} {header_cumtime} {header_filename}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Print each line with proper alignment
|
||||||
|
for (
|
||||||
|
func,
|
||||||
|
direct_calls,
|
||||||
|
cumulative_calls,
|
||||||
|
total_time,
|
||||||
|
cumulative_time,
|
||||||
|
callers,
|
||||||
|
) in stats_list:
|
||||||
|
# Calculate percentages
|
||||||
|
sample_pct = (
|
||||||
|
(direct_calls / total_samples * 100) if total_samples > 0 else 0
|
||||||
|
)
|
||||||
|
cum_pct = (
|
||||||
|
(cumulative_calls / total_samples * 100)
|
||||||
|
if total_samples > 0
|
||||||
|
else 0
|
||||||
|
)
|
||||||
|
|
||||||
|
# Format values with proper alignment - always use A/B format
|
||||||
|
nsamples_str = f"{direct_calls}/{cumulative_calls}"
|
||||||
|
nsamples_str = f"{nsamples_str:>{col_widths['nsamples']}}"
|
||||||
|
sample_pct_str = f"{sample_pct:{col_widths['sample_pct']}.1f}"
|
||||||
|
tottime = f"{total_time * total_time_scale:{col_widths['tottime']}.3f}"
|
||||||
|
cum_pct_str = f"{cum_pct:{col_widths['cum_pct']}.1f}"
|
||||||
|
cumtime = f"{cumulative_time * cumulative_time_scale:{col_widths['cumtime']}.3f}"
|
||||||
|
|
||||||
|
# Format the function name with colors
|
||||||
|
func_name = (
|
||||||
|
f"{ANSIColors.GREEN}{func[0]}{ANSIColors.RESET}:"
|
||||||
|
f"{ANSIColors.YELLOW}{func[1]}{ANSIColors.RESET}("
|
||||||
|
f"{ANSIColors.CYAN}{func[2]}{ANSIColors.RESET})"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Print the formatted line with consistent spacing
|
||||||
|
print(
|
||||||
|
f"{nsamples_str} {sample_pct_str} {tottime} {cum_pct_str} {cumtime} {func_name}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Print legend
|
||||||
|
print(f"\n{ANSIColors.BOLD_BLUE}Legend:{ANSIColors.RESET}")
|
||||||
|
print(
|
||||||
|
f" {ANSIColors.YELLOW}nsamples{ANSIColors.RESET}: Direct/Cumulative samples (direct executing / on call stack)"
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f" {ANSIColors.YELLOW}sample%{ANSIColors.RESET}: Percentage of total samples this function was directly executing"
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f" {ANSIColors.YELLOW}tottime{ANSIColors.RESET}: Estimated total time spent directly in this function"
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f" {ANSIColors.YELLOW}cumul%{ANSIColors.RESET}: Percentage of total samples when this function was on the call stack"
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f" {ANSIColors.YELLOW}cumtime{ANSIColors.RESET}: Estimated cumulative time (including time in called functions)"
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f" {ANSIColors.YELLOW}filename:lineno(function){ANSIColors.RESET}: Function location and name"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _format_func_name(func):
|
||||||
|
"""Format function name with colors."""
|
||||||
|
return (
|
||||||
|
f"{ANSIColors.GREEN}{func[0]}{ANSIColors.RESET}:"
|
||||||
|
f"{ANSIColors.YELLOW}{func[1]}{ANSIColors.RESET}("
|
||||||
|
f"{ANSIColors.CYAN}{func[2]}{ANSIColors.RESET})"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _print_top_functions(stats_list, title, key_func, format_line, n=3):
|
||||||
|
"""Print top N functions sorted by key_func with formatted output."""
|
||||||
|
print(f"\n{ANSIColors.BOLD_BLUE}{title}:{ANSIColors.RESET}")
|
||||||
|
sorted_stats = sorted(stats_list, key=key_func, reverse=True)
|
||||||
|
for stat in sorted_stats[:n]:
|
||||||
|
if line := format_line(stat):
|
||||||
|
print(f" {line}")
|
||||||
|
|
||||||
|
# Print summary of interesting functions if enabled
|
||||||
|
if show_summary and stats_list:
|
||||||
|
print(
|
||||||
|
f"\n{ANSIColors.BOLD_BLUE}Summary of Interesting Functions:{ANSIColors.RESET}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Aggregate stats by fully qualified function name (ignoring line numbers)
|
||||||
|
func_aggregated = {}
|
||||||
|
for (
|
||||||
|
func,
|
||||||
|
direct_calls,
|
||||||
|
cumulative_calls,
|
||||||
|
total_time,
|
||||||
|
cumulative_time,
|
||||||
|
callers,
|
||||||
|
) in stats_list:
|
||||||
|
# Use filename:function_name as the key to get fully qualified name
|
||||||
|
qualified_name = f"{func[0]}:{func[2]}"
|
||||||
|
if qualified_name not in func_aggregated:
|
||||||
|
func_aggregated[qualified_name] = [
|
||||||
|
0,
|
||||||
|
0,
|
||||||
|
0,
|
||||||
|
0,
|
||||||
|
] # direct_calls, cumulative_calls, total_time, cumulative_time
|
||||||
|
func_aggregated[qualified_name][0] += direct_calls
|
||||||
|
func_aggregated[qualified_name][1] += cumulative_calls
|
||||||
|
func_aggregated[qualified_name][2] += total_time
|
||||||
|
func_aggregated[qualified_name][3] += cumulative_time
|
    # Convert aggregated data back to list format for processing
    aggregated_stats = []
    for qualified_name, (
        prim_calls,
        total_calls,
        total_time,
        cumulative_time,
    ) in func_aggregated.items():
        # Parse the qualified name back to filename and function name
        if ":" in qualified_name:
            filename, func_name = qualified_name.rsplit(":", 1)
        else:
            filename, func_name = "", qualified_name
        # Create a dummy func tuple with filename and function name for display
        dummy_func = (filename, "", func_name)
        aggregated_stats.append(
            (
                dummy_func,
                prim_calls,
                total_calls,
                total_time,
                cumulative_time,
                {},
            )
        )

    # Determine best units for summary metrics
    max_total_time = max(
        (total_time for _, _, _, total_time, _, _ in aggregated_stats),
        default=0,
    )
    max_cumulative_time = max(
        (
            cumulative_time
            for _, _, _, _, cumulative_time, _ in aggregated_stats
        ),
        default=0,
    )

    total_unit, total_scale = _determine_best_unit(max_total_time)
    cumulative_unit, cumulative_scale = _determine_best_unit(
        max_cumulative_time
    )

    # Functions with highest direct/cumulative ratio (hot spots)
    def format_hotspots(stat):
        func, direct_calls, cumulative_calls, total_time, _, _ = stat
        if direct_calls > 0 and cumulative_calls > 0:
            ratio = direct_calls / cumulative_calls
            direct_pct = (
                (direct_calls / total_samples * 100)
                if total_samples > 0
                else 0
            )
            return (
                f"{ratio:.3f} direct/cumulative ratio, "
                f"{direct_pct:.1f}% direct samples: {_format_func_name(func)}"
            )
        return None

    _print_top_functions(
        aggregated_stats,
        "Functions with Highest Direct/Cumulative Ratio (Hot Spots)",
        key_func=lambda x: (x[1] / x[2]) if x[2] > 0 else 0,
        format_line=format_hotspots,
    )

    # Functions with highest call frequency (cumulative/direct difference)
    def format_call_frequency(stat):
        func, direct_calls, cumulative_calls, total_time, _, _ = stat
        if cumulative_calls > direct_calls:
            call_frequency = cumulative_calls - direct_calls
            cum_pct = (
                (cumulative_calls / total_samples * 100)
                if total_samples > 0
                else 0
            )
            return (
                f"{call_frequency:d} indirect calls, "
                f"{cum_pct:.1f}% total stack presence: {_format_func_name(func)}"
            )
        return None

    _print_top_functions(
        aggregated_stats,
        "Functions with Highest Call Frequency (Indirect Calls)",
        key_func=lambda x: x[2] - x[1],  # Sort by (cumulative - direct)
        format_line=format_call_frequency,
    )

    # Functions with highest cumulative-to-direct multiplier (call magnification)
    def format_call_magnification(stat):
        func, direct_calls, cumulative_calls, total_time, _, _ = stat
        if direct_calls > 0 and cumulative_calls > direct_calls:
            multiplier = cumulative_calls / direct_calls
            indirect_calls = cumulative_calls - direct_calls
            return (
                f"{multiplier:.1f}x call magnification, "
                f"{indirect_calls:d} indirect calls from {direct_calls:d} direct: {_format_func_name(func)}"
            )
        return None

    _print_top_functions(
        aggregated_stats,
        "Functions with Highest Call Magnification (Cumulative/Direct)",
        key_func=lambda x: (x[2] / x[1]) if x[1] > 0 else 0,  # Sort by cumulative/direct ratio
        format_line=format_call_magnification,
    )

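To make the three summary metrics above concrete, here is a tiny self-contained illustration with invented sample counts (not taken from any real run):

# Invented counts for a single function, for illustration only.
direct = 80        # samples with the function on top of the stack
cumulative = 100   # samples with the function anywhere on the stack

ratio = direct / cumulative          # 0.800 -> likely a hot spot
indirect = cumulative - direct       # 20 samples were spent in callees
magnification = cumulative / direct  # 1.25x cumulative/direct
print(f"{ratio:.3f} ratio, {indirect} indirect, {magnification:.2f}x")
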
def sample(
    pid,
    *,
    sort=2,
    sample_interval_usec=100,
    duration_sec=10,
    filename=None,
    all_threads=False,
    limit=None,
    show_summary=True,
    output_format="pstats",
    realtime_stats=False,
):
    profiler = SampleProfiler(
        pid, sample_interval_usec, all_threads=all_threads
    )
    profiler.realtime_stats = realtime_stats

    collector = None
    match output_format:
        case "pstats":
            collector = PstatsCollector(sample_interval_usec)
        case "collapsed":
            collector = CollapsedStackCollector()
            filename = filename or f"collapsed.{pid}.txt"
        case _:
            raise ValueError(f"Invalid output format: {output_format}")

    profiler.sample(collector, duration_sec)

    if output_format == "pstats" and not filename:
        stats = pstats.SampledStats(collector).strip_dirs()
        print_sampled_stats(
            stats, sort, limit, show_summary, sample_interval_usec
        )
    else:
        collector.export(filename)

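The sample() entry point above can also be driven programmatically rather than through the CLI. A minimal sketch, assuming PID 1234 is a placeholder for a running CPython process the caller is allowed to attach to:

from profile.sample import sample

# Sample PID 1234 (placeholder) for 5 seconds at a 1000 us interval and
# print the top 10 rows of the pstats table, sorted by cumulative time.
sample(
    1234,
    sample_interval_usec=1000,
    duration_sec=5,
    sort=2,       # cumtime, the default
    limit=10,
)
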
def _validate_collapsed_format_args(args, parser):
    # Check for incompatible pstats options
    invalid_opts = []

    # Get list of pstats-specific options
    pstats_options = {"sort": None, "limit": None, "no_summary": False}

    # Find the default values from the argument definitions
    for action in parser._actions:
        if action.dest in pstats_options and hasattr(action, "default"):
            pstats_options[action.dest] = action.default

    # Check if any pstats-specific options were provided by comparing with defaults
    for opt, default in pstats_options.items():
        if getattr(args, opt) != default:
            invalid_opts.append(opt.replace("no_", ""))

    if invalid_opts:
        parser.error(
            f"The following options are only valid with --pstats format: {', '.join(invalid_opts)}"
        )

    # Set default output filename for collapsed format
    if not args.outfile:
        args.outfile = f"collapsed.{args.pid}.txt"

def main():
    # Create the main parser
    parser = argparse.ArgumentParser(
        description=(
            "Sample a process's stack frames and generate profiling data.\n"
            "Supports two output formats:\n"
            "  - pstats: Detailed profiling statistics with sorting options\n"
            "  - collapsed: Stack traces for generating flamegraphs\n"
            "\n"
            "Examples:\n"
            "  # Profile process 1234 for 10 seconds with default settings\n"
            "  python -m profile.sample 1234\n"
            "\n"
            "  # Profile with custom interval and duration, save to file\n"
            "  python -m profile.sample -i 50 -d 30 -o profile.stats 1234\n"
            "\n"
            "  # Generate collapsed stacks for flamegraph\n"
            "  python -m profile.sample --collapsed 1234\n"
            "\n"
            "  # Profile all threads, sort by total time\n"
            "  python -m profile.sample -a --sort-tottime 1234\n"
            "\n"
            "  # Profile for 1 minute with 1ms sampling interval\n"
            "  python -m profile.sample -i 1000 -d 60 1234\n"
            "\n"
            "  # Show only top 20 functions sorted by direct samples\n"
            "  python -m profile.sample --sort-nsamples -l 20 1234\n"
            "\n"
            "  # Profile all threads and save collapsed stacks\n"
            "  python -m profile.sample -a --collapsed -o stacks.txt 1234\n"
            "\n"
            "  # Profile with real-time sampling statistics\n"
            "  python -m profile.sample --realtime-stats 1234\n"
            "\n"
            "  # Sort by sample percentage to find most sampled functions\n"
            "  python -m profile.sample --sort-sample-pct 1234\n"
            "\n"
            "  # Sort by cumulative samples to find functions most on call stack\n"
            "  python -m profile.sample --sort-nsamples-cumul 1234"
        ),
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

    # Required arguments
    parser.add_argument("pid", type=int, help="Process ID to sample")

    # Sampling options
    sampling_group = parser.add_argument_group("Sampling configuration")
    sampling_group.add_argument(
        "-i",
        "--interval",
        type=int,
        default=100,
        help="Sampling interval in microseconds (default: 100)",
    )
    sampling_group.add_argument(
        "-d",
        "--duration",
        type=int,
        default=10,
        help="Sampling duration in seconds (default: 10)",
    )
    sampling_group.add_argument(
        "-a",
        "--all-threads",
        action="store_true",
        help="Sample all threads in the process instead of just the main thread",
    )
    sampling_group.add_argument(
        "--realtime-stats",
        action="store_true",
        default=False,
        help="Print real-time sampling statistics (Hz, mean, min, max, stdev) during profiling",
    )

    # Output format selection
    output_group = parser.add_argument_group("Output options")
    output_format = output_group.add_mutually_exclusive_group()
    output_format.add_argument(
        "--pstats",
        action="store_const",
        const="pstats",
        dest="format",
        default="pstats",
        help="Generate pstats output (default)",
    )
    output_format.add_argument(
        "--collapsed",
        action="store_const",
        const="collapsed",
        dest="format",
        help="Generate collapsed stack traces for flamegraphs",
    )

    output_group.add_argument(
        "-o",
        "--outfile",
        help="Save output to a file (if omitted, prints to stdout for pstats, "
        "or saves to collapsed.<pid>.txt for collapsed format)",
    )

    # pstats-specific options
    pstats_group = parser.add_argument_group("pstats format options")
    sort_group = pstats_group.add_mutually_exclusive_group()
    sort_group.add_argument(
        "--sort-nsamples",
        action="store_const",
        const=0,
        dest="sort",
        help="Sort by number of direct samples (nsamples column)",
    )
    sort_group.add_argument(
        "--sort-tottime",
        action="store_const",
        const=1,
        dest="sort",
        help="Sort by total time (tottime column)",
    )
    sort_group.add_argument(
        "--sort-cumtime",
        action="store_const",
        const=2,
        dest="sort",
        help="Sort by cumulative time (cumtime column, default)",
    )
    sort_group.add_argument(
        "--sort-sample-pct",
        action="store_const",
        const=3,
        dest="sort",
        help="Sort by sample percentage (sample%% column)",
    )
    sort_group.add_argument(
        "--sort-cumul-pct",
        action="store_const",
        const=4,
        dest="sort",
        help="Sort by cumulative sample percentage (cumul%% column)",
    )
    sort_group.add_argument(
        "--sort-nsamples-cumul",
        action="store_const",
        const=5,
        dest="sort",
        help="Sort by cumulative samples (nsamples column, cumulative part)",
    )
    sort_group.add_argument(
        "--sort-name",
        action="store_const",
        const=-1,
        dest="sort",
        help="Sort by function name",
    )

    pstats_group.add_argument(
        "-l",
        "--limit",
        type=int,
        help="Limit the number of rows in the output",
        default=15,
    )
    pstats_group.add_argument(
        "--no-summary",
        action="store_true",
        help="Disable the summary section in the output",
    )

    args = parser.parse_args()

    # Validate format-specific arguments
    if args.format == "collapsed":
        _validate_collapsed_format_args(args, parser)

    sort_value = args.sort if args.sort is not None else 2

    sample(
        args.pid,
        sample_interval_usec=args.interval,
        duration_sec=args.duration,
        filename=args.outfile,
        all_threads=args.all_threads,
        limit=args.limit,
        sort=sort_value,
        show_summary=not args.no_summary,
        output_format=args.format,
        realtime_stats=args.realtime_stats,
    )


if __name__ == "__main__":
    main()

Lib/profile/stack_collector.py (new file)
@@ -0,0 +1,37 @@
import collections
import os

from .collector import Collector


class StackTraceCollector(Collector):
    def __init__(self):
        self.call_trees = []
        self.function_samples = collections.defaultdict(int)

    def collect(self, stack_frames):
        for thread_id, frames in stack_frames:
            if frames:
                # Store the complete call stack (reverse order - root first)
                call_tree = list(reversed(frames))
                self.call_trees.append(call_tree)

                # Count samples per function
                for frame in frames:
                    self.function_samples[frame] += 1


class CollapsedStackCollector(StackTraceCollector):
    def export(self, filename):
        stack_counter = collections.Counter()
        for call_tree in self.call_trees:
            # Call tree is already in root->leaf order
            stack_str = ";".join(
                f"{os.path.basename(f[0])}:{f[2]}:{f[1]}" for f in call_tree
            )
            stack_counter[stack_str] += 1

        with open(filename, "w") as f:
            for stack, count in stack_counter.items():
                f.write(f"{stack} {count}\n")
        print(f"Collapsed stack output written to {filename}")

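Each line written by export() is a semicolon-joined stack in root-to-leaf order, with frames rendered as basename:function:lineno, followed by a space and the sample count; this is the usual input format for flamegraph tools. A small sketch of reading such a file back, where "collapsed.1234.txt" is a placeholder matching the default output name:

from collections import Counter

counter = Counter()
with open("collapsed.1234.txt") as f:
    for line in f:
        # Split on the last space: everything before it is the stack string,
        # everything after it is the sample count.
        stack, _, count = line.rpartition(" ")
        counter[stack] += int(count)

for stack, count in counter.most_common(5):
    print(count, stack)
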
Lib/pstats.py:

@@ -139,7 +139,11 @@ class Stats:
             return
         elif isinstance(arg, str):
             with open(arg, 'rb') as f:
-                self.stats = marshal.load(f)
+                stats = marshal.load(f)
+            if (('__sampled__',)) in stats:
+                stats.pop((('__sampled__',)))
+                self.__class__ = SampledStats
+            self.stats = stats
             try:
                 file_stats = os.stat(arg)
                 arg = time.ctime(file_stats.st_mtime) + " " + arg
@@ -467,6 +471,9 @@ class Stats:
                 subheader = isinstance(value, tuple)
                 break
         if subheader:
+            self.print_call_subheading(name_size)
+
+    def print_call_subheading(self, name_size):
             print(" "*name_size + "    ncalls  tottime  cumtime", file=self.stream)

     def print_call_line(self, name_size, source, call_dict, arrow="->"):
@@ -516,6 +523,35 @@ class Stats:
             print(f8(ct/cc), end=' ', file=self.stream)
         print(func_std_string(func), file=self.stream)
+
+
+class SampledStats(Stats):
+    def __init__(self, *args, stream=None):
+        super().__init__(*args, stream=stream)
+
+        self.sort_arg_dict = {
+            "samples"   : (((1,-1),              ), "sample count"),
+            "nsamples"  : (((1,-1),              ), "sample count"),
+            "cumtime"   : (((3,-1),              ), "cumulative time"),
+            "cumulative": (((3,-1),              ), "cumulative time"),
+            "filename"  : (((4, 1),              ), "file name"),
+            "line"      : (((5, 1),              ), "line number"),
+            "module"    : (((4, 1),              ), "file name"),
+            "name"      : (((6, 1),              ), "function name"),
+            "nfl"       : (((6, 1),(4, 1),(5, 1),), "name/file/line"),
+            "psamples"  : (((0,-1),              ), "primitive call count"),
+            "stdname"   : (((7, 1),              ), "standard name"),
+            "time"      : (((2,-1),              ), "internal time"),
+            "tottime"   : (((2,-1),              ), "internal time"),
+        }
+
+    def print_call_subheading(self, name_size):
+        print(" "*name_size + "  nsamples  tottime  cumtime", file=self.stream)
+
+    def print_title(self):
+        print('  nsamples  tottime  persample  cumtime  persample', end=' ', file=self.stream)
+        print('filename:lineno(function)', file=self.stream)
+
+
 class TupleComp:
     """This class provides a generic function for comparing any two tuples.
     Each instance records a list of tuple-indices (from most significant
@@ -607,6 +643,24 @@ def f8(x):
 # Statistics browser added by ESR, April 2001
 #**************************************************************************
+
+class StatsLoaderShim:
+    """Compatibility shim implementing 'create_stats' needed by Stats classes
+    to handle already unmarshalled data."""
+    def __init__(self, raw_stats):
+        self.stats = raw_stats
+
+    def create_stats(self):
+        pass
+
+def stats_factory(raw_stats):
+    """Return a Stats or SampledStats instance based on the marker in raw_stats."""
+    if (('__sampled__',)) in raw_stats:
+        raw_stats = dict(raw_stats)  # avoid mutating caller's dict
+        raw_stats.pop((('__sampled__',)))
+        return SampledStats(StatsLoaderShim(raw_stats))
+    else:
+        return Stats(StatsLoaderShim(raw_stats))
+
 if __name__ == '__main__':
     import cmd
     try:
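stats_factory() lets callers reload a dump without knowing in advance whether it holds sampled data. A minimal sketch, where "profile.stats" is a placeholder file assumed to have been written by profile.sample with -o in the pstats format, so that the marshalled dict carries the ('__sampled__',) marker:

import marshal
import pstats

# Placeholder file, e.g. written by: python -m profile.sample -o profile.stats <pid>
with open("profile.stats", "rb") as f:
    raw_stats = marshal.load(f)

stats = pstats.stats_factory(raw_stats)  # SampledStats when the marker is present
stats.sort_stats("nsamples").print_stats(10)
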
@@ -693,7 +747,15 @@ if __name__ == '__main__':
         def do_read(self, line):
             if line:
                 try:
-                    self.stats = Stats(line)
+                    with open(line, 'rb') as f:
+                        raw_stats = marshal.load(f)
+                    self.stats = stats_factory(raw_stats)
+                    try:
+                        file_stats = os.stat(line)
+                        arg = time.ctime(file_stats.st_mtime) + " " + line
+                    except Exception:
+                        arg = line
+                    self.stats.files = [arg]
                 except OSError as err:
                     print(err.args[1], file=self.stream)
                     return
@@ -1257,5 +1257,21 @@ class TestGetStackTrace(unittest.TestCase):
         )
+
+
+class TestUnsupportedPlatformHandling(unittest.TestCase):
+    @unittest.skipIf(
+        sys.platform in ("linux", "darwin", "win32"),
+        "Test only runs on unsupported platforms (not Linux, macOS, or Windows)",
+    )
+    @unittest.skipIf(sys.platform == "android", "Android raises Linux-specific exception")
+    def test_unsupported_platform_error(self):
+        with self.assertRaises(RuntimeError) as cm:
+            RemoteUnwinder(os.getpid())
+
+        self.assertIn(
+            "Reading the PyRuntime section is not supported on this platform",
+            str(cm.exception)
+        )
+
+
 if __name__ == "__main__":
     unittest.main()
Lib/test/test_sample_profiler.py (new file, 1877 lines)
File diff suppressed because it is too large.
Makefile.pre.in:

@@ -2530,6 +2530,7 @@ LIBSUBDIRS=	asyncio \
 		logging \
 		multiprocessing multiprocessing/dummy \
 		pathlib \
+		profile \
 		pydoc_data \
 		re \
 		site-packages \
Misc/NEWS.d entry (new file):

@@ -0,0 +1,9 @@
Implement a new high-frequency runtime profiler that leverages the existing
remote debugging functionality to collect detailed execution statistics
from running Python processes. This tool is exposed in the
``profile.sample`` module and enables non-intrusive observation of
production applications by attaching to already-running processes without
requiring any code modifications, restarts, or special startup flags. The
observer can perform extremely high-frequency sampling of stack traces and
interpreter state, providing detailed runtime execution analysis of live
applications.
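Since the profiler attaches to an already-running process by PID, a quick way to see it in action is against a disposable busy process. A hedged end-to-end sketch (the workload and the 3-second duration are illustrative, and attaching may require debugger privileges, e.g. an appropriate ptrace scope on Linux):

import subprocess
import sys

# Spawn a busy target, then attach the sampler to it for 3 seconds.
code = "while True: sum(i * i for i in range(10_000))"
target = subprocess.Popen([sys.executable, "-c", code])
try:
    subprocess.run(
        [sys.executable, "-m", "profile.sample", "-d", "3", str(target.pid)],
        check=True,
    )
finally:
    target.terminate()
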
@@ -751,6 +751,14 @@ search_linux_map_for_section(proc_handle_t *handle, const char* secname, const c

 #ifdef MS_WINDOWS

+static int is_process_alive(HANDLE hProcess) {
+    DWORD exitCode;
+    if (GetExitCodeProcess(hProcess, &exitCode)) {
+        return exitCode == STILL_ACTIVE;
+    }
+    return 0;
+}
+
 static void* analyze_pe(const wchar_t* mod_path, BYTE* remote_base, const char* secname) {
     HANDLE hFile = CreateFileW(mod_path, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
     if (hFile == INVALID_HANDLE_VALUE) {

@@ -911,7 +919,9 @@ _Py_RemoteDebug_GetPyRuntimeAddress(proc_handle_t* handle)
         _PyErr_ChainExceptions1(exc);
     }
 #else
-    Py_UNREACHABLE();
+    _set_debug_exception_cause(PyExc_RuntimeError,
+        "Reading the PyRuntime section is not supported on this platform");
+    return 0;
 #endif

     return address;

@@ -981,6 +991,13 @@ _Py_RemoteDebug_ReadRemoteMemory(proc_handle_t *handle, uintptr_t remote_address
     SIZE_T result = 0;
     do {
         if (!ReadProcessMemory(handle->hProcess, (LPCVOID)(remote_address + result), (char*)dst + result, len - result, &read_bytes)) {
+            // Check if the process is still alive: we need to be able to tell our caller
+            // that the process is dead and not just that the read failed.
+            if (!is_process_alive(handle->hProcess)) {
+                _set_errno(ESRCH);
+                PyErr_SetFromErrno(PyExc_OSError);
+                return -1;
+            }
             PyErr_SetFromWindowsErr(0);
             DWORD error = GetLastError();
             _set_debug_exception_cause(PyExc_OSError,

@@ -1013,6 +1030,9 @@ _Py_RemoteDebug_ReadRemoteMemory(proc_handle_t *handle, uintptr_t remote_address
             return read_remote_memory_fallback(handle, remote_address, len, dst);
         }
         PyErr_SetFromErrno(PyExc_OSError);
+        if (errno == ESRCH) {
+            return -1;
+        }
         _set_debug_exception_cause(PyExc_OSError,
             "process_vm_readv failed for PID %d at address 0x%lx "
             "(size %zu, partial read %zd bytes): %s",
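The ESRCH paths above let a caller distinguish a target that has exited from an ordinary failed read: errno ESRCH surfaces in Python as ProcessLookupError, a subclass of OSError. A hedged sketch of a sampling loop that tolerates target exit; the RemoteUnwinder import matches the tests above, but the get_stack_trace() method name is an assumption here, not confirmed by this diff:

import time
from _remote_debugging import RemoteUnwinder  # private module used by profile.sample

def sample_until_exit(pid, interval_sec=0.001):
    unwinder = RemoteUnwinder(pid)
    while True:
        try:
            frames = unwinder.get_stack_trace()  # assumed API name
        except ProcessLookupError:
            break  # ESRCH: the target process has exited
        except OSError:
            continue  # transient read failure; keep sampling
        # ... feed `frames` into a collector here ...
        time.sleep(interval_sec)
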