When using multiprocessing (-jN), the main process now uses a timeout
of 60 seconds instead of twice the --timeout value. The buildbot
server stops a job which does not produce any output for 1200
seconds.
(cherry picked from commit 46b0b81220)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
* Write a message when killing a worker process
* Put a timeout on the second popen.communicate() call
(after killing the process)
* Put a timeout on popen.wait() call
* Catch popen.kill() and popen.wait() exceptions
(cherry picked from commit de2d9eed8b)
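
The steps listed above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: the helper name kill_worker() and the 5 second default are hypothetical, and only standard subprocess calls are used.

```python
import subprocess
import sys


def kill_worker(popen, timeout=5.0):
    """Kill a worker process and collect its remaining output.

    Hypothetical helper: the name and the default timeout are
    assumptions made for this sketch.
    """
    # Write a message when killing a worker process.
    print(f"Kill worker process {popen.pid}", file=sys.stderr)
    try:
        popen.kill()
    except OSError as exc:
        print(f"Failed to kill worker process {popen.pid}: {exc!r}",
              file=sys.stderr)
        return None

    stdout = None
    try:
        # communicate() after kill(), bounded by a timeout so that a
        # stuck pipe cannot block the main process forever.
        stdout, _ = popen.communicate(timeout=timeout)
    except (subprocess.TimeoutExpired, OSError) as exc:
        print(f"Failed to read worker output: {exc!r}", file=sys.stderr)

    try:
        # wait() is bounded by a timeout as well.
        popen.wait(timeout=timeout)
    except (subprocess.TimeoutExpired, OSError) as exc:
        print(f"Failed to wait for worker process {popen.pid}: {exc!r}",
              file=sys.stderr)
    return stdout
```

Every blocking call is bounded, so the main process can always make progress even if the worker cannot be reaped.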
When using multiprocessing (-jN option), worker processes now create
their temporary directory inside the temporary directory of the
main process. So the main process is able to remove temporary
directories of worker processes even if they crash or when they are
killed by regrtest on KeyboardInterrupt (CTRL+c).
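
A minimal sketch of the nesting described above, assuming hypothetical helper names (make_main_tmp_dir, make_worker_tmp_dir, remove_main_tmp_dir) and directory prefixes that are not taken from regrtest itself:

```python
import os
import shutil
import tempfile


def make_main_tmp_dir():
    # Temporary directory owned by the main process.
    return tempfile.mkdtemp(prefix="test_python_")


def make_worker_tmp_dir(main_tmp_dir, worker_id):
    # Each worker directory lives inside the main process directory.
    return tempfile.mkdtemp(prefix=f"worker_{worker_id}_", dir=main_tmp_dir)


def remove_main_tmp_dir(main_tmp_dir):
    # Removing the parent also removes whatever a crashed or killed
    # worker left behind.
    shutil.rmtree(main_tmp_dir, ignore_errors=True)
```

Because cleanup happens in the parent, it does not depend on the worker reaching its own cleanup code.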
Also rework how multiprocessing arguments are parsed in main.py.
"python3 -m test -jN ..." now continues the execution of next tests
when a worker process crashes (CHILD_ERROR state). Previously, the test
suite stopped immediately. Use --failfast to stop at the first error.
Moreover, --forever now also implies --failfast.
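
The control flow described above can be illustrated with a simplified loop. The function run_all() and the plain string states are assumptions made for this sketch. The real runner is more involved.

```python
PASSED = "PASSED"
FAILED = "FAILED"
CHILD_ERROR = "CHILD_ERROR"


def run_all(results, failfast=False, forever=False):
    """Consume (test_name, state) pairs and decide when to stop.

    Hypothetical helper: 'results' stands in for the outcomes that
    worker processes would report.
    """
    # --forever implies --failfast.
    failfast = failfast or forever
    executed = []
    bad = []
    for test_name, state in results:
        executed.append(test_name)
        if state != PASSED:
            bad.append(test_name)
            if failfast:
                # Stop at the first error only when asked to.
                break
        # Otherwise a worker crash is recorded and the next test runs.
    return executed, bad
```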
Rewrite run_tests_multiprocess() function as a new MultiprocessRunner
class with multiple methods to better report errors and stop
immediately when needed.
Changes:
* Worker processes are now killed immediately if tests are
interrupted or if a test crashes (CHILD_ERROR).
* Rewrite how errors in a worker thread are reported to
the main thread. No longer ignore BaseException or parsing errors
silently.
* Remove 'finished' variable: use worker.is_alive() instead
* Always compute omitted tests. Add Regrtest.get_executed() method.
* Add TestResult and MultiprocessResult types to ensure that results
always have the same fields.
* runtest() now handles KeyboardInterrupt
* accumulate_result() and format_test_result() now take a TestResult
* cleanup_test_droppings() is now called by runtest() and marks the
test as ENV_CHANGED if the test leaks the support.TESTFN file.
* runtest() now includes code "around" the test in the test timing
* Add print_warning() in test.libregrtest.utils to standardize how
libregrtest logs warnings to ease parsing the test output.
* support.unload() is now called with abstest rather than test_name
* Rename 'test' variable/parameter to 'test_name'
* dash_R(): remove unused the_module parameter
* Remove unused imports
* "running:" progress: Format number of seconds as hours and minutes
* format_duration(): count also minutes as hours
* Create Lib/test/libregrtest/utils.py
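
To illustrate the duration formatting mentioned in the list above, here is a sketch of a format_duration() function that reports hours, minutes and seconds. It is an illustration only: the exact output strings and rounding rules are assumptions, not necessarily those of test.libregrtest.utils.

```python
import math


def format_duration(seconds):
    # Work in whole milliseconds to avoid float rounding surprises.
    ms = math.ceil(seconds * 1e3)
    seconds, ms = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)

    parts = []
    if hours:
        parts.append(f"{hours} hour")
    if minutes:
        parts.append(f"{minutes} min")
    if seconds:
        if parts:
            # Larger units are present: whole seconds are enough.
            parts.append(f"{seconds} sec")
        else:
            # Short duration: keep one decimal.
            parts.append(f"{seconds + ms / 1000:.1f} sec")
    if not parts:
        return f"{ms} ms"

    # Keep only the two most significant units.
    return " ".join(parts[:2])
```

With these assumptions, format_duration(3725) gives "1 hour 2 min" and format_duration(90) gives "1 min 30 sec".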
Issue #29362: Catch a crash of a worker process as a normal failure and
continue to run the next tests. This makes it possible to get the usual
test summary: single line result (OK/FAIL), total duration, etc.
* Fix "-m test --forever": replace _test_forever() with self._test_forever()
* Add unit test for --forever
* Add unit test for a failing test
* Also fix some pyflakes warnings in libregrtest
* Remove runtest_ns(): pass directly ns to runtest().
* Also create Regrtest.rerun_failed_tests() method.
* Inline Regrtest.run_test() again: having a separate method is no longer justified
Slaves (child processes running tests for regrtest -jN) now inherit
--memlimit/-M, --threshold/-t and --nowindows/-n options.
* -M, -t and -n are now supported with -jN
* Factorize code to run tests.
* run_test_in_subprocess() now passes the whole "ns" namespace to the child
process.
Running the Python test suite with -jN now:
- Displays the duration of tests which took longer than 30 seconds
- Displays the tests which have been running for at least 30 seconds
- Displays the tests we are waiting for when the test suite is interrupted
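
As a rough sketch of the 30 second threshold used in the list above, the helper below selects the tests that have been running long enough to be reported. The name get_running(), the start_times mapping and the output format are assumptions made for illustration.

```python
PROGRESS_MIN_TIME = 30.0  # seconds, the threshold quoted above


def get_running(start_times, now, min_time=PROGRESS_MIN_TIME):
    """Return descriptions of tests running for at least min_time seconds.

    start_times maps a test name to the time at which it started
    (hypothetical structure, same clock as 'now').
    """
    running = []
    for test_name, start in start_times.items():
        elapsed = now - start
        if elapsed >= min_time:
            running.append(f"{test_name} ({elapsed:.0f} sec)")
    return running
```

The same list can be printed when the run is interrupted, to show which tests the main process was still waiting for.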
Also clean up run_test_in_subprocess() code.
Python doesn't display the refcount anymore by default. It only displays it
when -X showrefcount command line option is used, which is not the case here.
regrtest can be run with -X showrefcount, but the option is not inherited by
child processes.
Move the code to run tests in multiple processes using threading and subprocess
to a new submodule.
Also move slave_runner() (renamed to run_tests_slave()) and
run_test_in_subprocess() (renamed to run_tests_in_subprocess()) there.