stringbench is a set of performance tests comparing byte string
operations with unicode operations.  The two string implementations
are loosely based on each other and sometimes the algorithm for one is
faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
identify obvious places for improvement.

Here is an example of a benchmark:

@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

The bench decorator takes three parameters.  The first is a short
description of how the code works.  In most cases this is a Python
code snippet.  It is not the code which is actually run, because the
real code is hand-optimized to focus on the method being tested.

The second parameter is a group title.  All benchmarks with the same
group title are listed together.  This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)".

The last is a count.  Each benchmark loops over the algorithm either
100 or 1000 times, depending on the algorithm's performance.  The
output time is the time per benchmark call, so the reader needs the
count to know how to scale the performance.
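
In concrete terms (using the rounded figures from the first row of
the sample output further down; this arithmetic is illustrative, not
part of stringbench):

```python
# The reported time is per benchmark call; each call loops `count` times,
# so dividing by the count approximates the cost of one string operation.
reported_ms = 38.54   # time per benchmark call (first row of the sample output)
count = 100           # the benchmark's repeat count, shown as (*100)
per_op_ms = reported_ms / count
print(f"{per_op_ms:.4f} ms per operation")  # 0.3854
```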

These parameters become function attributes.
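
stringbench's own decorator is not reproduced here, but a minimal
sketch of a decorator that stores its three parameters as function
attributes could look like the following (the attribute names `title`,
`group`, and `count` are assumptions for illustration, not taken from
stringbench itself):

```python
# Hypothetical sketch of a bench-style decorator: it simply records its
# three parameters as attributes on the decorated function.
_RANGE_1000 = range(1000)

def bench(title, group, count):
    def decorator(func):
        func.title = title    # snippet shown in the output column
        func.group = group    # benchmarks in the same group are listed together
        func.count = count    # internal repeat count (100 or 1000)
        return func
    return decorator

@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)
```

Because the benchmark body only goes through the STR constructor, the
same function can be timed against different string implementations,
e.g. startswith_single(str).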

Here is an example of the output:

========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings.  The third is a
percentage, byte time / unicode time; values above 100 mean the
unicode version is faster than the byte string version.
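
Recomputing the third column from the first row of the sample output
(using the displayed, rounded times, so the result differs slightly
from the printed 92.7):

```python
# Ratio of byte-string time to unicode time, as a percentage.
byte_ms, unicode_ms = 38.54, 41.60   # rounded times from the sample output
ratio = byte_ms / unicode_ms * 100
print(f"{ratio:.1f}")  # 92.6 from these rounded inputs (table shows 92.7,
                       # computed from the unrounded measurements)
```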

The last column contains the code snippet and the repeat count for the
internal benchmark loop.

The times are computed with 'timeit.py', which repeats the test more
and more times until the total run takes over 0.2 seconds, then
returns the best time for a single iteration.
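
The same strategy can be sketched with the stdlib timeit module (a
sketch of the approach described above, not stringbench's actual code;
Timer.autorange() grows the loop count until one run takes at least
0.2 seconds):

```python
import timeit

def best_time(stmt, setup="pass"):
    """Best per-iteration time in seconds, best-of-three runs."""
    timer = timeit.Timer(stmt, setup=setup)
    # autorange() increases the loop count until a run takes >= 0.2 s
    number, _ = timer.autorange()
    # keep the best of several runs and normalise to one iteration
    return min(timer.repeat(repeat=3, number=number)) / number

per_call = best_time('s.startswith("A")', setup='s = "Andrew"')
print(f"{per_call * 1e9:.0f} ns per call")
```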

The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes.  For example:

4079.83 5432.25 75.1    TOTAL

However, this total has little meaning on its own, as it weights every
test evenly.