Benchmarking and Profiling¶
Objectives
Master IPython magic commands for timing code execution (
%timeand%timeit)Understand cProfile as Python’s built-in deterministic profiler for performance analysis
Use the timeit module for accurate benchmarking of small code snippets
Profile full scripts with
%prunto identify performance bottlenecksDistinguish between wall-clock time and statistical averaging in performance measurement
Interpret profiler output metrics (ncalls, tottime, cumtime, percall)
Identify optimization opportunities through function-level and line-by-line profiling
Apply memory profiling tools (
%memit,%mprun) to track resource consumption
Instructor note
Emphasize the difference between %time (single run) vs %timeit (multiple runs with averaging)
Show real examples of how cProfile output guides optimization decisions (e.g., sum formula vs loop)
Demonstrate that profiling overhead matters for very fast operations
Explain why cProfile uses C implementation for low overhead on long-running code
Present profiling as a scientific approach: measure first, optimize second
Connect function calls in profiler output to actual code flow
Highlight that list comprehensions and loops often show up as time bottlenecks
Motivate use of specialized profilers (
%lprun,%mprun) for deep performance investigation
Timing Code Snippets¶
Jupyter Notebook is a widely used platform for interactive coding, data analysis, and machine learning. It offers powerful features like magic functions that streamline tasks such as timing code execution or managing environments.
Here we’ll discuss the following IPython magic commands:
%time: Time the execution of a single statement%timeit: Time repeated execution of a single statement for more accuracy%prun: Run code with the profiler
Profiling Full Scripts: cProfile¶
cProfile is Python’s built-in profiling tool that helps measure and analyze the execution time of your code. It’s implemented as a C extension, making it particularly efficient for profiling long-running programs
Key Features:
Provides detailed statistics about function calls and execution times
Uses deterministic profiling for precise measurements
Low overhead due to C implementation
Suitable for both short scripts and long-running applications
def sum_(n):
total_sum = 0
# sum of numbers till n
for i in range(0,n):
total_sum += i
return total_sum
sum_(10000)
49995000
import cProfile
cProfile.run('sum_(10000)')
4 function calls in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.001 0.001 606777552.py:1(sum_)
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
1 0.000 0.000 0.001 0.001 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
We are profiling the sum_ function using cProfile, which measures the execution time of the function calls.
ncalls: The number of calls to a particular function. For example, the sum_ function was called once.
tottime: The total time spent in the function itself, excluding sub-functions. The sum_ function took 0.001 seconds.
percall: The time per call, which is the tottime divided by ncalls. For sum_, this is 0.001 seconds per call.
cumtime: The cumulative time, including time spent in the function and all sub-functions (which is the same here because there are no sub-functions).
percall (again for cumtime): The cumulative time per call.
In the example above, a simple Python function sum_ is defined to calculate the sum of numbers from 0 to n-1 using a for loop. To analyze its performance, the cProfile module is used to profile the function call sum_(10000). The profiling output shows that the function executed four main calls, including the custom sum_ function, the module execution, and internal calls by Python’s execution engine. Notably, the sum_ function completed its task in approximately 0.001 seconds, indicating efficient performance for a small input size. While the iterative approach is straightforward, this task can be optimized even further using the mathematical formula for the sum of an arithmetic series, n*(n-1)//2, which eliminates the loop entirely and dramatically improves performance for very large values of n.
Benchmarks using timeit and time:¶
import time
# Measure execution time
start_time = time.time()
result = sum_(10000)
end_time = time.time()
elapsed_time = end_time - start_time
print(f"Execution time: {elapsed_time} seconds")
Execution time: 0.000629425048828125 seconds
import timeit
def sum_(n):
total_sum = 0
# sum of numbers till n
for i in range(0, n):
total_sum += i
return total_sum
execution_time = timeit.timeit('sum_(10000)', globals=globals(), number=100)
print(f"Execution time for 100 runs: {execution_time} seconds")
Execution time for 100 runs: 0.056807367014698684 seconds
%%bash
python -m timeit --setup="sum_ = lambda n: sum(i for i in range(n))" "sum_(10000)"
500 loops, best of 5: 680 usec per loop
IPython, an interactive computing environment for Python, offers special commands called magic commands that provide extended functionalities for performance analysis and debugging. These commands, prefixed with % for line magics and %% for cell magics, enable users to efficiently measure execution time, profile code performance, and track memory usage. By leveraging these tools, developers can gain deeper insights into their code’s efficiency and optimize it for better performance.
timemodule: measures elapsed wall-clock time between two points; simple but less accurate for small, fast operations.timeitmodule: repeatedly runs code and measures average execution time; better for precise timing of small, fast code snippets.
%time: This command measures the execution time of a single run of a line of code. Unlike
%timeit, which runs the code multiple times and gives an average, %time runs the code once and reports the time it took for that execution.
%timeit sum_(1000000)
119 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%time sum_(1000000)
CPU times: user 251 ms, sys: 11.9 ms, total: 263 ms
Wall time: 240 ms
499999500000
Profiling Full Scripts: %prun¶
A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own. Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function %prun.
def sum_of_lists(N):
total = 0
for i in range(5):
L = [j ^ (j >> i) for j in range(N)]
total += sum(L)
return total
%prun sum_of_lists(1000)
14 function calls in 0.001 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5 0.001 0.000 0.001 0.000 3519952779.py:4(<listcomp>)
5 0.000 0.000 0.000 0.000 {built-in method builtins.sum}
1 0.000 0.000 0.001 0.001 3519952779.py:1(sum_of_lists)
1 0.000 0.000 0.001 0.001 {built-in method builtins.exec}
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside sum_of_lists. From here, we could start thinking about what changes we might make to improve the performance in the algorithm.
For more information on %prun, as well as its available options, use the IPython help functionality (i.e., type %prun? at the IPython prompt)
%prun?
Docstring:
Run a statement through the python code profiler.
Usage, in line mode:
%prun [options] statement
Usage, in cell mode:
%%prun [options] [statement]
code...
code...
In cell mode, the additional code lines are appended to the (possibly
empty) statement in the first line. Cell mode allows you to easily
profile multiline blocks without having to put them in a separate
function.
The given statement (which doesn't require quote marks) is run via the
python profiler in a manner similar to the profile.run() function.
Namespaces are internally managed to work correctly; profile.run
cannot be used in IPython because it makes certain assumptions about
namespaces which do not hold under IPython.
Options:
-l <limit>
you can place restrictions on what or how much of the
profile gets printed. The limit value can be:
* A string: only information for function names containing this string
is printed.
* An integer: only these many lines are printed.
* A float (between 0 and 1): this fraction of the report is printed
(for example, use a limit of 0.4 to see the topmost 40% only).
You can combine several limits with repeated use of the option. For
example, ``-l __init__ -l 5`` will print only the topmost 5 lines of
information about class constructors.
-r
return the pstats.Stats object generated by the profiling. This
object has all the information about the profile in it, and you can
later use it for further analysis or in other functions.
-s <key>
sort profile by given key. You can provide more than one key
by using the option several times: '-s key1 -s key2 -s key3...'. The
default sorting key is 'time'.
The following is copied verbatim from the profile documentation
referenced below:
When more than one key is provided, additional keys are used as
secondary criteria when the there is equality in all keys selected
before them.
Abbreviations can be used for any key names, as long as the
abbreviation is unambiguous. The following are the keys currently
defined:
============ =====================
Valid Arg Meaning
============ =====================
"calls" call count
"cumulative" cumulative time
"file" file name
"module" file name
"pcalls" primitive call count
"line" line number
"name" function name
"nfl" name/file/line
"stdname" standard name
"time" internal time
============ =====================
Note that all sorts on statistics are in descending order (placing
most time consuming items first), where as name, file, and line number
searches are in ascending order (i.e., alphabetical). The subtle
distinction between "nfl" and "stdname" is that the standard name is a
sort of the name as printed, which means that the embedded line
numbers get compared in an odd way. For example, lines 3, 20, and 40
would (if the file names were the same) appear in the string order
"20" "3" and "40". In contrast, "nfl" does a numeric compare of the
line numbers. In fact, sort_stats("nfl") is the same as
sort_stats("name", "file", "line").
-T <filename>
save profile results as shown on screen to a text
file. The profile is still shown on screen.
-D <filename>
save (via dump_stats) profile statistics to given
filename. This data is in a format understood by the pstats module, and
is generated by a call to the dump_stats() method of profile
objects. The profile is still shown on screen.
-q
suppress output to the pager. Best used with -T and/or -D above.
If you want to run complete programs under the profiler's control, use
``%run -p [prof_opts] filename.py [args to program]`` where prof_opts
contains profiler specific options as described here.
You can read the complete documentation for the profile module with::
In [1]: import profile; profile.help()
.. versionchanged:: 7.3
User variables are no longer expanded,
the magic line is always left unmodified.
File: /apps/all/IPython/8.17.2-GCCcore-13.2.0/lib/python3.11/site-packages/IPython/core/magics/execution.py
cProfilemodule: the real Python profiler that collects detailed function call stats (time spent per function, number of calls, etc.), you use it by writing Python code like cProfile.run(‘your_code()’).%prunmagic: an IPython shortcut for cProfile; it runs profiling inside Jupyter/IPython without needing to manually import or call cProfile.
Other tools¶
Besides %prun, there are several other useful profiling tools in IPython. %lprun runs a line-by-line profiler, letting you see how much time each line of code takes. %memit measures the memory usage of a single statement, which is helpful when optimizing memory-heavy code. %mprun combines both ideas, running a line-by-line memory profiler to show how much memory is used by each line of a function.
%lprun: Run code with the line-by-line profiler%memit: Measure the memory use of a single statement%mprun: Run code with the line-by-line memory profiler
Conclusion¶
Keypoints
%timemeasures a single execution;%timeitrepeats the code for statistical accuracycProfile provides deterministic profiling with ncalls, tottime, cumtime metrics
tottime excludes sub-function calls; cumtime includes all time spent in that call tree
%prunis the IPython shortcut to cProfile, available directly in notebooksLine-by-line profilers (
%lprun) reveal exactly which lines consume the most timeMemory profilers (
%memit,%mprun) are critical for optimization in data-heavy workflowsProfiler output is ordered by cumulative time to highlight where to focus optimization efforts
timeit module can also run from command line (python -m timeit) for external script profiling