Benchmarking and Profiling¶

Objectives

Master IPython magic commands for timing code execution (%time and %timeit)
Understand cProfile as Python’s built-in deterministic profiler for performance analysis
Use the timeit module for accurate benchmarking of small code snippets
Profile full scripts with %prun to identify performance bottlenecks
Distinguish between wall-clock time and statistical averaging in performance measurement
Interpret profiler output metrics (ncalls, tottime, cumtime, percall)
Identify optimization opportunities through function-level and line-by-line profiling
Apply memory profiling tools (%memit, %mprun) to track resource consumption

Instructor note

Emphasize the difference between %time (single run) vs %timeit (multiple runs with averaging)
Show real examples of how cProfile output guides optimization decisions (e.g., sum formula vs loop)
Demonstrate that profiling overhead matters for very fast operations
Explain why cProfile uses C implementation for low overhead on long-running code
Present profiling as a scientific approach: measure first, optimize second
Connect function calls in profiler output to actual code flow
Highlight that list comprehensions and loops often show up as time bottlenecks
Motivate use of specialized profilers (%lprun, %mprun) for deep performance investigation

Timing Code Snippets¶

Jupyter Notebook is a widely used platform for interactive coding, data analysis, and machine learning. It offers powerful features like magic functions that streamline tasks such as timing code execution or managing environments.

Here we’ll discuss the following IPython magic commands:

%time: Time the execution of a single statement
%timeit: Time repeated execution of a single statement for more accuracy
%prun: Run code with the profiler

Profiling Full Scripts: cProfile¶

cProfile is Python’s built-in profiling tool that helps measure and analyze the execution time of your code. It’s implemented as a C extension, making it particularly efficient for profiling long-running programs

Key Features:

Provides detailed statistics about function calls and execution times
Uses deterministic profiling for precise measurements
Low overhead due to C implementation
Suitable for both short scripts and long-running applications

def sum_(n):
    total_sum = 0
    # sum of numbers till n
    for i in range(0,n):
        total_sum += i 
    return total_sum

sum_(10000)

49995000

import cProfile
cProfile.run('sum_(10000)')

         4 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 606777552.py:1(sum_)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

We are profiling the sum_ function using cProfile, which measures the execution time of the function calls.

ncalls: The number of calls to a particular function. For example, the sum_ function was called once.
tottime: The total time spent in the function itself, excluding sub-functions. The sum_ function took 0.001 seconds.
percall: The time per call, which is the tottime divided by ncalls. For sum_, this is 0.001 seconds per call.
cumtime: The cumulative time, including time spent in the function and all sub-functions (which is the same here because there are no sub-functions).
percall (again for cumtime): The cumulative time per call.

In the example above, a simple Python function sum_ is defined to calculate the sum of numbers from 0 to n-1 using a for loop. To analyze its performance, the cProfile module is used to profile the function call sum_(10000). The profiling output shows that the function executed four main calls, including the custom sum_ function, the module execution, and internal calls by Python’s execution engine. Notably, the sum_ function completed its task in approximately 0.001 seconds, indicating efficient performance for a small input size. While the iterative approach is straightforward, this task can be optimized even further using the mathematical formula for the sum of an arithmetic series, n*(n-1)//2, which eliminates the loop entirely and dramatically improves performance for very large values of n.

Benchmarks using timeit and time:¶

import time
# Measure execution time
start_time = time.time()  
result = sum_(10000)     
end_time = time.time()

elapsed_time = end_time - start_time
print(f"Execution time: {elapsed_time} seconds")

Execution time: 0.000629425048828125 seconds

import timeit

def sum_(n):
    total_sum = 0
    # sum of numbers till n
    for i in range(0, n):
        total_sum += i
    return total_sum

execution_time = timeit.timeit('sum_(10000)', globals=globals(), number=100)

print(f"Execution time for 100 runs: {execution_time} seconds")

Execution time for 100 runs: 0.056807367014698684 seconds

%%bash
python -m timeit --setup="sum_ = lambda n: sum(i for i in range(n))" "sum_(10000)"

500 loops, best of 5: 680 usec per loop

IPython, an interactive computing environment for Python, offers special commands called magic commands that provide extended functionalities for performance analysis and debugging. These commands, prefixed with % for line magics and %% for cell magics, enable users to efficiently measure execution time, profile code performance, and track memory usage. By leveraging these tools, developers can gain deeper insights into their code’s efficiency and optimize it for better performance.

time module: measures elapsed wall-clock time between two points; simple but less accurate for small, fast operations.
timeit module: repeatedly runs code and measures average execution time; better for precise timing of small, fast code snippets.

%time: This command measures the execution time of a single run of a line of code. Unlike %timeit, which runs the code multiple times and gives an average, %time runs the code once and reports the time it took for that execution.

%timeit sum_(1000000)

119 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%time sum_(1000000)

CPU times: user 251 ms, sys: 11.9 ms, total: 263 ms
Wall time: 240 ms

499999500000

Profiling Full Scripts: %prun¶

A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own. Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function %prun.

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

%prun sum_of_lists(1000)

         14 function calls in 0.001 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.001    0.000    0.001    0.000 3519952779.py:4(<listcomp>)
        5    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.001    0.001 3519952779.py:1(sum_of_lists)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside sum_of_lists. From here, we could start thinking about what changes we might make to improve the performance in the algorithm.

For more information on %prun, as well as its available options, use the IPython help functionality (i.e., type %prun? at the IPython prompt)

%prun?

Docstring:
Run a statement through the python code profiler.

Usage, in line mode:
  %prun [options] statement

Usage, in cell mode:
  %%prun [options] [statement]
  code...
  code...

In cell mode, the additional code lines are appended to the (possibly
empty) statement in the first line.  Cell mode allows you to easily
profile multiline blocks without having to put them in a separate
function.

The given statement (which doesn't require quote marks) is run via the
python profiler in a manner similar to the profile.run() function.
Namespaces are internally managed to work correctly; profile.run
cannot be used in IPython because it makes certain assumptions about
namespaces which do not hold under IPython.

Options:

-l <limit>
  you can place restrictions on what or how much of the
  profile gets printed. The limit value can be:

     * A string: only information for function names containing this string
       is printed.

     * An integer: only these many lines are printed.

     * A float (between 0 and 1): this fraction of the report is printed
       (for example, use a limit of 0.4 to see the topmost 40% only).

  You can combine several limits with repeated use of the option. For
  example, ``-l __init__ -l 5`` will print only the topmost 5 lines of
  information about class constructors.

-r
  return the pstats.Stats object generated by the profiling. This
  object has all the information about the profile in it, and you can
  later use it for further analysis or in other functions.

-s <key>
  sort profile by given key. You can provide more than one key
  by using the option several times: '-s key1 -s key2 -s key3...'. The
  default sorting key is 'time'.

  The following is copied verbatim from the profile documentation
  referenced below:

  When more than one key is provided, additional keys are used as
  secondary criteria when the there is equality in all keys selected
  before them.

  Abbreviations can be used for any key names, as long as the
  abbreviation is unambiguous.  The following are the keys currently
  defined:

  ============  =====================
  Valid Arg     Meaning
  ============  =====================
  "calls"       call count
  "cumulative"  cumulative time
  "file"        file name
  "module"      file name
  "pcalls"      primitive call count
  "line"        line number
  "name"        function name
  "nfl"         name/file/line
  "stdname"     standard name
  "time"        internal time
  ============  =====================

  Note that all sorts on statistics are in descending order (placing
  most time consuming items first), where as name, file, and line number
  searches are in ascending order (i.e., alphabetical). The subtle
  distinction between "nfl" and "stdname" is that the standard name is a
  sort of the name as printed, which means that the embedded line
  numbers get compared in an odd way.  For example, lines 3, 20, and 40
  would (if the file names were the same) appear in the string order
  "20" "3" and "40".  In contrast, "nfl" does a numeric compare of the
  line numbers.  In fact, sort_stats("nfl") is the same as
  sort_stats("name", "file", "line").

-T <filename>
  save profile results as shown on screen to a text
  file. The profile is still shown on screen.

-D <filename>
  save (via dump_stats) profile statistics to given
  filename. This data is in a format understood by the pstats module, and
  is generated by a call to the dump_stats() method of profile
  objects. The profile is still shown on screen.

-q
  suppress output to the pager.  Best used with -T and/or -D above.

If you want to run complete programs under the profiler's control, use
``%run -p [prof_opts] filename.py [args to program]`` where prof_opts
contains profiler specific options as described here.

You can read the complete documentation for the profile module with::

  In [1]: import profile; profile.help()

.. versionchanged:: 7.3
    User variables are no longer expanded,
    the magic line is always left unmodified.
File:      /apps/all/IPython/8.17.2-GCCcore-13.2.0/lib/python3.11/site-packages/IPython/core/magics/execution.py

cProfile module: the real Python profiler that collects detailed function call stats (time spent per function, number of calls, etc.), you use it by writing Python code like cProfile.run(‘your_code()’).
%prun magic: an IPython shortcut for cProfile; it runs profiling inside Jupyter/IPython without needing to manually import or call cProfile.

Other tools¶

Besides %prun, there are several other useful profiling tools in IPython. %lprun runs a line-by-line profiler, letting you see how much time each line of code takes. %memit measures the memory usage of a single statement, which is helpful when optimizing memory-heavy code. %mprun combines both ideas, running a line-by-line memory profiler to show how much memory is used by each line of a function.

%lprun: Run code with the line-by-line profiler
%memit: Measure the memory use of a single statement
%mprun: Run code with the line-by-line memory profiler

Conclusion¶

Keypoints

%time measures a single execution; %timeit repeats the code for statistical accuracy
cProfile provides deterministic profiling with ncalls, tottime, cumtime metrics
tottime excludes sub-function calls; cumtime includes all time spent in that call tree
%prun is the IPython shortcut to cProfile, available directly in notebooks
Line-by-line profilers (%lprun) reveal exactly which lines consume the most time
Memory profilers (%memit, %mprun) are critical for optimization in data-heavy workflows
Profiler output is ordered by cumulative time to highlight where to focus optimization efforts
timeit module can also run from command line (python -m timeit) for external script profiling