Numba

Objectives

Understand just-in-time (JIT) compilation and how Numba accelerates Python code to near-C performance
Apply @jit and @njit decorators for CPU compilation of computationally intensive functions
Leverage numba.prange to parallelize nested loops and utilize multicore processors
Optimize numerical algorithms (prime checking, integration, particle simulation, Monte Carlo)
Distinguish between nopython and object compilation modes and select appropriate modes
Create universal functions (ufuncs) using @vectorize for efficient array operations
Benchmark and measure performance gains from Numba compilation
Understand Numba’s type-specialization and numerical focus for data-driven applications

Instructor note

Start with performance comparisons (57x speedup in prime number example) to motivate adoption
Emphasize that Numba requires type consistency; prototype in object mode before nopython
Highlight that overhead is present—don’t use Numba for trivial functions (e.g., built-in math.hypot is faster)
Show the .py_func attribute for validating compiled vs uncompiled implementations
Explain that @njit(parallel=True) combined with numba.prange explicitly marks a loop for parallel execution
Demonstrate real-world use cases: financial modeling, particle physics, Monte Carlo simulations
Guide students to refactor toward nopython mode for optimal performance; object mode is for development
Clarify that Numba compilation happens on first function call; subsequent calls use cached machine code

Numba is a just-in-time compiler for Python that converts Python functions into optimized machine code at runtime. In other words, user-defined functions written in Python can run at speeds close to native machine code. For example, a programmer can delegate the computationally intensive functions in their code (especially those with nested loops over arrays) to Numba and gain a speedup. This is done by placing a Numba decorator on top of a user-defined function. The decorator determines how the function is compiled; more on this later in the notebook. Numba has extensive support for the NumPy library and also enables parallel programming on the CPU (multicore) and the GPU (via CUDA API bindings), which makes operations on NumPy arrays faster.

Our focus is on accelerating Python programs using Numba on the CPU, with an emphasis on parallel programming from a conceptual standpoint inspired by CUDA kernel design. While CUDA typically refers to GPU computing, many of its core ideas, such as launching multiple threads to execute a function in parallel, can be applied to CPUs as well. We'll draw on these concepts to better understand how parallel computation works, even when targeting the CPU.

Python is known for its simplicity and rich ecosystem of libraries, making it a popular choice across diverse fields. However, its interpreted nature can limit performance in computationally heavy applications. Traditionally, developers have turned to compiled languages like C++, C#, or Rust for speed. Numba changes this landscape by introducing just-in-time (JIT) compilation, which can dramatically improve the performance of Python code—especially numerical and scientific workloads—without requiring a switch to another language. Throughout this notebook, we’ll also use terms like Host (referring to the CPU), Kernel (a user-defined function optimized for parallel execution), and Threading (the concept of running tasks concurrently on the CPU), as we explore how Numba allows Python to approach the speed of low-level languages directly on the CPU.

Prime Number Algorithm:

A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Brute-force searching involves checking each number within the given range to determine if it is prime. We’ll use a simple function to check for prime numbers.

Python Implementation (Without Numba):

import time

def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

def find_primes(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()

Execution time: 132.94145512580872 seconds
Total prime numbers found: 664579

Python execution time: 132.9 seconds in the run shown above (62.8 seconds in another recorded run). Either way, this is too slow.

Python Implementation (With Numba):

!pip install numba

Defaulting to user installation because normal site-packages is not writeable
Collecting numba
  Downloading numba-0.61.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.8 kB)
Collecting llvmlite<0.45,>=0.44.0dev0 (from numba)
  Downloading llvmlite-0.44.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.8 kB)
Requirement already satisfied: numpy<2.3,>=1.24 in /home/gth/.local/lib/python3.11/site-packages (from numba) (2.2.4)
Downloading numba-0.61.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.8 MB)
Downloading llvmlite-0.44.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.4 MB)
Installing collected packages: llvmlite, numba
  Attempting uninstall: llvmlite
    Found existing installation: llvmlite 0.43.0
    Uninstalling llvmlite-0.43.0:
      Successfully uninstalled llvmlite-0.43.0
Successfully installed llvmlite-0.44.0 numba-0.61.2

[notice] A new release of pip is available: 24.3.1 -> 25.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.

import time
import numba

@numba.jit
def is_prime_numba(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

@numba.njit(fastmath=True, cache=True, parallel=True)
def find_primes_numba(start, end):
    # return [num for num in numba.prange(start, end + 1) if is_prime_numba(num)]
    
    count = 0
    for num in numba.prange(start, end + 1):
        if is_prime_numba(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes_numba(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time (with Numba):", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()

Execution time (with Numba): 1.0873942375183105 seconds
Total prime numbers found: 664579

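With Numba, the same search took about 1.09 seconds. Compared with the 62.8-second pure-Python time, that is roughly 57x faster; compared with the 132.9 seconds printed above, it is more than 120x. Note that this timing includes any one-time compilation triggered by the first call.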

Example — Numerical Integration (Trapezoidal Rule) Explanation: The trapezoidal rule is a numerical integration method used to approximate the definite integral of a function. It divides the area under the curve of the function into trapezoids and sums up their areas to approximate the integral.
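Written out, the rule that the code below implements is, with $n$ trapezoids of width $h$ on $[a, b]$:

```latex
\int_a^b f(x)\,dx \;\approx\; h\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(a + i h)\right],
\qquad h = \frac{b - a}{n}
```

For the test function $f(x) = x^2$ on $[0, 1]$ the exact value is $1/3$, which gives a reference for checking the numerical result.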

import time
import numba

def f(x):
    # The function to be integrated
    return x**2

def numerical_integration_without_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral


@numba.jit
def g(x):
    # The function to be integrated
    return x**2

@numba.jit
def numerical_integration_with_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral

def main():
    a = 0.0  # Lower limit of integration
    b = 1.0  # Upper limit of integration
    n = 10000000  # Number of trapezoids
    
    # Without Numba
    start_time = time.time()
    result_without_numba = numerical_integration_without_numba(f, a, b, n)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Numerical Integration without Numba:")
    print("Result:", result_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    result_with_numba = numerical_integration_with_numba(g, a, b, n)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Numerical Integration with Numba:")
    print("Result:", result_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Numerical Integration without Numba:
Result: 0.33333333333335957
Execution time: 2.276294231414795 seconds
Numerical Integration with Numba:
Result: 0.33333333333335957
Execution time: 0.268261194229126 seconds

Example — Simulation of Particle Motion with Constant Force Explanation: In this example, we’ll simulate the motion of a particle moving under the influence of a constant force. We’ll use the equations of motion to update the particle’s position and velocity over time.

import time
import numba


def simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity

    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step

    return position


@numba.jit
def simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity

    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step

    return position


def main():
    # Particle parameters
    mass = 1.0
    initial_position = 0.0
    initial_velocity = 0.0
    constant_force = 10.0

    # Simulation parameters
    time_step = 0.01
    num_steps = 10000000

    # Without Numba
    start_time = time.time()
    final_position_without_numba = simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Simulation without Numba:")
    print("Final Position:", final_position_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    final_position_with_numba = simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Simulation with Numba:")
    print("Final Position:", final_position_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Simulation without Numba:
Final Position: 50000004994.55559
Execution time: 1.352325677871704 seconds
Simulation with Numba:
Final Position: 50000004994.55559
Execution time: 0.09574770927429199 seconds
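The loop above updates the velocity first and then the position, so with zero initial position and velocity the result has a closed form: after N steps, position = a · dt² · N(N + 1) / 2, where a = force / mass. For the parameters above this gives 50000005000, which agrees with the printed value up to floating-point rounding. The sketch below checks the same identity on a smaller run; the helper names `simulate` and `closed_form` are illustrative and not part of the original example.

```python
import math


def simulate(mass, force, time_step, num_steps):
    # Same update order as the example: velocity first, then position.
    position = 0.0
    velocity = 0.0
    acceleration = force / mass
    for _ in range(num_steps):
        velocity += acceleration * time_step
        position += velocity * time_step
    return position


def closed_form(mass, force, time_step, num_steps):
    # Sum of v_k * dt with v_k = a * dt * k for k = 1..N.
    a = force / mass
    return a * time_step**2 * num_steps * (num_steps + 1) / 2


simulated = simulate(1.0, 10.0, 0.01, 1000)
expected = closed_form(1.0, 10.0, 0.01, 1000)
print("Simulated:  ", simulated)
print("Closed form:", expected)
print("Match:", math.isclose(simulated, expected, rel_tol=1e-9))
```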

Example — Option Pricing with Monte Carlo Simulation Explanation: Monte Carlo simulation is a widely used technique for option pricing in finance. It involves simulating the future stock price using random walks and then calculating the option payoff based on the simulated stock prices.

import numpy as np
import time
import numba


def option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0

    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)

        total_payoff += max(S - K, 0)

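    # Note: this is the average payoff at expiry; a present-value price
    # would additionally multiply by np.exp(-r * T).
    option_price = total_payoff / num_simulations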
    return option_price


@numba.jit
def option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0

    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)

        total_payoff += max(S - K, 0)

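    # Note: this is the average payoff at expiry; a present-value price
    # would additionally multiply by np.exp(-r * T).
    option_price = total_payoff / num_simulations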
    return option_price


def main():
    # Option parameters
    S0 = 100.0  # Initial stock price
    K = 100.0   # Strike price
    r = 0.05    # Risk-free interest rate
    sigma = 0.2 # Volatility (standard deviation of returns)
    T = 1.0     # Time to expiration (in years)

    # Monte Carlo simulation parameters
    num_simulations = 100000  # Number of simulations
    num_steps = 252           # Number of steps (days) for each simulation
    
    # Without Numba
    start_time = time.time()
    option_price_without_numba = option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Option Pricing without Numba:")
    print("Option Price:", option_price_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    option_price_with_numba = option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Option Pricing with Numba:")
    print("Option Price:", option_price_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Option Pricing without Numba:
Option Price: 10.972173920731281
Execution time: 142.18183946609497 seconds
Option Pricing with Numba:
Option Price: 11.018405419095494
Execution time: 2.473653554916382 seconds
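The two runs print slightly different prices because each draws its own random numbers. A useful sanity check for a simulation like this is the Black-Scholes closed-form price of a European call, which applies here because the simulated process is geometric Brownian motion. The sketch below is an independent check, not part of the original example; the helper names `norm_cdf` and `black_scholes_call` are illustrative.

```python
import math


def norm_cdf(x):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    # Closed-form present value of a European call option.
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
undiscounted = price * math.exp(0.05 * 1.0)

print("Black-Scholes price (discounted):", round(price, 4))
print("Expected payoff at expiry:       ", round(undiscounted, 4))
```

The simulated values of about 10.97 and 11.02 sit close to the expected payoff at expiry (about 10.99), since the simulation averages payoffs without discounting; the discounted price is about 10.45.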

Parallelization:

As the following example demonstrates, Numba's support for parallel processing lets developers use all the cores of a multicore processor for large computations.

Example — Matrix Multiplication with Parallelization Explanation: Matrix multiplication is a computationally intensive task that can benefit from parallelization. We'll use Numba's numba.prange function to parallelize the outermost of the nested loops for matrix multiplication, so that different rows of the result are computed on different CPU cores.

import numpy as np
import time
import numba


def matrix_multiply_without_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)

    for i in range(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]

    return result


@numba.njit(parallel=True)
def matrix_multiply_with_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)

    for i in numba.prange(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]

    return result

def main():
    # Generate large random matrices
    size = 200
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    
    # Without Numba
    start_time = time.time()
    result_without_numba = matrix_multiply_without_numba(A, B)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Matrix Multiplication without Numba:")
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba Parallelization
    start_time = time.time()
    result_with_numba = matrix_multiply_with_numba(A, B)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Matrix Multiplication with Numba Parallelization:")
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Matrix Multiplication without Numba:
Execution time: 2.1771888732910156 seconds
Matrix Multiplication with Numba Parallelization:
Execution time: 0.25493907928466797 seconds

In all of these test cases, the Numba version is clearly faster than the pure-Python implementation: about 8.5x for the numerical integration (2.28 s versus 0.27 s), 14x for the particle simulation, 57x for the Monte Carlo option pricing, and 8.5x for the matrix multiplication. These timings include Numba's one-time compilation on the first call, so the gain on repeated calls is larger, and the benefit grows with problem size because the fixed compilation and call overhead is spread over more work. Numba is therefore most valuable where execution speed matters and the work is numerical and loop-heavy: scientific computing, machine learning, computational physics, financial modeling, and parallel processing.

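Numba can be described as a just-in-time, type-specializing function compiler for numerically focused Python. Let's break down those terms: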

function compiler: Numba compiles Python functions, not entire applications, and not parts of functions. Numba does not replace your Python interpreter, but is just another Python module that can turn a function into a (usually) faster function.

type-specializing: Numba speeds up your function by generating a specialized implementation for the specific data types you are using. Python functions are designed to operate on generic data types, which makes them very flexible, but also very slow. In practice, you only will call a function with a small number of argument types, so Numba will generate a fast implementation for each set of types.

just-in-time: Numba translates functions when they are first called. This ensures the compiler knows what argument types you will be using. This also allows Numba to be used interactively in a Jupyter notebook just as easily as a traditional application.

numerically-focused: Currently, Numba is focused on numerical data types, like int, float, and complex. There is very limited string processing support, and many string use cases are not going to work well on the GPU. To get best results with Numba, you will likely be using NumPy arrays.

Compile for CPU

The Numba compiler is typically enabled by applying a function decorator to a Python function. Decorators are function modifiers that transform the Python functions they decorate, using a very simple syntax. Here we will use Numba’s CPU compilation decorator @jit:

from numba import jit
import math

# This is the function decorator syntax and is equivalent to `hypot = jit(hypot)`.
# The Numba compiler is just a function you can call whenever you want!
@jit
def hypot(x, y):
    # Implementation from https://en.wikipedia.org/wiki/Hypot
    x = abs(x)
    y = abs(y)
    t = min(x, y)
    x = max(x, y)
    t = t / x
    return x * math.sqrt(1+t*t)

hypot(3.0, 4.0) 

5.0

The first time we call hypot, the compiler is triggered and compiles a machine code implementation of the function for float inputs. Numba also saves the original Python implementation of the function in the .py_func attribute, so we can call the original Python code to make sure we get the same answer:

hypot.py_func(3.0, 4.0) 

5.0

%timeit hypot.py_func(3.0, 4.0)

155 ns ± 1.58 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

%timeit hypot(3.0, 4.0)

80 ns ± 0.39 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

%timeit math.hypot(3.0, 4.0)

36.3 ns ± 0.253 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The standard library's math.hypot is even faster than the Numba version! This is because Numba adds some overhead to each call from Python into compiled code, and that overhead is larger than the function call overhead of Python itself. Extremely fast functions (like the one above) are hurt by this when they are called one at a time from Python.

Exercise: Use Numba to Compile a Function for the CPU

The following function uses the Monte Carlo Method to determine Pi (source code from the Numba homepage). The function itself is already working so don’t worry about the mathematical implementation details.

Complete the two TODOs in order to compile monte_carlo_pi with Numba before executing the following 3 cells which will:

Confirm the compiled version is behaving the same as the uncompiled version.
Benchmark the compiled version.
Benchmark the uncompiled version.

nsamples = 1_000_000 

import random
# TODO: Import Numba's just-in-time compiler function

# TODO: Use the Numba compiler to compile this function

def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples 

Solution:

# TODO: Import Numba's just-in-time compiler function
import random
from numba import jit

# TODO: Use the Numba compiler to compile this function
@jit
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples 

Test and benchmark

# We will use numpy's `testing` library to confirm compiled and uncompiled versions run the same
from numpy import testing

# This assertion will fail until you successfully complete the exercise one cell above
testing.assert_almost_equal(monte_carlo_pi(nsamples), monte_carlo_pi.py_func(nsamples), decimal=2)  

%timeit monte_carlo_pi(nsamples)

12.6 ms ± 7.03 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit monte_carlo_pi.py_func(nsamples)

358 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Object and nopython Modes

Numba cannot compile all Python code. Some functions don't have a Numba translation, and some Python types can't be efficiently compiled at all. For example, Numba cannot compile code that operates on a plain Python dict in nopython mode (it provides its own numba.typed.Dict for that purpose). Let's first compile such a function in object mode, where it still works:

@jit(forceobj=True)
def cannot_compile(x):
    return x['key']

cannot_compile(dict(key='value')) 

'value'

Numba provides two primary compilation modes: nopython mode and object mode. The @jit(nopython=True) decorator tells Numba to fully compile a function to machine code, avoiding Python objects entirely for maximum performance; this is often referred to as “nopython mode.” Since Numba 0.59, a bare @jit also defaults to this mode. On the other hand, @jit(forceobj=True) forces object mode, allowing Numba to handle functions that use dynamically typed Python objects or other features unsupported in nopython mode. While object mode is slower due to its reliance on Python's object model, it's helpful during development, debugging, or when using Python features that don't yet work in nopython mode. Ideally, you prototype in object mode and gradually refactor toward nopython mode for optimal speed.

@jit(nopython=True)
def cannot_compile(x):
    return x['key']

cannot_compile(dict(key='value')) 

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
Cell In[32], line 5
      1 @jit(nopython=True)
      2 def cannot_compile(x):
      3     return x['key']
----> 5 cannot_compile(dict(key='value')) 

File ~/Desktop/uv-exp/.venv/lib/python3.13/site-packages/numba/core/dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
    420         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    421                f"by the following argument(s):\n{args_str}\n")
    422         e.patch_message(msg)
--> 424     error_rewrite(e, 'typing')
    425 except errors.UnsupportedError as e:
    426     # Something unsupported is present in the user code, add help info
    427     error_rewrite(e, 'unsupported_error')

File ~/Desktop/uv-exp/.venv/lib/python3.13/site-packages/numba/core/dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    363     raise e
    364 else:
--> 365     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /var/folders/j6/w9gcx06114d30t6kgbxck3tw0000gn/T/ipykernel_38783/4228235441.py (1)

File "../../../../var/folders/j6/w9gcx06114d30t6kgbxck3tw0000gn/T/ipykernel_38783/4228235441.py", line 1:
<source missing, REPL/exec in use?>

During: Pass nopython_type_inference 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dict'>

Now we get an exception when Numba tries to compile the function, and if you scroll down to the end of the exception output you will see an error that describes the underlying problem:

- argument 0: Cannot determine Numba type of <class 'dict'>

Using nopython mode is the recommended and best practice way to use jit as it leads to the best performance.

Next, we optimize NumPy universal functions (ufuncs) for the CPU using Numba. A ufunc applies the same operation to each element of a NumPy array, which makes it naturally parallelizable. Numba lets you decorate a scalar function with @vectorize and takes care of compiling it into a ufunc; the loop over the array elements, broadcasting, and output allocation are handled for you. In this example, we use the @vectorize decorator to compile a ufunc for CPU execution without writing any C code.

import numpy as np 
from numba import vectorize

@vectorize
def add_ten(num):
    return num + 10 # This scalar operation will be performed on each element  

nums = np.arange(10)
add_ten(nums) # pass the whole array into the ufunc, it performs the operation on each element 

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

We now generate a ufunc for the CPU by specifying an explicit type signature and setting the target argument to 'cpu'. The type signature defines the data types of the ufunc's input arguments and of its return value. With target='cpu', Numba compiles the ufunc to single-threaded machine code; target='parallel' spreads the element-wise work across CPU cores, and target='cuda' targets the GPU. A signature has the form:

return_value_type(argument1_value_type, argument2_value_type, ...)

Here is a simple example of a ufunc.

%%writefile numba_ex_1.py

import numpy as np
from numba import vectorize

@vectorize(['int64(int64, int64)'], target='cpu') # Use target='cuda' for the GPU
def add_ufunc(x, y):
    return x + y

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

c = add_ufunc(a, b)
print("a + b = ", c)  

Overwriting numba_ex_1.py

!python numba_ex_1.py

a + b =  [11 22 33 44]

Even for such a simple function call, Numba does a lot automatically when running on the CPU:

Compiled the scalar function into a ufunc that applies the operation to every input element.
Allocated memory for the output on the host (CPU).
Executed the compiled loop over all input elements (single-threaded with target='cpu'; target='parallel' would split the work across cores).
Returned the result as a NumPy array on the host (CPU).

Compared to an implementation in C, this process is much more concise. You might be wondering how fast our simple example is on the CPU. Let’s check and see the performance!

%%writefile numba_ex_1.py

import numpy as np
from numba import vectorize
import time

@vectorize(['int64(int64, int64)'], target='cpu') # Type signature and target are required for the GPU
def add_ufunc(x, y):
    return x + y

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

t0 = time.time()
c = add_ufunc(a, b)
t1 = time.time()
t_diff_01 = t1-t0
print("Numba: a + b = ", c, "\n numba-timing: ", t_diff_01)

t2 = time.time()
d = np.add(a, b)
t3 = time.time()
t_diff_23 = t3-t2
print("Numpy: a + b = ", d, "\n numpy-timing: ", t_diff_23)  

Overwriting numba_ex_1.py

!python numba_ex_1.py

Numba: a + b =  [11 22 33 44] 
 numba-timing:  4.291534423828125e-05
Numpy: a + b =  [11 22 33 44] 
 numpy-timing:  1.621246337890625e-05

Let's write a “zero suppression” function. A common operation when working with waveforms is to force all sample values below a certain absolute magnitude to be zero, as a way to eliminate low-amplitude noise. Let's make some sample data:

# This allows us to plot right here in the notebook
%matplotlib inline

from matplotlib import pyplot as plt

n = 100000
noise = np.random.normal(size=n) * 3
pulses = np.maximum(np.sin(np.arange(n) / (n / 23)) - 0.3, 0.0)
waveform = ((pulses * 300) + noise).astype(np.int16)
plt.plot(waveform) 

[Figure: plot of the generated waveform]

Now decorate this zero_suppress function so that it runs as a vectorized ufunc on the CPU, apply it with a threshold of 15, and print the result. Make sure you write all the relevant code to a Python file, then write a job script and submit it.

def zero_suppress(waveform_value, threshold):
    if waveform_value < threshold:
        result = 0
    else:
        result = waveform_value
    return result 

Solution

%%writefile numba_ex_4.py

import math
import numpy as np
from numba import vectorize
from matplotlib import pyplot as plt

n = 100000
noise = np.random.normal(size=n) * 3
pulses = np.maximum(np.sin(np.arange(n) / (n / 23)) - 0.3, 0.0)
waveform = ((pulses * 300) + noise).astype(np.int16)

@vectorize(['float32(int16, float32)'], target='cpu')
def zero_suppress(waveform_value, threshold):
    if waveform_value < threshold:
        result = 0
    else:
        result = waveform_value
    return result

result = zero_suppress(waveform, 15)
print(result)  

Overwriting numba_ex_4.py

! python numba_ex_4.py

[0. 0. 0. ... 0. 0. 0.]

Exercise: Linear Regression. Linear regression is a popular supervised learning algorithm used for predicting a continuous target variable based on one or more predictor variables. In this exercise, we'll perform simple linear regression with one predictor variable.

import numpy as np
import time
import numba


def linear_regression_without_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)

    numerator = 0.0
    denominator = 0.0

    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2

    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept

Conclusion

Keypoints

Numba is a function-level JIT compiler optimized for numerical computing, not a full application compiler
Type-specializing generates fast implementations for specific argument types at compile time
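@jit compiles lazily, on the first call, unless an explicit signature is given; @jit(nopython=True) enforces full compilation or fails with an error, and is the default behaviour of @jit since Numba 0.59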
@njit is shorthand for @jit(nopython=True) and is the recommended production mode
Plain Python dictionaries and many other Python objects cannot be compiled in nopython mode
@vectorize(['float64(float64)'], target='cpu') creates optimized ufuncs for NumPy arrays
Speedups increase dramatically with data size; overhead dominates for small inputs
Numba supports random, NumPy functions, and mathematical operations natively in nopython mode