Quiz
Day 1: Python Fundamentals & Essentials
Multiple choice, single answer - Python Essentials
What is the main benefit of using virtual environments in Python?
A) They make Python code run faster
B) They allow isolated project dependencies without conflicts
C) They are required to run Python code
D) They reduce the size of Python files
Which of the following is NOT a basic Python data structure?
A) List
B) Dictionary
C) Array (built-in)
D) Tuple
What is the purpose of the if __name__ == "__main__": statement?
A) To check if the script is imported as a module
B) To define the main function
C) To prevent code execution when the module is imported
D) Both A and C
How does string slicing work in Python?
A) s[start:end] includes both start and end indices
B) s[start:end] excludes the end index
C) s[start:end] always returns a list
D) Slicing modifies the original string
Which keyword is used to create a function in Python?
A) function
B) def
C) func
D) define
Multiple choice, single answer - Virtual Environments & HPC Modules
What does a virtual environment provide?
A) A separate Python interpreter
B) A sandboxed directory for project-specific packages
C) A containerization solution
D) Direct access to HPC cluster resources
What is the purpose of the venv module?
A) It validates Python code
B) It creates lightweight isolated Python environments
C) It manages HPC job submissions
D) It optimizes NumPy performance
What is an HPC module system used for?
A) Running Python scripts in parallel
B) Managing access to compilers, libraries, and software versions
C) Creating Python classes
D) Storing data in memory
When you load an HPC module with module load gcc/11.2, what happens?
A) It installs GCC on your machine
B) It adds the GCC compiler to your current shell environment
C) It copies GCC files to your home directory
D) It automatically compiles your code
Multiple choice, single answer - Benchmarking & Profiling
What is the main purpose of benchmarking code?
A) To make code look good
B) To measure execution time and performance characteristics
C) To fix all bugs
D) To replace slow code with faster alternatives
Which Python module is commonly used for timing code execution?
A) sys
B) os
C) timeit
D) datetime
What does profiling reveal about your code?
A) Only the total execution time
B) Which functions consume the most CPU and memory
C) Syntax errors
D) Variable types
What is the difference between wall-clock time and CPU time?
A) Wall-clock time includes I/O waits; CPU time is actual computation
B) CPU time is always longer
C) Wall-clock time is only for single-core execution
D) They are the same thing
Multiple choice, single answer - NumPy
Why are NumPy arrays more efficient than Python lists for numerical computations?
A) They are always smaller in size
B) They store homogeneous data in contiguous memory blocks
C) They automatically parallelize operations
D) They require less code to write
What does vectorization mean in NumPy?
A) Replacing loops with whole-array operations
B) Converting lists to tuples
C) Compiling Python to C++
D) Creating visual representations of data
What is the shape of a 3×4×2 NumPy array?
A) 3 elements
B) 24 elements
C) (3, 4, 2)
D) 9 elements
Which operation is most efficient in NumPy?
A) for i in range(len(arr)): result[i] = arr[i] ** 2
B) result = arr ** 2 (vectorized)
C) Both are equally efficient
D) The for loop is always faster
Day 2: High-Performance Computing Techniques
Multiple choice, single answer - Cython
What is Cython primarily used for?
A) Writing web servers
B) Compiling Python code to C for performance improvement
C) Creating interactive plots
D) Managing data in databases
What type hint syntax does Cython use to optimize code?
A) Python type hints only
B) C-style type declarations like cdef int x
C) Fortran-style declarations
D) All of the above
Multiple choice, single answer - Dask
What is Dask designed for?
A) Creating dashboards
B) Parallel computing with out-of-core data processing
C) Building web applications
D) Creating databases
How does Dask differ from NumPy?
A) Dask is only for statistics
B) Dask handles larger-than-memory datasets with lazy evaluation
C) Dask requires GPU acceleration
D) Dask is slower for all operations
What is lazy evaluation in Dask?
A) Dask makes decisions slowly
B) Computations are deferred until explicitly requested
C) Data is loaded one row at a time
D) Results are cached indefinitely
Multiple choice, single answer - Numba
What does Numba do with Python functions?
A) Converts them to Java bytecode
B) Compiles them to machine code using LLVM
C) Makes them run in parallel on your CPU
D) Both B and C
Which decorator enables Numba JIT compilation?
A) @compile
B) @jit
C) @numba.compile
D) @fast
Multiple choice, single answer - SLURM & HPC Scheduling
What is SLURM?
A) A storage system for large files
B) A job scheduler and resource manager for HPC clusters
C) A Python library for data processing
D) A protocol for network communication
What information does an sbatch script typically specify?
A) Job name, number of tasks, time limit, and computation commands
B) Only the Python version
C) Only the memory allocation
D) Only the machine hostname
What does the #SBATCH directive do in a submission script?
A) It creates a comment that is ignored
B) It specifies job parameters for the resource manager
C) It imports the SBATCH module
D) It runs the script immediately
Multiple choice, single answer - Containerization
What is containerization used for in HPC?
A) Storing data between runs
B) Creating reproducible, isolated computing environments
C) Running only Python code
D) Replacing virtual machines
What is Apptainer (formerly Singularity)?
A) A Python package manager
B) A container platform for HPC environments
C) A job scheduler
D) A plotting library
What is the advantage of using containers for reproducibility?
A) Code runs faster
B) Same dependencies and environment can be deployed anywhere
C) It reduces code complexity
D) Containers eliminate the need for testing
Coding Challenges
Coding questions - Day 1
Python Essentials & Performance Measurement
Write a Python script that:
Creates a list of 1 million integers
Times how long it takes to double each element using a for loop
Compares it to a NumPy array of the same size
Reports the speedup factor
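One possible starting point for this task is sketched below. It is a minimal sketch, not a reference solution: the helper name time_call is made up for illustration, and a single timed run is used where timeit with several repeats would give steadier numbers.

```python
import time

import numpy as np

N = 1_000_000  # list size given in the task


def time_call(func):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


values = list(range(N))
array = np.arange(N)


def double_with_loop():
    """Pure-Python version: double each element in an explicit for loop."""
    doubled = []
    for v in values:
        doubled.append(v * 2)
    return doubled


def double_with_numpy():
    """NumPy version: one vectorized multiplication over the whole array."""
    return array * 2


loop_result, loop_seconds = time_call(double_with_loop)
numpy_result, numpy_seconds = time_call(double_with_numpy)

print(f"for loop: {loop_seconds:.4f} s")
print(f"NumPy:    {numpy_seconds:.4f} s")
print(f"speedup:  {loop_seconds / numpy_seconds:.1f}x")
```

The measured speedup depends on the machine and on the NumPy build, so the printed factor should be read as indicative only.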
Create a function that:
Takes a dictionary of package names and versions as input
Outputs a requirements.txt formatted list
Handles cases where versions are not specified (use “latest”)
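A minimal sketch of such a function follows. The function name format_requirements is hypothetical. One assumption is made about the "latest" case: pip does not accept latest as a version number, so the sketch writes an unpinned line (the bare package name), which lets pip pick the newest release.

```python
def format_requirements(packages):
    """Return requirements.txt text for a {name: version} mapping.

    Assumption: a version of None, "" or "latest" means "unpinned", so only
    the package name is written.
    """
    lines = []
    for name, version in sorted(packages.items()):
        if version in (None, "", "latest"):
            lines.append(name)
        else:
            lines.append(f"{name}=={version}")
    return "".join(line + "\n" for line in lines)


# Example input; the version numbers are placeholders.
example = {"numpy": "1.26.4", "scipy": None, "matplotlib": "latest"}
print(format_requirements(example), end="")
```

Sorting the names keeps the output stable between runs, which makes the generated file easier to diff.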
Implement a simple profiler that:
Measures the time spent in each function of a small program
Outputs the results in descending order of time
(Can use cProfile or manual timing)
Virtual Environments & Module Loading
Write a bash script that:
Creates a Python virtual environment
Installs three common scientific packages (NumPy, SciPy, Matplotlib)
Activates the environment and runs a simple Python script
Coding questions - Day 2
NumPy Performance
Given two large 2D arrays A and B:
Compute their element-wise product using vectorization
Benchmark it against nested for loops
Calculate memory usage for both approaches
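The three steps above might look like the following sketch. The 300 x 300 size is an assumption chosen so the nested loops finish quickly, and memory is reported through the nbytes attribute, which counts only each result's data buffer rather than peak usage during the computation.

```python
import time

import numpy as np

rng = np.random.default_rng(seed=0)
# Assumed size for illustration; the task only says "large".
A = rng.random((300, 300))
B = rng.random((300, 300))


def multiply_loops(a, b):
    """Element-wise product computed with nested for loops."""
    rows, cols = a.shape
    out = np.empty_like(a)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = a[i, j] * b[i, j]
    return out


start = time.perf_counter()
vectorized = A * B
vectorized_seconds = time.perf_counter() - start

start = time.perf_counter()
looped = multiply_loops(A, B)
loop_seconds = time.perf_counter() - start

print(f"vectorized:   {vectorized_seconds:.5f} s, {vectorized.nbytes / 1e6:.2f} MB")
print(f"nested loops: {loop_seconds:.5f} s, {looped.nbytes / 1e6:.2f} MB")
print(f"results match: {np.allclose(vectorized, looped)}")
```

Because both versions write into a NumPy array of the same shape and dtype, the reported sizes are equal; a list-of-lists result would need a different measurement.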
Create a function that:
Reads a CSV file with numerical data
Fills missing values with column means
Applies z-score normalization to each column
Returns the cleaned array
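A minimal sketch of this cleaning function is shown below. It assumes the file has one header row, a comma delimiter, and at least two rows and two columns; the name clean_csv and the in-memory sample are illustrative only.

```python
import io

import numpy as np


def clean_csv(source, delimiter=","):
    """Load numeric CSV data, impute column means, and z-score each column."""
    # genfromtxt turns empty or unparsable fields into NaN.
    # Assumption: the first line is a header and is skipped.
    data = np.genfromtxt(source, delimiter=delimiter, skip_header=1)

    # Replace each NaN with the mean of the valid values in its column.
    column_means = np.nanmean(data, axis=0)
    data = np.where(np.isnan(data), column_means, data)

    # z-score: subtract the column mean, then divide by the column std.
    std = data.std(axis=0)
    std[std == 0] = 1.0  # constant columns become zeros instead of dividing by 0
    return (data - data.mean(axis=0)) / std


# In-memory stand-in for a file; a path string can be passed instead.
sample = io.StringIO("a,b\n1.0,10.0\n2.0,\n3.0,30.0\n")
print(clean_csv(sample))
```

The sketch uses the population standard deviation (NumPy's default, ddof=0); passing ddof=1 to std would give the sample version instead.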
Cython Optimization
Write a slow Python function that computes Fibonacci numbers recursively, then:
Create a Cython version with type hints
Compare execution time for computing fib(30)
Measure the speedup achieved
Dask & Parallel Processing
Using Dask:
Create a large synthetic dataset (one too large to fit in memory)
Compute groupby aggregations partitioned across chunks
Compare computation time with pandas (if the dataset fits in memory)
Numba JIT Compilation
Write a function that performs Monte Carlo Pi estimation:
Generate random points in a unit square
Count points inside a quarter circle
Implement both regular Python and Numba JIT versions
Report the speedup and estimated Pi value
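The outline above can be sketched as follows. This is a starting point under stated assumptions: the sample count and seed are arbitrary, Numba is treated as optional so the script still runs without it, and the same function body is reused for both versions by passing it to njit.

```python
import random
import time

try:
    from numba import njit  # optional dependency
except ImportError:
    njit = None


def estimate_pi(n_samples, seed):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    random.seed(seed)
    inside = 0
    for _ in range(n_samples):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


# Compiled variant of the same function, if Numba is available.
estimate_pi_jit = njit(estimate_pi) if njit is not None else None

N_SAMPLES = 200_000  # arbitrary choice for illustration

start = time.perf_counter()
pi_python = estimate_pi(N_SAMPLES, 42)
python_seconds = time.perf_counter() - start
print(f"Python: pi ~ {pi_python:.4f} in {python_seconds:.4f} s")

if estimate_pi_jit is not None:
    estimate_pi_jit(10, 42)  # first call compiles; keep it out of the timing
    start = time.perf_counter()
    pi_jit = estimate_pi_jit(N_SAMPLES, 42)
    jit_seconds = time.perf_counter() - start
    print(f"Numba:  pi ~ {pi_jit:.4f} in {jit_seconds:.4f} s")
    print(f"speedup: {python_seconds / jit_seconds:.1f}x")
else:
    print("Numba is not installed; skipping the JIT version.")
```

Numba keeps its own random generator state inside compiled code, so the two versions are not expected to return identical estimates even with the same seed.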
HPC & Job Scheduling
Create an SBATCH submission script that:
Requests 4 CPU cores for 10 minutes
Loads necessary modules
Runs a Python script that uses NumPy and times its own execution
Writes output to a log file with a timestamp