Quiz

Day 1: Python Fundamentals & Essentials

Multiple choice, single answer - Python Essentials

What is the main benefit of using virtual environments in Python?

  • A) They make Python code run faster

  • B) They allow isolated project dependencies without conflicts

  • C) They are required to run Python code

  • D) They reduce the size of Python files

Which of the following is NOT a basic Python data structure?

  • A) List

  • B) Dictionary

  • C) Array (built-in)

  • D) Tuple

What is the purpose of the if __name__ == "__main__": statement?

  • A) To check if the script is imported as a module

  • B) To define the main function

  • C) To prevent code execution when the module is imported

  • D) Both A and C

How does string slicing work in Python?

  • A) s[start:end] includes both start and end indices

  • B) s[start:end] excludes the end index

  • C) s[start:end] always returns a list

  • D) Slicing modifies the original string

Which keyword is used to create a function in Python?

  • A) function

  • B) def

  • C) func

  • D) define

Multiple choice, single answer - Virtual Environments & HPC Modules

What does a virtual environment provide?

  • A) A separate Python interpreter

  • B) A sandboxed directory for project-specific packages

  • C) A containerization solution

  • D) Direct access to HPC cluster resources

What is the purpose of the venv module?

  • A) It validates Python code

  • B) It creates lightweight isolated Python environments

  • C) It manages HPC job submissions

  • D) It optimizes NumPy performance

What is an HPC module system used for?

  • A) Running Python scripts in parallel

  • B) Managing access to compilers, libraries, and software versions

  • C) Creating Python classes

  • D) Storing data in memory

When you load an HPC module with module load gcc/11.2, what happens?

  • A) It installs GCC on your machine

  • B) It adds the GCC compiler to your current shell environment

  • C) It copies GCC files to your home directory

  • D) It automatically compiles your code

Multiple choice, single answer - Benchmarking & Profiling

What is the main purpose of benchmarking code?

  • A) To make code look good

  • B) To measure execution time and performance characteristics

  • C) To fix all bugs

  • D) To replace slow code with faster alternatives

Which Python module is commonly used for timing code execution?

  • A) sys

  • B) os

  • C) timeit

  • D) datetime

What does profiling reveal about your code?

  • A) Only the total execution time

  • B) Which functions consume the most CPU and memory

  • C) Syntax errors

  • D) Variable types

What is the difference between wall-clock time and CPU time?

  • A) Wall-clock time includes I/O waits; CPU time is actual computation

  • B) CPU time is always longer

  • C) Wall-clock time is only for single-core execution

  • D) They are the same thing

Multiple choice, single answer - NumPy

Why are NumPy arrays more efficient than Python lists for numerical computations?

  • A) They are always smaller in size

  • B) They store homogeneous data in contiguous memory blocks

  • C) They automatically parallelize operations

  • D) They require less code to write

What does vectorization mean in NumPy?

  • A) Replacing loops with whole-array operations

  • B) Converting lists to tuples

  • C) Compiling Python to C++

  • D) Creating visual representations of data

What is the shape of a 3×4×2 NumPy array?

  • A) 3 elements

  • B) 24 elements

  • C) (3, 4, 2)

  • D) 9 elements

Which operation is most efficient in NumPy?

  • A) for i in range(len(arr)): result[i] = arr[i] ** 2

  • B) result = arr ** 2 (vectorized)

  • C) Both are equally efficient

  • D) The for loop is always faster

Day 2: High-Performance Computing Techniques

Multiple choice, single answer - Cython

What is Cython primarily used for?

  • A) Writing web servers

  • B) Compiling Python code to C for performance improvement

  • C) Creating interactive plots

  • D) Managing data in databases

What syntax does Cython use to declare static C types for optimization?

  • A) Python type hints only

  • B) C-style type declarations like cdef int x

  • C) Fortran-style declarations

  • D) All of the above

Multiple choice, single answer - Dask

What is Dask designed for?

  • A) Creating dashboards

  • B) Parallel computing with out-of-core data processing

  • C) Building web applications

  • D) Creating databases

How does Dask differ from NumPy?

  • A) Dask is only for statistics

  • B) Dask handles larger-than-memory datasets with lazy evaluation

  • C) Dask requires GPU acceleration

  • D) Dask is slower for all operations

What is lazy evaluation in Dask?

  • A) Dask makes decisions slowly

  • B) Computations are deferred until explicitly requested

  • C) Data is loaded one row at a time

  • D) Results are cached indefinitely

Multiple choice, single answer - Numba

What does Numba do with Python functions?

  • A) Converts them to Java bytecode

  • B) Compiles them to machine code using LLVM

  • C) Can run them in parallel across CPU cores when requested (e.g., parallel=True)

  • D) Both B and C

Which decorator enables Numba JIT compilation?

  • A) @compile

  • B) @jit

  • C) @numba.compile

  • D) @fast

Multiple choice, single answer - SLURM & HPC Scheduling

What is SLURM?

  • A) A storage system for large files

  • B) A job scheduler and resource manager for HPC clusters

  • C) A Python library for data processing

  • D) A protocol for network communication

What information does an sbatch script typically specify?

  • A) Job name, number of tasks, time limit, and computation commands

  • B) Only the Python version

  • C) Only the memory allocation

  • D) Only the machine hostname

What does the #SBATCH directive do in a submission script?

  • A) It creates a comment that is ignored

  • B) It specifies job parameters for the resource manager

  • C) It imports the SBATCH module

  • D) It runs the script immediately

Multiple choice, single answer - Containerization

What is containerization used for in HPC?

  • A) Storing data between runs

  • B) Creating reproducible, isolated computing environments

  • C) Running only Python code

  • D) Replacing virtual machines

What is Apptainer (formerly Singularity)?

  • A) A Python package manager

  • B) A container platform for HPC environments

  • C) A job scheduler

  • D) A plotting library

What is the advantage of using containers for reproducibility?

  • A) Code runs faster

  • B) Same dependencies and environment can be deployed anywhere

  • C) It reduces code complexity

  • D) Containers eliminate the need for testing

Coding Challenges

Coding questions - Day 1

Python Essentials & Performance Measurement

  1. Write a Python script that:

    • Creates a list of 1 million integers

    • Times how long it takes to double each element using a for loop

    • Compares this with the same doubling operation on a NumPy array of the same size

    • Reports the speedup factor
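One possible sketch of this exercise, assuming NumPy is installed. The helper names (double_with_loop, double_with_numpy, best_time) are illustrative, not prescribed by the exercise. It uses timeit, whose default timer measures wall-clock time, and keeps the best of several runs to reduce noise from other processes.

```python
import timeit

import numpy as np

N = 1_000_000  # list/array size from the exercise


def double_with_loop(values):
    """Double every element of a Python list using an explicit for loop."""
    result = []
    for v in values:
        result.append(v * 2)
    return result


def double_with_numpy(arr):
    """Double every element of a NumPy array in one vectorized operation."""
    return arr * 2


def best_time(func, arg, repeat=5):
    """Best-of-`repeat` wall-clock time in seconds for a single call func(arg)."""
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeat))


if __name__ == "__main__":
    py_list = list(range(N))
    np_array = np.arange(N)

    t_loop = best_time(double_with_loop, py_list)
    t_numpy = best_time(double_with_numpy, np_array)

    print(f"for loop: {t_loop:.4f} s")
    print(f"NumPy:    {t_numpy:.4f} s")
    print(f"speedup:  {t_loop / t_numpy:.1f}x")
```

The exact speedup depends on the machine and NumPy build, so treat any single number as indicative only.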

  2. Create a function that:

    • Takes a dictionary of package names and versions as input

    • Outputs a requirements.txt formatted list

    • Handles unspecified versions (e.g., None or “latest”) by writing the bare package name, since requirements.txt has no “latest” specifier
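A minimal sketch, assuming the input maps package names to version strings. The function name to_requirements is hypothetical. Because requirements.txt has no "latest" keyword, a missing version (None, an empty string, or the literal "latest") is emitted as a bare name, which pip resolves to the newest available release. Sorting the names is a design choice that keeps the output stable across runs.

```python
def to_requirements(packages):
    """Format a {name: version} dict as requirements.txt text.

    Versions that are None, "" or "latest" produce an unpinned line
    (just the package name); everything else is pinned with "==".
    """
    lines = []
    for name, version in sorted(packages.items()):
        if version in (None, "", "latest"):
            lines.append(name)
        else:
            lines.append(f"{name}=={version}")
    # One requirement per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(to_requirements({"numpy": "1.26.4", "scipy": None, "matplotlib": "latest"}), end="")
```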

  3. Implement a simple profiler that:

    • Measures the time spent in each function of a small program

    • Outputs the results in descending order of time

    • (You may use cProfile or manual timing)
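A sketch of the cProfile route, using only the standard library. The "small program" here (slow_squares, fast_squares, program) is a hypothetical stand-in for whatever code you want to profile. The report is sorted by cumulative time in descending order; sorting by pstats.SortKey.TIME instead ranks functions by time spent in their own bodies, excluding callees.

```python
import cProfile
import io
import pstats


def slow_squares(n):
    """Sum of squares with a Python-level loop (deliberately slow)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_squares(n):
    """Same sum via the closed form (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def program():
    """Small example program whose functions we want to profile."""
    results = []
    for _ in range(10):
        results.append(slow_squares(100_000))
        results.append(fast_squares(100_000))
    return results


def profile_report(func, limit=10):
    """Run func under cProfile; return a text report sorted by descending cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(program))
```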

Virtual Environments & Module Loading

  1. Write a bash script that:

    • Creates a Python virtual environment

    • Activates the environment

    • Installs three common scientific packages (NumPy, SciPy, Matplotlib) into it

    • Runs a simple Python script inside the environment

Coding questions - Day 2

NumPy Performance

  1. Given two large 2D arrays A and B:

    • Compute their element-wise product using vectorization

    • Benchmark it against an equivalent nested for-loop implementation

    • Calculate memory usage for both approaches
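One way to approach this, assuming NumPy is installed; the function names are illustrative. Memory is measured with the standard-library tracemalloc module, which recent NumPy versions report their array allocations to. Both approaches allocate one output array of the same size, so the peaks may well be similar; the main cost of the loops is time, not memory.

```python
import timeit
import tracemalloc

import numpy as np


def product_loops(A, B):
    """Element-wise product using explicit nested Python loops."""
    rows, cols = A.shape
    C = np.empty_like(A)
    for i in range(rows):
        for j in range(cols):
            C[i, j] = A[i, j] * B[i, j]
    return C


def product_vectorized(A, B):
    """Element-wise product as one vectorized NumPy expression."""
    return A * B


def peak_memory_mb(func, *args):
    """Peak memory (MB) allocated while running func(*args), as seen by tracemalloc."""
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    A = rng.random((300, 300))
    B = rng.random((300, 300))

    for name, func in [("nested loops", product_loops), ("vectorized", product_vectorized)]:
        seconds = min(timeit.repeat(lambda: func(A, B), number=1, repeat=3))
        print(f"{name:>12}: {seconds:.5f} s, peak {peak_memory_mb(func, A, B):.2f} MB")
```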

  2. Create a function that:

    • Reads a CSV file with numerical data

    • Fills missing values with column means

    • Applies z-score normalization to each column

    • Returns the cleaned array
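A minimal sketch, assuming NumPy is installed, that the CSV has one header row, and that it contains at least two data rows and two numeric columns. The function name clean_csv is hypothetical. np.genfromtxt reads empty fields as NaN, and np.nanmean computes column means that ignore those NaNs. Constant columns get a standard deviation of 1 to avoid division by zero, which is one of several reasonable choices.

```python
import io

import numpy as np


def clean_csv(source, delimiter=","):
    """Load numeric CSV data, fill NaNs with column means, then z-score each column.

    `source` may be a filename or a file-like object. Assumes one header row,
    at least two data rows and two columns; empty fields are read as NaN.
    """
    data = np.genfromtxt(source, delimiter=delimiter, skip_header=1, dtype=float)

    # Replace each NaN with the mean of the non-missing values in its column.
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_means[cols]

    # z-score: subtract the column mean, divide by the column standard deviation.
    means = data.mean(axis=0)
    stds = data.std(axis=0)
    stds[stds == 0] = 1.0  # leave constant columns centered but unscaled
    return (data - means) / stds


if __name__ == "__main__":
    sample = io.StringIO("a,b\n1,10\n,20\n3,\n")
    print(clean_csv(sample))
```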

Cython Optimization

  1. Write a slow Python function that computes Fibonacci numbers recursively, then:

    • Create a Cython version with static C type declarations (e.g., cdef/cpdef)

    • Compare execution time for computing fib(30)

    • Measure the speedup achieved
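The pure-Python baseline and timing harness might look like the sketch below. The Cython counterpart has to be compiled separately, so it appears only as a comment; the module and function names (fib_cy.pyx, fib_cy) are hypothetical. Once built, the same timeit call can be pointed at fib_cy to compute the speedup.

```python
import timeit


def fib(n):
    """Naive recursive Fibonacci: exponential time, a deliberately slow baseline."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


# A Cython version (in a hypothetical fib_cy.pyx, compiled separately) might declare:
#     cpdef long fib_cy(int n):
#         if n < 2:
#             return n
#         return fib_cy(n - 1) + fib_cy(n - 2)

if __name__ == "__main__":
    seconds = min(timeit.repeat(lambda: fib(30), number=1, repeat=3))
    print(f"pure Python fib(30) = {fib(30)} in {seconds:.3f} s")
```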

Dask & Parallel Processing

  1. Using Dask:

    • Create a large synthetic dataset (ideally larger than available memory)

    • Compute groupby aggregations across the dataset's partitions

    • Compare computation time with pandas on a smaller version of the dataset that fits in memory

Numba JIT Compilation

  1. Write a function that performs Monte Carlo Pi estimation:

    • Generate random points in a unit square

    • Count points inside a quarter circle

    • Implement both regular Python and Numba JIT versions

    • Report the speedup and estimated Pi value
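A sketch of both versions. Points (x, y) drawn uniformly from the unit square land inside the quarter circle x² + y² ≤ 1 with probability π/4, so 4 × (inside / total) estimates π. The try/except fallback, which turns njit into a no-op decorator when Numba is not installed, is an assumption added so the sketch still runs; in that case both versions are plain Python and no speedup is expected. The warm-up call keeps JIT compilation time out of the measurement.

```python
import random
import time

try:
    from numba import njit
except ImportError:  # fallback: run the "Numba" version as plain Python
    def njit(func):
        return func


def estimate_pi_python(n_samples):
    """Estimate pi by sampling points in the unit square (pure Python)."""
    inside = 0
    for _ in range(n_samples):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples


@njit
def estimate_pi_numba(n_samples):
    """Same algorithm; compiled to machine code when Numba is available."""
    inside = 0
    for _ in range(n_samples):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    n = 1_000_000
    estimate_pi_numba(1_000)  # warm-up call triggers JIT compilation

    start = time.perf_counter()
    pi_py = estimate_pi_python(n)
    t_py = time.perf_counter() - start

    start = time.perf_counter()
    pi_nb = estimate_pi_numba(n)
    t_nb = time.perf_counter() - start

    print(f"Python: pi ~ {pi_py:.5f} in {t_py:.3f} s")
    print(f"Numba:  pi ~ {pi_nb:.5f} in {t_nb:.3f} s")
    print(f"speedup: {t_py / t_nb:.1f}x")
```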

HPC & Job Scheduling

  1. Create an SBATCH submission script that:

    • Requests 4 CPU cores for 10 minutes

    • Loads necessary modules

    • Runs a Python script that uses NumPy and timing

    • Writes output to a log file with timestamp