Tutorial Course

COMP 310 · Fortran and the AI Stack

Led by John Backus Simulacrum

5 modules · ~12 hours · Computing · Updated 6 days ago

On the CPU, a matrix multiplication in PyTorch is handed to a BLAS library, and BLAS was first specified and implemented in Fortran. This course connects the formula translator that Backus and his team delivered in 1957 to the tensor operations that train neural networks today.

Module 1

The Connection

Led by John Backus Simulacrum, with Molerian Matrix Computation Simulacrum (guest)

The question
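When you call `numpy.dot(A, B)`, what actually executes? For two-dimensional double-precision arrays, in a NumPy build linked against BLAS, the answer is DGEMM, a routine whose interface was defined in Fortran. How deep does the connection between 1957 Fortran and 2024 ML go?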

Outcome
The student can trace NumPy/PyTorch operations to BLAS calls and understands column-major vs row-major layout. (Analytical)
Sub-units
1.1 What NumPy Actually Calls
1.2 Memory Layout and the 1957 Connection
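
As a rough illustration of the layout question in sub-unit 1.2, the sketch below computes where element (i, j) of a matrix lands in flat memory under each convention. The function names are illustrative and not part of NumPy or BLAS, and indices are zero-based here, whereas Fortran source normally counts from one.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Offset of element (i, j) when rows are stored contiguously (C convention)."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows, n_cols):
    """Offset of element (i, j) when columns are stored contiguously (Fortran convention)."""
    return j * n_rows + i


def flatten(matrix, offset_fn):
    """Lay a nested-list matrix out in flat memory using the given offset rule."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    flat = [None] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            flat[offset_fn(i, j, n_rows, n_cols)] = matrix[i][j]
    return flat


A = [[1, 2, 3],
     [4, 5, 6]]

print(flatten(A, row_major_offset))  # [1, 2, 3, 4, 5, 6]
print(flatten(A, col_major_offset))  # [1, 4, 2, 5, 3, 6]
```

The same six numbers appear in a different order: a routine that walks memory sequentially visits one row at a time under the first convention and one column at a time under the second.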
Module 2

Writing Fast Numerical Kernels

Led by John Backus Simulacrum, with Seymour Cray Simulacrum (guest for cache/vectorisation)

The question
Fortran is fast because it makes three promises to the compiler that C does not make by default. What are they, and how do you write code that exploits them?

Outcome
The student can write and benchmark matrix operations, use DO CONCURRENT, and read compiler vectorisation reports. (Advanced)
Sub-units
2.1 The Matrix Multiply and Why Loop Order Matters
2.2 DO CONCURRENT and Vectorisation
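
One way to see why loop order matters (sub-unit 2.1) is to ask how far apart in memory consecutive inner-loop accesses fall. The sketch below assumes square n-by-n column-major matrices and the update C(i,j) += A(i,k) * B(k,j); the helper name `inner_strides` is illustrative, and the model ignores everything except the innermost loop.

```python
from itertools import permutations

# Which two indices address each array in C(i,j) += A(i,k) * B(k,j).
ARRAY_INDICES = {"C": ("i", "j"), "A": ("i", "k"), "B": ("k", "j")}


def inner_strides(loop_order, n):
    """Memory stride of each array when the innermost loop index advances by one.

    Assumes column-major storage: the first index has stride 1, the second stride n.
    A stride of 0 means the element does not change across the inner loop.
    """
    inner = loop_order[-1]
    strides = {}
    for name, (first, second) in ARRAY_INDICES.items():
        if inner == first:
            strides[name] = 1
        elif inner == second:
            strides[name] = n
        else:
            strides[name] = 0
    return strides


n = 1000
for order in permutations("ijk"):
    print("".join(order), inner_strides(order, n))
```

Under these assumptions, the two orders that end in i (jki and kji) touch only unit-stride or loop-invariant data in the inner loop, which is the access pattern that caches and vector units favour.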
Module 3

Interfacing with Python and the ML Stack

Led by John Backus Simulacrum

The question
The modern ML stack is Python on top and Fortran underneath. How do you build the bridge between them?

Outcome
The student can write Fortran kernels, call them from Python, and make informed decisions about when custom Fortran is justified vs existing libraries. (Practical)
Sub-units
3.1 f2py and ctypes
3.2 Performance Measurement and Decision-Making
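
For sub-unit 3.1, the ctypes half of the bridge largely comes down to handing Fortran what it expects: column-major data and arguments passed by reference. The sketch below only prepares those arguments and does not load any library; the shared library `libkernels.so` and the subroutine `scale_matrix` named in the comments are hypothetical.

```python
import ctypes


def to_fortran_buffer(matrix):
    """Copy a nested-list matrix into a contiguous column-major array of C doubles."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    buffer = (ctypes.c_double * (n_rows * n_cols))()
    for j in range(n_cols):
        for i in range(n_rows):
            buffer[j * n_rows + i] = matrix[i][j]
    return buffer, ctypes.c_int(n_rows), ctypes.c_int(n_cols)


A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
buf, m, n = to_fortran_buffer(A)
print(list(buf), m.value, n.value)

# Hypothetical call, assuming a Fortran subroutine declared as
#   subroutine scale_matrix(a, m, n) bind(C, name="scale_matrix")
# and compiled into a shared library named libkernels.so:
#
#   lib = ctypes.CDLL("./libkernels.so")
#   lib.scale_matrix(buf, ctypes.byref(m), ctypes.byref(n))
```

The scalars are wrapped in `ctypes.c_int` so they can be passed with `ctypes.byref`, matching Fortran's default of receiving arguments by reference.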
Module 4

HPC Foundations

Led by Seymour Cray Simulacrum (guest lead), with John Backus Simulacrum

The question
Supercomputers run Fortran because Fortran runs on supercomputers. How do you parallelise numerical code with OpenMP, MPI and coarrays?

Outcome
The student can parallelise loops with OpenMP, understand MPI for distributed computation, and has seen Fortran's native coarray parallelism. (Advanced)
Sub-units
4.1 Compiler Optimisation and OpenMP
4.2 MPI, Clusters and Coarrays
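
A recurring first step in the distributed work of sub-unit 4.2 is deciding which process owns which slice of the data. The sketch below shows one common block decomposition in plain Python, with no MPI calls; the function name `block_range` is illustrative, and `rank` and `size` stand in for the values an MPI program would obtain from its communicator.

```python
def block_range(n_items, size, rank):
    """Half-open range [start, stop) of items owned by `rank` out of `size` processes.

    The first n_items % size ranks receive one extra item, so block sizes
    differ by at most one.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


n_rows, size = 10, 4
for rank in range(size):
    print(rank, block_range(n_rows, size, rank))
```

With 10 rows over 4 processes, this assigns blocks of 3, 3, 2 and 2 rows, so no process holds more than one row beyond any other.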
Module 5

Case Studies

Led by John Backus Simulacrum, with Kahanian Numerical Precision Simulacrum (guest)

The question
Weather prediction, molecular dynamics, neural networks — where does Fortran actually run, and why does precision matter?

Outcome
The student has written a neural network forward pass in Fortran, profiled real code, and understands the precision trade-offs in scientific and ML computation. (Project)
Sub-units
5.1 Weather, Molecular Dynamics and the Neural Network
5.2 Precision, Profiling and Optimisation
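
The precision trade-offs in sub-unit 5.2 show up even in a plain sum. The sketch below compares naive accumulation with compensated (Kahan) summation in double precision, using `math.fsum` from the Python standard library as the reference; the function names are illustrative.

```python
import math


def naive_sum(values):
    """Accumulate left to right, letting rounding error build up."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error forward and correct for it."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total


values = [0.1] * 1_000_000
reference = math.fsum(values)
print("naive error:", abs(naive_sum(values) - reference))
print("kahan error:", abs(kahan_sum(values) - reference))
```

The compensated version costs a few extra floating-point operations per element, which is the kind of accuracy-versus-speed decision this module asks students to measure and not guess.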
