let's make an LLM part 1 'overview'
Sorry, the title was clickbait.
We won't be anywhere NEAR making an LLM for a while.
We first need to construct a machine learning framework.
The framework will be written in C++ with no dependencies on PyTorch, TensorFlow, or any other ML library.
The only dependencies I will be using are GoogleTest for testing, pybind11 for Python bindings, and the CUDA toolkit for the GPU backend.
All you need to follow along is basic proficiency in C++.
Preview
- Part 02 - Storage and Tensor: Strides, transpose
- Part 03 - Ops: Broadcasting rules, naive matmul (matrix multiplication), gradient checker
- Part 04 - Autograd:
- Part 05 - Optimizing matmul:
- Part 06 - Neural network modules and MNIST: Linear, Adam, cross-entropy
- Part 07 - Transformer: Attention, layernorm, GPT
- Part 08 - CUDA: Kernels
- Part 09 - Cleaning up:
- Part 10 - Training Shakespeare:
- Part 11 - Tokenizer and data pipeline:
- Part 12 - More modern architecture choices and mixed precision:
- Part 13 - Flash Attention:
- Part 14 - KV cache, sampling, and eval harness:
- Part 15 - Training recipe and a real run:
- Part 16 - Fused kernels:
- Part 17 - Flash Attention:
- Part 18 - Tensor core matmul:
- Part 19 - Ring all-reduce and DDP:
- Part 20 - Activation checkpointing:
- ???
The actual code is at github.com/dnexdev/tiramisu.