Schedule
Given the pace of innovation in this area, the following schedule is subject to change.
Color Legend: Presenter, Reviewer, Scriber
Introduction
- Jan 8
- Course Introduction
- Anand
- How to Read a Paper
- How to Give a Bad Talk
- Writing Reviews for Systems Conferences
- Paper Presentation Preferences: fill out the form here
- Jan 10
- Overview of Challenges
- Anand
- Challenges and Applications of Large Language Models
- Understanding LLMs: A Comprehensive Overview from Training to Inference
- Jan 12
- Paper Presentation Preferences Due
- Jan 15
- No class (Martin Luther King, Jr. Day)
Basics of LLMs
- Jan 17
- Transformers
- Attention Is All You Need
- Hugging Face Transformers Course
- Improving Language Understanding by Generative Pre-Training (GPT) (Required)
- Amitrajit Rohan Huayi
- Let's build GPT: from scratch, in code, spelled out
- The Illustrated Transformer (Required)
- Ganesh Aniruddha Aayush
- The Transformer Family Version 2.0
- Jan 22
- Diffusion
- Denoising Diffusion Probabilistic Models
- High-Resolution Image Synthesis with Latent Diffusion Models (Required)
- Sashankh Azeez Vitaly
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
- The Illustrated Stable Diffusion (Required)
- Abhimanyu Arpan Zeyu
- Jan 24
- Multimodality
- Flamingo: a Visual Language Model for Few-Shot Learning
- Multimodality and Large Multimodal Models (Required)
- Alexander Rishi Jingli
- Visual Instruction Tuning (Required)
- Devashish Apoorva Mohit
Project
- Jan 28
- Project Group Formation Due
- Jan 29
- Project Ideas
- Anand
- How to Write a Great Research Paper
- Hints and Principles for Computer System Design
Pre-training
- Jan 31
- Training Approaches for Large Models
- Fully Sharded Data Parallel: faster AI training with fewer GPUs (Required)
- Mingzheng Mithilesh Shivashankar
- PaLM: Scaling Language Modeling with Pathways
- Pathways: Asynchronous Distributed Dataflow for ML (Required)
- Rajveer Sera Ziyuan
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
- Feb 5
- Automating Parallelization Techniques for Large Model Training
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (Required)
- Aniruddha Aayush Vima
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (Required)
- Aditya Shubham Amitrajit
- GSPMD: General and Scalable Parallelization for ML Computation Graphs
- Feb 7
- Extreme Scale Training
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models (Required)
- Rohan Kartik Abhimanyu
- ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning (Required)
- Divya Ganesh Chinmay
- Feb 9
- Project Proposal Due
Fine-tuning
- Feb 12
- Making Large Models Useful with Alignment
- Aligning Large Language Models with Human: A Survey
- LoRA: Low-Rank Adaptation of Large Language Models (Required)
- Mohit Alexander Sashankh
- The Power of Scale for Parameter-Efficient Prompt Tuning
- Training Language Models to Follow Instructions with Human Feedback (Required)
- Apoorva Aditya Zachary
- Feb 14
- Is Alignment Really Useful?
- Finetuned Language Models are Zero-Shot Learners (Required)
- Arpan Rajveer Devashish
- Large Language Models are Zero-Shot Reasoners
- LIMA: Less Is More for Alignment (Required)
- Jingli Huayi Mithilesh
Retrieval & Augmentation
- Feb 19
- Information Retrieval
- ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
- Improving Language Models by Retrieving from Trillions of Tokens (Required)
- Shivashankar Vitaly Mingzheng
- REALM: Retrieval-Augmented Language Model Pre-Training (Required)
- Aayush Abhimanyu Sera
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
- Feb 21
- Accommodating Longer Contexts
- Extending Context Window of Large Language Models via Positional Interpolation
- Lost in the Middle: How Language Models Use Long Contexts (Required)
- Rishi Divya Azeez
- MemGPT: Towards LLMs as Operating Systems (Required)
- Ziyuan Amitrajit Aniruddha
Inference
- Feb 26
- State-of-the-Art Inference Approaches
- DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
- Efficient Memory Management for Large Language Model Serving with PagedAttention (Required)
- Zachary Sashankh Shubham
- Orca: A Distributed Serving System for Transformer-Based Generative Models (Required)
- Kartik Chinmay Rohan
- Feb 28
- Removing Inefficiencies
- DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (Required)
- Azeez Devashish Ganesh
- SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills (Required)
- Aditya Vima Apoorva
- Mar 4
- Techniques to Accelerate Decoding
- Accelerating Large Language Model Decoding with Speculative Sampling
- Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding (Required)
- Huayi Shivashankar Alexander
- Fast Inference from Transformers via Speculative Decoding (Required)
- Vitaly Ziyuan Rajveer
- Mar 6
- Optimizations for Large Model Inference
- Efficiently Scaling Transformer Inference
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Required)
- Mithilesh Mingzheng Arpan
- FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (Required)
- Sera Mohit Divya
- Full Stack Optimization of Transformer Inference: a Survey
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Project
- Mar 11
- Mid-Semester Project presentations
- Mar 13
- Mid-Semester Project presentations
- Mar 18
- No class (Spring break)
- Mar 20
- No class (Spring break)
Special Topics
- Mar 25
- Mixture-of-Experts
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
- Fast Inference of Mixture-of-Experts Language Models with Offloading (Required)
- Vima Jingli Aditya
- Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (Required)
- Shubham Mithilesh Kartik
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Tutel: Adaptive Mixture-of-Experts at Scale
- Mar 27
- Model Compression
- Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Required)
- Chinmay Zachary Rishi
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Required)
- Aniruddha Mohit Abhimanyu
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- QLoRA: Efficient Finetuning of Quantized LLMs
- SqueezeLLM: Dense-and-Sparse Quantization
- Apr 1
- Dynamism in Large Models
- Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
- Confident Adaptive Language Modeling (Required)
- Arpan Sera Ziyuan
- Optimizing Dynamic Neural Networks with Brainstorm (Required)
- Vima Zachary Huayi
- Apr 3
- Legal & Ethical Considerations
- Ethical and social risks of harm from Language Models (Required)
- Rohan Kartik Aayush
- Foundation Models and Fair Use
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (Required)
- Rajveer Shivashankar Mingzheng
- Apr 8
- Class Canceled (Solar Eclipse)
- Apr 10
- Security Implications
- Extracting Training Data from Diffusion Models
- Extracting Training Data from Large Language Models (Required)
- Sashankh Apoorva Azeez
- Identifying and Mitigating the Security Risks of Generative AI (Required)
- Alexander Rishi Chinmay
Conclusion
- Apr 10
- Course Wrap-up
- Anand
- Apr 15
- Final Project presentations
- Apr 17
- Final Project presentations
- Apr 22
- Final Project presentations
- Apr 26
- Final Project Report + Code Due