Schedule for CMU 10-405/10-605

Date	Class Type	Topic	Resources	Announcements
Mon Jan 13, 2025	Lecture	Overview / Slides (pdf) / Slides (pptx) / Recording	William W. Cohen (1993). Efficient pruning methods for separate-and-conquer rule learning systems Banko, Michele, and Eric Brill (2001). Scaling to very very large corpora for natural language disambiguation Google NGrams Viewer Norvig, Pereira, Halevy (2009). The Unreasonable Effectiveness of Data Yoshua Bengio (2009). Learning Deep Architectures for AI Sample code from lecture for doing ngram queries Hoffmann et al (2022). Training Compute-Optimal LLMs
Wed Jan 15, 2025	Lecture	Map-Reduce and Spark / Slides (pdf) / Slides (pptx) / Recording	Visualizing the cost of operations Historical cost of storage Bash Reduce - minimal implementation of Map-Reduce	HW 1 out - Entity Resolution and Naive Bayes in Spark
Fri Jan 17, 2025	Recitation	Recitation: PySpark / Slides (pdf) / Notebook / Recording
Wed Jan 22, 2025	Lecture	Workflows for Map-Reduce Systems / Slides (pdf) / Slides (pptx) / Recording	Ken Church (1994). Unix for Poets Demo code from lecture (all in one file) Code from lecture for doing wordcounts in Spark Code for phrase-finding using Spark A Language Model Approach to Keyphrase Extraction Code for slow implementation of PageRank Faster implementation of PageRank Fastest implementation of PageRank (v2)
Fri Jan 24, 2025	Recitation	Recitation: Linear Algebra Review / Slides (pdf) / Notebook / Recording
Mon Jan 27, 2025	Lecture	Learning as Optimization 1 / Slides (pdf) / Slides (pptx) / Recording	Previous class lecture on Sept 9 2024 Previous class lecture on Sept 11 2024
Wed Jan 29, 2025	Lecture	Learning as Optimization 2 / Slides (pdf) / Slides (pptx) / Recording	Previous class lecture on Sept 16 2024 William's notes on SGD for Logistic Regression and Sparsity Hash Kernels, PMLR 2009 Feature hashing for large scale multitask learning, ICML 2009	HW 2 out - Parallel Linear Regression in Spark HW 1 due
Fri Jan 31, 2025	Recitation	No Recitation
Mon Feb 3, 2025	Lecture	Learning as Optimization 3 / Slides (pdf) / Slides (pptx) / Recording	Hogwild, A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Recht et al, 2011 FastText classification library Bag of Tricks for Efficient Text Classification, Joulin et al 2016 Communication-Efficient Distributed Deep Learning, Tang et al 2023 Distributed Training Strategies for the Structured Perceptron, McDonald et al, 2010 Large-scale matrix factorization with distributed stochastic gradient descent - Gemulla et al 2011
Wed Feb 5, 2025	Lecture	Randomized Algorithms 1 - Bloom Filters and Count-Min Sketches / Slides (pdf) / Slides (pptx) / Recording	William's Notes on Randomized Algorithms Demo Bloom filter implementation in Python Short and Deep: Sketching and Neural Networks, Daniely et al Code from lecture for bloom filter implementation Sketch Algorithms for Estimating Point Queries in NLP - Goyal et al 2012 Previous course lecture, 9/25/2024
Fri Feb 7, 2025	Recitation	Recitation: Probability for LSH / Slides (pptx) / Recording
Mon Feb 10, 2025	Lecture	Randomized Algorithms 2 - Locality Sensitive Hashing / Slides (pdf) / Slides (pptx) / Recording	Fast Exact Search in Hamming Space with Multi-Index Hashing, 2014 Accelerating Large-Scale Inference with Anisotropic Vector Quantization, 2020 The FAISS library, 2024	HW 3 out - Locality-Sensitive Hashing
Wed Feb 12, 2025	Lecture	Count-Min Application / AutoDiff / Slides (pdf) / Slides (pptx) / Recording	Faithful KB Embeddings - Sun et al, 2020 A simple explanation of reverse-mode automatic differentiation, Justin Domke Sample code for autodiff with Wengert lists Sample code for checking autodiff implementation with PyTorch
Fri Feb 14, 2025	Recitation	Recitation: AWS / Slides (pptx) / Recording
Mon Feb 17, 2025	Lecture	Parallel Optimization with GPUs 1 / Slides (pdf) / Slides (pptx) / Recording	Mythbusters explain GPUs GFlops / dollar up to 2022 Benchmarking GPUs in PyTorch on a Mac Fall 2024 10-605 lecture on ML hardware Code for comparing GPU vs CPU times in PyTorch
Wed Feb 19, 2025	Lecture	Parallel Optimization with GPUs 2 / Slides (pdf) / Slides (pptx) / Recording	Previous class lecture on Nov 4 2024 Previous class lecture on Oct 21 2024
Fri Feb 21, 2025	Recitation	Recitation: Practice Exam		HW 3 due
Mon Feb 24, 2025	Lecture	Exam Review and QA / Slides (pdf) / Slides (pptx) / Recording
Wed Feb 26, 2025	Lecture	Exam 1
Mon Mar 10, 2025	Lecture	Deep Learning: Background and Architectures / Slides (pdf) / Slides (pptx) / Recording	online book on deep networks Understanding the difficulty of training deep feedforward neural networks, Glorot and Bengio, AIStats 2010 Convolution demo Deep Residual Learning for Image Recognition, He et al, CVPR 2016 Efficient Estimation of Word Representations in Vector Space, Mikolov et al, 2013 Visualize Word2Vec Karpathy blog post on LSTMs NMT by jointly learning to align and translate, 2016	HW: LoRA
Wed Mar 12, 2025	Lecture	Deep Learning: Transformers / Slides (pdf) / Slides (pptx) / Recording	The Annotated Transformer LLMs from scratch book on Github A bit like the annotated transformer but with more depth in some areas Dropout paper, Srivastava et al JMLR 2014 LayerNorm paper, Ba et al, 2016. Neural Machine Translation of Rare Words with Subword Units, Sennrich et al 2016 Huggingface's guide to BPE and similar An Image is Worth 16x16 Words: Transformers for Image Recognition, ICLR 2021 MuRag Multimodal Retrieval-Augmented Generation ..., Chen et al 2022
Fri Mar 14, 2025	Recitation	Recitation: PyTorch / HW4 / Notebook / Recording
Mon Mar 17, 2025	Lecture	Guest lecture - Greg Kochanski, Google DeepMind, and then Transformer Tokenization / Slides (pdf) / Slides (pptx) / Recording	PDF of Greg's slide deck Neural Machine Translation of Rare Words with Subword Units, Sennrich et al 2016 Huggingface's guide to BPE and similar An Image is Worth 16x16 Words: Transformers for Image Recognition, ICLR 2021 MuRag Multimodal Retrieval-Augmented Generation ..., Chen et al 2022
Wed Mar 19, 2025	Lecture	Hyperparameter search and other topics / Slides (pdf) / Slides (pptx) / Recording	João Lages blog post Hoffmann et al (2022). Training Compute-Optimal LLMs (Mis)fitting - a study of scaling laws, ICLR 2025 Hyperband - A Novel Bandit Based Approach to Hyperparameter Optimization, JMLR 2018 Distilling the Knowledge in a Neural Network, Hinton, Vinyals, Dean, 2015 Model Compression, Bucila et al, 2006 Raschka blog post, 2023 Lei Mao blog post, 2023 LLM.int8(), Dettmers et al 2022 Dettmers and Zettlemoyer 2023 PV-Tuning, Malinovskii et al, 2024
Fri Mar 21, 2025	Recitation	Recitation - HW5 / Slides (pdf) / Recording
Mon Mar 24, 2025	Lecture	Quantization and Pruning for LLMs / Slides (pdf) / Slides (pptx) / Recording	Raschka blog post, 2023 Lei Mao blog post, 2023 LLM.int8(), Dettmers et al 2022 Dettmers and Zettlemoyer 2023 PV-Tuning, Malinovskii et al, 2024 The cat neuron paper Learning to Generate Reviews and Discovering Sentiment, Radford et al 2017 To prune or not to prune ..., Zhu and Gupta 2017 A simple and effective approach ... (the Wanda paper), Sun et al Accelerating Sparse Deep Neural Networks, Mishra et al 2021 LLM Pruner, 2023 Shortened Llama, ... Kim et al 2024 Sheared LLama ... , Xia et al, 2024 Han et al 2016	HW: GPT-2 Training and Miniproject released
Wed Mar 26, 2025	Lecture	KV-Caching for LLMs / Slides (pdf) / Slides (pptx) / Recording	LLM Pruner, 2023 Shortened Llama, ... Kim et al 2024 Sheared LLama ... , Xia et al, 2024 Han et al 2016 ETC: Encoding Long and Structured Inputs in Transformers, Ainslie et al 2020 Big Bird: Transformers for Longer Sequences, Zaheer et al, 2020 Reformer: the Efficient Transformer, Kitaev et al, 2020 H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models, Zhang et al 2023 Efficient streaming LMs with attention sinks, Xiao et al 2024 SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
Fri Mar 28, 2025	Recitation	Recitation - Miniproject / Slides (pdf) / Slides (pptx) / Recording
Mon Mar 31, 2025	Lecture	Guest lecture - Haitian Sun, Apple / Slides (pdf) / Recording
Wed Apr 2, 2025	Lecture	KV Caching and Model Compression Recap / Slides (pdf) / Slides (pptx) / Recording	SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024 Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs, Ge et al 2023
Mon Apr 7, 2025	Lecture	Retrieval Augmented Generation 1 / Slides (pdf) / Slides (pptx) / Recording	DeepHash paper, 2017 DPR paper, 2020 Condenser paper EntityQuestions paper ColBERT paper MoCo paper Contriever paper	Previous HW due
Wed Apr 9, 2025	Lecture	Retrieval Augmented Generation 2 / Slides (pdf) / Slides (pptx) / Recording	HyDE paper LameR paper Echo embedding paper PromptEOL paper RAG for knowledge-intensive NLP tasks, 2020 REALM paper, 2020 Fusion in decoder paper, 2021 FiDO paper, 2023 Shazeer paper on optimizing decoder inference with multihead queries LUMEN paper, 2023 GLIMMER paper, 2023 Parallel context windows, 2023 Block-attention for RAG, 2024 Dynamic block-sparse attention for ICL, 2025 FiDO paper Shazeer paper analyzing operational intensity LUMEN paper GLIMMER paper Parallel Context Windows, 2023 Block-Attention for Efficient RAG, 2024 Dynamic Block-Sparse Attention paper, 2025
Fri Apr 11, 2025	Recitation	No Recitation
Mon Apr 14, 2025	Lecture	RAG3 and Embarrassingly Parallel Training / Slides (pdf) / Slides (pptx) / Recording	Parallel Context Windows, 2023 Block-Attention for Efficient RAG, 2024 Dynamic Block-Sparse Attention paper, 2025 Branch-train-merge paper Branch-train-mix paper
Wed Apr 16, 2025	Lecture	Embarrassingly Parallel Training 2 and Scalable Evaluation / Slides (pdf) / Slides (pptx) / Recording	Levy and Goldberg demystification of word2vec (1/2) 2014 Levy and Goldberg demystification of word2vec (2/2) 2014 Task arithmetic paper, 2023 Weight disentanglement and neural tangent kernel paper, 2023 TIES-merging paper, 2023 LoRA hub paper, 2024 PHATGOOSE paper, 2024 Information cascades, Ch 16, Easley and Kleinberg, 2010 Salganik, Dodds, Watts, Science 2006 Schelling Spatial Segregation model Prediction Powered Inference, Angelopoulos 2023, Science Bayesian PPI, Hofer et al 2024 Kamaloo et al, 2023 Fisch et al, 2024
Fri Apr 18, 2025	Recitation	Recitation - Practice Exam
Mon Apr 21, 2025	Lecture	Exam Review and QA / Slides (pdf) / Slides (pptx) / Recording		Miniproject due
Wed Apr 23, 2025	Lecture	Exam 2

10-405/10-605: ML with Large Datasets, Spring 2025

Schedule (subject to change)