10-605/10-805: ML with Large Datasets, Fall 2025

Schedule

This schedule for 10-605/805 is subject to change.
Date Class Type Topic Resources Announcements
Mon Aug 25, 2025 Lecture Overview / Slides (pdf) / Slides (pptx) / Recording
  • William W. Cohen (1993). Efficient pruning methods for separate-and-conquer rule learning systems
  • Banko, Michele, and Eric Brill (2001). Scaling to very very large corpora for natural language disambiguation
  • Google NGrams Viewer
  • Norvig, Pereira, Halevy (2009). The Unreasonable Effectiveness of Data
  • Yoshua Bengio (2009). Learning Deep Architectures for AI
  • code for doing ngram queries
  • Hoffmann et al (2022). Training Compute-Optimal LLMs
HW 1 out - Entity Resolution and Naive Bayes in Spark
Wed Aug 27, 2025 Lecture Map-Reduce and Spark / Slides (pdf) / Slides (pptx) / Recording
  • Visualizing the cost of operations
  • Historical cost of storage
  • code for Bash Reduce - a minimal implementation of Map-Reduce
  • code for Hazsoup - a more readable minimal implementation of Map-Reduce
Fri Aug 29, 2025 Recitation Recitation: PySpark
Wed Sep 3, 2025 Lecture Workflows for Map-Reduce Systems
  • code for sample workflows in Spark
  • code for a minimal map-reduce, without distributed processing, using Spark syntax
  • A Language Model Approach to Keyphrase Extraction
Fri Sep 5, 2025 Recitation Recitation: Linear Algebra Review
Mon Sep 8, 2025 Lecture Learning as Optimization 1
  • Previous class lecture on Sept 11 2024
Wed Sep 10, 2025 Lecture Learning as Optimization 2
  • Previous class lecture on Sept 16 2024
  • William's notes on SGD for Logistic Regression and Sparsity
  • Hash Kernels, PMLR 2009
  • Feature hashing for large scale multitask learning, ICML 2009
HW 2 out - Parallel Linear Regression in Spark
HW 1 due
Fri Sep 12, 2025 Recitation HW1 Writing Session
Mon Sep 15, 2025 Lecture Learning as Optimization 3
  • Hogwild, A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Recht et al, 2011
  • FastText classification library
  • Bag of Tricks for Efficient Text Classification, Joulin et al 2016
  • Communication-Efficient Distributed Deep Learning, Tang et al 2023
  • Distributed Training Strategies for the Structured Perceptron, McDonald et al, 2010
  • Large-scale matrix factorization with distributed stochastic gradient descent - Gemulla et al 2011
Wed Sep 17, 2025 Lecture Randomized Algorithms 1 - Bloom Filters and Count-Min Sketches
  • William's Notes on Randomized Algorithms
  • code for Bloom Filter
  • Short and Deep: Sketching and Neural Networks, Daniely et al
  • Sketch Algorithms for Estimating Point Queries in NLP, 2012
  • Previous course lecture, 9/25/2024
Fri Sep 19, 2025 Recitation Recitation: Probability for LSH
Mon Sep 22, 2025 Lecture Randomized Algorithms 2 - Locality Sensitive Hashing
  • Fast Exact Search in Hamming Space with Multi-Index Hashing, 2014
  • Accelerating Large-Scale Inference with Anisotropic Vector Quantization, 2020
  • The FAISS library, 2024
HW 3 out - Locality-Sensitive Hashing
Wed Sep 24, 2025 Lecture HW2 Writing Session / AutoDiff
  • A simple explanation of reverse-mode automatic differentiation, Justin Domke
  • code for autodiff with Wengert lists
Fri Sep 26, 2025 Recitation Recitation: AWS
Mon Sep 29, 2025 Lecture Parallel Optimization with GPUs 1
  • Mythbusters explain GPUs
  • GFlops / dollar up to 2022
  • code for benchmarking GPUs
  • Fall 2024 10-605 lecture on ML hardware
Wed Oct 1, 2025 Lecture Parallel Optimization with GPUs 2
  • Previous class lecture on Nov 4 2024
  • Previous class lecture on Oct 21 2024
Fri Oct 3, 2025 Recitation Recitation: Practice Exam
HW 3 due
Mon Oct 6, 2025 Lecture Exam Review and QA
Wed Oct 8, 2025 Lecture Exam 1
Fri Oct 10, 2025 Recitation No Recitation
Mon Oct 20, 2025 Lecture Deep Learning: Background and Architectures
  • online book on deep networks
  • Understanding the difficulty of training deep feedforward neural networks, Glorot and Bengio, AISTATS 2010
  • Convolution demo
  • Deep Residual Learning for Image Recognition, He et al, CVPR 2016
  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al, 2013
  • Visualize Word2Vec
  • Karpathy blog post on LSTMs
  • NMT by jointly learning to align and translate, 2016
HW: TBD
Wed Oct 22, 2025 Lecture Deep Learning: Transformers
  • The Annotated Transformer
  • LLMs from scratch book on Github
  • A bit like the annotated transformer but with more depth in some areas
  • Dropout paper, Srivastava et al JMLR 2014
  • LayerNorm paper, Ba et al, 2016.
  • Neural Machine Translation of Rare Words with Subword Units, Sennrich et al 2016
  • Huggingface's guide to BPE and similar
  • An Image is Worth 16x16 Words: Transformers for Image Recognition, ICLR 2021
  • MuRag Multimodal Retrieval-Augmented Generation ..., Chen et al 2022
Fri Oct 24, 2025 Recitation Recitation: PyTorch / HW4 / Notebook / Recording
Mon Oct 27, 2025 Lecture Guest lecture - Greg Kochanski, Google DeepMind, and then Transformer Tokenization
  • PDF of Greg's slide deck
  • Neural Machine Translation of Rare Words with Subword Units, Sennrich et al 2016
  • Huggingface's guide to BPE and similar
  • An Image is Worth 16x16 Words: Transformers for Image Recognition, ICLR 2021
  • MuRag Multimodal Retrieval-Augmented Generation ..., Chen et al 2022
Wed Oct 29, 2025 Lecture Hyperparameter search and other topics
  • João Lages blog post
  • Hoffmann et al (2022). Training Compute-Optimal LLMs
  • (Mis)fitting - a study of scaling laws, ICLR 2025
  • Hyperband - A Novel Bandit Based Approach to Hyperparameter Optimization, JMLR 2018
  • Distilling the Knowledge in a Neural Network, Hinton, Vinyals, Dean, 2015
  • Model Compression, Bucila et al, 2006
  • Raschka blog post, 2023
  • Lei Mao blog post, 2023
  • LLM.int8(), Dettmers et al 2022
  • Dettmers and Zettlemoyer 2023
  • PV-Tuning, Malinovskii et al, 2024
Fri Oct 31, 2025 Recitation Recitation - HW5
Mon Nov 3, 2025 Lecture Quantization and Pruning for LLMs
  • Raschka blog post, 2023
  • Lei Mao blog post, 2023
  • LLM.int8(), Dettmers et al 2022
  • Dettmers and Zettlemoyer 2023
  • PV-Tuning, Malinovskii et al, 2024
  • The cat neuron paper
  • Learning to Generate Reviews and Discovering Sentiment, Radford et al 2017
  • To prune or not to prune ..., Zhu and Gupta 2017
  • A simple and effective approach ... (the Wanda paper), Sun et al
  • Accelerating Sparse Deep Neural Networks, Mishra et al 2021
  • LLM Pruner, 2023
  • Shortened Llama, ... Kim et al 2024
  • Sheared LLaMA ..., Xia et al, 2024
  • Han et al 2016
HW: GPT-2 Training and Miniproject released
Wed Nov 5, 2025 Lecture KV-Caching for LLMs
  • LLM Pruner, 2023
  • Shortened Llama, ... Kim et al 2024
  • Sheared LLaMA ..., Xia et al, 2024
  • Han et al 2016
  • ETC: Encoding Long and Structured Inputs in Transformers, Ainslie et al 2020
  • Big Bird: Transformers for Longer Sequences, Zaheer et al, 2020
  • Reformer: the Efficient Transformer, Kitaev et al, 2020
  • H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models, Zhang et al 2023
  • Efficient streaming LMs with attention sinks, Xiao et al 2024
  • SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
Fri Nov 7, 2025 Recitation Recitation - Miniproject
Mon Nov 10, 2025 Lecture KV Caching and Model Compression Recap
  • SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
  • Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs, Ge et al 2023
Wed Nov 12, 2025 Lecture Contrastive learning and retrieval - Guest Lecture, John Wieting, Google DeepMind
  • DeepHash paper, 2017
  • DPR paper, 2020
  • Condenser paper
  • EntityQuestions paper
  • ColBERT paper
  • MoCo paper
  • Contriever paper
Previous HW due
Fri Nov 14, 2025 Recitation No Recitation
Mon Nov 17, 2025 Lecture Optimizing Transformer Architectures - Guest Lecture, Michiel de Jong, Cursor
  • HyDE paper
  • LameR paper
  • Echo embedding paper
  • PromptEOL paper
  • RAG for knowledge-intensive NLP tasks, 2020
  • REALM paper, 2020
  • Fusion in decoder paper, 2021
  • FiDO paper
  • Shazeer paper on optimizing decoder inference with multihead queries
  • Parallel Context Windows, 2023
  • Block-Attention for Efficient RAG, 2024
  • Dynamic Block-Sparse Attention paper, 2025
  • Shazeer paper analyzing operational intensity
  • LUMEN paper
  • GLIMMER paper
Wed Nov 19, 2025 Lecture RAG3 and Embarrassingly Parallel Training
  • Parallel Context Windows, 2023
  • Block-Attention for Efficient RAG, 2024
  • Dynamic Block-Sparse Attention paper, 2025
  • Branch-train-merge paper
  • Branch-train-mix paper
Fri Nov 21, 2025 Recitation No Recitation
Mon Nov 24, 2025 Lecture Embarrassingly Parallel Training 2 and Scalable Evaluation
  • Levy and Goldberg demystification of word2vec (1/2) 2014
  • Levy and Goldberg demystification of word2vec (2/2) 2014
  • Task arithmetic paper, 2023
  • Weight disentanglement and neural tangent kernel paper, 2023
  • TIES-merging paper, 2023
  • LoRA hub paper, 2024
  • PHATGOOSE paper, 2024
  • Information cascades, Ch 16, Easley and Kleinberg, 2010
  • Salganik, Dodds, Watts, Science 2006
  • Schelling Spatial Segregation model
  • Prediction Powered Inference, Angelopoulos 2023, Science
  • Bayesian PPI, Hofer et al 2024
  • Kamaloo et al, 2023
  • Fisch et al, 2024
Mon Dec 1, 2025 Lecture Exam Review and QA
Miniproject due
Wed Dec 3, 2025 Lecture Exam 2