10-605/10-805: ML with Large Datasets, Fall 2025

Schedule

This schedule for 10-605/805 is subject to change.
Date Class Type Topic Resources Announcements
Mon Aug 25, 2025 Lecture Overview / Slides (pdf) / Slides (pptx)
Resources
  • William W. Cohen (1993). Efficient pruning methods for separate-and-conquer rule learning systems
  • Banko, Michele, and Eric Brill (2001). Scaling to very very large corpora for natural language disambiguation
  • Google NGrams Viewer
  • Norvig, Pereira, Halevy (2009). The Unreasonable Effectiveness of Data
  • Yoshua Bengio (2009). Learning Deep Architectures for AI
  • code for doing ngram queries
  • Hoffmann et al (2022). Training Compute-Optimal LLMs
HW 1 out - Entity Resolution and Naive Bayes in Spark
Wed Aug 27, 2025 Lecture Map-Reduce and Spark / Slides (pdf) / Slides (pptx)
Resources
  • Visualizing the cost of operations
  • Historical cost of storage
  • code for Bash Reduce - a minimal implementation of Map-Reduce
  • code for Hazsoup - a more readable minimal implementation of Map-Reduce
Fri Aug 29, 2025 Recitation Recitation: PySpark / Slides (pdf) / Slides (pptx)
Wed Sep 3, 2025 Lecture Workflows for Map-Reduce Systems / Slides (pdf) / Slides (pptx)
Resources
  • code for sample workflows in Spark
  • code for a minimal map-reduce, without distributed processing, using Spark syntax
  • A Language Model Approach to Keyphrase Extraction
Fri Sep 5, 2025 Recitation Recitation: Linear Algebra Review / Slides (pdf) / Notebook
Mon Sep 8, 2025 Lecture Learning as Optimization 1 / Slides (pdf) / Slides (pptx)
Wed Sep 10, 2025 Lecture Learning as Optimization 2 / Slides (pdf) / Slides (pptx)
Resources
  • Previous class lecture on Sept 11 2024
  • Previous class lecture on Sept 16 2024
HW 2 out - Parallel Linear Regression in Spark
HW 1 due
Fri Sep 12, 2025 Recitation HW1 Writing Session
Mon Sep 15, 2025 Lecture Learning as Optimization 3 / Slides (pdf) / Slides (pptx)
Resources
  • William's notes on SGD for Logistic Regression and Sparsity
  • Hash Kernels, PMLR 2009
  • Feature hashing for large scale multitask learning, ICML 2009
  • Hogwild, A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Recht et al, 2011
  • FastText classification library
  • Bag of Tricks for Efficient Text Classification, Joulin et al 2016
  • Communication-Efficient Distributed Deep Learning, Tang et al 2023
  • Distributed Training Strategies for the Structured Perceptron, McDonald et al, 2010
  • Large-scale matrix factorization with distributed stochastic gradient descent - Gemulla et al 2011
Wed Sep 17, 2025 Lecture Randomized Algorithms 1 - Bloom Filters and Count-Min Sketches / Slides (pdf) / Slides (pptx)
Resources
  • Agarwal et al, A Reliable Effective Terascale Linear Learning System
  • William's Notes on Randomized Algorithms
  • code for Bloom Filter
  • Short and Deep: Sketching and Neural Networks, Daniely et al
  • Sketch Algorithms for Estimating Point Queries in NLP, 2012
  • Previous course lecture, 9/25/2024
Fri Sep 19, 2025 Recitation Recitation 3: Probability, Evaluation Metrics and Minhashing
Resources
  • Recitation 3 Handout
  • Solutions for Recitation 3
Mon Sep 22, 2025 Lecture Randomized Algorithms 2 - Locality Sensitive Hashing / Slides (pdf) / Slides (pptx)
Resources
  • Fast Exact Search in Hamming Space with Multi-Index Hashing, 2014
  • Accelerating Large-Scale Inference with Anisotropic Vector Quantization, 2020
  • The FAISS library, 2024
HW 3 out - Locality-Sensitive Hashing
Wed Sep 24, 2025 Lecture HW2 Writing Session / AutoDiff / Slides (pdf) / Slides (pptx)
Resources
  • A simple explanation of reverse-mode automatic differentiation, Justin Domke
  • code for autodiff with Wengert lists
Fri Sep 26, 2025 Recitation Recitation: AWS / Slides (pdf) / Slides (pptx)
Mon Sep 29, 2025 Lecture Parallel Optimization with GPUs 1 / Slides (pdf) / Slides (pptx)
Resources
  • Mythbusters explain GPUs
  • GFlops / dollar up to 2022
  • code for benchmarking GPUs
  • Fall 2024 10-605 lecture on ML hardware
Wed Oct 1, 2025 Lecture Parallel Optimization with GPUs 2 / Slides (pdf) / Slides (pptx)
Resources
  • Previous class lecture on Nov 4 2024
  • Previous class lecture on Oct 21 2024
Fri Oct 3, 2025 Recitation Recitation: Practice Exam
HW 3 due (updated deadline: Sat 10/4)
Mon Oct 6, 2025 Lecture Exam Review and QA / Slides (pdf) / Slides (pptx)
Wed Oct 8, 2025 Lecture Exam 1
Fri Oct 10, 2025 Recitation No Recitation
Miniproject: 10-605 students apply to lead a miniproject
Mon Oct 20, 2025 Lecture Deep Learning: Background and Architectures / Slides (pdf) / Slides (pptx)
Resources
  • online book on deep networks
  • Understanding the difficulty of training deep feedforward neural networks, Glorot and Bengio, AISTATS 2010
  • Convolution demo
  • Deep Residual Learning for Image Recognition, He et al, CVPR 2016
  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al, 2013
  • Visualize Word2Vec
  • Karpathy blog post on LSTMs
  • NMT by jointly learning to align and translate, 2016
HW 4 out - LoRA
Wed Oct 22, 2025 Lecture Deep Learning: Transformers / Slides (pdf) / Slides (pptx)
Miniproject: open-ended miniproject proposals due on Thursday Oct 23
Fri Oct 24, 2025 Recitation Recitation: PyTorch / HW4 / Notebook
Mon Oct 27, 2025 Lecture Hyperparameter Search / Slides (pdf) / Slides (pptx)
Resources
  • PDF of Greg's slide deck
  • Hoffmann et al (2022). Training Compute-Optimal LLMs
  • (Mis)fitting - a study of scaling laws, ICLR 2025
  • Hyperband - A Novel Bandit Based Approach to Hyperparameter Optimization, JMLR 2018
Miniproject: feedback on open-ended miniproject proposals given; proposals are posted for class
Wed Oct 29, 2025 Lecture Model compression / Slides (pdf) / Slides (pptx)
Resources
  • Distilling the Knowledge in a Neural Network, Hinton, Vinyals, Dean, 2015
  • Model Compression, Bucila et al, 2006
  • Raschka blog post, 2023
  • Lei Mao blog post, 2023
  • LLM.int8(), Dettmers et al 2022
  • Dettmers and Zettlemoyer 2023
  • PV-Tuning, Malinovskii et al, 2024
  • The cat neuron paper
  • Learning to Generate Reviews and Discovering Sentiment, Radford et al 2017
  • To prune or not to prune ..., Zhu and Gupta 2017
  • A simple and effective approach ... (the Wanda paper), Sun et al
  • Accelerating Sparse Deep Neural Networks, Mishra et al 2021
  • LLM Pruner, 2023
  • Shortened LLaMA ..., Kim et al 2024
  • Sheared LLaMA ..., Xia et al, 2024
  • Han et al 2016
Fri Oct 31, 2025 Recitation Recitation - HW5
Mon Nov 3, 2025 Lecture Embarrassingly Parallel Training / Slides (pdf) / Slides (pptx)
Resources
  • Branch-train-merge paper
  • Branch-train-mix paper
  • Levy and Goldberg demystification of word2vec (1/2) 2014
  • Levy and Goldberg demystification of word2vec (2/2) 2014
  • Task arithmetic paper, 2023
  • Weight disentanglement and neural tangent kernel paper, 2023
  • TIES-merging paper, 2023
  • LoRA hub paper, 2024
  • PHATGOOSE paper, 2024
HW 5 out - GPT-2 Training
Miniproject: Structured miniproject details released, project teams formed, and declaration form due
Wed Nov 5, 2025 Lecture KV-Caching for LLMs / Slides (pdf) / Slides (pptx)
Resources
  • ETC: Encoding Long and Structured Inputs in Transformers, Ainslie et al 2020
  • Big Bird: Transformers for Longer Sequences, Zaheer et al, 2020
  • Reformer: the Efficient Transformer, Kitaev et al, 2020
  • H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models, Zhang et al 2023
  • Efficient streaming LMs with attention sinks, Xiao et al 2024
  • SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
Miniproject: structured miniproject survey due
Fri Nov 7, 2025 Recitation Recitation - Structured miniproject / Slides (pdf)
Mon Nov 10, 2025 Lecture KV Caching and Model Compression Recap
Resources
  • SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
  • Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs, Ge et al 2023
Wed Nov 12, 2025 Lecture Contrastive learning and retrieval - Guest Lecture, John Wieting, Google DeepMind
Resources
  • DeepHash paper, 2017
  • DPR paper, 2020
  • Condenser paper
  • EntityQuestions paper
  • ColBERT paper
  • MoCo paper
  • Contriever paper
Previous HW due
Fri Nov 14, 2025 Recitation HW4 Writing Session
Mon Nov 17, 2025 Lecture Optimizing Transformer Architectures - Guest Lecture, Michiel de Jong, Cursor
Resources
  • HyDE paper
  • LameR paper
  • Echo embedding paper
  • PromptEOL paper
  • RAG for knowledge-intensive NLP tasks, 2020
  • REALM paper, 2020
  • Fusion in decoder paper, 2021
  • FiDO paper
  • Shazeer paper on optimizing decoder inference with multihead queries
  • Parallel Context Windows, 2023
  • Block-Attention for Efficient RAG, 2024
  • Dynamic Block-Sparse Attention paper, 2025
  • Shazeer paper analyzing operational intensity
  • LUMEN paper
  • GLIMMER paper
Wed Nov 19, 2025 Lecture Wrapup of recent topics and TBD
Miniproject: project check-in due (written report OR oral check-in)
Fri Nov 21, 2025 Recitation HW5 Writing Session
Mon Nov 24, 2025 Lecture Network Effects and Scalable Evaluation
Resources
  • Levy and Goldberg demystification of word2vec (1/2) 2014
  • Levy and Goldberg demystification of word2vec (2/2) 2014
  • Task arithmetic paper, 2023
  • Weight disentanglement and neural tangent kernel paper, 2023
  • TIES-merging paper, 2023
  • LoRA hub paper, 2024
  • PHATGOOSE paper, 2024
  • Information cascades, Ch 16, Easley and Kleinberg, 2010
  • Salganik, Dodds, Watts, Science 2006
  • Schelling Spatial Segregation model
  • Prediction Powered Inference, Angelopoulos 2023, Science
  • Bayesian PPI, Hofer et al 2024
  • Kamaloo et al, 2023
  • Fisch et al, 2024
Mon Dec 1, 2025 Lecture Exam Review and QA
Miniprojects due
Wed Dec 3, 2025 Lecture Exam 2