10-405/10-605: ML with Large Datasets, Spring 2025

Schedule (subject to change)

Date Class Type Topic Resources Announcements
Mon Jan 13, 2025 Lecture Overview / Slides (pdf) / Slides (pptx) / Recording
  • William W. Cohen (1993). Efficient pruning methods for separate-and-conquer rule learning systems
  • Banko, Michele, and Eric Brill (2001). Scaling to very very large corpora for natural language disambiguation
  • Google NGrams Viewer
  • Norvig, Pereira, Halevy (2009). The Unreasonable Effectiveness of Data
  • Yoshua Bengio (2009). Learning Deep Architectures for AI
  • Sample code from lecture for doing ngram queries
  • Hoffmann et al (2022). Training Compute-Optimal LLMs
Wed Jan 15, 2025 Lecture Map-Reduce and Spark / Slides (pdf) / Slides (pptx) / Recording
  • Visualizing the cost of operations
  • Historical cost of storage
  • Bash Reduce - minimal implementation of Map-Reduce
HW 1 out - Entity Resolution and Naive Bayes in Spark
Fri Jan 17, 2025 Recitation Recitation: PySpark / Slides (pdf) / Notebook / Recording
Wed Jan 22, 2025 Lecture Workflows for Map-Reduce Systems / Slides (pdf) / Slides (pptx) / Recording
  • Ken Church (1994). Unix for Poets
  • Demo code from lecture (all in one file)
  • Code from lecture for doing wordcounts in Spark
  • Code for phrase-finding using Spark
  • A Language Model Approach to Keyphrase Extraction
  • Code for slow implementation of PageRank
  • Faster implementation of PageRank
  • Fastest implementation of PageRank (v2)
Fri Jan 24, 2025 Recitation Recitation: Linear Algebra Review / Slides (pdf) / Notebook / Recording
Mon Jan 27, 2025 Lecture Learning as Optimization 1 / Slides (pdf) / Slides (pptx) / Recording
  • Previous class lecture on Sept 9 2024
  • Previous class lecture on Sept 11 2024
Wed Jan 29, 2025 Lecture Learning as Optimization 2 / Slides (pdf) / Slides (pptx) / Recording
  • Previous class lecture on Sept 16 2024
  • William's notes on SGD for Logistic Regression and Sparsity
  • Hash Kernels, PMLR 2009
  • Feature hashing for large scale multitask learning, ICML 2009
HW 2 out - Parallel Linear Regression in Spark
HW 1 due
Fri Jan 31, 2025 Recitation No Recitation
Mon Feb 3, 2025 Lecture Learning as Optimization 3 / Slides (pdf) / Slides (pptx) / Recording
  • Hogwild, A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Recht et al, 2011
  • FastText classification library
  • Bag of Tricks for Efficient Text Classification, Joulin et al 2016
  • Communication-Efficient Distributed Deep Learning, Tang et al 2023
  • Distributed Training Strategies for the Structured Perceptron, McDonald et al, 2010
  • Large-scale matrix factorization with distributed stochastic gradient descent - Gemulla et al 2011
Wed Feb 5, 2025 Lecture Randomized Algorithms 1 - Bloom Filters and Count-Min Sketches / Slides (pdf) / Slides (pptx) / Recording
  • William's Notes on Randomized Algorithms
  • Demo Bloom filter implementation in Python
  • Short and Deep: Sketching and Neural Networks, Daniely et al
  • Code from lecture for bloom filter implementation
  • Sketch Algorithms for Estimating Point Queries in NLP - Goyal et al 2012
  • Previous course lecture, 9/25/2024
Fri Feb 7, 2025 Recitation Recitation: Probability for LSH / Slides (pptx) / Recording
Mon Feb 10, 2025 Lecture Randomized Algorithms 2 - Locality Sensitive Hashing / Slides (pdf) / Slides (pptx) / Recording
  • Fast Exact Search in Hamming Space with Multi-Index Hashing, 2014
  • Accelerating Large-Scale Inference with Anisotropic Vector Quantization, 2020
  • The FAISS library, 2024
HW 3 out - Locality-Sensitive Hashing
Wed Feb 12, 2025 Lecture Count-Min Application / AutoDiff / Slides (pdf) / Slides (pptx) / Recording
  • Faithful KB Embeddings - Sun et al, 2020
  • A simple explanation of reverse-mode automatic differentiation, Justin Domke
  • Sample code for autodiff with Wengert lists
  • Sample code for checking autodiff implementation with PyTorch
Fri Feb 14, 2025 Recitation Recitation: AWS / Slides (pptx) / Recording
Mon Feb 17, 2025 Lecture Parallel Optimization with GPUs 1 / Slides (pdf) / Slides (pptx) / Recording
  • Mythbusters explain GPUs
  • GFlops / dollar up to 2022
  • Benchmarking GPUs in PyTorch on a Mac
  • Fall 2024 10-605 lecture on ML hardware
  • Code for comparing GPU vs CPU times in PyTorch
Wed Feb 19, 2025 Lecture Parallel Optimization with GPUs 2 / Slides (pdf) / Slides (pptx) / Recording
  • Previous class lecture on Nov 4 2024
  • Previous class lecture on Oct 21 2024
Fri Feb 21, 2025 Recitation Recitation: Practice Exam
HW 3 due
Mon Feb 24, 2025 Lecture Exam Review and QA / Slides (pdf) / Slides (pptx) / Recording
Wed Feb 26, 2025 Lecture Exam 1
Mon Mar 10, 2025 Lecture Deep Learning: Background and Architectures / Slides (pdf) / Slides (pptx) / Recording
  • online book on deep networks
  • Understanding the difficulty of training deep feedforward neural networks, Glorot and Bengio, AIStats 2010
  • Convolution demo
  • Deep Residual Learning for Image Recognition, He et al, CVPR 2016
  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al, 2013
  • Visualize Word2Vec
  • Karpathy blog post on LSTMs
  • NMT by jointly learning to align and translate, 2016
HW: LoRA
Wed Mar 12, 2025 Lecture Deep Learning: Transformers / Slides (pdf) / Slides (pptx) / Recording
  • The Annotated Transformer
  • LLMs from scratch book on Github
  • A bit like the annotated transformer but with more depth in some areas
  • Dropout paper, Srivastava et al JMLR 2014
  • LayerNorm paper, Ba et al, 2016.
  • Neural Machine Translation of Rare Words with Subword Units, Sennrich et al 2016
  • Huggingface's guide to BPE and similar
  • An Image is Worth 16x16 Words: Transformers for Image Recognition, ICLR 2021
  • MuRag Multimodal Retrieval-Augmented Generation ..., Chen et al 2022
Fri Mar 14, 2025 Recitation Recitation: PyTorch / HW4 / Notebook / Recording
Mon Mar 17, 2025 Lecture Guest lecture - Greg Kochanski, Google DeepMind, and then Transformer Tokenization / Slides (pdf) / Slides (pptx) / Recording
  • PDF of Greg's slide deck
  • Neural Machine Translation of Rare Words with Subword Units, Sennrich et al 2016
  • Huggingface's guide to BPE and similar
  • An Image is Worth 16x16 Words: Transformers for Image Recognition, ICLR 2021
  • MuRag Multimodal Retrieval-Augmented Generation ..., Chen et al 2022
Wed Mar 19, 2025 Lecture Hyperparameter search and other topics / Slides (pdf) / Slides (pptx) / Recording
  • João Lages blog post
  • Hoffmann et al (2022). Training Compute-Optimal LLMs
  • (Mis)fitting - a study of scaling laws, ICLR 2025
  • Hyperband - A Novel Bandit Based Approach to Hyperparameter Optimization, JMLR 2018
  • Distilling the Knowledge in a Neural Network, Hinton, Vinyals, Dean, 2015
  • Model Compression, Bucila et al, 2006
  • Raschka blog post, 2023
  • Lei Mao blog post, 2023
  • LLM.int8(), Dettmers et al 2022
  • Dettmers and Zettlemoyer 2023
  • PV-Tuning, Malinovskii et al, 2024
Fri Mar 21, 2025 Recitation Recitation - HW5 / Slides (pdf) / Recording
Mon Mar 24, 2025 Lecture Quantization and Pruning for LLMs / Slides (pdf) / Slides (pptx) / Recording
  • Raschka blog post, 2023
  • Lei Mao blog post, 2023
  • LLM.int8(), Dettmers et al 2022
  • Dettmers and Zettlemoyer 2023
  • PV-Tuning, Malinovskii et al, 2024
  • The cat neuron paper
  • Learning to Generate Reviews and Discovering Sentiment, Radford et al 2017
  • To prune or not to prune ..., Zhu and Gupta 2017
  • A simple and effective approach ... (the Wanda paper), Sun et al
  • Accelerating Sparse Deep Neural Networks, Mishra et al 2021
  • LLM Pruner, 2023
  • Shortened LLaMA, ... Kim et al 2024
  • Sheared LLaMA ..., Xia et al, 2024
  • Han et al 2016
HW: GPT-2 Training and Miniproject released
Wed Mar 26, 2025 Lecture KV-Caching for LLMs / Slides (pdf) / Slides (pptx) / Recording
  • LLM Pruner, 2023
  • Shortened LLaMA, ... Kim et al 2024
  • Sheared LLaMA ..., Xia et al, 2024
  • Han et al 2016
  • ETC: Encoding Long and Structured Inputs in Transformers, Ainslie et al 2020
  • Big Bird: Transformers for Longer Sequences, Zaheer et al, 2020
  • Reformer: the Efficient Transformer, Kitaev et al, 2020
  • H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models, Zhang et al 2023
  • Efficient streaming LMs with attention sinks, Xiao et al 2024
  • SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
Fri Mar 28, 2025 Recitation Recitation - Miniproject / Slides (pdf) / Slides (pptx) / Recording
Mon Mar 31, 2025 Lecture Guest lecture - Haitian Sun, Apple / Slides (pdf) / Recording
Wed Apr 2, 2025 Lecture KV Caching and Model Compression Recap / Slides (pdf) / Slides (pptx) / Recording
  • SnapKV: LLM Knows What You are Looking for Before Generation, Li et al 2024
  • Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs, Ge et al 2023
Mon Apr 7, 2025 Lecture Retrieval Augmented Generation 1 / Slides (pdf) / Slides (pptx) / Recording
  • DeepHash paper, 2017
  • DPR paper, 2020
  • Condenser paper
  • EntityQuestions paper
  • ColBERT paper
  • MoCo paper
  • Contriever paper
Previous HW due
Wed Apr 9, 2025 Lecture Retrieval Augmented Generation 2 / Slides (pdf) / Slides (pptx) / Recording
  • HyDE paper
  • LameR paper
  • Echo embedding paper
  • PromptEOL paper
  • RAG for knowledge-intensive NLP tasks, 2020
  • REALM paper, 2020
  • Fusion in decoder paper, 2021
  • FiDO paper, 2023
  • Shazeer paper on optimizing decoder inference with multihead queries
  • LUMEN paper, 2023
  • GLIMMER paper, 2023
  • Parallel context windows, 2023
  • Block-attention for RAG, 2024
  • Dynamic block-sparse attention for ICL, 2025
  • Shazeer paper analyzing operational intensity
Fri Apr 11, 2025 Recitation No Recitation
Mon Apr 14, 2025 Lecture RAG3 and Embarrassingly Parallel Training / Slides (pdf) / Slides (pptx) / Recording
  • Parallel Context Windows, 2023
  • Block-Attention for Efficient RAG, 2024
  • Dynamic Block-Sparse Attention paper, 2025
  • Branch-train-merge paper
  • Branch-train-mix paper
Wed Apr 16, 2025 Lecture Embarrassingly Parallel Training 2 and Scalable Evaluation / Slides (pdf) / Slides (pptx) / Recording
  • Levy and Goldberg demystification of word2vec (1/2) 2014
  • Levy and Goldberg demystification of word2vec (2/2) 2014
  • Task arithmetic paper, 2023
  • Weight disentanglement and neural tangent kernel paper, 2023
  • TIES-merging paper, 2023
  • LoRA hub paper, 2024
  • PHATGOOSE paper, 2024
  • Information cascades, Ch 16, Easley and Kleinberg, 2010
  • Salganik, Dodds, Watts, Science 2006
  • Schelling Spatial Segregation model
  • Prediction Powered Inference, Angelopoulos 2023, Science
  • Bayesian PPI, Hofer et al 2024
  • Kamaloo et al, 2023
  • Fisch et al, 2024
Fri Apr 18, 2025 Recitation Recitation - Practice Exam
Mon Apr 21, 2025 Lecture Exam Review and QA / Slides (pdf) / Slides (pptx) / Recording
Miniproject due
Wed Apr 23, 2025 Lecture Exam 2