Is this a game... or is it real?

D.Eng. Preparation - 2 Month Plan


I've completed the University of Alberta Reinforcement Learning Specialization, which involved reading most of Reinforcement Learning: An Introduction. In my original and updated RL learning roadmaps I was aiming to actually start implementing something, and I did get a start on some PyTorch implementations of basic algorithms, but I ended up focusing on the Coursera courses and the textbook. Having completed those, it's pretty apparent that I need a deeper refresh of the math. It's been 22 years since I graduated from comp sci and I haven't been practicing math at all. It makes a lot more sense to focus on that weakness and start a literature review than to dive deeper into implementations right now.

Rather than launching into a capstone project, the next 8 weeks will focus on refreshing my mathematical foundations and beginning my literature review with Sutton's reading list for his RL approach to AI. This should position me well for the D.Eng. program starting in January 2026.

Overview

Duration: 8 weeks (November 2025 - December 2025)
Total Time Commitment: ~14 hours/week (~11-12 hours of math coursework plus 2-3 hours of paper reading)
Math Track: DeepLearning.AI Mathematics for Machine Learning and Data Science (3-course specialization, ~93 hours)
Literature Track: Sutton's 10 foundational papers spanning 35 years of RL research


8-Week Curriculum

Week-by-Week Breakdown

| Week | Math Course | Focus Area | Sutton's Readings |
|------|-------------|------------|-------------------|
| 1 | Linear Algebra M1-M2 (10 hrs) | Foundations: vectors, matrices, value functions | [1] S&B textbook (review) + [2] TD learning (Sutton 1988) |
| 2 | Linear Algebra M3-M4 (11 hrs) | Linear transformations, temporal abstraction | [3] IDBD/gradient descent (Sutton 1992) + [4] Options/temporal abstraction (Sutton et al 1999) |
| 3 | Linear Algebra M5 (11 hrs) | Eigenvalues, dimensionality reduction, GVFs | [5] Horde/GVF architecture (Sutton et al 2011) + [6] Alberta Plan (Sutton et al 2023) |
| 4 | Calculus M1-M2 (13 hrs) | Derivatives, gradients, optimization fundamentals | [10] Dyna-style planning (Sutton et al 2008) |
| 5 | Calculus M3 (13 hrs) | Gradient descent in neural networks, modern perspective | [8] Gaps in planning video (Sutton 2021) + [9] Model-based RL (Sutton 2020) |
| 6 | Probability & Statistics M1-M2 (11 hrs) | Distributions, hypothesis testing, reward estimation | [7] Reward-respecting subtasks (Sutton et al 2023) |
| 7-8 | Probability & Statistics M3-M5 (22 hrs) | Confidence intervals, Bayesian methods, comprehensive synthesis | Synthesis: integrate all 10 readings into a unified research framework |

Total: ~93 hours math courses + 20-25 hours paper reading = 113-118 hours over 8 weeks


Sutton's 10 Foundational Readings

This reading list comes directly from Sutton's recommendations for understanding his RL approach to AI (Oct 2023). The list below follows his recommended order and includes his commentary on why each reading matters.

[1] Reinforcement Learning: An Introduction

  • Sutton & Barto (2018), 2nd Edition
  • Status: I've already read most of this; will review chapters as needed
  • Why it matters (Sutton's perspective): "This is the best starting point and reference. It is also the best reference for deeply understanding the reasons behind the core algorithms."
  • Key concepts: Discrete states, tabular agents, function approximation, basic planning (Chapter 8)
  • Review during: Weeks 1-8 as needed

[2] Learning to Predict by the Methods of Temporal Differences

  • Sutton (1988)
  • Why it matters: Introduces TD(λ), "the one algorithm that I feel we all must understand completely"
  • Sutton's note: "Of course TD(λ) is also presented in [1], but it is spread across Chapters 6 and 12, and besides, sometimes going back in time is a fun way to learn"
  • Related resource: Sutton provides a video on TD learning (included in that document)
  • Read during: Week 1; establish foundational understanding of temporal difference learning
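
To make the TD idea concrete before (re)reading the paper, here's a minimal tabular sketch of the TD(λ) update with accumulating eligibility traces. This is my own NumPy illustration, not code from the paper; the paper itself develops TD(λ) for linear function approximation.

import numpy as np

def td_lambda_update(V, z, s, r, s_next, alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular TD(lambda) update with accumulating eligibility traces."""
    delta = r + gamma * V[s_next] - V[s]   # TD error for the observed transition
    z *= gamma * lam                        # decay all eligibility traces
    z[s] += 1.0                             # mark the current state as eligible
    V += alpha * delta * z                  # push credit back to recently visited states
    return V, z

# lam=0 recovers one-step TD(0); lam=1 approaches a Monte Carlo-style update.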

[3] Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta

  • Sutton (1992)
  • Why it matters: "The key IDBD paper on step-size optimization" - essential content not in the RL textbook
  • Significance: Precursor to modern adaptive learning rate methods
  • Read during: Week 2; pair with understanding of gradient descent
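
As a memory aid, here is my rough NumPy sketch of the IDBD update for a linear predictor, reconstructed from how I understand the paper (per-weight step sizes alpha_i = exp(beta_i), a meta step size theta, and a trace h of recent weight updates). Treat it as an approximation to check against the paper, not a faithful implementation.

import numpy as np

def idbd_update(w, beta, h, x, y, theta=0.01):
    """One IDBD step for a linear predictor y_hat = w @ x (illustrative sketch)."""
    delta = y - w @ x                                  # prediction error
    beta += theta * delta * x * h                      # meta-learn each log step size
    alpha = np.exp(beta)                               # per-weight step sizes
    w += alpha * delta * x                             # LMS-style weight update
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, beta, h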

[4] Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

  • Sutton, Precup, & Singh (1999)
  • Why it matters: "The options paper on temporal abstraction" - essential content not in the RL textbook
  • Foundation for: Hierarchical RL, multi-timescale decision-making, GVFs
  • Read during: Week 2; connects to linear transformations and eigenvalue concepts
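
The framework's core object is simple to state: an option is a triple of an initiation set, an internal policy, and a termination condition. A tiny illustrative encoding (my own hypothetical gridworld example, not from the paper):

from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation: Set[int]                  # states where the option may be invoked
    policy: Callable[[int], int]          # state -> primitive action while the option runs
    termination: Callable[[int], float]   # beta(s): probability of stopping in state s

# Hypothetical "walk right to the doorway" option in a small gridworld
doorway = Option(
    initiation={0, 1, 2},
    policy=lambda s: 1,                               # action 1 = move right (placeholder)
    termination=lambda s: 1.0 if s == 3 else 0.0,     # stop once the doorway is reached
)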

[5] Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

  • Sutton, Modayil, Delp, Degris, Pilarski, White, & Precup (2011)
  • Why it matters: "The best first paper on GVFs" - essential content not in the RL textbook
  • Significance: Demonstrates GVF architecture applied to real robotics
  • Key concept: Generalized Value Functions as the core architecture
  • Read during: Week 3; after learning about eigenvalues and dimensionality reduction
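
The core idea is that one stream of sensorimotor experience can train many predictions in parallel, each a general value function with its own cumulant and termination. A simplified on-policy sketch of that idea (Horde itself uses off-policy GTD(λ) learners, and the sensor names here are made up):

import numpy as np

class GVF:
    """One general value function: predicts a discounted sum of a cumulant signal."""
    def __init__(self, n_features, cumulant, gamma):
        self.w = np.zeros(n_features)
        self.cumulant = cumulant          # function of the latest observation
        self.gamma = gamma                # per-prediction discount/termination

    def update(self, x, x_next, obs_next, alpha=0.1):
        c = self.cumulant(obs_next)
        delta = c + self.gamma * self.w @ x_next - self.w @ x
        self.w += alpha * delta * x

# A "horde" is just many such predictors fed the same experience stream.
horde = [
    GVF(8, cumulant=lambda obs: obs["light"], gamma=0.9),        # "how much light soon?"
    GVF(8, cumulant=lambda obs: float(obs["bump"]), gamma=0.0),  # "will I bump next step?"
]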

[6] The Alberta Plan for AI Research

  • Sutton, Bowling, & Pilarski (2023)
  • Why it matters: "After [1-5] this plan should make sense to you"
  • Scope: 50-year research roadmap unifying all previous work toward intelligent agents
  • Relevance to my plan: Step 12 (Intelligence Amplification) directly addresses my capstone direction
  • Sutton's context: "The one missing bit from [1-5] is advanced planning"
  • Read during: Week 3 (initial read) and Weeks 7-8 (deep re-read with full math foundation)

[7] Reward-Respecting Subtasks for Model-Based Reinforcement Learning

  • Sutton, Machado, Holland, Timbers, Tanner, & White (2023)
  • Why it matters: Part of Sutton's discussion of planning; "comes close to being a complete tabular-ish prototype AI"
  • Focus: Constructing task hierarchies that respect reward structure
  • Read during: Week 6; pairs with probability/statistics understanding of reward functions

[8] Gaps in the Foundations of Planning with Approximation (video)

  • Sutton (2021)
  • Why it matters: "On planning I recommend the video [8]"
  • Content: Identifies open problems and foundational issues in planning with function approximation
  • Format: Video (easier engagement than dense papers)
  • Watch during: Week 5; contextualizes current state of the field

[9] Toward a New Approach to Model-based Reinforcement Learning

  • Sutton (2020)
  • Why it matters: Sutton's unpublished perspective; "if you want more, consider the unpublished paper [9]"
  • Focus: Vision for next-generation model-based RL beyond current approaches
  • Read during: Week 5; complements video [8] on planning foundations

[10] Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

  • Sutton, Szepesvári, Geramifard, & Bowling (2008)
  • Why it matters: "The Linear Dyna paper [10]" - practical integration of planning and learning
  • Focus: How to combine model-based planning with function approximation
  • Read during: Week 4; pairs with calculus understanding of optimization and gradient descent
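
To give the flavor: the agent learns a linear model (F for expected next features, b for expected reward) and then runs TD-like planning updates on hypothetical feature vectors. A simplified sketch of one planning step as I understand it, omitting the prioritized sweeping machinery:

import numpy as np

def linear_dyna_planning_step(theta, F, b, x, alpha=0.1, gamma=0.99):
    """One model-based planning update on a hypothetical feature vector x (sketch).

    Model: expected next features ~= F @ x, expected reward ~= b @ x.
    """
    delta = b @ x + gamma * theta @ (F @ x) - theta @ x   # model-generated TD error
    return theta + alpha * delta * x                      # update the value weights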

Course Details

Course 1: Linear Algebra for Machine Learning and Data Science

Duration: 34 hours (Weeks 1-3, ~11 hrs/week)

Key Topics:
  • Vectors and matrices: representing data, properties (rank, singularity, linear independence)
  • Matrix operations: dot product, inverse, determinants
  • Linear transformations and their interpretation
  • Eigenvalues and eigenvectors (essential for dimensionality reduction and GVFs)
  • Principal Component Analysis (PCA)
  • Applications to machine learning problems

Why this matters for Sutton papers:
  • Value functions are vectors in state space
  • Policy iteration uses matrix operations on transition dynamics
  • Eigenvalues relate to stationary distributions and convergence
  • GVFs and Horde architecture use projections (linear transformations)
  • PCA provides intuition for dimensionality reduction in feature learning
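
For example, the stationary distribution of a Markov chain lives in the eigenstructure of its transition matrix, which is exactly the kind of connection this course makes concrete for RL. A quick NumPy check on a toy 3-state chain (numbers made up):

import numpy as np

# Transition matrix for a 3-state Markov chain (rows sum to 1)
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

eigvals, eigvecs = np.linalg.eig(P.T)    # left eigenvectors of P
i = np.argmin(np.abs(eigvals - 1.0))     # eigenvalue closest to 1
stationary = np.real(eigvecs[:, i])
stationary /= stationary.sum()           # normalize to a probability distribution
print(stationary)                        # long-run state occupancy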

Course 2: Calculus for Machine Learning and Data Science

Duration: 26 hours (Weeks 4-5, ~13 hrs/week)

Key Topics:
  • Derivatives and gradients of functions
  • Analytic and approximate optimization
  • Chain rule and backpropagation
  • Gradient descent and variants
  • Loss functions and neural network training

Why this matters for Sutton papers:
  • TD learning uses gradient descent on value function approximation
  • Semi-gradient methods (C3 course review + Dyna paper)
  • Optimization of planning computations
  • Neural network function approximation (modern RL systems)
  • Understanding convergence properties
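
The same machinery shows up everywhere from semi-gradient TD to neural network training. A toy example (my own illustration) of minimizing a mean squared error with a hand-computed gradient:

import numpy as np

# Toy linear regression by gradient descent: the same chain-rule machinery,
# just on a simpler loss than a value function or a neural network.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
alpha = 0.05
for _ in range(500):
    error = X @ w - y
    grad = X.T @ error / len(y)   # gradient of 0.5 * mean squared error
    w -= alpha * grad             # descend the gradient
print(w)                          # should land close to true_w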

Course 3: Probability & Statistics for Machine Learning & Data Science

Duration: 33 hours (Weeks 6-8, ~11 hrs/week)

Key Topics:
  • Probability distributions and their properties
  • Hypothesis testing and confidence intervals
  • Bayesian inference and Bayesian statistics
  • Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP)
  • A/B testing and statistical comparison methods
  • Exploratory Data Analysis

Why this matters for Sutton papers:
  • Reward estimation as probabilistic inference
  • Uncertainty in value function approximation
  • Bayesian methods for exploration-exploitation
  • Statistical validation of RL agent improvements
  • Healthcare/security domain requires rigorous statistical methodology
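
As a small example of the statistical footing this gives: estimating a policy's mean reward with a normal-approximation 95% confidence interval (simulated data, numbers made up):

import numpy as np

rng = np.random.default_rng(1)
rewards = rng.normal(loc=0.7, scale=1.0, size=200)   # simulated returns from one policy

mean = rewards.mean()
sem = rewards.std(ddof=1) / np.sqrt(len(rewards))    # standard error of the mean
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean reward {mean:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")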


Reading Strategy for Sutton Papers

I have a small git-based workflow set up to manage the papers and build my literature review. I'm hoping to do a separate blog post on that workflow and publish the repo. The workflow goes like this:

Workflow Overview

  1. Add papers to Zotero - Store all 10 Sutton papers in "D.Eng Literature Review" collection
  2. Create structured notes - Use make note TITLE="..." (CLI interface) to scaffold notes from the template
  3. Fill note template - Problem, Method, Findings, Limitations, Relevance, Replication
  4. Update syntheses - Biweekly, synthesize findings into thematic documents
  5. Build documents - Monthly, generate PDF/DOCX with full citations

Setup (One-time)

In Zotero:

  1. Create a collection: "D.Eng Literature Review"
  2. Install Better BibTeX
  3. Right-click collection → Export → Better BibLaTeX
  4. Check ☑ "Keep updated"
  5. Export to: /Users/keith/code/litreview/bib/library.bib

In litreview repo:

cd litreview
make install        # Install dependencies
make sync-zotero    # Verify Better BibTeX setup

Weekly: Read Papers & Create Notes

For each paper:

cd litreview
make note TITLE="Sutton & Barto (2018) - Reinforcement Learning: An Introduction"
# Creates: notes/sutton-barto-2018-reinforcement-learning-an-introduction.md

Fill in the structured template:

  • Problem: What research question does this paper address?
  • Method: What approach/algorithms are used?
  • Findings: Key results and conclusions
  • Limitations: Acknowledged weaknesses or open questions
  • Relevance to D.Eng: How does this connect to research direction? (Intelligence Amplification, healthcare IAM)
  • Replication: Could I implement this? Any code references?

Example note structure:

# Sutton & Barto (2018) - Reinforcement Learning: An Introduction

## Problem
Comprehensive foundation for RL: how do agents learn from interaction?

## Method
- Value functions and Bellman equations
- Dynamic programming approaches
- Temporal difference learning
- Function approximation

## Findings
...

## Limitations
...

## Relevance to D.Eng
- Core reference for entire capstone
- Establishes foundational concepts for Alberta Plan
- TD learning basis for GVF architecture

## Replication
- Implementation in learnrl/ package
- Focus on understanding value function geometry

Biweekly: Synthesis & Integration

Update synthesis documents that integrate papers thematically:

# Edit main synthesis
vim litreview/syntheses/literature_review.md

# Or create theme-specific syntheses
vim litreview/syntheses/sutton-research-arc.md

Key themes to synthesize:

  1. Foundations (1988-1992): TD learning → gradient descent → incremental learning
  2. Temporal Abstraction (1999-2008): Options → Dyna → hierarchical RL
  3. GVF Architecture (2011): Horde as realization of abstract ideas
  4. Modern Vision (2020-2023): Gaps, model-based RL, reward-respecting subtasks
  5. Alberta Plan (2023): Integration and 50-year roadmap

Link to notes in syntheses:

Sutton's foundational [TD learning paper](../notes/sutton-1988.md) established
the core learning mechanism used throughout all subsequent work.

Monthly: Build & Commit

Generate formatted documents with citations:

cd litreview
make build
# Creates:
# - docs/literature_review.pdf
# - docs/literature_review.docx

Commit work:

git add notes/ syntheses/ bib/
git commit -m "Add notes for Sutton papers weeks 1-3"

Output Structure

Notes and syntheses will be organized in the litreview repo:

/Users/keith/code/litreview/
├── notes/                        # Individual paper notes
│   ├── sutton-barto-2018.md
│   ├── sutton-1988-td-learning.md
│   ├── sutton-1992-gradient-descent.md
│   ├── sutton-precup-singh-1999-options.md
│   ├── sutton-modayil-2011-horde.md
│   ├── sutton-szepesvari-2008-dyna.md
│   ├── sutton-2020-model-based-rl.md
│   ├── sutton-2021-gaps-video.md
│   ├── sutton-machado-2023-reward-respecting.md
│   └── sutton-bowling-pilarski-2023-alberta-plan.md
├── syntheses/
│   ├── literature_review.md      # Master document
│   └── sutton-research-arc.md    # Thematic integration (Weeks 7-8)
├── bib/
│   └── library.bib               # Auto-exported from Zotero
└── docs/
    ├── literature_review.pdf     # Generated monthly
    └── literature_review.docx    # Generated monthly

Synthesis & Outcomes

Weeks 7-8: Integration Document

By the end of Week 6 I should have detailed notes on all 10 papers. Weeks 7-8 focus on creating a comprehensive synthesis document that shows:

  1. Historical arc: How each paper built on previous work
     • 1988-1992: TD learning foundations
     • 1999-2011: Temporal abstraction (Options → GVFs)
     • 2008: Reconciling planning with function approximation
     • 2020-2023: Modern perspective and open problems
     • 2023: Alberta Plan as unifying vision

  2. Key concepts across papers:
     • Value functions and their approximation
     • Temporal abstraction and hierarchical learning
     • Generalized Value Functions (GVFs) as core architecture
     • Learning from unsupervised interaction
     • Planning and model-based learning reconciliation

  3. Mathematical foundations:
     • Bellman equations and their operator properties
     • Gradient descent in non-stationary environments
     • Eigenstructure of transition matrices
     • Bayesian interpretation of learning

  4. Path to Intelligence Amplification (Alberta Plan Step 12):
     • How do GVFs provide explainability?
     • How does reward-respecting structure support safe RL?
     • How does continual learning connect to non-stationary healthcare IAM?

Outcomes at Week 8

Mathematical Mastery:
  • Fluency with linear algebra (vectors, matrices, transformations, eigenvalues)
  • Strong calculus foundation (gradients, optimization, chain rule)
  • Rigorous statistics and probability for ML applications
  • Hands-on Python implementations in all three courses

Research Foundation:
  • Deep understanding of Sutton's 40-year research program
  • Clear narrative: TD learning → temporal abstraction → GVFs → Alberta Plan
  • Recognition of how classical RL concepts scale to modern systems
  • Appreciation for open problems in planning + approximation

D.Eng. Readiness:
  • Mathematical sophistication for doctoral coursework
  • Comprehensive literature foundation in core RL concepts
  • Research vision grounded in Alberta Plan framework
  • Prepared to define dissertation research direction in Intelligence Amplification

Artifacts:
  • Three Coursera specialization certificates
  • Detailed notes on 10 seminal papers
  • Integration document showing Sutton's research program
  • Clear understanding of how mathematics supports RL theory


Next Steps (After Week 8)

  • January 2026: D.Eng. program begins
  • Spring 2026: Capstone project?
  • Year 1 D.Eng.: Determine research topic; prepare dissertation proposal
  • Years 2-3 D.Eng.: Dissertation research

Resources

Courses

The math track is the DeepLearning.AI Mathematics for Machine Learning and Data Science specialization on Coursera:

  • Course 1: Linear Algebra for Machine Learning and Data Science
  • Course 2: Calculus for Machine Learning and Data Science
  • Course 3: Probability & Statistics for Machine Learning & Data Science

Sutton's 10 Foundational Readings (Full Citations)

From Sutton's Reading List for His RL Approach to AI (Oct 2023)

  • [1] Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. http://incompleteideas.net/book/the-book-2nd.html
  • Note: Sutton recommends the linked online version, which is slightly more up to date than the printed version

  • [2] Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9-44.

  • Note: Sutton also recommends a video on TD learning; available in his reading list document

  • [3] Sutton, R. S. (1992). Adapting bias by gradient descent: An incremental version of delta-bar-delta. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92).

  • [4] Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181-211.

  • [5] Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., & Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011).

  • [6] Sutton, R. S., Bowling, M., & Pilarski, P. M. (2023). The Alberta plan for AI research. arXiv preprint arXiv:2208.11173.

  • [7] Sutton, R. S., Machado, M. C., Holland, G. Z., Timbers, F., Tanner, B., & White, A. (2023). Reward-respecting subtasks for model-based reinforcement learning. In Learning for Dynamics and Control Conference.

  • [8] Sutton, R. S. (2021). Gaps in the foundations of planning with approximation. Video lecture.

  • [9] Sutton, R. S. (2020). Toward a new approach to model-based reinforcement learning. Unpublished manuscript. DeepMind Alberta.

  • [10] Sutton, R. S., Szepesvári, C., Geramifard, A., & Bowling, M. (2008). Dyna-style planning with linear function approximation and prioritized sweeping. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI 2008).