Is this a game... or is it real?

RL Roadmap - Updated 3 Month Plan


I'm a month into my initial Reinforcement Learning Roadmap and a couple of things have changed. First, I've been accepted to the University of Michigan-Dearborn to begin a Doctor of Engineering degree in January 2026. Second, what I learned in the first four weeks has convinced me that I need to focus more on the foundations of RL and less on other people's implementations. The Hugging Face course on RL is great, but it's higher level than I want to be at right now. This updated roadmap focuses on continued study of the basics and my own implementations of the algorithms, which I have started in a Python module I've dubbed learnrl.
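
To give a flavour of what learnrl looks like, here's a minimal sketch of an ε-greedy bandit agent with incremental sample-average updates, in the spirit of the Week 1 work. The real EpsilonGreedyBandit in the repo has more features and tests; the names and defaults here are illustrative only.

```python
import numpy as np


class EpsilonGreedyBandit:
    """Minimal epsilon-greedy agent for a k-armed bandit (illustrative sketch)."""

    def __init__(self, k: int, epsilon: float = 0.1, seed: int | None = None):
        self.k = k
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros(k)   # action-value estimates Q(a)
        self.n = np.zeros(k)   # action counts N(a)

    def select_action(self) -> int:
        # Explore with probability epsilon, otherwise exploit the greedy action.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.k))
        return int(np.argmax(self.q))

    def update(self, action: int, reward: float) -> None:
        # Incremental sample-average update: Q(a) <- Q(a) + (R - Q(a)) / N(a).
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = rng.normal(0, 1, size=10)       # hidden arm means
    agent = EpsilonGreedyBandit(k=10, epsilon=0.1, seed=0)
    for _ in range(1000):
        a = agent.select_action()
        r = rng.normal(true_means[a], 1.0)       # noisy reward from the chosen arm
        agent.update(a, r)
    print("estimated:", np.round(agent.q, 2))
    print("true:     ", np.round(true_means, 2))
```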

To prepare for the D.Eng., the updated plan removes all the Hugging Face courses and replaces the time I would have spent on them with reading research papers and a capstone in the final four weeks. The plan is:

  1. Complete the book and Coursera courses over the next 8 weeks
  2. Implement algorithms in learnrl which can be used for the Coursera capstone
  3. Begin research for the capstone in weeks 9-12
  4. Use implementations in the learnrl module for a healthcare or cybersecurity capstone

For the capstone, I'm currently leaning towards something related to Identity and Access Management (IAM), as that is a growing concern as we move more critical resources to the cloud. However, I'm also interested in autonomous networks and may look into something in that area. I'm going to have to see how things evolve as I continue to learn and dive deeper into research papers.

3-Month RL Foundations Roadmap (~10 hrs/week)

Cadence: ~10 hrs/week × 12 weeks (completing before the D.Eng. start in January)
Stack: Python, NumPy, PyTorch, Gymnasium, pytest, MLflow, Docker
Focus: Deep understanding of RL fundamentals through from-scratch implementations, applied to healthcare cybersecurity (adaptive IAM?)




Phase 1: Foundations (Weeks 1–4)

| Week | Core RL Learning (7 hrs) | Engineering/Implementation (3 hrs) |
| --- | --- | --- |
| 1 | C1 Fundamentals + C2 Sample-based Learning COMPLETE ✅; S&B Ch. 1–6 | Created learnrl/ package, EpsilonGreedyBandit, BanditTestEnvironment with comprehensive tests (UCB/Thompson Sampling TODO) ✅ |
| 2 | C3 M1–M3: Function approximation foundations, tile coding; S&B Ch. 9 | Implement Monte Carlo (prediction & control), TD(0) in learnrl/td/ with tests; PolicyIteration/ValueIteration |
| 3 | C3 M4: Control with approximation, semi-gradient methods; S&B Ch. 10 | Implement SARSA, Q-Learning, Expected SARSA in learnrl/td/; compare on-policy vs. off-policy on Cliff Walking (sketch below) |
| 4 | Off-policy learning with approximation, deadly triad; S&B Ch. 11 | Implement n-step TD, eligibility traces; implement importance sampling; complete td/ module test coverage |
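
As a concrete example of the Week 3 on-policy vs. off-policy comparison, here's a rough sketch of tabular Q-learning on Gymnasium's CliffWalking-v0. The learnrl/td/ implementations will be class-based with proper tests; the hyperparameters below are placeholders, and swapping the max over next-state actions for the value of the next ε-greedy action turns this into SARSA.

```python
import numpy as np
import gymnasium as gym

# Tabular Q-learning on Cliff Walking (off-policy TD control).
env = gym.make("CliffWalking-v0")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 1.0, 0.1     # placeholder hyperparameters
rng = np.random.default_rng(0)


def epsilon_greedy(state):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))


for episode in range(500):
    state, _ = env.reset(seed=episode)
    done = False
    while not done:
        action = epsilon_greedy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        # Q-learning target: bootstrap from the greedy action in the next state.
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        done = terminated or truncated

print("Greedy value of the start state:", np.max(Q[36]))  # state 36 is the start cell
```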

Math (2-3 hrs/week): 3Blue1Brown Linear Algebra + Khan Academy Probability basics


Phase 2: Coursera Capstone Project (Weeks 5–8)

| Week | Core RL Learning (7 hrs) | Engineering/Implementation (3 hrs) |
| --- | --- | --- |
| 5 | C4 M1+M2: Formalize problem as MDP, implement environment, apply 3 algorithms from learnrl.td and compare performance | Implement linear function approximation with tile coding in learnrl/function_approx/ (sketch below); add MLflow experiment tracking |
| 6 | C4 M3: Identify key parameters affecting agent performance; explore parameter space | Build automated parameter exploration framework; implement semi-gradient methods in function_approx/ |
| 7 | C4 M4: Implement Expected SARSA or Q-Learning with Neural Networks + RMSProp; verify correctness | Implement neural network function approximation in learnrl/function_approx/nn_fa.py with RMSProp optimizer |
| 8 | C4 M5: Parameter study with statistical analysis, visualize learned agents, complete C4 final submission | Complete function approximation module with full test coverage; prepare Healthcare IAM MDP formulation |
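
For the Week 5 tile-coding and linear function approximation work, this is roughly the shape I have in mind: a crude 1-D tile coder plus semi-gradient TD(0) prediction on a toy random walk. The eventual learnrl/function_approx/ code will need hashing and multi-dimensional states; the environment and constants here are simplified stand-ins.

```python
import numpy as np


def tile_features(x, low, high, num_tilings=8, tiles_per_tiling=8):
    """Crude 1-D tile coder: overlapping tilings over [low, high), each offset slightly."""
    features = np.zeros(num_tilings * tiles_per_tiling)
    tile_width = (high - low) / tiles_per_tiling
    for t in range(num_tilings):
        offset = t * tile_width / num_tilings
        idx = int((x - low + offset) // tile_width)
        idx = min(max(idx, 0), tiles_per_tiling - 1)   # clamp at the boundaries
        features[t * tiles_per_tiling + idx] = 1.0
    return features


def semi_gradient_td0(episodes, alpha=0.1, gamma=1.0, seed=0):
    """Semi-gradient TD(0): w <- w + alpha * (R + gamma*v(S',w) - v(S,w)) * x(S), v = w.x."""
    rng = np.random.default_rng(seed)
    w = np.zeros(8 * 8)                        # one weight per tile
    for _ in range(episodes):
        s = 0.5                                # toy random walk starts in the middle of [0, 1]
        while True:
            s_next = s + rng.uniform(-0.1, 0.1)
            terminal = s_next <= 0.0 or s_next >= 1.0
            reward = 1.0 if s_next >= 1.0 else 0.0   # reward 1 only for exiting on the right
            x = tile_features(s, 0.0, 1.0)
            v_next = 0.0 if terminal else w @ tile_features(s_next, 0.0, 1.0)
            w += (alpha / 8) * (reward + gamma * v_next - w @ x) * x   # alpha / num_tilings
            if terminal:
                break
            s = s_next
    return w


w = semi_gradient_td0(episodes=2000)
# Learned values should roughly track the probability of exiting on the right.
print([round(float(w @ tile_features(s, 0.0, 1.0)), 2) for s in (0.25, 0.5, 0.75)])
```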

Math (2-3 hrs/week): Hyperparameter optimization, RMSProp and adaptive learning rates, statistical comparison of algorithms, parameter sensitivity analysis
Healthcare/Cyber Reading: Function approximation in RL, hyperparameter tuning methodologies, begin healthcare authentication background reading
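
Week 7 calls for Expected SARSA or Q-learning with a neural network trained by RMSProp. Below is a minimal PyTorch sketch of the pieces I expect nn_fa.py to need; the architecture, state/action sizes, and learning rate are placeholders, not the course's reference solution.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small fully connected action-value network: state -> Q(s, a) for each action."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def td_update(q_net, optimizer, batch, gamma=0.99):
    """One semi-gradient Q-learning step on a batch of (s, a, r, s', done) tensors."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the greedy action in the next state (Q-learning target).
        target = rewards + gamma * q_net(next_states).max(dim=1).values * (1 - dones)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# RMSProp as the optimizer, per the C4 module; the learning rate is a placeholder.
q_net = QNetwork(state_dim=8, num_actions=4)
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)
```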


Phase 3: Healthcare IAM Research Capstone (Weeks 9–12)

Project: Intelligence Amplification for Dynamic Authorization in Healthcare IAM
Alberta Plan Alignment: Step 12 (Intelligence Amplification) - Real-world IA in a safety-critical domain

Detailed week-by-week plan available in ⚠️ TODO

| Week | Core Learning (3 hrs) | Capstone Implementation (7 hrs) |
| --- | --- | --- |
| 9 | C4 review and reflection; Alberta Plan paper; GVF foundations; Healthcare PBAC architecture; Safe RL and continual learning | MDP formulation for dynamic authorization with GVFs (skeleton sketched below), PBAC simulator, baseline policies (manual, role-tier, peer-based) |
| 10 | RLHF methodologies; Uncertainty estimation; Active learning; IA evaluation methods | RL+RLHF implementation using learnrl, GVF ensemble for uncertainty, ITSM escalation policy, continual learning |
| 11 | IA metrics and evaluation; Statistical testing for security; Temporal uniformity in practice | Evaluation framework with IA metrics (decision quality, cognitive load), safety analysis (HIPAA/PHIPA, privilege escalation), statistical comparison |
| 12 | Academic paper writing for IA/security; Alberta Plan contribution framing | Paper draft "Intelligence Amplification for Dynamic Authorization in Healthcare IAM" (Alberta Plan Step 12), demo interface, results visualization |
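
To make the Week 9 MDP formulation less abstract, here's a hypothetical Gymnasium environment skeleton for dynamic authorization. Every detail here is an assumption: the four placeholder state features, the allow/deny/step-up/escalate actions, and the toy reward. Designing the real state, action, and reward structure is the capstone work itself.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DynamicAuthEnv(gym.Env):
    """Hypothetical skeleton for a dynamic-authorization MDP (placeholder design)."""

    # Placeholder actions: 0 = allow, 1 = deny, 2 = require step-up auth, 3 = escalate to human
    ACTIONS = ("allow", "deny", "step_up", "escalate")

    def __init__(self, seed: int | None = None):
        super().__init__()
        # Placeholder features: role sensitivity, resource criticality, anomaly score, time-of-day
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        self.rng = np.random.default_rng(seed)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._request = self.rng.random(4).astype(np.float32)
        return self._request, {}

    def step(self, action):
        # Placeholder reward: penalize allowing high-anomaly requests and blanket denials;
        # charge a small friction cost for step-up/escalation. The real reward design,
        # GVF predictions, and safety constraints are the actual research questions.
        anomaly = float(self._request[2])
        if action == 0:            # allow
            reward = 1.0 - 2.0 * anomaly
        elif action == 1:          # deny
            reward = -0.5 + anomaly
        else:                      # step_up / escalate
            reward = -0.1
        self._request = self.rng.random(4).astype(np.float32)   # next access request
        return self._request, reward, False, False, {}
```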

Math (2-3 hrs/week): Statistics for A/B testing, confidence intervals, GVF prediction accuracy, IA metrics (decision quality, learning efficiency), statistical validation for security
Healthcare/Cyber Reading: Role mining and PBAC optimization, RLHF papers, Alberta Plan and GVF papers, IA case studies, healthcare IAM challenges, safe RL deployment


Applied Math Track (2-3 hrs/week)

Focus: Practical application rather than theoretical depth

Weeks 1-4: Foundations
  • 3Blue1Brown – Essence of Linear Algebra (focus on matrix operations, eigenvectors in practice)
  • Khan Academy probability exercises (focus on distributions, expectation)
  • StatQuest – Probability & Bayes

Weeks 5-8: Optimization & Statistics
  • Gradient descent variations and practical considerations (from-scratch sketch below)
  • Hyperparameter optimization methods
  • Statistical significance testing for ML
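
As an example of the "gradient descent variations" item, here's a small NumPy sketch of the RMSProp update rule (a running average of squared gradients scales the step), using the usual default constants. It's a study aid rather than anything learnrl will necessarily ship.

```python
import numpy as np


def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update: divide the step by the root of a running average of squared gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache


# Smoke test: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([3.0, -2.0])
cache = np.zeros_like(w)
for _ in range(2000):
    w, cache = rmsprop_step(w, 2 * w, cache, lr=0.01)
print(np.round(w, 3))   # should end up close to [0, 0]
```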

Weeks 9-12: Applied Statistics & Security
  • A/B testing methodology for RL experiments
  • Confidence intervals and error analysis
  • Sample efficiency metrics
  • Risk quantification and safety constraints in RL
  • Statistical validation for security applications

Milestone: By week 12, I should be able to:
  • Implement gradient-based optimization from scratch
  • Design statistically valid RL experiments with safety constraints (sketch below)
  • Explain mathematical concepts behind RL algorithms in engineering terms
  • Apply statistical methods to evaluate security/healthcare RL systems
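
For the "statistically valid RL experiments" milestone, this is the kind of comparison I have in mind: per-seed mean returns for two agents, a Welch's t-test, and a confidence interval on the difference. The eventual stats.py helpers may look different, and the returns below are made-up placeholders.

```python
import numpy as np
from scipy import stats


def compare_agents(returns_a, returns_b, confidence=0.95):
    """Compare per-seed mean returns: Welch's t-test plus a CI on the difference in means."""
    a, b = np.asarray(returns_a, float), np.asarray(returns_b, float)
    diff = a.mean() - b.mean()
    var_a, var_b = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom
    dof = se**4 / (var_a**2 / (len(a) - 1) + var_b**2 / (len(b) - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    _, p_value = stats.ttest_ind(a, b, equal_var=False)
    return {"mean_diff": diff,
            "ci": (diff - t_crit * se, diff + t_crit * se),
            "p_value": p_value}


# Placeholder numbers: mean return per seed for two hypothetical agents (10 seeds each).
rng = np.random.default_rng(0)
q_learning = rng.normal(-45, 5, size=10)
sarsa = rng.normal(-55, 5, size=10)
print(compare_agents(q_learning, sarsa))
```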


Healthcare/Cybersecurity Paper Reading (1-2 hrs/week)

Focus: RL applications in healthcare and cybersecurity domains

Weeks 1-4: Foundations & Survey Papers
  • Survey papers on RL in healthcare and cybersecurity
  • Case studies of ML/RL in clinical decision support
  • Overview of adaptive security systems

Weeks 5-8: Authentication & Access Control
  • Risk-based authentication systems
  • Adaptive access control and IAM
  • Behavioral biometrics and anomaly detection
  • Papers on safe RL and constrained optimization

Weeks 9-12: Alberta Plan + Healthcare IAM Specific
  • Alberta Plan paper (Sutton, Bowling, Pilarski 2023) - required reading in Week 9
  • GVF papers: Horde architecture (Sutton et al. 2011), reward-respecting subtasks
  • Intelligence Amplification: historical papers (Licklider 1960, Engelbart 1962), modern IA applications
  • RLHF papers: learning from human preferences (Christiano et al. 2017), reward modeling
  • Role mining and PBAC: automated role assignment, RBAC/ABAC optimization
  • Healthcare IAM challenges and HIPAA/PHIPA compliance
  • Case studies of RL deployment in security-critical systems
  • Evaluation methodologies for IA systems and security ML


Coursera Module Coverage

| Course / Module | Scheduled Week(s) |
| --- | --- |
| C1. Fundamentals of RL | **✅ COMPLETE** |
| M1 Welcome | |
| M2 Intro to Sequential Decision-Making (bandits) | |
| M3 Markov Decision Processes | |
| M4 Value Functions & Bellman Equations | |
| M5 Dynamic Programming | |
| C2. Sample-based Learning Methods | **✅ COMPLETE** |
| M1 Welcome | |
| M2 Monte Carlo (pred & control) | |
| M3 TD for Prediction | |
| M4 TD for Control (SARSA, Q-Learning) | |
| M5 Planning, Learning & Acting (Dyna) | |
| C3. Prediction & Control w/ Function Approximation | W2–W4 |
| M1 Welcome | W2 |
| M2 On-policy Prediction w/ Approx | W2 |
| M3 Constructing Features (tile coding) | W2 |
| M4 Control w/ Approx (semi-gradient methods) | W3 |
| M5 Policy Gradient (optional, skip for now) | Post-W12 if time |
| C4. Capstone | W5–W8 |
| M1: Formalize problem as MDP (Coursera's problem) | W5 |
| M2: Choose and compare algorithms | W5 |
| M3: Parameter identification and exploration | W6 |
| M4: Neural network implementation with RMSProp | W7 |
| M5: Parameter study and statistical analysis | W8 |
| Healthcare IAM Capstone (my research project) | W9–W12 |
| Apply C4 methodology to healthcare IAM problem | W9–W12 |

Portfolio & Publication Goals

Technical Portfolio (github.com/j-klawson/learnrl):
  • Clean, well-tested implementations of core RL algorithms from scratch
  • Comprehensive documentation explaining algorithmic decisions
  • Reproducible experiments with statistical analysis (MLflow sketch below)
  • Coursera C4 Capstone: Complete RL system applying algorithms to Coursera's problem (Weeks 5-8)
  • Healthcare IAM Capstone: Adaptive authentication for healthcare IAM (Weeks 9-12)
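
For the reproducible-experiments item, this is roughly how I expect MLflow tracking to be wired into learnrl runs. The experiment name, parameters, and returns below are placeholders.

```python
import mlflow

# Placeholder experiment name and values; real runs would log per-episode returns
# from learnrl agents so portfolio results can be reproduced and compared.
mlflow.set_experiment("learnrl-cliff-walking")

with mlflow.start_run(run_name="q_learning_baseline"):
    mlflow.log_params({"algorithm": "q_learning", "alpha": 0.5, "gamma": 1.0, "epsilon": 0.1})
    for episode, episode_return in enumerate([-110.0, -80.0, -45.0]):   # dummy returns
        mlflow.log_metric("episode_return", episode_return, step=episode)
    mlflow.log_metric("final_mean_return", -45.0)
```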

Publication Target:
  • Paper: "Safe Reinforcement Learning for Adaptive Authentication in Healthcare IAM Systems"
  • Venues: USENIX HealthSec Workshop, IEEE Security & Privacy Workshop, ACM CCS Workshop
  • Timeline: Draft by Week 12, submit Q1 2026

D.Eng. Preparation:
  • Strong foundations in bandits → MDPs → DP → TD → function approximation
  • Experience with neural network function approximation and RMSProp
  • Two capstone projects: practice (C4) + research (Healthcare IAM)
  • Understanding of safe RL and constrained optimization (critical for healthcare)
  • Real-world problem formulation experience
  • Publication-ready research demonstrating applied research capability

Repository Structure:

```
learnrl/
├── bandits/              # Week 1: k-armed bandits ✅
│   ├── epsilon_greedy.py
│   ├── ucb.py (TODO)
│   └── thompson_sampling.py (TODO)
├── dp/                   # Week 1-2: Dynamic programming ✅
│   ├── value_iteration.py
│   └── policy_iteration.py
├── td/                   # Week 2-4: Temporal difference learning
│   ├── monte_carlo.py
│   ├── td_zero.py
│   ├── sarsa.py
│   ├── q_learning.py
│   └── expected_sarsa.py
├── function_approx/      # Week 5-8: Function approximation
│   ├── tile_coding.py
│   ├── linear_fa.py
│   ├── semi_gradient.py
│   └── nn_fa.py          # Neural network FA with RMSProp
├── capstone/             # Week 9-12: Healthcare IAM
│   ├── auth_env.py       # Authentication environment
│   ├── policies.py       # Baseline and RL policies
│   ├── safety.py         # Safety constraints
│   └── evaluation.py     # Metrics and analysis
├── utils/
│   ├── bandit_env.py
│   ├── gridworld_env.py
│   └── stats.py          # Statistical testing
└── tests/                # 156 tests, 92% coverage ✅
```
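
To show the testing style behind that coverage number, here are two hypothetical pytest cases against the ε-greedy agent sketched earlier. The import path and assertions assume that sketch's interface, not necessarily the repo's actual API.

```python
import numpy as np
from learnrl.bandits.epsilon_greedy import EpsilonGreedyBandit  # assumed module path and interface


def test_greedy_when_epsilon_zero():
    # With epsilon = 0 the agent should always pick the arm with the highest estimate.
    agent = EpsilonGreedyBandit(k=3, epsilon=0.0)
    agent.update(action=1, reward=1.0)          # only arm 1 has a positive estimate
    assert all(agent.select_action() == 1 for _ in range(100))


def test_sample_average_converges_on_deterministic_rewards():
    # A constant reward should drive the sample-average estimate to that value.
    agent = EpsilonGreedyBandit(k=2, epsilon=0.1)
    for _ in range(50):
        agent.update(action=0, reward=2.0)       # arm 0 always pays 2.0
    assert np.isclose(agent.q[0], 2.0)
```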

Outcomes at Week 12

Technical Mastery:
  • Deep understanding of RL foundations (bandits → MDPs → DP → TD → function approximation)
  • From-scratch implementations demonstrating algorithmic understanding
  • Experience with safe RL and constrained optimization for critical systems
  • Statistical methodology for evaluating RL systems

Research Capability:
  • Formulated real-world problem as MDP (adaptive authentication)
  • Applied RL to safety-critical domain (healthcare IAM)
  • Paper draft ready for workshop submission
  • Experience with empirical evaluation and statistical validation

D.Eng. Readiness:
  • Strong theoretical foundations for advanced coursework
  • Practical experience applying RL to healthcare cybersecurity
  • Publication demonstrating research capability
  • Clear dissertation direction (safe RL for healthcare/security)

Portfolio Artifacts:
  • github.com/j-klawson/learnrl: Foundational RL implementations
  • Capstone project: Adaptive authentication system
  • Paper: "Safe RL for Adaptive Authentication in Healthcare IAM"
  • Comprehensive documentation and reproducible experiments


Resources

Reinforcement Learning

Engineering & Testing

Applied Math

Healthcare & Cybersecurity Applications

  • IEEE Security & Privacy - Security research journal
  • ACM CCS - Computer and communications security conference
  • Search terms for papers: "adaptive authentication", "risk-based access control", "reinforcement learning security", "safe reinforcement learning", "healthcare IAM"