Is this a game... or is it real?

RL Roadmap - Updated 3 Month Plan


I'm a month into my initial Reinforcement Learning Roadmap and a couple of things have changed. First, I've been accepted to the University of Michigan-Dearborn to begin a Doctor of Engineering degree in January 2026. Second, what I learned in the first four weeks has convinced me that I need to focus more on the foundations of RL and less on other people's implementations. The Hugging Face course on RL is great, but it's higher level than I want to be at right now. This updated roadmap focuses on continued study of the basics and my own implementations of the algorithms, which I have started in a Python module I've dubbed learnrl.
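
To give a flavour of what learnrl looks like, here's a minimal sketch of an ε-greedy bandit agent with incremental sample-average updates, in the spirit of the Week 1 work. The real EpsilonGreedyBandit in the repo has more features and tests; the names and defaults here are illustrative only.

```python
import numpy as np


class EpsilonGreedyBandit:
    """Minimal epsilon-greedy agent for a k-armed bandit (illustrative sketch)."""

    def __init__(self, k: int, epsilon: float = 0.1, seed: int | None = None):
        self.k = k
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros(k)   # action-value estimates Q(a)
        self.n = np.zeros(k)   # action counts N(a)

    def select_action(self) -> int:
        # Explore with probability epsilon, otherwise exploit the greedy action.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.k))
        return int(np.argmax(self.q))

    def update(self, action: int, reward: float) -> None:
        # Incremental sample-average update: Q(a) <- Q(a) + (R - Q(a)) / N(a).
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = rng.normal(0, 1, size=10)       # hidden arm means
    agent = EpsilonGreedyBandit(k=10, epsilon=0.1, seed=0)
    for _ in range(1000):
        a = agent.select_action()
        r = rng.normal(true_means[a], 1.0)       # noisy reward from the chosen arm
        agent.update(a, r)
    print("estimated:", np.round(agent.q, 2))
    print("true:     ", np.round(true_means, 2))
```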

To prepare for the D.Eng., the updated plan removes all the Hugging Face courses and replaces the time I would have spent on them with reading research papers and a capstone in the final four weeks. The plan is:

  1. Complete the book and Coursera courses over the next 8 weeks
  2. Implement algorithms in learnrl which can be used for the Coursera capstone
  3. Begin research for the capstone in weeks 9-12
  4. Use implementations in the learnrl module for a healthcare or cybersecurity capstone

For the capstone, I'm currently leaning towards something related to Identity and Access Management (IAM), as that is a growing concern as we move more critical resources to the cloud. However, I'm also interested in autonomous networks and may look into something in that area. I'm going to have to see how things evolve as I continue to learn and dive deeper into research papers.

3-Month RL Foundations Roadmap (~10 hrs/week)

Cadence: ~10 hrs/week × 12 weeks (completing before the D.Eng. start in January)
Stack: Python, NumPy, PyTorch, Gymnasium, pytest, MLflow, Docker
Focus: Deep understanding of RL fundamentals through from-scratch implementations, applied to healthcare cybersecurity (adaptive IAM?)




Phase 1: Foundations (Weeks 1–4)

| Week | Core RL Learning (7 hrs) | Engineering/Implementation (3 hrs) |
| --- | --- | --- |
| 1 | C1 Fundamentals + C2 Sample-based Learning COMPLETE ✅; S&B Ch. 1–6 | Created learnrl/ package, EpsilonGreedyBandit, BanditTestEnvironment with comprehensive tests (UCB/Thompson Sampling TODO) ✅ |
| 2 | C3 M1–M3: Function approximation foundations, tile coding; S&B Ch. 9 | Implement Monte Carlo (prediction & control), TD(0) in learnrl/td/ with tests; PolicyIteration/ValueIteration |
| 3 | C3 M4: Control with approximation, semi-gradient methods; S&B Ch. 10 | Implement SARSA, Q-Learning, Expected SARSA in learnrl/td/; compare on-policy vs. off-policy on Cliff Walking (sketch below) |
| 4 | Off-policy learning with approximation, deadly triad; S&B Ch. 11 | Implement n-step TD, eligibility traces; implement importance sampling; complete td/ module test coverage |
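
As a concrete example of the Week 3 on-policy vs. off-policy comparison, here's a rough sketch of tabular Q-learning on Gymnasium's CliffWalking-v0. The learnrl/td/ implementations will be class-based with proper tests; the hyperparameters below are placeholders, and swapping the max over next-state actions for the value of the next ε-greedy action turns this into SARSA.

```python
import numpy as np
import gymnasium as gym

# Tabular Q-learning on Cliff Walking (off-policy TD control).
env = gym.make("CliffWalking-v0")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 1.0, 0.1     # placeholder hyperparameters
rng = np.random.default_rng(0)


def epsilon_greedy(state):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))


for episode in range(500):
    state, _ = env.reset(seed=episode)
    done = False
    while not done:
        action = epsilon_greedy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        # Q-learning target: bootstrap from the greedy action in the next state.
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        done = terminated or truncated

print("Greedy value of the start state:", np.max(Q[36]))  # state 36 is the start cell
```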

Math (2-3 hrs/week): 3Blue1Brown Linear Algebra + Khan Academy Probability basics


Phase 2: Coursera Capstone Project (Weeks 5–8)

| Week | Core RL Learning (7 hrs) | Engineering/Implementation (3 hrs) |
| --- | --- | --- |
| 5 | C4 M1+M2: Formalize problem as MDP, implement environment, apply 3 algorithms from learnrl.td and compare performance | Implement linear function approximation with tile coding in learnrl/function_approx/ (sketch below); add MLflow experiment tracking |
| 6 | C4 M3: Identify key parameters affecting agent performance; explore parameter space | Build automated parameter exploration framework; implement semi-gradient methods in function_approx/ |
| 7 | C4 M4: Implement Expected SARSA or Q-Learning with Neural Networks + RMSProp; verify correctness | Implement neural network function approximation in learnrl/function_approx/nn_fa.py with RMSProp optimizer |
| 8 | C4 M5: Parameter study with statistical analysis, visualize learned agents, complete C4 final submission | Complete function approximation module with full test coverage; prepare Healthcare IAM MDP formulation |
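
For the Week 5 tile-coding and linear function approximation work, this is roughly the shape I have in mind: a crude 1-D tile coder plus semi-gradient TD(0) prediction on a toy random walk. The eventual learnrl/function_approx/ code will need hashing and multi-dimensional states; the environment and constants here are simplified stand-ins.

```python
import numpy as np


def tile_features(x, low, high, num_tilings=8, tiles_per_tiling=8):
    """Crude 1-D tile coder: overlapping tilings over [low, high), each offset slightly."""
    features = np.zeros(num_tilings * tiles_per_tiling)
    tile_width = (high - low) / tiles_per_tiling
    for t in range(num_tilings):
        offset = t * tile_width / num_tilings
        idx = int((x - low + offset) // tile_width)
        idx = min(max(idx, 0), tiles_per_tiling - 1)   # clamp at the boundaries
        features[t * tiles_per_tiling + idx] = 1.0
    return features


def semi_gradient_td0(episodes, alpha=0.1, gamma=1.0, seed=0):
    """Semi-gradient TD(0): w <- w + alpha * (R + gamma*v(S',w) - v(S,w)) * x(S), v = w.x."""
    rng = np.random.default_rng(seed)
    w = np.zeros(8 * 8)                        # one weight per tile
    for _ in range(episodes):
        s = 0.5                                # toy random walk starts in the middle of [0, 1]
        while True:
            s_next = s + rng.uniform(-0.1, 0.1)
            terminal = s_next <= 0.0 or s_next >= 1.0
            reward = 1.0 if s_next >= 1.0 else 0.0   # reward 1 only for exiting on the right
            x = tile_features(s, 0.0, 1.0)
            v_next = 0.0 if terminal else w @ tile_features(s_next, 0.0, 1.0)
            w += (alpha / 8) * (reward + gamma * v_next - w @ x) * x   # alpha / num_tilings
            if terminal:
                break
            s = s_next
    return w


w = semi_gradient_td0(episodes=2000)
# Learned values should roughly track the probability of exiting on the right.
print([round(float(w @ tile_features(s, 0.0, 1.0)), 2) for s in (0.25, 0.5, 0.75)])
```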

Math (2-3 hrs/week): Hyperparameter optimization, RMSProp and adaptive learning rates, statistical comparison of algorithms, parameter sensitivity analysis
Healthcare/Cyber Reading: Function approximation in RL, hyperparameter tuning methodologies, begin healthcare authentication background reading
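
Week 7 calls for Expected SARSA or Q-learning with a neural network trained by RMSProp. Below is a minimal PyTorch sketch of the pieces I expect nn_fa.py to need; the architecture, state/action sizes, and learning rate are placeholders, not the course's reference solution.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small fully connected action-value network: state -> Q(s, a) for each action."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def td_update(q_net, optimizer, batch, gamma=0.99):
    """One semi-gradient Q-learning step on a batch of (s, a, r, s', done) tensors."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the greedy action in the next state (Q-learning target).
        target = rewards + gamma * q_net(next_states).max(dim=1).values * (1 - dones)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# RMSProp as the optimizer, per the C4 module; the learning rate is a placeholder.
q_net = QNetwork(state_dim=8, num_actions=4)
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-3)
```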


Phase 3: Healthcare IAM Research Capstone (Weeks 9–12)

Project: Intelligence Amplification for Dynamic Authorization in Healthcare IAM
Alberta Plan Alignment: Step 12 (Intelligence Amplification) - Real-world IA in a safety-critical domain

Detailed week-by-week plan available in ⚠️ TODO

| Week | Core Learning (3 hrs) | Capstone Implementation (7 hrs) |
| --- | --- | --- |
| 9 | C4 review and reflection; Alberta Plan paper; GVF foundations; Healthcare PBAC architecture; Safe RL and continual learning | MDP formulation for dynamic authorization with GVFs (skeleton sketched below), PBAC simulator, baseline policies (manual, role-tier, peer-based) |
| 10 | RLHF methodologies; Uncertainty estimation; Active learning; IA evaluation methods | RL+RLHF implementation using learnrl, GVF ensemble for uncertainty, ITSM escalation policy, continual learning |
| 11 | IA metrics and evaluation; Statistical testing for security; Temporal uniformity in practice | Evaluation framework with IA metrics (decision quality, cognitive load), safety analysis (HIPAA/PHIPA, privilege escalation), statistical comparison |
| 12 | Academic paper writing for IA/security; Alberta Plan contribution framing | Paper draft "Intelligence Amplification for Dynamic Authorization in Healthcare IAM" (Alberta Plan Step 12), demo interface, results visualization |
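
To make the Week 9 MDP formulation less abstract, here's a hypothetical Gymnasium environment skeleton for dynamic authorization. Every detail here is an assumption: the four placeholder state features, the allow/deny/step-up/escalate actions, and the toy reward. Designing the real state, action, and reward structure is the capstone work itself.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DynamicAuthEnv(gym.Env):
    """Hypothetical skeleton for a dynamic-authorization MDP (placeholder design)."""

    # Placeholder actions: 0 = allow, 1 = deny, 2 = require step-up auth, 3 = escalate to human
    ACTIONS = ("allow", "deny", "step_up", "escalate")

    def __init__(self, seed: int | None = None):
        super().__init__()
        # Placeholder features: role sensitivity, resource criticality, anomaly score, time-of-day
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        self.rng = np.random.default_rng(seed)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._request = self.rng.random(4).astype(np.float32)
        return self._request, {}

    def step(self, action):
        # Placeholder reward: penalize allowing high-anomaly requests and blanket denials;
        # charge a small friction cost for step-up/escalation. The real reward design,
        # GVF predictions, and safety constraints are the actual research questions.
        anomaly = float(self._request[2])
        if action == 0:            # allow
            reward = 1.0 - 2.0 * anomaly
        elif action == 1:          # deny
            reward = -0.5 + anomaly
        else:                      # step_up / escalate
            reward = -0.1
        self._request = self.rng.random(4).astype(np.float32)   # next access request
        return self._request, reward, False, False, {}
```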

Math (2-3 hrs/week): Statistics for A/B testing, confidence intervals, GVF prediction accuracy, IA metrics (decision quality, learning efficiency), statistical validation for security
Healthcare/Cyber Reading: Role mining and PBAC optimization, RLHF papers, Alberta Plan and GVF papers, IA case studies, healthcare IAM challenges, safe RL deployment


Applied Math Track (2-3 hrs/week)

Focus: Practical application rather than theoretical depth

Weeks 1-4: Foundations
  • 3Blue1Brown – Essence of Linear Algebra (focus on matrix operations, eigenvectors in practice)
  • Khan Academy probability exercises (focus on distributions, expectation)
  • StatQuest – Probability & Bayes

Weeks 5-8: Optimization & Statistics
  • Gradient descent variations and practical considerations (from-scratch sketch below)
  • Hyperparameter optimization methods
  • Statistical significance testing for ML
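
As an example of the "gradient descent variations" item, here's a small NumPy sketch of the RMSProp update rule (a running average of squared gradients scales the step), using the usual default constants. It's a study aid rather than anything learnrl will necessarily ship.

```python
import numpy as np


def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update: divide the step by the root of a running average of squared gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache


# Smoke test: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([3.0, -2.0])
cache = np.zeros_like(w)
for _ in range(2000):
    w, cache = rmsprop_step(w, 2 * w, cache, lr=0.01)
print(np.round(w, 3))   # should end up close to [0, 0]
```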

Weeks 9-12: Applied Statistics & Security
  • A/B testing methodology for RL experiments
  • Confidence intervals and error analysis
  • Sample efficiency metrics
  • Risk quantification and safety constraints in RL
  • Statistical validation for security applications

Milestone: By week 12, I should be able to:
  • Implement gradient-based optimization from scratch
  • Design statistically valid RL experiments with safety constraints (sketch below)
  • Explain mathematical concepts behind RL algorithms in engineering terms
  • Apply statistical methods to evaluate security/healthcare RL systems
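
For the "statistically valid RL experiments" milestone, this is the kind of comparison I have in mind: per-seed mean returns for two agents, a Welch's t-test, and a confidence interval on the difference. The eventual stats.py helpers may look different, and the returns below are made-up placeholders.

```python
import numpy as np
from scipy import stats


def compare_agents(returns_a, returns_b, confidence=0.95):
    """Compare per-seed mean returns: Welch's t-test plus a CI on the difference in means."""
    a, b = np.asarray(returns_a, float), np.asarray(returns_b, float)
    diff = a.mean() - b.mean()
    var_a, var_b = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom
    dof = se**4 / (var_a**2 / (len(a) - 1) + var_b**2 / (len(b) - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    _, p_value = stats.ttest_ind(a, b, equal_var=False)
    return {"mean_diff": diff,
            "ci": (diff - t_crit * se, diff + t_crit * se),
            "p_value": p_value}


# Placeholder numbers: mean return per seed for two hypothetical agents (10 seeds each).
rng = np.random.default_rng(0)
q_learning = rng.normal(-45, 5, size=10)
sarsa = rng.normal(-55, 5, size=10)
print(compare_agents(q_learning, sarsa))
```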


Healthcare/Cybersecurity Paper Reading (1-2 hrs/week)

Focus: RL applications in healthcare and cybersecurity domains

Weeks 1-4: Foundations & Survey Papers
  • Survey papers on RL in healthcare and cybersecurity
  • Case studies of ML/RL in clinical decision support
  • Overview of adaptive security systems

Weeks 5-8: Authentication & Access Control
  • Risk-based authentication systems
  • Adaptive access control and IAM
  • Behavioral biometrics and anomaly detection
  • Papers on safe RL and constrained optimization

Weeks 9-12: Alberta Plan + Healthcare IAM Specific
  • Alberta Plan paper (Sutton, Bowling, Pilarski 2023) - required reading in Week 9
  • GVF papers: Horde architecture (Sutton et al. 2011), reward-respecting subtasks
  • Intelligence Amplification: historical papers (Licklider 1960, Engelbart 1962), modern IA applications
  • RLHF papers: learning from human preferences (Christiano et al. 2017), reward modeling
  • Role mining and PBAC: automated role assignment, RBAC/ABAC optimization
  • Healthcare IAM challenges and HIPAA/PHIPA compliance
  • Case studies of RL deployment in security-critical systems
  • Evaluation methodologies for IA systems and security ML


Coursera Module Coverage

| Course / Module | Scheduled Week(s) |
| --- | --- |
| C1. Fundamentals of RL | **✅ COMPLETE** |
| M1 Welcome | |
| M2 Intro to Sequential Decision-Making (bandits) | |
| M3 Markov Decision Processes | |
| M4 Value Functions & Bellman Equations | |
| M5 Dynamic Programming | |
| C2. Sample-based Learning Methods | **✅ COMPLETE** |
| M1 Welcome | |
| M2 Monte Carlo (pred & control) | |
| M3 TD for Prediction | |
| M4 TD for Control (SARSA, Q-Learning) | |
| M5 Planning, Learning & Acting (Dyna) | |
| C3. Prediction & Control w/ Function Approximation | W2–W4 |
| M1 Welcome | W2 |
| M2 On-policy Prediction w/ Approx | W2 |
| M3 Constructing Features (tile coding) | W2 |
| M4 Control w/ Approx (semi-gradient methods) | W3 |
| M5 Policy Gradient (optional, skip for now) | Post-W12 if time |
| C4. Capstone | W5–W8 |
| M1: Formalize problem as MDP (Coursera's problem) | W5 |
| M2: Choose and compare algorithms | W5 |
| M3: Parameter identification and exploration | W6 |
| M4: Neural network implementation with RMSProp | W7 |
| M5: Parameter study and statistical analysis | W8 |
| Healthcare IAM Capstone (my research project) | W9–W12 |
| Apply C4 methodology to healthcare IAM problem | W9–W12 |

Portfolio & Publication Goals

Technical Portfolio (github.com/j-klawson/learnrl):
  • Clean, well-tested implementations of core RL algorithms from scratch
  • Comprehensive documentation explaining algorithmic decisions
  • Reproducible experiments with statistical analysis (MLflow sketch below)
  • Coursera C4 Capstone: Complete RL system applying algorithms to Coursera's problem (Weeks 5-8)
  • Healthcare IAM Capstone: Adaptive authentication for healthcare IAM (Weeks 9-12)
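
For the reproducible-experiments item, this is roughly how I expect MLflow tracking to be wired into learnrl runs. The experiment name, parameters, and returns below are placeholders.

```python
import mlflow

# Placeholder experiment name and values; real runs would log per-episode returns
# from learnrl agents so portfolio results can be reproduced and compared.
mlflow.set_experiment("learnrl-cliff-walking")

with mlflow.start_run(run_name="q_learning_baseline"):
    mlflow.log_params({"algorithm": "q_learning", "alpha": 0.5, "gamma": 1.0, "epsilon": 0.1})
    for episode, episode_return in enumerate([-110.0, -80.0, -45.0]):   # dummy returns
        mlflow.log_metric("episode_return", episode_return, step=episode)
    mlflow.log_metric("final_mean_return", -45.0)
```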

Publication Target:
  • Paper: "Safe Reinforcement Learning for Adaptive Authentication in Healthcare IAM Systems"
  • Venues: USENIX HealthSec Workshop, IEEE Security & Privacy Workshop, ACM CCS Workshop
  • Timeline: Draft by Week 12, submit Q1 2026

D.Eng. Preparation:
  • Strong foundations in bandits → MDPs → DP → TD → function approximation
  • Experience with neural network function approximation and RMSProp
  • Two capstone projects: practice (C4) + research (Healthcare IAM)
  • Understanding of safe RL and constrained optimization (critical for healthcare)
  • Real-world problem formulation experience
  • Publication-ready research demonstrating applied research capability

Repository Structure:

```
learnrl/
├── bandits/              # Week 1: k-armed bandits ✅
│   ├── epsilon_greedy.py
│   ├── ucb.py (TODO)
│   └── thompson_sampling.py (TODO)
├── dp/                   # Week 1-2: Dynamic programming ✅
│   ├── value_iteration.py
│   └── policy_iteration.py
├── td/                   # Week 2-4: Temporal difference learning
│   ├── monte_carlo.py
│   ├── td_zero.py
│   ├── sarsa.py
│   ├── q_learning.py
│   └── expected_sarsa.py
├── function_approx/      # Week 5-8: Function approximation
│   ├── tile_coding.py
│   ├── linear_fa.py
│   ├── semi_gradient.py
│   └── nn_fa.py          # Neural network FA with RMSProp
├── capstone/             # Week 9-12: Healthcare IAM
│   ├── auth_env.py       # Authentication environment
│   ├── policies.py       # Baseline and RL policies
│   ├── safety.py         # Safety constraints
│   └── evaluation.py     # Metrics and analysis
├── utils/
│   ├── bandit_env.py
│   ├── gridworld_env.py
│   └── stats.py          # Statistical testing
└── tests/                # 156 tests, 92% coverage ✅
```
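
To show the testing style behind that coverage number, here are two hypothetical pytest cases against the ε-greedy agent sketched earlier. The import path and assertions assume that sketch's interface, not necessarily the repo's actual API.

```python
import numpy as np
from learnrl.bandits.epsilon_greedy import EpsilonGreedyBandit  # assumed module path and interface


def test_greedy_when_epsilon_zero():
    # With epsilon = 0 the agent should always pick the arm with the highest estimate.
    agent = EpsilonGreedyBandit(k=3, epsilon=0.0)
    agent.update(action=1, reward=1.0)          # only arm 1 has a positive estimate
    assert all(agent.select_action() == 1 for _ in range(100))


def test_sample_average_converges_on_deterministic_rewards():
    # A constant reward should drive the sample-average estimate to that value.
    agent = EpsilonGreedyBandit(k=2, epsilon=0.1)
    for _ in range(50):
        agent.update(action=0, reward=2.0)       # arm 0 always pays 2.0
    assert np.isclose(agent.q[0], 2.0)
```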

Outcomes at Week 12

Technical Mastery:
  • Deep understanding of RL foundations (bandits → MDPs → DP → TD → function approximation)
  • From-scratch implementations demonstrating algorithmic understanding
  • Experience with safe RL and constrained optimization for critical systems
  • Statistical methodology for evaluating RL systems

Research Capability:
  • Formulated real-world problem as MDP (adaptive authentication)
  • Applied RL to safety-critical domain (healthcare IAM)
  • Paper draft ready for workshop submission
  • Experience with empirical evaluation and statistical validation

D.Eng. Readiness:
  • Strong theoretical foundations for advanced coursework
  • Practical experience applying RL to healthcare cybersecurity
  • Publication demonstrating research capability
  • Clear dissertation direction (safe RL for healthcare/security)

Portfolio Artifacts:
  • github.com/j-klawson/learnrl: Foundational RL implementations
  • Capstone project: Adaptive authentication system
  • Paper: "Safe RL for Adaptive Authentication in Healthcare IAM"
  • Comprehensive documentation and reproducible experiments


Resources

Reinforcement Learning

Engineering & Testing

Applied Math

Healthcare & Cybersecurity Applications

  • IEEE Security & Privacy - Security research journal
  • ACM CCS - Computer and communications security conference
  • Search terms for papers: "adaptive authentication", "risk-based access control", "reinforcement learning security", "safe reinforcement learning", "healthcare IAM"