Roadmap: Math CMS — Difficulty Calibration & Intelligent Exam

Overview

This roadmap transforms a difficulty calibration system that currently runs in isolation into one that drives intelligent exam assembly. The work follows a strict gate-based progression: first, validate that the calibration algorithm actually predicts outcomes and fix the dual-scale data bug; second, wire the validated calibration into the production assembly pipeline; third, build mastery-based adaptive matching on top; and finally, add longitudinal health monitoring to catch drift before it harms students.

Phases

Phase Numbering:

  • Integer phases (1, 2, 3, 4): Planned milestone work
  • Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

  • Phase 1: Validation & Data Audit - Verify calibration accuracy with temporal backtesting, fix dual-scale bug, audit coverage
  • Phase 2: Assembly Integration - Wire validated calibrated difficulty into all exam assembly paths with fallback
  • Phase 3: Adaptive Matching - Map student mastery to optimal difficulty category for zone-of-proximal-development targeting
  • Phase 4: Health Monitoring - Detect calibration drift over time with actionable alerts

Phase Details

Phase 1: Validation & Data Audit

Goal: The calibration algorithm is verified against held-out historical data, the difficulty scale is unified to 0-1, and coverage gaps are visible, forming a PASS/FAIL gate that must open before any production wiring
Depends on: Nothing (first phase)
Requirements: VAL-01, VAL-02, VAL-03, VAL-04, VAL-05
Success Criteria (what must be TRUE):

  1. Running a walk-forward backtest on historical answer data produces a PASS or FAIL verdict (not an indeterminate result)
  2. The backtest report shows Brier Skill Score, Pearson correlation, and calibration-vs-actual error rate metrics in a readable format
  3. If the backtest FAILs (metrics below threshold), no calibrated values can enter the production assembly pipeline
  4. All questions use a single unified 0-1 difficulty scale with no 0-5 scale values mixed in
  5. A coverage report shows what fraction of questions per knowledge point have sufficient calibration samples

Plans: 2 plans
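
The gate described in success criteria 1-3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the per-question error rate stands in for the real calibration algorithm, and the names `Answer`, `walk_forward_backtest`, `min_bss`, and `min_r`, along with both threshold values, are hypothetical. Each fold calibrates on answers before a cutoff and scores the following window. A run with no usable folds returns FAIL rather than an indeterminate result.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass(frozen=True)
class Answer:
    question_id: int
    timestamp: int   # any increasing time value (hypothetical field)
    correct: bool


def error_rates(answers):
    """Per-question error rate on the 0-1 scale (stand-in for calibration)."""
    totals, wrong = {}, {}
    for a in answers:
        totals[a.question_id] = totals.get(a.question_id, 0) + 1
        wrong[a.question_id] = wrong.get(a.question_id, 0) + (not a.correct)
    return {q: wrong[q] / totals[q] for q in totals}


def brier_skill_score(predicted, outcomes, base_rate):
    """1 - BS_model / BS_reference; above 0 beats always predicting base_rate."""
    bs_model = mean((p - o) ** 2 for p, o in zip(predicted, outcomes))
    bs_ref = mean((base_rate - o) ** 2 for o in outcomes)
    return 1.0 - bs_model / bs_ref if bs_ref > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def walk_forward_backtest(answers, cutoffs, min_bss=0.05, min_r=0.3):
    """Calibrate before each cutoff, score the next window, return a verdict."""
    bounds = sorted(cutoffs) + [float("inf")]
    bss_scores, r_scores = [], []
    for start, end in zip(bounds, bounds[1:]):
        train = [a for a in answers if a.timestamp < start]
        test = [a for a in answers if start <= a.timestamp < end]
        calibrated = error_rates(train)
        scored = [a for a in test if a.question_id in calibrated]
        if not scored:
            continue  # nothing to evaluate in this fold
        base_rate = mean(float(a.correct) for a in train)
        predicted = [1.0 - calibrated[a.question_id] for a in scored]
        outcomes = [float(a.correct) for a in scored]
        bss_scores.append(brier_skill_score(predicted, outcomes, base_rate))
        actual = error_rates(scored)
        qids = sorted(actual)
        r_scores.append(pearson([calibrated[q] for q in qids],
                                [actual[q] for q in qids]))
    if not bss_scores:
        # No usable folds: fail closed instead of reporting "indeterminate".
        return {"verdict": "FAIL", "folds": 0,
                "brier_skill_score": None, "pearson_r": None}
    bss, r = mean(bss_scores), mean(r_scores)
    verdict = "PASS" if bss >= min_bss and r >= min_r else "FAIL"
    return {"verdict": verdict, "folds": len(bss_scores),
            "brier_skill_score": bss, "pearson_r": r}
```

Failing closed on empty or unusable data is a deliberate choice in this sketch: it keeps the gate binary, so anything short of a demonstrated PASS blocks calibrated values from production.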

Plans:

  • 01-01-PLAN.md — Walk-forward backtest service with PASS/FAIL gate and Brier/Pearson metrics
  • 01-02-PLAN.md — Difficulty scale audit and calibration coverage report per knowledge point
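
The scale audit and coverage report could look roughly like the Python sketch below. It assumes the legacy values are on a 0-5 scale and that any stored value above 1 belongs to it. A legacy value of 1 or less cannot be told apart from a 0-1 value by range alone, so a real audit would need additional provenance data. The function names, the `min_samples` default of 30, and the input shapes (plain dictionaries keyed by question id) are all hypothetical.

```python
from collections import defaultdict


def find_out_of_scale(difficulties):
    """Return {question_id: value} for values outside the unified 0-1 scale."""
    return {qid: d for qid, d in difficulties.items() if not 0.0 <= d <= 1.0}


def normalize_difficulty(value, legacy_max=5.0):
    """Map a stored difficulty onto 0-1, assuming values above 1 are legacy 0-5."""
    if 0.0 <= value <= 1.0:
        return value
    if 1.0 < value <= legacy_max:
        return value / legacy_max
    raise ValueError(f"difficulty {value!r} fits neither scale")


def coverage_by_knowledge_point(question_kp, sample_counts, min_samples=30):
    """Fraction of questions per knowledge point with enough calibration samples."""
    totals, covered = defaultdict(int), defaultdict(int)
    for qid, kp in question_kp.items():
        totals[kp] += 1
        if sample_counts.get(qid, 0) >= min_samples:
            covered[kp] += 1
    return {kp: covered[kp] / totals[kp] for kp in totals}
```

For example, `coverage_by_knowledge_point({1: "fractions", 2: "fractions"}, {1: 45})` reports 0.5 for `"fractions"`, since only one of its two questions reaches the sample threshold.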

Phase 2: Assembly Integration

Goal: All exam assembly paths use calibrated difficulty values when available, fall back gracefully to original values when not, and the difficulty distribution strategy is active by default
Depends on: Phase 1 (PASS gate must open)
Requirements: ASM-01, ASM-02, ASM-03, ASM-04
Success Criteria (what must be TRUE):

  1. IntelligentExamController and LearningAnalyticsService both use calibrated difficulty as the primary value when assembling exams
  2. Questions without sufficient calibration data automatically use the original questions.difficulty value without errors or gaps
  3. Difficulty distribution strategy is active by default in all exam types (diagnostic, practice, error-review)
  4. Each assembled exam records whether each question's difficulty came from calibration or original value, enabling post-hoc audit

Plans: 2 plans

Plans:

  • 02-01-PLAN.md — Wire resolver into AssembleExamTaskJob and LearningAnalyticsService, activate distribution by default
  • 02-02-PLAN.md — Persist difficulty_source in paper_questions and verify with feature tests
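
The fallback and audit behavior in this phase can be illustrated with a small Python sketch of a resolver. Only `questions.difficulty`, `paper_questions`, and `difficulty_source` come from the roadmap. The `Question` fields `calibrated_difficulty` and `calibration_samples`, the `min_samples` threshold, and the function names are assumptions made for illustration, and the production code may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Question:
    id: int
    difficulty: float                              # original questions.difficulty
    calibrated_difficulty: Optional[float] = None  # hypothetical field
    calibration_samples: int = 0                   # hypothetical field


@dataclass(frozen=True)
class ResolvedDifficulty:
    value: float
    source: str  # "calibrated" or "original", stored as difficulty_source


def resolve_difficulty(question, gate_passed, min_samples=30):
    """Prefer the calibrated value; fall back to the original without errors."""
    usable = (
        gate_passed
        and question.calibrated_difficulty is not None
        and question.calibration_samples >= min_samples
    )
    if usable:
        return ResolvedDifficulty(question.calibrated_difficulty, "calibrated")
    return ResolvedDifficulty(question.difficulty, "original")


def paper_question_row(paper_id, question, gate_passed):
    """Build the record persisted per exam question for post-hoc audit."""
    resolved = resolve_difficulty(question, gate_passed)
    return {
        "paper_id": paper_id,
        "question_id": question.id,
        "difficulty": resolved.value,
        "difficulty_source": resolved.source,
    }
```

Routing every assembly path through one resolver keeps the Phase 1 gate, the sample threshold, and the fallback rule in a single place, so the recorded source always matches the value that was actually used.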

Phase 3: Adaptive Matching

Goal: Exams automatically target each student's optimal learning zone by mapping their per-knowledge-point mastery to a difficulty category, closing the "answer-calibrate-assemble-re-answer" loop
Depends on: Phase 2 (calibrated assembly working in production)
Requirements: ADP-01, ADP-02
Success Criteria (what must be TRUE):

  1. The system computes a target difficulty range per student per knowledge point based on mastery data, targeting 60-75% expected correctness
  2. Exam assembly automatically shifts difficulty distribution weights so the resulting exam difficulty falls within the student's target zone

Plans: TBD
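
One way to turn the 60-75% expected-correctness target into a difficulty range is sketched below in Python. The roadmap does not specify a response model, so this example assumes a simple logistic relationship between mastery and difficulty (both on 0-1) with a hypothetical slope `STEEPNESS`. The category names and centers are also invented for illustration. Under this assumption a student whose mastery equals the question difficulty is expected to answer correctly half the time, so the target zone sits somewhat below the mastery value.

```python
from math import exp, log

STEEPNESS = 4.0  # hypothetical slope of the assumed logistic model


def expected_correctness(mastery, difficulty, k=STEEPNESS):
    """Assumed model: P(correct) = sigmoid(k * (mastery - difficulty))."""
    return 1.0 / (1.0 + exp(-k * (mastery - difficulty)))


def _logit(p):
    return log(p / (1.0 - p))


def _clamp(x):
    return min(1.0, max(0.0, x))


def target_difficulty_range(mastery, p_low=0.60, p_high=0.75, k=STEEPNESS):
    """Invert the model to get the (easiest, hardest) difficulty for the zone."""
    hardest = mastery - _logit(p_low) / k    # 60% expected correctness
    easiest = mastery - _logit(p_high) / k   # 75% expected correctness
    return _clamp(easiest), _clamp(hardest)


def pick_category(zone, categories):
    """Choose the category whose center is closest to the zone midpoint."""
    midpoint = (zone[0] + zone[1]) / 2.0
    return min(categories, key=lambda name: abs(categories[name] - midpoint))
```

A call such as `pick_category(target_difficulty_range(0.8), {"easy": 0.2, "medium": 0.5, "hard": 0.8})` selects `"medium"` under these assumptions. Assembly could then raise the distribution weight of that category for the knowledge point in question.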

Phase 4: Health Monitoring

Goal: Calibration quality is tracked longitudinally and drift is detected before it degrades exam quality, with actionable alerts
Depends on: Phase 3 (accumulated production data from wired pipeline)
Requirements: HLT-01, HLT-02
Success Criteria (what must be TRUE):

  1. The system periodically compares current calibration values against historical baselines and detects when drift exceeds a defined threshold
  2. Drift alerts include the direction (easier/harder), magnitude, affected knowledge points, and count of affected questions

Plans: TBD
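
The drift check and alert contents described above can be sketched in Python as follows. This is an illustration under assumptions, not a specification: drift is measured here as the mean signed change in calibrated difficulty per knowledge point, the `threshold` of 0.1 is arbitrary, and `DriftAlert`, `detect_drift`, and the dictionary inputs are hypothetical names and shapes.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DriftAlert:
    knowledge_point: str
    direction: str           # "harder" or "easier"
    magnitude: float         # absolute mean shift on the 0-1 scale
    affected_questions: int  # questions whose own shift exceeds the threshold


def detect_drift(baseline, current, question_kp, threshold=0.1):
    """Compare current calibration to a baseline, grouped by knowledge point."""
    deltas = defaultdict(list)
    for qid, base_value in baseline.items():
        if qid in current:  # only questions present in both snapshots
            deltas[question_kp[qid]].append(current[qid] - base_value)

    alerts = []
    for kp, shifts in sorted(deltas.items()):
        mean_shift = mean(shifts)
        if abs(mean_shift) > threshold:
            alerts.append(DriftAlert(
                knowledge_point=kp,
                direction="harder" if mean_shift > 0 else "easier",
                magnitude=abs(mean_shift),
                affected_questions=sum(1 for s in shifts if abs(s) > threshold),
            ))
    return alerts
```

Running this periodically against a stored baseline snapshot would give one alert per drifting knowledge point, carrying the four fields the success criteria call for.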

Progress

Execution Order: Phases execute in numeric order: 1 → 2 → 3 → 4

Phase                         Plans Complete   Status        Completed
1. Validation & Data Audit    0/2              Not started   -
2. Assembly Integration       0/2              Not started   -
3. Adaptive Matching          0/?              Not started   -
4. Health Monitoring          0/?              Not started   -