Model Quartermaster
The Model Quartermaster (MQM) is an adaptive model-selection engine that observes LLM call patterns across sessions, computes weighted signal scores, and predicts the optimal model to use. A legacy Quartermaster (QM) variant also provides 5-signal tool prediction.
Architecture
┌──────────────────────────────────────────────────────────┐
│ Quartermaster System │
│ │
│ Model Calls ──► 6 Signal Computers ──► Fusion │
│ (trajectory, episodic, cost, quality, historical, │
│ reflection) │
│ │ │
│ ▼ │
│ Weighted Prediction │
│ (automate | suggest | defer) │
│ │ │
│ ▼ │
│ Correct? ──► Reinforcement Learning (EMA weight update) │
└──────────────────────────────────────────────────────────┘
Six Prediction Signals
| Signal | Source | Weight |
|---|---|---|
| Trajectory | Recent model usage patterns and sequences | Dynamic (EMA) |
| Episodic | Cosine-similarity matching of similar conversation contexts | Dynamic (EMA) |
| Historical | Past performance data for task categories | Dynamic (EMA) |
| Cost | Cost efficiency optimization across models | Dynamic (EMA) |
| Quality | Expected quality based on model capabilities | Dynamic (EMA) |
| Reflection | Per-turn reflection feedback integration | Dynamic (EMA) |
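The fusion step can be pictured as a weighted vote. The sketch below assumes each signal proposes one model with a score in [0, 1], and that confidence is the winning model's share of the total weighted score. The names `SignalVote` and `fuseSignals`, and the confidence definition, are hypothetical illustrations and are not taken from the Quartermaster source.

```typescript
// Hypothetical sketch of weighted signal fusion (names and formula are assumptions).
type SignalName =
  | "trajectory" | "episodic" | "historical"
  | "cost" | "quality" | "reflection";

interface SignalVote {
  signal: SignalName;
  model: string; // model this signal proposes
  score: number; // the signal's own score in [0, 1]
}

interface FusedPrediction {
  model: string;
  confidence: number; // winning model's share of the total weighted score
}

function fuseSignals(
  votes: SignalVote[],
  weights: Record<SignalName, number>,
): FusedPrediction | null {
  const totals: Record<string, number> = {};
  let sum = 0;
  for (const vote of votes) {
    const contribution = weights[vote.signal] * vote.score;
    totals[vote.model] = (totals[vote.model] ?? 0) + contribution;
    sum += contribution;
  }
  if (sum <= 0) return null; // no usable evidence, so no prediction

  let best: FusedPrediction | null = null;
  for (const model of Object.keys(totals)) {
    const confidence = totals[model] / sum;
    if (best === null || confidence > best.confidence) {
      best = { model, confidence };
    }
  }
  return best;
}
```

Normalizing by the total keeps the confidence in [0, 1] regardless of how large the individual weights grow, which makes it directly comparable to percentage thresholds.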
Prediction Confidence Levels
| Confidence | Action | Safe Tools | Unsafe Tools |
|---|---|---|---|
| ≥ 85% | Enforce | Auto-execute | Suggest |
| 65–84% | Suggest | Suggest | Suggest |
| < 65% | Defer | Defer | Defer |
Safe tools are read-only, non-destructive operations. Unsafe tools (shell exec, file write, network) are never auto-executed.
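The table above can be read as a small decision function. The following is a minimal sketch, assuming confidence is expressed as a fraction in [0, 1] and that the 65–84% band means the half-open range [0.65, 0.85). The function names `actionFor` and `toolBehavior` are hypothetical.

```typescript
// Hypothetical sketch of the confidence table; function names are assumptions.
type QmAction = "enforce" | "suggest" | "defer";
type ToolBehavior = "auto-execute" | "suggest" | "defer";

function actionFor(confidence: number): QmAction {
  if (confidence >= 0.85) return "enforce";
  if (confidence >= 0.65) return "suggest";
  return "defer";
}

function toolBehavior(confidence: number, isSafeTool: boolean): ToolBehavior {
  const action = actionFor(confidence);
  if (action === "defer") return "defer";
  // Only safe (read-only, non-destructive) tools are ever auto-executed.
  if (action === "enforce" && isSafeTool) return "auto-execute";
  return "suggest";
}
```

Note that the unsafe-tool branch can never return "auto-execute", which mirrors the rule that shell exec, file write, and network tools are only ever suggested.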
Active Mode Threshold
The Quartermaster requires 50 observations before entering active prediction mode. Below this threshold it operates in learning-only mode: it collects data and computes signal baselines but makes no predictions.
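A gate of this kind might look like the sketch below. The class `ObservationGate` and its methods are hypothetical, and treating the 50th observation itself as the point where active mode begins is an assumption.

```typescript
// Hypothetical sketch: gate predictions behind the 50-observation threshold.
const ACTIVE_MODE_THRESHOLD = 50; // value stated in the text above

type QuartermasterMode = "learning" | "active";

class ObservationGate {
  private observations = 0;

  // Record one observed model call and return the resulting mode.
  observe(): QuartermasterMode {
    this.observations += 1;
    return this.mode();
  }

  mode(): QuartermasterMode {
    return this.observations >= ACTIVE_MODE_THRESHOLD ? "active" : "learning";
  }

  // Run the predictor only in active mode; return null while still learning.
  predict<T>(predictor: () => T): T | null {
    return this.mode() === "active" ? predictor() : null;
  }
}
```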
Reinforcement Learning
After each prediction, correctness is evaluated:
- Reward (EMA α = 0.15): Weights of signals that predicted correctly increase
- Punishment (EMA α = 0.25): Weights of signals that predicted incorrectly decrease, and do so faster because of the larger α
- Convergence: Weights stabilize after ~200–500 observations
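The asymmetric update can be illustrated with a standard exponential moving average. The sketch below assumes a weight moves toward 1 when its signal was correct and toward 0 when it was not. The target values and the function name `updateWeight` are assumptions; only the two α values come from the list above.

```typescript
// Hypothetical sketch of an asymmetric EMA weight update.
const REWARD_ALPHA = 0.15; // from the text: reward rate
const PUNISH_ALPHA = 0.25; // from the text: punishment rate

function updateWeight(weight: number, signalWasCorrect: boolean): number {
  const alpha = signalWasCorrect ? REWARD_ALPHA : PUNISH_ALPHA;
  const target = signalWasCorrect ? 1 : 0; // assumed EMA targets
  // Standard EMA step: move a fraction alpha of the way toward the target.
  return weight + alpha * (target - weight);
}
```

Under these assumptions, a weight of 0.5 rises to about 0.575 after a correct prediction and falls to 0.375 after an incorrect one, so a single mistake costs more than a single success earns.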
Context Fingerprinting
The contexts.ts module generates 12-feature fingerprints from conversation context. Fingerprints are compared by cosine similarity to find similar situations. Features include model distribution, message length, token composition, and task category indicators.
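Cosine similarity between two fingerprints can be computed as in the sketch below. It assumes a fingerprint is a plain array of 12 numbers. The function name, the length check, and returning 0 for an all-zero vector are illustrative choices, not the actual contexts.ts API.

```typescript
// Hypothetical sketch of cosine similarity over 12-feature fingerprints.
const FINGERPRINT_SIZE = 12; // feature count stated in the text above

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== FINGERPRINT_SIZE || b.length !== FINGERPRINT_SIZE) {
    throw new Error("fingerprints must have exactly 12 features");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < FINGERPRINT_SIZE; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero fingerprint has no direction; treat it as dissimilar.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because cosine similarity depends only on direction, two contexts with the same feature proportions but different magnitudes still match closely.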
Database Schema
The Quartermaster persists state in SQLite tables:
- qm_patterns: Learned prediction patterns
- qm_decisions: Historical predictions and outcomes
- qm_session_state: Per-session quartermaster state
- qm_model_stats: Aggregated model statistics
- qm_weights: Current signal weight values
Observability
The Quartermaster emits:
- Lens events: Every prediction, decision evaluation, weight update, and pattern learned
- Prometheus metrics: Prediction accuracy, confidence distribution, signal weights, mode changes
CLI Interface
See cortex qm and cortex mqm for the command-line interface.
See also: Model Router