Model Quartermaster

The Model Quartermaster (MQM) is an adaptive model-selection engine that observes LLM call patterns across sessions, computes weighted signal scores, and predicts the optimal model to use. A legacy Quartermaster (QM) variant also provides 5-signal tool prediction.

Architecture

┌───────────────────────────────────────────────────────────┐
│                   Quartermaster System                    │
│                                                           │
│  Model Calls ──► 6 Signal Computers ──► Fusion            │
│  (trajectory, episodic, cost, quality, historical,        │
│   reflection)                                             │
│                              │                            │
│                              ▼                            │
│                     Weighted Prediction                   │
│                     (automate | suggest | defer)          │
│                              │                            │
│                              ▼                            │
│  Correct? ──► Reinforcement Learning (EMA weight update)  │
└───────────────────────────────────────────────────────────┘

Six Prediction Signals

Signal	Source	Weight
Trajectory	Recent model usage patterns and sequences	Dynamic (EMA)
Episodic	Cosine matching against similar conversation contexts	Dynamic (EMA)
Historical	Past performance data for task categories	Dynamic (EMA)
Cost	Cost efficiency optimization across models	Dynamic (EMA)
Quality	Expected quality based on model capabilities	Dynamic (EMA)
Reflection	Per-turn reflection feedback integration	Dynamic (EMA)
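
As a rough illustration of the fusion step, the sketch below combines per-signal scores into one prediction. It assumes that each signal computer returns a score in [0, 1] per candidate model and that fusion is a weight-normalized sum; the source does not specify either. The names `SignalName`, `ModelScores`, `FusedPrediction`, and `fuseSignals` are hypothetical and not part of the Quartermaster's API.

```typescript
// Hypothetical types: the document names the six signals but not their data shapes.
type SignalName =
  | "trajectory" | "episodic" | "historical"
  | "cost" | "quality" | "reflection";

type ModelScores = Record<string, number>; // model id -> score in [0, 1] (assumed)

interface FusedPrediction {
  model: string;
  confidence: number; // weighted average score of the winning model
}

// Assumed fusion rule: weighted sum per model, divided by the total weight.
function fuseSignals(
  scores: Record<SignalName, ModelScores>,
  weights: Record<SignalName, number>,
): FusedPrediction | null {
  const totals: Record<string, number> = {};
  let weightSum = 0;

  for (const name of Object.keys(scores) as SignalName[]) {
    const w = weights[name];
    weightSum += w;
    for (const model of Object.keys(scores[name])) {
      totals[model] = (totals[model] ?? 0) + w * scores[name][model];
    }
  }
  if (weightSum === 0) return null; // nothing to normalize by

  // Pick the model with the highest weighted average.
  let best: FusedPrediction | null = null;
  for (const model of Object.keys(totals)) {
    const confidence = totals[model] / weightSum;
    if (best === null || confidence > best.confidence) {
      best = { model, confidence };
    }
  }
  return best;
}
```

Dividing by the total weight keeps the fused confidence in [0, 1] however the individual weights drift during learning, which makes it directly comparable to fixed confidence thresholds.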

Prediction Confidence Levels

Confidence	Action	Safe Tools	Unsafe Tools
≥ 85%	Enforce	Auto-execute	Suggest
65–84%	Suggest	Suggest	Suggest
< 65%	Defer	Defer	Defer

Safe tools are read-only, non-destructive operations. Unsafe tools (shell exec, file write, network) are never auto-executed.
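
The table and the safety rule above can be read as a single decision function. The following is a minimal sketch, assuming confidence is expressed as a fraction in [0, 1] and that the bands are half-open (65% inclusive up to, but not including, 85%). `decideAction` and the `Action` type are illustrative names, not the actual implementation.

```typescript
type Action = "auto-execute" | "suggest" | "defer";

// Thresholds from the confidence table, as fractions.
const ENFORCE_THRESHOLD = 0.85;
const SUGGEST_THRESHOLD = 0.65;

// Hypothetical helper: maps a confidence and a tool safety flag to an action.
function decideAction(confidence: number, toolIsSafe: boolean): Action {
  if (confidence >= ENFORCE_THRESHOLD) {
    // Unsafe tools are never auto-executed, even at high confidence.
    return toolIsSafe ? "auto-execute" : "suggest";
  }
  if (confidence >= SUGGEST_THRESHOLD) {
    return "suggest";
  }
  return "defer";
}
```

Under these assumptions, `decideAction(0.9, false)` yields `"suggest"`, since the safety flag only matters in the top band.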

Active Mode Threshold

The Quartermaster requires 50 observations before entering active prediction mode. Before this threshold, it operates in learning-only mode: it collects data and computes signal baselines but makes no predictions.
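
One way to picture the threshold is as a counter that gates predictions. The sketch below is only illustrative: the `ObservationGate` class and the `Mode` type are hypothetical, and it assumes the switch to active mode happens as soon as the 50th observation has been recorded.

```typescript
const ACTIVE_MODE_THRESHOLD = 50; // value stated in the text above

type Mode = "learning" | "active";

// Hypothetical gate: counts observations and reports the current mode.
class ObservationGate {
  private count = 0;

  // Record one observed call and return the mode after recording it.
  observe(): Mode {
    this.count += 1;
    return this.mode();
  }

  mode(): Mode {
    return this.count >= ACTIVE_MODE_THRESHOLD ? "active" : "learning";
  }
}
```

A caller would check `gate.mode() === "active"` before asking for a prediction, and otherwise only feed observations to the signal computers.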

Reinforcement Learning

After each prediction, correctness is evaluated:

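Reward (EMA α = 0.15): Weights of signals that contributed to a correct prediction increase
Punishment (EMA α = 0.25): Weights of signals that contributed to an incorrect prediction decrease, and faster than they grow because α is larger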
Convergence: Weights stabilize after ~200–500 observations
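
The asymmetric update can be sketched as an exponential moving average toward a target. The α values come from the list above; the targets (1 for a correct signal, 0 for an incorrect one) and the function name `updateWeight` are assumptions made for illustration, since the source does not give the exact update formula.

```typescript
const REWARD_ALPHA = 0.15; // from the text: used when the signal was correct
const PUNISH_ALPHA = 0.25; // from the text: used when the signal was incorrect

// Assumed EMA form: w' = (1 - alpha) * w + alpha * target,
// with target = 1 for a correct signal and 0 for an incorrect one.
function updateWeight(weight: number, wasCorrect: boolean): number {
  const alpha = wasCorrect ? REWARD_ALPHA : PUNISH_ALPHA;
  const target = wasCorrect ? 1 : 0;
  return (1 - alpha) * weight + alpha * target;
}
```

Starting from a weight of 0.5, a correct outcome moves it to about 0.575 and an incorrect one to 0.375 under this formula, so a single mistake costs more than a single success earns.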

Context Fingerprinting

The contexts.ts module generates a 12-feature fingerprint from the conversation context. Fingerprints are compared by cosine similarity to find similar past situations. Features include model distribution, message length, token composition, and task category indicators.
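
For reference, cosine similarity between two fingerprints can be computed as below. This is a generic sketch and not the code in contexts.ts: the function name `fingerprintSimilarity`, the plain `number[]` representation, and the choice to return 0 for an all-zero fingerprint are assumptions.

```typescript
const FINGERPRINT_SIZE = 12; // feature count stated in the text above

// Cosine similarity: dot(a, b) / (|a| * |b|).
function fingerprintSimilarity(a: number[], b: number[]): number {
  if (a.length !== FINGERPRINT_SIZE || b.length !== FINGERPRINT_SIZE) {
    throw new Error("fingerprints must have exactly 12 features");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < FINGERPRINT_SIZE; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumed convention: an all-zero fingerprint matches nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because cosine similarity ignores vector magnitude, two contexts with the same feature proportions score close to 1 even when one conversation is much longer than the other.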

Database Schema

The Quartermaster persists state in SQLite tables:

qm_patterns — Learned prediction patterns
qm_decisions — Historical predictions and outcomes
qm_session_state — Per-session quartermaster state
qm_model_stats — Aggregated model statistics
qm_weights — Current signal weight values

Observability

The Quartermaster emits:

Lens events: Every prediction, decision evaluation, weight update, and pattern learned
Prometheus metrics: Prediction accuracy, confidence distribution, signal weights, mode changes

CLI Interface

See cortex qm and cortex mqm for the command-line interface.