Observability

Cortex includes an in-process observability layer with two parts: a Prometheus-compatible metrics registry and a distributed tracing system whose spans can be sent to one of several backends, including an OpenTelemetry collector via OTLP.

Architecture

┌──────────────────────────────────────────────────────────┐
│                      Observability                       │
│                                                          │
│  Metrics Registry                    Tracing System      │
│  ┌────────────────────┐              ┌───────────────┐   │
│  │ Counters           │              │ Span Creation │   │
│  │ Gauges             │              │ Attributes    │   │
│  │ Histograms         │              │ Status Codes  │   │
│  └─────────┬──────────┘              └───────┬───────┘   │
│            │                                 │           │
│            ▼                                 ▼           │
│  ┌────────────────────┐              ┌───────────────┐   │
│  │ /metrics endpoint  │              │ Backends:     │   │
│  │ (Prometheus text)  │              │  - lens       │   │
│  └────────────────────┘              │  - stdout     │   │
│                                      │  - otlp       │   │
│                                      │  - none       │   │
│                                      └───────────────┘   │
└──────────────────────────────────────────────────────────┘

Metrics Registry

In-process Prometheus-compatible registry with three metric types:

Type       Purpose                         Example
---------  ------------------------------  ---------------------------------------
Counter    Monotonically increasing count  Agent turns, tool calls, errors
Gauge      Point-in-time value             CPU %, memory MB, uptime seconds
Histogram  Distribution of values          Turn duration, token I/O, cost per turn
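The three metric types differ mainly in how they accept updates. A minimal sketch, assuming standard Prometheus semantics (counters never decrease; histograms count observations into cumulative "le" buckets that end with a +Inf bucket). The class and method names (inc, set, observe, cumulative) and the bucket bounds are hypothetical, not taken from Cortex's code.

```python
import bisect
import math


class Counter:
    """Monotonically increasing count, e.g. agent_turns_total."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # A counter may only go up; Prometheus treats any decrease as a counter reset.
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        self.value += amount


class Gauge:
    """Point-in-time value that can rise or fall, e.g. cpu_percent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Distribution of observations, e.g. agent_turn_duration_seconds."""

    def __init__(self, name: str, buckets: list[float]) -> None:
        self.name = name
        self.bounds = sorted(buckets) + [math.inf]  # the +Inf bucket catches every value
        self.counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value (Prometheus "le" semantics).
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> list[tuple[float, int]]:
        # Exported buckets are cumulative: each one also counts every smaller bucket.
        pairs, running = [], 0
        for bound, n in zip(self.bounds, self.counts):
            running += n
            pairs.append((bound, running))
        return pairs


turn_duration = Histogram("agent_turn_duration_seconds", buckets=[0.5, 1, 2.5])
for seconds in (0.3, 1.2, 4.0):
    turn_duration.observe(seconds)
print(turn_duration.cumulative())  # [(0.5, 1), (1, 1), (2.5, 2), (inf, 3)]
```

Keeping per-bucket counts and computing the cumulative view only at export time keeps observe() to a single binary search and increment.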

Pre-Registered Metrics

Metric                       Type       Labels
---------------------------  ---------  --------------------------
agent_turns_total            Counter    provider, model
agent_turn_duration_seconds  Histogram  provider, model
token_input_total            Counter    provider, model
token_output_total           Counter    provider, model
cost_usd_total               Counter    provider, model
tool_calls_total             Counter    tool, status
errors_total                 Counter    type
validator_intents_total      Counter    status (approved/rejected)
scheduler_jobs_total         Counter    status
memory_consolidations_total  Counter    tier
cpu_percent                  Gauge      (none)
memory_mb                    Gauge      (none)
uptime_seconds               Gauge      (none)
qm_predictions_total         Counter    confidence_level
qm_accuracy                  Gauge      signal
mqm_weights                  Gauge      signal
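These metrics are served at /metrics in the Prometheus text format. A minimal, stdlib-only sketch of how one labeled family and one unlabeled gauge render, assuming the standard exposition rules (HELP and TYPE lines, label values escaped for backslash, double quote, and line feed). render_family is a hypothetical helper, and the tool names and status values in the usage are illustrative only.

```python
def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and line feed inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_value(value: float) -> str:
    # Print whole numbers without a trailing ".0"; otherwise use Python's shortest repr.
    v = float(value)
    return str(int(v)) if v.is_integer() else repr(v)


def render_family(name, kind, help_text, label_names, samples):
    """Render one metric family ("counter" or "gauge") in Prometheus text format.

    samples maps a tuple of label values, in label_names order, to the sample value.
    help_text is assumed to contain no backslashes or newlines (those need escaping too).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {kind}"]
    for label_values, value in sorted(samples.items()):
        if label_names:
            pairs = ",".join(
                f'{k}="{escape_label_value(v)}"' for k, v in zip(label_names, label_values)
            )
            lines.append(f"{name}{{{pairs}}} {format_value(value)}")
        else:
            lines.append(f"{name} {format_value(value)}")
    return "\n".join(lines) + "\n"


print(render_family(
    "tool_calls_total", "counter", "Tool calls by tool and outcome.",
    ("tool", "status"),
    {("shell", "ok"): 12, ("shell", "error"): 1, ("web_fetch", "ok"): 4},
), end="")
print(render_family("uptime_seconds", "gauge", "Process uptime.", (), {(): 3605.2}), end="")
# # HELP tool_calls_total Tool calls by tool and outcome.
# # TYPE tool_calls_total counter
# tool_calls_total{tool="shell",status="error"} 1
# tool_calls_total{tool="shell",status="ok"} 12
# tool_calls_total{tool="web_fetch",status="ok"} 4
# # HELP uptime_seconds Process uptime.
# # TYPE uptime_seconds gauge
# uptime_seconds 3605.2
```

Each distinct combination of label values becomes its own time series, which is why labels such as provider and model should stay low-cardinality.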

Distributed Tracing

The tracing system creates the following spans during each agent turn. agent_turn is the root span; every other span is its direct child:

Span Name              Parent      Attributes
---------------------  ----------  --------------------------------------
agent_turn             root        session_id, turn_id, model
memory_retrieval       agent_turn  tier, result_count, duration_ms
llm_call               agent_turn  provider, model, tokens_in, tokens_out
tool_execution         agent_turn  tool_name, approved, duration_ms
reflection             agent_turn  patterns_found
quartermaster_predict  agent_turn  confidence, signal_scores

Backend Configuration

{
  "tracing": {
    "backend": "otlp",
    "otlpEndpoint": "https://collector.example.com:4318/v1/traces"
  }
}
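A minimal sketch of reading the tracing section shown above, assuming lens is used when no backend is given (the documented default) and, as an added assumption, that selecting otlp without an otlpEndpoint is treated as a configuration error. load_tracing_config is a hypothetical helper, not part of Cortex.

```python
import json

BACKENDS = {"lens", "stdout", "otlp", "none"}


def load_tracing_config(raw: str) -> dict:
    """Return the effective tracing settings from a JSON config document."""
    tracing = json.loads(raw).get("tracing", {})
    backend = tracing.get("backend", "lens")  # lens is the documented default
    if backend not in BACKENDS:
        raise ValueError(f"unknown tracing backend: {backend!r}")
    endpoint = tracing.get("otlpEndpoint")
    # Assumption: otlp without an endpoint is rejected rather than silently defaulted.
    if backend == "otlp" and not endpoint:
        raise ValueError("tracing.otlpEndpoint is required when backend is 'otlp'")
    return {"backend": backend, "otlpEndpoint": endpoint}


example = """
{
  "tracing": {
    "backend": "otlp",
    "otlpEndpoint": "https://collector.example.com:4318/v1/traces"
  }
}
"""
print(load_tracing_config(example)["backend"])  # otlp
print(load_tracing_config("{}")["backend"])     # lens
```

Validating the backend name up front turns a typo such as "otpl" into a startup error instead of silently losing traces.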

Supported backends:

  • lens: Store traces in the lens.db audit log (default)
  • stdout: Print spans to stdout for debugging
  • otlp: Export spans to the OpenTelemetry collector at otlpEndpoint
  • none: Disable tracing entirely
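A minimal sketch of what the otlp backend could send, assuming the OTLP/HTTP JSON encoding (the example endpoint uses port 4318 and the /v1/traces path, the OTLP/HTTP defaults). Cortex's actual exporter may use protobuf instead; the input span fields (trace_id, start_ns, status_code, ...), the scope name cortex.tracing, and the service name are hypothetical.

```python
import json
import secrets
import time
import urllib.request


def any_value(value):
    # OTLP/JSON wraps each attribute in a typed AnyValue; check bool before int (bool is an int subclass).
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, int):
        return {"intValue": str(value)}  # 64-bit integers are encoded as JSON strings
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def otlp_payload(service_name, spans):
    """Build an ExportTraceServiceRequest body in the OTLP/JSON encoding."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [{"key": "service.name", "value": any_value(service_name)}]},
            "scopeSpans": [{
                "scope": {"name": "cortex.tracing"},  # hypothetical instrumentation scope name
                "spans": [{
                    "traceId": s["trace_id"],  # 32 hex characters
                    "spanId": s["span_id"],    # 16 hex characters
                    **({"parentSpanId": s["parent_id"]} if s.get("parent_id") else {}),
                    "name": s["name"],
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(s["start_ns"]),
                    "endTimeUnixNano": str(s["end_ns"]),
                    "attributes": [
                        {"key": k, "value": any_value(v)} for k, v in s.get("attributes", {}).items()
                    ],
                    "status": {"code": s.get("status_code", 0)},  # 0 UNSET, 1 OK, 2 ERROR
                } for s in spans],
            }],
        }],
    }


def export(endpoint, payload, timeout=5.0):
    """POST the payload to an OTLP/HTTP traces endpoint and return the HTTP status."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


start = time.time_ns()
payload = otlp_payload("cortex", [{
    "trace_id": secrets.token_hex(16),
    "span_id": secrets.token_hex(8),
    "name": "agent_turn",
    "start_ns": start,
    "end_ns": start + 1_250_000_000,
    "attributes": {"session_id": "s-1", "turn_id": 1, "model": "example-model"},
    "status_code": 1,
}])
print(json.dumps(payload, indent=2))
# export("https://collector.example.com:4318/v1/traces", payload)  # needs a running collector
```

Root spans simply omit parentSpanId, which is how a collector tells that agent_turn starts a new trace.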

See also: Databases, Quartermaster