Model Router (RouteLLM)

The model router selects an LLM provider through a cascading, confidence-based routing system. It starts with the cheapest configured model and escalates to more capable models when the estimated confidence of a response falls below a threshold.

CascadeRouter

CascadeRouter wraps multiple providers in a cascade chain. On each call:

1. Tries the first provider in the cascade (the cheapest)
2. Calls estimateConfidence(text) on the response, a heuristic based on hedging language
3. If confidence < threshold → tries the next provider in the cascade
4. Returns the last result if all providers are exhausted

This cost-optimized approach lets simple queries be answered by cheaper models, while queries whose answers score below the threshold escalate to more capable ones.
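
The steps above can be sketched as a short loop. This is a minimal illustration, not the router's actual implementation: the SimpleProvider shape, the hedging phrase list, and the 0.2 penalty per phrase are all assumptions made for the example. Only the name estimateConfidence and the threshold comparison come from the description above.

```typescript
// Hypothetical stand-in for a provider: a name plus an async completion function.
interface SimpleProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Minimal result shape for this sketch (the real CompletionResult may differ).
interface Result {
  text: string;
  provider: string;
}

// Illustrative hedging phrases; the router's real list is not documented here.
const HEDGES = ["i'm not sure", "i think", "maybe", "possibly", "it might", "i don't know"];

// Assumed heuristic: start at 1.0, subtract 0.2 per hedging phrase found, floor at 0.
function estimateConfidence(text: string): number {
  const lower = text.toLowerCase();
  const hits = HEDGES.filter((phrase) => lower.includes(phrase)).length;
  return Math.max(0, 1 - 0.2 * hits);
}

// Walk the cascade in order and return the first response that meets the threshold.
// If no response does, return the last one received.
async function cascade(
  providers: SimpleProvider[],
  prompt: string,
  threshold = 0.7,
): Promise<Result> {
  if (providers.length === 0) throw new Error("cascade is empty");
  let last: Result | undefined;
  for (const p of providers) {
    const text = await p.complete(prompt);
    last = { text, provider: p.name };
    if (estimateConfidence(text) >= threshold) return last;
  }
  return last as Result;
}
```

With a cheap provider that answers "Maybe 4? I think so." and a stronger one that answers "The answer is 4.", this sketch would discard the first response and return the second.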

Configuration

{
  "router": {
    "enabled": true,
    "confidenceThreshold": 0.7,
    "cascade": [
      { "provider": "ollama", "model": "llama3.2:3b" },
      { "provider": "ollama", "model": "llama3.1:8b" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" }
    ]
  }
}

Option               Default  Description
enabled              false    Enable or disable the router
confidenceThreshold  0.7      Minimum confidence score to accept a response (0.0–1.0)
cascade              []       Ordered list of provider and model pairs to try
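
The defaults in the table can be applied to a partial configuration as in the sketch below. The type names, the resolveRouterConfig helper, and both validation rules are hypothetical, since the source does not describe how the router validates its configuration.

```typescript
// One entry in the cascade, matching the JSON example above.
interface CascadeEntry {
  provider: string;
  model: string;
}

// Shape of the "router" object, using the option names from the table.
interface RouterConfig {
  enabled: boolean;
  confidenceThreshold: number;
  cascade: CascadeEntry[];
}

// Defaults as listed in the options table.
const DEFAULTS: RouterConfig = { enabled: false, confidenceThreshold: 0.7, cascade: [] };

// Hypothetical helper: fill in defaults, then reject values that cannot work.
function resolveRouterConfig(partial: Partial<RouterConfig> = {}): RouterConfig {
  const config: RouterConfig = {
    enabled: partial.enabled ?? DEFAULTS.enabled,
    confidenceThreshold: partial.confidenceThreshold ?? DEFAULTS.confidenceThreshold,
    cascade: [...(partial.cascade ?? DEFAULTS.cascade)],
  };
  // The negated range check also rejects NaN.
  if (!(config.confidenceThreshold >= 0 && config.confidenceThreshold <= 1)) {
    throw new RangeError("confidenceThreshold must be between 0.0 and 1.0");
  }
  // Assumed rule: an enabled router needs at least one provider to try.
  if (config.enabled && config.cascade.length === 0) {
    throw new Error("router is enabled but cascade is empty");
  }
  return config;
}
```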

Supported Providers

Provider       Kind        Default Model
Anthropic      anthropic   claude-sonnet-4-20250514
OpenAI         openai      gpt-4o
Google Gemini  google      gemini-2.0-flash
Mistral        mistral     mistral-large-latest
Groq           groq        llama-3.3-70b-versatile
DeepSeek       deepseek    deepseek-chat
OpenRouter     openrouter  routes to 200+ models
xAI (Grok)     xai         grok-2-latest
Together AI    together    Llama-3.3-70B-Instruct-Turbo
AWS Bedrock    bedrock     Claude/Llama/Titan
Cohere         cohere      command-r-plus
Ollama         ollama      llama3.2 (local)

Provider Interface

All providers implement the LLMProvider interface:

interface LLMProvider {
  readonly name: string;
  readonly defaultModel: string;
  complete(options: CompletionOptions): Promise<CompletionResult>;
  stream(options: CompletionOptions): AsyncIterable<CompletionChunk>;
}
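
To show what implementing the interface might look like, here is a sketch of an in-memory provider that echoes its prompt. The source names CompletionOptions, CompletionResult, and CompletionChunk but does not define them, so the field names below (prompt, model, text, delta) are assumptions. EchoProvider and collect are hypothetical helpers for illustration.

```typescript
// Assumed shapes: the source names these types but does not define them.
interface CompletionOptions {
  prompt: string;
  model?: string;
}
interface CompletionResult {
  text: string;
  model: string;
}
interface CompletionChunk {
  delta: string;
}

// Repeated from above so the example is self-contained.
interface LLMProvider {
  readonly name: string;
  readonly defaultModel: string;
  complete(options: CompletionOptions): Promise<CompletionResult>;
  stream(options: CompletionOptions): AsyncIterable<CompletionChunk>;
}

// Hypothetical provider that returns the prompt unchanged; handy in tests.
class EchoProvider implements LLMProvider {
  readonly name = "echo";
  readonly defaultModel = "echo-1";

  async complete(options: CompletionOptions): Promise<CompletionResult> {
    return { text: options.prompt, model: options.model ?? this.defaultModel };
  }

  // Async generator that yields the prompt one word at a time.
  async *stream(options: CompletionOptions): AsyncIterable<CompletionChunk> {
    for (const word of options.prompt.split(" ")) {
      yield { delta: word };
    }
  }
}

// Drain a stream and join the chunk deltas back together with spaces.
async function collect(stream: AsyncIterable<CompletionChunk>): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of stream) parts.push(chunk.delta);
  return parts.join(" ");
}
```

Because callers depend only on LLMProvider, a stub like this can stand in for a real provider when testing routing logic without network access.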

Features

  • Multi-Provider: 12+ LLM providers through a unified interface
  • Cascading Router: Cheapest model first, escalate on low confidence
  • Failover: Automatic fallback to next provider on errors
  • Cost Optimization: Route simple queries to cheaper or local models
  • Streaming: All providers support streaming responses
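
Failover on errors could look like the sketch below. It is an assumption about the behavior, not the router's actual code: the source only states that the router falls back to the next provider on errors. FallibleProvider and completeWithFailover are hypothetical names.

```typescript
// Hypothetical provider shape for this sketch: complete() may reject.
interface FallibleProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try each provider in order, moving on when one throws.
// If every provider fails, raise a single error that lists each failure.
async function completeWithFailover(
  providers: FallibleProvider[],
  prompt: string,
): Promise<{ text: string; provider: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.complete(prompt);
      return { text, provider: p.name };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${reason}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Collecting the per-provider error messages keeps the final failure debuggable when the whole chain is down.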