Model Router (RouteLLM)
The model router selects an LLM provider for each request using confidence-based cascading routing. It starts with the cheapest model and automatically escalates to a more capable one when the current model's response has low estimated confidence.
CascadeRouter
CascadeRouter wraps multiple providers in a cascade chain. On each call it:
1. Sends the request to the first (cheapest) provider in the cascade.
2. Scores the response with estimateConfidence(text), a heuristic based on hedging language.
3. If the confidence is below confidenceThreshold, tries the next provider in the cascade.
4. If every provider has been tried, returns the last provider's result.
This cost-optimized approach lets simple queries be answered by cheaper models, while queries that produce hedged, low-confidence answers escalate to more capable ones.
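The call loop above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the actual CascadeRouter code. The source only says that estimateConfidence is a heuristic based on hedging language, so the hedge phrase list, the 0.2 penalty per phrase, and the names cascadeComplete, CompleteFn, and SketchResult are all hypothetical.

```typescript
// Hypothetical result shape for this sketch; the real CompletionResult may differ.
interface SketchResult {
  text: string;
  model: string;
}

// Hypothetical provider call: one prompt in, one result out.
type CompleteFn = (prompt: string) => Promise<SketchResult>;

// Assumed hedge phrases; the real estimateConfidence list is not documented.
const HEDGES = [
  "i'm not sure",
  "i am not sure",
  "i don't know",
  "i think",
  "possibly",
  "perhaps",
  "might",
  "it depends",
];

// Assumed scoring rule: start at 1.0, subtract 0.2 per distinct hedge phrase found, floor at 0.
function estimateConfidence(text: string): number {
  const lower = text.toLowerCase();
  const hits = HEDGES.filter((h) => lower.includes(h)).length;
  return Math.max(0, 1 - 0.2 * hits);
}

// Try each provider in order; stop at the first response whose confidence meets the threshold.
async function cascadeComplete(
  prompt: string,
  cascade: CompleteFn[],
  threshold = 0.7,
): Promise<SketchResult> {
  let last: SketchResult | undefined;
  for (const complete of cascade) {
    last = await complete(prompt);
    if (estimateConfidence(last.text) >= threshold) {
      return last; // confident enough: no need to escalate
    }
  }
  if (last === undefined) throw new Error("cascade is empty");
  return last; // every provider scored below the threshold: return the last result
}
```

With these assumed weights, a response containing one hedge phrase scores about 0.8 and is accepted at the default threshold of 0.7, while two hedge phrases score about 0.6 and trigger escalation. Plain substring matching is crude (for example, "might" also matches "mighty"); it only illustrates the shape of a hedging heuristic.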
Configuration
```json
{
  "router": {
    "enabled": true,
    "confidenceThreshold": 0.7,
    "cascade": [
      { "provider": "ollama", "model": "llama3.2:3b" },
      { "provider": "ollama", "model": "llama3.1:8b" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" }
    ]
  }
}
```
| Option | Default | Description |
|---|---|---|
| enabled | false | Enable or disable the router |
| confidenceThreshold | 0.7 | Minimum confidence score to accept a response (0.0–1.0) |
| cascade | [] | Ordered list of provider+model pairs to try |
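One way to apply the defaults from the table is sketched below. The field names follow the JSON example, but the RouterConfig type, the resolveRouterConfig helper, and both validation rules (threshold must lie in [0, 1]; an enabled router needs a non-empty cascade) are assumptions for illustration, not documented behavior.

```typescript
// Field names mirror the JSON configuration example.
interface CascadeEntry {
  provider: string;
  model: string;
}

interface RouterConfig {
  enabled: boolean;
  confidenceThreshold: number;
  cascade: CascadeEntry[];
}

// Defaults taken from the options table.
const ROUTER_DEFAULTS: RouterConfig = {
  enabled: false,
  confidenceThreshold: 0.7,
  cascade: [],
};

// Hypothetical helper: merge user settings over the defaults, then validate (assumed rules).
function resolveRouterConfig(partial: Partial<RouterConfig> = {}): RouterConfig {
  const cfg: RouterConfig = {
    ...ROUTER_DEFAULTS,
    ...partial,
    // Copy so callers never share or mutate the defaults array.
    cascade: [...(partial.cascade ?? ROUTER_DEFAULTS.cascade)],
  };
  const t = cfg.confidenceThreshold;
  if (!(t >= 0 && t <= 1)) {
    throw new RangeError(`confidenceThreshold must be in [0, 1], got ${t}`);
  }
  if (cfg.enabled && cfg.cascade.length === 0) {
    throw new Error("router is enabled but cascade is empty");
  }
  return cfg;
}
```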
Supported Providers
| Provider | Kind | Default Model |
|---|---|---|
| Anthropic | anthropic | claude-sonnet-4-20250514 |
| OpenAI | openai | gpt-4o |
| Google Gemini | google | gemini-2.0-flash |
| Mistral | mistral | mistral-large-latest |
| Groq | groq | llama-3.3-70b-versatile |
| DeepSeek | deepseek | deepseek-chat |
| OpenRouter | openrouter | routes to 200+ models |
| xAI (Grok) | xai | grok-2-latest |
| Together AI | together | Llama-3.3-70B-Instruct-Turbo |
| AWS Bedrock | bedrock | Claude/Llama/Titan |
| Cohere | cohere | command-r-plus |
| Ollama | ollama | llama3.2 (local) |
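The table can be read as a lookup from provider kind to default model. The sketch below assumes, hypothetically, that a cascade entry may omit "model" and fall back to its provider's default; the source does not say whether omission is allowed. OpenRouter and AWS Bedrock are left out because the table lists a model range rather than a single default, and resolveModel is an illustrative name.

```typescript
// Default models copied from the Supported Providers table.
const DEFAULT_MODELS: Record<string, string> = {
  anthropic: "claude-sonnet-4-20250514",
  openai: "gpt-4o",
  google: "gemini-2.0-flash",
  mistral: "mistral-large-latest",
  groq: "llama-3.3-70b-versatile",
  deepseek: "deepseek-chat",
  xai: "grok-2-latest",
  together: "Llama-3.3-70B-Instruct-Turbo",
  cohere: "command-r-plus",
  ollama: "llama3.2",
};

// Hypothetical helper: prefer the entry's explicit model, else the provider kind's default.
function resolveModel(entry: { provider: string; model?: string }): string {
  if (entry.model) return entry.model;
  const fallback = DEFAULT_MODELS[entry.provider];
  if (fallback === undefined) {
    throw new Error(`no default model for provider "${entry.provider}"; set "model" explicitly`);
  }
  return fallback;
}
```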
Provider Interface
All providers implement the LLMProvider interface:
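```typescript
interface LLMProvider {
  readonly name: string;
  readonly defaultModel: string;
  // Resolves once the full completion is available.
  complete(options: CompletionOptions): Promise<CompletionResult>;
  // Yields the completion incrementally as chunks arrive.
  stream(options: CompletionOptions): AsyncIterable<CompletionChunk>;
}
```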
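A minimal provider satisfying this interface is sketched below, for example as a test double. The document names CompletionOptions, CompletionResult, and CompletionChunk without defining them, so their fields here (prompt, model, text, delta) are assumptions, as are the EchoProvider class and its "echo" / "echo-1" identifiers.

```typescript
// Assumed shapes: the real types are not defined in this document.
interface CompletionOptions {
  prompt: string;
  model?: string;
}
interface CompletionResult {
  text: string;
  model: string;
}
interface CompletionChunk {
  delta: string;
}

interface LLMProvider {
  readonly name: string;
  readonly defaultModel: string;
  complete(options: CompletionOptions): Promise<CompletionResult>;
  stream(options: CompletionOptions): AsyncIterable<CompletionChunk>;
}

// Hypothetical test double that echoes the prompt back, one word per chunk when streaming.
class EchoProvider implements LLMProvider {
  readonly name = "echo";
  readonly defaultModel = "echo-1";

  async complete(options: CompletionOptions): Promise<CompletionResult> {
    // Build the full text by draining the stream so both paths return the same content.
    let text = "";
    for await (const chunk of this.stream(options)) text += chunk.delta;
    return { text, model: options.model ?? this.defaultModel };
  }

  async *stream(options: CompletionOptions): AsyncIterable<CompletionChunk> {
    const words = options.prompt.split(" ");
    for (const [i, word] of words.entries()) {
      // Re-insert the separating space before every word except the first.
      yield { delta: i === 0 ? word : " " + word };
    }
  }
}
```

Implementing complete on top of stream is one design choice that keeps a single code path per provider; a real provider may instead call a separate non-streaming endpoint.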
Features
- Multi-Provider: 12+ LLM providers through a unified interface
- Cascading Router: Cheapest model first, escalate on low confidence
- Failover: Automatic fallback to the next provider on errors
- Cost Optimization: Route simple queries to cheaper or local models
- Streaming: All providers support streaming responses
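Error failover can be sketched separately from confidence escalation. The source does not say how the two are combined, so this sketch only handles thrown errors: it tries each provider in order and collects failure messages if all of them fail. The Complete type, the completeWithFailover function, and the error-message format are hypothetical.

```typescript
// Hypothetical provider call signature for this sketch.
type Complete = (prompt: string) => Promise<string>;

// Try providers in order; on a thrown error or rejection, fall through to the next one.
async function completeWithFailover(
  prompt: string,
  providers: { name: string; complete: Complete }[],
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      // Awaiting inside the try block ensures a rejected promise is caught here.
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed (${errors.join("; ")})`);
}
```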