WebSocket Protocol
The CortexPrism WebSocket protocol provides real-time streaming chat, audio communication, file upload, and inspection of tool calls and agent reasoning.
Connection
ws://127.0.0.1:3000/ws # Client WebSocket
ws://127.0.0.1:3000/ws/node # Node WebSocket (Hub ↔ Node)
Authentication: when webAuth.requireAuth is enabled, the /ws endpoint checks session cookies before upgrading connections.
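As a minimal sketch, a client could derive the two endpoint URLs from the HTTP base address of the server. The helper name `wsEndpoint` and the `Endpoint` type are hypothetical, and mapping `https:` to `wss:` is the usual WebSocket convention rather than something this page documents for CortexPrism.

```typescript
// Hypothetical helper: builds the WebSocket URL for the client or node endpoint.
type Endpoint = "client" | "node";

function wsEndpoint(httpBase: string, endpoint: Endpoint = "client"): string {
  const base = new URL(httpBase);
  // Assumption: an HTTPS deployment would be reached over wss.
  const scheme = base.protocol === "https:" ? "wss" : "ws";
  const path = endpoint === "node" ? "/ws/node" : "/ws";
  // base.host includes the port when one is given.
  return `${scheme}://${base.host}${path}`;
}

// Usage (not executed here): new WebSocket(wsEndpoint("http://127.0.0.1:3000"))
const clientUrl = wsEndpoint("http://127.0.0.1:3000");
const nodeUrl = wsEndpoint("http://127.0.0.1:3000", "node");
```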
Client → Server Messages
{ "type": "chat", "message": "Hello", "sessionId": "sess_abc123", "files": [...] }
{ "type": "ping" }
{ "type": "new_session" }
{ "type": "select_agent", "agentId": "agent-1" }
{ "type": "audio_chunk", "data": "<base64>" }
{ "type": "audio_end" }
{ "type": "speak", "text": "Hello world" }
Chat Message Fields
| Field | Type | Required | Description |
|---|---|---|---|
| type | "chat" | Yes | Message type |
| message | string | Yes | User message text |
| sessionId | string | No | Resume existing session |
| files | array | No | Uploaded files [{filename, mimeType, data (base64)}] |
File Upload
Files are sent as base64-encoded data over the WebSocket alongside chat messages. They are saved to the working directory and the agent workspace. Text is extracted automatically from PDFs, and images are included as multimodal content blocks for providers that support them.
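The following sketch shows one way a client might build a chat message with an attached file. The field names come from the Chat Message Fields table; the helper names `toAttachment` and `buildChatMessage` are illustrative, and the base64 step assumes a Node.js runtime where `Buffer` is available.

```typescript
// Shapes taken from the Chat Message Fields table.
interface FileAttachment {
  filename: string;
  mimeType: string;
  data: string; // base64-encoded file contents
}

interface ChatMessage {
  type: "chat";
  message: string;
  sessionId?: string;
  files?: FileAttachment[];
}

// Hypothetical helper: encode raw bytes as a base64 attachment (Node.js Buffer).
function toAttachment(filename: string, mimeType: string, bytes: Uint8Array): FileAttachment {
  return { filename, mimeType, data: Buffer.from(bytes).toString("base64") };
}

// Hypothetical helper: optional fields are only included when they carry a value.
function buildChatMessage(message: string, files: FileAttachment[] = [], sessionId?: string): ChatMessage {
  const msg: ChatMessage = { type: "chat", message };
  if (sessionId !== undefined) msg.sessionId = sessionId;
  if (files.length > 0) msg.files = files;
  return msg;
}

const note = toAttachment("note.txt", "text/plain", new TextEncoder().encode("hi"));
// The serialized string is what would be passed to socket.send(...).
const payload = JSON.stringify(buildChatMessage("Summarize this file", [note]));
```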
Server → Client Messages
{ "type": "connected" }
{ "type": "session", "sessionId": "sess_abc123" }
{ "type": "start" }
{ "type": "chunk", "delta": "Hello" }
{ "type": "reasoning", "content": "Agent is considering..." }
{ "type": "tool_call", "tool": "web_search", "args": {"query": "..."} }
{ "type": "tool_result", "tool": "web_search", "result": "..." }
{ "type": "done", "tokensIn": 100, "tokensOut": 50, "costUsd": 0.001, "durationMs": 800 }
{ "type": "error", "error": "Something went wrong" }
{ "type": "pong" }
{ "type": "audio", "data": "<base64 mp3>", "format": "mp3" }
{ "type": "voice_state", "listening": true, "enabled": true }
{ "type": "file_change", "path": "/workspace/file.ts" }
Done Message Fields
| Field | Type | Description |
|---|---|---|
| tokensIn | number | Input tokens used |
| tokensOut | number | Output tokens generated |
| costUsd | number | Estimated cost in USD |
| durationMs | number | Total turn duration in milliseconds |
| modelMode | 'manual' \| 'auto' | Model selection mode |
| resolvedProvider | string | LLM provider used |
| resolvedModel | string | LLM model used |
| autoFallback | boolean | Whether Auto mode fell back to heuristic |
| autoFallbackReason | string | Reason for fallback |
Reasoning Message
The reasoning message type delivers the agent's internal decision-making process as a separate stream. In the Web UI, this appears in a collapsible panel toggled by a 🔬 Reasoning button.
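One way a client might keep the reasoning stream separate from the answer text is a small reducer over incoming server messages. This is a sketch under assumptions: the `TurnState` shape and `applyMessage` name are hypothetical, only a subset of message types is modeled, and it assumes that `start` opens a turn and `done` closes it, which the message order above suggests but does not state.

```typescript
// Subset of the server messages listed above.
type ServerMessage =
  | { type: "start" }
  | { type: "chunk"; delta: string }
  | { type: "reasoning"; content: string }
  | { type: "done"; tokensIn: number; tokensOut: number; costUsd: number; durationMs: number }
  | { type: "error"; error: string };

// Hypothetical client-side view of one turn.
interface TurnState {
  text: string;        // answer prose assembled from chunk deltas
  reasoning: string[]; // reasoning entries, kept apart from the answer
  finished: boolean;
  error: string | null;
  costUsd: number | null;
}

const emptyTurn: TurnState = { text: "", reasoning: [], finished: false, error: null, costUsd: null };

function applyMessage(state: TurnState, msg: ServerMessage): TurnState {
  switch (msg.type) {
    case "start":
      return { ...emptyTurn, reasoning: [] };
    case "chunk":
      return { ...state, text: state.text + msg.delta };
    case "reasoning":
      return { ...state, reasoning: [...state.reasoning, msg.content] };
    case "done":
      return { ...state, finished: true, costUsd: msg.costUsd };
    case "error":
      return { ...state, error: msg.error };
    default:
      return state; // ignore message types this sketch does not model
  }
}

const incoming: ServerMessage[] = [
  { type: "start" },
  { type: "reasoning", content: "Agent is considering..." },
  { type: "chunk", delta: "Hel" },
  { type: "chunk", delta: "lo" },
  { type: "done", tokensIn: 100, tokensOut: 50, costUsd: 0.001, durationMs: 800 },
];
const turn = incoming.reduce(applyMessage, emptyTurn);
```

In a real client each message would come from `JSON.parse` on a received text frame; the reducer form keeps that parsing separate from the state logic.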
Voice/Audio Messages
- Client → Server: audio_chunk/audio_end
- Server → Client: speak/audio/voice_state
Transcribed speech is dispatched directly into the agent loop as a user message. Auto-TTS synthesizes agent responses to audio before the done signal.
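The client side of this exchange can be sketched as splitting recorded audio into `audio_chunk` messages followed by a single `audio_end`. The function name `toAudioMessages` and the default chunk size are assumptions; this page does not specify a chunk size or the expected audio encoding, and the base64 step assumes Node.js `Buffer`.

```typescript
type AudioClientMessage =
  | { type: "audio_chunk"; data: string } // data is base64
  | { type: "audio_end" };

// Hypothetical helper: chunkBytes is an arbitrary illustrative default.
function toAudioMessages(audio: Uint8Array, chunkBytes = 3072): AudioClientMessage[] {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const out: AudioClientMessage[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    const slice = audio.subarray(offset, offset + chunkBytes);
    out.push({ type: "audio_chunk", data: Buffer.from(slice).toString("base64") });
  }
  // Signal that no more audio follows for this utterance.
  out.push({ type: "audio_end" });
  return out;
}

const audioMessages = toAudioMessages(new Uint8Array([1, 2, 3, 4, 5]), 2);
// Each entry would be sent as socket.send(JSON.stringify(entry)).
```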
Session Resume
Include an existing sessionId in a chat message to resume across WebSocket reconnects:
{ "type": "chat", "message": "Continue our conversation", "sessionId": "sess_abc123" }
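A client could track the session ID across reconnects along these lines. The `SessionTracker` class is hypothetical, and the sketch assumes that the server's `session` message announces the ID of the current session and that every frame is a JSON text message.

```typescript
// Hypothetical helper: remembers the last announced sessionId and
// attaches it to outgoing chat messages.
class SessionTracker {
  private sessionId: string | null = null;

  // Call with every raw text frame received from the server.
  observe(raw: string): void {
    const msg = JSON.parse(raw) as { type?: unknown; sessionId?: unknown };
    if (msg.type === "session" && typeof msg.sessionId === "string") {
      this.sessionId = msg.sessionId;
    }
  }

  // Forget the stored ID, e.g. after sending { "type": "new_session" }.
  reset(): void {
    this.sessionId = null;
  }

  // Serialize a chat message, including sessionId only when one is known.
  chat(message: string): string {
    const msg: { type: "chat"; message: string; sessionId?: string } = { type: "chat", message };
    if (this.sessionId !== null) msg.sessionId = this.sessionId;
    return JSON.stringify(msg);
  }
}

const tracker = new SessionTracker();
const beforeSession = tracker.chat("Hello");
tracker.observe('{ "type": "session", "sessionId": "sess_abc123" }');
// After a reconnect, the same tracker keeps producing messages for the old session.
const afterSession = tracker.chat("Continue our conversation");
```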
Protocol Notes
- Tool call XML (<tool_call>) and bare JSON are stripped from chunks using a brace-depth walker algorithm
- Streaming is buffered internally when tools are registered; only clean prose reaches the client
- Tool calls split across multiple WebSocket chunks are properly buffered and stripped
- The file_change event broadcasts on file edits, renames, and deletes
- WebSocket connections are upgraded from standard HTTP at /ws
- Node WebSocket at /ws/node uses token-based registration with heartbeat/ACK protocol
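To illustrate the general idea of a brace-depth walker, the sketch below removes `{...}` spans from streamed text while carrying its state across chunk boundaries. It is not CortexPrism's implementation: the `JsonStripper` class is hypothetical, it handles only bare JSON (not the <tool_call> XML form), and it drops every brace-delimited span, including braces that might legitimately appear in prose.

```typescript
// Illustrative only: a streaming brace-depth walker that drops {...} spans.
class JsonStripper {
  private depth = 0;        // current nesting depth of open braces
  private inString = false; // inside a JSON string within an object
  private escaped = false;  // previous character was a backslash in a string

  // Feed one streamed chunk; returns only the text outside any {...} span.
  push(chunk: string): string {
    let out = "";
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk.charAt(i);
      if (this.depth === 0) {
        if (ch === "{") this.depth = 1;
        else out += ch;
        continue;
      }
      // Inside an object: track strings so braces in string values are ignored.
      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') {
        this.inString = true;
      } else if (ch === "{") {
        this.depth++;
      } else if (ch === "}") {
        this.depth--;
      }
    }
    return out;
  }
}

const stripper = new JsonStripper();
// A tool call split across two chunks, with a "}" inside a string value.
const cleaned =
  stripper.push('Sure. {"tool":"web_se') +
  stripper.push('arch","args":{"q":"a}b"}} Done.');
```

Because depth and string state persist between `push` calls, a span that starts in one chunk and ends in a later one is still removed as a unit.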
See Also
- Agent Loop — How messages flow through the execution engine
- Voice Pipeline — Speech-to-text and text-to-speech
- REST API — Full API reference