CortexPrism v0.51.0 — Agent Autonomy: Runtime Tool Forging, Multi-Agent Orchestration, HEXACO Personalities

CortexPrism v0.51.0 is here: the agent autonomy release. Instead of being passive executors of pre-defined tools, agents can now create their own tools at runtime, orchestrate multi-agent workflows, express distinct personalities through a HEXACO-based personality system, and be evaluated with a standardized memory benchmark. The release also ships 10 built-in agents (five of them new specialists), a checkpoint time-travel UI, and streamlined navigation.

Runtime Tool Forging — Agents Create Tools

The most transformative feature in v0.51.0 is runtime tool forging. Agents can now create, test, and export custom tools on-demand through four new built-in tools:

tool_forge takes a name, description, and TypeScript code. It runs a static safety scan for unsafe patterns and can optionally call an LLM security judge. Pure-compute code then executes in a sandboxed Deno Worker, while shell-touching code executes in the Docker sandbox. The resulting tool is registered in a session-scoped forged-tool registry.
forged_call invokes a previously forged tool by name with arbitrary arguments.
tool_export promotes a forged tool to the persistent skills system (lifecycle: candidate) so it survives across sessions.
tool_list_forged lists all forged tools registered in the current session.

This enables adaptive tooling: when an agent encounters a task that no existing tool handles well, it can create a specialized tool for that task, test it, and, if the test succeeds, export it as a reusable skill. Safety is layered: static pattern scanning, optional LLM security review, sandboxed execution, and session-scoped isolation.
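
As a rough illustration of the first and last safety layers (static pattern scanning and a session-scoped registry), here is a minimal TypeScript sketch. Everything in it is an assumption made for illustration: the deny-list patterns, the ForgedTool shape, and the ForgedToolRegistry class are hypothetical and are not taken from the CortexPrism source.

```typescript
// Hypothetical sketch: names, patterns, and shapes are illustrative assumptions,
// not CortexPrism's actual implementation.

interface ForgedTool {
  name: string;
  description: string;
  code: string;
}

// Assumed deny-list; a real scanner would likely be far more thorough.
const UNSAFE_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "dynamic eval", pattern: /\beval\s*\(/ },
  { label: "Function constructor", pattern: /\bnew\s+Function\s*\(/ },
  { label: "subprocess spawn", pattern: /\bDeno\.(run|Command)\b/ },
  { label: "environment access", pattern: /\bDeno\.env\b/ },
];

// Returns the label of every unsafe pattern found in the source text.
function staticSafetyScan(code: string): string[] {
  return UNSAFE_PATTERNS
    .filter(({ pattern }) => pattern.test(code))
    .map(({ label }) => label);
}

// Session-scoped registry: one instance per session, discarded with it.
class ForgedToolRegistry {
  private tools = new Map<string, ForgedTool>();

  // Registers the tool only if the static scan finds no violations.
  forge(tool: ForgedTool): { ok: true } | { ok: false; violations: string[] } {
    const violations = staticSafetyScan(tool.code);
    if (violations.length > 0) return { ok: false, violations };
    this.tools.set(tool.name, tool);
    return { ok: true };
  }

  // Names of all tools forged in this session.
  list(): string[] {
    return Array.from(this.tools.keys());
  }
}
```

A regex deny-list like this is easy to bypass on its own, which is presumably why the release pairs it with an optional LLM review and sandboxed execution rather than relying on scanning alone.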

Multi-Agent Orchestration — 6 Composable Strategies

A single orchestrate tool now exposes six composable multi-agent execution strategies, all built on the spawnSubAgent system:

Sequential chains agents; each receives the previous agent's output as context. Perfect for multi-step workflows where each step builds on the previous.
Parallel runs agents concurrently via Promise.allSettled; a synthesiser agent merges outputs. Ideal for gathering multiple perspectives or approaches simultaneously.
Debate assigns N agents to argue positions for R rounds; an impartial judge synthesises the final answer. Excellent for exploring multiple viewpoints and reaching consensus.
Review-loop has a writer agent draft and a reviewer agent critique, iterating up to max_iterations times until the reviewer emits an approval keyword. Perfect for iterative refinement.
Hierarchical has a coordinator agent decompose the task, worker agents execute the sub-tasks in parallel, and the coordinator synthesise the results. Great for complex project coordination.
Graph accepts user-defined DAGs of {id, task, dependsOn[]} nodes with topological execution and dependency context injection. Maximum flexibility for custom workflows.
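
To make the Graph strategy more concrete, the sketch below shows one way topological execution with dependency context injection could work. The node shape follows the {id, task, dependsOn[]} form above; the runGraph function and the RunAgent callback are hypothetical stand-ins, and the real orchestrate tool may schedule nodes differently.

```typescript
// Node shape follows the {id, task, dependsOn[]} form described above.
interface GraphNode {
  id: string;
  task: string;
  dependsOn: string[];
}

// Hypothetical stand-in for whatever actually runs a sub-agent.
type RunAgent = (task: string, context: Record<string, string>) => Promise<string>;

// Executes nodes in dependency order, wave by wave (a form of Kahn's algorithm).
// Nodes whose dependencies have all completed run concurrently in the same wave.
async function runGraph(nodes: GraphNode[], runAgent: RunAgent): Promise<Map<string, string>> {
  const ids = new Set(nodes.map((n) => n.id));
  for (const n of nodes) {
    for (const dep of n.dependsOn) {
      if (!ids.has(dep)) throw new Error(`Unknown dependency "${dep}" on node "${n.id}"`);
    }
  }

  const results = new Map<string, string>();
  let pending = nodes.slice();

  while (pending.length > 0) {
    const ready = pending.filter((n) => n.dependsOn.every((d) => results.has(d)));
    // No runnable node while work remains means the graph has a cycle.
    if (ready.length === 0) throw new Error("Cycle detected: graph is not a DAG");

    const finished = await Promise.all(
      ready.map(async (n) => {
        // Dependency context injection: pass only the outputs this node depends on.
        const context: Record<string, string> = {};
        for (const d of n.dependsOn) context[d] = results.get(d) ?? "";
        return [n.id, await runAgent(n.task, context)] as const;
      }),
    );
    for (const [id, output] of finished) results.set(id, output);
    pending = pending.filter((n) => !results.has(n.id));
  }
  return results;
}
```

Running each wave with Promise.all means a single failing node rejects the whole run. A variant built on Promise.allSettled, which the Parallel strategy is described as using, could instead record per-node failures and continue with unaffected branches.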

HEXACO Personality System

Agents can now be configured with a six-factor HEXACO personality (honesty, emotionality, extraversion, agreeableness, conscientiousness, openness — each ∈ [0, 1]). This scientifically-grounded personality model drives:

System prompt injection — buildPersonalityPrompt() generates a natural-language paragraph describing the agent's voice, honesty, emotional tone, extraversion, agreeableness, conscientiousness, and openness, prepended to the system prompt on every turn.
Memory retrieval bias — getMemoryBiasWeights() returns per-tier multipliers (episodic, semantic, procedural, preference) and BM25/vector balance weights derived from personality scores.
Response style hints — buildResponseStyleHints() produces brief post-processing nudges (structured output, warmth, perspective acknowledgement, creative alternatives).
MQM routing hints — getMqmPersonalityHints() returns accuracyWeight, creativityWeight, and preferFast signals for the Model Quartermaster.

The personality field is optional on AgentConfig; absent or neutral scores (0.5) produce no change in behavior.
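
The following sketch illustrates the idea behind system prompt injection, including the rule that absent or neutral scores change nothing. It is not the real buildPersonalityPrompt(): the function name sketchPersonalityPrompt, the trait phrases, and the 0.1 deadband around 0.5 are all assumptions chosen for illustration.

```typescript
// The six factors named above, each expected in [0, 1].
interface HexacoPersonality {
  honesty: number;
  emotionality: number;
  extraversion: number;
  agreeableness: number;
  conscientiousness: number;
  openness: number;
}

// Illustrative wording only; the real prompt text is not shown in this post.
const TRAIT_PHRASES: Record<keyof HexacoPersonality, { high: string; low: string }> = {
  honesty: { high: "You are candid and avoid flattery.", low: "You are diplomatic and tactful." },
  emotionality: { high: "You respond with empathy.", low: "You keep a detached, even tone." },
  extraversion: { high: "You are energetic and talkative.", low: "You are reserved and concise." },
  agreeableness: { high: "You are accommodating.", low: "You challenge weak ideas directly." },
  conscientiousness: { high: "You are methodical and thorough.", low: "You favour speed over polish." },
  openness: { high: "You suggest unconventional options.", low: "You prefer proven approaches." },
};

const NEUTRAL = 0.5;
const DEADBAND = 0.1; // assumed: scores within 0.5 +/- 0.1 are treated as neutral

// Builds a short paragraph from non-neutral traits; returns "" when nothing applies.
function sketchPersonalityPrompt(p?: Partial<HexacoPersonality>): string {
  if (!p) return "";
  const lines: string[] = [];
  for (const trait of Object.keys(TRAIT_PHRASES) as (keyof HexacoPersonality)[]) {
    const raw = p[trait];
    if (raw === undefined) continue;
    const score = Math.min(1, Math.max(0, raw)); // clamp into [0, 1]
    if (Math.abs(score - NEUTRAL) <= DEADBAND) continue;
    lines.push(score > NEUTRAL ? TRAIT_PHRASES[trait].high : TRAIT_PHRASES[trait].low);
  }
  return lines.join(" ");
}
```

Because an empty string is returned for absent or neutral input, a caller can prepend the result to the system prompt unconditionally without altering the behavior of agents that have no personality configured.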

Memory Benchmark Runner — LongMemEval Compatible

Memory systems need standardized evaluation. v0.51.0 introduces a comprehensive benchmarking subsystem:

Core runner (src/eval/memory-bench.ts) supports configurable concurrency, token-overlap + Jaccard scoring, per-category aggregation, and JSON persistence to ~/.cortex/data/memory_bench_results.json and memory_bench_history.json.
CLI command cortex eval memory supports --suite <file>, --sample <n>, --full, and --json flags.
REST API — GET /api/eval/memory/results, GET /api/eval/memory/history, POST /api/eval/memory/run.
Web UI — new Memory Benchmark page with summary stat cards, per-category accuracy bar chart, per-question result table, and historical run trend table. A one-click ▶ Run Benchmark button triggers a live run via the API.
CI workflow — .github/workflows/memory-bench.yml runs the benchmark weekly (Monday 06:00 UTC) and on manual dispatch; results are uploaded as a GitHub Actions artifact and summarised in the job step summary.

The benchmark is compatible with LongMemEval standards, enabling cross-system comparison of memory performance.
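
For readers unfamiliar with the scoring terms, here is a small sketch of token-overlap and Jaccard scoring between an expected answer and a model answer. The tokenizer, the recall-style definition of token overlap, and the equal-weight blend are assumptions; the actual formulas in src/eval/memory-bench.ts may differ.

```typescript
// Lowercase, replace punctuation with spaces, split on whitespace (ASCII-only for brevity).
function tokenize(text: string): string[] {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/).filter(Boolean);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over unique tokens.
function jaccard(a: string, b: string): number {
  const setA = new Set(tokenize(a));
  const setB = new Set(tokenize(b));
  if (setA.size === 0 && setB.size === 0) return 1; // two empty answers are identical
  let intersection = 0;
  setA.forEach((t) => {
    if (setB.has(t)) intersection++;
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Token overlap, assumed recall-style: fraction of expected tokens present in the answer.
function tokenOverlap(expected: string, answer: string): number {
  const expectedTokens = new Set(tokenize(expected));
  if (expectedTokens.size === 0) return 0;
  const answerTokens = new Set(tokenize(answer));
  let hits = 0;
  expectedTokens.forEach((t) => {
    if (answerTokens.has(t)) hits++;
  });
  return hits / expectedTokens.size;
}

// Assumed blend of the two metrics; the weight is a hypothetical parameter.
function blendedScore(expected: string, answer: string, overlapWeight = 0.5): number {
  return overlapWeight * tokenOverlap(expected, answer) +
    (1 - overlapWeight) * jaccard(expected, answer);
}
```

The two metrics behave differently on long answers: token overlap rewards an answer that contains every expected token regardless of length, while Jaccard penalises extra tokens, so blending them discourages padding.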

10 Built-in Agents (5 New, 5 Refined)

The agent roster now ships with 10 selectable built-in agents. Five new specialist agents join the existing five:

Writer ✍️ — technical documentation, changelogs, READMEs, API references
DevOps 🚀 — Docker, Kubernetes, Terraform, CI/CD pipelines
Security 🔐 — OWASP Top 10 auditing, CVE scanning, compliance review (read-only)
Code Reviewer 👁️ — structured BLOCKER/SUGGESTION/NITPICK/QUESTION review format (read-only)
QA / Tester 🧪 — test generation, coverage analysis, regression discipline

All five existing agents (Assistant, Developer, Researcher, Architect, Analyst) received deep soul rewrites adding Capabilities, Guardrails, and Limitations sections, explicit sub-agent delegation hints, and improved output format specs.

Checkpoint Time-Travel UI

The Memori page (/memori) now renders a full two-panel timeline: a session-grouped checkpoint list on the left and a rich detail view on the right. Each checkpoint shows turn number, goals, message count, tool calls, and workspace snapshot. Two action buttons — Resume here (restore the checkpoint into the current session) and Branch from here (fork into a new child session) — are available on every checkpoint. This makes it easy to experiment with different approaches from any point in a conversation.

Streamlined Navigation

The UI was consolidated to reduce navigation complexity:

Sandbox now includes a Code Runner tab (previously the standalone coderunner page)
Remote & Computer merges the former remote (Remote Agents) and computer (Computer Use) pages into one page with two tabs
MCP merges mcp (Connections) and mcp-gateway (Gateway) into one page with two tabs
System Health (formerly Daemons) merges daemon process monitoring and OS health metrics into one page with two tabs
Automation expands to a 5-tab hub: Hooks, Triggers, Workflows, Jobs, and Eval — replacing four separate nav entries
Extensions gains a Panels tab, absorbing the standalone Plugin Panels page
Activity (Lens) moved from the Knowledge category to System, where audit/observability tooling belongs

Plugin System Enhancements

The plugin system received major improvements:

Extensions top-nav category — plugins now have a dedicated Extensions top-nav tab (sixth tab in the header). Plugin-contributed panels appear as first-class sub-nav items under Extensions.
Plugin sidebar slot injection — plugins declaring ui:panel now have their panels registered in the ui-slots registry at load time. A new GET /api/plugins/slots endpoint exposes live slot registrations. Sidebar plugins are clickable and open in inline modal iframes.
Plugin middleware pipeline hooks — ESM plugins can now export middlewarePre and middlewarePost functions. When loaded, plugins declaring these capabilities have their functions automatically registered as pre-tool/post-tool pipeline hooks.
Plugin event bus wiring — the plugin event bus now receives live agent lifecycle events: agent:turn-start, tool:pre-execute, tool:post-execute, and agent:turn-end.
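
As an illustration of the middleware pipeline hooks, the sketch below shows what an ESM plugin exporting middlewarePre and middlewarePost might look like. Only the two export names come from this release; the ToolCall and ToolResult shapes, the argument order, and the return-value convention are assumptions, so check the plugin documentation for the real signatures.

```typescript
// Hypothetical shapes: the release names middlewarePre and middlewarePost but
// does not show their signatures, so everything below is assumed.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface ToolResult {
  tool: string;
  output: string;
}

const MAX_OUTPUT_CHARS = 2000; // illustrative limit

// Pre-tool hook: trims stray whitespace from string arguments before the tool runs.
export function middlewarePre(call: ToolCall): ToolCall {
  const args: Record<string, unknown> = {};
  for (const key of Object.keys(call.args)) {
    const value = call.args[key];
    args[key] = typeof value === "string" ? value.trim() : value;
  }
  return { tool: call.tool, args };
}

// Post-tool hook: truncates very long output so it does not flood the context.
export function middlewarePost(_call: ToolCall, result: ToolResult): ToolResult {
  if (result.output.length <= MAX_OUTPUT_CHARS) return result;
  return {
    tool: result.tool,
    output: result.output.slice(0, MAX_OUTPUT_CHARS) + " [truncated]",
  };
}
```

Both hooks return new objects instead of mutating their inputs, which keeps them safe to chain if several plugins register hooks for the same tool.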

Technical Excellence

CortexPrism maintains its commitment to technical excellence:

Deno 2.x strict TypeScript throughout the codebase — single binary, no Docker required
SQLite (WAL mode) via libSQL for reliable data persistence
6 workspace packages with 41 pure TypeScript contract interfaces in a clean dependency graph
24 LLM providers with a unified streaming interface
5-tier persistent memory with hybrid search, automatic learning, and health monitoring
Rigorous security — Parallax policy validator + LLM supervisor + 18 recently resolved security issues
Zero telemetry — everything runs on your hardware

Get Started

Ready to experience agent autonomy? Installation is simple:

# Install
curl -fsSL https://cortexprism.io/install.sh | bash

# Setup and start
cortex setup
cortex serve

# Open http://localhost:3000

Already running? Upgrade in place:

cortex self update

The project is Apache 2.0 licensed, fully open source, and has zero telemetry. Everything runs on your hardware.

GitHub: github.com/CortexPrism/cortex
Changelog: CHANGELOG.md

Built with Deno. 6 packages. Agent autonomy. Runtime tool forging. Multi-agent orchestration. HEXACO personalities. Zero telemetry.