Roadmap

What we’re building.

Hand-curated. Updated as work moves. Items here are commitments — when one ships it moves to the changelog, not down the page.

Want to influence what’s next? Email info@aldo.tech. Customer pulls move things.

Now

8 items

In flight this week. Either nearly done or actively coded against.

  1. this week

    openai-compat adapter — downgrade `response_format: json_object` to `text` when target rejects it

    platform

    After the local-discovery + harness-exit fixes shipped, re-running the agency dry-run against LM Studio surfaced the next layer: the openai-compat adapter maps `decoding.mode: json` (in the agency YAMLs) to `response_format: { type: 'json_object' }`. OpenAI accepts that; LM Studio's stricter spec only accepts `'json_schema'` or `'text'`. Three paths: (a) per-provider response_format mapping (cleanest, half-day), (b) author local-friendly agency variants with `decoding.mode: free`, (c) per-provider config knob in the catalog YAML. After (a), qwen3.x / deepseek-r1 can complete a full agency cascade against LM Studio at $0.
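
    A minimal sketch of path (a), assuming a hypothetical per-provider capability table inside the openai-compat adapter; the provider keys and the ProviderCaps shape are illustrative, not the real adapter API:

    ```ts
    // Hypothetical sketch of option (a): per-provider response_format mapping.
    type ResponseFormat = { type: 'json_object' | 'json_schema' | 'text' };

    interface ProviderCaps {
      // response_format types the target endpoint actually accepts
      responseFormats: ReadonlyArray<ResponseFormat['type']>;
    }

    const CAPS: Record<string, ProviderCaps> = {
      openai: { responseFormats: ['json_object', 'json_schema', 'text'] },
      'lm-studio': { responseFormats: ['json_schema', 'text'] }, // rejects json_object
    };

    function mapResponseFormat(provider: string, requested: ResponseFormat): ResponseFormat {
      const caps = CAPS[provider];
      if (!caps || caps.responseFormats.includes(requested.type)) return requested;
      // Downgrade: decoding.mode: json still asks for JSON in the prompt,
      // but the wire request falls back to plain text for strict providers.
      return { type: 'text' };
    }
    ```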

  2. this week

    preTool / postTool hooks fire from inside the engine dispatch loop

    platform

    The Wave-CLI hooks system loads `~/.aldo/hooks.json` + `<workspace>/.aldo/hooks.json` and fires preRun / postRun around every TUI turn. preTool and postTool entries are loaded but don't fire yet — that needs a hook point inside the engine's tool-dispatch loop so we can shell out before/after every tool call without forcing every caller to instrument. The lib + settings shape are stable; this is a one-engine-PR change.
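
    A rough sketch of the intended hook point; runHook, dispatchTool, and ToolCall are hypothetical stand-ins for engine internals, not the real API:

    ```ts
    interface ToolCall { name: string; args: unknown; }

    async function runHook(event: 'preTool' | 'postTool', payload: unknown): Promise<void> {
      // placeholder: would shell out to the matching entries loaded from
      // ~/.aldo/hooks.json + <workspace>/.aldo/hooks.json
    }

    async function dispatchTool(call: ToolCall): Promise<unknown> {
      return { ok: true }; // placeholder for the existing tool dispatch
    }

    // Inside the engine's tool-dispatch loop, each call gets wrapped once,
    // so no caller has to instrument anything itself.
    export async function dispatchWithHooks(call: ToolCall): Promise<unknown> {
      await runHook('preTool', { tool: call.name, args: call.args });
      const result = await dispatchTool(call);
      await runHook('postTool', { tool: call.name, result });
      return result;
    }
    ```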

  3. this week

    Local-model tool-use coaching for thinking-style models

    platform

    qwen3.6 / DeepSeek-R1 / similar thinking models reason about which tools to call and then emit the calls as prose instead of `tool_call` deltas. The aldo CLI infrastructure routes fine; the system prompt + the `tool_choice: "auto"` flag in the openai-compat adapter don't reliably nudge them to emit structured calls. Two-day fix: stronger system prompt for the iterative loop + an adapter knob for `tool_choice: "required"` when the agent spec opts in. Captured during the Wave-CLI dogfood against LM Studio.
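
    A small sketch of the opt-in adapter knob; the AgentSpec field name is an assumption:

    ```ts
    interface AgentSpec {
      // opt-in: force structured tool calls for thinking-style local models
      forceToolCalls?: boolean;
    }

    function resolveToolChoice(spec: AgentSpec): 'auto' | 'required' {
      // 'required' makes OpenAI-compatible servers emit tool_call deltas
      // instead of letting the model narrate the call in prose.
      return spec.forceToolCalls ? 'required' : 'auto';
    }
    ```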

  4. this week

    Customer engagement UI — milestones, sign-off, comments

    web

    The Wave-Agency push (2026-05-05) shipped the engagement-surface API: /v1/engagements, milestones with sign-off + reject + reason captured, threaded comments in three kinds (comment / change_request / architecture_decision), all tenant-scoped. The customer-facing pages (/engagements list, /engagements/[slug] detail with milestone timeline + comment thread + sign-off button) are the natural follow-up — purely frontend, the wire surface is complete.
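
    Hypothetical TypeScript shapes for what the frontend pages would consume; the field names are assumptions, not the actual wire contract:

    ```ts
    type CommentKind = 'comment' | 'change_request' | 'architecture_decision';

    interface Milestone {
      id: string;
      title: string;
      signedOff: boolean;
      rejected?: { reason: string };  // reject + reason captured
    }

    interface EngagementComment {
      id: string;
      kind: CommentKind;
      body: string;
      parentId?: string;              // threaded
    }

    interface Engagement {
      slug: string;                   // drives /engagements/[slug]
      milestones: Milestone[];
      comments: EngagementComment[];
    }

    // e.g. the list page renders the (tenant-scoped) result of GET /v1/engagements.
    ```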

  5. this week

    In-flight termination on tenant budget-cap crossing

    platform

    Wave-Agency landed the engagement-level USD cap at the POST /v1/runs gate (HTTP 402 tenant_budget_exceeded). The next chunk wires the same check inside the iterative loop’s pre-step termination predicate so a stuck run also stops mid-cycle, plus the supervisor pre-spawn hook so the composite tree halts before fanning out children.
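
    A minimal sketch of the pre-step predicate, assuming hypothetical getTenantSpendUsd / TerminationReason names rather than the real engine API:

    ```ts
    interface RunContext { tenantId: string; engagementCapUsd: number; }

    async function getTenantSpendUsd(tenantId: string): Promise<number> {
      return 0; // placeholder: would read accumulated spend for the engagement
    }

    type TerminationReason = { stop: true; code: 'tenant_budget_exceeded' } | { stop: false };

    // Evaluated before every iterative-loop step and in the supervisor's pre-spawn
    // hook, so a stuck run halts mid-cycle instead of only being rejected at POST /v1/runs.
    export async function budgetCapPredicate(ctx: RunContext): Promise<TerminationReason> {
      const spent = await getTenantSpendUsd(ctx.tenantId);
      return spent >= ctx.engagementCapUsd
        ? { stop: true, code: 'tenant_budget_exceeded' }
        : { stop: false };
    }
    ```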

  6. this week

    live:network harness instrumentation — fast-fail on per-stage progress

    eval

    The Wave-Agency dogfood smoke surfaced a real signal: the live:network run wedges between bootstrap and runtime.runAgent on a fresh disposable worktree (process at 0% CPU, no Ollama traffic, no .aldo-memory directory). The harness needs per-stage instrumentation + fast-fail timeouts so a single dispatch reports "stuck in stage X for 60s" instead of going silent. After that, the dogfood-against-local-Ollama story turns up either nothing (✅) or a real punch list ($0 of inference, either way).
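
    A sketch of the fast-fail wrapper the harness needs; the helper name and signature are assumptions:

    ```ts
    async function withStageTimeout<T>(
      stage: string,
      timeoutMs: number,
      work: () => Promise<T>,
    ): Promise<T> {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`stuck in stage ${stage} for ${timeoutMs / 1000}s`)),
          timeoutMs,
        );
      });
      try {
        // whichever settles first wins: the stage finishing or the fast-fail timer
        return await Promise.race([work(), timeout]);
      } finally {
        if (timer !== undefined) clearTimeout(timer);
      }
    }

    // e.g. await withStageTimeout('runtime.runAgent', 60_000, () => runtime.runAgent(spec));
    ```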

  7. this week

    mcp.aldo.tech hosted MCP endpoint — DNS + edge route

    ops

    The Streamable-HTTP MCP server (@aldo-ai/mcp-platform) is built, tested, container-ready. Pure ops follow-up: DNS A record, edge nginx route to the new container, TLS via the existing certbot path, docker-compose entry. Once live, ChatGPT custom GPTs / Cursor / any HTTP-only MCP client can drive ALDO directly.

  8. this week

    Publish Python + TypeScript SDKs and the VS Code extension

    sdk

    All three are dry-run green; the release workflows have confirm-version guards. Once the PyPI / npm / VSCE tokens and the VS Code Marketplace publisher account land, the workflows fire and the public install paths light up.

Next

6 items

Confirmed direction. Picked up the moment Now clears.

  1. 1–2 weeks

    Stripe live billing — flip pricing CTAs to real checkout

    platform

    Backend is 100% wired (webhook switchboard, subscription store, trial-gate, customer portal). Five env vars away from live: STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SIGNING_SECRET, STRIPE_PRICE_SOLO, STRIPE_PRICE_TEAM, STRIPE_BILLING_PORTAL_RETURN_URL. Push secrets + redeploy and the pricing page is chargeable.

  2. 1–2 weeks

    Engine resolve-from-store of agent.promptRef

    platform

    Wave-4 shipped prompts as first-class data with version history. The wire shape + UI are done; the engine still inlines prompt text. One-file follow-up in @aldo-ai/registry to read promptRef → fetch from prompts-store → cache per-run.
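
    A hedged sketch of the resolve path; PromptsStore and the per-run cache shape are assumptions about @aldo-ai/registry internals, not its actual API:

    ```ts
    interface PromptsStore {
      get(ref: string): Promise<{ text: string; version: number }>;
    }

    const perRunCache = new Map<string, string>();

    export async function resolvePrompt(
      agent: { promptRef?: string; prompt?: string },
      store: PromptsStore,
    ): Promise<string> {
      if (!agent.promptRef) return agent.prompt ?? '';   // legacy inlined prompt text
      const cached = perRunCache.get(agent.promptRef);
      if (cached !== undefined) return cached;
      const { text } = await store.get(agent.promptRef); // fetch from prompts-store
      perRunCache.set(agent.promptRef, text);            // cache per-run
      return text;
    }
    ```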

  3. 1–2 weeks

    Production PromptRunner via gateway

    platform

    Today /v1/prompts/:id/test returns a deterministic stub. Wiring the real gateway through (capability routing, privacy enforcement, telemetry into usage_records) lights up the prompt playground end-to-end.
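
    A hedged sketch of what replaces the stub behind /v1/prompts/:id/test; Gateway and recordUsage are hypothetical stand-ins for the real gateway client and the usage_records write:

    ```ts
    interface Gateway {
      complete(req: {
        capability: string;
        privacyTier: string;
        prompt: string;
      }): Promise<{ text: string; usdCost: number }>;
    }

    async function recordUsage(tenantId: string, usdCost: number): Promise<void> {
      // placeholder: would insert a row into usage_records
    }

    export async function runPromptTest(
      gateway: Gateway,
      tenantId: string,
      privacyTier: string,
      promptText: string,
    ): Promise<{ text: string }> {
      // capability routing + privacy enforcement happen inside the gateway call
      const res = await gateway.complete({ capability: 'chat', privacyTier, prompt: promptText });
      await recordUsage(tenantId, res.usdCost); // telemetry into usage_records
      return { text: res.text };
    }
    ```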

  4. 1–2 weeks

    Git OAuth-app installation (GitHub + GitLab)

    mcp

    The Wave-3 git integration ships with PAT auth — paste a PAT into the connect form. OAuth apps remove that step entirely: customers click "Install ALDO" on GitHub, repos are connected via the app installation token, no PAT minting required.

  5. 1–2 weeks

    OCI Helm chart publish workflow

    ops

    charts/aldo-ai is in-repo, helm-lint clean, kubeconform 37/37 against k8s 1.31. Operators self-hosting today clone the repo. The publish workflow pushes the chart to ghcr.io so `helm install aldo-ai oci://ghcr.io/aldo-tech-labs/charts/aldo-ai` works, and the chart README on ArtifactHub becomes the docs entry point.

  6. 2 weeks

    Background scanner picks up inputs (today: re-spawns empty)

    platform

    The scanner that recovers orphaned queued runs spawns the engine with empty inputs because runs.inputs_jsonb does not yet exist. New migration adds the column; POST /v1/runs persists the inputs alongside the queued row; scanner reads them back. Closes the only correctness gap in the recovery path.
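
    An illustrative sketch of the recovery path once the column exists; every name beyond runs.inputs_jsonb is an assumption:

    ```ts
    // Migration (illustrative SQL):
    //   ALTER TABLE runs ADD COLUMN inputs_jsonb jsonb NOT NULL DEFAULT '{}'::jsonb;

    interface QueuedRun { id: string; inputs_jsonb: Record<string, unknown>; }

    async function listOrphanedQueuedRuns(): Promise<QueuedRun[]> {
      return []; // placeholder: SELECT id, inputs_jsonb FROM runs WHERE status = 'queued' ...
    }

    async function spawnEngine(runId: string, inputs: Record<string, unknown>): Promise<void> {
      // placeholder for the existing engine spawn, now fed the persisted inputs
    }

    export async function recoverOrphanedRuns(): Promise<void> {
      for (const run of await listOrphanedQueuedRuns()) {
        await spawnEngine(run.id, run.inputs_jsonb); // no longer re-spawned empty
      }
    }
    ```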

Later

6 items

Committed. Sequenced behind Next based on customer pulls + dependencies.

  1. 1–2 quarters

    SOC 2 Type 1 — auditor + evidence collection scaffolding

    security

    A multi-month effort in elapsed time: months of evidence collection plus an auditor. Engineering posture is already tight (privacy-tier router, audit log, encrypted secrets, runbook, retention enforcement). The auditor relationship + a Vanta-shaped evidence platform is the next slice.

  2. 1 quarter

    SSO / SAML on /login — mid-market unblock

    security

    Email + password is fine for solo + tiny team. The first 5+ seat customer needs OIDC + SAML. Identity-store schema, SCIM provisioning, and the /login UX flip are the three pieces.

  3. 1 quarter

    Per-row USD cost in eval-playground

    platform

    The playground table reserves the cost column today but reports an honest $0 because the gateway does not yet surface per-call USD on the response. Gateway change, not playground change.

  4. when the first tenant hits the threshold

    Spend dashboard SQL pivot

    platform

    JS-side bucket fold beats 3 round-trips on pglite up to ~1M usage rows in a 90-day window. Once a tenant exceeds that, pivot to date_trunc + GROUP BY in Postgres. Documented at the bottom of routes/spend.ts.
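
    A sketch of the Postgres-side pivot, assuming hypothetical usd_cost, created_at, and tenant_id columns on usage_records:

    ```ts
    // Replaces the JS bucket fold once a tenant's 90-day window exceeds ~1M rows.
    const SPEND_BY_DAY_SQL = `
      SELECT date_trunc('day', created_at) AS day,
             sum(usd_cost)                 AS usd
      FROM   usage_records
      WHERE  tenant_id = $1
        AND  created_at >= now() - interval '90 days'
      GROUP  BY 1
      ORDER  BY 1
    `;

    // e.g. const rows = await db.query(SPEND_BY_DAY_SQL, [tenantId]);
    ```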

  5. 1 quarter

    Real-cluster Helm e2e (kind in CI + per-cloud nightly)

    platform

    The chart lints + templates + kubeconforms green offline. To prevent a regression that lints but breaks on `helm install` against a real apiserver, add a kind-in-CI job and per-cloud (EKS / GKE / AKS) nightlies.

  6. 1 quarter

    Bidirectional git sync — write agent edits back via PR

    mcp

    Today the Wave-3 git integration is read-only: changes flow repo → ALDO. Bidirectional means an edit to an agent in /agents/[name] opens a PR in the connected repo. Net-new wedge — combined with the read-only sync, the repo becomes the source of truth and ALDO is the IDE.

Maybe

3 items

Conditional. Lands only when a specific signal arrives.

  1. EU data residency — second region + tenant routing

    platform

    Quarter-scale build. Only worth it for a confirmed EU customer who would not sign without it. Today's posture (single-region) is a procurement question we answer honestly; the build is a question we answer with cash on the table.

  2. Long-tail observability exporters (Datadog, Grafana, OTLP, Slack)

    platform

    Build 2–3 only when a named customer asks. The catalog approach is a procurement-checklist trap; we would rather ship the two integrations a real customer needs deeply than thirty integrations no one uses.

  3. Drag-drop visual workflow builder

    web

    Explicit non-goal per the platform invariants — the wedge is "agents are data" (YAML + git). Could become a yes if a customer with non-engineer authors ever needs it; would ship as one-way export to YAML so the source of truth stays declarative.

End of 2027 — 1.0

vision

What ALDO AI looks like at the end of 2027. Not a list of features — the shape of the product when the next 18 months land. Subject to change as customers pull us in directions we haven’t imagined yet, but this is the bet.

  • Hire-grade

    Hiring an agent feels like hiring a contractor

    A non-engineer drops a brief into ALDO; the platform resolves the right team, hands them the right tools, runs the work with the privacy posture the org needs, and reports back with citations + cost. The agent registry, the eval harness, the privacy router, the spend dashboard — all of it disappears into one workflow: scope → run → review → ship. The reference agency we run on internally is the worked example everyone forks.

  • Local 1st-class

    Local frontier-class is the default for sensitive work

    By end-2027 a 70B-class open model on a developer laptop or a small on-prem box matches frontier on most non-research tasks. ALDO routes to it by default for privacy_tier=sensitive, and the eval harness proves on every promotion that the local route did not regress. Cloud is the surge buffer, not the substrate.

  • Repo as truth

    Bidirectional git sync — the repo is the agent IDE

    Agents live in a customer’s monorepo as YAML + system prompts; ALDO is the runtime + the review surface. PR opens with eval scores attached; merge promotes; rollback is `git revert`. No "ALDO console drift vs production" — the console IS the production view of the repo. Composes with every CI/CD pipeline that exists.

  • Trust

    SOC 2 Type 2, HIPAA, EU residency, FedRAMP Moderate in flight

    The compliance posture caught up to the engineering posture (which has always been ahead). Procurement reviews close in days, not quarters. The privacy-tier router is auditable end-to-end and survives every red-team / pen-test cycle.

  • Distribution

    mcp.aldo.tech is the way most clients reach ALDO

    Hosted MCP endpoint with per-tenant auth, observability, and rate limits. Claude Desktop / Claude Code / Cursor / ChatGPT GPTs / Continue / Zed / Windsurf / VS Code all drop one config block and have the entire ALDO surface (agents, runs, datasets, evals) at their fingertips. The platform spreads through the protocol it was built around, not through SDKs we have to ship one-by-one.

  • Self-host

    Helm chart on ArtifactHub; Terraform modules per cloud

    A regulated customer goes from "we want this" to a running internal ALDO in under 4 hours with our docs + their existing k8s. The chart is real-cluster validated nightly across EKS / GKE / AKS / kind; Terraform modules cover IRSA / Workload Identity bindings. The "Enterprise — packaged build" line on the pricing page is a download URL, not marketing copy.

  • Observability

    Trace search rivals Datadog APM for agent runs

    Span-level filters, latency + cost heatmaps, OTLP export to whatever the customer already has. The flame graph drills into the model call, the tool call, the sub-agent, the diff against the previous run. A platform engineer who has never seen ALDO can debug a customer’s agent regression in 5 minutes.

  • Eval gate

    Eval-gated promotion the industry copies

    The same rubric that scored an agent in the playground gates its promotion to production. Customers ship agents like services: every change has a test, every regression blocks the deploy, every rollback restores the prior known-good. Adoption of the eval-gated promotion pattern is itself one of our best growth channels.

  • Customers

    20–50 paying teams; 3–5 lighthouse design partners

    Mix of small teams using ALDO Cloud and regulated orgs running self-host. Two named lighthouse partners are public references; three more are private. ARR > $2M with healthy gross margins. We grew without raising; if we raise, it’s for distribution, not survival.

If we land 70% of this, we’ve built the first agent platform a real engineering org would standardise on instead of patching together LangSmith + Braintrust + a framework + a vendor SLA every quarter.

Explicitly not doing

Listing these here so a prospect can disqualify us fast — your time matters more than our pipeline.

  • Hyperscaler-shape managed cloud (Bedrock / Vertex / Foundry)

    Wrong moat. Bedrock and friends own enterprise procurement + IAM + 15+ compliance certs each — we cannot beat them at their own game and we should not try.

  • LangChain-style framework

    We are framework-agnostic by design. The platform invariant: every code path goes through the gateway by capability + privacy + cost. Adding a framework above that would re-introduce the lock-in we exist to prevent.

  • Vibe-coding studio

    Other vendors say "not production-ready" out loud. We say the opposite: every primitive (specs, runs, evals, replays) is engineered to ship in production on day one.

See what already shipped

Hand-curated changelog updated on every meaningful release. Newest at the top.

View changelog →