Roadmap

What we’re building.

Hand-curated. Updated as work moves. Items here are commitments — when one ships it moves to the changelog, not down the page.

Want to influence what’s next? Email info@aldo.tech. Customer pulls move things.

Now

8 items

In flight this week. Either nearly done or actively coded against.

  1. this week

    openai-compat adapter — downgrade `response_format: json_object` to `text` when target rejects it

    platform

    After the local-discovery + harness-exit fixes shipped, re-running the agency dry-run against LM Studio surfaced the next layer: the openai-compat adapter maps `decoding.mode: json` (in the agency YAMLs) to `response_format: { type: 'json_object' }`. OpenAI accepts that; LM Studio's stricter spec only accepts `'json_schema'` or `'text'`. Three paths: (a) per-provider response_format mapping (cleanest, half-day), (b) author local-friendly agency variants with `decoding.mode: free`, (c) per-provider config knob in the catalog YAML. After (a), qwen3.x / deepseek-r1 can complete a full agency cascade against LM Studio at $0.
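
    A minimal sketch of path (a), assuming a hypothetical per-provider capability table inside the openai-compat adapter; the provider keys and the ProviderCaps shape are illustrative, not the real adapter API:

    ```ts
    // Hypothetical sketch of option (a): per-provider response_format mapping.
    type ResponseFormat = { type: 'json_object' | 'json_schema' | 'text' };

    interface ProviderCaps {
      // response_format types the target endpoint actually accepts
      responseFormats: ReadonlyArray<ResponseFormat['type']>;
    }

    const CAPS: Record<string, ProviderCaps> = {
      openai: { responseFormats: ['json_object', 'json_schema', 'text'] },
      'lm-studio': { responseFormats: ['json_schema', 'text'] }, // rejects json_object
    };

    function mapResponseFormat(provider: string, requested: ResponseFormat): ResponseFormat {
      const caps = CAPS[provider];
      if (!caps || caps.responseFormats.includes(requested.type)) return requested;
      // Downgrade: decoding.mode: json still asks for JSON in the prompt,
      // but the wire request falls back to plain text for strict providers.
      return { type: 'text' };
    }
    ```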

  2. this week

    preTool / postTool hooks fire from inside the engine dispatch loop

    platform

    The Wave-CLI hooks system loads `~/.aldo/hooks.json` + `<workspace>/.aldo/hooks.json` and fires preRun / postRun around every TUI turn. preTool and postTool entries are loaded but don't fire yet — that needs a hook point inside the engine's tool-dispatch loop so we can shell out before/after every tool call without forcing every caller to instrument. The lib + settings shape are stable; this is a one-engine-PR change.
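
    A rough sketch of the intended hook point; runHook, dispatchTool, and ToolCall are hypothetical stand-ins for engine internals, not the real API:

    ```ts
    interface ToolCall { name: string; args: unknown; }

    async function runHook(event: 'preTool' | 'postTool', payload: unknown): Promise<void> {
      // placeholder: would shell out to the matching entries loaded from
      // ~/.aldo/hooks.json + <workspace>/.aldo/hooks.json
    }

    async function dispatchTool(call: ToolCall): Promise<unknown> {
      return { ok: true }; // placeholder for the existing tool dispatch
    }

    // Inside the engine's tool-dispatch loop, each call gets wrapped once,
    // so no caller has to instrument anything itself.
    export async function dispatchWithHooks(call: ToolCall): Promise<unknown> {
      await runHook('preTool', { tool: call.name, args: call.args });
      const result = await dispatchTool(call);
      await runHook('postTool', { tool: call.name, result });
      return result;
    }
    ```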

  3. this week

    Local-model tool-use coaching for thinking-style models

    platform

    qwen3.6 / DeepSeek-R1 / similar thinking models reason about which tools to call and then emit the calls as prose instead of `tool_call` deltas. The aldo CLI infrastructure routes fine; the system prompt + the `tool_choice: "auto"` flag in the openai-compat adapter don't reliably nudge them to emit structured calls. Two-day fix: stronger system prompt for the iterative loop + an adapter knob for `tool_choice: "required"` when the agent spec opts in. Captured during the Wave-CLI dogfood against LM Studio.
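
    A small sketch of the opt-in adapter knob; the AgentSpec field name is an assumption:

    ```ts
    interface AgentSpec {
      // opt-in: force structured tool calls for thinking-style local models
      forceToolCalls?: boolean;
    }

    function resolveToolChoice(spec: AgentSpec): 'auto' | 'required' {
      // 'required' makes OpenAI-compatible servers emit tool_call deltas
      // instead of letting the model narrate the call in prose.
      return spec.forceToolCalls ? 'required' : 'auto';
    }
    ```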

  4. this week

    Customer engagement UI — milestones, sign-off, comments

    web

    The Wave-Agency push (2026-05-05) shipped the engagement-surface API: /v1/engagements, milestones with sign-off + reject + reason captured, threaded comments in three kinds (comment / change_request / architecture_decision), all tenant-scoped. The customer-facing pages (/engagements list, /engagements/[slug] detail with milestone timeline + comment thread + sign-off button) are the natural follow-up — purely frontend, the wire surface is complete.
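
    Hypothetical TypeScript shapes for what the frontend pages would consume; the field names are assumptions, not the actual wire contract:

    ```ts
    type CommentKind = 'comment' | 'change_request' | 'architecture_decision';

    interface Milestone {
      id: string;
      title: string;
      signedOff: boolean;
      rejected?: { reason: string };  // reject + reason captured
    }

    interface EngagementComment {
      id: string;
      kind: CommentKind;
      body: string;
      parentId?: string;              // threaded
    }

    interface Engagement {
      slug: string;                   // drives /engagements/[slug]
      milestones: Milestone[];
      comments: EngagementComment[];
    }

    // e.g. the list page renders the (tenant-scoped) result of GET /v1/engagements.
    ```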

  5. this week

    In-flight termination on tenant budget-cap crossing

    platform

    Wave-Agency landed the engagement-level USD cap at the POST /v1/runs gate (HTTP 402 tenant_budget_exceeded). The next chunk wires the same check inside the iterative loop’s pre-step termination predicate so a stuck run also stops mid-cycle, plus the supervisor pre-spawn hook so the composite tree halts before fanning out children.
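
    A minimal sketch of the pre-step predicate, assuming hypothetical getTenantSpendUsd / TerminationReason names rather than the real engine API:

    ```ts
    interface RunContext { tenantId: string; engagementCapUsd: number; }

    async function getTenantSpendUsd(tenantId: string): Promise<number> {
      return 0; // placeholder: would read accumulated spend for the engagement
    }

    type TerminationReason = { stop: true; code: 'tenant_budget_exceeded' } | { stop: false };

    // Evaluated before every iterative-loop step and in the supervisor's pre-spawn
    // hook, so a stuck run halts mid-cycle instead of only being rejected at POST /v1/runs.
    export async function budgetCapPredicate(ctx: RunContext): Promise<TerminationReason> {
      const spent = await getTenantSpendUsd(ctx.tenantId);
      return spent >= ctx.engagementCapUsd
        ? { stop: true, code: 'tenant_budget_exceeded' }
        : { stop: false };
    }
    ```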

  6. this week

    live:network harness instrumentation — fast-fail on per-stage progress

    eval

    The Wave-Agency dogfood smoke surfaced a real signal: the live:network run wedges between bootstrap and runtime.runAgent on a fresh disposable worktree (process at 0% CPU, no Ollama traffic, no .aldo-memory directory). The harness needs per-stage instrumentation + fast-fail timeouts so a single dispatch reports "stuck in stage X for 60s" instead of going silent. After that, the dogfood-against-local-Ollama story turns up either nothing (✅) or a real punch list ($0 of inference, either way).
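
    A sketch of the fast-fail wrapper the harness needs; the helper name and signature are assumptions:

    ```ts
    async function withStageTimeout<T>(
      stage: string,
      timeoutMs: number,
      work: () => Promise<T>,
    ): Promise<T> {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`stuck in stage ${stage} for ${timeoutMs / 1000}s`)),
          timeoutMs,
        );
      });
      try {
        // whichever settles first wins: the stage finishing or the fast-fail timer
        return await Promise.race([work(), timeout]);
      } finally {
        if (timer !== undefined) clearTimeout(timer);
      }
    }

    // e.g. await withStageTimeout('runtime.runAgent', 60_000, () => runtime.runAgent(spec));
    ```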

  7. this week

    mcp.aldo.tech hosted MCP endpoint — DNS + edge route

    ops

    The Streamable-HTTP MCP server (@aldo-ai/mcp-platform) is built, tested, container-ready. Pure ops follow-up: DNS A record, edge nginx route to the new container, TLS via the existing certbot path, docker-compose entry. Once live, ChatGPT custom GPTs / Cursor / any HTTP-only MCP client can drive ALDO directly.

  8. this week

    Publish Python + TypeScript SDKs and the VS Code extension

    sdk

    All three are dry-run green; the release workflows have confirm-version guards. Once the PyPI / npm / VSCE tokens and the VS Code Marketplace publisher account land, the workflows fire and the public install paths light up.

Next

6 items

Confirmed direction. Picked up the moment Now clears.

  1. 1–2 weeks

    Stripe live billing — flip pricing CTAs to real checkout

    platform

    Backend is 100% wired (webhook switchboard, subscription store, trial-gate, customer portal). Five env vars away from live: STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SIGNING_SECRET, STRIPE_PRICE_SOLO, STRIPE_PRICE_TEAM, STRIPE_BILLING_PORTAL_RETURN_URL. Push secrets + redeploy and the pricing page is chargeable.

  2. 1–2 weeks

    Engine resolve-from-store of agent.promptRef

    platform

    Wave-4 shipped prompts as first-class data with version history. The wire shape + UI are done; the engine still inlines prompt text. One-file follow-up in @aldo-ai/registry to read promptRef → fetch from prompts-store → cache per-run.
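
    A hedged sketch of the resolve path; PromptsStore and the per-run cache shape are assumptions about @aldo-ai/registry internals, not its actual API:

    ```ts
    interface PromptsStore {
      get(ref: string): Promise<{ text: string; version: number }>;
    }

    const perRunCache = new Map<string, string>();

    export async function resolvePrompt(
      agent: { promptRef?: string; prompt?: string },
      store: PromptsStore,
    ): Promise<string> {
      if (!agent.promptRef) return agent.prompt ?? '';   // legacy inlined prompt text
      const cached = perRunCache.get(agent.promptRef);
      if (cached !== undefined) return cached;
      const { text } = await store.get(agent.promptRef); // fetch from prompts-store
      perRunCache.set(agent.promptRef, text);            // cache per-run
      return text;
    }
    ```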

  3. 1–2 weeks

    Production PromptRunner via gateway

    platform

    Today /v1/prompts/:id/test returns a deterministic stub. Wiring the real gateway through (capability routing, privacy enforcement, telemetry into usage_records) lights up the prompt playground end-to-end.
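
    A hedged sketch of what replaces the stub behind /v1/prompts/:id/test; Gateway and recordUsage are hypothetical stand-ins for the real gateway client and the usage_records write:

    ```ts
    interface Gateway {
      complete(req: {
        capability: string;
        privacyTier: string;
        prompt: string;
      }): Promise<{ text: string; usdCost: number }>;
    }

    async function recordUsage(tenantId: string, usdCost: number): Promise<void> {
      // placeholder: would insert a row into usage_records
    }

    export async function runPromptTest(
      gateway: Gateway,
      tenantId: string,
      privacyTier: string,
      promptText: string,
    ): Promise<{ text: string }> {
      // capability routing + privacy enforcement happen inside the gateway call
      const res = await gateway.complete({ capability: 'chat', privacyTier, prompt: promptText });
      await recordUsage(tenantId, res.usdCost); // telemetry into usage_records
      return { text: res.text };
    }
    ```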

  4. 1–2 weeks

    Git OAuth-app installation (GitHub + GitLab)

    mcp

    The Wave-3 git integration ships with PAT auth — paste a PAT into the connect form. OAuth apps remove that step entirely: customers click "Install ALDO" on GitHub, repos are connected via the app installation token, no PAT minting required.

  5. 1–2 weeks

    OCI Helm chart publish workflow

    ops

    charts/aldo-ai is in-repo, helm-lint clean, kubeconform 37/37 against k8s 1.31. Operators self-hosting today clone the repo. The publish workflow pushes the chart to ghcr.io so `helm install aldo-ai oci://ghcr.io/aldo-tech-labs/charts/aldo-ai` works, and the chart README on ArtifactHub becomes the docs entry point.

  6. 2 weeks

    Background scanner picks up inputs (today: re-spawns empty)

    platform

    The scanner that recovers orphaned queued runs spawns the engine with empty inputs because runs.inputs_jsonb does not yet exist. New migration adds the column; POST /v1/runs persists the inputs alongside the queued row; scanner reads them back. Closes the only correctness gap in the recovery path.
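
    An illustrative sketch of the recovery path once the column exists; every name beyond runs.inputs_jsonb is an assumption:

    ```ts
    // Migration (illustrative SQL):
    //   ALTER TABLE runs ADD COLUMN inputs_jsonb jsonb NOT NULL DEFAULT '{}'::jsonb;

    interface QueuedRun { id: string; inputs_jsonb: Record<string, unknown>; }

    async function listOrphanedQueuedRuns(): Promise<QueuedRun[]> {
      return []; // placeholder: SELECT id, inputs_jsonb FROM runs WHERE status = 'queued' ...
    }

    async function spawnEngine(runId: string, inputs: Record<string, unknown>): Promise<void> {
      // placeholder for the existing engine spawn, now fed the persisted inputs
    }

    export async function recoverOrphanedRuns(): Promise<void> {
      for (const run of await listOrphanedQueuedRuns()) {
        await spawnEngine(run.id, run.inputs_jsonb); // no longer re-spawned empty
      }
    }
    ```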

Later

6 items

Committed. Sequenced behind Next based on customer pulls + dependencies.

  1. 1–2 quarters

    SOC 2 Type 1 — auditor + evidence collection scaffolding

    security

    A multi-month effort in elapsed time: months of evidence collection plus an auditor. Engineering posture is already tight (privacy-tier router, audit log, encrypted secrets, runbook, retention enforcement). The auditor relationship + a Vanta-shaped evidence platform is the next slice.

  2. 1 quarter

    SSO / SAML on /login — mid-market unblock

    security

    Email + password is fine for solo + tiny team. The first 5+ seat customer needs OIDC + SAML. Identity-store schema, SCIM provisioning, and the /login UX flip are the three pieces.

  3. 1 quarter

    Per-row USD cost in eval-playground

    platform

    The playground table reserves the cost column today but reports an honest $0 because the gateway does not yet surface per-call USD on the response. Gateway change, not playground change.

  4. when the first tenant hits the threshold

    Spend dashboard SQL pivot

    platform

    JS-side bucket fold beats 3 round-trips on pglite up to ~1M usage rows in a 90-day window. Once a tenant exceeds that, pivot to date_trunc + GROUP BY in Postgres. Documented at the bottom of routes/spend.ts.
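
    A sketch of the Postgres-side pivot, assuming hypothetical usd_cost, created_at, and tenant_id columns on usage_records:

    ```ts
    // Replaces the JS bucket fold once a tenant's 90-day window exceeds ~1M rows.
    const SPEND_BY_DAY_SQL = `
      SELECT date_trunc('day', created_at) AS day,
             sum(usd_cost)                 AS usd
      FROM   usage_records
      WHERE  tenant_id = $1
        AND  created_at >= now() - interval '90 days'
      GROUP  BY 1
      ORDER  BY 1
    `;

    // e.g. const rows = await db.query(SPEND_BY_DAY_SQL, [tenantId]);
    ```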

  5. 1 quarter

    Real-cluster Helm e2e (kind in CI + per-cloud nightly)

    platform

    The chart lints + templates + kubeconforms green offline. To prevent a regression that lints but breaks on `helm install` against a real apiserver, add a kind-in-CI job and per-cloud (EKS / GKE / AKS) nightlies.

  6. 1 quarter

    Bidirectional git sync — write agent edits back via PR

    mcp

    Today the Wave-3 git integration is read-only: changes flow repo → ALDO. Bidirectional means an edit to an agent in /agents/[name] opens a PR in the connected repo. Net-new wedge — combined with the read-only sync, the repo becomes the source of truth and ALDO is the IDE.

Maybe

3 items

Conditional. Lands only when a specific signal arrives.

  1. EU data residency — second region + tenant routing

    platform

    Quarter-scale build. Only worth it for a confirmed EU customer who would not sign without it. Today's posture (single-region) is a procurement question we answer honestly; the build is a question we answer with cash on the table.

  2. Long-tail observability exporters (Datadog, Grafana, OTLP, Slack)

    platform

    Build 2–3 only when a named customer asks. The catalog approach is a procurement-checklist trap; we would rather ship the two integrations a real customer needs deeply than thirty integrations no one uses.

  3. Drag-drop visual workflow builder

    web

    Explicit non-goal per the platform invariants — the wedge is "agents are data" (YAML + git). Could become a yes if a customer with non-engineer authors ever needs it; would ship as one-way export to YAML so the source of truth stays declarative.

End of 2027 — 1.0

vision

What ALDO AI looks like at the end of 2027. Not a list of features — the shape of the product when the next 18 months land. Subject to change as customers pull us in directions we haven’t imagined yet, but this is the bet.

  • Hire-grade

    Hiring an agent feels like hiring a contractor

    A non-engineer drops a brief into ALDO; the platform resolves the right team, hands them the right tools, runs the work with the privacy posture the org needs, and reports back with citations + cost. The agent registry, the eval harness, the privacy router, the spend dashboard — all of it disappears into one workflow: scope → run → review → ship. The reference agency we run on internally is the worked example everyone forks.

  • Local 1st-class

    Local frontier-class is the default for sensitive work

    By end-2027 a 70B-class open model on a developer laptop or a small on-prem box matches frontier on most non-research tasks. ALDO routes to it by default for privacy_tier=sensitive, and the eval harness proves on every promotion that the local route did not regress. Cloud is the surge buffer, not the substrate.

  • Repo as truth

    Bidirectional git sync — the repo is the agent IDE

    Agents live in a customer’s monorepo as YAML + system prompts; ALDO is the runtime + the review surface. PR opens with eval scores attached; merge promotes; rollback is `git revert`. No "ALDO console drift vs production" — the console IS the production view of the repo. Composes with every CI/CD pipeline that exists.

  • Trust

    SOC 2 Type 2, HIPAA, EU residency, FedRAMP Moderate in flight

    The compliance posture caught up to the engineering posture (which has always been ahead). Procurement reviews close in days, not quarters. The privacy-tier router is auditable end-to-end and survives every red-team / pen-test cycle.

  • Distribution

    mcp.aldo.tech is the way most clients reach ALDO

    Hosted MCP endpoint with per-tenant auth, observability, and rate limits. Claude Desktop / Claude Code / Cursor / ChatGPT GPTs / Continue / Zed / Windsurf / VS Code all drop one config block and have the entire ALDO surface (agents, runs, datasets, evals) at their fingertips. The platform spreads through the protocol it was built around, not through SDKs we have to ship one-by-one.

  • Self-host

    Helm chart on ArtifactHub; Terraform modules per cloud

    A regulated customer goes from "we want this" to a running internal ALDO in under 4 hours with our docs + their existing k8s. The chart is real-cluster validated nightly across EKS / GKE / AKS / kind; Terraform modules cover IRSA / Workload Identity bindings. The "Enterprise — packaged build" line on the pricing page is a download URL, not marketing copy.

  • Observability

    Trace search rivals Datadog APM for agent runs

    Span-level filters, latency + cost heatmaps, OTLP export to whatever the customer already has. The flame graph drills into the model call, the tool call, the sub-agent, the diff against the previous run. A platform engineer who has never seen ALDO can debug a customer’s agent regression in 5 minutes.

  • Eval gate

    Eval-gated promotion the industry copies

    The same rubric that scored an agent in the playground gates its promotion to production. Customers ship agents like services: every change has a test, every regression blocks the deploy, every rollback restores the prior known-good. Adoption of the eval-gated promotion pattern is itself one of our best growth channels.

  • Customers

    20–50 paying teams; 3–5 lighthouse design partners

    Mix of small teams using ALDO Cloud and regulated orgs running self-host. Two named lighthouse partners are public references; three more are private. ARR > $2M with healthy gross margins. We grew without raising; if we raise, it’s for distribution, not survival.

If we land 70% of this, we’ve built the first agent platform a real engineering org would standardise on instead of patching together LangSmith + Braintrust + a framework + a vendor SLA every quarter.

Explicitly not doing

Listing these here so a prospect can disqualify us fast — your time matters more than our pipeline.

  • Hyperscaler-shape managed cloud (Bedrock / Vertex / Foundry)

    Wrong moat. Bedrock and friends own enterprise procurement + IAM + 15+ compliance certs each — we cannot beat them at their own game and we should not try.

  • LangChain-style framework

    We are framework-agnostic by design. The platform invariant: every code path goes through the gateway by capability + privacy + cost. Adding a framework above that would re-introduce the lock-in we exist to prevent.

  • Vibe-coding studio

    Other vendors say "not production-ready" out loud. We say the opposite: every primitive (specs, runs, evals, replays) is engineered to ship in production on day one.

See what already shipped

Hand-curated changelog updated on every meaningful release. Newest at the top.

View changelog →