
Dataset uploads

Upload JSONL or CSV; bind to suites; share across agents.

Datasets are reusable bundles of cases. One dataset can back multiple suites, and multiple suites can target one agent. Versioning follows the rest of the registry: every push creates a new version, and the live pointer moves only when you move it explicitly.
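
The following is a minimal in-memory sketch of that versioning rule. It assumes nothing about the real registry beyond what this page states. `DatasetRegistry`, `push`, `promote`, and `live_rows` are hypothetical names, not the registry's API. The sketch only shows that a push creates a version without moving the live pointer.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class DatasetRegistry:
    """Hypothetical in-memory model of dataset versioning (not the real registry).

    Every push appends a new version; the live pointer changes only when
    promote() is called explicitly.
    """

    versions: list[list[dict]] = field(default_factory=list)
    live: int | None = None  # 1-based version number, None until first promote

    def push(self, rows: list[dict]) -> int:
        # Deep-copy so later edits to the caller's rows can't rewrite a pushed version.
        self.versions.append(copy.deepcopy(rows))
        return len(self.versions)  # new version number; self.live is untouched

    def promote(self, version: int) -> None:
        # Moving the live pointer is a separate, explicit step.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.live = version

    def live_rows(self) -> list[dict]:
        if self.live is None:
            raise LookupError("no version has been promoted yet")
        return self.versions[self.live - 1]
```

In this model, a new push stays inert until someone deliberately promotes it, so suites reading the live version see no change from the upload alone.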

#Format

Datasets are JSONL or CSV. JSONL is preferred because it preserves structured ground-truth fields:

```json
{"input": "Refactor the auth module", "ground_truth": "preserved-public-api", "tags": ["refactor"]}
{"input": "Find the off-by-one", "ground_truth": "loop bound off by one", "tags": ["debug"]}
```
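
A quick local check before pushing can catch malformed rows. This is a sketch under assumed rules. `validate_jsonl` is a hypothetical helper. The requirements for a string `input` and a list-of-strings `tags` are inferred from the example rows, not taken from a documented schema.

```python
import json


def validate_jsonl(text: str) -> list[dict]:
    """Parse JSONL text and apply a few assumed sanity checks.

    Assumed rules (not a documented ALDO schema): every non-blank line is a
    JSON object with a string "input"; "tags", if present, is a list of strings.
    Raises ValueError naming the 1-based line number of the first bad row.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between rows
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(row, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        if not isinstance(row.get("input"), str):
            raise ValueError(f"line {lineno}: missing string field 'input'")
        tags = row.get("tags", [])
        if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
            raise ValueError(f"line {lineno}: 'tags' must be a list of strings")
        rows.append(row)
    return rows
```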

CSV is fine for flat schemas; the column header row determines the field names.
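
If your cases start out in a spreadsheet, the short conversion below shows how the header row becomes the field names. `csv_to_jsonl` is a hypothetical helper built on Python's standard `csv` module. It is not part of the `aldo` CLI.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert a flat CSV dataset to JSONL, one object per data row.

    csv.DictReader takes the keys from the header row. Every value stays a
    string, because CSV has no native lists or nested objects.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(json.dumps(row) + "\n" for row in reader)
```

Every value comes out as a string. A quoted `tags` cell such as `"x,y"` therefore stays the string `"x,y"` rather than becoming a list, which is the kind of structure that JSONL preserves and flat CSV cannot.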

#Upload

Via the CLI:

```shell
aldo dataset push ./changelog-cases.jsonl --name changelog-cases
```

Via the API:

```shell
curl -X POST https://ai.aldo.tech/api/auth-proxy/v1/datasets \
  -H "Authorization: Bearer $ALDO_API_KEY" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @changelog-cases.jsonl
```
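
The same request can be sent from Python's standard library, for scripts that can't shell out to curl. The endpoint and headers are copied from the curl example. `build_upload_request` and `upload` are illustrative helpers, not an official SDK. The response body isn't documented here, so the sketch returns it as raw text.

```python
import os
import urllib.request

# Endpoint copied from the curl example; nothing else about the API is assumed.
DATASETS_URL = "https://ai.aldo.tech/api/auth-proxy/v1/datasets"


def build_upload_request(path: str, api_key: str) -> urllib.request.Request:
    """Build the same POST as the curl command: raw JSONL body, NDJSON content type."""
    with open(path, "rb") as f:
        body = f.read()  # sent byte-for-byte, like curl's --data-binary
    return urllib.request.Request(
        DATASETS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/x-ndjson",
        },
    )


def upload(path: str) -> str:
    """Send the request; urlopen raises urllib.error.HTTPError on a non-2xx status."""
    req = build_upload_request(path, os.environ["ALDO_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Calling `upload("changelog-cases.jsonl")` with `ALDO_API_KEY` set in the environment mirrors the curl invocation.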

#Privacy tier

Datasets carry a privacy tier. A sensitive dataset can only be used by sensitive runs: the platform won't bind it to a suite that runs against a less strict tier.
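
Here is one way to express that rule as code, assuming tiers form a strict ordering. This guide names only `sensitive`. The other tier names, the `Tier` enum, and `can_bind` are hypothetical. The assumption that a stricter suite may use a less sensitive dataset extrapolates from the rule above and is not documented behavior.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical names and ordering; only "sensitive" appears in this guide.
    PUBLIC = 0
    INTERNAL = 1
    SENSITIVE = 2


def can_bind(dataset_tier: Tier, suite_tier: Tier) -> bool:
    """Allow a binding only if the suite runs at a tier at least as strict as the dataset's."""
    return suite_tier >= dataset_tier
```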

#Sharing

A dataset is tenant-scoped by default. Sharing across tenants is only available in self-hosted deployments, and even there the operator must explicitly mark the dataset as shared.

#Inspecting

The Datasets page in the control plane shows row counts, tag distribution, and a sample of recent rows. Click into a dataset to see its version history and which suites bind to it.
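
To preview roughly what that page will show before you push, you can compute a similar summary locally. `summarize` is a hypothetical helper. It uses file order as a stand-in for "recent" rows, which may not match how the control plane picks its sample.

```python
import json
from collections import Counter


def summarize(jsonl_text: str, sample_size: int = 3) -> dict:
    """Approximate the Datasets page summary for a local JSONL file.

    Returns the row count, a tag -> count distribution, and the last
    `sample_size` rows in file order.
    """
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    tag_counts = Counter(tag for row in rows for tag in row.get("tags", []))
    return {
        "rows": len(rows),
        "tags": dict(tag_counts),
        # Guard sample_size == 0, since rows[-0:] would return every row.
        "sample": rows[-sample_size:] if sample_size > 0 else [],
    }
```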

Source: apps/web/content/docs/guides/dataset-uploads.md