KindLM CLI Reference
Installation
npm install -g @kindlm/cli
# Or use npx (no install)
npx @kindlm/cli test kindlm.yaml
Commands
kindlm init
Scaffolds a new KindLM project in the current directory.
kindlm init
kindlm init --template agent # agent-focused template
kindlm init --template basic # simple prompt test template
kindlm init --template compliance # EU AI Act compliance template
Creates:
kindlm.yaml # Config file
schemas/ # JSON schema directory
example.schema.json # Example output schema
.kindlm/ # Local data directory (gitignored)
baselines/ # Baseline snapshots
Exit codes: 0 success, 1 write error
kindlm validate <config>
Validates a config file without executing any tests or calling any providers.
kindlm validate kindlm.yaml
kindlm validate ./path/to/config.yaml
Validates:
- YAML syntax
- Zod schema compliance
- Cross-reference integrity (prompt refs, model refs, provider refs)
- Schema file existence (checks paths resolve)
- Variable completeness (all `{{vars}}` in prompts have matching test vars)
Exit codes: 0 valid, 1 invalid (prints errors to stderr)
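Because validation never calls a provider, it is cheap enough to run on every commit. A minimal pre-commit sketch, assuming the config lives at the repo root:
#!/bin/sh
# .git/hooks/pre-commit: refuse to commit an invalid KindLM config.
kindlm validate kindlm.yaml || {
  echo "kindlm.yaml failed validation; fix the errors above before committing." >&2
  exit 1
}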
kindlm test
Executes the test suite and produces reports. After running, test results are cached to .kindlm/last-run.json so they can be uploaded to KindLM Cloud via kindlm upload without re-running.
# Basic usage
kindlm test
# With options
kindlm test \
-c kindlm.yaml \
-s my-suite \
--reporter pretty \
--runs 5 \
--gate 95 \
--compliance
Flags:
| Flag | Type | Default | Description |
|---|---|---|---|
| -c, --config | string | kindlm.yaml | Path to config file |
| -s, --suite | string | — | Run a specific suite |
| --reporter | pretty \| json \| junit | pretty | Output format |
| --runs | int | from config | Override repeat run count |
| --gate | number | from config | Fail if pass rate below threshold (percent) |
| --compliance | boolean | false | Generate EU AI Act compliance report |
Exit codes: 0 = all gates passed, 1 = failure or gates failed
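The exit code makes gating scriptable outside CI as well. A small sketch (the suite name and threshold are illustrative):
# Block a deploy when the gate fails; exit code 1 covers both
# execution failures and unmet gates.
if kindlm test -s my-suite --gate 95; then
  echo "Gates passed; safe to deploy"
else
  echo "Gates failed; blocking deploy" >&2
  exit 1
fi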
kindlm baseline <subcommand>
Manage local baselines.
# Save current report as a baseline
kindlm baseline set kindlm-report.json --label "v2.1-release"
# List baselines
kindlm baseline list
# Compare a report against a baseline
kindlm baseline compare kindlm-report.json --baseline "v2.1-release"
# Remove a baseline
kindlm baseline remove "v2.1-release"
Baselines are stored in .kindlm/baselines/ as JSON snapshots. Each baseline contains the output text per test case (for drift comparison) and the summary metrics (for delta reporting).
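Because snapshots are plain JSON, they can be inspected directly. A sketch using jq (the snapshot filename and the summary key are assumptions based on the label and contents described above):
# Peek at a stored baseline's summary metrics.
ls .kindlm/baselines/
jq '.summary' .kindlm/baselines/v2.1-release.json  # filename assumed from the label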
kindlm login
Authenticate with KindLM Cloud. Create an API token in the Cloud dashboard, then paste it when prompted.
# Interactive: prompts for token paste
kindlm login
# Non-interactive: pass token directly
kindlm login --token klm_abc123
# Check current auth status
kindlm login --status
# Remove stored credentials
kindlm login --logout
Flags:
| Flag | Type | Description |
|---|---|---|
| -t, --token | string | API token (skips interactive prompt) |
| --status | boolean | Show current authentication status |
| --logout | boolean | Remove stored credentials |
Token is stored in ~/.kindlm/credentials (file permissions 600). The KINDLM_API_TOKEN environment variable can also be used as an alternative to stored credentials.
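In CI, the environment variable avoids writing credentials to disk at all. For example (the token value is a placeholder):
# No kindlm login step needed; the token is read from the environment.
export KINDLM_API_TOKEN=klm_abc123  # placeholder
kindlm upload --project acme-support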
kindlm upload
Upload the last test run to KindLM Cloud. Reads cached results from .kindlm/last-run.json (written automatically by kindlm test).
# Upload with auto-detected project name (from git remote)
kindlm upload
# Specify project name explicitly
kindlm upload --project acme-support
# Use a specific token (overrides stored credentials)
kindlm upload --token klm_abc123
Flags:
| Flag | Type | Description |
|---|---|---|
| -t, --token | string | API token (overrides stored/env token) |
| -p, --project | string | Project name (defaults to git remote name or cwd basename) |
How it works: The upload command finds or creates the project and suite in Cloud, creates a run, batch-inserts all test results, and finalizes the run with aggregated metrics. Git commit SHA, branch, and CI environment are auto-detected.
Useful when tests are run in a CI step and upload happens in a separate step.
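A minimal sketch of that split, with the project name as an illustrative value:
# CI step 1: run the suite; results are cached to .kindlm/last-run.json
kindlm test --reporter json
# CI step 2 (runs even if step 1 failed): push the cached results
kindlm upload --project acme-support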
Cloud is optional. The CLI works fully offline with all features.
kindlm login and kindlm upload are only needed if you want Cloud dashboard features. Free plan: 1 project, 7-day history. Team ($49/mo): 5 projects, 90-day history. Enterprise ($299/mo): unlimited.
Terminal Output (Pretty Format)
┌─────────────────────────────────────────────────────┐
│ KindLM v0.1.0 │
│ Suite: support-agent-regression │
│ Config: a1b2c3d4e5f6 │
│ Models: claude-sonnet, gpt-4o │
│ Tests: 4 × 2 models × 3 repeats = 24 executions │
└─────────────────────────────────────────────────────┘
Running tests...
✓ refund-double-charge (claude-sonnet) 3/3 passed [1.2s]
✓ refund-double-charge (gpt-4o) 3/3 passed [0.9s]
✓ refund-order-not-found (claude-sonnet) 3/3 passed [1.1s]
✗ refund-order-not-found (gpt-4o) 2/3 passed [1.0s]
Run 2: TOOL_CALL_MISSING — Expected "lookup_order" was never called
✓ escalation-legal-threat (claude-sonnet) 3/3 passed [1.3s]
✓ escalation-legal-threat (gpt-4o) 3/3 passed [1.1s]
✓ greeting-response (claude-sonnet) 3/3 passed [0.5s]
✓ greeting-response (gpt-4o) 3/3 passed [0.4s]
┌─────────────────────────────────────────────────────┐
│ Summary │
├─────────────┬────────────────────────────────────────┤
│ Pass rate │ 87.5% (7/8 aggregated) │
│ Schema │ 0 failures │
│ PII │ 0 failures │
│ Judge avg │ 0.89 │
│ Drift │ 0.04 │
│ Cost │ $0.12 │
│ Latency │ 940ms avg │
├─────────────┼────────────────────────────────────────┤
│ Gates │ ✗ FAILED │
│ │ ✗ passRateMin: 87.5% < 95.0% │
│ │ ✓ schemaFailuresMax: 0 ≤ 0 │
│ │ ✓ piiFailuresMax: 0 ≤ 0 │
│ │ ✓ judgeAvgMin: 89.0% ≥ 80.0% │
│ │ ✓ driftScoreMax: 0.04 ≤ 0.15 │
└─────────────┴────────────────────────────────────────┘
Top failures:
1. refund-order-not-found (gpt-4o) — TOOL_CALL_MISSING
Report: kindlm-report.json
Compliance: ./compliance-reports/kindlm-compliance-a1b2c3d4-2026-02-15.md
Exit code: 1 (gates failed)
JUnit XML Output
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="support-agent-regression" tests="8" failures="1" time="7.5">
<testsuite name="claude-sonnet" tests="4" failures="0" time="4.1">
<testcase name="refund-double-charge" classname="support-agent-regression.claude-sonnet" time="1.2"/>
<testcase name="refund-order-not-found" classname="support-agent-regression.claude-sonnet" time="1.1"/>
<testcase name="escalation-legal-threat" classname="support-agent-regression.claude-sonnet" time="1.3"/>
<testcase name="greeting-response" classname="support-agent-regression.claude-sonnet" time="0.5"/>
</testsuite>
<testsuite name="gpt-4o" tests="4" failures="1" time="3.4">
<testcase name="refund-double-charge" classname="support-agent-regression.gpt-4o" time="0.9"/>
<testcase name="refund-order-not-found" classname="support-agent-regression.gpt-4o" time="1.0">
<failure type="TOOL_CALL_MISSING" message="Expected tool &quot;lookup_order&quot; was never called">
Pass rate: 2/3 (66.7%)
Run 2: TOOL_CALL_MISSING — Expected "lookup_order" was never called. Called: issue_refund
</failure>
</testcase>
<testcase name="escalation-legal-threat" classname="support-agent-regression.gpt-4o" time="1.1"/>
<testcase name="greeting-response" classname="support-agent-regression.gpt-4o" time="0.4"/>
</testsuite>
</testsuites>
CI Integration Examples
GitHub Actions
name: KindLM Regression Tests
on:
pull_request:
paths:
- 'prompts/**'
- 'kindlm.yaml'
- 'schemas/**'
jobs:
kindlm-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
- run: npm install -g @kindlm/cli
- name: Run KindLM tests
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: kindlm test --reporter json --compliance
- name: Upload to KindLM Cloud
if: always()
env:
KINDLM_API_TOKEN: ${{ secrets.KINDLM_API_TOKEN }}
run: kindlm upload --project my-project
GitLab CI
kindlm-test:
stage: test
image: node:20
variables:
ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
OPENAI_API_KEY: $OPENAI_API_KEY
script:
- npm install -g @kindlm/cli
- kindlm test -c kindlm.yaml --reporter junit --compliance > junit.xml
artifacts:
reports:
junit: junit.xml
paths:
- kindlm-report.json
- compliance-reports/
when: always
rules:
- changes:
- prompts/**
- kindlm.yaml
- schemas/**
Environment Variables
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | Anthropic API key |
| OPENAI_API_KEY | OpenAI API key |
| GOOGLE_API_KEY | Google Gemini API key |
| MISTRAL_API_KEY | Mistral API key |
| CO_API_KEY | Cohere API key |
| KINDLM_API_TOKEN | Cloud API token (alternative to kindlm login) |
| KINDLM_CLOUD_URL | Cloud API URL override (default: https://api.kindlm.com) |
| KINDLM_NO_COLOR | Disable ANSI colors |
| KINDLM_DEBUG | Enable debug logging |
| CI | Auto-detected; disables interactive features |
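For example, a verbose run with colors disabled for log capture (setting both variables to 1 is an assumption; the table above only says they toggle the behavior):
KINDLM_DEBUG=1 KINDLM_NO_COLOR=1 kindlm test -c kindlm.yaml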
Google Gemini
KindLM supports Google Gemini models via the Generative Language API.
Configuration
providers:
gemini:
apiKeyEnv: GOOGLE_API_KEY
models:
- id: gemini-2.0-flash
provider: gemini
model: gemini-2.0-flash
params:
temperature: 0
maxTokens: 2048
Notes
- API key: Get one at https://aistudio.google.com/apikey
- Cost tracking: Built-in pricing table covers the Gemini 2.0 Flash, 1.5 Pro, and 1.5 Flash variants
- Tool support: Full function calling support
- System prompts: Supported via Gemini's `systemInstruction` field
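A quick smoke test of the config above (the key value is a placeholder):
export GOOGLE_API_KEY=your-key-here  # from https://aistudio.google.com/apikey
kindlm validate kindlm.yaml && kindlm test -c kindlm.yaml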
Mistral
KindLM supports Mistral models via the Mistral API (OpenAI-compatible format).
Configuration
providers:
mistral:
apiKeyEnv: MISTRAL_API_KEY
models:
- id: mistral-large
provider: mistral
model: mistral-large-latest
params:
temperature: 0
maxTokens: 2048
Notes
- API key: Get one at https://console.mistral.ai/
- Cost tracking: Not available (returns null)
- Tool support: Full function calling support
Cohere
KindLM supports Cohere models via the v2 Chat API.
Configuration
providers:
cohere:
apiKeyEnv: CO_API_KEY
models:
- id: command-r-plus
provider: cohere
model: command-r-plus
params:
temperature: 0
maxTokens: 2048
Notes
- API key: Get one at https://dashboard.cohere.com/
- Cost tracking: Not available (returns null)
- Tool support: Full function calling support
- Parameter naming: `topP` is automatically mapped to Cohere's `p` parameter
Ollama (Local Models)
KindLM supports Ollama for running tests against local open-source models with zero API cost.
Configuration
providers:
ollama:
# No apiKeyEnv needed — Ollama runs locally
# baseUrl: http://localhost:11434 # default
models:
- id: llama3.2
provider: ollama
model: llama3.2
params:
temperature: 0
maxTokens: 2048
- id: mistral
provider: ollama
model: mistral
params:
temperature: 0
Usage
# Ensure Ollama is running
ollama serve
# Pull the model if not already downloaded
ollama pull llama3.2
# Run tests (no API key needed)
kindlm test -c kindlm.yaml
Notes
- No API key required: The `apiKeyEnv` field is optional for Ollama
- Cost: Always reported as $0.00 (local inference)
- Tool support: Ollama supports tool calling for compatible models
- Custom server: Use `baseUrl` to point to a remote Ollama instance
- Mixed providers: You can test the same prompts against both cloud and local models:
providers:
openai:
apiKeyEnv: OPENAI_API_KEY
ollama: {}
models:
- id: gpt-4o
provider: openai
model: gpt-4o
- id: llama3.2
provider: ollama
model: llama3.2
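With a mixed config like this, a single invocation exercises both models; the only extra setup is the cloud key and a pulled local model (the key value is a placeholder):
export OPENAI_API_KEY=sk-placeholder
ollama pull llama3.2
kindlm test -c kindlm.yaml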