KindLM CLI Reference

Installation

npm install -g @kindlm/cli

# Or use npx (no install)
npx @kindlm/cli test -c kindlm.yaml

Commands

kindlm init

Scaffolds a new KindLM project in the current directory.

kindlm init
kindlm init --template agent     # agent-focused template
kindlm init --template basic     # simple prompt test template
kindlm init --template compliance # EU AI Act compliance template

Creates:

kindlm.yaml               # Config file
schemas/                   # JSON schema directory
  example.schema.json      # Example output schema
.kindlm/                  # Local data directory (gitignored)
  baselines/               # Baseline snapshots

Exit codes: 0 success, 1 write error


kindlm validate <config>

Validates a config file without executing any tests or calling any providers.

kindlm validate kindlm.yaml
kindlm validate ./path/to/config.yaml

Validates:

  • YAML syntax
  • Zod schema compliance
  • Cross-reference integrity (prompt refs, model refs, provider refs)
  • Schema file existence (checks paths resolve)
  • Variable completeness (all {{vars}} in prompts have matching test vars)
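
The variable completeness check can be approximated by hand when debugging a prompt. The following is a minimal sketch, not KindLM's actual implementation: it assumes placeholders take the form {{name}} (optionally with inner spaces), and the function names list_prompt_vars and missing_prompt_vars are hypothetical helpers introduced here.

```shell
# List the distinct {{name}} placeholders found in a prompt file.
# Sketch only: assumes placeholders look like {{name}} with optional inner spaces.
list_prompt_vars() {
  grep -o '{{[^{}]*}}' "$1" | tr -d '{} ' | sort -u
}

# Print every placeholder that is missing from the supplied variable names.
# Usage: missing_prompt_vars prompt.txt name1 name2 ...
missing_prompt_vars() {
  prompt_file="$1"
  shift
  for var in $(list_prompt_vars "$prompt_file"); do
    found=no
    for given in "$@"; do
      if [ "$given" = "$var" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$var"
    fi
  done
}
```

Calling missing_prompt_vars prompt.txt name on a prompt that uses {{name}} and {{order_id}} would print order_id, which is the kind of gap kindlm validate is described as reporting.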

Exit codes: 0 valid, 1 invalid (prints errors to stderr)


kindlm test

Executes the test suite and produces reports. After running, test results are cached to .kindlm/last-run.json so they can be uploaded to KindLM Cloud via kindlm upload without re-running.

# Basic usage
kindlm test

# With options
kindlm test \
  -c kindlm.yaml \
  -s my-suite \
  --reporter pretty \
  --runs 5 \
  --gate 95 \
  --compliance

Flags:

Flag            Type                Default       Description
-c, --config    string              kindlm.yaml   Path to config file
-s, --suite     string              -             Run a specific suite
--reporter      pretty|json|junit   pretty        Output format
--runs          int                 from config   Override repeat run count
--gate          number              from config   Fail if pass rate below threshold (percent)
--compliance    boolean             false         Generate EU AI Act compliance report

Exit codes: 0 = all gates passed, 1 = failure or gates failed


kindlm baseline <subcommand>

Manage local baselines.

# Save current report as a baseline
kindlm baseline set kindlm-report.json --label "v2.1-release"

# List baselines
kindlm baseline list

# Compare a report against a baseline
kindlm baseline compare kindlm-report.json --baseline "v2.1-release"

# Remove a baseline
kindlm baseline remove "v2.1-release"

Baselines are stored in .kindlm/baselines/ as JSON snapshots. Each baseline contains the output text per test case (for drift comparison) and the summary metrics (for delta reporting).


kindlm login

Authenticate with KindLM Cloud. Create an API token in the Cloud dashboard, then paste it at the interactive prompt or pass it with --token.

# Interactive: prompts for token paste
kindlm login

# Non-interactive: pass token directly
kindlm login --token klm_abc123

# Check current auth status
kindlm login --status

# Remove stored credentials
kindlm login --logout

Flags:

Flag          Type      Description
-t, --token   string    API token (skips interactive prompt)
--status      boolean   Show current authentication status
--logout      boolean   Remove stored credentials

The token is stored in ~/.kindlm/credentials (file permissions 600). Alternatively, set the KINDLM_API_TOKEN environment variable instead of storing credentials.
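
In CI it can help to fail early when no token is available. kindlm login --status is the documented way to check authentication; the sketch below is only a file-level preflight that avoids invoking the CLI. The function names are hypothetical, and the text above does not say which source wins when both exist, so that is not modeled.

```shell
# Report where an API token could come from, without printing the token itself.
# Sketch only: the credentials path and variable name are taken from this reference;
# precedence between the two sources is not modeled.
kindlm_token_sources() {
  if [ -n "${KINDLM_API_TOKEN:-}" ]; then
    echo "env"
  fi
  if [ -f "${HOME}/.kindlm/credentials" ]; then
    echo "file"
  fi
}

# Return 0 when at least one source is present, 1 otherwise.
kindlm_has_token() {
  [ -n "$(kindlm_token_sources)" ]
}
```

A pipeline step could run kindlm_has_token || exit 1 before attempting an upload.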


kindlm upload

Upload the last test run to KindLM Cloud. Reads cached results from .kindlm/last-run.json (written automatically by kindlm test).

# Upload with auto-detected project name (from git remote)
kindlm upload

# Specify project name explicitly
kindlm upload --project acme-support

# Use a specific token (overrides stored credentials)
kindlm upload --token klm_abc123

Flags:

Flag            Type     Description
-t, --token     string   API token (overrides stored/env token)
-p, --project   string   Project name (defaults to git remote name or cwd basename)

How it works: The upload command finds or creates the project and suite in Cloud, creates a run, batch-inserts all test results, and finalizes the run with aggregated metrics. Git commit SHA, branch, and CI environment are auto-detected.

This is useful when tests run in one CI step and the upload happens in a separate step.
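
When both steps live in one script, the test exit code should survive the upload so that a failed gate still fails the job. A minimal sketch using only the commands documented here; test_then_upload is a hypothetical wrapper and the project name is whatever the caller passes in.

```shell
# Run the suite, then upload the cached results, and keep the test exit code.
# Sketch only: relies on kindlm test writing .kindlm/last-run.json as described above.
test_then_upload() {
  project="$1"
  test_status=0
  kindlm test --reporter json || test_status=$?

  # Without the cache file there is nothing for kindlm upload to read.
  if [ -f .kindlm/last-run.json ]; then
    kindlm upload --project "$project" || echo "upload failed" >&2
  else
    echo "no cached run found, skipping upload" >&2
  fi

  return "$test_status"
}
```

Usage: test_then_upload acme-support. An upload failure is reported on stderr but does not change the result of the test run.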

Cloud is optional. Every CLI feature works without a Cloud account; login and upload are only needed for Cloud dashboard features. Free plan: 1 project, 7-day history. Team ($49/mo): 5 projects, 90-day history. Enterprise ($299/mo): unlimited.


Terminal Output (Pretty Format)

┌─────────────────────────────────────────────────────┐
│  KindLM v0.1.0                                       │
│  Suite: support-agent-regression                      │
│  Config: a1b2c3d4e5f6                                │
│  Models: claude-sonnet, gpt-4o                       │
│  Tests: 4 × 2 models × 3 repeats = 24 executions    │
└─────────────────────────────────────────────────────┘

Running tests...

  ✓ refund-double-charge (claude-sonnet) 3/3 passed [1.2s]
  ✓ refund-double-charge (gpt-4o)       3/3 passed [0.9s]
  ✓ refund-order-not-found (claude-sonnet) 3/3 passed [1.1s]
  ✗ refund-order-not-found (gpt-4o)     2/3 passed [1.0s]
      Run 2: TOOL_CALL_MISSING — Expected "lookup_order" was never called
  ✓ escalation-legal-threat (claude-sonnet) 3/3 passed [1.3s]
  ✓ escalation-legal-threat (gpt-4o)    3/3 passed [1.1s]
  ✓ greeting-response (claude-sonnet)   3/3 passed [0.5s]
  ✓ greeting-response (gpt-4o)          3/3 passed [0.4s]

┌─────────────────────────────────────────────────────┐
│  Summary                                             │
├─────────────┬────────────────────────────────────────┤
│ Pass rate   │ 87.5% (7/8 aggregated)                 │
│ Schema      │ 0 failures                             │
│ PII         │ 0 failures                             │
│ Judge avg   │ 0.89                                   │
│ Drift       │ 0.04                                   │
│ Cost        │ $0.12                                  │
│ Latency     │ 940ms avg                              │
├─────────────┼────────────────────────────────────────┤
│ Gates       │ ✗ FAILED                               │
│             │ ✗ passRateMin: 87.5% < 95.0%           │
│             │ ✓ schemaFailuresMax: 0 ≤ 0             │
│             │ ✓ piiFailuresMax: 0 ≤ 0                │
│             │ ✓ judgeAvgMin: 89.0% ≥ 80.0%           │
│             │ ✓ driftScoreMax: 0.04 ≤ 0.15           │
└─────────────┴────────────────────────────────────────┘

Top failures:
  1. refund-order-not-found (gpt-4o) — TOOL_CALL_MISSING

Report: kindlm-report.json
Compliance: ./compliance-reports/kindlm-compliance-a1b2c3d4-2026-02-15.md
Exit code: 1 (gates failed)
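
The numbers in this sample can be reproduced by hand. The sketch below assumes, based only on the sample output, that each test and model pair is counted once in the aggregated pass rate and that the gate is a simple "rate at least threshold" comparison; pass_rate and meets_gate are hypothetical helpers, not part of the CLI.

```shell
# Percentage of aggregated results that passed, to one decimal place.
pass_rate() {
  awk -v passed="$1" -v total="$2" 'BEGIN { printf "%.1f\n", passed / total * 100 }'
}

# Return 0 when the rate meets the minimum, 1 otherwise (mirrors passRateMin).
meets_gate() {
  awk -v rate="$1" -v min="$2" 'BEGIN { exit !(rate + 0 >= min + 0) }'
}

echo "executions: $((4 * 2 * 3))"   # tests x models x repeats
echo "pass rate: $(pass_rate 7 8)%"
if meets_gate "$(pass_rate 7 8)" 95; then
  echo "gate: passed"
else
  echo "gate: FAILED"
fi
```

This prints 24 executions, a pass rate of 87.5%, and a failed gate, matching the summary table above.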

JUnit XML Output

<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="support-agent-regression" tests="8" failures="1" time="7.5">
  <testsuite name="claude-sonnet" tests="4" failures="0" time="4.1">
    <testcase name="refund-double-charge" classname="support-agent-regression.claude-sonnet" time="1.2"/>
    <testcase name="refund-order-not-found" classname="support-agent-regression.claude-sonnet" time="1.1"/>
    <testcase name="escalation-legal-threat" classname="support-agent-regression.claude-sonnet" time="1.3"/>
    <testcase name="greeting-response" classname="support-agent-regression.claude-sonnet" time="0.5"/>
  </testsuite>
  <testsuite name="gpt-4o" tests="4" failures="1" time="3.4">
    <testcase name="refund-double-charge" classname="support-agent-regression.gpt-4o" time="0.9"/>
    <testcase name="refund-order-not-found" classname="support-agent-regression.gpt-4o" time="1.0">
      <failure type="TOOL_CALL_MISSING" message="Expected tool &quot;lookup_order&quot; was never called">
Pass rate: 2/3 (66.7%)
Run 2: TOOL_CALL_MISSING — Expected "lookup_order" was never called. Called: issue_refund
      </failure>
    </testcase>
    <testcase name="escalation-legal-threat" classname="support-agent-regression.gpt-4o" time="1.1"/>
    <testcase name="greeting-response" classname="support-agent-regression.gpt-4o" time="0.4"/>
  </testsuite>
</testsuites>
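
A quick sanity check on a generated report is to count its test cases and failures. This is a rough sketch: it matches line by line and assumes one element per line, as in the sample above, so it is not a substitute for a real XML parser. junit_counts is a hypothetical helper.

```shell
# Count <testcase> and <failure> elements in a JUnit XML report.
# Sketch only: line-based matching, one element per line assumed.
junit_counts() {
  report="$1"
  # grep -c exits non-zero when the count is 0, so the status is ignored on purpose.
  cases=$(grep -c '<testcase ' "$report" || true)
  failures=$(grep -c '<failure ' "$report" || true)
  echo "testcases=$cases failures=$failures"
}
```

Run against the sample report above, junit_counts would print testcases=8 failures=1.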

CI Integration Examples

GitHub Actions

name: KindLM Regression Tests
on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'kindlm.yaml'
      - 'schemas/**'

jobs:
  kindlm-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - run: npm install -g @kindlm/cli

      - name: Run KindLM tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: kindlm test --reporter json --compliance

      - name: Upload to KindLM Cloud
        if: always()
        env:
          KINDLM_API_TOKEN: ${{ secrets.KINDLM_API_TOKEN }}
        run: kindlm upload --project my-project

GitLab CI

kindlm-test:
  stage: test
  image: node:20
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY
  script:
    - npm install -g @kindlm/cli
    # Uses the flags listed under "kindlm test"; assumes the junit reporter prints XML to stdout
    - kindlm test -c kindlm.yaml --reporter junit --compliance > junit.xml
  artifacts:
    reports:
      junit: junit.xml
    paths:
      - kindlm-report.json
      - compliance-reports/
    when: always
  rules:
    - changes:
        - prompts/**
        - kindlm.yaml
        - schemas/**

Environment Variables

Variable           Purpose
GOOGLE_API_KEY     Google Gemini API key
MISTRAL_API_KEY    Mistral API key
CO_API_KEY         Cohere API key
KINDLM_API_TOKEN   Cloud API token (alternative to kindlm login)
KINDLM_CLOUD_URL   Cloud API URL override (default: https://api.kindlm.com)
KINDLM_NO_COLOR    Disable ANSI colors
KINDLM_DEBUG       Enable debug logging
CI                 Auto-detected; disables interactive features

Google Gemini

KindLM supports Google Gemini models via the Generative Language API.

Configuration

providers:
  gemini:
    apiKeyEnv: GOOGLE_API_KEY

models:
  - id: gemini-2.0-flash
    provider: gemini
    model: gemini-2.0-flash
    params:
      temperature: 0
      maxTokens: 2048

Notes

  • API key: Get one at https://aistudio.google.com/apikey
  • Cost tracking: Built-in pricing table covers Gemini 2.0 Flash, 1.5 Pro, and 1.5 Flash variants
  • Tool support: Full function calling support
  • System prompts: Supported via Gemini's systemInstruction field

Mistral

KindLM supports Mistral models via the Mistral API (OpenAI-compatible format).

Configuration

providers:
  mistral:
    apiKeyEnv: MISTRAL_API_KEY

models:
  - id: mistral-large
    provider: mistral
    model: mistral-large-latest
    params:
      temperature: 0
      maxTokens: 2048

Notes

  • API key: Get one at https://console.mistral.ai/
  • Cost tracking: Not available (returns null)
  • Tool support: Full function calling support

Cohere

KindLM supports Cohere models via the v2 Chat API.

Configuration

providers:
  cohere:
    apiKeyEnv: CO_API_KEY

models:
  - id: command-r-plus
    provider: cohere
    model: command-r-plus
    params:
      temperature: 0
      maxTokens: 2048

Notes

  • API key: Get one at https://dashboard.cohere.com/
  • Cost tracking: Not available (returns null)
  • Tool support: Full function calling support
  • Parameter naming: topP is automatically mapped to Cohere's p parameter

Ollama (Local Models)

KindLM supports Ollama for running tests against local open-source models with zero API cost.

Configuration

providers:
  ollama:
    # No apiKeyEnv needed — Ollama runs locally
    # baseUrl: http://localhost:11434   # default

models:
  - id: llama3.2
    provider: ollama
    model: llama3.2
    params:
      temperature: 0
      maxTokens: 2048
  - id: mistral
    provider: ollama
    model: mistral
    params:
      temperature: 0

Usage

# Ensure Ollama is running
ollama serve

# Pull the model if not already downloaded
ollama pull llama3.2

# Run tests (no API key needed)
kindlm test -c kindlm.yaml
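
In scripts it can be worth confirming the server is reachable before starting a run. A minimal sketch, assuming curl is installed; it probes Ollama's /api/tags endpoint, which lists locally available models. ollama_ready and test_with_ollama are hypothetical helper names.

```shell
# Return 0 when an Ollama server answers at the given base URL.
# Sketch only: the default URL matches the baseUrl default shown in the configuration above.
ollama_ready() {
  base_url="${1:-http://localhost:11434}"
  curl --silent --fail --max-time 5 "$base_url/api/tags" > /dev/null
}

# Run the suite only if the server is reachable.
test_with_ollama() {
  if ollama_ready "$@"; then
    kindlm test -c kindlm.yaml
  else
    echo "Ollama is not reachable, start it with: ollama serve" >&2
    return 1
  fi
}
```

Pass a base URL as the first argument to test_with_ollama when the server is remote.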

Notes

  • No API key required: The apiKeyEnv field is optional for Ollama
  • Cost: Always reported as $0.00 (local inference)
  • Tool support: Ollama supports tool calling for compatible models
  • Custom server: Use baseUrl to point to a remote Ollama instance
  • Mixed providers: You can test the same prompts against both cloud and local models:
providers:
  openai:
    apiKeyEnv: OPENAI_API_KEY
  ollama: {}

models:
  - id: gpt-4o
    provider: openai
    model: gpt-4o
  - id: llama3.2
    provider: ollama
    model: llama3.2