KindLM
Regression testing and compliance guardrails for agentic AI workflows.
npm install -g @kindlm/cli
kindlm init
kindlm test kindlm.yaml
What KindLM Does
KindLM is a CLI tool that runs test suites against your LLM-powered features and agents. Define test cases in YAML, run them locally or in CI, and get clear pass/fail results with actionable failure reasons.
What makes it different:
- Agent-aware — assert on tool calls (which tool was called, with what arguments, in what order), not just text output
- LLM-as-judge — evaluate subjective quality ("is this response empathetic?") using configurable judge models
- Compliance reports — generate EU AI Act–aligned test documentation for auditors
- CI-native — exit codes, JUnit XML, JSON reports. Drop it into your pipeline in 5 minutes
Quick Example
# kindlm.yaml
kindlm: 1
project: "my-agent"
providers:
anthropic:
apiKeyEnv: "ANTHROPIC_API_KEY"
models:
- id: "claude-sonnet"
provider: "anthropic"
model: "claude-sonnet-4-5-20250929"
params:
temperature: 0.2
prompts:
support:
system: |
You are a support agent. Use lookup_order(order_id) to find orders.
Respond in JSON with { "action": string, "message": string }.
user: "{{message}}"
tests:
- name: "refund-request"
prompt: "support"
vars:
message: "Refund order #123 please"
tools:
- name: "lookup_order"
responses:
- when: { order_id: "123" }
then: { order_id: "123", total: 49.99, status: "delivered" }
expect:
output:
format: "json"
schemaFile: "./schemas/response.schema.json"
toolCalls:
- tool: "lookup_order"
argsMatch: { order_id: "123" }
guardrails:
pii:
enabled: true
keywords:
deny: ["not my problem"]
judge:
- criteria: "Response acknowledges the refund request professionally"
minScore: 0.8
gates:
passRateMin: 0.95
schemaFailuresMax: 0
$ kindlm test kindlm.yaml
✓ refund-request (claude-sonnet) 3/3 passed [1.2s]
Pass rate: 100% | Schema: 0 failures | Judge avg: 0.92
Gates: ✓ PASSED
Features
Assertions
| Type | What It Checks |
|---|---|
| Schema | JSON parse + JSON Schema validation (AJV) |
| PII | Regex patterns for SSN, credit cards, emails, custom |
| Keywords | Deny list (forbidden words) + allow list |
| Judge | LLM evaluates output against natural language criteria |
| Tool calls | Correct tool, correct arguments, correct order, shouldNotCall |
| Drift | Compare output against a baseline (LLM judge or field diff) |
| Contains | Output must/must not contain specific substrings |
Reports
- Terminal — colored summary with top failures
- JSON — full structured report for programmatic use
- JUnit XML — plug into any CI system's test reporting
- Compliance — EU AI Act–aligned markdown report with audit hashes
Agent Testing
KindLM simulates tool responses so you can test agent behavior without calling real APIs:
tools:
- name: "lookup_order"
responses:
- when: { order_id: "123" }
then: { order_id: "123", total: 49.99 }
defaultResponse: { error: "Order not found" }
The engine runs a multi-turn conversation: sends the prompt, intercepts tool calls, returns simulated responses, and continues until the model produces a final text response. Then all assertions run against the full conversation.
Baseline Drift Detection
# Save a baseline
kindlm baseline set kindlm-report.json --label "v2.0-release"
# Future runs compare against it
kindlm test kindlm.yaml --baseline latest
Drift is measured using LLM-as-judge comparison by default — it understands semantic changes, not just string differences.
Compliance Reports
compliance:
enabled: true
framework: "eu-ai-act"
outputDir: "./compliance-reports"
metadata:
systemName: "Customer Support Agent"
riskLevel: "limited"
operator: "ACME Corp"
Generates a structured markdown document mapping test results to EU AI Act Annex IV requirements, with SHA-256 artifact hashes for audit trail.
CI Integration
# GitHub Actions
- run: kindlm test kindlm.yaml --junit junit.xml --format json
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
Exit code 0 = all gates passed. Exit code 1 = gates failed. That's it.
Cloud (Optional)
Upload results to KindLM Cloud for history, trends, and team collaboration:
kindlm login
kindlm test kindlm.yaml --upload true
The CLI works fully offline with every feature. Cloud is optional and adds:
| Open Source (Free) | Team ($49/mo) | Enterprise ($299/mo) | |
|---|---|---|---|
| CLI (all assertions, providers, reports) | ✓ | ✓ | ✓ |
| Compliance reports (local markdown) | ✓ | ✓ | ✓ |
| Cloud dashboard + test history | — | 90 days | Unlimited |
| Team members | — | 10 | Unlimited |
| Compliance PDF export | — | ✓ | ✓ |
| Signed compliance reports | — | — | ✓ |
| SSO / SAML | — | — | ✓ |
| Audit log API | — | — | ✓ |
| Slack + webhook alerts | — | ✓ | ✓ |
Philosophy: The open-source CLI solves the engineering problem. Cloud solves the organizational problem. Engineers choose the tool. Their company pays for the dashboard.
Documentation
| Document | Description |
|---|---|
| Config Reference | Complete YAML config schema |
| Assertions | All assertion types with examples |
| Providers | Provider adapter setup |
| CI Integration | GitHub Actions, GitLab CI, Jenkins |
| Compliance | EU AI Act report generation |
| Cloud | Cloud API and dashboard |
License
MIT