KindLM — Error Handling Specification

Principle: Errors are data, not surprises. Every function that can fail returns a Result type. The CLI is the only layer that converts errors to human-readable messages and exit codes. Core never prints to stdout. Core never calls process.exit().


1. Result Type

All core functions that can fail return this discriminated union:

type Result<T, E = KindlmError> =
  | { success: true; data: T }
  | { success: false; error: E };

KindlmError

interface KindlmError {
  code: ErrorCode;
  message: string;          // Human-readable, shown to user
  details?: Record<string, unknown>;  // Machine-readable context
  cause?: Error;            // Original error (for stack traces in debug mode)
}

type ErrorCode =
  // Config errors (1xx)
  | 'CONFIG_NOT_FOUND'       // 100 — kindlm.yaml doesn't exist
  | 'CONFIG_PARSE_ERROR'     // 101 — YAML syntax error
  | 'CONFIG_VALIDATION_ERROR'// 102 — Zod validation failed
  | 'CONFIG_FILE_REF_ERROR'  // 103 — Referenced file doesn't exist (schemaFile, system_prompt_file)
  
  // Provider errors (2xx)
  | 'PROVIDER_NOT_FOUND'     // 200 — Unknown provider string
  | 'PROVIDER_AUTH_ERROR'    // 201 — API key missing or invalid
  | 'PROVIDER_RATE_LIMIT'    // 202 — 429 from provider
  | 'PROVIDER_TIMEOUT'       // 203 — Request timed out
  | 'PROVIDER_API_ERROR'     // 204 — Non-retryable API error (400, 500)
  | 'PROVIDER_NETWORK_ERROR' // 205 — DNS, connection refused, etc.
  
  // Assertion errors (3xx)
  | 'ASSERTION_EVAL_ERROR'   // 300 — Assertion logic failed unexpectedly
  | 'SCHEMA_FILE_ERROR'      // 301 — JSON Schema file invalid or not found
  | 'JUDGE_EVAL_ERROR'       // 302 — Judge model failed to return a score
  
  // Engine errors (4xx)
  | 'ENGINE_MAX_TURNS'       // 400 — Multi-turn loop hit max iterations
  | 'ENGINE_EMPTY_RESPONSE'  // 401 — Provider returned empty content
  
  // Baseline errors (5xx)
  | 'BASELINE_NOT_FOUND'     // 500 — No baseline saved
  | 'BASELINE_CORRUPT'       // 501 — Baseline JSON can't be parsed
  | 'BASELINE_VERSION_MISMATCH' // 502 — Baseline from incompatible version
  
  // Cloud errors (6xx)
  | 'CLOUD_AUTH_ERROR'       // 600 — Not logged in or token expired
  | 'CLOUD_UPLOAD_ERROR'     // 601 — Upload failed
  | 'CLOUD_PLAN_LIMIT'       // 602 — Feature requires higher plan
  | 'CLOUD_RATE_LIMIT'       // 603 — Cloud API rate limited
  
  // System errors (9xx)
  | 'UNKNOWN_ERROR';         // 999 — Unexpected error
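
For convenience, core could expose tiny constructors for the two variants. A minimal sketch, assuming helper names `ok`/`err` and a file path that are illustrative rather than part of this spec:

// Sketch — packages/core/src/result.ts (illustrative path)
export function ok<T>(data: T): Result<T> {
  return { success: true, data };
}

export function err<T = never>(error: KindlmError): Result<T> {
  return { success: false, error };
}

// e.g. return err({ code: 'CONFIG_NOT_FOUND', message: `kindlm.yaml not found in ${cwd}` });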

2. Error Flow

Provider API        HTTP error (thrown)
      ↓
ProviderAdapter     Result type (caught at the boundary)
      ↓
Engine              Result type (propagated)
      ↓
Reporter            Result (propagated)
      ↓
CLI                 Exit code + message
      ↓
User

Layer responsibilities:

| Layer | Catches | Returns | Allowed to print |
| --- | --- | --- | --- |
| Provider adapter | HTTP errors, JSON parse errors | Result<ProviderResponse> | No |
| Assertion handler | Evaluation errors | Result<AssertionResult> | No |
| Engine | Nothing — propagates Results | Result<RunResult> | No |
| Reporter | Nothing — receives RunResult | Result<string> (formatted output) | No |
| CLI | Unwraps all Results | Exit code + stderr/stdout | Yes |
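
The "propagates" rows boil down to one recurring pattern: check `success`, return early on failure, and pass the error along untouched. A sketch, assuming an illustrative `loadConfig` caller:

// Sketch — the propagation pattern used above the adapter boundary
const config = await loadConfig(configPath);
if (!config.success) return config;   // hand the failed Result straight up the stack
// ...continue with config.data...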

3. Provider Error Handling

Provider adapters are the boundary between external HTTP APIs and our code. They're the only place where try/catch is used in core.

// packages/core/src/providers/openai.ts
async complete(request: ProviderRequest): Promise<Result<ProviderResponse>> {
  try {
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(toOpenAIFormat(request)),
      signal: AbortSignal.timeout(request.timeout ?? 30_000),
    });

    if (response.status === 401) {
      return {
        success: false,
        error: {
          code: 'PROVIDER_AUTH_ERROR',
          message: `OpenAI API key is invalid or missing. Set OPENAI_API_KEY environment variable.`,
        }
      };
    }

    if (response.status === 429) {
      const retryAfter = response.headers.get('retry-after');
      return {
        success: false,
        error: {
          code: 'PROVIDER_RATE_LIMIT',
          message: `OpenAI rate limit hit. Retry after ${retryAfter ?? 'unknown'} seconds.`,
          details: { retryAfter, provider: 'openai' },
        }
      };
    }

    if (!response.ok) {
      const body = await response.text();
      return {
        success: false,
        error: {
          code: 'PROVIDER_API_ERROR',
          message: `OpenAI returned ${response.status}: ${body.slice(0, 200)}`,
          details: { status: response.status, body },
        }
      };
    }

    const data = await response.json();
    return { success: true, data: fromOpenAIFormat(data) };

  } catch (err) {
    // AbortSignal.timeout() rejects with a DOMException named 'TimeoutError'
    // (older runtimes report 'AbortError'), so check both.
    if (err instanceof DOMException && (err.name === 'TimeoutError' || err.name === 'AbortError')) {
      return {
        success: false,
        error: {
          code: 'PROVIDER_TIMEOUT',
          message: `OpenAI request timed out after ${request.timeout ?? 30_000}ms`,
          cause: err as Error,
        }
      };
    }

    return {
      success: false,
      error: {
        code: 'PROVIDER_NETWORK_ERROR',
        message: `Failed to connect to OpenAI: ${(err as Error).message}`,
        cause: err as Error,
      }
    };
  }
}

Retry logic

Retries happen in the engine, not in adapters. Adapters are stateless — they make one request and return one Result.

// packages/core/src/engine/retry.ts
const RETRYABLE_CODES: ErrorCode[] = [
  'PROVIDER_RATE_LIMIT',
  'PROVIDER_TIMEOUT',
  'PROVIDER_NETWORK_ERROR',
];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<Result<T>>,
  maxRetries: number = 2,
  backoffMs: number = 1000,
): Promise<Result<T>> {
  let lastResult!: Result<T>; // definite assignment: the loop always runs at least once

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    lastResult = await fn();

    if (lastResult.success) return lastResult;
    if (!RETRYABLE_CODES.includes(lastResult.error.code)) return lastResult;
    if (attempt < maxRetries) {
      await sleep(backoffMs * Math.pow(2, attempt)); // Exponential backoff: 1s, 2s, 4s, ...
    }
  }

  return lastResult;
}
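
Usage from the engine then wraps the call, not the adapter. A sketch, assuming an illustrative `retries` setting:

// Sketch — retrying a single provider call; the adapter itself stays stateless
const result = await withRetry(() => adapter.complete(request), suite.retries ?? 2);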

4. Assertion Error Handling

Assertions should never crash the test run. A failing assertion is a test result, not an error.

// Assertions return a Result; they never throw
function evaluateToolCalled(
  response: ProviderResponse,
  config: ToolCalledConfig,
): Result<AssertionResult> {
  // This is a test failure, NOT an error:
  if (!response.toolCalls?.length) {
    return {
      success: true, // Result succeeded (we got a result)
      data: {
        pass: false, // But the assertion failed
        message: `Expected tool "${config.tool}" to be called, but no tools were called`,
        score: 0,
      }
    };
  }

  // Happy path: check whether the expected tool appears among the calls
  // (a { name } field on each tool call is assumed here for illustration).
  const wasCalled = response.toolCalls.some((call) => call.name === config.tool);
  return {
    success: true,
    data: {
      pass: wasCalled,
      message: wasCalled
        ? `Tool "${config.tool}" was called`
        : `Expected tool "${config.tool}", got: ${response.toolCalls.map((c) => c.name).join(', ')}`,
      score: wasCalled ? 1 : 0,
    }
  };

  // A failed Result (ASSERTION_EVAL_ERROR) is only returned when evaluation itself
  // goes wrong — i.e. a bug in KindLM, not in the system under test.
}

Key distinction:

| Scenario | Result.success | AssertionResult.pass | Meaning |
| --- | --- | --- | --- |
| Tool was called correctly | true | true | Test passed |
| Tool was not called | true | false | Test failed (expected outcome) |
| Assertion code crashed | false | N/A | Error in KindLM itself |
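
In code, that distinction shows up wherever the engine consumes an assertion Result. A sketch, assuming the surrounding test object follows the TestResult shape in the next section:

// Sketch — consuming an assertion Result in the engine
const result = evaluateToolCalled(response, assertionConfig);
if (!result.success) {
  // Error in KindLM itself or its infrastructure — surfaces as status 'errored'
  test.error = result.error;
} else {
  // Valid outcome either way: result.data.pass decides passed vs failed
  test.assertions.push(result.data);
}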

5. Engine Error Handling

The engine orchestrates test execution. It collects Results from providers and assertions and produces a RunResult.

Test-level errors

If a provider call fails for a single test, that test is marked as errored (not failed). Remaining tests continue.

interface TestResult {
  name: string;
  status: 'passed' | 'failed' | 'errored' | 'skipped';
  assertions: AssertionResult[];
  error?: KindlmError;  // Only set when status === 'errored'
  latencyMs: number;
  costUsd: number;
}
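
Deriving a test's status from those pieces might look like this (a sketch; the helper name is illustrative):

// Sketch — status derivation for a single test
function deriveStatus(
  providerError: KindlmError | undefined,
  assertions: AssertionResult[],
): TestResult['status'] {
  if (providerError) return 'errored';   // infrastructure problem, not a verdict
  // 'skipped' is decided at the suite level when config is invalid
  return assertions.every((a) => a.pass) ? 'passed' : 'failed';
}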

Suite-level errors

If config parsing fails, the entire suite is skipped with an error message. Other suites still run.

Run-level errors

Only fatal errors (config file not found, no valid suites) stop the entire run.

Run
├── Suite A (3 tests)
│   ├── Test 1: passed
│   ├── Test 2: failed (assertion failed — expected behavior)
│   └── Test 3: errored (provider timeout — infrastructure issue)
├── Suite B (2 tests) — skipped (invalid config reference)
│   └── Error: system_prompt_file "prompts/missing.txt" not found
└── Suite C (2 tests)
    ├── Test 1: passed
    └── Test 2: passed

Summary: 3 passed, 1 failed, 1 errored, 2 skipped (Suite B)
Exit code: 1

6. CLI Error Messages

The CLI is responsible for converting error codes to user-friendly messages.

Formatting rules:

  1. Error message on first line (bold red in terminal)
  2. Actionable fix on second line
  3. Debug details only with --verbose

Examples:

✗ Config error: Unknown assertion type "tool_caled" at line 23
  Did you mean "tool_called"? See: https://kindlm.com/docs/assertions

✗ OpenAI API key is invalid or missing
  Set OPENAI_API_KEY in your environment or .env file

✗ OpenAI rate limit hit. Retry after 30 seconds.
  Reduce concurrency with --concurrency 1 or add retry config

✗ Schema file not found: schemas/order.json
  Referenced in suite "order-agent", test "happy-path"

✗ Baseline not found for suite "refund-agent"
  Run: kindlm baseline set

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | All tests passed, all gates passed |
| 1 | Tests failed OR gate failed OR error occurred |

We intentionally keep it simple — 0 or 1. CI systems only need pass/fail.

--verbose flag:

Adds stack traces, full provider response bodies, timing per assertion, and retry attempts to output.
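
Putting the formatting rules and exit codes together, the CLI's unwrap step might look roughly like this. A sketch only: `fixSuggestionFor` and `allPassed` are illustrative helpers, not part of this spec.

// Sketch — the only layer that prints and exits
function finish(result: Result<RunResult>, verbose: boolean): void {
  if (!result.success) {
    console.error(`✗ ${result.error.message}`);                 // line 1: what went wrong
    console.error(`  ${fixSuggestionFor(result.error.code)}`);  // line 2: actionable fix
    if (verbose && result.error.cause) console.error(result.error.cause.stack);
    process.exit(1);
  }
  process.exit(allPassed(result.data) ? 0 : 1);
}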


7. Cloud API Error Responses

All Cloud API errors return consistent JSON:

{
  "error": "plan_required",
  "message": "This feature requires a Team or Enterprise plan. Current plan: free",
  "details": {
    "required_plan": "team",
    "current_plan": "free",
    "feature": "pdf_export",
    "upgrade_url": "https://cloud.kindlm.com/settings/billing"
  }
}

HTTP status mapping:

| Error Code | HTTP Status | When |
| --- | --- | --- |
| CLOUD_AUTH_ERROR | 401 | Missing/invalid/expired token |
| CLOUD_PLAN_LIMIT | 403 | Feature needs higher plan |
| not_found | 404 | Resource doesn't exist or not in user's org |
| conflict | 409 | Duplicate project name |
| validation_error | 422 | Invalid request body (Zod error details included) |
| CLOUD_RATE_LIMIT | 429 | Rate limit exceeded |
| payload_too_large | 413 | Upload > 5MB |
| UNKNOWN_ERROR | 500 | Unexpected server error (logged, not exposed to user) |
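
On the Worker side this maps naturally onto a small shared shape. A sketch, assuming an illustrative interface name; the fields follow the example body above:

// Sketch — shared shape for Cloud API error bodies
interface CloudApiErrorBody {
  error: string;                       // machine-readable code, e.g. "plan_required"
  message: string;                     // human-readable explanation
  details?: Record<string, unknown>;   // extra context such as required_plan, upgrade_url
}

// Returning it from a handler (status per the mapping table above):
return new Response(JSON.stringify(body satisfies CloudApiErrorBody), {
  status: 403,
  headers: { 'Content-Type': 'application/json' },
});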

8. Logging

CLI logging levels:

kindlm test              → Only errors and summary
kindlm test --verbose    → Errors, warnings, info, debug details
DEBUG=kindlm* kindlm test → Full debug output (Node.js debug module)
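
The last level goes through the debug npm package's namespaced loggers. Inside core that might look like the following sketch (the namespace and variable names are illustrative):

// Sketch — namespaced debug logging, enabled with DEBUG=kindlm*
import createDebug from 'debug';

const debug = createDebug('kindlm:engine');
debug('retrying %s (attempt %d of %d)', testName, attempt, maxRetries);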

Cloud logging:

Cloudflare Workers logs go through console.log to the Cloudflare dashboard, with optional Logpush to an external service.

Log format (structured JSON):

{
  "level": "error",
  "code": "PROVIDER_TIMEOUT",
  "message": "OpenAI request timed out after 30000ms",
  "org_id": "org_a1b2c3",
  "run_id": "run_x1y2z3",
  "timestamp": "2026-02-15T10:30:00Z"
}

Never log: API keys, user prompts/responses (PII risk), full provider response bodies in production.
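
A small helper keeps that rule enforceable in one place. A sketch; the function name and field list are illustrative:

// Sketch — structured error logging with no sensitive fields
function logError(entry: { code: ErrorCode; message: string; org_id?: string; run_id?: string }): void {
  console.log(JSON.stringify({
    level: 'error',
    ...entry,
    timestamp: new Date().toISOString(),
    // deliberately no API keys, prompts, responses, or provider bodies
  }));
}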


9. Graceful Degradation

| Failure | Behavior |
| --- | --- |
| One provider fails, others succeed | Failed provider's tests marked as errored, others complete |
| Judge model unavailable | Judge assertions marked as errored with suggestion to retry |
| Cloud upload fails | CLI warns but exits based on test results (not upload status) |
| Baseline file corrupted | Compare fails gracefully with "baseline corrupt" error, tests still run |
| Disk full (can't write report) | Stderr warning, test results still shown in terminal |
| SIGINT (Ctrl+C) | Graceful shutdown — print partial results, exit 1 |
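
For the last row, the CLI is the layer that owns the signal handler, consistent with core never calling process.exit(). A sketch; `printPartialResults` and `runState` are illustrative:

// Sketch — graceful Ctrl+C handling in the CLI (never in core)
process.on('SIGINT', () => {
  printPartialResults(runState);   // whatever has finished so far
  process.exit(1);
});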