KindLM — Error Handling Specification

Principle: Errors are data, not surprises. Every function that can fail returns a Result type. The CLI is the only layer that converts errors to human-readable messages and exit codes. Core never prints to stdout. Core never calls process.exit().


1. Result Type

All core functions that can fail return this discriminated union:

type Result<T, E = KindlmError> =
  | { success: true; data: T }
  | { success: false; error: E };

KindlmError

interface KindlmError {
  code: ErrorCode;
  message: string;          // Human-readable, shown to user
  details?: Record<string, unknown>;  // Machine-readable context
  cause?: Error;            // Original error (for stack traces in debug mode)
}

type ErrorCode =
  // Config errors (1xx)
  | 'CONFIG_NOT_FOUND'       // 100 — kindlm.yaml doesn't exist
  | 'CONFIG_PARSE_ERROR'     // 101 — YAML syntax error
  | 'CONFIG_VALIDATION_ERROR'// 102 — Zod validation failed
  | 'CONFIG_FILE_REF_ERROR'  // 103 — Referenced file doesn't exist (schemaFile, system_prompt_file)
  
  // Provider errors (2xx)
  | 'PROVIDER_NOT_FOUND'     // 200 — Unknown provider string
  | 'PROVIDER_AUTH_ERROR'    // 201 — API key missing or invalid
  | 'PROVIDER_RATE_LIMIT'    // 202 — 429 from provider
  | 'PROVIDER_TIMEOUT'       // 203 — Request timed out
  | 'PROVIDER_API_ERROR'     // 204 — Non-retryable API error (400, 500)
  | 'PROVIDER_NETWORK_ERROR' // 205 — DNS, connection refused, etc.
  
  // Assertion errors (3xx)
  | 'ASSERTION_EVAL_ERROR'   // 300 — Assertion logic failed unexpectedly
  | 'SCHEMA_FILE_ERROR'      // 301 — JSON Schema file invalid or not found
  | 'JUDGE_EVAL_ERROR'       // 302 — Judge model failed to return a score
  
  // Engine errors (4xx)
  | 'ENGINE_MAX_TURNS'       // 400 — Multi-turn loop hit max iterations
  | 'ENGINE_EMPTY_RESPONSE'  // 401 — Provider returned empty content
  
  // Baseline errors (5xx)
  | 'BASELINE_NOT_FOUND'     // 500 — No baseline saved
  | 'BASELINE_CORRUPT'       // 501 — Baseline JSON can't be parsed
  | 'BASELINE_VERSION_MISMATCH' // 502 — Baseline from incompatible version
  
  // Cloud errors (6xx)
  | 'CLOUD_AUTH_ERROR'       // 600 — Not logged in or token expired
  | 'CLOUD_UPLOAD_ERROR'     // 601 — Upload failed
  | 'CLOUD_PLAN_LIMIT'       // 602 — Feature requires higher plan
  | 'CLOUD_RATE_LIMIT'       // 603 — Cloud API rate limited
  
  // System errors (9xx)
  | 'UNKNOWN_ERROR';         // 999 — Unexpected error
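
For convenience, core could expose tiny constructors for the two variants. A minimal sketch, assuming helper names `ok`/`err` and a file path that are illustrative rather than part of this spec:

// Sketch — packages/core/src/result.ts (illustrative path)
export function ok<T>(data: T): Result<T> {
  return { success: true, data };
}

export function err<T = never>(error: KindlmError): Result<T> {
  return { success: false, error };
}

// e.g. return err({ code: 'CONFIG_NOT_FOUND', message: `kindlm.yaml not found in ${cwd}` });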

2. Error Flow

Provider API        HTTP error (thrown)
      ↓
ProviderAdapter     Result type (caught at the boundary)
      ↓
Engine              Result type (propagated)
      ↓
Reporter            Result (propagated)
      ↓
CLI                 Exit code + message
      ↓
User

Layer responsibilities:

| Layer | Catches | Returns | Allowed to print |
| --- | --- | --- | --- |
| Provider adapter | HTTP errors, JSON parse errors | Result<ProviderResponse> | No |
| Assertion handler | Evaluation errors | Result<AssertionResult> | No |
| Engine | Nothing — propagates Results | Result<RunResult> | No |
| Reporter | Nothing — receives RunResult | Result<string> (formatted output) | No |
| CLI | Unwraps all Results | Exit code + stderr/stdout | Yes |
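
The "propagates" rows boil down to one recurring pattern: check `success`, return early on failure, and pass the error along untouched. A sketch, assuming an illustrative `loadConfig` caller:

// Sketch — the propagation pattern used above the adapter boundary
const config = await loadConfig(configPath);
if (!config.success) return config;   // hand the failed Result straight up the stack
// ...continue with config.data...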

3. Provider Error Handling

Provider adapters are the boundary between external HTTP APIs and our code. They're the only place where try/catch is used in core.

// packages/core/src/providers/openai.ts
async complete(request: ProviderRequest): Promise<Result<ProviderResponse>> {
  try {
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(toOpenAIFormat(request)),
      signal: AbortSignal.timeout(request.timeout ?? 30_000),
    });

    if (response.status === 401) {
      return {
        success: false,
        error: {
          code: 'PROVIDER_AUTH_ERROR',
          message: `OpenAI API key is invalid or missing. Set OPENAI_API_KEY environment variable.`,
        }
      };
    }

    if (response.status === 429) {
      const retryAfter = response.headers.get('retry-after');
      return {
        success: false,
        error: {
          code: 'PROVIDER_RATE_LIMIT',
          message: `OpenAI rate limit hit. Retry after ${retryAfter ?? 'unknown'} seconds.`,
          details: { retryAfter, provider: 'openai' },
        }
      };
    }

    if (!response.ok) {
      const body = await response.text();
      return {
        success: false,
        error: {
          code: 'PROVIDER_API_ERROR',
          message: `OpenAI returned ${response.status}: ${body.slice(0, 200)}`,
          details: { status: response.status, body },
        }
      };
    }

    const data = await response.json();
    return { success: true, data: fromOpenAIFormat(data) };

  } catch (err) {
    // AbortSignal.timeout() rejects with a DOMException named 'TimeoutError'
    // (older runtimes report 'AbortError'), so check both.
    if (err instanceof DOMException && (err.name === 'TimeoutError' || err.name === 'AbortError')) {
      return {
        success: false,
        error: {
          code: 'PROVIDER_TIMEOUT',
          message: `OpenAI request timed out after ${request.timeout ?? 30_000}ms`,
          cause: err as Error,
        }
      };
    }

    return {
      success: false,
      error: {
        code: 'PROVIDER_NETWORK_ERROR',
        message: `Failed to connect to OpenAI: ${(err as Error).message}`,
        cause: err as Error,
      }
    };
  }
}

Retry logic

Retries happen in the engine, not in adapters. Adapters are stateless — they make one request and return one Result.

// packages/core/src/engine/retry.ts
const RETRYABLE_CODES: ErrorCode[] = [
  'PROVIDER_RATE_LIMIT',
  'PROVIDER_TIMEOUT',
  'PROVIDER_NETWORK_ERROR',
];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<Result<T>>,
  maxRetries: number = 2,
  backoffMs: number = 1000,
): Promise<Result<T>> {
  let lastResult!: Result<T>; // definite assignment: the loop always runs at least once

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    lastResult = await fn();

    if (lastResult.success) return lastResult;
    if (!RETRYABLE_CODES.includes(lastResult.error.code)) return lastResult;
    if (attempt < maxRetries) {
      await sleep(backoffMs * Math.pow(2, attempt)); // Exponential backoff: 1s, 2s, 4s, ...
    }
  }

  return lastResult;
}
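
Usage from the engine then wraps the call, not the adapter. A sketch, assuming an illustrative `retries` setting:

// Sketch — retrying a single provider call; the adapter itself stays stateless
const result = await withRetry(() => adapter.complete(request), suite.retries ?? 2);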

4. Assertion Error Handling

Assertions should never crash the test run. A failing assertion is a test result, not an error.

// Assertions return a Result; they never throw
function evaluateToolCalled(
  response: ProviderResponse,
  config: ToolCalledConfig,
): Result<AssertionResult> {
  // This is a test failure, NOT an error:
  if (!response.toolCalls?.length) {
    return {
      success: true, // Result succeeded (we got a result)
      data: {
        pass: false, // But the assertion failed
        message: `Expected tool "${config.tool}" to be called, but no tools were called`,
        score: 0,
      }
    };
  }

  // Happy path: check whether the expected tool appears among the calls
  // (a { name } field on each tool call is assumed here for illustration).
  const wasCalled = response.toolCalls.some((call) => call.name === config.tool);
  return {
    success: true,
    data: {
      pass: wasCalled,
      message: wasCalled
        ? `Tool "${config.tool}" was called`
        : `Expected tool "${config.tool}", got: ${response.toolCalls.map((c) => c.name).join(', ')}`,
      score: wasCalled ? 1 : 0,
    }
  };

  // A failed Result (ASSERTION_EVAL_ERROR) is only returned when evaluation itself
  // goes wrong — i.e. a bug in KindLM, not in the system under test.
}

Key distinction:

| Scenario | Result.success | AssertionResult.pass | Meaning |
| --- | --- | --- | --- |
| Tool was called correctly | true | true | Test passed |
| Tool was not called | true | false | Test failed (expected outcome) |
| Assertion code crashed | false | N/A | Error in KindLM itself |
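
In code, that distinction shows up wherever the engine consumes an assertion Result. A sketch, assuming the surrounding test object follows the TestResult shape in the next section:

// Sketch — consuming an assertion Result in the engine
const result = evaluateToolCalled(response, assertionConfig);
if (!result.success) {
  // Error in KindLM itself or its infrastructure — surfaces as status 'errored'
  test.error = result.error;
} else {
  // Valid outcome either way: result.data.pass decides passed vs failed
  test.assertions.push(result.data);
}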

5. Engine Error Handling

The engine orchestrates test execution. It collects Results from providers and assertions and produces a RunResult.

Test-level errors

If a provider call fails for a single test, that test is marked as errored (not failed). Remaining tests continue.

interface TestResult {
  name: string;
  status: 'passed' | 'failed' | 'errored' | 'skipped';
  assertions: AssertionResult[];
  error?: KindlmError;  // Only set when status === 'errored'
  latencyMs: number;
  costUsd: number;
}
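
Deriving a test's status from those pieces might look like this (a sketch; the helper name is illustrative):

// Sketch — status derivation for a single test
function deriveStatus(
  providerError: KindlmError | undefined,
  assertions: AssertionResult[],
): TestResult['status'] {
  if (providerError) return 'errored';   // infrastructure problem, not a verdict
  // 'skipped' is decided at the suite level when config is invalid
  return assertions.every((a) => a.pass) ? 'passed' : 'failed';
}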

Suite-level errors

If config parsing fails, the entire suite is skipped with an error message. Other suites still run.

Run-level errors

Only fatal errors (config file not found, no valid suites) stop the entire run.

Run
├── Suite A (3 tests)
│   ├── Test 1: passed
│   ├── Test 2: failed (assertion failed — expected behavior)
│   └── Test 3: errored (provider timeout — infrastructure issue)
├── Suite B (2 tests) — skipped (invalid config reference)
│   └── Error: system_prompt_file "prompts/missing.txt" not found
└── Suite C (2 tests)
    ├── Test 1: passed
    └── Test 2: passed

Summary: 3 passed, 1 failed, 1 errored, 2 skipped (Suite B)
Exit code: 1

6. CLI Error Messages

The CLI is responsible for converting error codes to user-friendly messages.

Formatting rules:

  1. Error message on first line (bold red in terminal)
  2. Actionable fix on second line
  3. Debug details only with --verbose

Examples:

✗ Config error: Unknown assertion type "tool_caled" at line 23
  Did you mean "tool_called"? See: https://kindlm.com/docs/assertions

✗ OpenAI API key is invalid or missing
  Set OPENAI_API_KEY in your environment or .env file

✗ OpenAI rate limit hit. Retry after 30 seconds.
  Reduce concurrency with --concurrency 1 or add retry config

✗ Schema file not found: schemas/order.json
  Referenced in suite "order-agent", test "happy-path"

✗ Baseline not found for suite "refund-agent"
  Run: kindlm baseline set

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | All tests passed, all gates passed |
| 1 | Tests failed OR gate failed OR error occurred |

We intentionally keep it simple — 0 or 1. CI systems only need pass/fail.

--verbose flag:

Adds stack traces, full provider response bodies, timing per assertion, and retry attempts to output.
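
Putting the formatting rules and exit codes together, the CLI's unwrap step might look roughly like this. A sketch only: `fixSuggestionFor` and `allPassed` are illustrative helpers, not part of this spec.

// Sketch — the only layer that prints and exits
function finish(result: Result<RunResult>, verbose: boolean): void {
  if (!result.success) {
    console.error(`✗ ${result.error.message}`);                 // line 1: what went wrong
    console.error(`  ${fixSuggestionFor(result.error.code)}`);  // line 2: actionable fix
    if (verbose && result.error.cause) console.error(result.error.cause.stack);
    process.exit(1);
  }
  process.exit(allPassed(result.data) ? 0 : 1);
}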


7. Cloud API Error Responses

All Cloud API errors return consistent JSON:

{
  "error": "plan_required",
  "message": "This feature requires a Team or Enterprise plan. Current plan: free",
  "details": {
    "required_plan": "team",
    "current_plan": "free",
    "feature": "pdf_export",
    "upgrade_url": "https://cloud.kindlm.com/settings/billing"
  }
}

HTTP status mapping:

| Error Code | HTTP Status | When |
| --- | --- | --- |
| CLOUD_AUTH_ERROR | 401 | Missing/invalid/expired token |
| CLOUD_PLAN_LIMIT | 403 | Feature needs higher plan |
| not_found | 404 | Resource doesn't exist or not in user's org |
| conflict | 409 | Duplicate project name |
| validation_error | 422 | Invalid request body (Zod error details included) |
| CLOUD_RATE_LIMIT | 429 | Rate limit exceeded |
| payload_too_large | 413 | Upload > 5MB |
| UNKNOWN_ERROR | 500 | Unexpected server error (logged, not exposed to user) |
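
On the Worker side this maps naturally onto a small shared shape. A sketch, assuming an illustrative interface name; the fields follow the example body above:

// Sketch — shared shape for Cloud API error bodies
interface CloudApiErrorBody {
  error: string;                       // machine-readable code, e.g. "plan_required"
  message: string;                     // human-readable explanation
  details?: Record<string, unknown>;   // extra context such as required_plan, upgrade_url
}

// Returning it from a handler (status per the mapping table above):
return new Response(JSON.stringify(body satisfies CloudApiErrorBody), {
  status: 403,
  headers: { 'Content-Type': 'application/json' },
});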

8. Logging

CLI logging levels:

kindlm test              → Only errors and summary
kindlm test --verbose    → Errors, warnings, info, debug details
DEBUG=kindlm* kindlm test → Full debug output (Node.js debug module)
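
The last level goes through the debug npm package's namespaced loggers. Inside core that might look like the following sketch (the namespace and variable names are illustrative):

// Sketch — namespaced debug logging, enabled with DEBUG=kindlm*
import createDebug from 'debug';

const debug = createDebug('kindlm:engine');
debug('retrying %s (attempt %d of %d)', testName, attempt, maxRetries);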

Cloud logging:

Cloudflare Workers logs go through console.log to the Cloudflare dashboard, with optional Logpush to an external service.

Log format (structured JSON):

{
  "level": "error",
  "code": "PROVIDER_TIMEOUT",
  "message": "OpenAI request timed out after 30000ms",
  "org_id": "org_a1b2c3",
  "run_id": "run_x1y2z3",
  "timestamp": "2026-02-15T10:30:00Z"
}

Never log: API keys, user prompts/responses (PII risk), full provider response bodies in production.
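
A small helper keeps that rule enforceable in one place. A sketch; the function name and field list are illustrative:

// Sketch — structured error logging with no sensitive fields
function logError(entry: { code: ErrorCode; message: string; org_id?: string; run_id?: string }): void {
  console.log(JSON.stringify({
    level: 'error',
    ...entry,
    timestamp: new Date().toISOString(),
    // deliberately no API keys, prompts, responses, or provider bodies
  }));
}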


9. Graceful Degradation

| Failure | Behavior |
| --- | --- |
| One provider fails, others succeed | Failed provider's tests marked as errored, others complete |
| Judge model unavailable | Judge assertions marked as errored with suggestion to retry |
| Cloud upload fails | CLI warns but exits based on test results (not upload status) |
| Baseline file corrupted | Compare fails gracefully with "baseline corrupt" error, tests still run |
| Disk full (can't write report) | Stderr warning, test results still shown in terminal |
| SIGINT (Ctrl+C) | Graceful shutdown — print partial results, exit 1 |
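
For the last row, the CLI is the layer that owns the signal handler, consistent with core never calling process.exit(). A sketch; `printPartialResults` and `runState` are illustrative:

// Sketch — graceful Ctrl+C handling in the CLI (never in core)
process.on('SIGINT', () => {
  printPartialResults(runState);   // whatever has finished so far
  process.exit(1);
});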