## Context
Production deployments require visibility into per-provider performance to diagnose bottlenecks, set SLOs, and inform capacity planning. Operators need to answer: “Which diagram type is slowest?”, “Which provider fails most?”, “What’s the P99 latency?”
## Decision
### Prometheus Metrics (per diagram provider)
All metrics carry `provider` and `format` labels for granular breakdowns:
| Metric | Type | Notes |
|---|---|---|
| `kroki_requests_total` | Counter | Per provider × format |
| `kroki_request_duration_seconds` | Histogram | Buckets sized for p75/p90/p99 estimation |
| `kroki_rendering_errors_total` | Counter | Tagged with `error_kind` |
| `kroki_payload_size_bytes` | Histogram | Input payload size distribution |
| `kroki_conversion_time_seconds` | Histogram | Provider-internal render time |
| `kroki_active_connections` | Gauge | Current concurrent requests |
| `kroki_circuit_breaker_state` | Gauge | 0=closed, 1=open, 2=half-open |
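For example, the P99 question from the Context section becomes a `histogram_quantile` query over the duration histogram, assuming Prometheus is scraping the export endpoint:

```promql
# P99 request latency per provider over the last 5 minutes
histogram_quantile(
  0.99,
  sum by (provider, le) (rate(kroki_request_duration_seconds_bucket[5m]))
)
```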
### Implementation
- Use the `metrics` + `metrics-exporter-prometheus` crates (lightweight, no OTel overhead by default); see the sketch after this list.
- Metrics collection is always-on when configured. The `/metrics` export endpoint on the admin port is optional (`server.metrics.export_endpoint = true`).
- Histogram buckets are tuned for diagram rendering: `[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]` seconds.
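A minimal wiring sketch, assuming the `metrics` 0.22-era macro API; `init_metrics`, `record_render`, and `set_breaker_state` are illustrative names, not existing functions in the codebase:

```rust
use metrics::{counter, gauge, histogram};
use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};

/// Install the Prometheus recorder with the bucket boundaries above.
/// The returned handle renders the exposition text; the admin port
/// would serve `handle.render()` only when `export_endpoint = true`.
fn init_metrics() -> Result<PrometheusHandle, Box<dyn std::error::Error>> {
    let handle = PrometheusBuilder::new()
        .set_buckets(&[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0])?
        .install_recorder()?;
    Ok(handle)
}

/// Record one rendering request with the {provider, format} labels.
fn record_render(provider: &str, format: &str, duration_secs: f64) {
    counter!("kroki_requests_total",
        "provider" => provider.to_string(), "format" => format.to_string())
        .increment(1);
    histogram!("kroki_request_duration_seconds",
        "provider" => provider.to_string(), "format" => format.to_string())
        .record(duration_secs);
}

/// Gauges are set directly, e.g. the circuit breaker state per provider.
fn set_breaker_state(provider: &str, state: u8) {
    gauge!("kroki_circuit_breaker_state", "provider" => provider.to_string())
        .set(state as f64); // 0=closed, 1=open, 2=half-open
}
```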
### OpenTelemetry (Optional)
- Behind the `otel` Cargo feature flag.
- Bridges existing `tracing` spans to OTLP via `tracing-opentelemetry` (see the sketch below).
- OTLP endpoint configurable via `kroki.toml` (`server.telemetry.otlp_endpoint`) or the `OTEL_EXPORTER_OTLP_ENDPOINT` env var.
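A sketch of the bridge, written against the `opentelemetry-otlp` 0.14-era pipeline API (builder names shift between releases); `init_otel` is an illustrative name:

```rust
/// Export existing `tracing` spans over OTLP/gRPC. `endpoint` would come
/// from kroki.toml or the OTEL_EXPORTER_OTLP_ENDPOINT env var.
#[cfg(feature = "otel")]
fn init_otel(endpoint: &str) -> Result<(), Box<dyn std::error::Error>> {
    use opentelemetry_otlp::WithExportConfig;
    use tracing_subscriber::layer::SubscriberExt;
    use tracing_subscriber::util::SubscriberInitExt;

    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(endpoint),
        )
        .install_batch(opentelemetry_sdk::runtime::Tokio)?;

    // Bridge: every existing tracing span becomes an OTel span;
    // no new instrumentation is written for export.
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .try_init()?;
    Ok(())
}
```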
### Configuration
```toml
[server.metrics]
enabled = true
export_endpoint = false

[server.telemetry]
enabled = false
otlp_endpoint = ""
```
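One way these tables could map onto typed config, assuming serde-based loading; struct and field names here are illustrative, not the project's actual types:

```rust
use serde::Deserialize;

/// Illustrative mirror of the `[server.metrics]` table.
#[derive(Debug, Default, Deserialize)]
#[serde(default)]
pub struct MetricsConfig {
    pub enabled: bool,         // collection on/off
    pub export_endpoint: bool, // expose /metrics on the admin port
}

/// Illustrative mirror of the `[server.telemetry]` table.
#[derive(Debug, Default, Deserialize)]
#[serde(default)]
pub struct TelemetryConfig {
    pub enabled: bool,
    /// Empty string defers to OTEL_EXPORTER_OTLP_ENDPOINT when set.
    pub otlp_endpoint: String,
}
```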
## Consequences

- Positive: Operators get deep visibility into per-provider performance without any external tooling.
- Positive: Optional export endpoint means zero overhead when not scraping.
- Positive: OTel integration reuses existing `tracing` instrumentation; no code duplication.
- Negative: Histogram memory footprint grows linearly with the number of provider × format label combinations.