Prometheus Monitoring Guide

The Docker Awakening Gateway natively exposes application telemetry through a standard Prometheus endpoint at /_metrics. Administrators can use it to monitor gateway performance, container wake-up activity, and resource savings from idle shutdowns, either globally or per container.

1. Overview

The gateway serves metrics under the /_metrics HTTP path on its main port: 8080 by default, or whatever gateway.port is set to. The endpoint exposes:

  • Standard Go runtime metrics (go_gc_*, go_memstats_*, go_goroutines, etc.)
  • Custom gateway metrics (all prefixed with gateway_)

Note: The /_metrics endpoint is considered internal. It is excluded from proxy routing and is rate-limited exactly like the /_health checks.

2. Prometheus Scrape Configuration

To instruct Prometheus to scrape the gateway, add a job to your prometheus.yml:

scrape_configs:
  - job_name: 'docker-gateway'
    scrape_interval: 15s
    static_configs:
      - targets: ['gateway:8080'] # Replace with your gateway IP/hostname and port

3. Available Custom Metrics

Every custom metric carries a container label, so traffic and lifecycle events can be broken down per container (for example, my-app vs. slow-app).

| Metric name | Type | Labels | Description |
|---|---|---|---|
| gateway_requests_total | Counter | container, status_code | Total number of HTTP requests handled by the reverse proxy. status_code distinguishes outcomes such as 200 OK and 502 Bad Gateway. |
| gateway_request_duration_seconds | Histogram | container | End-to-end latency of each HTTP request, including proxying time. |
| gateway_starts_total | Counter | container, result | Counts every attempt to wake up a sleeping container. result is either success (container started and its TCP port answered) or error (timeout, crash, network issue). |
| gateway_start_duration_seconds | Histogram | container | Time a container takes to go from “starting” to fully “running” (TCP port responding). Crucial for tuning start_timeout values. |
| gateway_idle_stops_total | Counter | container | Incremented each time the gateway automatically stops a container because its idle_timeout threshold was exceeded. |
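In the text exposition format, each counter above appears as one series per label combination, and each histogram is exposed as _bucket (with an extra le label), _sum, and _count series, which is what the PromQL queries below build on. As a minimal sketch, the hypothetical parseSample helper below splits one sample line into name, labels, and value. It assumes simple label values (no commas, escaped quotes, or braces) and no trailing timestamp; a production client should use a real exposition-format parser instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one parsed line of the Prometheus text exposition format.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample parses lines such as
//
//	gateway_starts_total{container="my-app",result="success"} 7
//
// Simplification: label values must not contain commas, escaped quotes or
// braces, and trailing timestamps are not supported.
func parseSample(line string) (Sample, error) {
	s := Sample{Labels: map[string]string{}}
	sp := strings.LastIndexByte(line, ' ')
	if sp < 0 {
		return s, fmt.Errorf("no value in %q", line)
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64) // also accepts "+Inf" and "NaN"
	if err != nil {
		return s, err
	}
	s.Value = v
	head := line[:sp]
	open := strings.IndexByte(head, '{')
	if open < 0 { // metric without labels
		s.Name = head
		return s, nil
	}
	if !strings.HasSuffix(head, "}") {
		return s, fmt.Errorf("unterminated label set in %q", line)
	}
	s.Name = head[:open]
	body := head[open+1 : len(head)-1]
	if body == "" {
		return s, nil
	}
	for _, pair := range strings.Split(body, ",") {
		k, val, ok := strings.Cut(pair, "=")
		if !ok {
			return s, fmt.Errorf("bad label %q", pair)
		}
		s.Labels[k] = strings.Trim(val, `"`)
	}
	return s, nil
}

func main() {
	s, err := parseSample(`gateway_requests_total{container="my-app",status_code="502"} 4`)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Name, s.Labels["container"], s.Labels["status_code"], s.Value)
}
```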

4. Useful PromQL Queries (Grafana Examples)

Here are some standard queries you can use to build a Grafana dashboard monitoring your sleep/wake environment.

Traffic & Performance

Requests per second (RPS) per container

sum by (container) (rate(gateway_requests_total[5m]))
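rate() turns an ever-growing counter into a per-second figure. The sketch below shows the core idea under simplifying assumptions: it sums the increases between consecutive samples, treats any drop in value as a counter reset (for example after a gateway restart), and divides by the elapsed time. PromQL's real rate() additionally extrapolates to the edges of the range window, so its results can differ slightly; Point and simpleRate are illustrative names only.

```go
package main

import "fmt"

// Point is one scraped counter value at a Unix timestamp in seconds.
type Point struct {
	T float64
	V float64
}

// simpleRate returns the per-second increase of a counter across the samples
// in a window, without PromQL's boundary extrapolation.
func simpleRate(pts []Point) float64 {
	if len(pts) < 2 {
		return 0
	}
	elapsed := pts[len(pts)-1].T - pts[0].T
	if elapsed <= 0 {
		return 0
	}
	var inc float64
	for i := 1; i < len(pts); i++ {
		d := pts[i].V - pts[i-1].V
		if d < 0 {
			// Counter reset: it restarted from zero, so the whole new value is increase.
			d = pts[i].V
		}
		inc += d
	}
	return inc / elapsed
}

func main() {
	// Hypothetical 15 s scrapes over one minute; the drop from 160 to 30
	// simulates a gateway restart that reset the counter.
	pts := []Point{{0, 100}, {15, 130}, {30, 160}, {45, 30}, {60, 60}}
	fmt.Printf("%.2f req/s\n", simpleRate(pts)) // prints "2.00 req/s"
}
```

increase(), used in the idle-stop query further down, is the same calculation without the final division: in PromQL, increase(v[w]) equals rate(v[w]) multiplied by the window length in seconds.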

95th Percentile Response Time (Latency)

histogram_quantile(0.95, sum by (container, le) (rate(gateway_request_duration_seconds_bucket[5m])))

HTTP 5xx errors per second, per container

sum by (container) (rate(gateway_requests_total{status_code=~"5.."}[5m]))
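The query above yields 5xx responses per second. To express errors as a share of all traffic instead, divide it by the total request rate:

sum by (container) (rate(gateway_requests_total{status_code=~"5.."}[5m])) / sum by (container) (rate(gateway_requests_total[5m]))

The Go sketch below performs the same arithmetic on per-status rates for one container, assuming they have already been fetched; errorRatio is a hypothetical helper. PromQL regex matchers are fully anchored, so "5.." matches exactly the three-character codes beginning with 5, which the sketch reproduces.

```go
package main

import "fmt"

// errorRatio returns the share of requests whose status_code matches the
// anchored PromQL regex "5.." (exactly three characters, starting with '5').
// rates maps status_code -> per-second request rate for one container.
func errorRatio(rates map[string]float64) float64 {
	var errs, total float64
	for code, r := range rates {
		total += r
		if len(code) == 3 && code[0] == '5' {
			errs += r
		}
	}
	if total == 0 {
		return 0 // no traffic in the window
	}
	return errs / total
}

func main() {
	// Hypothetical per-second rates for one container.
	rates := map[string]float64{"200": 18, "404": 1, "502": 0.5, "503": 0.5}
	fmt.Printf("5xx share: %.1f%%\n", 100*errorRatio(rates))
}
```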

Awakening & Lifecycles

Awakening Success Rate vs Failure Rate

sum by (result) (rate(gateway_starts_total[1h]))

Average Awakening Time (Cold Start Penalty)

rate(gateway_start_duration_seconds_sum[1h]) 
/ 
rate(gateway_start_duration_seconds_count[1h])
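Dividing the rate of _sum by the rate of _count works because the window length cancels out: the result is the total seconds spent starting containers divided by the number of starts in that window. The sketch below does the same arithmetic on two hypothetical snapshots of the histogram; HistSnapshot and avgOverWindow are illustrative names, and counter resets are ignored for brevity. If no container woke up during the window, the PromQL expression evaluates to NaN (0/0), which the sketch reports as ok == false.

```go
package main

import "fmt"

// HistSnapshot holds the _sum and _count series of one histogram at a point in time.
type HistSnapshot struct {
	Sum   float64 // e.g. gateway_start_duration_seconds_sum
	Count float64 // e.g. gateway_start_duration_seconds_count
}

// avgOverWindow returns the mean observation between two snapshots:
// the increase in _sum divided by the increase in _count.
func avgOverWindow(start, end HistSnapshot) (float64, bool) {
	dc := end.Count - start.Count
	if dc <= 0 {
		return 0, false // no starts in the window
	}
	return (end.Sum - start.Sum) / dc, true
}

func main() {
	// Hypothetical: 4 wake-ups took 38 s in total during the last hour.
	then := HistSnapshot{Sum: 120, Count: 10}
	now := HistSnapshot{Sum: 158, Count: 14}
	if avg, ok := avgOverWindow(then, now); ok {
		fmt.Printf("average cold start: %.1f s\n", avg) // prints "average cold start: 9.5 s"
	}
}
```

Comparing this average (or a high quantile of the same histogram) against start_timeout shows how much headroom each container has before a wake-up is reported as an error.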

Idle stops per container to save resources (last 24h)

increase(gateway_idle_stops_total[24h])

Docker Awakening Gateway — MIT License