Prometheus Monitoring Guide

The Docker Awakening Gateway natively exposes application telemetry through a Prometheus-compatible /_metrics endpoint. Administrators can use it to monitor gateway performance, container awakening activity, and resource optimization (idle shutdowns), either globally or per container.

1. Overview

By default, the gateway serves metrics on port 8080 (or the port set in gateway.port) under the /_metrics HTTP path. This endpoint serves:

Standard Go runtime metrics (go_gc_*, go_memstats_*, go_goroutines, etc.)
Custom gateway metrics (prefixed with gateway_)

Note: The /_metrics endpoint is considered internal. It is excluded from proxy routing and is rate-limited exactly like the /_health checks.
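
As a quick sanity check before wiring up Prometheus, the endpoint can be fetched directly and its output split into gateway_* samples and everything else. The following is a minimal sketch, not part of the gateway itself: the URL http://gateway:8080/_metrics and the helper names fetch_metrics and split_metric_lines are assumptions made for illustration.

```python
import urllib.request


def fetch_metrics(url: str = "http://gateway:8080/_metrics", timeout: float = 5.0) -> str:
    """Download the raw text exposition (host and port here are assumptions)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def split_metric_lines(text: str):
    """Return (gateway_lines, other_lines) from a text exposition.

    Blank lines and comment lines (# HELP / # TYPE) are skipped.
    """
    gateway, other = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("gateway_"):
            gateway.append(line)
        else:
            other.append(line)
    return gateway, other
```

Calling split_metric_lines(fetch_metrics()) against a running gateway should return the custom metrics in the first list and the Go runtime metrics in the second.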

2. Prometheus Scrape Configuration

To instruct Prometheus to scrape the gateway, add a job to your prometheus.yml:

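scrape_configs:
  - job_name: 'docker-gateway'
    metrics_path: /_metrics # Required: Prometheus defaults to /metrics, which the gateway does not use
    scrape_interval: 15s
    static_configs:
      - targets: ['gateway:8080'] # Replace with your gateway IP/hostname and port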

3. Available Custom Metrics

Every custom metric carries a container label, so traffic and lifecycle events can be broken down per container (for example, my-app versus slow-app).

Metric Name	Type	Labels	Description
`gateway_requests_total`	Counter	`container`, `status_code`	Total number of HTTP requests that passed through the reverse proxy. `status_code` distinguishes between responses such as `200 OK` and `502 Bad Gateway`.
`gateway_request_duration_seconds`	Histogram	`container`	Tracks the entire latency of the HTTP request, including proxying time.
`gateway_starts_total`	Counter	`container`, `result`	Counts every attempt to wake up a sleeping container. `result` is either `success` (container started and TCP answered) or `error` (timeout, crash, network issue).
`gateway_start_duration_seconds`	Histogram	`container`	Tracks the time it takes for a container to go from “starting” to fully “running” (TCP port responding). Crucial for optimizing `start_timeout` values.
`gateway_idle_stops_total`	Counter	`container`	Increments every time a container is automatically stopped by the gateway because its `idle_timeout` threshold was exceeded.
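
To make the role of the labels concrete, the sketch below parses sample lines in the Prometheus text exposition format and sums gateway_requests_total per container. It is an illustration only: the helper names parse_sample and requests_by_container are hypothetical, the parser handles only simple sample lines, and any input values you feed it are your own scrape output, not figures from the gateway documentation.

```python
import re
from collections import defaultdict

# Matches simple sample lines such as: name{label="value",...} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one label pair, allowing escaped characters inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def requests_by_container(lines):
    """Sum gateway_requests_total across status codes, keyed by container."""
    totals = defaultdict(float)
    for line in lines:
        name, labels, value = parse_sample(line)
        if name == "gateway_requests_total":
            totals[labels.get("container", "")] += value
    return dict(totals)
```

Summing over status_code here mirrors what sum by (container) does in PromQL: the container label is kept and the other labels are aggregated away.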

4. Useful PromQL Queries (Grafana Examples)

Here are some standard queries you can use to build a Grafana dashboard monitoring your sleep/wake environment.

Traffic & Performance

Requests per second (RPS) per container

rate(gateway_requests_total[5m])

95th Percentile Response Time (Latency)

histogram_quantile(0.95, rate(gateway_request_duration_seconds_bucket[5m]))
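
The percentile returned by this query is an estimate interpolated from bucket counts, so its precision depends on the bucket boundaries. The sketch below shows the idea for classic histograms under simplifying assumptions: cumulative (upper_bound, count) pairs sorted by bound, positive bounds, and a final +Inf bucket. It is a simplified illustration, not the Prometheus implementation, and the bucket values in any call are whatever you supply.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Assumes 0 < q <= 1, buckets sorted by upper bound, positive bounds,
    and a final bucket whose upper bound is +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if upper == math.inf:
        # The quantile lies beyond the highest finite bound; report that bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

For example, with cumulative counts of 50, 90, 100, and 100 at bounds 0.1, 0.5, 1.0, and +Inf, the 95th percentile falls halfway through the 0.5 to 1.0 bucket. If the result regularly lands in the overflow bucket, the reported value is only a lower bound on the true latency.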

Error Rate (HTTP 5xx responses per second)

sum by (container) (rate(gateway_requests_total{status_code=~"5.."}[5m]))

Awakening & Lifecycles

Awakening Success Rate vs Failure Rate

sum by (result) (rate(gateway_starts_total[1h]))

Average Awakening Time (Cold Start Penalty)

rate(gateway_start_duration_seconds_sum[1h]) 
/ 
rate(gateway_start_duration_seconds_count[1h])
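
The division works because a histogram's _sum and _count are both counters: over the same window, the ratio of their rates equals the ratio of their increases, which is the mean duration of the starts observed in that window. A minimal sketch of the arithmetic follows; the function name average_start_seconds is hypothetical, and counter resets are deliberately ignored.

```python
def average_start_seconds(sum_then, sum_now, count_then, count_now):
    """Mean start duration over a window, from _sum and _count readings.

    Takes the counter values at the start and end of the window.
    Counter resets (for example, a gateway restart) are not handled.
    """
    delta_sum = sum_now - sum_then
    delta_count = count_now - count_then
    if delta_count <= 0:
        return None  # no starts observed in the window, so no mean exists
    return delta_sum / delta_count
```

If the sum grows from 12.0 to 30.0 seconds while the count grows from 4 to 10, the six starts in the window averaged 3.0 seconds each.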

Containers stopped to save resources (last 24h)

increase(gateway_idle_stops_total[24h])