COG - Prometheus


Prometheus is an open-source systems monitoring and alerting toolkit. Its main features are:

  • a multi-dimensional data model with time series data identified by metric name and key/value pairs (see the example after this list)
  • PromQL, a flexible query language to leverage this dimensionality
  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • pushing time series is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support
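
For example, every time series in this data model is identified by its metric name together with a set of key/value labels. A hypothetical counter of HTTP requests (the metric name and labels here are purely illustrative) would be written as:

api_http_requests_total{method="POST", handler="/messages"}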

Support

For bug reports, feature requests, or general questions, please contact us.

Directions

  1. Launch AMI from the Amazon Marketplace
  2. To get started, connect to your instance as ec2-user and create an SSH tunnel with: ssh -i /path/to/mykey.pem ec2-user@ec2-111-111-111-111.compute-1.amazonaws.com -L9090:localhost:9090
  3. Prometheus is already running and listening on port 9090.
  4. Edit "/etc/prometheus/prometheus.yml" if you want to change options (a minimal example is sketched after this list).
    • Restart the server with "/etc/init.d/prometheus restart"
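
If you do edit the configuration, a minimal prometheus.yml that scrapes the local Prometheus server every 15 seconds looks roughly like the sketch below. The job name and scrape interval here are illustrative defaults, so compare against the file shipped with this AMI before changing anything:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]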

Getting Started with Prometheus

See the Prometheus documentation for complete details.

You can verify that Prometheus is serving metrics about itself by navigating to its metrics endpoint: http://localhost:9090/metrics
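
The endpoint serves plain-text metrics in the Prometheus exposition format. The exact metric set and values will differ on your instance, but the output contains entries roughly like:

# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1234
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0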

Using the expression browser

Let us try looking at some data that Prometheus has collected about itself. To use Prometheus's built-in expression browser, navigate to http://localhost:9090/graph and choose the "Console" view within the "Graph" tab.

As you can gather from http://localhost:9090/metrics, one metric that Prometheus exports about itself is called promhttp_metric_handler_requests_total (the total number of /metrics requests the Prometheus server has served). Go ahead and enter this into the expression console:

promhttp_metric_handler_requests_total

This should return a number of different time series (along with the latest value recorded for each), all with the metric name promhttp_metric_handler_requests_total but with different labels. These labels designate the different HTTP request statuses.
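
For instance, assuming the AMI's default configuration names the self-scrape job "prometheus", the console output will look something like the following, with one series per status code and the instance and job labels attached by Prometheus to every scraped series (the sample value is illustrative):

promhttp_metric_handler_requests_total{code="200", instance="localhost:9090", job="prometheus"}  1234
promhttp_metric_handler_requests_total{code="500", instance="localhost:9090", job="prometheus"}  0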

If we were only interested in requests that resulted in HTTP code 200, we could use this query to retrieve that information:

promhttp_metric_handler_requests_total{code="200"}

To count the number of returned time series, you could write:

count(promhttp_metric_handler_requests_total)

For more about the expression language, see the expression language documentation.
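
As a taste of what the language offers beyond exact matching, label matchers also support negation and regular expressions. For example, these two queries select requests that did not return code 200, and requests with any 5xx status, respectively:

promhttp_metric_handler_requests_total{code!="200"}
promhttp_metric_handler_requests_total{code=~"5.."}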

Using the graphing interface

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab.

For example, enter the following expression to graph the per-second rate of self-scraped /metrics requests that returned status code 200:

rate(promhttp_metric_handler_requests_total{code="200"}[1m])

You can experiment with the graph range parameters and other settings.
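
As a variation on the query above, you can also aggregate across all status codes. For example, the following graphs the combined per-second /metrics request rate regardless of response code:

sum(rate(promhttp_metric_handler_requests_total[1m]))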

Be sure to review the Prometheus documentation for complete details.