Query Log

When your Prometheus server is under high query load, it helps to see exactly which PromQL queries it is running. Prometheus can optionally write a query log file that records all PromQL queries along with key execution statistics. Because this log file is configured in Prometheus' main configuration file (which can be reloaded at run-time), you can turn it on and off as needed without restarting the server.

You can configure the query log file using the query_log_file field in the global section of the main Prometheus configuration file. For example, to write all PromQL queries to a file named /tmp/prometheus-queries.log, the global section could read:

global:
  query_log_file: /tmp/prometheus-queries.log

After configuring this setting, Prometheus will write every completed query to the log file in a JSON-based format.
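
Because the setting lives in the main configuration file, a running server only picks it up after a configuration reload. The following is a minimal sketch of triggering a reload from Python. It assumes Prometheus is reachable at http://localhost:9090 (a hypothetical address) and was started with the --web.enable-lifecycle flag, which the /-/reload endpoint requires. The function names are illustrative and not part of Prometheus.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /-/reload endpoint."""
    # Assumption: base_url points at a Prometheus server started with
    # --web.enable-lifecycle; without that flag the endpoint is disabled.
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url: str = "http://localhost:9090") -> int:
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

If the lifecycle endpoint is not enabled, sending a SIGHUP signal to the Prometheus process triggers the same reload.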

Here is an example log entry (formatted here for readability) for a 1-hour range query over the PromQL expression sum by(mode) (rate(demo_cpu_usage_seconds_total[5m])):

{
   "httpRequest" : {
      "clientIP" : "::1",
      "method" : "GET",
      "path" : "/api/v1/query_range"
   },
   "params" : {
      "end" : "2023-04-09T18:06:35.070Z",
      "query" : "sum by(mode) (rate(demo_cpu_usage_seconds_total[5m]))",
      "start" : "2023-04-09T17:06:35.070Z",
      "step" : 14
   },
   "spanID" : "0000000000000000",
   "stats" : {
      "samples" : {
         "peakSamples" : 597,
         "totalQueryableSamples" : 7164
      },
      "timings" : {
         "evalTotalTime" : 0.000942917,
         "execQueueTime" : 1.9981e-05,
         "execTotalTime" : 0.000976921,
         "innerEvalTime" : 0.000817501,
         "queryPreparationTime" : 0.000105761,
         "resultSortTime" : 2.403e-06
      }
   },
   "ts" : "2023-04-09T18:06:35.076Z"
}
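
An entry like the one above can be read programmatically. The sketch below assumes that the log file holds one JSON object per line and that the field names match the example. The QueryLogEntry class and the parse_query_log function are hypothetical helpers, not part of Prometheus.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class QueryLogEntry:
    query: str
    eval_seconds: float
    peak_samples: int
    client_ip: Optional[str]  # None when the entry has no httpRequest field
    ts: str


def parse_query_log(lines: Iterable[str]) -> Iterator[QueryLogEntry]:
    """Yield one entry per non-empty line (assumes one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        stats = record.get("stats", {})
        yield QueryLogEntry(
            query=record["params"]["query"],
            eval_seconds=stats.get("timings", {}).get("evalTotalTime", 0.0),
            peak_samples=stats.get("samples", {}).get("peakSamples", 0),
            client_ip=record.get("httpRequest", {}).get("clientIP"),
            ts=record.get("ts", ""),
        )
```

For example, passing an open file handle for /tmp/prometheus-queries.log to parse_query_log yields one QueryLogEntry per logged query.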

The fields mean the following:

  • spanID: The trace span ID, when using request tracing.
  • params: The query parameters like start and end time, resolution step, and query expression.
  • ts: The timestamp of when the query was executed.
  • stats: A set of internal statistics that indicate the size and duration of the query. Most importantly, the timings.evalTotalTime field tells you how long the query took to evaluate, in seconds. The timings.execTotalTime field additionally includes the time the query spent waiting in the execution queue.
  • httpRequest: For queries that were triggered over HTTP, this contains the HTTP path and method, as well as the triggering client's IP address. This is helpful for tracking down where queries are coming from.
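
The fields described above are enough to answer two common questions: which queries are slowest, and which clients send the most queries. The following sketch assumes one JSON object per log line and the field names from the example entry. The helper names and the "non-http" label are illustrative choices, not Prometheus conventions.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def load_records(lines: Iterable[str]) -> List[dict]:
    """Parse every non-empty line as one JSON log entry."""
    return [json.loads(line) for line in lines if line.strip()]


def slowest_queries(records: List[dict], n: int = 5) -> List[Tuple[float, str]]:
    """Return the n (evalTotalTime, query) pairs with the longest evaluation time."""
    timed = [
        (
            r.get("stats", {}).get("timings", {}).get("evalTotalTime", 0.0),
            r["params"]["query"],
        )
        for r in records
    ]
    return sorted(timed, reverse=True)[:n]


def queries_per_client(records: List[dict]) -> Counter:
    """Count entries per client IP; entries without httpRequest count as 'non-http'."""
    return Counter(
        r.get("httpRequest", {}).get("clientIP", "non-http") for r in records
    )
```

Sorting by stats.samples.peakSamples instead of evalTotalTime would rank queries by the number of samples they hold at their peak rather than by evaluation time.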

This query log is very useful for tracking down slow or expensive queries that are overloading your Prometheus servers. Note, however, that the query log file only includes queries that have already completed. If an expensive query causes Prometheus to crash with an out-of-memory (OOM) error, that query never reaches the log file, so you cannot identify it this way. We will learn how to deal with this kind of situation in the next section.