Data Model

Prometheus stores data as time series, which are streams of numeric values sampled at successive timestamps:

[Figure: Prometheus data model in a graph]

On a high level, each time series consists of an identifier and a set of sample values:

[Figure: Prometheus data model]

In Prometheus 3.0, the data model allows any UTF-8 characters in metric names and label names. However, this comes with some caveats described further below.

Series identifiers

Every series is uniquely identified by a metric name and a set of key/value pairs called "labels". For example, one of the series identifiers in the diagram above is:

http_requests_total{job="api-server",instance="10.0.0.1:443",method="GET"}

Prometheus automatically creates and indexes series identifiers in its TSDB the first time it sees them, so there is no explicit schema you need to predefine.
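To make the "metric name plus label set" idea concrete, here is a minimal Python sketch. It assumes the metric name is stored next to the labels under the reserved `__name__` label, which is how Prometheus represents it internally, and it sorts the pairs so that label order does not affect identity. The `series_id` function and the `ToyIndex` class are hypothetical names: a plain dict stands in for the TSDB index, and a series is created the first time its identifier is seen, with no schema declared up front.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a series identifier is the sorted tuple of
# (label name, label value) pairs, with the metric name stored under the
# reserved "__name__" label.
SeriesID = Tuple[Tuple[str, str], ...]


def series_id(metric_name: str, labels: Dict[str, str]) -> SeriesID:
    """Build an order-independent identifier from a metric name and labels."""
    pairs = dict(labels)
    pairs["__name__"] = metric_name
    return tuple(sorted(pairs.items()))


class ToyIndex:
    """Hypothetical stand-in for the TSDB index: series appear on first use."""

    def __init__(self) -> None:
        self.series: Dict[SeriesID, List[Tuple[int, float]]] = {}

    def get_or_create(self, metric_name: str, labels: Dict[str, str]) -> SeriesID:
        sid = series_id(metric_name, labels)
        # setdefault creates an empty sample list the first time sid is seen.
        self.series.setdefault(sid, [])
        return sid


index = ToyIndex()
a = index.get_or_create(
    "http_requests_total",
    {"job": "api-server", "instance": "10.0.0.1:443", "method": "GET"},
)
b = index.get_or_create(
    "http_requests_total",
    {"method": "GET", "job": "api-server", "instance": "10.0.0.1:443"},
)
print(a == b, len(index.series))  # prints: True 1
```

Both calls resolve to the same series because the identifier depends only on the set of label pairs, not on the order in which they were supplied.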

Metric names

The metric name identifies the overall aspect of a system that is being measured. For example, the metric name http_requests_total indicates the total number of HTTP requests handled by a given server process, while process_resident_memory_bytes indicates the amount of resident memory (in bytes) that a process is currently using.

Labels

Labels allow you to split up, or partition, a metric into subdimensions. For example, the instance label in the example above tells you which particular instance (process) the metric came from, while the job label indicates the job, or group of processes, that the instance belongs to. The method label further subdivides the metric by the HTTP method used within the process.
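The following Python sketch illustrates how labels partition a metric into subdimensions. It assumes each series is represented by its full label set, again with the metric name under `__name__`. The `select` and `partition_by` helpers are hypothetical. `select` mimics PromQL's equality matcher, including the rule that a missing label behaves like an empty value, and `partition_by` groups series by the value of a single label.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-memory series list; each series is its full label set.
Series = Dict[str, str]

all_series: List[Series] = [
    {"__name__": "http_requests_total", "job": "api-server",
     "instance": "10.0.0.1:443", "method": "GET"},
    {"__name__": "http_requests_total", "job": "api-server",
     "instance": "10.0.0.1:443", "method": "POST"},
    {"__name__": "http_requests_total", "job": "api-server",
     "instance": "10.0.0.2:443", "method": "GET"},
]


def select(series: List[Series], matchers: Dict[str, str]) -> List[Series]:
    """Keep series whose labels equal every matcher (missing label == "")."""
    return [s for s in series
            if all(s.get(k, "") == v for k, v in matchers.items())]


def partition_by(series: List[Series], label: str) -> Dict[str, List[Series]]:
    """Group series into subdimensions by the value of one label."""
    groups: Dict[str, List[Series]] = defaultdict(list)
    for s in series:
        groups[s.get(label, "")].append(s)
    return dict(groups)


get_requests = select(all_series, {"__name__": "http_requests_total", "method": "GET"})
by_instance = partition_by(all_series, "instance")
print(len(get_requests), sorted(by_instance))  # prints: 2 ['10.0.0.1:443', '10.0.0.2:443']
```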

Series samples

Samples form the bulk of the data of a series and are appended to an indexed series over time:

  • Timestamps are 64-bit integer Unix timestamps with millisecond precision.
  • Sample values can be either:
    • A 64-bit floating point number
    • An entire high-resolution native histogram (experimental)
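A float sample can be sketched in Python as a millisecond timestamp paired with a double-precision value. The `FloatSample` and `ToySeries` names are hypothetical, native histogram samples are left out, and the rule that a series only accepts strictly increasing timestamps is a simplifying assumption of this sketch, not a full description of Prometheus' ingestion behavior.

```python
import time
from dataclasses import dataclass, field
from typing import List

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class FloatSample:
    timestamp_ms: int  # Unix time in milliseconds; must fit in a signed 64-bit int
    value: float       # Python floats are IEEE 754 64-bit doubles

    def __post_init__(self) -> None:
        if not INT64_MIN <= self.timestamp_ms <= INT64_MAX:
            raise ValueError("timestamp does not fit in 64 bits")


@dataclass
class ToySeries:
    """Hypothetical append-only list of float samples for one series."""
    samples: List[FloatSample] = field(default_factory=list)

    def append(self, timestamp_ms: int, value: float) -> None:
        # Simplifying assumption: reject timestamps that do not move forward.
        if self.samples and timestamp_ms <= self.samples[-1].timestamp_ms:
            raise ValueError("out-of-order or duplicate timestamp")
        self.samples.append(FloatSample(timestamp_ms, float(value)))


def now_ms() -> int:
    """Current Unix time truncated to millisecond precision."""
    return time.time_ns() // 1_000_000


series = ToySeries()
series.append(1_700_000_000_000, 1027)
series.append(1_700_000_015_000, 1031.5)
```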

New in Prometheus 3.0: Full UTF-8 support

To improve Prometheus' compatibility with OpenTelemetry metrics sources, Prometheus 3.0 introduced support for arbitrary UTF-8 characters in metric names and label names, so you are technically no longer limited to the original character set shown in the diagram above for these identifiers. However, we still recommend using the original character set for metric and label names to ensure compatibility with other systems and tools that may not support UTF-8 characters in these identifiers yet.

You will also encounter some downsides in PromQL when using the extended character set, since data selectors require more quoting and a slightly different syntax.

For example, if you have a selector like this for the original character set:

my_metric{my_label="value"}

...you will have to change it to the following when introducing previously unsupported characters (dots instead of underscores in this example):

{"my.metric", "my.label"="value"}

As you can see, the metric name now has to be written as a quoted string inside the label matcher list, and the my.label label name has to be quoted as well. This syntax is more cumbersome to write and read, so keep this in mind before deciding to go beyond the original character set for identifiers.
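The quoting rule can be captured in a small Python helper that builds an equality-matcher selector and quotes names only when they fall outside the original character set. It assumes the legacy patterns `[a-zA-Z_:][a-zA-Z0-9_:]*` for metric names and `[a-zA-Z_][a-zA-Z0-9_]*` for label names. The `selector` and `quote` functions are hypothetical, and `quote` escapes only backslashes and double quotes, which is a simplification of PromQL's full string-escaping rules.

```python
import re
from typing import Dict

# Original (legacy) character sets for metric and label names.
LEGACY_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LEGACY_LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def quote(s: str) -> str:
    """Hypothetical helper: double-quote s, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def selector(metric: str, labels: Dict[str, str]) -> str:
    """Render an equality-matcher selector, quoting names only when needed."""
    matchers = [
        (name if LEGACY_LABEL_NAME.fullmatch(name) else quote(name)) + "=" + quote(value)
        for name, value in labels.items()
    ]
    if LEGACY_METRIC_NAME.fullmatch(metric):
        return metric + "{" + ",".join(matchers) + "}"
    # A non-legacy metric name moves inside the braces as a quoted string.
    return "{" + ", ".join([quote(metric)] + matchers) + "}"


print(selector("my_metric", {"my_label": "value"}))  # my_metric{my_label="value"}
print(selector("my.metric", {"my.label": "value"}))  # {"my.metric", "my.label"="value"}
```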