Data Model
Prometheus stores time series, which are streams of numeric values sampled at ongoing timestamps.
At a high level, each time series consists of an identifier and a set of sample values.
In Prometheus 3.0, the data model allows any UTF-8 characters in metric names and label names. However, this comes with some caveats described further below.
Series identifiers
Every series is uniquely identified by a metric name and a set of key/value pairs called "labels". For example, one of the series identifiers in the diagram above is:
http_requests_total{job="api-server",instance="10.0.0.1:443",method="GET"}
Prometheus automatically creates and indexes series identifiers in its TSDB the first time it sees them, so there is no explicit schema you need to predefine.
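To make the identity rule concrete, here is a minimal Python sketch under simplifying assumptions. The helpers `series_id` and `format_series` and the dictionary `index` are hypothetical stand-ins, not Prometheus APIs. The sketch shows that a series is keyed by its metric name plus an unordered set of labels, so the same labels written in a different order refer to the same series, and that a new entry is simply created the first time an identifier is seen, without any predefined schema.

```python
from typing import Dict, List, Tuple

# A series identifier: metric name plus a sorted tuple of (label, value) pairs.
# Sorting makes the identifier independent of the order labels were written in.
SeriesID = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_id(metric: str, labels: Dict[str, str]) -> SeriesID:
    """Build a hashable, order-independent identifier (hypothetical helper)."""
    return (metric, tuple(sorted(labels.items())))


def format_series(sid: SeriesID) -> str:
    """Render an identifier as name{label="value",...} (values are not escaped here)."""
    metric, labels = sid
    body = ",".join(f'{k}="{v}"' for k, v in labels)
    return f"{metric}{{{body}}}"


# Hypothetical, simplified stand-in for the TSDB index: identifier -> samples.
index: Dict[SeriesID, List[Tuple[int, float]]] = {}

a = series_id("http_requests_total",
              {"job": "api-server", "instance": "10.0.0.1:443", "method": "GET"})
b = series_id("http_requests_total",
              {"method": "GET", "job": "api-server", "instance": "10.0.0.1:443"})

index.setdefault(a, [])  # first sighting creates the series
index.setdefault(b, [])  # same identifier as `a`, so no new series is created

print(len(index))  # 1
print(format_series(a))
```

In Prometheus itself, the metric name is stored internally as a special `__name__` label, so an identifier is ultimately just a label set.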
Metric names
The metric name identifies the overall aspect of a system that is being measured. For example, the metric name http_requests_total indicates the total number of HTTP requests handled by a given server process, while process_resident_memory_bytes would indicate the amount of resident memory (in bytes) that a process is currently using.
Labels
Labels allow you to split up, or partition, a metric into subdimensions. For example, the instance label in the example above tells you which particular instance (process) the metric came from, while the job label indicates the job, or group of processes, that the instance belongs to. The method label further subdivides the metric by the HTTP method used within the process.
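To illustrate how labels partition a metric, here is a minimal Python sketch that sums hypothetical current values of http_requests_total along a single label dimension. It roughly mirrors what the PromQL expression `sum by (instance) (http_requests_total)` does for instant values. The sample values and the `sum_by` helper are made up for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical current values of http_requests_total, one per label set.
series: List[Tuple[Dict[str, str], float]] = [
    ({"job": "api-server", "instance": "10.0.0.1:443", "method": "GET"}, 120.0),
    ({"job": "api-server", "instance": "10.0.0.1:443", "method": "POST"}, 30.0),
    ({"job": "api-server", "instance": "10.0.0.2:443", "method": "GET"}, 95.0),
]


def sum_by(rows: List[Tuple[Dict[str, str], float]], label: str) -> Dict[str, float]:
    """Collapse all other label dimensions, keeping only `label`.

    A missing label is grouped under the empty string, as PromQL does.
    """
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in rows:
        totals[labels.get(label, "")] += value
    return dict(totals)


print(sum_by(series, "instance"))  # per-process totals
print(sum_by(series, "method"))    # per-HTTP-method totals
```

The same three series can be viewed per process or per HTTP method simply by choosing which label to keep.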
Series samples
Samples form the bulk of the data of a series and are appended to an indexed series over time:
- Timestamps are 64-bit integer Unix timestamps with millisecond precision.
- Sample values can be either:
  - A 64-bit floating-point number
  - An entire high-resolution native histogram (experimental)
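The following minimal Python sketch models a float sample as an int64 millisecond timestamp paired with a float64 value. `Sample`, `now_ms`, and the 16-byte `encode` layout are illustrative assumptions only: Prometheus's TSDB uses its own compressed encoding, and native histogram samples are not modeled here.

```python
import struct
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One float sample: int64 Unix timestamp in milliseconds and a float64 value."""
    timestamp_ms: int
    value: float  # Python floats are 64-bit IEEE 754 doubles

    def __post_init__(self) -> None:
        # Reject timestamps that would not fit in a signed 64-bit integer.
        if not -(2**63) <= self.timestamp_ms < 2**63:
            raise ValueError("timestamp out of int64 range")


def now_ms() -> int:
    """Current Unix time truncated to millisecond precision."""
    return time.time_ns() // 1_000_000


def encode(sample: Sample) -> bytes:
    """Pack as little-endian int64 + float64 (a hypothetical layout, not the TSDB format)."""
    return struct.pack("<qd", sample.timestamp_ms, sample.value)


s = Sample(timestamp_ms=1_700_000_000_123, value=42.5)
print(len(encode(s)))                   # 16
print(struct.unpack("<qd", encode(s)))  # (1700000000123, 42.5)
```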
New in Prometheus 3.0: Full UTF-8 support
To improve Prometheus' compatibility with OpenTelemetry metrics sources, Prometheus 3.0 introduced support for arbitrary UTF-8 characters in metric names and label names, so you are technically no longer limited to the original character set shown in the diagram above for these identifiers. However, we still recommend using the original character set for metric and label names to ensure compatibility with other systems and tools that may not support UTF-8 characters in these identifiers yet.
You will also encounter some downsides in PromQL when using the extended character set, since data selectors require more quoting and a slightly different syntax.
For example, if you have a selector like this for the original character set:
my_metric{my_label="value"}
...you will have to change it to the following when introducing previously unsupported characters (dots instead of underscores in this example):
{"my.metric", "my.label"="value"}
As you can see, the metric name now has to be defined inside of the label matcher list, and you have to quote both the metric name and the my.label label name. This syntax is more cumbersome to write and read, so keep this in mind before deciding to go beyond the original character set for identifiers.
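The quoting rules described above can be sketched as a small Python selector formatter. This is a minimal sketch, not a Prometheus API: `format_selector` and `quote` are hypothetical names, the patterns are the legacy (pre-3.0) metric and label name character sets, and it assumes that Prometheus 3.0 also accepts a quoted label name after a bare metric name (for example, `my_metric{"my.label"="value"}`). Only equality matchers are handled, and control characters in values are not escaped.

```python
import re
from typing import Dict

# Legacy (pre-3.0) character sets for metric names and label names.
LEGACY_METRIC = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LEGACY_LABEL = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def quote(s: str) -> str:
    """Double-quote a string, escaping backslashes and double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def format_selector(metric: str, matchers: Dict[str, str]) -> str:
    """Render an equality selector, quoting names outside the legacy character sets."""
    parts = [
        (name if LEGACY_LABEL.fullmatch(name) else quote(name)) + "=" + quote(value)
        for name, value in matchers.items()
    ]
    if LEGACY_METRIC.fullmatch(metric):
        # Classic form: the bare metric name precedes the braces.
        return metric + "{" + ", ".join(parts) + "}"
    # UTF-8 form: the quoted metric name moves inside the matcher list.
    return "{" + ", ".join([quote(metric)] + parts) + "}"


print(format_selector("my_metric", {"my_label": "value"}))  # my_metric{my_label="value"}
print(format_selector("my.metric", {"my.label": "value"}))  # {"my.metric", "my.label"="value"}
```

Generating selectors this way keeps the extra quoting out of hand-written queries, though the resulting PromQL is still harder to read than the classic form.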