Create initial Entities data model specification. #4442

Merged
merged 29 commits into from
Apr 10, 2025
Changes from 6 commits
Commits
29 commits
8c6d33f
Create initial Entities data model specification.
jsuereth Mar 6, 2025
14b0d1a
Add changelog with PR number.
jsuereth Mar 6, 2025
0804931
Markdownlint.
jsuereth Mar 6, 2025
02b1dc5
Fix more markdownlint.
jsuereth Mar 6, 2025
fc3e90a
Fix lint issue.
jsuereth Mar 6, 2025
80769a0
Fix lint issues.
jsuereth Mar 6, 2025
c8adaf0
Apply suggestions from code review
jsuereth Mar 13, 2025
1f10be2
Update specification/entities/data-model.md
jsuereth Mar 13, 2025
1d43246
Address missing repeatability language for id.
jsuereth Mar 13, 2025
912e572
Add cached but not saved vscode changes.
jsuereth Mar 18, 2025
9865527
Fix typos.
jsuereth Mar 18, 2025
3b112cf
Apply suggestions from code review
jsuereth Mar 18, 2025
45173f3
Update specification/entities/data-model.md
jsuereth Mar 18, 2025
a2d6288
Update toc.
jsuereth Mar 18, 2025
2cfecbf
reword a poorly worded sentence.
jsuereth Mar 18, 2025
5e717cf
Merge branch 'main' into wip-entities-data-model
jsuereth Apr 4, 2025
9ee7c45
Enforce 80 character limit on markdown lines.
jsuereth Apr 4, 2025
ae7e933
Address some comments.
jsuereth Apr 4, 2025
ef20cc8
Clean up the specification examples.
jsuereth Apr 4, 2025
15c85fc
Another english cleanup.
jsuereth Apr 4, 2025
c46fa82
Another language nit cleaned up.
jsuereth Apr 4, 2025
8436b6e
Add a better transition statement.
jsuereth Apr 4, 2025
9a86182
Regenerate toc.
jsuereth Apr 4, 2025
62a8848
Fix last nit comment.
jsuereth Apr 4, 2025
14742d8
Update layout from feedback.
jsuereth Apr 8, 2025
4422a24
Fix lint issue.
jsuereth Apr 8, 2025
316d170
Remove references to Resource being immutable on the data model.
jsuereth Apr 9, 2025
b39bd07
Fix up bad reference.
jsuereth Apr 9, 2025
67e7e5d
Update specification/resource/README.md
jsuereth Apr 10, 2025
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -28,6 +28,9 @@ release.

### Resource

- Add Datamodel for Entities
([#4442](https://github.com/open-telemetry/opentelemetry-specification/pull/4442))

### Profiles

### OpenTelemetry Protocol
27 changes: 27 additions & 0 deletions specification/entities/README.md
@@ -0,0 +1,27 @@
<!--- Hugo front matter used to generate the website version of this page:
path_base_for_github_subdir:
from: tmp/otel/specification/entities/_index.md
to: entities/README.md
--->

# Entities

<details>
<summary>Table of Contents</summary>

<!-- toc -->

- [Overview](#overview)
- [Specifications](#specifications)

<!-- tocstop -->

</details>

## Overview

Entity represents an object of interest associated with produced telemetry: traces, metrics, logs, etc.

## Specifications

- [Data Model](./data-model.md)
182 changes: 182 additions & 0 deletions specification/entities/data-model.md
@@ -0,0 +1,182 @@
# Entity Data Model

**Status**: [Development](../document-status.md)

<details>
<summary>Table of Contents</summary>

<!-- toc -->

- [Minimally Sufficient Id](#minimally-sufficient-id)
- [Examples of Entities](#examples-of-entities)

<!-- tocstop -->

</details>

Entity represents an object of interest associated with produced telemetry:
traces, metrics or logs.

For example, telemetry produced using an OpenTelemetry SDK is normally associated
with a `service` entity. Similarly, OpenTelemetry defines system metrics for a
`host`. The `host` is the entity we want to associate metrics with in this case.

Entities may also be associated with produced telemetry indirectly. For example,
a service that produces telemetry is also related to the process in which the
service runs, so we say that the `service` entity is related to the `process`
entity. The process normally also runs on a host, so we say that the `process`
entity is related to the `host` entity.

> Note: How entities are associated will be refined in future specification work.

The data model below defines a logical model for an entity (irrespective of the physical
format and encoding of how entity data is recorded).

<table>
<tr>
<td><strong>Field</strong>
</td>
<td><strong>Type</strong>
</td>
<td><strong>Description</strong>
</td>
</tr>
<tr>
<td>Type
</td>
<td>string
</td>
<td>Defines the type of the entity. MUST NOT change during the
lifetime of the entity. For example: "service" or "host". This field is
required and MUST NOT be empty for valid entities.
</td>
</tr>
<tr>
<td>Id
</td>
<td>map&lt;string, attribute&gt;
</td>
<td>Attributes that identify the entity.
<p>
MUST NOT change during the lifetime of the entity. The Id MUST contain
at least one attribute.
<p>
Follows OpenTelemetry <a
href="../../specification/common/README.md#attribute">common
attribute definition</a>. SHOULD follow OpenTelemetry <a
href="https://github.com/open-telemetry/semantic-conventions">semantic
conventions</a> for attributes.
</td>
</tr>
<tr>
<td>Description
</td>
<td>map&lt;string, any&gt;
</td>
<td>Descriptive (non-identifying) attributes of the entity.
<p>
MAY change over the lifetime of the entity. MAY be empty. These
attributes are not part of the entity's identity.
<p>
Follows <a
href="../../specification/logs/data-model.md#type-any">any</a>
value definition in the OpenTelemetry spec: it can be a scalar value,
a byte array, or an array or map of values. Arbitrarily deep nesting of
values for arrays and maps is allowed.
<p>
SHOULD follow OpenTelemetry <a
href="https://github.com/open-telemetry/semantic-conventions">semantic
conventions</a> for attributes.
</td>
</tr>
</table>
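
As an illustration only, the logical model in the table above could be sketched
in Python as follows. The class name `Entity`, the `with_description` helper, and
the sample attribute values are assumptions made for this sketch; the
specification does not prescribe any particular API or encoding.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class Entity:
    """Hypothetical in-memory form of the logical entity model."""

    type: str                      # e.g. "service" or "host"
    id: Mapping[str, Any]          # identifying attributes
    description: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Type is required and must not be empty for a valid entity.
        if not self.type:
            raise ValueError("entity type must not be empty")
        # Id must contain at least one attribute.
        if not self.id:
            raise ValueError("entity id must contain at least one attribute")

    def with_description(self, description: Mapping[str, Any]) -> "Entity":
        # Description may change over the entity's lifetime; type and id
        # are carried over unchanged into the new value.
        return replace(self, description=dict(description))


service = Entity(
    type="service",
    id={"service.name": "checkout", "service.namespace": "shop"},
    description={"service.version": "1.4.2"},
)
```

Using a frozen dataclass is one way to make accidental changes to `type` and `id`
harder; updating the descriptive attributes produces a new value with the same
identity.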

## Minimally Sufficient Id

Commonly, a number of attributes of an entity are readily available for the
telemetry producer to compose an Id from. Of the available attributes, the entity
Id should include the minimal set of attributes that is sufficient to uniquely
identify that entity. For example, a process on a host can be uniquely identified
by the (`process.pid`, `process.start_time`) attributes. Adding, for example, the
`process.executable.name` attribute to the Id is unnecessary and violates the
Minimally Sufficient Id rule.
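
A minimal sketch of this rule, assuming a hypothetical helper named
`split_attributes` and made-up attribute values: the producer keeps only the
identifying keys in the Id and treats every other available attribute as
descriptive.

```python
from typing import Any, Dict, Iterable, Tuple


def split_attributes(
    available: Dict[str, Any], identifying_keys: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split available attributes into (id, description).

    Hypothetical helper: only the keys named in identifying_keys go into
    the Id; all remaining attributes become non-identifying.
    """
    keys = list(identifying_keys)
    missing = [k for k in keys if k not in available]
    if missing:
        raise KeyError(f"missing identifying attributes: {missing}")
    entity_id = {k: available[k] for k in keys}
    description = {k: v for k, v in available.items() if k not in entity_id}
    return entity_id, description


# Sample values are illustrative only.
available = {
    "process.pid": 4242,
    "process.start_time": "2025-03-06T10:15:00Z",
    "process.executable.name": "checkout",
}
process_id, process_description = split_attributes(
    available, ["process.pid", "process.start_time"]
)
```

Here `process.executable.name` ends up in the description rather than in the Id.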

## Examples of Entities

_This section is non-normative and is present only for the purposes of demonstrating
the data model._

Here are examples of entities, the typical identifying attributes they
have and some examples of non-identifying attributes that may be
associated with the entity.

_Note: These examples MAY diverge from semantic conventions._

<table>
<tr>
<td><strong>Entity</strong>
</td>
<td><strong>Entity Type</strong>
</td>
<td><strong>Identifying Attributes</strong>
</td>
<td><strong>Non-identifying Attributes</strong>
</td>
</tr>
<tr>
<td>Service
</td>
<td>"service"
</td>
<td>service.name (required)
<p>
service.instance.id
<p>
service.namespace
</td>
<td>service.version
</td>
</tr>
<tr>
<td>Host
</td>
<td>"host"
</td>
<td>host.id
</td>
<td>host.name
<p>
host.type
<p>
host.image.id
<p>
host.image.name
</td>
</tr>
<tr>
<td>K8s Pod
</td>
<td>"k8s.pod"
</td>
<td>k8s.pod.uid (required)
<p>
k8s.cluster.name
</td>
<td>Any pod labels
</td>
</tr>
<tr>
<td>K8s Pod Container
</td>
<td>"container"
</td>
<td>k8s.pod.uid (required)
<p>
k8s.cluster.name
<p>
container.name
</td>
<td>Any container labels
</td>
</tr>
</table>
26 changes: 26 additions & 0 deletions specification/resource/README.md
@@ -5,3 +5,29 @@ path_base_for_github_subdir:
--->

# Resource

<details>
<summary>Table of Contents</summary>

<!-- toc -->

- [Overview](#overview)
- [Specifications](#specifications)

<!-- tocstop -->

</details>

## Overview

A Resource is an immutable representation of the entity producing telemetry.
Within OpenTelemetry, all signals are associated with a Resource, enabling
contextual correlation of data from the same source. For example, if a span
shows high latency, a user should be able to check the metrics for the
same entity that produced that span during the time when the latency was
observed.

## Specifications

- [Data Model](./data-model.md)
- [Resource SDK](./sdk.md)
59 changes: 59 additions & 0 deletions specification/resource/data-model.md
@@ -0,0 +1,59 @@
# Resource Data Model

**Status**: [Development](../document-status.md)

<details>
<summary>Table of Contents</summary>

<!-- toc -->

- [Identity](#identity)
  * [Navigation](#navigation)
  * [Telescoping](#telescoping)

<!-- tocstop -->

</details>

A Resource is an immutable representation of the entity producing telemetry as Attributes. For example, a process producing telemetry that is running in a container on Kubernetes has a Pod name, is in a namespace, and is possibly part of a Deployment, which also has a name. All three of these attributes can be included in the Resource. Note that there are certain "standard attributes" that have prescribed meanings.

A resource is composed of [`Entity`](../entities/README.md) and raw attributes.

Resource provides two important aspects for observability:

- It MUST *identify* an entity that is producing telemetry.
- It SHOULD allow users to determine *where* that entity resides within their infrastructure.

## Identity

Most resources are a composition of entities. `Entity` is described in the
[entity data model](../entities/data-model.md), and includes its own notion of
identity. The identity of a resource is the set of entities contained within it.
Two resources are considered different if one contains an entity not found in
the other.

Some resources include raw attributes in addition to Entities. Raw attributes are
considered identifying on a resource. That is, if the key-value pairs of
raw attributes are different, then you can assume the resource is different.
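
The comparison described above can be sketched as follows. This is an
illustration under stated assumptions, not a prescribed algorithm: the function
names are hypothetical, each entity is represented as a `(type, id)` pair, and
attribute values are assumed to be hashable scalars.

```python
from typing import Any, Dict, FrozenSet, Iterable, Tuple

EntityRef = Tuple[str, Dict[str, Any]]  # (entity type, identifying attributes)


def entity_identity(entity: EntityRef) -> Tuple[str, FrozenSet]:
    # An entity is identified by its type plus its identifying attributes.
    entity_type, entity_id = entity
    return entity_type, frozenset(entity_id.items())


def resource_identity(
    entities: Iterable[EntityRef], raw_attributes: Dict[str, Any]
) -> Tuple[FrozenSet, FrozenSet]:
    # Resource identity: the set of contained entities plus the raw
    # attributes, which are treated as identifying.
    return (
        frozenset(entity_identity(e) for e in entities),
        frozenset(raw_attributes.items()),
    )


service = ("service", {"service.name": "checkout"})
host = ("host", {"host.id": "h-1"})

same_a = resource_identity([service, host], {})
same_b = resource_identity([host, service], {})       # order does not matter
extra_entity = resource_identity([service], {})       # one entity missing
extra_raw = resource_identity([service, host], {"team": "payments"})
```

With these values, `same_a == same_b`, while `extra_entity` and `extra_raw` both
differ from `same_a`.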

### Navigation

Implicit in the design of Resource and attributes is ensuring users are able to navigate their infrastructure, tools, UIs, etc. to find the *same* entity that telemetry is reporting against. For example, in the definition above, we see a few components listed for one entity:

- A process
- A container
- A Kubernetes pod name
- A namespace
- A deployment

By including identifying attributes of each of these, we can help users navigate through their `kubectl` or Kubernetes UIs to find the specific process generating telemetry. This is as important as being able to uniquely identify one process from another.

> Aside: Observability signals SHOULD be actionable. Knowing a process is struggling is not as useful as being able to scale up a deployment to take load off the struggling process.

If the only thing important to Resource was identity, we could simply use UUIDs.

### Telescoping

Within OpenTelemetry, we want to give users the flexibility to decide what information needs to be sent *with* observability signals and what information can be later joined. We call this "telescoping identity" where users can decide how *small* or *large* the size of an OpenTelemetry resource will be on the wire (and correspondingly, how large data points are when stored, depending on storage solution).

For example, in the extreme, OpenTelemetry could synthesize a UUID for every system which produces telemetry. All identifying attributes for Resource and Entity could be sent via a side channel with known relationships to this UUID. While this would optimise the runtime generation and sending of telemetry, it comes at the cost of downstream storage systems needing to join data back together either at ingestion time or query time. For high performance use cases, e.g. alerting, these joins can be expensive.

In practice, users control Resource identity via the configuration of Resource Detection within SDKs and the Collector. Users wishing for minimal identity will limit their resource detection to just a `service.instance.id`, for example. Other users customize resource detection heavily, appending many additional attributes.
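
To make the extreme case concrete, here is a small sketch of the trade-off. It
assumes an in-memory dictionary standing in for the side channel, uses
`service.instance.id` as the carrier for the synthesized UUID, and introduces
the hypothetical functions `telescope` and `rejoin`; none of these are defined
by the specification.

```python
import uuid
from typing import Any, Dict

# Hypothetical stand-in for a side channel that stores full resource details.
side_channel: Dict[str, Dict[str, Any]] = {}


def telescope(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Shrink a resource to a single synthesized identifier for the wire."""
    key = str(uuid.uuid4())
    side_channel[key] = dict(resource)  # full details travel separately
    return {"service.instance.id": key}


def rejoin(wire_resource: Dict[str, Any]) -> Dict[str, Any]:
    """Join the full details back, as a backend would at ingestion or query time."""
    return side_channel[wire_resource["service.instance.id"]]


# Sample attribute values are illustrative only.
full_resource = {
    "service.name": "checkout",
    "k8s.pod.uid": "pod-123",
    "k8s.cluster.name": "prod-eu",
}
wire_resource = telescope(full_resource)
restored = rejoin(wire_resource)
```

The on-wire resource carries one attribute, and every consumer that needs the
remaining attributes pays for the `rejoin` lookup.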