Create initial Entities data model specification. #4442
Merged: jsuereth merged 29 commits into open-telemetry:main from jsuereth:wip-entities-data-model (Apr 10, 2025)
Changes from 6 commits
Commits (29, all authored by jsuereth):
- 8c6d33f Create initial Entities data model specification.
- 14b0d1a Add changelog with PR number.
- 0804931 Markdownlint.
- 02b1dc5 Fix more markdownlint.
- fc3e90a Fix lint issue.
- 80769a0 Fix lint issues.
- c8adaf0 Apply suggestions from code review
- 1f10be2 Update specification/entities/data-model.md
- 1d43246 Address missing repeatability language for id.
- 912e572 Add cached but not saved vscode changes.
- 9865527 Fix typos.
- 3b112cf Apply suggestions from code review
- 45173f3 Update specification/entities/data-model.md
- a2d6288 Update toc.
- 2cfecbf reword a poorly worded sentence.
- 5e717cf Merge branch 'main' into wip-entities-data-model
- 9ee7c45 Enforce 80 character limit on markdown lines.
- ae7e933 Address some comments.
- ef20cc8 Clean up the specification examples.
- 15c85fc Another english cleanup.
- c46fa82 Another language nit cleaned up.
- 8436b6e Add a better transition statement.
- 9a86182 Regenerate toc.
- 62a8848 Fix last nit comment.
- 14742d8 Update layout from feedback.
- 4422a24 Fix lint issue.
- 316d170 Remove references to Resource being immutable on the data model.
- b39bd07 Fix up bad reference.
- 67e7e5d Update specification/resource/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
<!--- Hugo front matter used to generate the website version of this page: | ||
path_base_for_github_subdir: | ||
from: tmp/otel/specification/entities/_index.md | ||
to: entities/README.md | ||
---> | ||
|
||
# Entities | ||
|
||
<details> | ||
<summary>Table of Contents</summary> | ||
|
||
<!-- toc --> | ||
|
||
- [Overview](#overview) | ||
- [Specifications](#specifications) | ||
|
||
<!-- tocstop --> | ||
|
||
</details> | ||
|
||
## Overview | ||
|
||
Entity represents an object of interest associated with produced telemetry: traces, metrics, logs, etc. | ||
|
||
## Specifications | ||
|
||
- [Data Model](./data-model.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
# Entity Data Model | ||
|
||
**Status**: [Development](../document-status.md) | ||
|
||
<details> | ||
<summary>Table of Contents</summary> | ||
|
||
<!-- toc --> | ||
|
||
- [Minimally Sufficient Id](#minimally-sufficient-id) | ||
- [Examples of Entities](#examples-of-entities) | ||
|
||
<!-- tocstop --> | ||
|
||
</details> | ||
|
||
Entity represents an object of interest associated with produced telemetry: | ||
traces, metrics or logs. | ||
|
||
For example, telemetry produced using an OpenTelemetry SDK is normally associated with | ||
a `service` entity. Similarly, OpenTelemetry defines system metrics for a `host`. The `host` is the | ||
entity we want to associate metrics with in this case. | ||
|
||
Entities may also be associated with produced telemetry indirectly. | ||
For example, a service that produces | ||
telemetry is also related to the process in which the service runs, so we say that | ||
the `service` entity is related to the `process` entity. The process normally also runs | ||
on a host, so we say that the `process` entity is related to the `host` entity. | ||
|
||
> Note: How entities are associated will be refined in future specification work. | ||
|
||
The data model below defines a logical model for an entity (irrespective of the physical | ||
format and encoding of how entity data is recorded). | ||
|
||
<table> | ||
<tr> | ||
<td><strong>Field</strong> | ||
</td> | ||
<td><strong>Type</strong> | ||
</td> | ||
<td><strong>Description</strong> | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>Type | ||
</td> | ||
<td>string | ||
</td> | ||
<td>Defines the type of the entity. MUST NOT change during the | ||
lifetime of the entity. For example: "service" or "host". This field is | ||
required and MUST NOT be empty for valid entities. | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>Id | ||
</td> | ||
<td>map<string, attribute> | ||
</td> | ||
<td>Attributes that identify the entity. | ||
<p> | ||
MUST NOT change during the lifetime of the entity. The Id must contain | ||
at least one attribute. | ||
<p> | ||
Follows OpenTelemetry <a | ||
href="../../specification/common/README.md#attribute">common | ||
attribute definition</a>. SHOULD follow OpenTelemetry <a | ||
href="https://github.com/open-telemetry/semantic-conventions">semantic | ||
conventions</a> for attributes. | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>Description | ||
</td> | ||
<td>map<string, any> | ||
</td> | ||
<td>Descriptive (non-identifying) attributes of the entity. | ||
<p> | ||
MAY change over the lifetime of the entity. MAY be empty. These | ||
attributes are not part of the entity's identity. | ||
<p> | ||
Follows <a | ||
href="../../specification/logs/data-model.md#type-any">any</a> | ||
value definition in the OpenTelemetry spec - it can be a scalar value, | ||
byte array, an array or map of values. Arbitrarily deep nesting of values | ||
for arrays and maps is allowed. | ||
<p> | ||
SHOULD follow OpenTelemetry <a | ||
href="https://github.com/open-telemetry/semantic-conventions">semantic | ||
conventions</a> for attributes. | ||
</td> | ||
</tr> | ||
</table> | ||
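
As a rough illustration of the table above, the logical model could be sketched in Python as follows. This is a minimal sketch and not part of the specification: the class name `Entity`, the use of a frozen dataclass, and the example values are assumptions made here for illustration only.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Entity:
    """Hypothetical in-memory form of the logical entity model."""

    type: str                    # e.g. "service" or "host"; fixed for the entity's lifetime
    id: Mapping[str, Any]        # identifying attributes; fixed for the entity's lifetime
    description: Dict[str, Any] = field(default_factory=dict)  # may change, may be empty

    def __post_init__(self) -> None:
        if not self.type:
            raise ValueError("entity type is required and must not be empty")
        if not self.id:
            raise ValueError("entity id must contain at least one attribute")
        # Store a read-only copy so the identifying attributes cannot be mutated later.
        object.__setattr__(self, "id", MappingProxyType(dict(self.id)))


# Example values are made up for illustration.
service = Entity(
    type="service",
    id={"service.name": "checkout"},
    description={"service.version": "1.0.0"},
)

# Descriptive attributes may change over the entity's lifetime.
service.description["service.version"] = "1.0.1"
```

Freezing `type` and `id` while leaving `description` as a mutable dictionary mirrors the split between identifying and descriptive attributes; other representations are equally valid.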
|
||
## Minimally Sufficient Id | ||
|
||
Commonly, a number of attributes of an entity are readily available for the telemetry | ||
producer to compose an Id from. Of the available attributes, the entity Id should | ||
include the minimal set of attributes that is sufficient for uniquely identifying | ||
that entity. For example, | ||
a process on a host can be uniquely identified by the (`process.pid`, `process.start_time`) | ||
attributes. Adding, for example, the `process.executable.name` attribute to the Id is | ||
unnecessary and violates the Minimally Sufficient Id rule. | ||
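
The process example can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper `minimal_id`, the constant `PROCESS_ID_KEYS`, and the attribute values are hypothetical and do not come from the specification.

```python
from typing import Any, Dict, Iterable, Mapping


def minimal_id(available: Mapping[str, Any], id_keys: Iterable[str]) -> Dict[str, Any]:
    """Build an entity Id from only the attributes needed to identify it."""
    keys = list(id_keys)
    missing = [key for key in keys if key not in available]
    if missing:
        raise KeyError(f"missing identifying attributes: {missing}")
    return {key: available[key] for key in keys}


# Attributes a telemetry producer might have at hand (values are made up).
available = {
    "process.pid": 1234,
    "process.start_time": "2025-04-10T12:00:00Z",
    "process.executable.name": "java",
}

# The minimal set that is sufficient to identify a process on a host.
PROCESS_ID_KEYS = ("process.pid", "process.start_time")

process_id = minimal_id(available, PROCESS_ID_KEYS)
# process.executable.name is left out of the Id; it could still be carried
# as a descriptive (non-identifying) attribute.
```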
|
||
## Examples of Entities | ||
|
||
_This section is non-normative and is present only for the purposes of demonstrating | ||
the data model._ | ||
|
||
Here are examples of entities, the typical identifying attributes they | ||
have and some examples of non-identifying attributes that may be | ||
associated with the entity. | ||
|
||
_Note: These examples MAY diverge from semantic conventions._ | ||
|
||
<table> | ||
<tr> | ||
<td><strong>Entity</strong> | ||
</td> | ||
<td><strong>Entity Type</strong> | ||
</td> | ||
<td><strong>Identifying Attributes</strong> | ||
</td> | ||
<td><strong>Non-identifying Attributes</strong> | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>Service | ||
</td> | ||
<td>"service" | ||
</td> | ||
<td>service.name (required) | ||
<p> | ||
service.instance.id | ||
<p> | ||
service.namespace | ||
</td> | ||
<td>service.version | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>Host | ||
</td> | ||
<td>"host" | ||
</td> | ||
<td>host.id | ||
</td> | ||
<td>host.name | ||
<p> | ||
host.type | ||
<p> | ||
host.image.id | ||
<p> | ||
host.image.name | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>K8s Pod | ||
</td> | ||
<td>"k8s.pod" | ||
</td> | ||
<td>k8s.pod.uid (required) | ||
<p> | ||
k8s.cluster.name | ||
</td> | ||
<td>Any pod labels | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>K8s Pod Container | ||
</td> | ||
<td>"container" | ||
</td> | ||
<td>k8s.pod.uid (required) | ||
<p> | ||
k8s.cluster.name | ||
<p> | ||
container.name | ||
</td> | ||
<td>Any container labels | ||
</td> | ||
</tr> | ||
</table> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Resource Data Model | ||
|
||
**Status**: [Development](../document-status.md) | ||
|
||
<details> | ||
<summary>Table of Contents</summary> | ||
|
||
<!-- toc --> | ||
|
||
- [Identity](#identity) | ||
* [Navigation](#navigation) | ||
* [Telescoping](#telescoping) | ||
|
||
<!-- tocstop --> | ||
|
||
</details> | ||
|
||
A Resource is an immutable representation of the entity producing telemetry as Attributes. For example, a process producing telemetry that is running in a container on Kubernetes has a Pod name, is in a namespace, and is possibly part of a Deployment, which also has a name. All three of these attributes can be included in the Resource. Note that there are certain "standard attributes" that have prescribed meanings. | ||
|
||
A resource is composed of [`Entity`](../entities/README.md) and raw attributes. | ||
|
||
Resource provides two important aspects for observability: | ||
|
||
- It MUST *identify* an entity that is producing telemetry. | ||
- It SHOULD allow users to determine *where* that entity resides within their infrastructure. | ||
|
||
## Identity | ||
|
||
Most resources are a composition of `Entity`. `Entity` is described [here](../entities/data-model.md), and includes its own notion of identity. The identity of a resource is the set | ||
of entities contained within it. Two resources are considered different if one | ||
contains an entity not found in the other. | ||
|
||
Some resources include raw attributes in addition to Entities. Raw attributes are | ||
considered identifying on a resource. That is, if the key-value pairs of | ||
raw attributes are different, then you can assume the resource is different. | ||
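
One way to picture this notion of identity is to reduce a resource to a hashable key and compare keys. The sketch below is an assumption-laden illustration, not a defined algorithm: the helpers `entity_key` and `resource_identity` are hypothetical, entities are modeled as `(type, id)` pairs, and attribute values are assumed to be hashable scalars.

```python
from typing import Any, Iterable, Mapping, Tuple

# Hypothetical shorthand: an entity is a (type, identifying attributes) pair.
EntityRef = Tuple[str, Mapping[str, Any]]


def entity_key(entity_type: str, entity_id: Mapping[str, Any]) -> tuple:
    """Hashable key built from an entity's type and identifying attributes."""
    return (entity_type, frozenset(entity_id.items()))


def resource_identity(
    entities: Iterable[EntityRef], raw_attributes: Mapping[str, Any]
) -> tuple:
    """Identity of a resource: its set of entities plus its raw attributes."""
    return (
        frozenset(entity_key(t, i) for t, i in entities),
        frozenset(raw_attributes.items()),
    )


service = ("service", {"service.name": "checkout"})
host = ("host", {"host.id": "host-a"})

a = resource_identity([service, host], {})
b = resource_identity([host, service], {})  # same entities, different order
c = resource_identity([service], {})        # one entity missing
d = resource_identity([service, host], {"k8s.cluster.name": "prod"})  # extra raw attribute

print(a == b, a == c, a == d)  # True False False
```

Descriptive attributes are deliberately absent from the key, since they are not part of an entity's identity.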
|
||
### Navigation | ||
|
||
Implicit in the design of Resource and attributes is ensuring users are able to navigate their infrastructure, tools, UIs, etc. to find the *same* entity that telemetry is reporting against. For example, in the definition above, we see a few components listed for one entity: | ||
|
||
- A process | ||
- A container | ||
- A Kubernetes pod name | ||
- A namespace | ||
- A deployment | ||
|
||
By including identifying attributes of each of these, we can help users navigate through their `kubectl` or Kubernetes UIs to find the specific process generating telemetry. This is as important as being able to uniquely identify one process from another. | ||
|
||
> Aside: Observability signals SHOULD be actionable. Knowing a process is struggling is not as useful as being able to scale up a deployment to take load off the struggling process. | ||
|
||
If the only thing important to Resource was identity, we could simply use UUIDs. | ||
|
||
### Telescoping | ||
|
||
Within OpenTelemetry, we want to give users the flexibility to decide what information needs to be sent *with* observability signals and what information can be later joined. We call this "telescoping identity" where users can decide how *small* or *large* the size of an OpenTelemetry resource will be on the wire (and correspondingly, how large data points are when stored, depending on storage solution). | ||
|
||
For example, in the extreme, OpenTelemetry could synthesize a UUID for every system which produces telemetry. All identifying attributes for Resource and Entity could be sent via a side channel with known relationships to this UUID. While this would optimise the runtime generation and sending of telemetry, it comes at the cost of downstream storage systems needing to join data back together either at ingestion time or query time. For high performance use cases, e.g. alerting, these joins can be expensive. | ||
|
||
In practice, users control Resource identity via the configuration of Resource Detection within SDKs and the collector. Users wishing for minimal identity will limit their resource detection just to a `service.instance.id`, for example. Some users highly customize resource detection with many concepts being appended. |
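
The trade-off can be pictured as splitting detected attributes into those sent with every signal and those left to be joined later. The following is a hypothetical sketch only: the function `telescope`, its parameters, and the attribute values are invented for illustration and do not correspond to any SDK or collector configuration option.

```python
from typing import Any, Dict, Mapping, Set, Tuple


def telescope(
    attributes: Mapping[str, Any], on_wire_keys: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split attributes into (sent with telemetry, deferred for a later join)."""
    on_wire = {k: v for k, v in attributes.items() if k in on_wire_keys}
    deferred = {k: v for k, v in attributes.items() if k not in on_wire_keys}
    return on_wire, deferred


# Attributes a resource detector might produce (values are made up).
detected = {
    "service.instance.id": "instance-1",
    "k8s.pod.uid": "pod-uid-1",
    "k8s.cluster.name": "prod",
    "host.name": "node-7",
}

# Minimal identity: only service.instance.id travels with each signal.
small, joined_later = telescope(detected, {"service.instance.id"})

# Larger identity: everything travels on the wire and nothing needs joining.
large, nothing_left = telescope(detected, set(detected))
```

The smaller the on-wire set, the cheaper each signal is to produce and send, and the more work the backend has to do to join the deferred attributes back in.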