feat(relation): clarify `dlt.Relation` lifecycle

### Summary
Currently, `dlt.Relation` is coupled to `pipeline.run()` in a way that those two are not equivalent

```python
@dlt.transformation
def foo(dataset: dlt.Dataset):
  yield dataset.table("bar")

# retrieve relation from transformation then execute and get data
result_rel: dlt.Relation = list(foo(dataset))[0]
result_rel.df()

# execute transformation via pipeline.run() then get data
pipeline.run([foo])
result2_rel = pipeline.dataset().table("foo")
result2_rel.df()
```
This behavior is inconsistent with `@dlt.resource` where doing `list(resource)` will gather records in-memory and add the `_dlt_id` and `_dlt_load_id` fields.

This happens because `dlt.Pipeline.run()` goes through Extract, Normalize, Load. The Normalize step adds the `_dlt_id` and `_dlt_load_id` columns.

### Motivation
We want to be able to define `@dlt.transformation` interactively without requiring the heavy `dlt.Pipeline.run()` constructs on each execution. 

The column `_dlt_id` exists on all tables produced by dlt by default and is useful to uniquely identify records. This is useful for transformation when:
- identifying rows for data quality checks
- incremental strategies
- smart joins

### Intended behavior
1.  `list(transformation)[0]` should include the `_dlt_id` and `_dlt_load_id` just like `list(resource)` does
2.  Currently, transformation outputs are all treated like root tables; they have their own `_dlt_id` and `_dlt_load_id` with no parent.
  - This could change in the future (e.g., we generate dimensions and facts), but is good for now
3. Executing `dlt.Relation` via `.df()`, `.fetchall()`, etc. shouldn't add `_dlt_id` nor `_dlt_load_id`. Relations are a more generic mechanism than transformations
4. Methods and features related to `dlt.Dataset` shouldn't mutate queries to include / generate `_dlt_id` nor `_dlt_load_id` 

### Potential solution
The internals of `@dlt.transformation` (e.g., `make_transformation_function`)  produce a `dlt.Relation`. The code should be changed such that `dlt.Relation` generates the `_dlt_id` and `_dlt_load_id` instead of this being done in the Normalization of `pipeline.run()`

question: can I have a list of things that happen a Normalization for `@dlt.transformation`? 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(relation): clarify `dlt.Relation` lifecycle #3206

Summary

Motivation

Intended behavior

Potential solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat(relation): clarify dlt.Relation lifecycle #3206

Description

Summary

Motivation

Intended behavior

Potential solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

feat(relation): clarify `dlt.Relation` lifecycle #3206