Detailed breakdown of dependency conflict check breaking change

Recently, a [breaking change](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/3202) was made to how dependency checks work. The change was released in 1.32.0/0.53b0. There were multiple issues with this approach but also multiple benefits. This issue is meant to explain the reasons for the change, the different use cases affected, breakages, and possible solutions.

# Pre-existing dependency conflict logic for autoinstrumentation

1. Each instrumentation stores the restrictions for its instrumented library under `[project.optional-dependencies]`. For instance, the [Flask instrumentation lists `flask >= 1.0`](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-flask/pyproject.toml#L38).
2. The importlib_metadata ["Distribution"](https://docs.python.org/3/library/importlib.metadata.html#importlib.metadata.Distribution) object's ["requires"](https://docs.python.org/3/library/importlib.metadata.html#distribution-requirements) field includes the instrumentation's required _and_ optional dependencies. The optional dependencies carry the marker `extra == 'instruments'`. For example: `['opentelemetry-api~=1.12', 'opentelemetry-instrumentation-wsgi==0.52b1', ... "flask>=1.0; extra == 'instruments'"]`
3. The [`get_dist_dependency_conflicts`](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/147e3f754e9cc94c16371cc177d7f968c3e60693/opentelemetry-instrumentation/src/opentelemetry/instrumentation/dependencies.py#L43) function, removed by the breaking change, identifies the optional dependencies that have `extra == 'instruments'` and returns a dependency conflict if those requirements are not met. For instance, a conflict is returned if Flask<1.0 is installed _OR_ if Flask is not installed at all. (The latter is essential for the "codeless cloud autoinstrumentation" and "instrumentation pack" use cases explained below.)
4. Autoinstrumentation's `_load.py` calls `get_dist_dependency_conflicts` _before_ initializing the instrumentor objects. If a dependency conflict is returned, the instrumentor object will not be initialized.

For autoinstrumentation at least, the dependency check was done before any instrumentor was instantiated. Autoinstrumentation does not assume that the instrumented library is installed. Most instrumentor objects, however, were [written assuming that the instrumented library is installed](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-flask/src/opentelemetry/instrumentation/flask/__init__.py#L254) and therefore that a dependency check would be done _before_ the Instrumentor object is instantiated.
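The old flow described in this section can be sketched roughly as follows. This is a simplified illustration, not the removed `get_dist_dependency_conflicts` implementation: the helper names (`instruments_requirements`, `find_conflict`) are hypothetical, installed versions are passed in as a plain dict instead of being read from package metadata, and only plain dotted numeric versions are compared.

```python
import operator
import re
from typing import List, Mapping, Optional, Sequence

# Marker carried by the optional "instruments" requirements.
_INSTRUMENTS_MARKER = re.compile(r"extra\s*==\s*['\"]instruments['\"]")
_NAME = re.compile(r"^\s*([A-Za-z0-9_.\-]+)")
_SPEC = re.compile(r"(>=|<=|==|>|<)\s*([0-9]+(?:\.[0-9]+)*)")
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _as_tuple(version: str) -> tuple:
    # Simplification: only plain dotted numeric versions are handled.
    return tuple(int(part) for part in version.split("."))


def instruments_requirements(requires: Sequence[str]) -> List[str]:
    """Keep only the requirements marked with extra == 'instruments'."""
    selected = []
    for entry in requires:
        requirement, _, marker = entry.partition(";")
        if _INSTRUMENTS_MARKER.search(marker):
            selected.append(requirement.strip())
    return selected


def find_conflict(
    requires: Sequence[str], installed: Mapping[str, str]
) -> Optional[str]:
    """Return a conflict message if a library is missing or has the wrong version."""
    for requirement in instruments_requirements(requires):
        name = _NAME.match(requirement).group(1).lower()
        found = installed.get(name)
        if found is None:
            # Library not installed at all: this also counts as a conflict.
            return f'requested "{requirement}" but found "None"'
        for op, wanted in _SPEC.findall(requirement):
            if not _OPS[op](_as_tuple(found), _as_tuple(wanted)):
                return f'requested "{requirement}" but found "{name} {found}"'
    return None


requires = [
    "opentelemetry-api~=1.12",
    "flask>=1.0; extra == 'instruments'",
]
for installed in ({}, {"flask": "0.12"}, {"flask": "2.3.1"}):
    # Only instantiate the instrumentor when no conflict is reported.
    print(installed, "->", find_conflict(requires, installed))
```

The key property is that a missing library and a wrong version are both reported before any instrumentor is created.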

# New dependency conflict logic from change

1. OPTIONAL: Instrumentation lists optional requirements in the `_instruments` field in package.py 
2. The `<Library>Instrumentor` object's `instrumentation_dependencies` method returns the optional dependencies. Most often, this pulls from package.py's `_instruments` field. However, in more complicated use cases, such as the [KafkaInstrumentor, it may provide different requirements depending on what is installed](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-kafka-python/src/opentelemetry/instrumentation/kafka/__init__.py#L108). Importantly, this [Kafka design still assumes Kafka is installed and will crash if not](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-kafka-python/src/opentelemetry/instrumentation/kafka/__init__.py#L88).

Note that `[project.optional-dependencies]` is no longer relevant. (As far as I can tell, it would only be used in this [packaging script](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/cc7169cf2c778447d23c55fdb75e253656ff4e06/scripts/otel_packaging.py#L66).) The optional dependencies identified in the importlib_metadata `Distribution.requires` field have no bearing on whether an instrumentation will be initialized. In fact, the `_instruments` field is merely common style and is not required either. All that matters is the `<Library>Instrumentor` object's `instrumentation_dependencies` method.
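To make the "different requirements depending on what is installed" idea concrete, here is a minimal sketch of an `instrumentation_dependencies` method in that style. It is not the actual KafkaInstrumentor code: the class name `SketchKafkaInstrumentor` and the injected `is_installed` callable are assumptions made so the example runs without Kafka installed.

```python
from typing import Callable, Collection

KAFKA_PYTHON = "kafka-python >= 2.0, < 3.0"
KAFKA_PYTHON_NG = "kafka-python-ng >= 2.0, < 3.0"


class SketchKafkaInstrumentor:
    """Hypothetical instrumentor that adapts its requirements to the environment."""

    def __init__(self, is_installed: Callable[[str], bool]):
        # is_installed is an assumption: a lookup that reports whether a
        # distribution name is present in the environment.
        self._is_installed = is_installed

    def instrumentation_dependencies(self) -> Collection[str]:
        # Return only the requirement for the alternative that is present,
        # so the downstream check never demands both packages at once.
        if self._is_installed("kafka-python-ng"):
            return (KAFKA_PYTHON_NG,)
        if self._is_installed("kafka-python"):
            return (KAFKA_PYTHON,)
        # Nothing installed: fall back to the full list.
        return (KAFKA_PYTHON, KAFKA_PYTHON_NG)


only_ng = SketchKafkaInstrumentor(lambda name: name == "kafka-python-ng")
print(only_ng.instrumentation_dependencies())
```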

## Reason for change: Multi-package instrumentations

The old approach does not work well for Kafka or PsycoPG2. These instrumentations have multiple alternative packages they could instrument. For instance, [Kafka can instrument `kafka-python` OR `kafka-python-ng`](https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-kafka-python/pyproject.toml#L35). It should not require both to be installed. However, in the old approach, multiple entries in `[project.optional-dependencies]` are treated as ALL required. So, when Kafka lists `"kafka-python >= 2.0, < 3.0", "kafka-python-ng >= 2.0, < 3.0"`, the old approach would only attempt to instrument Kafka in the unrealistic scenario where _both_ `kafka-python` and `kafka-python-ng` are installed. To summarize, the old approach is designed only for "AND instrumentations", not "OR instrumentations".
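The difference between the two semantics can be shown in a few lines. This is only an illustration: the function names are hypothetical and version constraints are ignored.

```python
from typing import Collection, Set


def all_installed(required: Collection[str], installed: Set[str]) -> bool:
    """AND semantics: every listed package must be present (old approach)."""
    return all(name in installed for name in required)


def any_installed(alternatives: Collection[str], installed: Set[str]) -> bool:
    """OR semantics: one of the listed packages is enough."""
    return any(name in installed for name in alternatives)


kafka_packages = ("kafka-python", "kafka-python-ng")
environment = {"kafka-python"}  # a realistic environment has only one of them

print("AND check passes:", all_installed(kafka_packages, environment))
print("OR check passes:", any_installed(kafka_packages, environment))
```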

### Secondary reason: Manual vs Auto consistency

## Change breakdown

The change moves dependency checks into the Instrumentor.instrument method itself. In other words, no dependency check is done before Instrumentors are instantiated. Since Instrumentors generally assume instrumented packages are installed, this causes any such Instrumentor to crash, generally with an ImportError even before the new dependency check in Instrumentor.instrument begins.

In short, this means that the new dependency check only prevents breakage when the instrumented package is installed _but with the wrong version_.
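The ordering problem can be reproduced with a small stand-in. Everything here is hypothetical: `SketchInstrumentor` imports its target module in `__init__` to mimic an instrumentation module with a top-level import of the instrumented library, `DependencyConflictError` is a local stand-in for the error of the same name, and versions are passed in as tuples.

```python
import importlib
from typing import Tuple


class DependencyConflictError(Exception):
    """Local stand-in for the error raised by the new dependency check."""


class SketchInstrumentor:
    """Hypothetical instrumentor that needs its library before instrument() runs."""

    def __init__(self, module_name: str, found: Tuple[int, ...], minimum: Tuple[int, ...]):
        # Mimics a top-level `import <library>` in an instrumentation module:
        # this fails before any dependency check has a chance to run.
        self._module = importlib.import_module(module_name)
        self._found = found
        self._minimum = minimum

    def instrument(self) -> str:
        # The new-style check lives here, after instantiation.
        if self._found < self._minimum:
            raise DependencyConflictError(f"found {self._found}, need {self._minimum}")
        return "instrumented"


def load(module_name: str, found: Tuple[int, ...], minimum: Tuple[int, ...]) -> str:
    try:
        return SketchInstrumentor(module_name, found, minimum).instrument()
    except DependencyConflictError:
        return "skipped: dependency conflict"
    except ModuleNotFoundError:
        return "crashed: ModuleNotFoundError"


# "json" stands in for an installed library; the second name does not exist.
print(load("json", (2, 0), (1, 0)))
print(load("json", (0, 9), (1, 0)))
print(load("no_such_library_for_this_sketch", (2, 0), (1, 0)))
```

Only the wrong-version case reaches the dependency check; the missing-library case fails earlier with an import error.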

## Use cases

Before explaining the breakages, here are some relevant use cases.

### Instrumentation packs

OpenTelemetry clients may include multiple instrumentations automatically. For instance, the azure-monitor-opentelemetry "distro" provides an easy one-line solution to set up OTel providers, exporters, and instrumentations of the most popular libraries. For example, it includes the Flask instrumentation automatically. It is up to dependency conflicts to decide whether that instrumentor should be instantiated and whether the library should be instrumented.

### One-click codeless autoinstrumentation from Cloud providers

Multiple cloud providers, such as Azure, provide OpenTelemetry autoinstrumentation as a feature in their UI. With a single click, you can enable any and all supported instrumentations, which means the cloud service must install the instrumentations. For both ease of use and to avoid ballooning start-up times, this is done by sideloading pre-installed instrumentations. For example, the Flask instrumentation will be instantly present regardless of whether the user has a Flask app. It is up to dependency conflicts to decide whether that instrumentor should be instantiated and whether the library should be instrumented.

Note that the fundamental difference between this and other autoinstrumentation scenarios is that the instrumented app and the autoinstrumentation agent come from two different parties: the cloud customer and the cloud provider, respectively.
 
## Summarized breakages and fixes

1. Public method `get_dist_dependency_conflicts` deleted. Fixed in ___
2. Instrumentation requirements are no longer taken from pyproject.toml but rather from the abstract `Instrumentor.instrumentation_dependencies()` method.
3. Instrumented libraries are now assumed to be present for all installed instrumentations, whether they rely on an "and" or "or" list of instrumented libraries. This breaks the "instrumentation pack" and "Cloud-provided autoinstrumentation" scenarios.
4. Instrumentor objects are now assumed to instantiate gracefully when the instrumented library is not installed.
5. There is no distinction between the `ModuleNotFoundError` raised when an instrumented library is not installed and all other possible sources of that error. Dependency checks are now only used to constrain the _version_ of the instrumented package but not whether or not it is installed. `DependencyConflictError` is only raised when the library is installed but at the wrong version.

## Possible solutions

### Revert change, add new "instruments_either" package field

We could add a new field besides "instruments" that acts as an "or" list while leaving the existing field to act as an "and" list. get_dependency_conflicts would then be changed to utilize _both_ fields. So, instrumentations like Kafka would leave instruments blank but populate "instruments_either". Most instrumentations would keep their current "instruments" value and not require any changes.

If we wish to keep the similarity between Manual and Auto, we could either do a partial revert, or simply include this as a manual instrumentation option as well. I think it makes sense to allow users to do a dependency check before instantiating the Instrumentor even for Manual instrumentation users.

Pros: increased customizability, faster setup, no unnecessary Instrumentor instantiation.
Cons: Could slightly complicate dependency conflict logic. Might require specced-out edge cases.
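A rough sketch of how a check could combine the two fields is shown below. The function name `check_instruments` and the returned messages are hypothetical, and version constraints are left out to keep the example short; a real implementation would also need to decide the edge cases mentioned above.

```python
from typing import Collection, Optional, Set


def check_instruments(
    instruments: Collection[str],
    instruments_either: Collection[str],
    installed: Set[str],
) -> Optional[str]:
    """Return a conflict message, or None when instrumentation may proceed."""
    # "instruments" keeps AND semantics: every entry must be installed.
    missing = [name for name in instruments if name not in installed]
    if missing:
        return f"missing required libraries: {missing}"
    # "instruments_either" has OR semantics: one installed entry is enough.
    if instruments_either and not any(name in installed for name in instruments_either):
        return f"none of the alternatives installed: {list(instruments_either)}"
    return None


# Kafka-style: empty AND list, populated OR list.
print(check_instruments((), ("kafka-python", "kafka-python-ng"), {"kafka-python"}))
# Flask-style: unchanged AND list, empty OR list, library not installed.
print(check_instruments(("flask",), (), set()))
```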

### Retrofit all Instrumentation's Instrumentor objects to lazy import

Instrumentation modules and Instrumentor objects would all be changed to not automatically import their instrumented libraries. They would automatically check themselves. This includes changing Kafka and PsycoPG2 as well.

Pros: Instrumentors probably shouldn't automatically import libraries outside their dependencies.
Cons: Lots of retrofitting work across most instrumentation. Potentially slower setup if we continue to instantiate all Instrumentors. See Vertex AI instrumentation. Would not actually "solve" the OR use case because instrumentation_dependencies would still need to pass an AND list. But it would allow that AND list to change depending on what libraries are installed.
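As an illustration of what the retrofit could look like, the sketch below defers the import until `instrument()` is called and checks availability first with the standard library's `importlib.util.find_spec`. The class name `LazyInstrumentor` and its methods are hypothetical and do not mirror the real `BaseInstrumentor` interface.

```python
import importlib
import importlib.util


class LazyInstrumentor:
    """Hypothetical instrumentor that never imports its library at construction."""

    def __init__(self, module_name: str):
        # Only the name is stored, so instantiation cannot raise ImportError.
        self._module_name = module_name
        self._module = None

    def library_available(self) -> bool:
        # find_spec returns None for a missing top-level module.
        return importlib.util.find_spec(self._module_name) is not None

    def instrument(self) -> bool:
        if not self.library_available():
            return False  # skip quietly instead of crashing
        # The import happens only once we know the library is present.
        self._module = importlib.import_module(self._module_name)
        return True


# "json" stands in for an installed library; the second name does not exist.
print(LazyInstrumentor("json").instrument())
print(LazyInstrumentor("no_such_library_for_this_sketch").instrument())
```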

### Implement new `should_instrument` method in each instrumentation

This method would provide the flexibility of the `instrumentation_dependencies` method, but with more clarity for use cases like Kafka and PsycoPG2. It would also work for "codeless cloud autoinstrumentation" and "instrumentation pack" use cases. Depending on implementation, this may also require retrofitting instrumentations or instrumentor objects to not automatically import their instrumented libraries.

Pros: Would not require instantiated Instrumentor object. Clear design.
Cons: Lots of retrofitting work across most instrumentation. Potentially slower setup if we continue to load and import all Instrumentations. May require extensive changes since entry points point specifically to instrumentor objects.
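One possible shape for such a method is a classmethod that can be called without instantiating the instrumentor. The sketch below is an assumption about the design, not an existing API: the class name, the `candidates` attribute, and the injectable `is_installed` parameter are all hypothetical. The default lookup uses the standard library's `importlib.metadata.version`, which raises `PackageNotFoundError` for a missing distribution.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Tuple


def distribution_installed(name: str) -> bool:
    """Report whether a distribution is present, without importing it."""
    try:
        version(name)
    except PackageNotFoundError:
        return False
    return True


class SketchKafkaInstrumentor:
    """Hypothetical instrumentor exposing a pre-instantiation check."""

    # Alternatives taken from the Kafka example in this issue.
    candidates: Tuple[str, ...] = ("kafka-python", "kafka-python-ng")

    @classmethod
    def should_instrument(
        cls, is_installed: Callable[[str], bool] = distribution_installed
    ) -> bool:
        # OR semantics: any one of the candidate distributions is enough.
        return any(is_installed(name) for name in cls.candidates)


# The loader can decide before creating the instrumentor object.
if SketchKafkaInstrumentor.should_instrument():
    print("would instantiate and instrument")
else:
    print("would skip this instrumentation")
```

Because the method is a classmethod, the loader still has to import the instrumentation module to reach it, which is why the retrofitting concern above still applies.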

### Implement separate Instrumentations instead of "OR scenarios"

There could be a KafkaInstrumentation and a KafkaNGInstrumentation.

Pros: Easy to understand, faster setup, no unnecessary Instrumentor instantiation.
Cons: New packages, could temporarily break KafkaNG users relying on latest release until they add new instrumentation.

Links:
Repo before change: https://github.com/open-telemetry/opentelemetry-python-contrib/tree/8582da5b8decd99f3780e820b5652d4c72b7a953
Breaking change PR: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/3202
Example issue with new tracebacks and logs: https://github.com/Azure/azure-sdk-for-python/issues/40517
