What is an OpenTelemetry Collector, what is a distribution? #8555
Here was my take from 2020: https://docs.google.com/document/d/1jHOYTRRI91UdyMEfqV7WNPEAxSQKP13b_jPcQX4oe9I/edit?usp=sharing TL;DR: Other projects (Prometheus, Kubernetes) have successfully created conformance programs by testing conformant behavior, rather than by requiring the use of certain code packages. An example of "conformant behavior" could be:
The easiest way to construct a "conformant" collector distribution would be to simply use the collector libraries or the collector builder, but doing so wouldn't necessarily be required.
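To make "testing conformant behavior rather than requiring certain code packages" a little more concrete, here is a minimal Go sketch (Go being the Collector's implementation language). It treats a candidate as a black box and checks only what comes out of it. Everything here (`Record`, `Pipeline`, `CheckPassthrough`, and the two toy candidates) is hypothetical and is not part of the Collector's API or of any existing conformance suite; a real program would presumably exercise actual binaries, for example over OTLP.

```go
package main

import (
	"fmt"
	"reflect"
)

// Record is a hypothetical, heavily simplified telemetry item used only in this sketch.
type Record struct {
	Name  string
	Attrs map[string]string
}

// Pipeline is a hypothetical black-box view of a candidate collector:
// records go in, records come out. How it was built is not visible.
type Pipeline interface {
	Process(in []Record) []Record
}

// passthrough stands in for any candidate, however it was built.
type passthrough struct{}

func (passthrough) Process(in []Record) []Record {
	out := make([]Record, len(in))
	copy(out, in)
	return out
}

// dropper is a deliberately non-conformant candidate that loses the last record.
type dropper struct{}

func (dropper) Process(in []Record) []Record {
	if len(in) == 0 {
		return nil
	}
	return in[:len(in)-1]
}

// CheckPassthrough is one hypothetical behavioral check: with no processing
// configured, every record must come out unchanged and in order.
func CheckPassthrough(p Pipeline) error {
	in := []Record{
		{Name: "span-a", Attrs: map[string]string{"service.name": "checkout"}},
		{Name: "span-b", Attrs: map[string]string{"service.name": "cart"}},
	}
	out := p.Process(in)
	if !reflect.DeepEqual(in, out) {
		return fmt.Errorf("want %d unchanged records, got %d", len(in), len(out))
	}
	return nil
}

func main() {
	fmt.Println(CheckPassthrough(passthrough{}) == nil) // true
	fmt.Println(CheckPassthrough(dropper{}) == nil)     // false
}
```

The design point is that `CheckPassthrough` never asks how a candidate was produced, so a distribution built with the collector builder and one assembled some other way would be judged by the same observable behavior.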
I like the idea of defining conformance to a standard, but it's unclear to me what the effect of being conformant would be. In other words, let's say we define what it means to be an "OpenTelemetry Collector", and someone has a product which meets all the requirements. Isn't it still a trademark issue for them to say that their product is an OpenTelemetry Collector? IANAL, but as I understand it, the Linux Foundation holds the trademark on the term. For example, it would be a trademark violation for a company to name its product "Company OpenTelemetry Collector" because the trademark may not be used in a product name. However, it is OK to use the phrase "Company Distribution for OpenTelemetry Collector" because it is a reference to the trademark and does not imply that the trademark is part of the product name. I don't mean to nitpick, but I can't figure out how one would communicate the fact that they officially have an OpenTelemetry Collector without violating the trademark guidelines.
What does this clarification do, and how does it help the project? I am unclear on why this is coming up. Is this impacting the OpenTelemetry project's ability to graduate within the CNCF?
I think in this case the effect would be that you cannot call yourself a "Collector distribution" without passing X, Y, Z conformance tests. I think the trademark issue is separate, though, and has already been enforced in the past.
I'm not sure I fully understand this one. Would this by proxy mean that "a collector distribution" must be built, or be able to be built, with OCB? I think this may be too limiting. Consider this scenario: Contributor X builds a new Collector component type. It is ideal for their specific use case, and they don't plan on contributing it upstream, but they build it on top of the collector framework. OCB does not recognize this component type and thus fails to build it. Would this not qualify as a distribution?
Just linking this other issue here, which suggests a distribution should be added to the spec: open-telemetry/opentelemetry-specification#2873. As the issue points out, distribution is already in the official documentation: https://opentelemetry.io/docs/concepts/distributions/
Note the doc linked above also includes a link to the definition of the collector today: https://opentelemetry.io/docs/concepts/components/#collector
I guess my question would be whether the Collector SIG disagrees with the definition of distribution that's currently on the website.
One thing that came up today, both in discussions at the Operator SIG and separately in discussions with @Aneurysm9, is command support. Should collector distributions be required to support both the Collector
cc: @jaronoff97
My expectation as someone building features on top of the collector is that any collector distribution uses the collector builder, or at least can be marshalled into a struct that matches the collector's Go framework. Adhering to that would ensure that the Kubernetes features we design will always work for any distribution.
I think this is a great question to help anchor this discussion. Here's one scenario that comes to mind. Consider if (hypothetically) Google offers an OpenTelemetry Collector Distro for GCP that has lots of great first-party GCP support. But their distro doesn't include (hypothetically) the Honeycomb Marker Exporter, because they don't want to be on the hook for supporting that exporter. This situation seems somewhat unavoidable, as I'm not sure we want to force all distros to include all components, for both size and support reasons. If the OpenTelemetry Collector could support dynamic linking, then users could just drop the Honeycomb Marker Exporter into their GCP distro and the problem would be solved, but it sounds like dynamic linking is a no-go because of Go. So we would need another way to ensure that OpenTelemetry Collector distros can be extended and don't lock users into the distro's ecosystem. [Just as one example, we could potentially say that anything called an OpenTelemetry Collector distro must be built using the OpenTelemetry Collector Builder, and that all the distro's components must be publicly available so that users can extend the distro themselves.]
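The constraint described above can be modeled in a few lines of Go. In this hypothetical sketch (`Factory`, `buildDistro`, and `canLoad` are illustrative names, not the Collector's real API, and the component type strings are only examples), the component set is fixed when the binary is built, so a configuration that references a component the distro left out cannot be loaded. The user's only recourse is to rebuild with that component added, which is why publicly available components and a reproducible build matter in this argument.

```go
package main

import "fmt"

// Factory is a hypothetical stand-in for a component factory; the real
// Collector uses richer, per-kind factory types.
type Factory struct {
	Type string
}

// buildDistro models what a build step produces: the set of components
// compiled into the binary. In this sketch the set cannot change after the
// "build", mirroring a statically linked Go binary.
func buildDistro(base []Factory, extra ...Factory) map[string]Factory {
	components := make(map[string]Factory, len(base)+len(extra))
	for _, f := range base {
		components[f.Type] = f
	}
	for _, f := range extra {
		components[f.Type] = f
	}
	return components
}

// canLoad reports whether a pipeline config that references typ could be
// loaded by a binary containing the given components.
func canLoad(components map[string]Factory, typ string) bool {
	_, ok := components[typ]
	return ok
}

func main() {
	// Illustrative component type names only.
	vendor := []Factory{{Type: "otlp"}, {Type: "googlecloud"}}

	distro := buildDistro(vendor)
	fmt.Println(canLoad(distro, "honeycombmarker")) // false

	// The user's only option in this model: rebuild with the extra component.
	custom := buildDistro(vendor, Factory{Type: "honeycombmarker"})
	fmt.Println(canLoad(custom, "honeycombmarker")) // true
}
```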
@trask I don't think your example answers the question, at least for me. And we had an hour-long discussion on the call where we still didn't explicitly enumerate what problems we're trying to address by the discussion. I heard at least two problems, one on the call, another in your answer:
Some thoughts on (2):
It's not very user-friendly, and it's about 100x more painful than the plugin-based ecosystems I've worked with before, where I can just upload a pre-built component into my existing system. I guess I was hoping we could get as close as possible to the convenience that other plugin-based ecosystems offer, within the constraints of Go.
I think the connection is that we have an opportunity to set requirements on anything that wants to call itself an OpenTelemetry Collector distro, and so it's our chance to enforce something like this (if we want). FWIW, the example I gave aligns with the definition proposed by @dyladan and @jpkrohling above:
This ^ already excludes existing distros that use proprietary code. More importantly, it doesn't answer the question of which problem a definition like this solves. I see no reason to debate the criteria without deciding why we're doing it. To quote a good book:
I totally agree, which is why I tried to provide one possible "why" above. I'm looking forward to seeing what other "whys" people have in mind.
The primary reason I care about a definition here is that users are advised to limit the collector to only the components necessary for an environment. In the absence of a dynamic plugin model (which, to my knowledge, no collector maintainer believes is feasible), we are recommending that users deploy a "collector" that we have not built ourselves. Since we are not recommending a concrete binary, I believe we need to define precisely what we are recommending. Additionally, we expect that as a user's needs evolve, they will migrate to another "collector" that contains a different set of components. Therefore, a definition would serve to establish expectations for what stays the same between "collectors" versus what may differ. I would like to highlight that the issue asks for two definitions, but there appear to be at least three categories of collectors which have been discussed. Very roughly:
The conversation so far seems to have blurred (2) and (3), and we might explicitly conclude that this is not an important distinction. However, for now, I'm drawing this distinction because the "whys" I've described above specifically apply to (2).
I have two problems that I would like to see resolved.

Problem one: remove confusion about what a Collector is

The first problem is basic confusion about "what a Collector is." Not a Collector distro, but the term Collector itself. If someone points to a binary and calls it a Collector, just about everyone in the community would assume that the binary is a build of the collector codebase plus some plugins. Even if a binary was described as some kind of "Vendor Specific Collector Distribution," that core assumption would still be there. That seems a bit obvious, but we're now starting to see projects pop up which don't match this definition. One example is Grafana Alloy. My understanding is that Alloy is basically the pre-existing Grafana agent, plus some additional components that it shares with the Collector codebase. Which is a totally fine thing to be! But when I first came across it, it was described as a "vendor neutral OpenTelemetry Collector distribution." Like everyone else in the community, I took that description to mean something completely different – that it was the Collector codebase plus some Grafana-specific plugins. I was super confused when I discovered that wasn't the case! Again, no disrespect to Grafana or the Alloy project; it seems like a totally fine project to me. But the naming threw me for a loop. Imagine if CouchDB started calling itself Redis because it shared some Redis code in order to add a feature. That would be really confusing! I'm sure the Grafana folks are reasonable, and we can just talk to them about it. But I imagine that there may be more instances of this in the future, so it seems prudent that we provide some kind of official definition of a Collector that roughly matches community expectations, in order to avoid confusion. Namely, that a Collector is a build of the collector framework plus some plugins.
Problem two: who do I talk to for technical support?

At the heart of all the various collector distro discussions is the question "who is responsible for helping me with this thing?" We have users who come into our Slack channels asking for technical support. What technical support do we want to give? Who do we point them to if we don't want to give them support? Do we just support the

Maybe there are additional problems, but those are the two where I am currently seeing real-world issues related to a lack of clear definitions around the Collector.
I don't think this in itself is a problem. Whatever someone calls their binary doesn't concern me unless I have an actual problem to solve and their naming creates confusion that prevents me from solving it (like coming to the OTel support group when the actual "collector" is something else entirely). So your In other words, if
@yurishkuro, number one is definitely a problem. We are actively addressing an example of it right now. It is related to number two, but it causes other fundamental confusions. I agree that for most projects,
Thank you all for the renewed interest in defining Collector and Collector distributions. I watched the recording from last Thursday and spoke to several of you on Slack (GC and Collector leads). Here’s a summary of the situation as I understand it. We already have a few definitions in place, such as:
Commercial vendors are being asked to support the "OTel Collector" by their customers, as evidenced by the number of commercial vendors listed as having a distribution of the Collector:
Each vendor has a different approach to meeting this demand. Some assist customers using a curated list of upstream components, others offer support (with SLAs) for their official binaries with vetted upstream components, and others provide extra features at different levels. These approaches are categorized on the distribution definition page as "Pure," "Plus," and "Minus." However, not all of these approaches resonate equally within the GC and with Collector maintainers: we accept some approaches as distributions but not others. We can't pinpoint why they are different, making it harder for vendors to comply with the (non-existent) requirements to be called a distribution. The GC has politely asked one of these vendors to stop calling itself a Collector, without providing a clear path forward for the project to regain the right to be called a distribution. Lack of knowledge about these projects adds to the confusion. For instance, I have seen inaccurate claims about ADOT and Alloy. @atoulme, @bogdandrutu, and @yurishkuro have questioned the actual problem we are aiming to solve. While their question might seem odd, there wasn’t a clear articulation of the problem: we feel that something is off but can't pinpoint why we don't want certain projects to be called a distribution of the Collector. One argument by @djaglowski was well-received: we want users to have a consistent experience and be able to reuse their knowledge when switching between "flavors" of the Collector, whether custom-built, vendor-built, or community-built. I have also heard a few other arguments, which I'll address here:
To me, it's clear that we need an objective set of rules in addition to our existing subjective definitions, so the ecosystem can thrive with options for our users while retaining their ability to reuse their knowledge and switch between flavors without getting locked-in. If we can agree on this need, here’s what I propose as an initial draft, with the promise to develop it further elsewhere:
Thanks @jpkrohling, that's a great layout. My only suggestion is that I think Collector Build and Collector Distro can be combined. Anything that can be reproduced by the builder can be called a Distro, regardless of who issued it.
In my previous message, I should have stressed more that we didn't have a consensus on whether we had a problem to solve. Before addressing why I think we need both a build and a distribution, I'd like to take a step back and reach a consensus. Community, Collector leads, TC, GC: please vote on this issue. The options are:

❤️ No problem to solve at the moment. Let the ecosystem use our subjective definitions (status quo)

Note that you are NOT voting on my draft proposal.
Let me try one last time. You cannot solve a "problem" of "what is a collector" without deciding why, i.e. what success criteria you want to meet by "solving" it. The poll above provides exactly zero answers to that question.
Not sure how helpful this is, but this is my take from working with several hundred customers adopting OTel:
So I guess my experience is that there isn't a terrible problem here to resolve, but there is quite a bit of variation in what people use, and that sometimes leads to confusion or a bad experience depending on what they're using. I see here echoes of what it means to adopt OTel. If you propose an alternative API, but still emit semantic conventions and OTLP data under the hood, is that OTel? I'd say yes. Is your binary, Acme Corp. Collector, capable of accepting and emitting OTLP, and also uses the |
@yurishkuro, please bear with us. Your input has been valuable and I think we are now in a better position because of your questions. I'll try again, starting with what I see as the problems we are trying to solve:
If we define we want to work on those problems, here are the goals for me:
I think the simplest way to conceptualize the 'problem' is that the only thing that the project defines as hard requirements for 'what is an OpenTelemetry ' is what's in the specification. This falls apart when you start talking about things like the collector - there's not really a specification for the collector. This can lead to not only user confusion (see above), but also confusion for vendors and integrators building in the ecosystem. Ultimately, we need to be able to provide some guarantees to both of these groups -- to users, we need to be able to have clear guidance for questions like:
To builders, we need guidance around:
Don't you see that this is a pure tautology? "We want to know because we want to know". Any definition will match that. E.g. the following definition is clear and objective, and completely besides the point as it does not address the unspoken problems:
This is getting closer to the issue, but it's very hand-wavy. @austinlparker's comment #8555 (comment) is more concrete. Basically, we can approach this as a product requirements spec. Try to phrase everything as a use case:
For example, with one of Austin's bullet points:
Phrased like this, an immediate question from me: is that what we actually want? How is that even possible? It means that the two distros are 100% functionally equivalent (at least on the features I already used with distro X), which defeats the purpose of distros in the first place. The ability to swap implementations is a nice theoretical goal, but users may have other goals, like not wanting to run binaries hundreds of MBs in size that bundle every possible feature. So rather than keep debating completely arbitrary definitions of collector, let's first
Doing so will implicitly inform the definition of the collector, based on actual problems / goals / user needs, not on a tautological definition of a problem.
@yurishkuro There is an immediate need for the collector, as a SIG, to define the requirements that another piece of software must meet to call itself an 'OpenTelemetry Collector'. This is, as you said, a product requirement. I stated my rationale above, but I would like to expand on it with the bigger issue here. As OpenTelemetry continues to mature and graduates, we (the GC and project leadership more generally) will need to create requirements around certification and compatibility. This is both easy and hard. For instance, it is relatively easy to set a requirement around something like OTLP. If you write OTLP, then you must write valid OTLP to any compliant OTLP receiver. It is also somewhat easy to say 'Supports OpenTelemetry API' by ensuring that you can get the active span from context and modify it, etc. The collector, however, is much more difficult to quantify by these standards. I agree, in principle, that it might not be desirable for non-specced config files to be portable. I would generally agree that a receiver written for upstream may not necessarily work with other implementations. With that said, what is the distinction that we are going to use? You can hopefully understand my reluctance to say "OK, well, you can just call anything that receives OTLP a Collector", because that could be very confusing for users, especially as management tools proliferate. Similarly, it does not benefit users to remove one source of lock-in (the API/SDK) and then replace it with another (the pipeline/collector layer). I would honestly be fine saying 'there is only one thing called an OpenTelemetry Collector, and it is anything that is built with upstream OCB'. Everyone else in the ecosystem can be 'OTLP compatible' or whatever other words we come up with.
edit: By 'non-specced' config files above, I mean configuration files that do not align with a published specification (e.g., the upcoming file-based config options)
Just to be crystal clear -- I think an entirely acceptable outcome of this is stating the following:
After discussing this at the specification meeting on 26-Nov-2024, the spec SIG agreed that having a definition in the spec will be valuable for the project and end users. Thank you for all the discussion in this thread; I will open a spec PR with my existing definition and would love to see the discussion continue there. Will close this once that PR is opened.
@codeboten, there is a terminology problem with your proposal #8555 (comment). The OTel project does not have exclusive rights to the words Collector and Distribution. The only "enforceable" part of your definition is the definition of OpenTelemetry Collector, which can be clarified as "for the purpose of that definition, the terms collector and distribution are defined as follows...". But that means the two terms do not stand on their own; they are just an implementation detail of the main definition. If a vendor maintains some sort of collector, it is by definition not an OpenTelemetry Collector, since it's not maintained by OTel maintainers, and thus vendors still have no guidelines on what to call their binary. They are perfectly within their rights to call it a Collector or even a Distribution no matter how it's implemented, because those are just general terms.
@austinlparker, this ^ is closely related to my concern, which I wrote about in the second point of this comment.
I don't think the goal with this definition should be to enforce the terminology; in fact, I would prefer it wasn't. I don't want to get into the business of chasing down "collectors" in the wild. I just want users to know what they're getting when they're getting the OpenTelemetry Collector, and what they should get with any other collector. This is ultimately my goal here. Ideally, that definition leads to a set of tests or common practices that can be documented on the opentelemetry.io website to give end users the tools they need to do things like bringing their own components or using existing components from the ecosystem with anything that calls itself a collector. And if the thing that calls itself a collector doesn't align with the definition, then users can ask whoever publishes it for changes to better align with the definition. I've added a comment in the spec issue here: open-telemetry/opentelemetry-specification#4309 (comment)
We do not have exclusive rights over the English language, that is correct. However, we do have exclusive rights over what we, the project, promulgate. I cannot control every vendor in the world (nor do I wish to), but it is extremely valuable for OpenTelemetry, as a project, to clearly define how we use certain words such as 'collector' and 'distribution' as an inclusion filter for things like the OpenTelemetry website/registry (and other marketing/community activities).
I don't want to go to another, mostly empty issue; why can't the discussion continue here? Why split it? I intentionally put "enforceable" in quotes; that was not my point. My point is that the only definition you're providing is of OpenTelemetry Collector. And there is a choice: do we want that to refer only to artifacts produced by the maintainers, or also to something that vendors can produce? So far the proposed definition is completely internal: it lets users know what they can expect from the official OpenTelemetry Collector. That is not at all what the problems mentioned in this issue are about; user confusion and vendor lock-in are not addressed. Which always brings me back to the very first question I posted on this thread: what is it that you're trying to solve? Just defining what the official collector is and the principles of how it's being built?
As an end User, your comment seems to be implying a User guide, not a specification. And, as an end User, I would prefer specifications to be well defined, have specific terminology, and use well-formed ideas (if not, they need more refinement, or they are not a specification at all). Again, as an end User, I would rather place the expectation of what a User of a Collector might expect (from a purely OTel PoV) within guidelines and User documentation, as this might be helpful to orient understanding. There is absolutely no way to enforce (nor should it be enforced) what those picking up the Collector code base and building their own 'Collector' should be providing to a User. That seems a reach. Ultimately, I think the reality of a Collector being a thing that can collect things, process things, and produce things in many permutations may not lend itself to a specification. Perhaps an oversimplification. 100%: please do enhance the User documentation. I can't imagine what routes a specification would bring us down.
My 2 cents: I think this is too restrictive. I understand the desire that users should be able to bring their own components. If users have custom components that are compliant with OTel interfaces, they should be able to use these components with every collector distribution. That's OK. However, the sentence above is more restrictive: it's not enough for a distribution to support all OTel-compliant components and to provide a way to include them in the build. The sentence above says a distribution must not include any alternative components that are incompatible with the current OTel interfaces. I would appreciate it if distributions were allowed to experiment with alternative approaches, like components for native Prometheus pipelines. If these approaches turn out to be useful, they will eventually be contributed back upstream, which will benefit the community.
Yes, if we define what a collector is, then we have definitionally defined what it is not. Given the proposed definition, here's a quick scorecard of things that are in the community that are, or are not, collectors by this definition.
Now, do all of those things support OpenTelemetry? Yep. They also support OTLP! That's great! We should have more things that support OpenTelemetry and OTLP. However, I hope you can appreciate that without some normative guidance on what exactly you must do to wind up in one of those buckets, we will invariably see more end-user confusion around how users can extend the collector interfaces, how they can migrate between distributions, etc. This is not hypothetical; it already happens! edit: I would also point out that this isn't just an OpenTelemetry problem; you can look at the k8s community to see what happens when the project doesn't provide clear, normative guidance around the names of things.
If your 2nd column refers to OpenTelemetry Collector, then it's obvious that everything will be No, because those are not OTel-project artifacts. If you really meant an unqualified "collector" by the internal definition, then how can something be a distro without being a collector? My suggestion:
Given the conversation so far, I felt like it was valuable to be explicit rather than implicit about the proposed definition.
Your third point here seems to be, effectively, what the proposed definition is. We're reserving 'OpenTelemetry Collector' to mean the specific artifact that we produce, while providing guidance for how to create a distribution that complies with our goals (no lock-in, etc.). Could you elaborate on what un-defining 'collector' does, other than allow existing artifacts that describe themselves as 'OpenTelemetry Collectors' to continue to do so?
This goes back to my point about terminology conflict. Say we define what we mean by the word "collector" for the purpose of defining what OpenTelemetry Collector means. How does it help a vendor? What sentence could they construct that refers to that internal helper term? What they could use are the top-level terms, in this case "OpenTelemetry Collector Distribution".
Vendors aren't the only audience for this; they're a part of it. A normative definition of the term and product "OpenTelemetry Collector" provides guidance to end-users and third parties by clearly saying "the artifact and product OpenTelemetry Collector refers to a specific piece of software built and distributed by the OpenTelemetry project". This means that when someone, for example, takes a training course that covers the OpenTelemetry Collector, the subject of that course is unambiguous. When someone does a conference talk about the OpenTelemetry Collector, it continues to be unambiguous. If I have a deployed OpenTelemetry Collector that I need support for, it is clear where I can get that support, etc. Without defining what an OpenTelemetry Collector is, we cannot effectively define what a Collector Distribution is. A Distribution definition helps me understand other things as an end-user, for example, the extensibility of that product, the applicability of given configurations and rules for things like transform processors or sampling processors, etc. If we define distributions without also defining collectors, it is a mostly meaningless distinction: if Alloy, FluentBit, and Vector are all 'OpenTelemetry Collectors', then what possible definition of a 'distribution' can you come up with that satisfies the other requirements of the definition (e.g., component extensibility, config interop, management interop, etc.)?
I said do define OpenTelemetry Collector, but don't define collector (leave it to Webster).
So if the proposed definition prefixed instances of 'Collector' with 'OpenTelemetry', e.g.
becomes
You'd be fine with it? Because I believe that is the intent, and I agree that making it explicit is fine.
Does a definition that excludes the majority of the corporate sponsors of community investment seem to benefit the community? Suppose we define OTel Collector & OTel Distribution in this exclusionary way. In that case, it at least feels important to have a third term that encompasses good-faith contributors to the community who may not have feasible (business or technological) routes to rebuilding their collectors to comply with the limits enforced specifically by the OCB tooling requirements.
Right, this isn't about excluding existing collectors. How could any of them fit a definition that hasn't existed previously? 😄 This would give all distributions and their publishers an opportunity to align with the project to ensure the goal of avoiding vendor lock-in is achieved. Whatever definition makes sense to achieve this, I'm in favour of it.
@dehaansa, I think it's fine to be exclusionary in that regard. The OpenTelemetry Collector is a specific piece of software with specific design goals, ecosystem, and compatibility guarantees. If you write software that looks like it's doing the same thing but has a completely different implementation and ecosystem (e.g. FluentBit), then it's not an OpenTelemetry Collector; it's just something that supports OTLP. I think the principles @codeboten outlined in #8555 (comment) are quite reasonable because they aim to address user issues like extensibility, compatibility, configuration portability, ecosystem familiarity & continuity, etc.
Apologies if this is long; I'm attempting to find a balance between keeping it short and explaining some of the failure modes this project can get into if it is not careful. @codeboten I've tried to respond to your main points, and at the bottom I suggest a more streamlined definition.

A Collector MUST allow end users to receive, process, emit telemetry in various formats

This is a vague requirement; I'm not sure that it helps. You can build a Collector that does not emit telemetry. On the other hand, FluentBit can receive, process, and emit telemetry. So it feels like this definition does not really say anything specific about the Collector.

A Collector MUST allow users to bring their own components, to ensure no vendor lock-in can occur.

This is the actual heart of the issue. We can't create a behavior-based definition of a Collector because the Collector is completely pluggable. However! The pluggable nature of the Collector is an incredibly valuable feature. I believe that this pluggability is what we want to preserve: mixing and matching plugins, including arbitrary plugins that end users create themselves. This is the feature that keeps the OTel Collector community from fracturing.

A Distribution is a package that is produced by utilizing open source tooling maintained by the OpenTelemetry project and contains any combination of components.

I assume this means something along the lines of "All Collectors must be compiled using the Collector Builder, and anything the Collector Builder can produce counts as a Collector." At first glance, it would seem like tying the definition of a Collector to the Builder would make for a clean, strict definition of a Collector. This was certainly my first thought. But I have been convinced that it's actually the opposite, and it's worth explaining why. The builder isn't a spec; it's just software. You can make it do anything.
If we base the definition of the Collector on what the Builder can do, organizations will have an incentive to open PRs that extend the Builder to build whatever it is they want to call a Collector. Saying yes or no to those requests would be very difficult, because we would have no spec to point to as justification. Instead, we would be back at square one, trying to come up with a set of principles to base our decisions on.

Bendable definitions lead to pressure campaigns

The "builder-based" approach would also create all kinds of political problems for our project. We will be accused of favoritism every time we say "no" to a request to extend the Builder. Then the Builder maintainers will be called out for weaponizing OTel to favor their corporate interests over their competitors. After that, the GC (and possibly the Builder maintainers) will become a target for threats and strong-arm tactics until we relent and accept the change. This isn't a theoretical problem! There have already been attempts to strong-arm the project in various ways.

We have very intentionally architected the OTel project to avoid these kinds of failure modes. For example, the GC is elected because that avoids the situation where companies come to the project and say "you will give us three GC seats or else." I normally don't talk publicly about that side of the biz, but I can't emphasize enough how important it is to define the rules in a way that can't be bent under pressure. It's a big part of why the OTel community is so free of politics compared to other industry-size OSS projects I have worked on.

TL;DR: using the Builder as proof of compliance doesn't help to define what a Collector is. It just transposes the question from "What is a Collector?" to "What is the Builder allowed to do?" So, while it seems logical at first blush, dragging the Builder into this certification would create a lot of social stress without giving us very much in return.
No take-backs

Speaking of social stress and political threats to the project, I do want to clarify that any definition of a Collector that would exclude the things we have already declared to be Collector distros (ADOT and the Splunk distro, for example) would first need approval from those member organizations. It's against our principles to renege on something that big; it would immediately create a crisis. It's also not the outcome we want. We want things like ADOT, Splunk, and Alloy to count as Collectors. We're trying to find a definition that allows for projects like these while avoiding projects that try to do something nasty with our brand, or that unintentionally force our users to choose between the special features of a particular distro and the features they get from the Collector plugin ecosystem. It's more important that the certification process results in the community we want than that the process be some kind of clean, automatically testable conformance test.

Proposal

Based on this, I recommend that we stick to a slightly more limited definition of a Collector.
This definition focuses on the result, not the process. It is also concrete and actionable: any project that wishes to have an official OpenTelemetry Collector Certification can apply for an audit. The audit is performed as a code review, not as an automated testing process. Having to come to us for an audit also creates an opportunity for conversation and makes sure that we don't get sideswiped by a new project coming out of the blue.
Can you expand on what you consider a Collector config file? Would the configuration shown in the Alloy example (https://github.com/grafana/alloy/?tab=readme-ov-file#example) mean that Alloy is excluded from this definition?
I'm going to respond to these points individually.
I think everyone is more or less in agreement that this is crucial -- perhaps a better way of framing this debate is "how do we protect against lock-in and enable credible exit for end-users without overly specifying the Collector API/ABI".
I mean, you could replace "ocb" with "conformance suite" and I don't know what would really be different here. If we created some clean-sheet bash script or AST parser that looked at your interfaces and said "yep, this compiles" or "nope, it doesn't," I'm not quite sure how we'd avoid similar situations (people would just argue that our definition of 'compliance' was too strict, and they'd point to the lack of a specification as a rationale).
We already have all kinds of political problems, some of them in this very thread. We have to deal with vendors spreading FUD about the Collector, we have to deal with vendors trying to 'land grab' OpenTelemetry names/concepts and tie them to their marketing/product positioning, and we have 'one-way' collector implementations right now where you can get in, but not out, easily. To my earlier point, the builder isn't what's important here, the definition is what's important, and the builder is a mechanism by which we can programmatically enforce/prove that definition.
I disagree with this categorically. We never made an express or implicit 'promise' to anyone about the Collector other than what's been written out. The fact that the Collector is deliberately unspecified should be as big a clue to that as any. Perhaps it should be specified, but that's a bit outside the scope of this issue (and I personally don't think it should be -- the thing that matters from an interop perspective, OTLP, is specified). We cannot possibly craft a definition that is both rigorous and complete while also creating special carve-outs for first movers. Doing so would also be unfair to anyone else who might come along and develop for the ecosystem later.
I do not functionally see how this is different from the proposal as written. Points 1 and 2 are self-referential; a Collector that accepts the config file and accepts existing plugins must be written in Go, and thus can be built via ocb. The third point is ultimately just semantic sugar over the constraints of the first two points; obviously, any compiled collector can be a Distribution.
After doing some more thinking, I do think there's value in trying to avoid a hard requirement on our build tooling. As a compromise, what if we adopted the

Edit: The benefit here is that we could then offer self-certification by providing a configuration file with expected inputs/outputs, which would be less work to maintain on our part and would also avoid making our build tooling a load-bearing part of the conformance process for third parties.

Edit x2: I think Ted and I were just violently agreeing with each other.
I interpreted "build with ocb" more as shorthand for the actual composability requirements, which still need to be spelled out. It's not ocb that's important, but how the code included in a distro is organized. If the code is proprietary, a user obviously cannot build their own collector. If the code is public and implements the Component API but is not structured to be composable by something like ocb, then it's still very difficult for the user to build their own collector. So ocb provides an easy smell test, but ultimately stands for an underlying set of requirements that we can document. Case in point: Jaeger implements several custom components which are all compatible with the Collector framework, but for a number of reasons we intentionally didn't make them into standalone modules the way otel-coll-contrib is organized, so building with ocb is currently not possible (not a priority for Jaeger right now). Similar story with "accepts OTEL config format": intuitively we understand what that means, but the actual PR should spell it out in more detail.
I think an important distinction is that it would allow a distribution to add functionality that is not necessarily implemented using the component interfaces. For example, it could support an additional configuration provider or an additional configuration file format, so long as it could be rebuilt by users with additional components. I wonder if requiring it to support building from an OCB configuration file (but not requiring OCB itself) is the right middle ground. It ensures the "add an additional component" flow is the same across distributions without preventing a distribution from innovating in other ways, or from having functionality that is not encapsulated in a component. A distribution could also modify its build process, so long as it supports the config file format. It seems easily testable as well: build with a custom OCB config file, then run the binary with a custom collector config.
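An OCB-style manifest lists each component as a module entry carrying `gomod`, `import`, `name`, and `path` attributes. The Go sketch below models one such entry so the "build from an OCB config file" idea has something concrete behind it. The `Module` type and its `Resolve` method are hypothetical and are not the builder's actual code; the defaulting rules (`import` falls back to the module path from `gomod`, `name` to the last element of `import`) are an assumption made for illustration. The point it shows is that a component living in a package nested inside a larger module can still be referenced by spelling out `import` explicitly.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Module is a hypothetical, simplified model of one component entry in an
// OCB-style manifest. Field names follow the gomod/import/name/path attributes.
type Module struct {
	GoMod  string // module path plus version, e.g. "example.com/foo v1.2.3"
	Import string // Go import path of the package that exposes the component
	Name   string // identifier used when referring to the package in generated code
	Path   string // optional local directory for the module (unused in this sketch)
}

// Resolve fills in defaults for an entry. ASSUMPTION: Import defaults to the
// module path taken from GoMod, and Name defaults to the last element of Import.
func (m Module) Resolve() (Module, error) {
	fields := strings.Fields(m.GoMod)
	if len(fields) == 0 {
		return m, fmt.Errorf("gomod is required")
	}
	if m.Import == "" {
		m.Import = fields[0] // drop the version, keep the module path
	}
	if m.Name == "" {
		m.Name = path.Base(m.Import)
	}
	return m, nil
}

func main() {
	// A component nested inside a larger module (hypothetical paths), rather
	// than published as a standalone module: the import path is given explicitly.
	m, err := Module{
		GoMod:  "example.com/bigproject v1.0.0",
		Import: "example.com/bigproject/components/fooexporter",
	}.Resolve()
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Name, m.Import) // fooexporter example.com/bigproject/components/fooexporter
}
```

This only works if the package is importable from outside its module; Go does not allow importing packages under another module's `internal/` directory.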
I don't know that there can be. At least not if it doesn't support having processors that operate on the OTLP data model between the receivers and exporters. Something that collects and produces Prometheus metrics and nothing else is a Prometheus derivative, not an OpenTelemetry Collector derivative. The two things may be able to live in the same binary, but they're fundamentally different systems.
@codeboten wrote:
While it's desirable that downstream distributions use the tooling our users are familiar with, I wouldn't make that a requirement for calling the produced binary an OTel Collector Distribution. I'm not sure everyone in this thread is aware, but not everything in an OCB manifest is a component: we have support for config providers, which are not components. If we say that a distribution is only what an ocb-like tool can build, we'd be including config providers in the mix (and any other similar features).

@TylerHelmuth wrote:
I believe this is key to this matter.

@austinlparker wrote:
This statement has it backwards: we require OTel Collector Distributions to be able to include OTel Collector Components, but we should not require OTel Collector Distributions to include only OTel Collector Components.

@yurishkuro wrote:
Off-topic, but I think it can: you'll probably have to specify the gomod and import/name/path attributes for each module.

@tedsuo wrote:
I think this is clear, short, and captures all the important aspects. I would just replace "plugins" with "components" and qualify the common words "collector", "distro", "components": OTel Collector, OTel Collector Distribution, OTel Collector Component. I would probably also require the config file to use at least one specific config provider.
@codeboten wrote:
I would be explicit that we expect our YAML schema to be accepted, while still allowing other formats. We'd have to get our schema sorted out, though :-)
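To make "accept our YAML schema" a little more concrete, here is a minimal Go sketch of one property a conformance check might cover: every component ID listed under `service.pipelines` must be declared in the top-level `receivers`, `processors`, or `exporters` sections. The `Config`/`Pipeline` types and the `parse`/`Validate` helpers are hypothetical, model only a small subset of the real schema (no connectors, extensions, or telemetry settings), and are not the Collector's own validation code. JSON is used only because Go's standard library has no YAML decoder; the nesting is the same.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Pipeline lists the component IDs a pipeline wires together.
type Pipeline struct {
	Receivers  []string `json:"receivers"`
	Processors []string `json:"processors"`
	Exporters  []string `json:"exporters"`
}

// Config is a hypothetical, heavily trimmed model of the Collector config
// layout. Component settings are kept as raw JSON since only their IDs matter here.
type Config struct {
	Receivers  map[string]json.RawMessage `json:"receivers"`
	Processors map[string]json.RawMessage `json:"processors"`
	Exporters  map[string]json.RawMessage `json:"exporters"`
	Service    struct {
		Pipelines map[string]Pipeline `json:"pipelines"`
	} `json:"service"`
}

// checkRefs returns an error for the first ID in ids that is not declared in section.
func checkRefs(pipeline, kind string, ids []string, section map[string]json.RawMessage) error {
	for _, id := range ids {
		if _, ok := section[id]; !ok {
			return fmt.Errorf("pipeline %q references undeclared %s %q", pipeline, kind, id)
		}
	}
	return nil
}

// Validate checks that every pipeline references only declared components.
// Pipelines are visited in sorted order so the reported error is deterministic.
func (c Config) Validate() error {
	names := make([]string, 0, len(c.Service.Pipelines))
	for name := range c.Service.Pipelines {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		p := c.Service.Pipelines[name]
		if err := checkRefs(name, "receiver", p.Receivers, c.Receivers); err != nil {
			return err
		}
		if err := checkRefs(name, "processor", p.Processors, c.Processors); err != nil {
			return err
		}
		if err := checkRefs(name, "exporter", p.Exporters, c.Exporters); err != nil {
			return err
		}
	}
	return nil
}

// parse decodes a JSON document into a Config.
func parse(doc string) (Config, error) {
	var c Config
	err := json.Unmarshal([]byte(doc), &c)
	return c, err
}

func main() {
	doc := `{
  "receivers":  {"otlp": {}},
  "processors": {"batch": {}},
  "exporters":  {"debug": {}},
  "service": {"pipelines": {"traces": {
    "receivers": ["otlp"], "processors": ["batch"], "exporters": ["debug"]}}}
}`
	c, err := parse(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Validate()) // <nil>
}
```

A check like this tests the result (the config is understood) rather than the process used to build the binary, which is the kind of self-certifiable property discussed earlier in the thread.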
Can you provide an example of a collector distribution that would not be compatible with any collector component, if we define 'collector component' to be a Go module that implements the

Here's why I think we have to have bidirectional component guarantees. If we say "collectors may include components that do not support the component interface," then we open the door to "Collector Distributions" that import a few parts of the collector as a shim (e.g., the OTLP receiver and service definitions), then replace all the 'moving parts' with something else, e.g., a proprietary agent, sampler, et al. By this definition, the Datadog Agent could be construed as an "OpenTelemetry Collector Distribution". From a practical perspective, I don't think it is terribly useful to end users to allow for such a broad scope; if anything can be a collector distribution, then the definition is meaningless. Ultimately, I am less concerned with the exact mechanism we use to certify that this bidirectional coupling exists, but I have a strong preference for something that is self-certifying and does not require us to make judgment calls. In my mind, the following are more-or-less equivalent ways of expressing this preference:
This does not mean that we cannot have a wide ecosystem of tools that fill the role of collectors/pipeline components in OpenTelemetry; indeed, I expect that more will be created. What I think is important for those tools is clear guidance about which features they support (e.g., "Supports OpenTelemetry", "Supports OTLP", "Supports OpAMP"), so that end users know how those tools fit into their overall observability deployment. A restrictive definition of Collectors and Distributions does not preclude us from having broader categories; I'd argue that it actually helps in defining those categories and provides more avenues for ecosystem development and innovation. If you don't have to force your collector-shaped idea into a collector-distribution-shaped box, then you can focus on what matters (DX/UX, runtime environment, etc.).
Opened a PR to the spec with the definition as proposed by @tedsuo and updated by @jpkrohling: open-telemetry/opentelemetry-specification#4313. I'm open to edits/suggestions/updates; please make suggestions on the document in that PR, as I think it will be easier to parse than this GitHub issue.
We recently had a discussion about what an OpenTelemetry Collector is and what a distribution of the Collector is. I would like to gather your opinions.
@dyladan proposed that only what the SIG Collector produces can be called an "OpenTelemetry Collector" and that a distribution has to fulfill the following requirements:
I tend to agree with him, but I'm eager to hear your opinions. The GC might have the right to make the final decision if we can't get an agreement, but I think we can indeed reach a consensus, at least between the GC and the Collector maintainers (core and contrib).
Update - 2024-07-17: based on the state of the discussion so far, here are the issues we identified: