-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADDON-77801 Added support for ingesting json event using conte… #376
base: main
Are you sure you want to change the base?
ADDON-77801 Added support for ingesting json event using conte… #376
Conversation
1f21143
to
0f3be11
Compare
Had some internal discussion directly with the PR author about this, so this PR may change. As such I have converted it to draft at this time. |
@@ -11,3 +11,4 @@ class TestAttackData(BaseModel): | |||
sourcetype: str = Field(...) | |||
custom_index: str | None = None | |||
host: str | None = None | |||
endpoint: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if it makes sense to add param, so that it would need to be set externally. It is doable for sure, on the other hand maybe it is generic enough for contentctl to spot that event format is hec format and decide as @spanchal-crest was proposing initally?
Mean there is very specific json format, requiring "event" field and some others: https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector
On the other hand if assuming that having event and not having other fields than listed on above page seems to risky we can stick to the entry here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure that I am following your suggestion. Could you provide some pseudocode, by any chance?
Are you saying that, for example, the JSON data that we want to replay, today, is actually JSONL and is a single file that looks like the following:
{"field1": "somevalue", "field2": 4}
{"field1": "someothervalue", "field2": 2}
{"field1": "somethirdvalue", "field2": 1}
But must actually be converted to the following format before it is replayed:
{"sourcetype": "some_sourcetype", "event": {"field1": "somevalue", "field2": 4}}
{"sourcetype": "some_sourcetype", "event": {"field1": "someothervalue", "field2": 2}}
{"sourcetype": "some_sourcetype", "event": {"field1": "somethirdvalue", "field2": 1}}
In line with the requirements for the HEC JSON endpoint?
Secondly, instead of the typing
endpoint: str | None = None
I would suggest
DEFAULT_HEC_ENDPOINT = "services/collector/raw"
`endpoint: str = DEFAULT_HEC_ENDPOINT`
or, even better
from enum import StrEnum
class ReplayEndpoint(StrEnum):
HEC = "services/collector/raw"
JSON = "services/collector/event"
endpoint: ReplayEndpoint = ReplayEndpoint.HEC
given that we already have thousands of tests and all of them use the raw event collector endpoint. The typing this way:
1. Ensures that endpoint is never `None` - since it is required it MUST be defined and Test Object creation time (giving it a reasonable default value means we don't need to go update 2,000 existing ymls).
2. Makes it explicit after the object is constructed so that we don't need a check in the replay logic for whether or not the value is None.
3. Means that if someone typos the endpoint, or tries to replay to an endpoint that contentctl does not support, they get a warning at validation time.
A third option could even be to make ReplayEndpoint its own Abstract Class and make HEC and JSON child classes that MUST implement the function `transform` so that if "raw" data (which comes directly from a file) must be "changed" in some way, the logic can be implemented in each of the ReplayEndpoint classes rather than in the replay logic itself.
OR, the JSON event file could be in the `{"sourcetype": "xyz", "event":{...}}` format on-disk.
@@ -1402,6 +1402,10 @@ def hec_raw_replay( | |||
url_with_hec_path = urllib.parse.urljoin( | |||
url_with_port, "services/collector/raw" | |||
) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do if else above with this, not override later?
Is there any test where we can test this behavior?
Jira: https://splunk.atlassian.net/browse/ADDON-77801
Added support for ingesting the json event using contentctl.
For the windows TA, the SPL2 transformed logs have additional fields along with _raw. So, in order to ingest those samples we need to use
services/collector/event
endpoint as we want to ingest json data.For ingesting the events in json format, we should have the below format of the event.
{"event": "raw_event", "fields" : {"fields1" : "value1", "fields2" : "value2"}}