enhancement(aws_s3 source): Logs processed S3 objects #22083

Merged

Conversation

fdamstra
Contributor

Summary

Adds a log event for each object processed from S3, providing an audit trail. This is similar to the events the file source logs (started watching, stopped watching).

If acknowledgements are enabled on the sink, completion of each object is also logged.

Unlike the aws_sqs source, which converts SQS messages to logs, the aws_s3 source consumes objects that usually contain many logs.
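
As a rough illustration of the behavior described above, the following std-only Rust sketch builds the audit lines for a single object. The `S3Object`, `Delivery`, and `audit_lines` names and the message wording are hypothetical and are not taken from this PR; the actual change presumably emits these events through Vector's internal logging rather than returning strings.

```rust
/// Hypothetical identifier for an object fetched by the source.
#[derive(Debug, Clone, PartialEq)]
pub struct S3Object {
    pub bucket: String,
    pub key: String,
}

/// Hypothetical outcome reported by the sink when acknowledgements are enabled.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Delivery {
    Delivered,
    Failed,
}

/// Builds the audit lines for one object.
///
/// `ack` is `None` when acknowledgements are disabled, in which case only the
/// "processing" line is produced.
pub fn audit_lines(object: &S3Object, ack: Option<Delivery>) -> Vec<String> {
    // Always record that the object was picked up.
    let mut lines = vec![format!(
        "Processing S3 object. bucket={:?} key={:?}",
        object.bucket, object.key
    )];
    // Record the outcome only when the sink reported one.
    match ack {
        Some(Delivery::Delivered) => lines.push(format!(
            "Finished processing S3 object. bucket={:?} key={:?}",
            object.bucket, object.key
        )),
        Some(Delivery::Failed) => lines.push(format!(
            "Sink reported an error for S3 object. bucket={:?} key={:?}",
            object.bucket, object.key
        )),
        None => {}
    }
    lines
}
```

With acknowledgements disabled (`ack` is `None`) the function returns a single line; with `Some(Delivery::Delivered)` it returns two, mirroring the start and completion events the summary describes.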

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Deployed to our test and dev environments, using some pipelines with acknowledgements enabled and others with them disabled, then reviewed the resulting logs.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run `dd-rust-license-tool write` to regenerate the license inventory and commit the changes (if any). More details here.

References

Unlike the `aws_sqs` source type, the SQS message itself is not the
source of events. This logs the bucket and key for files that were
ingested via Vector in order to have a better audit log.

If acknowledgements are enabled, it also logs when they are
successfully processed.
@fdamstra fdamstra requested a review from a team as a code owner December 26, 2024 16:32
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Dec 26, 2024
@jszwedko jszwedko changed the title enhancement (aws_s3 source): Logs processed S3 objects enhancement(aws_s3 source): Logs processed S3 objects Jan 2, 2025
@jszwedko jszwedko added the source: aws_s3 Anything `aws_s3` source related label Jan 2, 2025
Member

@pront pront left a comment

This should be fine, and we also have the internal_log_rate_limit option on by default, which should prevent pathological situations from happening.

Note that this PR needs a changelog, thanks!

src/sources/aws_s3/sqs.rs
@@ -215,7 +215,11 @@ pub enum ProcessingError {
    #[snafu(display("Unsupported S3 event version: {}.", version,))]
    UnsupportedS3EventVersion { version: semver::Version },
    #[snafu(display("Sink reported an error sending events"))]
Member

This looks much better now. You can also use these fields in #[snafu(display... like UnsupportedS3EventVersion does above.
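
To illustrate the suggestion, here is a std-only sketch of an error enum whose display text includes its fields. It implements `Display` by hand instead of using the `snafu` derive so that it compiles without external crates. The `SinkError` variant name, its `bucket` and `key` fields, the message wording, and the use of `String` in place of `semver::Version` are all assumptions, not the code merged in this PR.

```rust
use std::error::Error;
use std::fmt;

/// Simplified stand-in for the enum shown in the diff above.
#[derive(Debug)]
pub enum ProcessingError {
    /// `String` replaces `semver::Version` to keep the sketch dependency-free.
    UnsupportedS3EventVersion { version: String },
    /// Hypothetical variant carrying the object that failed to be delivered.
    SinkError { bucket: String, key: String },
}

impl fmt::Display for ProcessingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Each arm interpolates the variant's fields into the message,
            // which is what a field-aware display attribute would also do.
            Self::UnsupportedS3EventVersion { version } => {
                write!(f, "Unsupported S3 event version: {}.", version)
            }
            Self::SinkError { bucket, key } => write!(
                f,
                "Sink reported an error sending events for s3://{}/{}.",
                bucket, key
            ),
        }
    }
}

impl Error for ProcessingError {}
```

Carrying the bucket and key in the variant means the error message alone identifies which object failed, without a separate log line.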

Contributor Author

Thank you very much. I've incorporated these into the snafu string.

I appreciate all the handholding you've been providing.

Member

Thank you for your contribution!
If all checks pass, this will be added to the merge queue.

@pront pront enabled auto-merge January 9, 2025 18:55
auto-merge was automatically disabled January 10, 2025 13:01

Head branch was pushed to by a user without write access

@pront pront enabled auto-merge January 10, 2025 15:02
@pront pront added this pull request to the merge queue Jan 10, 2025
Merged via the queue into vectordotdev:master with commit 1ef01ae Jan 10, 2025
40 checks passed
titaneric pushed a commit to titaneric/vector that referenced this pull request Jan 15, 2025
…2083)

* `aws_s3` source now logs the S3 objects that were processed

Unlike the `aws_sqs` source type, the SQS message itself is not the
source of events. This logs the bucket and key for files that were
ingested via Vector in order to have a better audit log.

If acknowledgements are enabled, it also logs when they are
successfully processed.

* Adds changelog note

* Fixes naming of aws_s3 logging changelog entry

* Rephrased changelog

* Rephrased changelog

* Removes redundant error messages; Adds useful information to acknowledgement errors

* Rephrased to make it clearer what was delivered

* Outputs bucket and s3 during errors

* Added missing '.'
Labels
domain: sources Anything related to the Vector's sources source: aws_s3 Anything `aws_s3` source related
Development

Successfully merging this pull request may close these issues.

aws_s3 source should log what it's ingesting
3 participants