S3 Log Compressor

A simple AWS Lambda function that downloads multiple files from S3 and bundles them into a single zip archive, or retrieves a single file from an existing archive. It is well suited to archiving log files or consolidating many objects into one downloadable package. Entries are compressed with zstd; the zip container itself mainly serves to preserve paths and metadata.

Features

  • Dual-Mode Operation: Supports both compressing files into an archive and decompressing a single file from an archive.
  • Batch Processing: Process thousands of files from an S3 manifest.
  • Asynchronous Cleanup: Optionally deletes source files asynchronously after successful archiving.
  • Robust Validation: Halts on any file download failure to ensure archive integrity.
  • Optimized Manifest Parsing: Reduces S3 API calls by intelligently distinguishing files from directories in the manifest.
  • Concurrent Downloads: Configurable worker threads for parallel processing.
  • KMS Encryption: Supports server-side encryption with KMS keys.

Orchestrator Lambda

The orchestrator lambda is responsible for processing a large S3 inventory export. It filters the inventory for a specific day, groups the files by AWS account and region, generates a manifest for each group, and then invokes the s3-log-compressor lambda to compress each group's files (a sketch of the grouping step follows the event description below).

Orchestrator Event Structure

{
  "inventory_s3_url": "s3://your-source-bucket/path/to/inventory.csv",
  "output_manifest_s3_prefix": "s3://your-target-bucket/manifests/",
  "date_to_process": "2024-12-25"
}
  • inventory_s3_url: The S3 URL of the inventory export file (in CSV format).
  • output_manifest_s3_prefix: The S3 prefix where the generated manifest files will be stored.
  • date_to_process: The date (in YYYY-MM-DD format) to filter the inventory by.
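
For illustration, the grouping step might look roughly like this in Rust; the inventory column layout (account, region, key, last_modified) and the helper name are assumptions for this sketch, not the actual implementation:

use std::collections::HashMap;

/// Group the keys of one day's inventory rows by (account, region).
/// Assumed CSV layout: account,region,key,last_modified.
fn group_for_day(
    inventory_csv: &str,
    date_to_process: &str,
) -> HashMap<(String, String), Vec<String>> {
    let mut groups: HashMap<(String, String), Vec<String>> = HashMap::new();
    for line in inventory_csv.lines().skip(1) {
        let cols: Vec<&str> = line.split(',').collect();
        if cols.len() < 4 || !cols[3].starts_with(date_to_process) {
            continue; // malformed row or not the requested day
        }
        groups
            .entry((cols[0].to_string(), cols[1].to_string()))
            .or_default()
            .push(cols[2].to_string());
    }
    groups
}

One manifest file is then written under output_manifest_s3_prefix for each group, and the s3-log-compressor lambda is invoked once per manifest.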

Event Structure

The operation field determines the function's behavior.

Compress Operation

{
  "operation": "compress",
  "input_s3_manifest_url": "s3://bucket/manifest.txt",
  "output_s3_url": "s3://bucket/archive.zip",
  "delete_source_files": false,
  "include_s3_name": true,
  "max_workers": 256
}
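
Once a payload like the one above is saved to compress-event.json, the function could be invoked manually with the AWS CLI; the function name below is a placeholder for whatever name the deployed stack uses:

aws lambda invoke \
  --function-name s3-log-compressor \
  --cli-binary-format raw-in-base64-out \
  --payload file://compress-event.json \
  response.json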

Decompress Operation

{
  "operation": "decompress",
  "source_s3_url": "s3://bucket/archive.zip",
  "file_to_extract": "path/in/zip/to/file.txt"
}

Parameters

  • operation: compress or decompress.
  • input_s3_manifest_url: (Compress) S3 URL of a text file listing the files/directories to archive (see the example manifest below).
  • output_s3_url: (Compress) S3 URL where the final zip archive will be stored.
  • delete_source_files: (Compress) Whether to delete source files after successful archiving (default: false).
  • include_s3_name: (Compress) Whether to include S3 bucket names in the zip archive paths (default: true).
  • max_workers: (Compress) Maximum number of concurrent workers for downloading files (default: 256).
  • source_s3_url: (Decompress) S3 URL of the source zip archive.
  • file_to_extract: (Decompress) The full path of the file to extract from the archive.

When include_s3_name is true, files will be stored in the zip with paths like bucket-name/path/to/file.txt. When false, files will be stored with just their S3 key path like path/to/file.txt.
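
For reference, the manifest itself is a plain text file. The exact convention is an assumption here, but based on the parameter descriptions it presumably contains one s3:// URL per line, with directory entries marked by a trailing slash (paths below are placeholders):

s3://your-source-bucket/logs/2024-12-25/app-1.log
s3://your-source-bucket/logs/2024-12-25/app-2.log
s3://your-source-bucket/logs/2024-12-25/batch/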

Prerequisites

  • Rust toolchain and cargo-lambda, for building the Lambda binary.
  • AWS SAM CLI, for building and deploying the full stack.

Building

To build just the binary:

cargo lambda build --release --arm64

The binary will be located in target/lambda/s3_log_compressor.

Deployment

To deploy the entire stack:

sam build && sam deploy

How It Works

Compression

  1. Downloads a manifest file from S3 containing a list of files and directories to archive.
  2. Checks accessibility of all buckets mentioned in the manifest.
  3. Lists all files to be processed, expanding directory entries from the manifest only when needed (see the sketch after this list).
  4. Downloads files concurrently and adds them to a zip archive. If any download fails, the process aborts.
  5. Uploads the final zip file to the specified S3 location.
  6. Optionally starts an asynchronous process to delete source files.
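
The directory handling in step 3 could look roughly like the sketch below, written against the Rust AWS SDK (accessor signatures vary slightly between SDK versions). The trailing-slash convention and the function name are assumptions for illustration, not the actual implementation:

use aws_sdk_s3::Client;

/// Expand one manifest entry into concrete object keys.
/// Assumption: entries ending in '/' are directory prefixes to list;
/// anything else is already a single object key.
async fn expand_entry(
    client: &Client,
    bucket: &str,
    entry: &str,
) -> Result<Vec<String>, aws_sdk_s3::Error> {
    if !entry.ends_with('/') {
        // Plain file: no ListObjectsV2 call needed, which saves one API request.
        return Ok(vec![entry.to_string()]);
    }
    // Directory: list everything under the prefix (pagination omitted for brevity).
    let response = client
        .list_objects_v2()
        .bucket(bucket)
        .prefix(entry)
        .send()
        .await?;
    Ok(response
        .contents()
        .iter()
        .filter_map(|object| object.key().map(str::to_string))
        .collect())
}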

Decompression

  1. Downloads the specified zip archive from S3.
  2. Extracts the requested file from the archive in memory.
  3. Returns the file content as a base64-encoded string (a sketch of steps 2 and 3 follows this list).
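
A minimal sketch of steps 2 and 3, assuming the zip and base64 (0.21+) crates and using anyhow for brevity; in the real function the bytes come from the archive just downloaded from S3:

use std::io::{Cursor, Read};

use base64::Engine;
use zip::ZipArchive;

/// Extract one file from a zip held in memory and return it base64-encoded.
fn extract_as_base64(zip_bytes: Vec<u8>, file_to_extract: &str) -> anyhow::Result<String> {
    let mut archive = ZipArchive::new(Cursor::new(zip_bytes))?;
    let mut file = archive.by_name(file_to_extract)?; // errors if the path is missing
    let mut contents = Vec::with_capacity(file.size() as usize);
    file.read_to_end(&mut contents)?;
    Ok(base64::engine::general_purpose::STANDARD.encode(contents))
}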

Technical Details

The compression engine uses a few key Rust concepts to work safely and efficiently (a condensed sketch follows the list below):

  • Arc<Mutex<...>>: To handle many concurrent file downloads, the core ZipWriter is wrapped in an Arc (atomically reference-counted pointer) and a Mutex (mutual exclusion lock).
    • Arc allows multiple download tasks to safely share ownership of the writer.
    • Mutex ensures that only one task can write to the zip file at a time, preventing data corruption.
  • ZipWriter: This is a utility from the zip crate that handles the low-level details of creating a valid .zip archive structure.
  • BufWriter: To improve performance, file writes are sent through a BufWriter. It acts as an in-memory buffer, collecting smaller writes into a single larger, more efficient write to the filesystem, reducing I/O overhead.
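
A condensed sketch of how these pieces fit together, assuming the zip 0.6-style API and tokio for concurrency; download logic, zstd options, and error handling are simplified, so treat it as an outline rather than the actual implementation:

use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::{Arc, Mutex};

use zip::write::{FileOptions, ZipWriter};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // One shared writer: BufWriter batches small writes into larger ones,
    // Mutex serializes access, and Arc lets every task hold a handle to it.
    let file = File::create("/tmp/archive.zip")?;
    let zip = Arc::new(Mutex::new(ZipWriter::new(BufWriter::new(file))));

    let names = vec!["logs/app-1.log".to_string(), "logs/app-2.log".to_string()];
    let mut tasks = Vec::new();
    for name in names {
        let zip = Arc::clone(&zip);
        tasks.push(tokio::spawn(async move {
            // In the real function the object body is downloaded from S3 here.
            let bytes: Vec<u8> = b"placeholder file contents".to_vec();
            // Hold the lock only for the synchronous write, never across an .await.
            let mut writer = zip.lock().unwrap();
            writer.start_file(name, FileOptions::default())?;
            writer.write_all(&bytes)?;
            anyhow::Ok(())
        }));
    }
    for task in tasks {
        task.await??;
    }

    // Take the writer back out once every task is done and finalize it,
    // which writes the zip central directory to disk.
    let mut writer = Arc::try_unwrap(zip).ok().expect("tasks finished").into_inner().unwrap();
    writer.finish()?.flush()?;
    Ok(())
}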

Environment Variables

  • KMS_KEY_ID: KMS key ID for server-side encryption (optional)
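
When the variable is set, the archive upload presumably attaches SSE-KMS parameters to the PutObject request. A sketch with the Rust AWS SDK (module paths and the helper name are assumptions; the real upload may be multipart):

use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::types::ServerSideEncryption;
use aws_sdk_s3::Client;

/// Upload the finished archive, adding SSE-KMS settings when KMS_KEY_ID is set.
async fn upload_archive(
    client: &Client,
    bucket: &str,
    key: &str,
    body: ByteStream,
) -> Result<(), aws_sdk_s3::Error> {
    let mut request = client.put_object().bucket(bucket).key(key).body(body);
    if let Ok(kms_key_id) = std::env::var("KMS_KEY_ID") {
        request = request
            .server_side_encryption(ServerSideEncryption::AwsKms)
            .ssekms_key_id(kms_key_id);
    }
    request.send().await?;
    Ok(())
}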
