A simple AWS Lambda function that downloads multiple files from S3 and packs them into a single zip archive, or retrieves a single file from an existing archive. Ideal for archiving log files or consolidating many files into one downloadable package. The zip archive uses zstd compression (mainly for paths and metadata).
- Dual-Mode Operation: Supports both compressing files into an archive and decompressing a single file from an archive.
- Batch Processing: Process thousands of files from an S3 manifest.
- Asynchronous Cleanup: Optionally deletes source files asynchronously after successful archiving.
- Robust Validation: Halts on any file download failure to ensure archive integrity.
- Optimized Manifest Parsing: Reduces S3 API calls by distinguishing files from directory prefixes in the manifest (see the sketch after this list).
- Concurrent Downloads: Configurable worker threads for parallel processing.
- KMS Encryption: Supports server-side encryption with KMS keys.
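The manifest-parsing optimization above can be illustrated with a short sketch. The trailing-slash convention here is an assumption for illustration: entries ending in `/` are treated as directory prefixes that still need an S3 list call, while everything else is treated as a concrete object key that can be fetched directly.

```rust
/// Split manifest entries into concrete object keys and directory
/// prefixes that still require an S3 ListObjectsV2 call.
/// Assumption: a trailing '/' marks a directory-style prefix.
fn partition_manifest(entries: &[String]) -> (Vec<String>, Vec<String>) {
    let mut files = Vec::new();
    let mut prefixes = Vec::new();
    for entry in entries {
        let entry = entry.trim();
        if entry.is_empty() {
            continue; // skip blank lines in the manifest
        }
        if entry.ends_with('/') {
            prefixes.push(entry.to_string()); // needs a list call
        } else {
            files.push(entry.to_string()); // direct GetObject, no list call
        }
    }
    (files, prefixes)
}
```

Every entry that can be classified as a file up front skips a list request entirely, which is where the API-call savings come from.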
The orchestrator lambda processes a large S3 inventory export: it filters the inventory to a specific day, groups the files by AWS account and region, generates a manifest for each group, and then invokes the s3-log-compressor lambda on each manifest.
```json
{
  "inventory_s3_url": "s3://your-source-bucket/path/to/inventory.csv",
  "output_manifest_s3_prefix": "s3://your-target-bucket/manifests/",
  "date_to_process": "2024-12-25"
}
```

- `inventory_s3_url`: The S3 URL of the inventory export file (in CSV format).
- `output_manifest_s3_prefix`: The S3 prefix where the generated manifest files will be stored.
- `date_to_process`: The date (in `YYYY-MM-DD` format) to filter the inventory by.
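Conceptually, the filter-and-group step looks like the sketch below. The `InventoryRow` type and its fields are assumptions for illustration, not the actual implementation.

```rust
use std::collections::HashMap;

/// One parsed inventory row; the field layout is a hypothetical
/// stand-in for whatever columns the CSV export actually carries.
struct InventoryRow {
    account: String,
    region: String,
    s3_url: String,
    date: String, // YYYY-MM-DD
}

/// Keep only rows for the requested day, grouped by (account, region).
/// Each group becomes one manifest and one compressor invocation.
fn build_manifests(
    rows: Vec<InventoryRow>,
    date_to_process: &str,
) -> HashMap<(String, String), Vec<String>> {
    let mut groups: HashMap<(String, String), Vec<String>> = HashMap::new();
    for row in rows.into_iter().filter(|r| r.date == date_to_process) {
        groups
            .entry((row.account, row.region))
            .or_default()
            .push(row.s3_url);
    }
    groups
}
```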
The `operation` field determines the function's behavior.
Compress:

```json
{
  "operation": "compress",
  "input_s3_manifest_url": "s3://bucket/manifest.txt",
  "output_s3_url": "s3://bucket/archive.zip",
  "delete_source_files": false,
  "include_s3_name": true,
  "max_workers": 256
}
```

Decompress:

```json
{
  "operation": "decompress",
  "source_s3_url": "s3://bucket/archive.zip",
  "file_to_extract": "path/in/zip/to/file.txt"
}
```

- `operation`: `compress` or `decompress`.
- `input_s3_manifest_url`: (Compress) S3 URL to a text file containing a list of files/directories to archive.
- `output_s3_url`: (Compress) S3 URL where the final zip archive will be stored.
- `delete_source_files`: (Compress) Whether to delete source files after successful archiving (default: `false`).
- `include_s3_name`: (Compress) Whether to include S3 bucket names in the zip archive paths (default: `true`).
- `max_workers`: (Compress) Maximum number of concurrent workers for downloading files (default: 256).
- `source_s3_url`: (Decompress) S3 URL of the source zip archive.
- `file_to_extract`: (Decompress) The full path of the file to extract from the archive.
When `include_s3_name` is `true`, files are stored in the zip with paths like `bucket-name/path/to/file.txt`. When `false`, files are stored with just their S3 key path, like `path/to/file.txt`.
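For illustration, that mapping can be expressed as a small helper; `zip_entry_path` is a hypothetical name, not the project's actual function:

```rust
/// Map an s3://bucket/key URL to its entry path inside the zip,
/// mirroring the documented `include_s3_name` behavior.
fn zip_entry_path(s3_url: &str, include_s3_name: bool) -> Option<String> {
    let rest = s3_url.strip_prefix("s3://")?;
    let (bucket, key) = rest.split_once('/')?;
    Some(if include_s3_name {
        format!("{bucket}/{key}") // e.g. bucket-name/path/to/file.txt
    } else {
        key.to_string() // e.g. path/to/file.txt
    })
}
```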
For building and just-the-Lambda deployment:

- Rust (rustup)
- Cargo Lambda

For full deployment, additionally:

- AWS CLI
- AWS SAM CLI
To build just the binary:

```bash
cargo lambda build --release --arm64
```

The binary will be located in `target/lambda/s3_log_compressor`.
To deploy the entire stack:

```bash
sam build && sam deploy
```

The compress operation proceeds as follows:

- Downloads a manifest file from S3 containing a list of files and directories to archive.
- Checks accessibility of all buckets mentioned in the manifest.
- Lists all files to be processed, intelligently exploring directories as needed.
- Downloads files concurrently and adds them to a zip archive. If any download fails, the process aborts.
- Uploads the final zip file to the specified S3 location.
- Optionally starts an asynchronous process to delete source files.
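The abort-on-failure download step could look roughly like the sketch below, using bounded concurrency from the `futures` crate; `fetch` is a hypothetical stand-in for the real S3 GetObject call, and the error handling is simplified:

```rust
use futures::stream::{self, StreamExt, TryStreamExt};

/// Hypothetical downloader; the real code would issue an S3 GetObject
/// and collect the response body into memory.
async fn fetch(_key: &str) -> anyhow::Result<Vec<u8>> {
    Ok(Vec::new())
}

/// Download all keys with at most `max_workers` requests in flight.
/// `try_collect` short-circuits on the first Err, so one failed
/// download aborts the whole batch and no partial archive is uploaded.
async fn download_all(
    keys: Vec<String>,
    max_workers: usize,
) -> anyhow::Result<Vec<(String, Vec<u8>)>> {
    stream::iter(keys)
        .map(|key| async move {
            let body = fetch(&key).await?;
            Ok::<_, anyhow::Error>((key, body))
        })
        .buffer_unordered(max_workers) // bounded concurrency
        .try_collect()
        .await
}
```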
The decompress operation:

- Downloads the specified zip archive from S3.
- Extracts the requested file from the archive in memory.
- Returns the file content as a base64-encoded string.
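The in-memory extraction can be sketched with the `zip` and `base64` crates; this is an illustrative sketch, not the exact implementation:

```rust
use std::io::{Cursor, Read};
use base64::Engine as _;

/// Open an in-memory zip, read one entry, and base64-encode it for
/// the Lambda response payload.
fn extract_as_base64(zip_bytes: Vec<u8>, file_to_extract: &str) -> anyhow::Result<String> {
    let mut archive = zip::ZipArchive::new(Cursor::new(zip_bytes))?;
    let mut entry = archive.by_name(file_to_extract)?;
    let mut buf = Vec::with_capacity(entry.size() as usize);
    entry.read_to_end(&mut buf)?;
    Ok(base64::engine::general_purpose::STANDARD.encode(buf))
}
```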
The compression engine uses a few key Rust concepts to work safely and efficiently:
- `Arc<Mutex<...>>`: To handle many concurrent file downloads, the core `ZipWriter` is wrapped in an `Arc` (Atomic Reference Counter) and a `Mutex` (Mutual Exclusion lock). `Arc` allows multiple download tasks to safely share ownership of the writer; `Mutex` ensures that only one task can write to the zip file at a time, preventing data corruption.
- `ZipWriter`: A utility from the `zip` crate that handles the low-level details of creating a valid `.zip` archive structure.
- `BufWriter`: To improve performance, file writes are sent through a `BufWriter`. It acts as an in-memory buffer, collecting smaller writes into a single larger, more efficient write to the filesystem, reducing I/O overhead.
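A minimal sketch of this shared-writer pattern, assuming a recent `zip` crate with the `zstd` feature enabled (`add_entry` is a hypothetical helper):

```rust
use std::io::{BufWriter, Write};
use std::sync::{Arc, Mutex};
use zip::write::SimpleFileOptions;
use zip::ZipWriter;

/// Arc gives every download task shared ownership of the writer;
/// Mutex serializes appends so entries never interleave.
type SharedZip = Arc<Mutex<ZipWriter<BufWriter<std::fs::File>>>>;

fn add_entry(zip: &SharedZip, name: &str, data: &[u8]) -> anyhow::Result<()> {
    let mut writer = zip.lock().expect("zip mutex poisoned");
    // zstd compression, per the project notes above
    let opts = SimpleFileOptions::default()
        .compression_method(zip::CompressionMethod::Zstd);
    writer.start_file(name, opts)?;
    writer.write_all(data)?;
    Ok(())
}
```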
- `KMS_KEY_ID`: KMS key ID for server-side encryption (optional).
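For illustration, wiring `KMS_KEY_ID` into the upload with `aws-sdk-s3` might look like the sketch below; `put_encrypted` is a hypothetical helper, not the project's actual code:

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::types::ServerSideEncryption;

/// Upload a payload, attaching SSE-KMS settings when KMS_KEY_ID is set.
async fn put_encrypted(
    client: &aws_sdk_s3::Client,
    bucket: &str,
    key: &str,
    body: Vec<u8>,
) -> Result<(), aws_sdk_s3::Error> {
    let mut req = client
        .put_object()
        .bucket(bucket)
        .key(key)
        .body(ByteStream::from(body));
    if let Ok(kms_key) = std::env::var("KMS_KEY_ID") {
        req = req
            .server_side_encryption(ServerSideEncryption::AwsKms)
            .ssekms_key_id(kms_key);
    }
    req.send().await?;
    Ok(())
}
```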