LogFlow is a high-performance, fault-tolerant log ingestion pipeline built in Go. It provides a complete, containerized solution for receiving, processing, and storing log data at scale, using Kafka for message queuing and Elasticsearch for durable, searchable storage.
This project is designed with a microservices architecture, ensuring that each component is decoupled, independently scalable, and resilient to failure.
The pipeline follows an asynchronous, multi-stage process. Each service is a distinct Go microservice that communicates via Kafka topics, ensuring high throughput and system stability.
```
                  ┌──────────┐      ┌──────────┐      ┌───────────┐      ┌───────────────┐
[HTTP Client]───> │ Ingestor │ ───> │  Kafka   │ ───> │  Parser   │ ───> │     Kafka     │
                  └──────────┘      │ raw_logs │      └───────────┘      │  parsed_logs  │
                                    └──────────┘            │            └───────────────┘
                                                            ▼                    │
                                                    ┌──────────────┐             ▼
                                                    │    Kafka     │     ┌────────────────┐
                                                    │ raw_logs_dlq │     │ Storage Writer │
                                                    └──────────────┘     └────────────────┘
                                                                                  │
                                                                                  ▼
                                                                        ┌─────────────────┐
                                                                        │  Elasticsearch  │
                                                                        └─────────────────┘
```
- Ingestor: A Go service using the Gin framework that exposes an HTTP endpoint (`/log`) to receive log entries. It performs initial validation (size, format, required fields) and publishes valid raw logs to the `raw_logs` Kafka topic (a minimal handler sketch follows this list).
- Parser: A Kafka consumer that reads from `raw_logs`. It normalizes data (e.g., standardizing log levels, parsing timestamps), enriches logs with metadata, and generates a unique ID. Processed logs are published to `parsed_logs`. Logs that fail parsing are routed to a Dead-Letter Queue (`raw_logs_dlq`) for later analysis.
- Storage Writer: Consumes structured logs from the `parsed_logs` topic. It creates daily time-based indices in Elasticsearch and writes the final log document. If Elasticsearch is unavailable or rejects a document, the message is routed to its own DLQ (`parsed_logs_dlq`).
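For illustration, here is a minimal sketch of what the Ingestor's `/log` handler could look like with Gin and Sarama. Only the endpoint path and the `raw_logs` topic come from the description above; the payload fields, validation rules, broker address, listen port, and the Sarama import path (IBM vs. Shopify fork) are assumptions, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"log/slog"
	"net/http"

	"github.com/IBM/sarama"
	"github.com/gin-gonic/gin"
)

// logEntry is an illustrative payload shape; the Ingestor's real required
// fields and size limits are defined in the service code.
type logEntry struct {
	Message string `json:"message" binding:"required"`
	Level   string `json:"level"   binding:"required"`
}

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required for a SyncProducer

	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		slog.Error("kafka producer setup failed", "error", err)
		return
	}
	defer producer.Close()

	r := gin.Default()
	r.POST("/log", func(c *gin.Context) {
		var entry logEntry
		// Basic validation: reject bodies that do not bind to the expected shape.
		if err := c.ShouldBindJSON(&entry); err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}

		raw, _ := json.Marshal(entry)
		if _, _, err := producer.SendMessage(&sarama.ProducerMessage{
			Topic: "raw_logs",
			Value: sarama.ByteEncoder(raw),
		}); err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to enqueue log"})
			return
		}
		c.JSON(http.StatusAccepted, gin.H{"status": "queued"})
	})

	r.Run(":8080") // listen port is an assumption
}
```

In the real service, the additional checks mentioned above (payload size and format) would run before the message is published.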
- Fault Tolerance: Utilizes Dead-Letter Queues (DLQs) at both the parsing and storage stages to prevent data loss from malformed messages or downstream service outages.
- Structured, Centralized Logging: All services use Go's native `slog` library to output JSON-formatted logs, enabling easier debugging and analysis of the pipeline itself (see the sketch after this list).
- Asynchronous & Decoupled: Kafka acts as a buffer, allowing the ingestor to handle traffic spikes without overwhelming the processing and storage layers. Services can be scaled, updated, or restarted independently.
- Comprehensive Testing: The project is validated by both unit tests and a full end-to-end integration test that spins up the entire stack.
- Containerized: Fully defined in `docker-compose.yml` for a reproducible, one-command setup.
- Centralized Configuration: Uses a `config.yaml` file with environment variable overrides for flexible deployment.
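As a reference for the logging setup, here is a minimal `log/slog` sketch that emits JSON to stdout. It uses only the standard library; the attribute names and log level shown are illustrative, not taken from the services.

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// JSON handler writing structured logs to stdout; the level is illustrative.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: slog.LevelInfo,
	}))
	slog.SetDefault(logger)

	slog.Info("consumer started", "topic", "raw_logs", "partition", 0)
}
```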
- Language: Go
- Libraries: Gin (HTTP), Sarama (Kafka client), go-elasticsearch (Elasticsearch client)
- Infrastructure: Docker, Docker Compose, Kafka, Elasticsearch
- Go (latest version recommended)
- Docker and Docker Compose
```bash
git clone https://github.com/MinuteHanD/log-pipeline.git
cd log-pipeline
```

This command builds the Go services, starts all containers, and runs them in the background.

```bash
docker compose up -d --build
```

A helper script is provided to send a mix of valid and invalid logs to the ingestor, which helps test the main pipeline and the Dead-Letter Queue functionality.

```bash
./send_all_logs.sh
```

Wait a few moments for the logs to be processed. You can then query Elasticsearch to see the stored logs. This command fetches the 10 most recent logs.
```bash
curl -X GET "http://localhost:9200/logs-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [ { "timestamp": { "order": "desc" } } ]
}'
```

You can also view the data in Kibana by visiting http://localhost:5601.
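The `logs-*` pattern above matches the daily indices the Storage Writer creates. Purely as an illustration, and assuming the v8 `go-elasticsearch` client, a daily index name and write might look like the sketch below; the date layout, broker address, and document fields are assumptions rather than the service's actual code.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"strings"
	"time"

	elasticsearch "github.com/elastic/go-elasticsearch/v8"
)

func main() {
	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{"http://localhost:9200"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Daily index name; the "logs-" prefix matches the query above, but the
	// exact date layout used by the Storage Writer is an assumption.
	index := fmt.Sprintf("logs-%s", time.Now().UTC().Format("2006.01.02"))

	// Illustrative document; the real schema is produced by the Parser.
	doc := `{"timestamp":"2024-01-01T00:00:00Z","level":"INFO","message":"example"}`

	res, err := es.Index(index, strings.NewReader(doc),
		es.Index.WithContext(context.Background()))
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	log.Println("indexed:", res.Status())
}
```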
For local development, you can run the Go services directly on your machine while the backing services (Kafka, Elasticsearch) run in Docker.
```bash
docker compose up -d kafka elasticsearch kibana
```

The services are configured to connect to the infrastructure running in Docker. The following environment variables allow the Go services to find them. You can set these in your shell's configuration file (e.g., .bashrc, .zshrc) or export them in each terminal session.

```bash
export KAFKA_BROKERS=localhost:9092
export ELASTICSEARCH_URL=http://localhost:9200
```
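For reference, here is a minimal sketch of how a service might let these variables override its `config.yaml` defaults. The helper and the hard-coded fallback values are illustrative, not the project's actual configuration loader; only the variable names come from the exports above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// getEnv returns the value of key, or def when the variable is unset or empty.
// This mirrors the "config file with environment variable overrides" idea.
func getEnv(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

func main() {
	// The defaults here stand in for values that would come from config.yaml.
	brokers := strings.Split(getEnv("KAFKA_BROKERS", "localhost:9092"), ",")
	esURL := getEnv("ELASTICSEARCH_URL", "http://localhost:9200")

	fmt.Println("Kafka brokers:", brokers)
	fmt.Println("Elasticsearch:", esURL)
}
```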
Open a separate terminal for each service and run the following commands:

```bash
# Terminal 1: Ingestor
go run ./ingestor

# Terminal 2: Parser
go run ./parser

# Terminal 3: Storage Writer
go run ./storage-writer
```

The project includes both unit tests and integration tests, which are separated by Go build tags.
- Run Unit Tests: These tests are fast and do not require any external dependencies.

  ```bash
  go test -v ./...
  ```

- Run the End-to-End Integration Test: This test will automatically start and stop the required Docker containers. Make sure Docker is running before executing this command.

  ```bash
  go test -v -tags=integration
  ```
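The separation works through Go build constraints: files holding integration tests carry a tag that excludes them from plain `go test` runs. A minimal illustration follows; the file, package, and test names are hypothetical, not the project's actual test files.

```go
//go:build integration

// Compiled only when the "integration" tag is supplied, e.g.
// `go test -v -tags=integration`. Names below are hypothetical.
package pipeline_test

import "testing"

func TestPipelineEndToEnd(t *testing.T) {
	t.Log("runs only when -tags=integration is set")
}
```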
The pipeline exposes key operational metrics from each service, which are collected by Prometheus and visualized in a pre-built Grafana dashboard. This provides at-a-glance visibility into:
- Ingestor: Logs received, invalid logs, and successful handoffs to Kafka.
- Parser: Logs processed and logs sent to the Dead-Letter Queue.
- Storage Writer: Logs successfully written to Elasticsearch and failures.
Alerting rules are configured in Alertmanager to notify operators of critical issues, such as a high rate of failed logs or services being down.
To view the Grafana dashboard, visit http://localhost:3000 (default credentials: admin/admin).
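For reference, counters like the ones listed above are typically defined and exposed with the Prometheus Go client roughly as follows. The metric name and listen port are placeholders, not the pipeline's actual series.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// logsReceived uses a placeholder metric name; the real names are
// defined inside each service.
var logsReceived = promauto.NewCounter(prometheus.CounterOpts{
	Name: "ingestor_logs_received_total",
	Help: "Total number of log entries accepted by the ingestor.",
})

func main() {
	logsReceived.Inc() // incremented wherever a log is accepted

	// Expose the /metrics endpoint for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```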
- Correlation IDs: Implement a correlation ID at the `ingestor` and pass it through Kafka headers to trace a single request across all services (a header sketch follows this list).
- DLQ Re-processing: Build a utility or service to consume from the DLQ topics, attempt to re-process messages, and archive unrecoverable ones.
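A minimal sketch of how a correlation ID could be attached as a Kafka record header with Sarama. The header key, the ID generation, the broker address, and the import path (IBM vs. Shopify fork) are assumptions, not an existing project convention.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true

	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Generate a random ID; a real implementation might reuse the unique ID
	// the Parser already creates.
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		log.Fatal(err)
	}
	corrID := hex.EncodeToString(buf)

	// "correlation_id" is an illustrative header key.
	msg := &sarama.ProducerMessage{
		Topic: "raw_logs",
		Value: sarama.StringEncoder(`{"message":"example"}`),
		Headers: []sarama.RecordHeader{
			{Key: []byte("correlation_id"), Value: []byte(corrID)},
		},
	}

	if _, _, err := producer.SendMessage(msg); err != nil {
		log.Fatal(err)
	}
}
```

Downstream consumers would read the same header from each consumed message and attach it to their structured log output, so one request can be traced end to end.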