Production-ready Go microservice with zero-downtime Kubernetes deployments on GKE Autopilot

fahadAziz44/zero-downtime-go-api

User Management Microservice

A production-grade Go microservice designed for zero-downtime Kubernetes deployments, automated CI/CD, and high-security standards.



🎯 System Overview

This service was engineered for high-availability deployment on Google Kubernetes Engine (GKE Autopilot):

  • ✅ Zero-Downtime Deployments - Rolling updates with health probes and graceful shutdown
  • ✅ Multi-Environment Architecture - Isolated staging and production namespaces
  • ✅ Automated CI/CD Pipeline - Quality gates, security scanning, progressive deployment
  • ✅ Cloud-Native Design - GKE Autopilot-ready with managed PostgreSQL support
  • ✅ Observability - Structured JSON logging with request tracing
  • ✅ Docker Optimization - 98% size reduction (1.87GB → 36MB)

Deployment Architecture:

Note: The live deployment on GKE has been paused to minimize cloud costs. The infrastructure code remains fully functional and can be redeployed at any time.


🏗️ Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                  GitHub Actions CI/CD                       │
│                                                             │
│  Lint → Formatting → Security Scan → Tests → Build → Deploy │
└─────────────────┬───────────────────────────────────────────┘
                  │
                  ↓
┌─────────────────────────────────────────────────────────┐
│              Google Kubernetes Engine (GKE)             │
│                                                         │
│  ┌──────────────────┐      ┌────────────────────┐       │
│  │ Staging Namespace│      │Production Namespace│       │
│  │                  │      │                    │       │
│  │  Load Balancer   │      │  Load Balancer     │       │
│  │  <STAGING_IP>    │      │  <PRODUCTION_IP>   │       │
│  │       ↓          │      │       ↓            │       │
│  │  Service (LB)    │      │  Service (LB)      │       │
│  │       ↓          │      │       ↓            │       │
│  │  Deployment      │      │  Deployment        │       │
│  │  • 2 Replicas    │      │  • 3 Replicas      │       │
│  │  • Health Probes │      │  • Health Probes   │       │
│  │  • Auto Rollback │      │  • Zero Downtime   │       │
│  └────────┬─────────┘      └─────────┬──────────┘       │
│           │                          │                  │
└───────────┼──────────────────────────┼──────────────────┘
            │                          │
            ↓                          ↓
   ┌─────────────────┐        ┌─────────────────┐
   │  Neon Staging   │        │ Neon Production │
   │  PostgreSQL     │        │ PostgreSQL      │
   │  (SSL/TLS)      │        │ (SSL/TLS)       │
   └─────────────────┘        └─────────────────┘

🚀 Key Features

Zero-Downtime Deployments

  • Rolling updates with maxUnavailable: 0 in production
  • Health probes prevent traffic to unhealthy pods (liveness + readiness)
  • Graceful shutdown with configurable preStop hooks
  • Automated smoke tests validate deployments before traffic routing
  • Instant rollback on deployment failure

Production-Grade CI/CD

  • Parallel quality gates: Linting, security scanning (gosec), unit tests
  • Progressive deployment: Staging (automatic) → Production (manual approval)
  • Immutable deployments: SHA-tagged Docker images for traceability
  • Environment isolation: Separate namespaces, configs, and databases
  • Automated rollback: Failed deployments revert automatically

Cloud-Native Architecture

  • GKE Autopilot: Fully managed Kubernetes with auto-scaling
  • Managed Database: Neon PostgreSQL with SSL/TLS encryption
  • Secret Management: Kubernetes secrets (not hardcoded credentials)
  • Resource Optimization: CPU/memory limits prevent resource exhaustion
  • Security Hardening: Non-root containers, minimal attack surface

Observability & Monitoring

  • Structured JSON logging with automatic log levels (INFO/WARN/ERROR)
  • Request tracing via unique X-Request-ID headers
  • Health endpoints: /health (liveness) and /ready (readiness)
  • Prometheus-ready: Annotations for metrics scraping
  • Latency tracking: Automatic request duration logging
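
The logging and tracing behavior listed above can be sketched as a single middleware. This is a minimal illustration, not the repository's code: it uses net/http instead of Gin to stay dependency-free, and the names requestLogger, levelFor, and newRequestID, the ID format, and the status-to-level mapping are all assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"os"
	"time"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// newRequestID returns 16 random bytes as hex (an assumed ID format).
func newRequestID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // the system randomness source is unavailable
	}
	return hex.EncodeToString(b)
}

// levelFor maps a response status to a log level: 5xx ERROR, 4xx WARN, else INFO.
func levelFor(status int) slog.Level {
	switch {
	case status >= 500:
		return slog.LevelError
	case status >= 400:
		return slog.LevelWarn
	default:
		return slog.LevelInfo
	}
}

// requestLogger reuses an incoming X-Request-ID or generates one, echoes it on
// the response, and emits one JSON log line per request with its duration.
func requestLogger(logger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)

		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)

		logger.LogAttrs(r.Context(), levelFor(rec.status), "request completed",
			slog.String("request_id", id),
			slog.String("method", r.Method),
			slog.String("path", r.URL.Path),
			slog.Int("status", rec.status),
			slog.Duration("duration", time.Since(start)),
		)
	})
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	handler := requestLogger(logger, http.NotFoundHandler())

	// Exercise the middleware in-process instead of opening a port.
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/missing", nil))
	logger.Info("response header", "X-Request-ID", rec.Header().Get("X-Request-ID"))
}
```

Running it prints a WARN-level JSON line for the 404, followed by the generated request ID. Reusing a caller-supplied X-Request-ID lets one ID follow a request across services.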

Docker Optimization

  • 36MB final image (98% reduction from naive 1.87GB build)
  • Multi-stage distroless build for minimal attack surface
  • Static binary compilation (no runtime dependencies)
  • Security: Runs as non-root user (uid 65532)
  • Build caching: Optimized layer structure for fast rebuilds

📦 Tech Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Language | Go 1.25 | High-performance backend |
| Web Framework | Gin | HTTP routing and middleware |
| Database | PostgreSQL (Neon) | Managed, serverless SQL database |
| Container | Docker (Distroless) | Minimal, secure runtime |
| Orchestration | Kubernetes (GKE Autopilot) | Zero-downtime deployments |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Logging | log/slog | Structured JSON logging |
| Registry | GitHub Container Registry (GHCR) | Docker image storage |

🏁 Quick Start

Local Development

# Clone the repository
git clone https://github.com/fahadAziz44/zero-downtime-go-api.git
cd zero-downtime-go-api

# Start database and application with Docker Compose
docker-compose up --build

# Run database migrations (in another terminal)
make migrate-up

# The API will be available at http://localhost:8080

Test the Live Deployment

# Replace <PRODUCTION_IP> with your actual GKE Load Balancer IP

# Production health check
curl http://<PRODUCTION_IP>/health

# List all users
curl http://<PRODUCTION_IP>/api/v1/users

# Create a user
curl -X POST http://<PRODUCTION_IP>/api/v1/users \
  -H "Content-Type: application/json" \
  -d '{
    "username": "johndoe",
    "email": "johndoe@example.com",
    "full_name": "John Doe"
  }'

# Get user by username
curl http://<PRODUCTION_IP>/api/v1/users/username/johndoe

Development Workflow

# Run all validation checks (lint, security, tests)
make validate

# Run tests with coverage
make coverage

# Build Docker image locally
make docker-build

# Run application locally (DB in Docker)
make db          # Start PostgreSQL container
make migrate-up  # Run migrations
make run         # Start Go application

📚 API Endpoints

Base URL:

  • Local: http://localhost:8080/api/v1
  • Production: http://<PRODUCTION_IP>/api/v1 (replace with your GKE Load Balancer IP)
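
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Liveness probe (Kubernetes) |
| GET | /ready | Readiness probe (database connectivity) |
| GET | /users | List all users |
| GET | /users/username/:username | Get user by username |
| GET | /users/id/:id | Get user by UUID |
| POST | /users | Create new user |
| PATCH | /users/id/:id | Update user by UUID |
| DELETE | /users/id/:id | Delete user by UUID |

Note: /health and /ready are served from the server root (for example http://localhost:8080/health), not under /api/v1.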

Example Request:

curl -X POST http://localhost:8080/api/v1/users \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alice",
    "email": "alice@example.com",
    "full_name": "Alice Johnson"
  }'

Note: Replace localhost:8080 with your deployment URL when running in Kubernetes.

Example Response:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "username": "alice",
  "email": "alice@example.com",
  "full_name": "Alice Johnson",
  "created_at": "2024-11-19T10:30:00Z",
  "updated_at": "2024-11-19T10:30:00Z"
}

🔒 Security Features

  • UUID-based primary keys (prevents enumeration attacks)
  • SQL injection prevention (parameterized queries)
  • Input validation (username, email, full_name constraints)
  • Non-root containers (uid 65532, dropped capabilities)
  • Minimal Docker images (distroless, no shell, no package manager)
  • TLS/SSL database connections (Neon PostgreSQL requires encryption)
  • Optional X-API-Key authentication (header-based access control)
  • Secret management (Kubernetes secrets, not hardcoded)

🛠️ CI/CD Pipeline

CI Workflow (.github/workflows/ci.yml)

Triggers: All branches and pull requests
Duration: ~45 seconds (parallel execution)

Lint (golangci-lint) ─────┐
                          │
Code formatting (gofmt) ──┤
                          │
Security Scan (gosec) ────┼──> Quality Gate
                          │
Unit Tests ───────────────┤
                          │
Build Verification ───────┘

CD Workflow (.github/workflows/deploy.yml)

Triggers: Push to master branch
Duration: ~10-15 minutes

Build & Push Docker Image (SHA-tagged)
            ↓
Deploy to Staging (automatic)
  • Update image with SHA tag
  • Rolling update (2 replicas)
  • Smoke tests (/health, /ready)
            ↓
Deploy to Production (manual approval required)
  • Update image with SHA tag
  • Zero-downtime rolling update (3 replicas)
  • Smoke tests (/health, /ready)
  • Auto-rollback on failure
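
The smoke-test step in this flow amounts to requesting /health and /ready and failing the deployment on anything other than 200. The sketch below expresses that check in Go for illustration only; it is not the pipeline's actual script. The names smokeTest and fakeService are hypothetical, and fakeService is a local stand-in for a deployed pod.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// smokeTest GETs each path under baseURL and fails on the first transport
// error or non-200 response.
func smokeTest(baseURL string, timeout time.Duration, paths ...string) error {
	client := &http.Client{Timeout: timeout}
	for _, p := range paths {
		resp, err := client.Get(baseURL + p)
		if err != nil {
			return fmt.Errorf("GET %s: %w", p, err)
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("GET %s: unexpected status %d", p, resp.StatusCode)
		}
	}
	return nil
}

// fakeService is a local stand-in for a deployed pod; readyStatus controls
// what /ready returns so both outcomes can be demonstrated.
func fakeService(readyStatus int) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(readyStatus)
	})
	return httptest.NewServer(mux)
}

func main() {
	healthy := fakeService(http.StatusOK)
	defer healthy.Close()
	fmt.Println("healthy:", smokeTest(healthy.URL, 2*time.Second, "/health", "/ready"))

	broken := fakeService(http.StatusServiceUnavailable)
	defer broken.Close()
	fmt.Println("broken:", smokeTest(broken.URL, 2*time.Second, "/health", "/ready"))
}
```

A non-nil error from a check like this is the signal a pipeline would use to trigger the rollback step.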

Note: Deployments to GKE are currently disabled. The deployment code remains visible to demonstrate CI/CD practices. To enable deployments, see the Enabling Deployments section in kubernetes/README_GKE.md.

Key Features:

  • Immutable deployments: Every commit creates a unique SHA-tagged image
  • Progressive rollout: Staging validates changes before production
  • Automated validation: Health checks prevent bad deployments
  • Traceability: Know exactly which commit is running in each environment

📊 Kubernetes Deployment

Multi-Environment Strategy

| Setting | Staging | Production |
| --- | --- | --- |
| Replicas | 2 | 3 |
| Downtime Tolerance | 50% (1 pod) | 0% (zero-downtime) |
| Memory | 128Mi-256Mi | 256Mi-512Mi |
| CPU | 100m-500m | 250m-1000m |
| Database | Neon dev branch | Neon production branch |
| Deployment | Automatic | Manual approval |

Zero-Downtime Configuration

Production deployment strategy:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # Create 1 extra pod during update
    maxUnavailable: 0  # Never drop below 3 running pods
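
To make the arithmetic behind these two settings concrete, the sketch below computes the pod-count envelope a rolling update stays within. It covers absolute values only (percentages, which Kubernetes rounds, are out of scope), and rolloutBounds is a hypothetical helper name.

```go
package main

import "fmt"

// rolloutBounds returns the fewest available pods and the most total pods a
// RollingUpdate may have at any moment, for absolute (non-percentage) settings.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (minAvailable, maxTotal int) {
	return replicas - maxUnavailable, replicas + maxSurge
}

func main() {
	// Production values from the strategy above: 3 replicas, maxSurge 1, maxUnavailable 0.
	minAvailable, maxTotal := rolloutBounds(3, 1, 0)
	fmt.Printf("production: at least %d available, at most %d total\n", minAvailable, maxTotal)
}
```

With the production values this prints `production: at least 3 available, at most 4 total`: a new pod must become ready before an old one is removed, which is why capacity never dips during a rollout.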

Health probes prevent bad deployments:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
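
On the application side, the two probe targets could look roughly like the sketch below. It uses net/http rather than Gin to stay dependency-free, and the pinger interface, fakeDB, probe, the 2-second timeout, and the response bodies are illustrative assumptions rather than the project's actual code. A real *sql.DB satisfies pinger through its PingContext method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is the one method of *sql.DB that the readiness check needs.
type pinger interface {
	PingContext(ctx context.Context) error
}

// healthHandler answers the liveness probe: the process is up and serving.
func healthHandler() http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
	}
}

// readyHandler answers the readiness probe: 200 only if the database responds
// within the timeout, otherwise 503 so Kubernetes stops routing traffic here.
func readyHandler(db pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			http.Error(w, "database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ready")
	}
}

// fakeDB stands in for a real database handle in the demo below.
type fakeDB struct{ err error }

func (f fakeDB) PingContext(context.Context) error { return f.err }

var errDown = errors.New("db down")

// probe calls a handler in-process and returns the status code it wrote.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(healthHandler(), "/health"))                  // 200
	fmt.Println(probe(readyHandler(fakeDB{}), "/ready"))            // 200
	fmt.Println(probe(readyHandler(fakeDB{err: errDown}), "/ready")) // 503
}
```

Keeping liveness independent of the database is a deliberate choice in this sketch: a database outage should take pods out of rotation (readiness) without causing Kubernetes to restart them (liveness).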

Graceful shutdown prevents connection drops:

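lifecycle:
  preStop:
    # Native sleep action (enabled by default from Kubernetes 1.30).
    # The distroless image ships no /bin/sh, so an exec hook cannot run.
    sleep:
      seconds: 10  # Let load balancers stop routing before SIGTERM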
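
The preStop delay only postpones SIGTERM; the process still has to drain in-flight requests once the signal arrives. A minimal sketch of that half, assuming a plain net/http server, a hypothetical serve helper, and a 15-second drain window chosen to fit inside the default 30-second termination grace period:

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

// serve listens on addr until stop is closed, then lets in-flight requests
// finish for up to drain before giving up.
func serve(stop <-chan struct{}, addr string, handler http.Handler, drain time.Duration) error {
	srv := &http.Server{Addr: addr, Handler: handler}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // could not start, e.g. the port is already in use
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), drain)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err // drain window elapsed with requests still open
	}
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

func main() {
	// The kubelet sends SIGTERM once the preStop hook has finished.
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer cancel()

	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	if err := serve(ctx.Done(), ":8080", mux, 15*time.Second); err != nil {
		log.Fatal(err)
	}
}
```

http.Server.Shutdown stops accepting new connections and waits for active requests, so responses already in progress complete instead of being cut off.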

📖 Documentation

Comprehensive documentation is available in the docs/ directory.


Testing

# Run all unit tests
make test

# Run tests with coverage report
make coverage

# Generate HTML coverage report
make coverage-html
open coverage.html

Test Coverage: Service layer has comprehensive unit tests following the Given-When-Then pattern.

Example test:

func TestGetByID_Success(t *testing.T) {
    // Given: A valid user exists in the repository
    // (service, ctx, and validID come from test setup that is omitted here)

    // When: Fetching user by ID
    user, err := service.GetByID(ctx, validID)

    // Then: User is returned without error
    assert.NoError(t, err)
    assert.Equal(t, "johndoe", user.Username)
}

Run API Tests

# Test local deployment
./test-api.sh http://localhost:8080

# Test production deployment (replace with your IP)
./test-api.sh http://<PRODUCTION_IP>

# Keep test data for debugging
./test-api.sh --no-cleanup

⚙️ Configuration

The application uses environment-based configuration with validation and fail-fast behavior.

Required Environment Variables:

POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password

Optional Environment Variables (with defaults):

POSTGRES_HOST=localhost      # Database host
POSTGRES_PORT=5432          # Database port
POSTGRES_DB=cruder          # Database name
POSTGRES_SSL_MODE=disable   # SSL mode (use 'require' in production)
PORT=8080                   # Application port
API_KEY=                    # Optional API key for authentication
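
The fail-fast behavior described above can be illustrated with a small loader. This is a sketch under assumptions, not the project's implementation: the Config struct, its field names, and loadConfig are hypothetical, while the variable names and defaults are the ones listed above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Config mirrors the environment variables listed above.
type Config struct {
	User, Password, Host, Port, DB, SSLMode, AppPort, APIKey string
}

// loadConfig reads settings through getenv (os.Getenv in production), applies
// the documented defaults, and fails fast if a required variable is missing.
func loadConfig(getenv func(string) string) (Config, error) {
	orDefault := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		User:     getenv("POSTGRES_USER"),
		Password: getenv("POSTGRES_PASSWORD"),
		Host:     orDefault("POSTGRES_HOST", "localhost"),
		Port:     orDefault("POSTGRES_PORT", "5432"),
		DB:       orDefault("POSTGRES_DB", "cruder"),
		SSLMode:  orDefault("POSTGRES_SSL_MODE", "disable"),
		AppPort:  orDefault("PORT", "8080"),
		APIKey:   getenv("API_KEY"), // empty means authentication is disabled
	}

	var missing []string
	if cfg.User == "" {
		missing = append(missing, "POSTGRES_USER")
	}
	if cfg.Password == "" {
		missing = append(missing, "POSTGRES_PASSWORD")
	}
	if len(missing) > 0 {
		return Config{}, fmt.Errorf("missing required environment variables: %s",
			strings.Join(missing, ", "))
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1) // refuse to start with an incomplete configuration
	}
	fmt.Printf("database %s:%s/%s (sslmode=%s), listening on :%s\n",
		cfg.Host, cfg.Port, cfg.DB, cfg.SSLMode, cfg.AppPort)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, lets tests supply a map without touching the process environment.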

Development Setup:

  1. Copy .env.example to .env
  2. Update POSTGRES_USER and POSTGRES_PASSWORD
  3. Start the application with docker-compose up

Production Setup:

  • Configuration is managed via Kubernetes ConfigMaps and Secrets
  • Sensitive credentials (database password, API keys) are stored in Kubernetes Secrets
  • Non-sensitive config (database host, port) is stored in ConfigMaps

🔐 Authentication (Optional)

The API supports optional X-API-Key authentication:

Enable authentication:

# Add to .env file
API_KEY=your-secret-key-here

Make authenticated requests:

curl -H "X-API-Key: your-secret-key-here" \
  http://localhost:8080/api/v1/users

Responses:

  • ✅ Valid key → Request proceeds
  • ❌ Missing header → 401 Unauthorized
  • ❌ Wrong key → 403 Forbidden

Development mode: Leave API_KEY unset to disable authentication during local development.
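
The response rules above map onto a small middleware. The sketch below uses net/http instead of Gin to stay dependency-free; apiKeyAuth and statusFor are hypothetical names, the error messages are placeholders, and the constant-time comparison is an added precaution rather than something the README states.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiKeyAuth enforces X-API-Key when expected is non-empty; an empty expected
// key disables authentication, matching the development-mode behavior.
func apiKeyAuth(expected string, next http.Handler) http.Handler {
	if expected == "" {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		switch {
		case got == "":
			http.Error(w, "missing X-API-Key header", http.StatusUnauthorized)
		case subtle.ConstantTimeCompare([]byte(got), []byte(expected)) != 1:
			http.Error(w, "invalid API key", http.StatusForbidden)
		default:
			next.ServeHTTP(w, r)
		}
	})
}

// statusFor sends a GET with the given key (empty means no header) through
// the middleware in-process and returns the resulting status code.
func statusFor(expected, key string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/api/v1/users", nil)
	if key != "" {
		req.Header.Set("X-API-Key", key)
	}
	rec := httptest.NewRecorder()
	apiKeyAuth(expected, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("secret", "secret")) // 200: valid key
	fmt.Println(statusFor("secret", ""))       // 401: missing header
	fmt.Println(statusFor("secret", "wrong"))  // 403: wrong key
	fmt.Println(statusFor("", ""))             // 200: authentication disabled
}
```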


🚧 Future Enhancements

Potential improvements to make this even more production-ready:

  • HTTPS/TLS - SSL certificates for secure communication
  • Rate Limiting - Protect API from abuse (currently implemented at LB level via Cloud Armor)
  • Monitoring - Prometheus/Grafana dashboards with alerts
  • Terraform - Infrastructure as Code for GKE and Neon
  • Integration Tests - End-to-end API validation in CI/CD
  • Database Backups - Automated backup and restore procedures
  • API Documentation - Swagger/OpenAPI specification
  • JWT Authentication - Per-user authentication (currently using API key)
  • Pagination - Handle large datasets efficiently
  • Feature Flags - Gradual rollouts and safe feature deployment

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

This project evolved from a technical assessment into a comprehensive exploration of production-grade backend architecture. It represents the type of system I'd build for real-world use, with all the operational considerations that come with running services in production.


📬 Contact

Built by: Fahad Aziz
GitHub: @fahadAziz44


⭐ If you find this useful, please consider giving it a star!
