Project Improvements Plan

This document outlines a plan for implementing high-impact improvements to the "Cloud Integration Showcase" project. These features are designed to align closely with the skills requested in the job description, demonstrating advanced capabilities in integration, data processing, and system design.


Phase 1: Advanced Data Transformation with AWS Glue

  • Objective: To demonstrate proficiency with data transformation and ETL processes, as mentioned in the job ad.

  • Status: ✅ Completed

  • Steps:

    1. Create Complex Source Data: Generate a new sample dataset in a nested JSON format (e.g., representing licenses with multiple history entries per record).
    2. Write Glue ETL Script: Develop a Python/PySpark script for an AWS Glue ETL job. This script will:
      • Read the nested JSON from the source S3 bucket.
      • Flatten the nested structure.
      • Perform data cleaning (e.g., standardize date formats, handle null values).
      • Write the transformed data to a new S3 prefix in the efficient Parquet format.
    3. Update CDK Stack:
      • Add the AWS Glue ETL job resource to the integration-engineer-project-stack.ts.
      • Modify the Glue Crawler to point to the new Parquet data location.
    4. Update Documentation: Update ARCHITECTURE.md and README.md to reflect the new ETL step in the data querying flow.
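
The core of the ETL step is flattening nested records and normalizing dates. A minimal TypeScript sketch of that transformation, using a hypothetical nested license shape (the actual Glue job would express the same logic in PySpark with `relationalize` or `explode`):

```typescript
// Hypothetical nested shape produced by the sample data generator.
interface NestedLicense {
  licenseId: string;
  holder: string;
  history: { event: string; date: string }[];
}

// One flat row per history entry, with dates normalized to ISO (YYYY-MM-DD).
interface FlatLicenseRow {
  licenseId: string;
  holder: string;
  event: string;
  eventDate: string;
}

function normalizeDate(raw: string): string {
  // Accept either ISO dates or US-style MM/DD/YYYY; emit ISO.
  const us = raw.match(/^(\d{2})\/(\d{2})\/(\d{4})$/);
  return us ? `${us[3]}-${us[1]}-${us[2]}` : raw;
}

function flattenLicenses(records: NestedLicense[]): FlatLicenseRow[] {
  return records.flatMap((r) =>
    r.history.map((h) => ({
      licenseId: r.licenseId,
      holder: r.holder,
      event: h.event,
      eventDate: normalizeDate(h.date),
    }))
  );
}
```

The flat rows map directly onto a Parquet schema, which is what makes the downstream Athena/Glue Crawler queries efficient.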

Phase 2: Bidirectional Salesforce Integration

  • Objective: To provide a direct and powerful demonstration of the Salesforce integration skills required by the role.

  • Status: ✅ Completed

  • Steps:

    1. Setup Salesforce Dev Account: Create a free Salesforce Developer Edition account.
    2. Configure Salesforce Connected App: Set up a Connected App in Salesforce to get API credentials (client ID, client secret) for OAuth 2.0 authentication.
    3. Store Credentials in AWS: Store the Salesforce credentials securely in AWS Secrets Manager.
    4. Update Print Request Lambda (AWS to Salesforce):
      • Modify the TypeScript Lambda to fetch the Salesforce credentials from Secrets Manager.
      • Implement logic to make an API call to Salesforce (e.g., using JSforce) to create or update a Case or custom object record when a print request is received.
    5. Configure Outbound Message (Salesforce to AWS):
      • In Salesforce, create a workflow rule and an "Outbound Message" action that sends a message to your API Gateway endpoint when a record is updated.
      • This demonstrates handling incoming webhooks from Salesforce.
    6. Update Documentation: Add a section to the README.md explaining how to configure the Salesforce integration for testing.
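
The AWS-to-Salesforce half of step 4 reduces to mapping a print request onto a Salesforce record payload. A minimal sketch, where the `PrintRequest` shape and the `Origin` value are illustrative assumptions (`Subject` and `Description` are standard Case fields):

```typescript
// Hypothetical shape of an incoming print request event.
interface PrintRequest {
  requestId: string;
  licenseId: string;
  requestedBy: string;
}

// Map a print request onto a Salesforce Case payload.
function toCasePayload(req: PrintRequest): Record<string, string> {
  return {
    Subject: `License print request ${req.requestId}`,
    Description: `License ${req.licenseId} requested by ${req.requestedBy}`,
    Origin: "Web",
  };
}
```

With JSforce, the Lambda would authenticate using the credentials fetched from Secrets Manager and then create the record with `await conn.sobject('Case').create(toCasePayload(req))`.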

Phase 3: Decoupled Processing with RabbitMQ

  • Objective: To showcase experience with message brokers, a specific requirement listed in the job ad.

  • Status: ✅ Completed

  • Steps:

    1. Provision Amazon MQ:
      • Update the CDK stack to provision a small, free-tier Amazon MQ for RabbitMQ broker.
      • Store the broker's connection details in AWS Secrets Manager.
    2. Create Publisher Lambda:
      • Modify the "license print request" TypeScript Lambda. Instead of processing the file itself, it will now publish a message containing the request details to a RabbitMQ queue.
    3. Create Consumer Lambda:
      • Create a new Lambda function (e.g., in C#/.NET) that will be triggered by messages on the RabbitMQ queue.
      • Configure the Lambda's event source mapping in the CDK to connect to the Amazon MQ instance.
      • This new Lambda will contain the logic to perform the final processing step (e.g., creating the file in S3).
    4. Update Architecture Diagrams: Update ARCHITECTURE.md and the Mermaid diagram to show the new message-driven flow.
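
The message published in step 2 is worth versioning explicitly, since the TypeScript publisher and the C#/.NET consumer evolve independently. A minimal sketch of the envelope (field names are illustrative):

```typescript
// Hypothetical envelope published to the print-request queue; versioned so
// the consumer can detect and handle schema changes.
interface PrintRequestMessage {
  schemaVersion: number;
  requestId: string;
  licenseId: string;
  publishedAt: string;
}

function buildMessage(
  requestId: string,
  licenseId: string,
  now: Date = new Date()
): string {
  const msg: PrintRequestMessage = {
    schemaVersion: 1,
    requestId,
    licenseId,
    publishedAt: now.toISOString(),
  };
  return JSON.stringify(msg);
}
```

Using a library such as amqplib, the publisher Lambda would then send the payload with `channel.sendToQueue(queueName, Buffer.from(buildMessage(requestId, licenseId)))` over a connection built from the broker credentials in Secrets Manager.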

Phase 4: Enhanced Quality and Security

  • Objective: To demonstrate a commitment to professional development practices, including testing and security.

  • Status: ✅ Completed

  • Steps:

    1. Add Unit & Integration Tests:
      • Write unit tests for the business logic within each Lambda function using Jest and xUnit.
      • Use the aws-cdk-lib/assertions module to create integration tests for the CDK stack, verifying that resources are configured correctly.
    2. Implement API Security:
      • Create a new Lambda Authorizer function in the CDK.
      • This authorizer will validate a static API key passed in the request headers.
      • Attach the authorizer to the API Gateway endpoint to secure it.
    3. Add Observability:
      • Enable AWS X-Ray active tracing for the API Gateway and all Lambda functions within the CDK stack.
      • Define a new CloudWatch Dashboard in the CDK to monitor key application metrics.
    4. Update Documentation: Add instructions to the README.md on how to run tests and how to provide the API key when calling the secured endpoint.
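
The Lambda Authorizer's decision logic is a small pure function: compare the provided header key against the expected one and emit an IAM policy document in the shape API Gateway expects. A minimal sketch, assuming the key arrives in an `x-api-key` header and the expected value comes from configuration:

```typescript
interface AuthorizerResult {
  principalId: string;
  policyDocument: {
    Version: string;
    Statement: { Action: string; Effect: string; Resource: string }[];
  };
}

// Compare the provided key against the expected key and build an
// Allow/Deny policy for the invoked method ARN.
function authorize(
  providedKey: string | undefined,
  expectedKey: string,
  methodArn: string
): AuthorizerResult {
  const allowed = providedKey !== undefined && providedKey === expectedKey;
  return {
    principalId: allowed ? "api-key-user" : "anonymous",
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        {
          Action: "execute-api:Invoke",
          Effect: allowed ? "Allow" : "Deny",
          Resource: methodArn,
        },
      ],
    },
  };
}
```

Keeping this logic separate from the Lambda handler is also what makes the Jest unit tests in step 1 straightforward to write.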

Phase 5: Future Enhancements

  • Objective: To propose future improvements that would further enhance the project and demonstrate additional skills.

  • Status: 💡 Proposed

  • Steps:

    1. Implement a CI/CD Pipeline:
      • Create a new CDK stack (pipeline-stack.ts) to define a CodePipeline that automatically builds, tests, and deploys the application upon changes to the main branch.
    2. Add End-to-End Testing:
      • Create a separate test suite (e.g., using Jest and Puppeteer) that performs end-to-end tests against the deployed application.
    3. Containerize the C# Lambda:
      • Use Docker to containerize the C# Document Automation Lambda and deploy it as a container image to ECR.
    4. Implement Feature Flags:
      • Integrate a feature flag service (e.g., AWS AppConfig or a third-party service) to dynamically enable or disable features in the application.
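
The feature-flag idea in step 4 can be sketched as a small evaluation helper. This is an assumption about how the flags would be modeled; AWS AppConfig (or another service) would supply the flag map at runtime, but the in-process evaluation logic is the same:

```typescript
// Hypothetical flag configuration: a flag is either globally enabled or
// enabled for an explicit allow-list of user IDs.
type FlagConfig = Record<string, { enabled: boolean; allowList?: string[] }>;

function isFeatureEnabled(
  flags: FlagConfig,
  flag: string,
  userId?: string
): boolean {
  const cfg = flags[flag];
  if (!cfg) return false; // unknown flags default to off
  if (cfg.enabled) return true;
  return userId !== undefined && (cfg.allowList ?? []).includes(userId);
}
```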

Phase 6: Data-Driven Enhancements

  • Objective: To implement improvements based on a deeper analysis of the expected data flows and types, as inferred from the job description.

  • Status: 📝 In Progress

  • Improvements:

    1. Implement a Canonical Data Model and Schema Validation:
      • Goal: Decouple services from specific source data formats by creating a standardized, internal data model.
      • Steps:
        • Define canonical TypeScript interfaces for core objects like License and LicenseAuditEvent.
        • Update the Glue ETL script to transform legacy data into the canonical model.
        • Enforce the model at the API Gateway level using JSON Schema request validation.
    2. Enhance the Glue ETL Script for Robustness:
      • Goal: Make the data transformation pipeline resilient to messy or unexpected legacy data.
      • Steps:
        • Add data quality checks to validate records before transformation.
        • Implement a "dead-letter queue" pattern to isolate and store bad records without failing the entire ETL job.
        • Add rich logging and custom metrics for visibility into data quality.
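
The two improvements above can be sketched together: validate raw legacy records against the canonical model, and route failures to a dead-letter collection with a reason instead of failing the whole batch. The field set on `License` is illustrative, not the project's final model:

```typescript
// Canonical model shared by the services; fields are illustrative.
interface License {
  licenseId: string;
  holder: string;
  status: "active" | "expired" | "revoked";
}

interface SplitResult {
  valid: License[];
  deadLetter: { record: unknown; reason: string }[];
}

// Validate raw records and split them: good records are mapped onto the
// canonical model, bad ones are captured with a reason (the dead-letter
// pattern) so one malformed record cannot fail the entire ETL run.
function splitRecords(raw: unknown[]): SplitResult {
  const result: SplitResult = { valid: [], deadLetter: [] };
  for (const record of raw) {
    const r = record as Partial<License>;
    if (typeof r.licenseId !== "string" || r.licenseId.length === 0) {
      result.deadLetter.push({ record, reason: "missing licenseId" });
    } else if (!["active", "expired", "revoked"].includes(r.status as string)) {
      result.deadLetter.push({ record, reason: `invalid status: ${r.status}` });
    } else {
      result.valid.push({
        licenseId: r.licenseId,
        holder: String(r.holder ?? ""),
        status: r.status as License["status"],
      });
    }
  }
  return result;
}
```

The `deadLetter` side would be written to its own S3 prefix, and its size reported as a custom CloudWatch metric for the data-quality visibility described above.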

Phase 7: Security Hardening

  • Objective: To enhance the security posture of the application by implementing best practices for IAM, data protection, and network configuration.

  • Status: 📝 In Progress

  • Improvements:

    1. Implement the Principle of Least Privilege (IAM):
      • Goal: Reduce the blast radius of a potential compromise by ensuring services have only the permissions they absolutely need.
      • Steps:
        • Replace broad, AWS-managed policies with fine-grained inline policies for each IAM role.
        • Scope S3 bucket access to specific prefixes (e.g., s3:GetObject on source/*, s3:PutObject on processed/*).
        • Restrict secretsmanager:GetSecretValue permissions to the specific secret ARNs required by each Lambda.
    2. Enhance Data Protection with Encryption:
      • Goal: Increase control and auditability over data encryption by using customer-managed keys.
      • Steps:
        • Create a dedicated AWS KMS Customer-Managed Key (CMK).
        • Apply the KMS key to encrypt the S3 data bucket, the Amazon MQ broker, and all CloudWatch Log groups.
        • Grant appropriate key usage permissions to the service IAM roles.
    3. Secure the Network Configuration (VPC):
      • Goal: Isolate the application from the public internet to reduce the attack surface.
      • Steps:
        • Make the Amazon MQ broker private (publiclyAccessible: false).
        • Add VPC Endpoints for S3 and Secrets Manager to enable private communication.
        • Tighten security group ingress rules to only allow traffic from specific application components.
    4. Integrate Automated Security Scanning:
      • Goal: Proactively identify and prevent security vulnerabilities from being deployed.
      • Steps:
        • Add a dependency scanning step (e.g., npm audit) to the CodePipeline to check for vulnerable packages.
        • Integrate a Static Application Security Testing (SAST) tool to analyze code for common security flaws.
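
The prefix-scoped S3 permissions from improvement 1 can be sketched as a small policy-statement builder. Bucket name and prefixes in the usage below are illustrative; in the CDK stack the equivalent would be expressed with `iam.PolicyStatement`:

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Build a least-privilege S3 statement scoped to a single prefix, e.g.
// s3:GetObject on source/* or s3:PutObject on processed/*.
function s3PrefixStatement(
  bucket: string,
  prefix: string,
  actions: string[]
): PolicyStatement {
  return {
    Effect: "Allow",
    Action: actions,
    Resource: [`arn:aws:s3:::${bucket}/${prefix}/*`],
  };
}
```

Attaching one such statement per role (reader on `source/*`, writer on `processed/*`) replaces the broad managed policies called out above.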