This document outlines a plan for implementing high-impact improvements to the "Cloud Integration Showcase" project. These features are designed to more deeply align with the skills requested in the job description, demonstrating advanced capabilities in integration, data processing, and system design.
- Objective: To demonstrate proficiency with data transformation and ETL processes, as mentioned in the job ad.
- Status: ✅ Completed
- Steps:
- Create Complex Source Data: Generate a new sample dataset in a nested JSON format (e.g., representing licenses with multiple history entries per record).
- Write Glue ETL Script: Develop a Python/PySpark script for an AWS Glue ETL job. This script will:
  - Read the nested JSON from the source S3 bucket.
  - Flatten the nested structure.
  - Perform data cleaning (e.g., standardize date formats, handle null values).
  - Write the transformed data to a new S3 prefix in the efficient Parquet format.
- Update CDK Stack:
  - Add the AWS Glue ETL job resource to `integration-engineer-project-stack.ts`.
  - Modify the Glue Crawler to point to the new Parquet data location.
- Update Documentation: Update `ARCHITECTURE.md` and `README.md` to reflect the new ETL step in the data querying flow.
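The flatten-and-clean step above can be sketched in plain Python (the actual Glue job would express the same logic in PySpark, e.g. with `Relationalize`). Field names such as `licenseId` and `eventDate` are illustrative, not the project's actual schema:

```python
from datetime import datetime
from typing import Optional

def standardize_date(raw: Optional[str]) -> Optional[str]:
    """Try a few common legacy formats; return an ISO 8601 date or None."""
    if not raw:
        return None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def flatten_license(record: dict) -> list:
    """Flatten one nested license record into one row per history entry."""
    rows = []
    for event in record.get("history", []):
        rows.append({
            "license_id": record["licenseId"],
            "holder": record.get("holder"),       # null-safe lookup
            "event_type": event.get("type"),
            "event_date": standardize_date(event.get("eventDate")),
        })
    return rows
```

Each output row is flat and uniformly typed, which is what makes the Parquet write (and later Athena queries via the Glue Crawler) efficient.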
- Objective: To provide a direct and powerful demonstration of the Salesforce integration skills required by the role.
- Status: ✅ Completed
- Steps:
- Setup Salesforce Dev Account: Create a free Salesforce Developer Edition account.
- Configure Salesforce Connected App: Set up a Connected App in Salesforce to get API credentials (client ID, client secret) for OAuth 2.0 authentication.
- Store Credentials in AWS: Store the Salesforce credentials securely in AWS Secrets Manager.
- Update Print Request Lambda (AWS to Salesforce):
  - Modify the TypeScript Lambda to fetch the Salesforce credentials from Secrets Manager.
  - Implement logic to make an API call to Salesforce (e.g., using `JSforce`) to create or update a `Case` or custom object record when a print request is received.
- Configure Outbound Message (Salesforce to AWS):
  - In Salesforce, create a workflow rule and an "Outbound Message" action that sends a message to your API Gateway endpoint when a record is updated.
  - This demonstrates handling incoming webhooks from Salesforce.
- Update Documentation: Add a section to the `README.md` explaining how to configure the Salesforce integration for testing.
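The AWS-to-Salesforce call boils down to a single authenticated REST request. A minimal Python sketch of building that request (the project's Lambda uses `JSforce` in TypeScript; the API version and field values here are assumptions, and the access token would come from the OAuth 2.0 flow using the Connected App credentials in Secrets Manager):

```python
import json
import urllib.request

API_VERSION = "v58.0"  # assumed; use a version your org supports

def build_case_request(instance_url: str, access_token: str,
                       subject: str, description: str) -> urllib.request.Request:
    """Build the REST request that creates a Case record in Salesforce."""
    url = f"{instance_url}/services/data/{API_VERSION}/sobjects/Case/"
    body = json.dumps({"Subject": subject, "Description": description}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen` (or any HTTP client) returns the new record's ID on success; the Lambda would log that ID for traceability.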
- Objective: To showcase experience with message brokers, a specific requirement listed in the job ad.
- Status: ✅ Completed
- Steps:
- Provision Amazon MQ:
  - Update the CDK stack to provision a small, free-tier Amazon MQ for RabbitMQ broker.
  - Store the broker's connection details in AWS Secrets Manager.
- Create Publisher Lambda:
  - Modify the "license print request" TypeScript Lambda. Instead of processing the file itself, it will now publish a message containing the request details to a RabbitMQ queue.
- Create Consumer Lambda:
  - Create a new Lambda function (e.g., in C#/.NET) that will be triggered by messages on the RabbitMQ queue.
  - Configure the Lambda's event source mapping in the CDK to connect to the Amazon MQ instance.
  - This new Lambda will contain the logic to perform the final processing step (e.g., creating the file in S3).
- Update Architecture Diagrams: Update `ARCHITECTURE.md` and the Mermaid diagram to show the new message-driven flow.
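The publisher/consumer split can be sketched with an in-process queue standing in for the broker (a real publisher would use a RabbitMQ client library such as `pika` against the Amazon MQ endpoint; the message fields are illustrative):

```python
import json
import queue

# Stand-in for the RabbitMQ queue on the Amazon MQ broker.
print_requests = queue.Queue()

def publish_print_request(license_id: str, requested_by: str) -> None:
    """Publisher Lambda side: serialize the request details and enqueue them."""
    message = json.dumps({"licenseId": license_id, "requestedBy": requested_by})
    print_requests.put(message)

def consume_one() -> dict:
    """Consumer Lambda side: take one message off the queue and return the
    parsed payload for the final processing step (writing the file to S3)."""
    return json.loads(print_requests.get(timeout=1))
```

The key design point is that the publisher returns as soon as the message is accepted by the broker; the heavier file-generation work happens asynchronously in the consumer, which can be retried independently.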
- Objective: To demonstrate a commitment to professional development practices, including testing and security.
- Status: ✅ Completed
- Steps:
- Add Unit & Integration Tests:
  - Write unit tests for the business logic within each Lambda function using Jest and xUnit.
  - Use the `aws-cdk-lib/assertions` module to create integration tests for the CDK stack, verifying that resources are configured correctly.
- Implement API Security:
  - Create a new Lambda Authorizer function in the CDK.
  - This authorizer will validate a static API key passed in the request headers.
  - Attach the authorizer to the API Gateway endpoint to secure it.
- Add Observability:
  - Enable AWS X-Ray active tracing for the API Gateway and all Lambda functions within the CDK stack.
  - Define a new CloudWatch Dashboard in the CDK to monitor key application metrics.
- Update Documentation: Add instructions to the `README.md` on how to run tests and how to provide the API key when calling the secured endpoint.
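The Lambda Authorizer's core logic fits in a few lines. A Python sketch (the project's authorizer is written via the CDK in TypeScript; the `x-api-key` header name and `API_KEY` environment variable are illustrative choices, while the `principalId`/`policyDocument` response shape is the format API Gateway expects from a Lambda Authorizer):

```python
import hmac
import os

def authorizer_handler(event: dict, context=None) -> dict:
    """Validate a static API key from the request headers and return an
    IAM policy allowing or denying the API Gateway invocation."""
    supplied = event.get("headers", {}).get("x-api-key", "")
    expected = os.environ.get("API_KEY", "")
    # Constant-time comparison avoids leaking key contents via timing.
    allowed = bool(expected) and hmac.compare_digest(supplied, expected)
    return {
        "principalId": "api-caller",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if allowed else "Deny",
                "Resource": event.get("methodArn", "*"),
            }],
        },
    }
```

A static key is the simplest credential to demonstrate; swapping it for a JWT check later would change only the validation step, not the policy-returning contract.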
- Objective: To propose future improvements that would further enhance the project and demonstrate additional skills.
- Status: 💡 Proposed
- Steps:
- Implement a CI/CD Pipeline:
  - Create a new CDK stack (`pipeline-stack.ts`) to define a CodePipeline that automatically builds, tests, and deploys the application upon changes to the main branch.
- Add End-to-End Testing:
  - Create a separate test suite (e.g., using Jest and Puppeteer) that performs end-to-end tests against the deployed application.
- Containerize the C# Lambda:
  - Use Docker to containerize the C# Document Automation Lambda and deploy it as a container image to ECR.
- Implement Feature Flags:
  - Integrate a feature flag service (e.g., AWS AppConfig or a third-party service) to dynamically enable or disable features in the application.
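The feature-flag proposal amounts to gating code paths on externally managed configuration. A minimal in-process sketch (in the AWS AppConfig version, the JSON document would be a hosted configuration profile that the Lambda polls; the flag schema here is an assumption):

```python
import json

class FeatureFlags:
    """Tiny flag store evaluated from a JSON configuration document."""

    def __init__(self, config_json: str):
        self._flags = json.loads(config_json)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        """Return the flag's enabled state, or `default` if the flag is unknown."""
        return bool(self._flags.get(name, {}).get("enabled", default))
```

Defaulting unknown flags to off means a misconfigured or unreachable flag source fails safe rather than enabling unfinished features.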
- Objective: To implement improvements based on a deeper analysis of the expected data flows and types, as inferred from the job description.
- Status: 📝 In Progress
- Improvements:
- Implement a Canonical Data Model and Schema Validation:
  - Goal: Decouple services from specific source data formats by creating a standardized, internal data model.
  - Steps:
    - Define canonical TypeScript interfaces for core objects like `License` and `LicenseAuditEvent`.
    - Update the Glue ETL script to transform legacy data into the canonical model.
    - Enforce the model at the API Gateway level using JSON Schema request validation.
- Enhance the Glue ETL Script for Robustness:
  - Goal: Make the data transformation pipeline resilient to messy or unexpected legacy data.
  - Steps:
    - Add data quality checks to validate records before transformation.
    - Implement a "dead-letter queue" pattern to isolate and store bad records without failing the entire ETL job.
    - Add rich logging and custom metrics for visibility into data quality.
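Both improvements above can be sketched together: validate each legacy record against the canonical shape, and route failures to a dead-letter list instead of aborting the batch. The field names below are illustrative stand-ins for the project's actual `License` interface:

```python
from dataclasses import dataclass

@dataclass
class License:
    """Python mirror of the canonical `License` interface (illustrative fields)."""
    license_id: str
    holder: str
    status: str

def transform_records(raw_records: list) -> tuple:
    """Transform legacy records into the canonical model.
    Returns (good_records, dead_letter) so one bad record cannot fail the job."""
    good, dead_letter = [], []
    for raw in raw_records:
        try:
            good.append(License(
                license_id=raw["id"],
                holder=raw["holderName"],
                status=raw.get("status", "UNKNOWN"),  # default for missing status
            ))
        except KeyError as missing:
            # Keep the offending record and the reason for later inspection.
            dead_letter.append({"record": raw, "error": f"missing field {missing}"})
    return good, dead_letter
```

In the real ETL job, the dead-letter list would be written to its own S3 prefix and surfaced through the custom data-quality metrics mentioned above.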
- Objective: To enhance the security posture of the application by implementing best practices for IAM, data protection, and network configuration.
- Status: 📝 In Progress
- Improvements:
- Implement the Principle of Least Privilege (IAM):
  - Goal: Reduce the blast radius of a potential compromise by ensuring services have only the permissions they absolutely need.
  - Steps:
    - Replace broad, AWS-managed policies with fine-grained inline policies for each IAM role.
    - Scope S3 bucket access to specific prefixes (e.g., `s3:GetObject` on `source/*`, `s3:PutObject` on `processed/*`).
    - Restrict `secretsmanager:GetSecretValue` permissions to the specific secret ARNs required by each Lambda.
- Enhance Data Protection with Encryption:
  - Goal: Increase control and auditability over data encryption by using customer-managed keys.
  - Steps:
    - Create a dedicated AWS KMS Customer-Managed Key (CMK).
    - Apply the KMS key to encrypt the S3 data bucket, the Amazon MQ broker, and all CloudWatch Log groups.
    - Grant appropriate key usage permissions to the service IAM roles.
- Secure the Network Configuration (VPC):
  - Goal: Isolate the application from the public internet to reduce the attack surface.
  - Steps:
    - Make the Amazon MQ broker private (`publiclyAccessible: false`).
    - Add VPC Endpoints for S3 and Secrets Manager to enable private communication.
    - Tighten security group ingress rules to only allow traffic from specific application components.
- Integrate Automated Security Scanning:
  - Goal: Proactively identify and prevent security vulnerabilities from being deployed.
  - Steps:
    - Add a dependency scanning step (e.g., `npm audit`) to the CodePipeline to check for vulnerable packages.
    - Integrate a Static Application Security Testing (SAST) tool to analyze code for common security flaws.
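The prefix-scoped S3 permissions described above can be sketched as a policy-document builder (the bucket name is a placeholder; the prefix names follow the examples in this plan, and in the project the equivalent policy would be defined with CDK `PolicyStatement` constructs):

```python
def scoped_s3_policy(bucket_arn: str) -> dict:
    """Least-privilege S3 policy: read only from source/*, write only to
    processed/*. No list, delete, or bucket-wide permissions are granted."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"{bucket_arn}/source/*"},
            {"Effect": "Allow", "Action": "s3:PutObject",
             "Resource": f"{bucket_arn}/processed/*"},
        ],
    }
```

Splitting read and write grants across distinct prefixes means a compromised consumer Lambda could not tamper with source data, which is the "blast radius" reduction the goal describes.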