Commit a453b8b

[MAINTENANCE] Adding terraform documentation (#102)
* Adding documentation for modules
1 parent ac6b4b4 commit a453b8b

11 files changed, +405 -123 lines changed

File renamed without changes.

README.md

Lines changed: 45 additions & 54 deletions
@@ -1,7 +1,7 @@
 # Data Quality Gate
 
 ## Description
-Terrafrom module which setup Data-QA solution(bucket,Stepfunctions Pipeline with AWS Lambda, Metadata Storage. Data-QA Reports) in your infrastructure in 'one-click'. AWS Based. Built on top of Great_expectations, Pandas_profiling, Allure
+Terraform module which setups DataQA solution in your infrastructure in 'one-click'. AWS Based. Built on top of Great_expectations, Pandas_profiling, Allure
 
 ### Data Test
 Main engine based on GX to profile, generate suites and run tests
@@ -15,64 +15,55 @@ Metadata and metrics aggregation
 ## Solution Architecture
 ![Preview Image](https://raw.githubusercontent.com/provectus/data-quality-gate/main/architecture.PNG)
 
+## Supported Features
+
+- AWS Lambda runtime Python 3.9
+- AWS StepFunction pipeline, combining whole DataQA cycle(profiling, test generation, reporting)
+- Supports Slack and Jira notifications and reporting
+- AWS SNS output message bus, allowing to embed to existing data pipelines
+- Web reports delivery through Nginx for companies VPN/IP set
+- AWS DynamoDB and Athena integration, allowing to build AWS QuickSight or Grafana dashboards
+- Flexible way of config management for underlying technologies such as Allure and GreatExpectation
+
 ## Usage
-Could be used as standard Terraform module, the examples of deployments under `examples` directory.
 
-1. Add to terraform DataQA module as in examples
-2. Add to terraform state machine `DataTests` step
-```terraform
-resource "aws_sfn_state_machine" "data_state_machine" {
-  definition = jsonencode(
-    {
-      StartAt = "GetData"
-      States = {
-        GetData = {
-          Next = "DataTests"
-          Resource = aws_lambda_function.some_get_data.function_name
-          ResultPath = "$.file"
-          Type = "Task"
-        }
-        DataTests = {
-          Type = "Task"
-          Resource = "arn:aws:states:::states:startExecution.sync:2",
-          End = true
-          Parameters = {
-            StateMachineArn = module.data-qa.qa_step_functions_arn
-            Input = {
-              files = [
-                {
-                  engine = "s3"
-                  source_root = var.data_lake_bucket
-                  run_name = "raw_data"
-                  "source_data.$" = "$.file"
-                }
-              ]
-            }
-          }
-        }
-      }
-    }
-  )
-  name = "Data-state-machine"
-  role_arn = aws_iam_role.state_machine.arn // role with perms on lambda:InvokeFunction
-  type = "STANDARD"
-
-  logging_configuration {
-    include_execution_data = false
-    level = "OFF"
-  }
+```hcl
+module "data_qa" {
+  source = "github.com/provectus/data-quality-gate"
 
-  tracing_configuration {
-    enabled = false
+  data_test_storage_bucket_name = "my-data-settings-dev"
+  s3_source_data_bucket = "my-data-bucket"
+  environment = "example"
+  project = "my-project"
+
+  allure_report_image_uri = "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/dqg-allure_report:latest"
+  data_test_image_uri = "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/dqg-data_test:latest"
+  push_report_image_uri = "xxxxxxxxxxxx.dkr.ecr.xx-xxxx-x.amazonaws.com/dqg-push_reportt:latest"
+
+  data_reports_notification_settings = {
+    channel = "DataReportSlackChannelName"
+    webhook_url = "https://hooks.slack.com/services/xxxxxxxxxxxxxxx"
   }
+
+  lambda_private_subnet_ids = ["private_subnet_id"]
+  lambda_security_group_ids = ["security_group_id"]
+
+  reports_vpc_id = "some_vpc_id"
+  reports_subnet_id = "subnet_id"
+  reports_whitelist_ips = ["0.0.0.0/0"]
 }
 ```
-3. Create AWS Serverless application* - [AthenaDynamoDBConnector](https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/create/app?applicationId=arn:aws:serverlessrepo:us-east-1:292517598671:applications/AthenaDynamoDBConnector) with parameters:
-- SpillBucket - name of bucket created by terraform module
-- AthenaCatalogName - The name you will give to this catalog in Athena. It will also be used as the function name.
 
-*Cannot be created automatically by terraform because [terraform-provider-aws/issues/16485](https://github.com/hashicorp/terraform-provider-aws/issues/16485)
+## Examples
+
+Could be used as standard Terraform module, the examples of deployments under `examples` directory.
+
+- [data-qa-basic](https://github.com/provectus/data-quality-gate/tree/main/examples/basic) - Creates DataQA module which builds AWS infrastructure.
+
+## Local Development and Testing
+
+See the [functions](https://github.com/provectus/data-quality-gate/tree/main/functions) for further details.
+
+## License
 
-4. Create AWS Athena Data Source:
-- Data source type -> Amazon DynamoDB
-- Connection details -> lambda function -> name of `AthenaCatalogName` from pt.3
+Apache 2 Licensed. See [LICENSE](https://github.com/provectus/data-quality-gate/tree/main/LICENSE) for full details.

docs/inframap.png

89.8 KB

examples/basic/README.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+Basic Data QA example
+========================
+
+Configuration in this directory shows how to instantiate a Data QA module that consists from various AWS services.
+
+Note, this example does not contain required high-level aws global infrastructure such as vpc and networking. To see module requirements go to [README](https://github.com/provectus/data-quality-gate/tree/main/terraform/README.md)
+
+Usage
+=====
+
+To run this example you need to execute:
+
+```bash
+$ terraform init
+$ terraform plan
+$ terraform apply
+```
+
+Note that this example may create resources which can cost money (AWS EC2 instance, for example). Run `terraform destroy` when you don't need these resources.
+<!-- BEGIN_TF_DOCS -->
+## Requirements
+
+| Name | Version |
+|------|---------|
+| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.1 |
+| <a name="requirement_aws"></a> [aws](#requirement\_aws) | ~> 4.64.0 |
+
+## Providers
+
+No providers.
+
+## Modules
+
+| Name | Source | Version |
+|------|--------|---------|
+| <a name="module_data_qa"></a> [data\_qa](#module\_data\_qa) | ../../terraform | n/a |
+
+## Resources
+
+No resources.
+
+## Inputs
+
+No inputs.
+
+## Outputs
+
+No outputs.
+<!-- END_TF_DOCS -->

examples/basic/versions.tf

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 terraform {
-  required_version = ">= 1.1.7"
-
   required_providers {
     aws = {
       source = "hashicorp/aws"
       version = "~> 4.64.0"
     }
   }
+
+  required_version = "~> 1.1"
 }
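
A note on the constraint change in `examples/basic/versions.tf` above: in Terraform, `~>` is the pessimistic constraint operator, which lets only the rightmost listed version component increase. The annotated copy below is a sketch of how the new file should resolve under that rule. The comments are explanatory additions and are not part of the commit.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 4.64.0" allows patch releases only: >= 4.64.0 and < 4.65.0
      version = "~> 4.64.0"
    }
  }

  # "~> 1.1" allows any 1.x release from 1.1 onward: >= 1.1.0 and < 2.0.0.
  # The previous constraint, ">= 1.1.7", had no upper bound but required
  # at least 1.1.7, so 1.1.0 through 1.1.6 are newly permitted and 2.x is
  # newly excluded.
  required_version = "~> 1.1"
}
```

If the module relies on fixes that shipped in 1.1.7, a combined constraint such as `">= 1.1.7, < 2.0.0"` would keep the old floor while adding the new ceiling. Whether that floor is still needed is not stated in the commit.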
