Skip to content
Open
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 187 additions & 0 deletions docs/dashboard/CLOUDTRAIL_PPL_INTEGRATION_TESTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
# CloudTrail PPL Integration Tests

This document describes the integration tests created for CloudTrail PPL (Piped Processing Language) dashboard queries to ensure they don't break the SQL plugin.

## Overview

The CloudTrail PPL integration tests validate that CloudTrail-related PPL queries can be parsed and executed without causing errors in the OpenSearch SQL plugin. These tests are designed to catch any regressions that might break CloudTrail dashboard functionality.

## Test Files Created

### 1. CloudTrailPplDashboardIT.java
**Location:** `/integ-test/src/test/java/org/opensearch/sql/ppl/CloudTrailPplDashboardIT.java`

This is the main integration test class that contains test methods for all CloudTrail PPL queries. Each test method validates a specific CloudTrail query pattern:

- `testTotalEventsCount()` - Tests basic count aggregation for total events
- `testEventsOverTime()` - Tests count by timestamp for event history
- `testEventsByAccountIds()` - Tests count by account ID with null filtering
- `testEventsByCategory()` - Tests count by event category with sorting
- `testEventsByRegion()` - Tests count by AWS region with sorting
- `testTop10EventAPIs()` - Tests count by event name (API calls)
- `testTop10Services()` - Tests count by event source (AWS services)
- `testTop10SourceIPs()` - Tests count by source IP addresses
- `testTop10UsersGeneratingEvents()` - Tests complex user analysis with multiple fields
- `testS3AccessDenied()` - Tests S3 access denied events with filtering
- `testS3Buckets()` - Tests S3 bucket analysis
- `testTopS3ChangeEvents()` - Tests S3 change events excluding read operations
- `testEC2ChangeEventCount()` - Tests EC2 instance change events
- `testEC2UsersBySessionIssuer()` - Tests EC2 users by session issuer with filtering
- `testEC2EventsByName()` - Tests EC2 events by name with rename operation

### 2. Test Data Files

#### cloudtrail_logs.json
**Location:** `/integ-test/src/test/resources/cloudtrail_logs.json`

Sample CloudTrail log data in OpenSearch bulk format containing realistic CloudTrail log entries with fields like:
- `@timestamp` - Event timestamp
- `aws.cloudtrail.eventName` - API operation name
- `aws.cloudtrail.eventSource` - AWS service source
- `aws.cloudtrail.eventCategory` - Event category (Management/Data)
- `aws.cloudtrail.awsRegion` - AWS region
- `aws.cloudtrail.sourceIPAddress` - Source IP address
- `aws.cloudtrail.userIdentity.*` - User identity information
- `aws.cloudtrail.requestParameters.*` - Request parameters
- `errorCode` - Error code for failed operations

#### cloudtrail_logs_index_mapping.json
**Location:** `/integ-test/src/test/resources/indexDefinitions/cloudtrail_logs_index_mapping.json`

OpenSearch index mapping for CloudTrail logs with proper field types:
- Date fields for timestamps
- IP fields for source addresses
- Keyword fields for categorical data
- Nested object mapping for complex CloudTrail structure

## CloudTrail Queries Tested

The integration tests cover the following CloudTrail PPL queries:

1. **Total Events Count:**
```
source=cloudtrail_logs | stats count() as `Event Count`
```

2. **Events Over Time:**
```
source=cloudtrail_logs | stats count() by span(eventTime, 30d)
```

3. **Events by Account IDs:**
```
source=cloudtrail_logs | where isnotnull(userIdentity.accountId) | stats count() as Count by userIdentity.accountId | sort - Count | head 10
```

4. **Events by Category:**
```
source=cloudtrail_logs | stats count() as Count by eventCategory | sort - Count | head 5
```

5. **Events by Region:**
```
source=cloudtrail_logs | stats count() as Count by `awsRegion` | sort - Count | head 10
```

6. **Top 10 Event APIs:**
```
source=cloudtrail_logs | stats count() as Count by `eventName` | sort - Count | head 10
```

7. **Top 10 Services:**
```
source=cloudtrail_logs | stats count() as Count by `eventSource` | sort - Count | head 10
```

8. **Top 10 Source IPs:**
```
source=cloudtrail_logs | WHERE NOT (sourceIPAddress LIKE '%amazon%.com%') | STATS count() as Count by sourceIPAddress| SORT - Count| HEAD 10
```

9. **Top 10 Users Generating Events:**
```
source=cloudtrail_logs | where ISNOTNULL(`userIdentity.accountId`)| STATS count() as Count by `userIdentity.sessionContext.sessionIssuer.userName`, `userIdentity.accountId`, `userIdentity.sessionContext.sessionIssuer.type` | rename `userIdentity.sessionContext.sessionIssuer.userName` as `User Name`, `userIdentity.accountId` as `Account Id`, `userIdentity.sessionContext.sessionIssuer.type` as `Type` | SORT - Count | HEAD 1000
```

10. **S3 Access Denied:**
```
source=cloudtrail_logs | parse `eventSource` '(?<service>s3.*)' | where isnotnull(service) and `errorCode`='AccessDenied' | stats count() as Count
```

11. **S3 Buckets:**
```
source=cloudtrail_logs | where `eventSource` like 's3%' and isnotnull(`requestParameters.bucketName`) | stats count() as Count by `requestParameters.bucketName` | sort - Count| head 10
```

12. **Top S3 Change Events:**
```
source=cloudtrail_logs | where `eventSource` like 's3%' and not (`eventName` like 'Get%' or `eventName` like 'Describe%' or `eventName` like 'List%' or `eventName` like 'Head%') and isnotnull(`requestParameters.bucketName`) | stats count() as Count by `eventName`, `requestParameters.bucketName` | rename `eventName` as `Event`, `requestParameters.bucketName` as `Bucket Name`| sort - Count | head 100
```

13. **EC2 Change Event Count:**
```
source=cloudtrail_logs | where eventSource like "ec2%" and (eventName = "RunInstances" or eventName = "TerminateInstances" or eventName = "StopInstances") and not (eventName like "Get%" or eventName like "Describe%" or eventName like "List%" or eventName like "Head%") | stats count() as Count by eventName | sort - Count | head 5
```

14. **EC2 Users by Session Issuer:**
```
source=cloudtrail_logs | where isnotnull(`userIdentity.sessionContext.sessionIssuer.userName`) and `eventSource` like 'ec2%' and not (`eventName` like 'Get%' or `eventName` like 'Describe%' or `eventName` like 'List%' or `eventName` like 'Head%') | stats count() as Count by `userIdentity.sessionContext.sessionIssuer.userName` | sort - Count | head 10
```

15. **EC2 Events by Name:**
```
source=cloudtrail_logs | where `eventSource` like "ec2%" and not (`eventName` like "Get%" or `eventName` like "Describe%" or `eventName` like "List%" or `eventName` like "Head%") | stats count() as Count by `eventName` | rename `eventName` as `Event Name` | sort - Count | head 10
```

## Test Strategy

The tests use actual CloudTrail test data loaded into a test index to verify end-to-end functionality. Each test validates:

1. **Query Execution:** Ensures queries execute successfully without parsing errors
2. **Schema Validation:** Verifies correct field types and names in results
3. **Data Validation:** Confirms expected result counts and values
4. **Complex Filtering:** Tests null checks, string matching, and logical operations

## Running the Tests

To run the CloudTrail PPL integration tests:

```bash
# Compile the tests
./gradlew :integ-test:compileTestJava

# Run all PPL integration tests (includes CloudTrail tests)
./gradlew :integ-test:test --tests "*PPL*"

# Run only CloudTrail PPL tests
./gradlew :integ-test:test --tests "*CloudTrailPplDashboardIT*"
```

## Expected Behavior

- **All queries** should execute successfully and return valid results
- **No parsing errors** should occur for any of the CloudTrail PPL query patterns
- **Schema validation** should pass with correct field types
- **Data validation** should confirm expected result counts from test data
- **Complex filtering** should work correctly with null checks and pattern matching

## Benefits

These integration tests provide:

1. **Regression Protection:** Ensures CloudTrail dashboard queries continue to work as the SQL plugin evolves
2. **Query Validation:** Validates that all CloudTrail PPL query patterns are syntactically correct
3. **Field Compatibility:** Ensures CloudTrail field names and nested structures are properly handled
4. **Complex Query Testing:** Validates advanced filtering, grouping, and aggregation patterns
5. **Documentation:** Serves as living documentation of supported CloudTrail query patterns

## Maintenance

When adding new CloudTrail query patterns to dashboards:

1. Add the new query pattern to the test class
2. Update test data if new fields are required
3. Update the index mapping if new field types are needed
4. Run the tests to ensure compatibility

This ensures that all CloudTrail dashboard functionality remains stable and functional.
194 changes: 194 additions & 0 deletions docs/dashboard/NFW_PPL_INTEGRATION_TESTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
# Network Firewall PPL Integration Tests

## Overview

This document describes the integration tests for Network Firewall (NFW) PPL dashboard queries in OpenSearch. These tests ensure that NFW-related PPL queries work correctly with actual AWS Network Firewall log data.

## Test Class

**File**: `NfwPplDashboardIT.java`
**Location**: `/integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/`
**Test Data**: `/integ-test/src/test/resources/nfw_logs.json`
**Index Mapping**: `/integ-test/src/test/resources/indexDefinitions/nfw_logs_index_mapping.json`

## Test Coverage

The NFW dashboard tests cover 37 comprehensive dashboard scenarios:

### 1. Top Application Protocols (`testTopApplicationProtocols`)
```sql
source=nfw_logs | where isnotnull(`event.app_proto`) | STATS count() as Count by `event.app_proto` | SORT - Count| HEAD 10
```
- **Purpose**: Shows most common application layer protocols
- **Expected**: unknown (5), http (3), tls (2), dns (2)

### 2. Top Source IP by Packets (`testTopSourceIPByPackets`)
```sql
source=nfw_logs | stats sum(`event.netflow.pkts`) as packet_count by span(`event.timestamp`, 2d) as timestamp_span, `event.src_ip` | rename `event.src_ip` as `Source IP` | sort - packet_count | head 10
```
- **Purpose**: Identifies source IPs generating the most network packets over time
- **Expected**: 10.170.18.235 with 53 packets

### 3. Top Source IP by Bytes (`testTopSourceIPByBytes`)
```sql
source=nfw_logs | stats sum(`event.netflow.bytes`) as sum_bytes by span(`event.timestamp`, 2d) as timestamp_span, `event.src_ip` | rename `event.src_ip` as `Source IP` | sort - sum_bytes | head 10
```
- **Purpose**: Identifies source IPs generating the most network traffic by bytes over time
- **Expected**: 10.170.18.235 with 4142 bytes

### 4. Top Destination IP by Packets (`testTopDestinationIPByPackets`)
```sql
source=nfw_logs | stats sum(`event.netflow.pkts`) as packet_count by span(`event.timestamp`, 2d) as timestamp_span, `event.dest_ip` | rename `event.dest_ip` as `Destination IP` | sort - packet_count | head 10
```
- **Purpose**: Identifies destination IPs receiving the most packets over time
- **Expected**: 8.8.8.8 with 31 packets

### 4. Top Protocols (`testTopProtocols`)
```sql
source=nfw_logs | stats count() as Protocol by `event.proto` | sort - Protocol | head 1
```
- **Purpose**: Shows most common network protocols
- **Expected**: TCP with 10 occurrences

### 5. Top Application Protocols (`testTopApplicationProtocols`)
```sql
source=nfw_logs | stats count() as Protocol by `event.app_proto` | sort - Protocol | head 1
```
- **Purpose**: Shows most common application layer protocols
- **Expected**: unknown with 10 occurrences

### 6. Top Source Ports (`testTopSourcePorts`)
```sql
source=nfw_logs | stats count() as Count by `event.src_port` | sort - Count | head 1
```
- **Purpose**: Identifies most common source ports
- **Expected**: Port 37334 with 10 occurrences

### 7. Top Destination Ports (`testTopDestinationPorts`)
```sql
source=nfw_logs | stats count() as Count by `event.dest_port` | sort - Count | head 3
```
- **Purpose**: Identifies most common destination ports
- **Expected**: Various ports (4663, 7655, 11703) with 1 occurrence each

### 8. Top TCP Flags (`testTopTCPFlags`)
```sql
source=nfw_logs | stats count() as Count by `event.tcp.tcp_flags` | sort - Count | head 1
```
- **Purpose**: Shows distribution of TCP flags
- **Expected**: Flag "02" (SYN) with 10 occurrences

### 9. Top Flow IDs (`testTopFlowIDs`)
```sql
source=nfw_logs | stats count() as Count by `event.flow_id` | sort - Count | head 3
```
- **Purpose**: Shows flow ID distribution for connection tracking
- **Expected**: Unique flow IDs with 1 occurrence each

### 10. Top TCP Flows (`testTopTCPFlows`)
```sql
source=nfw_logs | where `event.proto` = 'TCP' | stats count() as Count by `event.src_ip`, `event.dest_ip` | sort - Count | head 1
```
- **Purpose**: Identifies most common TCP connections between source and destination
- **Expected**: 3.80.106.210 → 10.2.1.120 with 10 flows

### 11. Top Long-Lived TCP Flows (`testTopLongLivedTCPFlows`)
```sql
source=nfw_logs | WHERE `event.proto` = 'TCP' and `event.netflow.age` > 350 | STATS count() as Count by SPAN(`event.timestamp`, 2d), `event.src_ip`, `event.src_port`, `event.dest_ip`, `event.dest_port` | EVAL `Src IP:Port - Dst IP:Port` = CONCAT(`event.src_ip`, ": ", CAST(`event.src_port` AS STRING), " - ", `event.dest_ip`, ": ", CAST(`event.dest_port` AS STRING)) | SORT - Count | HEAD 10
```
- **Purpose**: Identifies TCP connections that have been active for more than 350 seconds
- **Expected**: Long-lived TCP flows with formatted source and destination information

### 12-37. Additional Comprehensive Tests
The remaining 26 tests cover:
- **Destination IP by Bytes** - Traffic volume analysis for destinations
- **Source-Destination Packet/Byte Analysis** - Combined flow analysis
- **TCP Flow Analysis by Packets/Bytes** - Detailed TCP connection metrics
- **Combined Packet and Byte Metrics** - Multi-dimensional traffic analysis
- **Infrastructure Analysis** - Firewall name and availability zone distribution
- **Event Type Analysis** - Netflow event categorization
- **TCP Flag Analysis** - SYN flag detection and analysis
- **Flow Characteristics** - Age and TTL analysis for network optimization

## Data Structure

### NFW Log Format (without aws.networkfirewall prefix)

The test data uses the real AWS Network Firewall log structure:

```json
{
"firewall_name": "NetworkFirewallSetup-firewall",
"availability_zone": "us-east-1a",
"event_timestamp": "1742422274",
"event": {
"src_ip": "3.80.106.210",
"dest_ip": "10.2.1.120",
"src_port": 37334,
"dest_port": 7655,
"proto": "TCP",
"app_proto": "unknown",
"event_type": "netflow",
"flow_id": 363840402826442,
"timestamp": "2025-03-19T22:11:14.249819+0000",
"netflow": {
"pkts": 1,
"bytes": 44,
"start": "2025-03-19T22:05:21.412393+0000",
"end": "2025-03-19T22:05:21.412393+0000",
"age": 0,
"min_ttl": 56,
"max_ttl": 56
},
"tcp": {
"tcp_flags": "02",
"syn": true
}
}
}
```

### Key Field Mappings

| Dashboard Field | Test Field | Type | Description |
|----------------|------------|------|-------------|
| Source IP | `event.src_ip` | keyword | Source IP address |
| Destination IP | `event.dest_ip` | keyword | Destination IP address |
| Source Port | `event.src_port` | integer | Source port number |
| Destination Port | `event.dest_port` | integer | Destination port number |
| Protocol | `event.proto` | keyword | Network protocol (TCP, UDP, ICMP) |
| App Protocol | `event.app_proto` | keyword | Application protocol |
| Packets | `event.netflow.pkts` | integer | Packet count |
| Bytes | `event.netflow.bytes` | integer | Byte count |
| TCP Flags | `event.tcp.tcp_flags` | keyword | TCP flag values |
| Flow ID | `event.flow_id` | long | Unique flow identifier |

## Test Data

The test uses 10 realistic NFW log records with:
- **Single Source IP**: 3.80.106.210 (external)
- **Single Destination IP**: 10.2.1.120 (internal)
- **Single Source Port**: 37334
- **Various Destination Ports**: 4663, 7655, 11703, etc.
- **Protocol**: All TCP traffic
- **TCP Flags**: All SYN packets (flag "02")
- **Consistent Packet/Byte Counts**: 1 packet, 44 bytes per flow

## Running NFW Tests

```bash
# Run all NFW dashboard tests
./gradlew :integ-test:test --tests "*NfwPplDashboardIT*"

# Run specific NFW test
./gradlew :integ-test:test --tests "*NfwPplDashboardIT.testTopSourceIPByPackets"
```

## Field Syntax

All NFW queries use clean field syntax without AWS prefixes:

- ✅ **Correct**: `event.src_ip`, `event.netflow.pkts`
- ❌ **Incorrect**: `aws.networkfirewall.event.src_ip`, `aws.networkfirewall.event.netflow.pkts`

This provides cleaner, more readable dashboard queries while maintaining full compatibility with AWS Network Firewall log structure.
Loading
Loading