```markdown
This repository provides a solution to analyze conversation quality using a combination of AWS S3 storage, PySpark data processing, and OpenAI's GPT models. The conversations are fetched from AWS S3, processed, and then evaluated using a pre-defined scoring mechanism with the aid of a GPT model.
- Session-based Message Extraction: Filter and organize messages by session IDs.
- Conversation Fetching from S3: Extract and preprocess conversation data directly from S3 URIs.
- Transcript Generation: Convert conversations into a readable transcript format.
- GPT-based Scoring: Use a GPT model to calculate the score for a conversation's answer based on a system message.
- AWS Glue ETL Integration: Integrate with AWS Glue for ETL processes.
- Python 3.x
- AWS Account with access to S3
- boto3
- PySpark
- OpenAI's GPT model access
- Clone the repo: ```bash git clone [your-repo-url] ```
- Navigate to the directory: ```bash cd [repo-name] ```
- Install the required packages: ```bash pip install -r requirements.txt ```
- Configure AWS credentials and specify the relevant S3 URI paths within the code.
- Adjust
system_message
, client information, and model name as necessary. - Run the script. It will process the conversation data, score it, and start an AWS Glue ETL process (you'll need to complete the function for this).
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License.
- OpenAI for GPT models.
- AWS for cloud and data services.