SlideSpeak - Text to PPT & Speech Generator

Qualcomm Technologies x Northeastern University Hackathon

2025 March 15th - 16th

Team: NewTeamOne

Team members: Yuchen Jiang, Yuchen Li, Quancheng Li, Shi Zhang, Jiangtian Han

Intro

SlideSpeak is a Python-based AI tool designed for:

- Converting text queries into PowerPoint presentations and speech transcripts.
- Converting speech transcripts into audio files per slide for presentation rehearsal.

It uses local Large Language Models (LLMs) served by Ollama to generate presentation outlines and slide content, and a local Text-to-Speech (TTS) model to convert speech transcripts into audio files.
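
As a rough illustration of the outline-generation step, the sketch below sends a single prompt to a locally running Ollama server over its REST API. It assumes Ollama's default address (`http://localhost:11434`) and the `/api/generate` endpoint. The prompt wording and the helper names `build_outline_request` and `generate_outline` are hypothetical and are not taken from gpt.py. It uses only the standard library, so it runs without extra packages.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def build_outline_request(topic: str, model: str = "qwen2.5:7b") -> dict:
    """Build the JSON body for one non-streaming generation request."""
    prompt = (
        "Write a short slide outline, one slide title per line, "
        f"for a presentation about: {topic}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_outline(topic: str, model: str = "qwen2.5:7b",
                     timeout: float = 120.0) -> list[str]:
    """Ask the local model for an outline and return its non-empty lines."""
    body = json.dumps(build_outline_request(topic, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the generated text.
        answer = json.loads(response.read())["response"]
    return [line.strip() for line in answer.splitlines() if line.strip()]
```

With Ollama running and the model pulled, `generate_outline("Edge AI on NPUs")` would return the outline as a list of lines. The actual prompts and parsing in SlideSpeak may differ.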

Text2PPT Live Demo Using NPU

Text2PPT Live Demo Using CPU

Text2Audio Live Demo Using Pyttsx3

Architecture

The following diagram illustrates the main components of SlideSpeak and their relationships:

graph LR
    A[main.py] --> B(pdf2final_list.py)
    A --> C(dictToPpt.py)
    A --> D[text2audio_pyttsx3.py]
    A --> E[text2audio_kokoro.py]
    B --> F(gpt.py)
    B --> G(speech_generator.py)
    C --> H[pptx library]
    F --> I[Ollama API]
    D --> J[pyttsx3 library]
    E --> K[kokoro library]

    style A fill:#f9f,stroke:#333,stroke-width:2px,color:#333333
    style B fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
    style C fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
    style D fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
    style E fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
    style F fill:#aaf,stroke:#333,stroke-width:2px,color:#333333
    style G fill:#aaf,stroke:#333,stroke-width:2px,color:#333333
    style H fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
    style I fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
    style J fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
    style K fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333

    subgraph Application
        A
        B
        C
        D
        E
        F
        G
    end

    subgraph Libraries/APIs
        H
        I
        J
        K
    end

- main.py: Orchestrates the entire process, calling other modules for transcript and outline generation, PowerPoint creation, and TTS engine selection.
- pdf2final_list.py: Handles transcript and outline generation, utilizing gpt.py for LLM interactions and speech_generator.py for speech-related functionalities.
- dictToPpt.py: Responsible for creating PowerPoint presentations using the pptx library.
- text2audio_pyttsx3.py & text2audio_kokoro.py: Provide TTS engine options using pyttsx3 and kokoro libraries respectively.
- gpt.py: Interacts with the Ollama API to leverage LLMs.
- speech_generator.py: Contains tasks related to speech processing.
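
To make the PowerPoint step more concrete, here is a minimal sketch of turning a list of slide dictionaries into a PPTX file with python-pptx. The dictionary keys (`title`, `bullets`, `transcript`) and both function names are assumptions made for illustration. The data structure that dictToPpt.py actually consumes is not documented here and may look different.

```python
def slide_title_and_bullets(slide: dict) -> tuple[str, list[str]]:
    """Normalize one slide dict into a title and a list of non-empty bullets."""
    title = str(slide.get("title", "")).strip() or "Untitled"
    bullets = [str(b).strip() for b in slide.get("bullets", []) if str(b).strip()]
    return title, bullets


def build_presentation(slides: list[dict], path: str = "PPT.pptx") -> str:
    """Write one title-and-content slide per dict; store transcripts as notes."""
    from pptx import Presentation  # provided by the python-pptx package

    prs = Presentation()
    layout = prs.slide_layouts[1]  # "Title and Content" in the default template
    for item in slides:
        title, bullets = slide_title_and_bullets(item)
        slide = prs.slides.add_slide(layout)
        slide.shapes.title.text = title
        body = slide.placeholders[1].text_frame
        for index, bullet in enumerate(bullets):
            # A new text frame starts with one empty paragraph; reuse it first.
            paragraph = body.paragraphs[0] if index == 0 else body.add_paragraph()
            paragraph.text = bullet
        if item.get("transcript"):
            # Hypothetical choice: keep the speech transcript in the speaker notes.
            slide.notes_slide.notes_text_frame.text = item["transcript"]
    prs.save(path)
    return path
```

Calling `build_presentation([{"title": "Intro", "bullets": ["What is an NPU?"], "transcript": "Welcome."}])` would write a one-slide PPT.pptx to the current directory.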

Prerequisites

Before running SlideSpeak, ensure the following prerequisites are met:

Ollama:
- Download and install Ollama from https://ollama.com/. Follow the installation instructions for your OS.
- Ensure Ollama is running in the background. Verify by running ollama list in your terminal to see installed models.
Python:
- Python 3.10 or higher is required. Download from https://www.python.org/.
Python Packages:
- Install the necessary packages using pip. A virtual environment is recommended.
- Navigate to the project directory and run:
```
pip install python-pptx requests pyttsx3 pywin32 kokoro soundfile numpy
```
Ollama Models:
- SlideSpeak uses the qwen2.5:7b Ollama model. Pull it with:
```
ollama pull qwen2.5:7b
```
Kokoro TTS Models:
- Download Kokoro TTS model files and place them in the project root directory.
- Download kokoro-v1.0.onnx from https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
- Download voices-v1.0.bin from https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
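
The prerequisites above can be checked before the first run with a small preflight script. The sketch below is not part of the repository. It assumes Ollama listens on its default address and uses the `/api/tags` endpoint, which lists locally installed models; the function names are hypothetical.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default Ollama address
KOKORO_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_kokoro_files(project_root: str = ".") -> list[str]:
    """Return the Kokoro model files that are not present in the project root."""
    root = Path(project_root)
    return [name for name in KOKORO_FILES if not (root / name).is_file()]


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def ollama_has_model(model: str = "qwen2.5:7b", timeout: float = 5.0) -> bool:
    """Return True if Ollama is reachable and reports the given model."""
    try:
        with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=timeout) as response:
            payload = json.loads(response.read())
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers invalid JSON.
        return False
    return model in installed_models(payload)
```

If `ollama_has_model()` returns False, start Ollama and pull the model. If `missing_kokoro_files()` returns a non-empty list, download the listed files into the project root.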

Installation

Clone the repository:

git clone [repository_url]
cd SlideSpeak

Running the Application

SlideSpeak can be run in two modes:

Using the Graphical User Interface (GUI)

Run the gui.py script:
```
python gui.py
```
Enter comma-separated topics in the GUI and press Enter.
The generated presentation is saved as PPT.pptx in the project directory.
The presentation outline and speech transcript are saved in the output directory.
Choose the TTS engine in the GUI:
- text2audio_kokoro (Best Result): Uses Kokoro TTS for high-quality audio.
- text2audio_pyttsx3 (Fastest Result): Uses pyttsx3 for faster audio generation.
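
The following sketch illustrates two pieces of the GUI flow described above: splitting the comma-separated topic input, and writing one audio file per slide with pyttsx3. The function names, the `slide_NN.wav` naming scheme, and the use of the output directory for audio are assumptions for illustration, not the behavior of gui.py or text2audio_pyttsx3.py. The audio format that pyttsx3 writes depends on the platform's speech driver.

```python
from pathlib import Path


def parse_topics(raw: str) -> list[str]:
    """Split comma-separated input into trimmed, non-empty topics."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def audio_path(slide_number: int, output_dir: str = "output") -> Path:
    """Hypothetical naming scheme: output/slide_01.wav, output/slide_02.wav, ..."""
    return Path(output_dir) / f"slide_{slide_number:02d}.wav"


def synthesize_with_pyttsx3(transcripts: list[str], output_dir: str = "output") -> list[Path]:
    """Write one audio file per slide transcript using pyttsx3."""
    import pyttsx3  # imported lazily so the helpers above work without it

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    engine = pyttsx3.init()
    paths = []
    for number, text in enumerate(transcripts, start=1):
        path = audio_path(number, output_dir)
        engine.save_to_file(text, str(path))  # queues the job
        paths.append(path)
    engine.runAndWait()  # processes every queued job
    return paths
```

For example, `parse_topics("Edge AI, NPUs, ")` yields two topics, and `synthesize_with_pyttsx3(["Welcome.", "Next slide."])` would produce two files in the output directory.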

Using the Command Line Interface (CLI)

Run main.py to generate PPTX with predefined topics:
```
python main.py
```
The generated PPTX file (PPT.pptx) is saved in the project directory.

Notes

Model Selection: To use an Ollama model other than qwen2.5:7b, change the model name in gpt.py.
NPU Option: The application can also run on an NPU via AnythingLLM. To set this up, follow these steps:
1. In AnythingLLM Settings, open LLM and select Qualcomm QNN as the LLM Provider.
2. Download an LLM that supports the NPU.
3. Open Developer API, generate an API key, and use it to replace the API key in gpt.py.
4. Find your workspace (or create one) and replace the workspace name in gpt.py as well. The name must be all lowercase.
5. When the user interface starts, select the NPU option for Device Type under Generation Options.
Error Handling: Basic error handling is implemented; further improvements are possible.
Image Generation: The current version uses Ollama to generate image prompts, but the image functionality is not fully implemented. Inserting images into the PPTX needs further development.
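
For the NPU option above, steps 3 and 4 amount to pointing the LLM request at an AnythingLLM workspace instead of Ollama. The sketch below shows what such a request could look like. It assumes AnythingLLM's local Developer API at `http://localhost:3001/api/v1`, a workspace chat endpoint, Bearer-token authentication, and a `textResponse` field in the reply. Verify these against the API documentation shipped with your AnythingLLM installation. All function names are hypothetical and not taken from gpt.py.

```python
import json
import urllib.request

ANYTHINGLLM_BASE = "http://localhost:3001/api/v1"  # assumption: default local address


def workspace_chat_url(workspace: str, base: str = ANYTHINGLLM_BASE) -> str:
    """Build the chat URL; the workspace name is lowercased as the notes require."""
    slug = workspace.strip().lower()
    return f"{base}/workspace/{slug}/chat"


def build_chat_request(api_key: str, workspace: str, message: str) -> urllib.request.Request:
    """Prepare an authenticated POST request without sending it."""
    body = json.dumps({"message": message, "mode": "chat"}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        workspace_chat_url(workspace), data=body, headers=headers, method="POST"
    )


def ask_workspace(api_key: str, workspace: str, message: str,
                  timeout: float = 120.0) -> str:
    """Send the message and return the model's text reply (empty if absent)."""
    request = build_chat_request(api_key, workspace, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read()).get("textResponse", "")
```

Keeping the request builder separate from the network call makes it easy to confirm the URL and headers before AnythingLLM is running.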

License

SlideSpeak is licensed under the Apache License 2.0. See the LICENSE file for more details.
