Qualcomm Technologies x Northeastern University Hackathon
March 15-16, 2025
Team: NewTeamOne
Team members: Yuchen Jiang, Yuchen Li, Quancheng Li, Shi Zhang, Jiangtian Han
SlideSpeak is a Python-based AI tool designed for:
- Converting text queries into PowerPoint presentations and speech transcripts.
- Converting speech transcripts into audio files per slide for presentation rehearsal.
It uses local Large Language Models (LLMs) served by Ollama to generate presentation outlines and slide content. It also integrates a local Text-to-Speech (TTS) model to convert speech transcripts into audio files.
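As a rough illustration of how a local Ollama model can be asked for an outline, the sketch below posts a non-streaming request to Ollama's `/api/generate` endpoint on its default port. The prompt wording and the function names `build_outline_request` and `generate_outline` are assumptions made for this example and are not taken from the project's `gpt.py`. The project installs `requests`; this sketch uses the standard library only, so it runs without extra packages.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_outline_request(topic: str, model: str = "qwen2.5:7b") -> dict:
    """Build the JSON payload for a single, non-streaming generation request."""
    # Hypothetical prompt; the real project may phrase this differently.
    prompt = (
        f"Create a presentation outline about '{topic}'. "
        "Return one slide title per line."
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_outline(topic: str, model: str = "qwen2.5:7b") -> list[str]:
    """Send the request to Ollama and split the reply into slide titles."""
    payload = json.dumps(build_outline_request(topic, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # With "stream": False, the generated text arrives in the "response" field.
    return [line.strip() for line in body["response"].splitlines() if line.strip()]
```

With Ollama running and the model pulled, `generate_outline("Edge AI on NPUs")` would return a list of slide titles; the exact titles depend on the model.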
Text2PPT Live Demo Using NPU
Text2PPT Live Demo Using CPU
Text2Audio Live Demo Using Pyttsx3
The following diagram illustrates the main components of SlideSpeak and their relationships:
graph LR
A[main.py] --> B(pdf2final_list.py)
A --> C(dictToPpt.py)
A --> D[text2audio_pyttsx3.py]
A --> E[text2audio_kokoro.py]
B --> F(gpt.py)
B --> G(speech_generator.py)
C --> H[pptx library]
F --> I[Ollama API]
D --> J[pyttsx3 library]
E --> K[kokoro library]
style A fill:#f9f,stroke:#333,stroke-width:2px,color:#333333
style B fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
style C fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
style D fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
style E fill:#ccf,stroke:#333,stroke-width:2px,color:#333333
style F fill:#aaf,stroke:#333,stroke-width:2px,color:#333333
style G fill:#aaf,stroke:#333,stroke-width:2px,color:#333333
style H fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
style I fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
style J fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
style K fill:#eee,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5,color:#333333
subgraph Application
A
B
C
D
E
F
G
end
subgraph Libraries/APIs
H
I
J
K
end
- main.py: Orchestrates the entire process, calling other modules for transcript and outline generation, PowerPoint creation, and TTS engine selection.
- pdf2final_list.py: Handles transcript and outline generation, utilizing gpt.py for LLM interactions and speech_generator.py for speech-related functionalities.
- dictToPpt.py: Responsible for creating PowerPoint presentations using the pptx library.
- text2audio_pyttsx3.py & text2audio_kokoro.py: Provide TTS engine options using the pyttsx3 and kokoro libraries, respectively.
- gpt.py: Interacts with the Ollama API to leverage LLMs.
- speech_generator.py: Contains tasks related to speech processing.
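To make the PowerPoint step more concrete, here is a minimal sketch of turning an outline dictionary into slides with python-pptx. The dictionary shape (slide title mapped to a list of bullet strings) and the names `normalize_outline` and `build_presentation` are assumptions for illustration; the actual dictToPpt.py may structure its input differently.

```python
def normalize_outline(
    outline: dict[str, list[str]], max_bullets: int = 5
) -> list[tuple[str, list[str]]]:
    """Strip whitespace, drop empty bullets, and cap the bullets per slide."""
    slides = []
    for title, bullets in outline.items():
        cleaned = [bullet.strip() for bullet in bullets if bullet.strip()]
        slides.append((title.strip(), cleaned[:max_bullets]))
    return slides


def build_presentation(outline: dict[str, list[str]], path: str = "PPT.pptx") -> str:
    """Write one title-and-content slide per outline entry and save the file."""
    # Imported here so normalize_outline can be used without python-pptx installed.
    from pptx import Presentation

    prs = Presentation()
    layout = prs.slide_layouts[1]  # "Title and Content" in the default template
    for title, bullets in normalize_outline(outline):
        slide = prs.slides.add_slide(layout)
        slide.shapes.title.text = title
        body = slide.placeholders[1].text_frame
        body.text = bullets[0] if bullets else ""
        for bullet in bullets[1:]:
            body.add_paragraph().text = bullet
    prs.save(path)
    return path
```

Calling `build_presentation({"Introduction": ["What is an NPU?", "Why run LLMs locally?"]})` would write a one-slide PPT.pptx to the current directory.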
Before running SlideSpeak, ensure the following prerequisites are met:
- Ollama:
  - Download and install Ollama from https://ollama.com/. Follow the installation instructions for your OS.
  - Ensure Ollama is running in the background. Verify by running ollama list in your terminal to see the installed models.
- Python:
  - Python 3.10 or higher is required. Download from https://www.python.org/.
- Python Packages:
  - Install the necessary packages using pip. A virtual environment is recommended.
  - Navigate to the project directory and run: pip install python-pptx requests pywin32 kokoro soundfile numpy
- Ollama Models:
  - SlideSpeak uses the qwen2.5:7b Ollama model. Pull it from Ollama: ollama pull qwen2.5:7b
- Kokoro TTS Models:
  - Download the Kokoro TTS model files and place them in the project root directory.
  - Download kokoro-v1.0.onnx from https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
  - Download voices-v1.0.bin from https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
- Clone the repository and change into its directory: git clone [repository_url], then cd SlideSpeak
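The prerequisites above can be checked before the first run with a small preflight script. This is only a sketch: the function name `check_environment` is hypothetical, and the import names (`pptx` for python-pptx, and so on) are assumed from the pip packages listed above. pywin32 is left out because it is Windows-only and exposes several modules.

```python
import importlib.util
import sys
from pathlib import Path

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = ["pptx", "requests", "kokoro", "soundfile", "numpy"]
# Kokoro model files expected in the project root.
MODEL_FILES = ["kokoro-v1.0.onnx", "voices-v1.0.bin"]


def check_environment(
    project_root: str = ".",
    modules: list[str] = REQUIRED_MODULES,
    files: list[str] = MODEL_FILES,
    min_python: tuple[int, int] = (3, 10),
) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required} or higher is required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    root = Path(project_root)
    for filename in files:
        if not (root / filename).is_file():
            problems.append(f"missing model file: {filename}")
    return problems
```

Running `print(check_environment())` from the project directory lists anything still missing. It does not verify that Ollama is running; `ollama list` remains the check for that.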
SlideSpeak can be run in two modes:
- GUI mode:
  - Run the gui.py script: python gui.py
  - Enter comma-separated topics in the GUI and press Enter.
  - Find the generated PPTX presentation as PPT.pptx in the project directory.
  - The presentation outline and speech transcript are saved in the output directory.
  - Choose the TTS engine in the GUI:
    - text2audio_kokoro (Best Result): Uses Kokoro TTS for high-quality audio.
    - text2audio_pyttsx3 (Fastest Result): Uses pyttsx3 for faster audio generation.
- Script mode:
  - Run main.py to generate a PPTX with predefined topics: python main.py
  - The generated PPTX file (PPT.pptx) is saved in the project directory.
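The GUI flow above (comma-separated topics in, one audio file per slide out) can be sketched as follows. The helper names and the `slide_NN.wav` naming scheme are assumptions for illustration, not the project's actual conventions; the pyttsx3 calls (`init`, `save_to_file`, `runAndWait`) are that library's standard interface.

```python
from pathlib import Path


def parse_topics(raw: str) -> list[str]:
    """Split a comma-separated string into clean, non-empty topic names."""
    return [topic.strip() for topic in raw.split(",") if topic.strip()]


def audio_path_for_slide(output_dir: str, slide_number: int) -> Path:
    """Hypothetical naming scheme: output/slide_01.wav, output/slide_02.wav, ..."""
    return Path(output_dir) / f"slide_{slide_number:02d}.wav"


def synthesize_with_pyttsx3(transcripts: list[str], output_dir: str = "output") -> list[Path]:
    """Queue one audio file per slide transcript, then render them all."""
    # Imported here so the helpers above work without pyttsx3 installed.
    import pyttsx3

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    engine = pyttsx3.init()
    paths = []
    for number, text in enumerate(transcripts, start=1):
        path = audio_path_for_slide(output_dir, number)
        engine.save_to_file(text, str(path))  # queued, not written yet
        paths.append(path)
    engine.runAndWait()  # processes every queued save_to_file command
    return paths
```

For example, `parse_topics(" AI, , robotics ,NPU")` yields three topics, with the empty entry between the commas dropped.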
- Model Selection: Modify the Ollama model (qwen2.5:7b) in gpt.py if needed.
- NPU Option: The application can also run on an NPU via AnythingLLM. To do this, follow these steps:
  - Under AnythingLLM Settings, find LLM and select Qualcomm QNN as the LLM Provider.
  - Download an LLM that supports the NPU.
  - Then go to Developer API and replace the API key in gpt.py with your own.
  - Find your workspace, or create one, and replace the workspace name in gpt.py as well. Note that it must be all lowercase.
  - Lastly, select the NPU option in Device Type under Generation Options when the user interface is initialized.
- Error Handling: Basic error handling is implemented; further improvements are possible.
- Image Generation: The current version uses Ollama to generate image prompts, but the image functionality is not fully implemented. Inserting images into the PPTX needs further development.
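For the NPU option, the API key and workspace name configured in gpt.py end up in an HTTP request to the local AnythingLLM server. The sketch below shows one plausible shape for that request. It assumes AnythingLLM's developer API is reachable at `http://localhost:3001`, exposes `POST /api/v1/workspace/{slug}/chat`, and returns the reply in a `textResponse` field; verify these details against the API documentation linked from your AnythingLLM installation. The function names are hypothetical.

```python
import json
import urllib.request


def build_workspace_chat_request(
    base_url: str, workspace: str, api_key: str, message: str
) -> tuple[str, dict, dict]:
    """Assemble the URL, headers, and JSON body for a workspace chat call."""
    # The workspace identifier must be all lowercase, as noted above.
    slug = workspace.strip().lower()
    url = f"{base_url.rstrip('/')}/api/v1/workspace/{slug}/chat"  # assumed endpoint
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"message": message, "mode": "chat"}
    return url, headers, body


def ask_workspace(base_url: str, workspace: str, api_key: str, message: str) -> str:
    """Send the chat request and return the model's text reply."""
    url, headers, body = build_workspace_chat_request(base_url, workspace, api_key, message)
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return reply.get("textResponse", "")  # assumed response field
```

A call such as `ask_workspace("http://localhost:3001", "slidespeak", "<your API key>", "Outline a talk on NPUs")` would then go through the model selected in AnythingLLM, where "slidespeak" stands in for your own workspace name.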
SlideSpeak is licensed under the Apache License 2.0. See the LICENSE file for more details.