A multi-threaded desktop application utilizing
customtkinterfor its graphical interface and integrating advanced machine learning components for predictive modeling and natural language processing. The application is designed to ingest continuous audio streams, parse medical context, and evaluate dosage protocols using pre-trained neural networks.
The software stack integrates two distinct machine learning paradigms operating in parallel:
graph TD
A[Main UI Thread: customtkinter] -->|Spawn Thread| B(AudioToTextRecorder: RealtimeSTT)
B -->|Transcribed Text| C{ollama.ChatResponse}
C -->|llama3.2 Local LLM| D[Extract Medical Variables]
D -->|Input Tensors| E(precisedose_nn_model.onnx)
E -->|Forward Pass via weights.h5| F[Predict Dosage Floats]
-
Audio-to-Text Pipeline (
RealtimeSTT)
The application instantiates anAudioToTextRecorderinstance. This class establishes a continuous listening thread on the default system microphone, utilizing Voice Activity Detection (VAD) to segment audio streams. Once a voice segment is recorded, it transcribes the waveform into text strings utilizing embedded acoustic models (e.g., Whisper), bypassing the need for cloud API reliance and ensuring zero-latency offline transcription. -
Large Language Model Execution (
ollama)
The transcribed string is pushed to an event loop where theollama.ChatResponsemodule interfaces with a localized LLM daemon. By specifying thellama3.2model architecture in the chat function parameters, the system prompts the LLM to structure the arbitrary text input into actionable medical variables or conversational feedback. -
Predictive Neural Network (
precise_nn_model)
Located in theprecise_nn_modeldirectory, the system loads a pre-compiled ONNX format graph (precisedose_nn_model.onnx). Patient physiological variables parsed by the LLM are transformed into normalized input tensors. The ONNX runtime executes a forward pass through the dense layers of the model, utilizing specific weight matrices (weights.h5) to generate float output arrays representing targeted medication dosages.
- The graphical frontend runs on the main Tkinter thread. It utilizes a layered Canvas structure where background static images are algorithmically tinted using Pillow's
RGBTransform().mix_with()function, effectively multiplying the source pixels with aprimary_colorhex value defined in thesettings.json. - A
tkVideoPlayerinstance is bound to the canvas for rendering animated states. A recursiveafter()method loop handles seeking and playback.
Tip
To prevent the computationally expensive STT and LLM inferences from blocking the UI thread's event loop, these operations are strictly encapsulated in asyncio coroutines.
-
Clone & Install Dependencies Initialize your environment and install the required packages:
pip install customtkinter tkvideoplayer Pillow ollama RealtimeSTT colorama termcolor
-
LLM Initialization Ensure the Ollama daemon is running locally with the required model pulled:
ollama pull llama3.2
-
Launch the Application Run the launcher to initiate the splash screen sequence, or directly execute
main.py:python launcher.py
Note
No Artificial Intelligence or automated code generation tools were utilized in the programming of this project. The entire codebase, including logic, UI design, and model integration workflows, was written manually by hand.