語音簡報生成與播放系統。 Voice presentation generation and playback system.
投票與課堂互動 / Poll & Classroom Interaction
- 新增全螢幕投票控制面板(鍵盤
P開關)及投票結果顯示切換按鈕。 Added fullscreen poll control panel (toggle withP) and a button to show/hide poll results. - 新增「互動模式」:每頁語音結束後自動暫停並啟動即時投票,設定持久化於瀏覽器。 Added "Interactive Mode": automatically pauses after each slide's audio and starts a live poll; setting is persisted in the browser.
AI 測驗 / AI Quiz
- 測驗問題 schema 更新,新增自動計分與 sync role 管理;測驗播放時自動進入全螢幕。 Updated quiz question schema with automatic scoring and sync role management; quiz playback auto-enters fullscreen.
AI 多頁新增 / AI Multi-Page Insertion
- 可在現有簡報中插入任意頁數:輸入大綱 → AI 展開 → 圖片 / 腳本 / 語音逐頁生成,支援即時進度與中途取消。 Insert any number of slides into an existing presentation: enter an outline → AI expands → images/scripts/audio generated per page, with live progress and mid-run cancellation.
認證與匯出入 / Auth & Import/Export
- Google OAuth 設定可於 SettingsPage 直接設定 Client ID / Secret / Redirect URI,無需重啟。 Google OAuth can be configured at runtime in SettingsPage (Client ID / Secret / Redirect URI) without restart.
- 首頁卡片一鍵 ZIP 匯出;首頁新增 ZIP 匯入(含進度條與取消)。 One-click ZIP export from home card; ZIP import added to homepage with progress bar and cancellation.
- 首頁類別旁新增標題關鍵字篩選欄位。 Added a keyword filter beside the category selector on the homepage.
AI 內容生成 / AI Content Generation
- OpenAI 腳本重寫改用 vision API,可參考投影片圖片重寫。 OpenAI script rewriting now uses the vision API, taking slide images into account.
- 新增每頁腳本長度限制設定。 Added per-page script length limit setting.
- YouTube 匯入 yt-dlp 執行與 Python 環境問題修正。 Fixed yt-dlp execution and Python environment issues for YouTube import.
- Docker 映像修正:於 runtime stage 安裝
python3與ca-certificates,確保 yt-dlp 在容器內正常執行。 Docker image fix: installpython3andca-certificatesin the runtime stage so yt-dlp works correctly inside the container.
語音合成 / Speech Synthesis
- Gemini TTS 雙人模式新增明確語音選項(Charon / Aoede 等),可指定 Speaker A / B 各自的聲音。 Gemini TTS dual-speaker mode now supports explicit voice selection (Charon, Aoede, etc.) for Speaker A and B independently.
- Gemini TTS 新增「單人旁白」模式,由單一旁白者朗讀整頁腳本。 Added "solo narrator" mode for Gemini TTS: a single narrator reads the full script.
LLM 支援 / LLM Support
- 新增 OpenRouter 作為 LLM 供應商,可在設定頁填入 API Key 與模型名稱切換;LLM 與 TTS 供應商設定分離。 Added OpenRouter as an LLM provider; configure API key and model name in Settings. LLM and TTS provider settings are now separate.
重生流程 / Regeneration
- 新增「改寫提示詞」重生模式:LLM 逐頁檢視現有圖片提示詞,根據使用者指示做最小修改,只對有變動的頁面重生圖片 / 腳本 / 語音,跳過其餘頁面以節省費用。 Added "rewrite prompts" regeneration mode: LLM reviews each page's image prompt and makes minimal edits per user instructions; only changed pages are regenerated (image/script/audio), skipping unchanged pages to save cost.
YouTube 匯入 / YouTube Import
- 新增下載與 STT 可見進度階段(
downloading_captions/downloading_audio/transcribing_audio)。 Added visible progress stages for download and STT (downloading_captions/downloading_audio/transcribing_audio). - 修正 VTT auto-caption 清理不足問題:去除 inline timing markers 與相鄰重複行;字幕輸入上限從 16K 提升至 64K,確保長影片全覆蓋。 Fixed insufficient VTT auto-caption cleaning: strip inline timing markers and deduplicate adjacent lines; caption input limit raised from 16K to 64K for full coverage of long videos.
手寫標注 / Handwriting Annotation
- 全螢幕模式下按
W開啟手寫工具列,可選 6 色、細 / 中 / 粗三檔,手寫內容定期存回伺服器,切換頁面自動回復。 PressWin fullscreen to open the handwriting toolbar; choose from 6 colors and 3 line widths. Strokes are saved to the server periodically and restored when returning to the same page. - 工具列提供三種模式切換:✏️ 筆(畫圖)、🖱️ 游標(點擊投影片仍可 pause/resume)、⬜ 橡皮擦(智慧型:碰到即移除整條筆劃)。 Three toolbar modes: ✏️ Pen (draw), 🖱️ Cursor (click-to-pause/resume still works), ⬜ Eraser (smart: touching a stroke removes the entire stroke instantly).
外觀主題 / Appearance Theme
- 設定頁新增「外觀主題」選項:跟隨系統 / 淺色 / 深色,切換後立即套用、不需按儲存;偏好存於本機(localStorage
makeslide.theme),不影響後端帳號設定。重新整理時會在首屏 paint 前套用,避免主題閃爍。深色模式為現行預設外觀。 Settings now include an "Appearance Theme" option (Follow system / Light / Dark) that applies instantly without saving; the preference is stored locally (localStoragemakeslide.theme) and does not touch backend account settings. It is applied before first paint on reload to avoid theme flash. Dark remains the current default look. - 實作以語意色彩 token(
bg/surface/text/muted/border/primary/danger,由 CSS variables + Tailwind 暴露)為基礎;首頁、設定頁、播放頁外層與 PDF 卡片已完成第一批適配,淺色模式的細部色階為漸進完善中。 Built on semantic colour tokens (bg/surface/text/muted/border/primary/danger, exposed via CSS variables + Tailwind). HomePage, Settings, the PlayPage shell, and PDF cards are adapted in the first batch; finer light-mode shades are being refined incrementally.
這個專案提供類似 NotebookLM 影片生成的工作流:可接受 PDF,抽取頁面圖片與文字,產生逐字稿,再合成語音。也支援以簡報大綱(例如 ChatGPT 產生、以 ## Slide XX 分頁)直接產圖、產稿、配音。
This project provides a NotebookLM-like video/presentation workflow: ingest PDF, extract page images and text, generate narration scripts, then synthesize speech. It also supports outline-driven generation (for example, ChatGPT output split by ## Slide XX) to create images, scripts, and audio.
請使用下面連結取得 demo
Please use the following links for online demo.
http://120.126.23.25:12345/notebook/aictest001/m4/
account/pass: demo/demo
You can use the docker image from docker hub wycca1/makeslide-app:v1.1.
- 使用者操作教學:
docs/userguide.mdUser guide for setup, upload workflows, playback, editing, and realtime poll. - 系統設計:
docs/design.mdArchitecture and processing pipeline design notes. - 錯誤碼與使用者提示:
docs/error-codes.mdError codes and recommended user-facing actions. - Pipeline 階段與頁面耗時:
docs/pipeline-stage-and-page-timing.mdTiming event and page duration reference.
mkdir -p home1
docker run -d -v home1:/home/jovyan/ -p 8888:8888 wycca1/makeslide:latest啟動後在瀏覽器開啟下列網址,並輸入 OpenAI API key 即可開始使用。 After starting the container, open the following URL in your browser and enter your OpenAI API key to start using the application.
http://127.0.0.1:8888/
makeslide/
├── backend/ # Fastify + TypeScript API server
├── frontend/ # React + Vite + Tailwind SPA
├── storage/ # 執行期產物(gitignored)/ runtime artifacts (gitignored)
│ └── <pdf_id>/
│ ├── source.pdf
│ ├── metadata.json
│ ├── cover.png
│ └── pages/
│ ├── 001.png
│ ├── 001.text.txt
│ └── ...
├── data/ # SQLite DB 檔案(gitignored)/ SQLite DB files (gitignored)
└── docs/
- Node.js 20 LTS+
- npm 10+
- poppler-utils(提供
pdftoppm與pdfinfo,供 PDF 轉圖) poppler-utils (providespdftoppmandpdfinfofor PDF-to-image rendering)- Ubuntu / Debian:
sudo apt-get install poppler-utils - macOS:
brew install poppler - 驗證 / Verify:
pdftoppm -v、pdfinfo -v
- Ubuntu / Debian:
若未安裝 poppler,backend 啟動時會警告,上傳 PDF 後會進入 failed,並在 error_message 說明。
If poppler is missing, backend will warn at startup; uploaded PDFs may enter failed with details in error_message.
-
建議使用 nvm(專案附
.nvmrc鎖定 Node 20)。 Use nvm (project includes.nvmrctargeting Node 20).curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash # 開新 shell 後 / after opening a new shell: nvm install 20 && nvm use 20
-
複製環境變數並填入
OPENAI_API_KEY。 Copy env file and setOPENAI_API_KEY.cp .env.example .env # 編輯 .env / edit .env -
一鍵啟動(檢查 Node / poppler / .env,建立目錄,必要時安裝套件,啟動前後端)。 Start with one command (checks Node/poppler/.env, prepares directories, installs deps if needed, starts backend + frontend).
./start.sh
可用選項 / Available options:
./start.sh --help # 列出所有選項 / show all options
./start.sh --install # 強制重新 npm install / force npm install
./start.sh --clean # 清除 node_modules 後重裝並啟動 / reinstall from clean state
./start.sh --backend-only # 只啟動 backend / backend only
./start.sh --frontend-only # 只啟動 frontend / frontend only按 Ctrl+C 可優雅終止子程序。
Press Ctrl+C for graceful shutdown.
若不使用 start.sh:
If you do not use start.sh:
cp .env.example .env
npm install
npm run devnpm install 會安裝 backend/frontend(npm workspaces)。
npm install installs both backend and frontend via npm workspaces.
- Backend:
http://localhost:3000 - Frontend:
http://localhost:5173(dev server 會把/apiproxy 到 backend) Frontend dev server proxies/apito backend.
也可分別啟動 / You can also run separately:
npm run dev:backend
npm run dev:frontend| Method | Path | 說明(中文) | Description (EN) |
|---|---|---|---|
| GET | /api/health |
健康檢查 | Health check |
| POST | /api/pdfs |
上傳 PDF(multipart/form-data,欄位 file) |
Upload PDF (multipart/form-data, field file) |
| GET | /api/pdfs |
列出 PDF(含封面與進度) | List PDFs (with cover and progress) |
| GET | /api/pdfs/:id |
取得單筆詳情(含頁面資源 URL) | Get PDF detail (with page asset URLs) |
| GET | /api/pdfs/:id/cover |
取得封面縮圖 | Get cover thumbnail |
| GET | /api/pdfs/:id/pages/:n/image |
取得第 n 頁影像 | Get page image |
| GET | /api/pdfs/:id/pages/:n/text |
取得第 n 頁文字 | Get extracted page text |
| DELETE | /api/pdfs/:id |
刪除 PDF 與相關產物 | Delete PDF and related artifacts |
上傳 / Upload → awaiting_prompt(可選,等待使用者提示)
↓ start
uploaded → processing
↓
rendering / extracting_text / generating_script / synthesizing_audio / generating_title
↓
ready(或 failed)
- 背景佇列 / Queue:
p-queue(並行數由PROCESS_CONCURRENCY控制) Concurrency is controlled byPROCESS_CONCURRENCY. - 崩潰復原:啟動時會重新入列
uploaded/processing。 Crash recovery: re-enqueues PDFs inuploaded/processingon startup. - 同一
pdf_id不重複處理。 The samepdf_idis guarded against duplicate processing.
請見 .env.example。核心變數如下:
See .env.example. Core variables:
| 變數 / Variable | 預設 / Default | 說明(中文) | Description (EN) |
|---|---|---|---|
PORT |
3000 |
Fastify 監聽埠 | Fastify listening port |
STORAGE_ROOT |
./storage |
產物根目錄 | Artifact root directory |
DB_PATH |
./data/app.db |
SQLite 路徑 | SQLite DB path |
MAX_UPLOAD_MB |
50 |
上傳大小上限 | Upload size limit |
LOG_LEVEL |
info |
日誌等級 | Log level |
PROCESS_CONCURRENCY |
2 |
同時處理 PDF 數 | Number of PDFs processed concurrently |
RENDER_DPI |
150 |
PDF 轉圖解析度 | PDF rendering DPI |
POPPLER_BIN_PATH |
`` | poppler bin 目錄(空值表示用 PATH) | poppler bin directory (empty means use PATH) |
請見上方「文件導覽 / Documentation」。 See the “Documentation” section above.

