A communication bridge that reads between the lines, turning unspoken feelings into shareable art.
Sumsori analyzes what you say (or write) and uncovers what you really mean. It transforms the gap between spoken words and hidden feelings into expressive illustrations, helping people communicate emotions they struggle to put into words.
Built for the Google DeepMind Gemini 3 Seoul Hackathon (Feb 28, 2026)
Sumsori serves as a communication assistant: not an emotion analyzer, but a tool that helps people deliver the feelings they can't express directly.
Record a short voice message (2–30 seconds). Sumsori listens to how you speak (tone, pace, tremor, pauses) alongside what you say (words, themes, sentiment), then reveals the real feeling underneath.
Example:
- You say: "I'm fine, really. Eating well, doing great."
- Your voice says: Trembling, slow, fragile
- What's really inside: Loneliness
- Result: #contentment → #loneliness, with an illustration of quiet solitude
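The surface-versus-core gap in the example above boils down to a small tag pair. A minimal TypeScript sketch of rendering it (the `EmotionGap` type and `formatEmotionTags` helper are illustrative, not the app's actual code):

```typescript
// Hypothetical shape of one analysis result: the emotion the words
// present on the surface vs. the emotion the voice actually carries.
interface EmotionGap {
  surface: string; // e.g. "contentment" (what the words claim)
  core: string;    // e.g. "loneliness" (what the delivery reveals)
}

// Render the gap as the tag pair shown on a card.
function formatEmotionTags(gap: EmotionGap): string {
  return `#${gap.surface} → #${gap.core}`;
}

console.log(formatEmotionTags({ surface: "contentment", core: "loneliness" }));
// prints "#contentment → #loneliness"
```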
Designed with deaf and hard-of-hearing users in mind. Type what you want to say, choose a voice persona, and Sumsori will:
- Analyze the surface meaning vs. hidden feeling in your words
- Generate a spoken audio version (TTS) with emotional delivery
- Create an illustration that captures the true feeling
This gives people who communicate primarily through text a way to convey emotional nuance that written words alone can't carry.
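The README's directory listing shows an `api/text-analyze/` route for this mode. A sketch of what a client-side request builder for it might look like; the field names and persona identifiers are assumptions, since the route's real schema is not documented here:

```typescript
// Hypothetical request body for the text pipeline. The real
// /api/text-analyze route's schema is not shown in this README.
interface TextAnalyzeRequest {
  text: string;             // what the user typed
  persona: string;          // one of the app's 4 voice personas (names unknown here)
  locale: "ko" | "en";      // the app is bilingual Korean/English
}

function buildTextAnalyzeRequest(
  text: string,
  persona: string,
  locale: "ko" | "en"
): TextAnalyzeRequest {
  const trimmed = text.trim();
  if (trimmed.length === 0) throw new Error("text must not be empty");
  return { text: trimmed, persona, locale };
}
```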
Every analysis produces a card you can send to someone:
- An AI-generated illustration (oil pastel style, featuring a small white cat)
- Emotion tags showing the gap: #surface → #core
- Your personal message to the recipient
- Optional: include your original transcript
The recipient sees the art, feels the emotion, and reads your words, with no raw analysis data getting in the way.
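Keeping raw analysis data out of the recipient's view amounts to projecting a full analysis record down to a small shared shape. A sketch of that filter, with field names assumed from the database schema later in this README:

```typescript
// Everything the pipeline produces for one card (field names assumed).
interface AnalysisRecord {
  imageUrl: string;
  surfaceEmotion: string;
  coreEmotion: string;
  personalMessage: string;
  transcript: string;
  voiceTone: unknown;    // raw analysis data the recipient never sees
  concordance: unknown;  // ditto
}

// What actually reaches the recipient.
interface SharedCard {
  imageUrl: string;
  tags: string;          // "#surface → #core"
  personalMessage: string;
  transcript?: string;   // only when the sender opts in
}

function toSharedCard(rec: AnalysisRecord, showTranscript: boolean): SharedCard {
  return {
    imageUrl: rec.imageUrl,
    tags: `#${rec.surfaceEmotion} → #${rec.coreEmotion}`,
    personalMessage: rec.personalMessage,
    ...(showTranscript ? { transcript: rec.transcript } : {}),
  };
}
```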
```
┌─────────────────────────────────────────────────────┐
│                     USER INPUT                      │
│                                                     │
│         Voice Recording   OR   Written Text         │
└───────────────┬────────────────────┬────────────────┘
                │                    │
                ▼                    ▼
┌───────────────────────┐  ┌──────────────────────────┐
│  Gemini 2.5 Flash     │  │  Gemini 2.5 Flash        │
│                       │  │                          │
│ • Transcribe audio    │  │ • Analyze surface        │
│ • Analyze voice tone  │  │   meaning                │
│ • Analyze word meaning│  │ • Uncover hidden feeling │
│ • Find concordance    │  │ • Generate TTS audio     │
│ • Identify core       │  │   (with emotional tone)  │
│   emotion             │  │ • Find concordance       │
│                       │  │ • Identify core emotion  │
└───────────┬───────────┘  └────────────┬─────────────┘
            │                           │
            ▼                           ▼
┌─────────────────────────────────────────────────────┐
│              Nano Banana 2 (Image Gen)              │
│                                                     │
│  Structured prompt → oil pastel illustration        │
│  Small white cat in emotion-matched scene           │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                   SHAREABLE CARD                    │
│                                                     │
│  Illustration + #tags + personal message            │
│  Stored in Supabase, shared via unique URL          │
└─────────────────────────────────────────────────────┘
```
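The diagram's flow (analyze → generate image → store and share) can be sketched as a small dependency-injected orchestrator. The step signatures here are placeholders standing in for the real Gemini 2.5 Flash, Nano Banana 2, and Supabase calls:

```typescript
// A simplified view of the pipeline above. The step functions are
// injected so the wiring can be shown (and tested) without real API calls.
interface AnalysisSteps {
  analyze(input: string): Promise<{ coreEmotion: string; imagePrompt: string }>;
  generateImage(prompt: string): Promise<string>; // returns an image URL
  store(card: { coreEmotion: string; imageUrl: string }): Promise<string>; // returns a share URL
}

async function runPipeline(input: string, steps: AnalysisSteps): Promise<string> {
  // 1. Gemini: uncover the core emotion and build a structured image prompt.
  const analysis = await steps.analyze(input);
  // 2. Nano Banana 2: turn the structured prompt into an illustration.
  const imageUrl = await steps.generateImage(analysis.imagePrompt);
  // 3. Supabase: persist the card and hand back its unique URL.
  return steps.store({ coreEmotion: analysis.coreEmotion, imageUrl });
}
```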
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router, React 19) |
| Language | TypeScript 5.9 |
| Styling | Tailwind CSS 4 with CSS variables |
| AI Models | Gemini 2.5 Flash (analysis), Nano Banana 2 (image gen), Gemini TTS Preview |
| Database | Supabase (PostgreSQL + Storage) |
| Auth | NextAuth v5 with Kakao OAuth |
| State | Zustand |
| i18n | HUA Framework (@hua-labs/hua), Korean & English |
| UI Kit | @hua-labs/ui, Phosphor Icons |
| Fonts | Gowun Batang (KR serif), Lora (EN serif), Pretendard (sans) |
| Deploy | Google Cloud Run (Docker, Seoul region) |
| PWA | Standalone web app with offline detection |
- Voice analysis: acoustic tone + word meaning + concordance detection
- Text analysis: surface vs. hidden emotion with reasoning
- AI illustration: emotion-driven oil pastel art (cat character, never front-facing)
- TTS synthesis: emotionally directed text-to-speech with 4 voice personas
- Shareable cards: public URLs with OG metadata for social previews
- Text mode for deaf/HoH users: full analysis pipeline without audio input
- Generated voice output: hear your written words spoken with emotional nuance
- Bilingual: full Korean and English support with one-tap switching
- Recording preview: listen before sending (stop → preview → submit)
- Demo mode: try with pre-generated samples, no API calls needed
- PWA: installable on mobile, standalone display
- Dark mode: warm chocolate brown theme, auto-detected
- Pill buttons, serif typography: warm, approachable design language
```
sumsori/
├── app/
│   ├── page.tsx                # Voice recording + analysis
│   ├── text/page.tsx           # Text input + analysis
│   ├── my/page.tsx             # User's saved cards
│   ├── card/[id]/page.tsx      # Public shareable card
│   ├── layout.tsx              # Root layout (fonts, theme, PWA)
│   ├── globals.css             # Design system (CSS variables)
│   └── api/
│       ├── analyze/            # Voice analysis pipeline
│       ├── text-analyze/       # Text analysis + TTS pipeline
│       ├── card/message/       # Save personal message
│       ├── cards/              # Fetch user's cards
│       ├── auth/[...nextauth]/ # OAuth endpoints
│       └── manifest/           # PWA manifest
├── components/
│   ├── Header.tsx              # Logo + theme/language toggles
│   ├── BottomNav.tsx           # Mobile navigation
│   ├── LoginModal.tsx          # Kakao login bottom sheet
│   └── LanguageToggle.tsx      # KO/EN switcher
├── hooks/
│   └── useAudioRecorder.ts     # WebRTC recording + waveform
├── lib/
│   ├── prompts/                # Gemini analysis prompts (KO/EN)
│   ├── demo/                   # Pre-generated demo bundles
│   ├── gemini.ts               # Gemini API client config
│   ├── supabase.ts             # Supabase client (lazy init)
│   ├── auth.ts                 # NextAuth Kakao config
│   ├── audio-utils.ts          # PCM → WAV conversion
│   └── types.ts                # TypeScript interfaces
├── translations/
│   ├── ko/common.json          # Korean translations
│   └── en/common.json          # English translations
├── public/
│   ├── demo/                   # Pre-generated demo assets
│   ├── fonts/                  # Lora serif subset (woff2)
│   ├── icons/                  # PWA icons (72–512px)
│   └── images/                 # Logo SVG
├── scripts/
│   ├── deploy-gcp.sh           # Cloud Run deployment
│   └── generate-text-demos.ts  # Generate demo TTS + images
├── Dockerfile                  # Multi-stage Docker build
└── package.json
```
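The tree above lists `lib/audio-utils.ts` for PCM → WAV conversion. A sketch of what such a helper might look like: it prepends the standard 44-byte RIFF/WAVE header so browsers can play raw 16-bit PCM. The 24 kHz mono default is an assumption (a common output format for Gemini TTS), not a documented detail of this app:

```typescript
// Wrap raw 16-bit PCM samples in a canonical 44-byte WAV header.
// Sketch only; sample rate and channel count are assumptions.
function pcmToWav(pcm: Uint8Array, sampleRate = 24000, channels = 1): Uint8Array {
  const bytesPerSample = 2; // 16-bit samples
  const byteRate = sampleRate * channels * bytesPerSample;
  const blockAlign = channels * bytesPerSample;
  const buf = new ArrayBuffer(44 + pcm.length);
  const view = new DataView(buf);
  const writeAscii = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt subchunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // data subchunk size
  new Uint8Array(buf, 44).set(pcm);
  return new Uint8Array(buf);
}
```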
| Model | Purpose | Why |
|---|---|---|
| `gemini-2.5-flash` | Voice/text analysis | Fast, supports audio input, structured JSON output |
| `gemini-3.1-flash-image-preview` | Image generation (Nano Banana 2) | High-quality oil pastel illustrations |
| `gemini-2.5-flash-preview-tts` | Text-to-speech | Emotionally directed speech with voice personas |
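Because the analysis model is asked for structured JSON, its output should still be validated before use. A defensive parse sketch; the field names are assumptions, not the app's actual schema:

```typescript
// Expected shape of one analysis response (assumed field names).
interface AnalysisResult {
  surfaceEmotion: string;
  coreEmotion: string;
  summary: string;
}

// Parse and validate the model's JSON text, rejecting incomplete output.
function parseAnalysis(raw: string): AnalysisResult {
  const data = JSON.parse(raw) as Partial<AnalysisResult>;
  for (const key of ["surfaceEmotion", "coreEmotion", "summary"] as const) {
    const value = data[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`model response missing field: ${key}`);
    }
  }
  return data as AnalysisResult;
}
```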
The analysis prompts are carefully engineered to:
- Separate channels: voice acoustics are analyzed independently from word meaning
- Detect concordance: find the gap (or alignment) between what's said and how it's said
- Use nuanced vocabulary: Korean emotion words like 서운함 (hurt mixed with disappointment), 체념 (resigned acceptance), and 억울함 (feeling wronged) that have no direct English equivalent
- Generate structured image prompts: scene, pose, color palette, and lighting are all emotion-mapped to produce consistent, expressive illustrations
Every illustration follows strict constraints:
- Oil pastel and crayon on textured paper (never photorealistic or digital-looking)
- One small round white cat as the sole character
- Always from behind or three-quarter back view (never front-facing)
- Scene and color palette matched to the detected emotion
- No text, letters, human faces, or other animals
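One way to keep these constraints consistent is to hard-code them in the prompt assembly and only vary the emotion-mapped parts. A sketch (illustrative only; the real templates live in `lib/prompts/`):

```typescript
// Emotion-mapped pieces that vary per card (example values are invented).
interface Scene {
  setting: string;  // e.g. "empty park bench at dusk"
  palette: string;  // e.g. "muted blues and warm amber"
  lighting: string; // e.g. "soft fading light"
}

// Constraints from the list above, fixed for every illustration.
const STYLE_RULES = [
  "oil pastel and crayon on textured paper",
  "one small round white cat as the sole character",
  "viewed from behind or three-quarter back view, never front-facing",
  "no text, letters, human faces, or other animals",
];

function buildImagePrompt(scene: Scene): string {
  return [
    `Scene: ${scene.setting}`,
    `Palette: ${scene.palette}`,
    `Lighting: ${scene.lighting}`,
    ...STYLE_RULES,
  ].join(". ");
}
```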
- Node.js 22+
- pnpm
- Gemini API key (with Tier 1+ billing)
- Supabase project (PostgreSQL + Storage)
- Kakao Developer app (for OAuth)
```bash
# Gemini
GEMINI_API_KEY=your_gemini_api_key

# Supabase
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key

# Auth
KAKAO_CLIENT_ID=your_kakao_client_id
KAKAO_CLIENT_SECRET=your_kakao_client_secret
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your_nextauth_secret
```

Create the required storage buckets and database table:
```sql
-- Storage buckets (run via Supabase dashboard or API)
-- card-images: public, 10MB limit
-- card-audio: public, 10MB limit

-- Cards table
CREATE TABLE cards (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT,
  nickname TEXT,
  voice_tone JSONB NOT NULL,
  text_content JSONB NOT NULL,
  concordance JSONB NOT NULL,
  core_emotion TEXT NOT NULL,
  summary TEXT NOT NULL,
  image_url TEXT NOT NULL,
  image_prompt JSONB,
  personal_message TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  audio_url TEXT,
  input_mode TEXT DEFAULT 'voice',
  surface_emotion JSONB,
  hidden_emotion JSONB,
  show_transcript BOOLEAN DEFAULT false
);

-- Enable Row Level Security
ALTER TABLE cards ENABLE ROW LEVEL SECURITY;

-- Allow public read for shared cards
CREATE POLICY "Public read" ON cards FOR SELECT USING (true);

-- Allow service role full access
CREATE POLICY "Service role full access" ON cards FOR ALL USING (true);
```

Install dependencies and run the dev server:

```bash
pnpm install
pnpm dev
```

Build and run in production:

```bash
pnpm build
pnpm start
```

Deploy to Cloud Run (set your GCP project first):
```bash
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable run.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com

# Deploy from source
gcloud run deploy sumsori \
  --source . \
  --region asia-northeast3 \
  --allow-unauthenticated \
  --port 8080 \
  --memory 512Mi \
  --set-env-vars "GEMINI_API_KEY=...,NEXT_PUBLIC_SUPABASE_URL=...,..."
```

Or use the helper script:
```bash
GCP_PROJECT_ID=your-project-id ./scripts/deploy-gcp.sh
```

Sumsori includes pre-generated demo bundles for both voice and text modes, so users can experience the full flow without consuming API quota. Demo assets include:
- Voice demos: 4 Korean + 5 English scenarios with pre-generated illustrations
- Text demos: 3 Korean + 3 English scenarios with pre-generated TTS audio + illustrations
To regenerate text demo assets:
```bash
npx tsx scripts/generate-text-demos.ts
```

Sumsori is not a clinical emotion analyzer. It's a communication bridge.
The app exists for moments when words aren't enough: when you want to tell someone how you feel but can't find the right way to say it. Sumsori listens to the gap between what you say and what you mean, then translates that gap into something visual and shareable.
The illustrations use a small white cat seen from behind or in three-quarter back view (a hint of one eye is OK, but never fully front-facing), because:
- It's universal: not tied to any specific person or identity
- The back/side view invites projection: the viewer fills in the emotion
- Oil pastel style feels handmade, warm, and personal
- The cat is always alone in a scene, mirroring the solitude of unexpressed feelings
The accessibility-first text mode ensures that everyone, including deaf and hard-of-hearing users, can use this communication tool. The TTS output gives written words an emotional voice they wouldn't otherwise have.
Sumsori is built on top of HUA Framework, an open-source UI and i18n toolkit developed by HUA Labs prior to this hackathon:
- `@hua-labs/hua`: i18n provider, theme system, and shared utilities
- `@hua-labs/ui`: component library (Avatar, Popover, overlays, etc.)
These packages are pre-existing open-source libraries and were not built for this hackathon. Everything else (the Gemini integration, analysis prompts, voice/text pipelines, image generation, TTS, demo system, and the Sumsori application itself) was built during the hackathon.
Built by HUA Labs for the Google DeepMind Gemini 3 Seoul Hackathon.
This project was created for a hackathon and is not currently licensed for redistribution.