5 changes: 5 additions & 0 deletions research-sentry/.env.example
@@ -0,0 +1,5 @@
# TinyFish API key — https://agent.tinyfish.ai/api-keys
TINYFISH_API_KEY=your-tinyfish-api-key

# OpenAI API key (LLM + Whisper transcription) — https://platform.openai.com/api-keys
OPENAI_API_KEY=your-openai-api-key
7 changes: 0 additions & 7 deletions research-sentry/.env.local.example

This file was deleted.

4 changes: 3 additions & 1 deletion research-sentry/.gitignore
@@ -1,5 +1,7 @@
node_modules
.next
.env*.local
.env
.env.local
.vercel
*.tsbuildinfo
next-env.d.ts
270 changes: 138 additions & 132 deletions research-sentry/README.md
@@ -1,176 +1,182 @@
# Research Sentry
**Live: https://cookbook-research-sentry.vercel.app/**

**A voice-first academic research co-pilot** that scans live portals (ArXiv, PubMed, Semantic Scholar, IEEE Xplore, Google Scholar, SSRN, CORE, DOAJ) to assemble verified paper metadata and summaries. It uses the **TinyFish Web Agent** to automate multi-step portal navigation and extract structured results in real time.
**Voice-first academic research co-pilot — AI agents scrape 8 live research portals in parallel and assemble verified paper metadata in real time.**

Live: https://cookbook-research-sentry.vercel.app/
Speak or type a research query. Research Sentry parses your intent, dispatches one TinyFish browser agent per academic portal simultaneously, aggregates and deduplicates the results, and streams them back as each portal completes. Then ask follow-up questions, compare papers side-by-side, track citations, or export BibTeX.

Demo video: https://cookbook-research-sentry.vercel.app/

---
## Architecture

## How It Works
```
┌─────────────────────────────────────────────────────────────┐
│ Browser (Client) │
│ │
│ VoiceRecorder / SearchInterface → ResultsGrid │
│ ConversationInterface → PaperComparison → CitationTracker │
│ TinyFishAgentTerminal (live agent log) │
└──────────────────────────┬──────────────────────────────────┘
┌────────────┼─────────────┐
▼ ▼ ▼
/api/search/text /api/search/voice /api/compare
/api/summarize /api/conversation /api/citations/track
/api/export/bibtex
┌─────────────────────────────────────────────────────────────┐
│ lib/tinyfish.ts │
│ │
│ runTinyFishAutomation(url, goal, stealth?) │
│ Throws TinyFishError with typed codes: │
│ MISSING_API_KEY | RUN_FAILED | TIMEOUT | │
│ STREAM_ERROR | NO_RESULT │
│ │
│ client.agent.stream({ url, goal, browser_profile }) │
│ onComplete → RunStatus.COMPLETED → return result │
│ → RunStatus.FAILED → throw RUN_FAILED │
└──────────────────────────┬──────────────────────────────────┘
│ Promise.allSettled (x8 parallel)
┌─────────┬───────┼────────┬─────────┐
▼ ▼ ▼ ▼ ▼
ArXiv PubMed Semantic Google IEEE
Scholar Scholar Xplore
+ SSRN + CORE + DOAJ

Google Scholar + IEEE Xplore use browser_profile: 'stealth'
```
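The typed error codes in the diagram above can be pictured with a small sketch. Only the five code names come from the diagram; the class shape, `isRetryable`, and `describeFailure` are hypothetical names used for illustration, and the actual `lib/tinyfish.ts` may differ.

```ts
// Code names are taken from the architecture diagram; everything else is an assumption.
type TinyFishErrorCode =
  | "MISSING_API_KEY"
  | "RUN_FAILED"
  | "TIMEOUT"
  | "STREAM_ERROR"
  | "NO_RESULT";

class TinyFishError extends Error {
  readonly code: TinyFishErrorCode;

  constructor(code: TinyFishErrorCode, message: string) {
    super(message);
    this.name = "TinyFishError";
    this.code = code;
    // Keeps `instanceof` working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Illustrative policy: only transient failures are worth a second attempt.
function isRetryable(err: unknown): boolean {
  return (
    err instanceof TinyFishError &&
    (err.code === "TIMEOUT" || err.code === "STREAM_ERROR")
  );
}

// Turns any thrown value into a log-friendly string instead of a silent null.
function describeFailure(err: unknown): string {
  if (err instanceof TinyFishError) return `[${err.code}] ${err.message}`;
  return err instanceof Error ? err.message : String(err);
}
```

A caller could then branch on `err.code` rather than parsing message strings, which is presumably the point of surfacing typed codes.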

1. **Voice / text input** -- speak or type your research query.
2. **GPT-4o parses intent** -- OpenAI extracts topic, keywords, and target sources from your query.
3. **TinyFish agents scrape 8 academic portals in parallel** -- each portal gets its own headless browser session via the TinyFish API.
4. **Results aggregated & deduplicated** -- papers from every source are merged, normalized, and ranked by citation count.
5. **Summarize, compare, export** -- ask follow-up questions, compare papers side-by-side, track citations, or export BibTeX.
### OpenAI usage

---
```
lib/intent-parser.ts → parse topic, keywords, sources from query
lib/summarizer.ts → summarize individual papers
lib/comparator.ts → structured methodology/results comparison
lib/conversation.ts → conversational follow-up answers
lib/citation-tracker.ts → citation velocity and impact prediction
lib/whisper.ts → speech-to-text via OpenAI's Whisper endpoint
```

## Key Features

- **Voice input** -- record a question and Whisper transcribes it into a search query.
- **Multi-source search** -- scrapes ArXiv, PubMed, Semantic Scholar, Google Scholar, IEEE Xplore, SSRN, CORE, and DOAJ simultaneously.
- **Paper comparison** -- select papers and get a structured methodology/results comparison via GPT-4o.
- **Citation tracking** -- monitor a paper's citation velocity and predicted impact.
- **BibTeX export** -- download selected papers as a `.bib` file.
- **Conversational follow-ups** -- ask the AI assistant questions about your results.

---

## TinyFish API Usage

The core integration lives in `lib/tinyfish.ts`. Here is the SSE call that drives every search:

```ts
const res = await fetch("https://agent.tinyfish.ai/v1/automation/run-sse", {
method: "POST",
headers: {
"X-API-Key": process.env.TINYFISH_API_KEY!,
"Content-Type": "application/json",
},
body: JSON.stringify({
url,
goal,
browser_profile: stealth ? "stealth" : "lite",
}),
});

// Parse the SSE stream
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
let result = null;

while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
for (const line of lines) {
if (line.startsWith("data: ")) {
const event = JSON.parse(line.slice(6));
if (event.type === "COMPLETE") result = event.resultJson;
}
}
}
```
- **Voice input** — record a question, OpenAI Whisper transcribes it into a search query
- **Multi-source search** — 8 portals scraped simultaneously: ArXiv, PubMed, Semantic Scholar, Google Scholar, IEEE Xplore, SSRN, CORE, DOAJ
- **Paper comparison** — structured methodology/results comparison across selected papers
- **Citation tracking** — monitor citation velocity and predicted impact
- **BibTeX export** — download selected papers as a `.bib` file
- **Conversational follow-ups** — ask the AI assistant questions about your results
- **Live agent terminal** — watch each TinyFish agent's progress in real time

---
## Scraping Flow

## Tech Stack
1. User speaks or types a research query
2. OpenAI (`intent-parser.ts`) extracts topic, keywords, and target sources
3. One TinyFish agent fires per portal — all in parallel via `Promise.allSettled`
4. Each agent navigates the portal's live DOM with a tight, focused goal prompt
5. Results stream back to the aggregator as each agent completes
6. `aggregator.ts` deduplicates and ranks by citation count
7. Results appear in the UI as portals finish — no waiting for the slowest one
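The fan-out and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `lib/search.ts` or `lib/aggregator.ts`: the `Paper` shape, `PortalScraper`, `searchAllPortals`, and `dedupeAndRank` are hypothetical names, and deduplication by normalized title is an assumed strategy. The sketch waits for every portal to settle and does not model the incremental streaming described in steps 5 and 7.

```ts
// Hypothetical paper shape; the real type lives in lib/types.ts and is not shown here.
interface Paper {
  title: string;
  source: string;
  citations: number;
}

// Stand-in for one portal's agent run.
type PortalScraper = (query: string) => Promise<Paper[]>;

// Keep one entry per normalized title (the most-cited copy), then sort by citations.
function dedupeAndRank(papers: Paper[]): Paper[] {
  const byTitle = new Map<string, Paper>();
  for (const paper of papers) {
    const key = paper.title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
    const existing = byTitle.get(key);
    if (!existing || paper.citations > existing.citations) {
      byTitle.set(key, paper);
    }
  }
  return Array.from(byTitle.values()).sort((a, b) => b.citations - a.citations);
}

// Fire every portal at once; a failing portal is recorded instead of aborting the search.
async function searchAllPortals(
  query: string,
  scrapers: Record<string, PortalScraper>
): Promise<{ papers: Paper[]; failures: { portal: string; reason: string }[] }> {
  const names = Object.keys(scrapers);
  const settled = await Promise.allSettled(names.map((name) => scrapers[name](query)));

  const papers: Paper[] = [];
  const failures: { portal: string; reason: string }[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      papers.push(...outcome.value);
    } else {
      failures.push({ portal: names[i], reason: String(outcome.reason) });
    }
  });
  return { papers: dedupeAndRank(papers), failures };
}
```

Using `Promise.allSettled` rather than `Promise.all` means one blocked or slow portal shows up in `failures` while the other portals' papers are still returned.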

| Layer | Technology |
|-------|-----------|
| Framework | Next.js 14 (App Router) |
| Web scraping | TinyFish API (SSE) |
| LLM | OpenAI GPT-4o |
| Speech-to-text | OpenAI Whisper |
| Styling | Tailwind CSS |
| Icons | Lucide React |
## Setup

---
### Prerequisites

## Setup
- Node.js 18+
- TinyFish API key
- OpenAI API key

### Environment Variables

```bash
# 1. Install dependencies
npm install
cp .env.example .env.local
```

# 2. Create your env file
cp .env.local.example .env.local
Then fill in:

# 3. Add your API keys to .env.local
# TINYFISH_API_KEY -- get one at https://tinyfish.ai
# OPENAI_API_KEY -- get one at https://platform.openai.com
```env
# TinyFish (required) — https://agent.tinyfish.ai/api-keys
TINYFISH_API_KEY=your-tinyfish-api-key

# 4. Start the dev server
npm run dev
# OpenAI (required) — https://platform.openai.com/api-keys
OPENAI_API_KEY=your-openai-api-key
```

Open http://localhost:3000 to use the app.
### Install & Run

```bash
npm install
npm run dev
```

---
Open http://localhost:3000

## Folder Structure
## Project Structure

```
research-sentry/
├── app/
│ ├── api/
│ │ ├── citations/track/route.ts # Citation velocity analysis
│ │ ├── compare/route.ts # Paper comparison endpoint
│ │ ├── conversation/route.ts # Conversational follow-ups
│ │ ├── emails/extract/route.ts # Author email extraction
│ │ ├── export/bibtex/route.ts # BibTeX export
│ │ ├── health/route.ts # Health check
│ │ ├── search/text/route.ts # Text search endpoint
│ │ ├── search/voice/route.ts # Voice search endpoint
│ │ └── summarize/route.ts # Paper summarization
│ ├── globals.css
│ ├── layout.tsx
│ └── page.tsx # Main UI
│ ├── page.tsx # Main UI
│ ├── globals.css
│ └── api/
│ ├── citations/track/route.ts # Citation velocity analysis
│ ├── compare/route.ts # Paper comparison
│ ├── conversation/route.ts # Conversational follow-ups
│ ├── emails/extract/route.ts # Author email extraction
│ ├── export/bibtex/route.ts # BibTeX export
│ ├── health/route.ts # Health check
│ ├── search/text/route.ts # Text search
│ ├── search/voice/route.ts # Voice search
│ └── summarize/route.ts # Paper summarization
├── components/
│ ├── SearchInterface.tsx
│ ├── VoiceRecorder.tsx
│ ├── ResultsGrid.tsx
│ ├── PaperCard.tsx
│ ├── PaperComparison.tsx
│ ├── PaperSummary.tsx
│ ├── CitationTracker.tsx
│ ├── ConversationInterface.tsx
│ ├── CoPilotMode.tsx
│ ├── WorkflowSelector.tsx
│ ├── TinyFishAgentTerminal.tsx # Live agent log display
│ ├── ErrorMessage.tsx
│ ├── LoadingSpinner.tsx
│ ├── PaperCard.tsx
│ ├── PaperComparison.tsx
│ ├── PaperSummary.tsx
│ ├── ResultsGrid.tsx
│ ├── SearchInterface.tsx
│ ├── TinyFishAgentTerminal.tsx # Live agent log display
│ ├── VoiceRecorder.tsx
│ └── WorkflowSelector.tsx
│ └── LoadingSpinner.tsx
├── hooks/
│ └── useVoiceCommands.ts
├── lib/
│ ├── aggregator.ts # Deduplication & ranking
│ ├── tinyfish.ts # TinyFish agent client (typed errors)
│ ├── intent-parser.ts # OpenAI — query intent parsing
│ ├── summarizer.ts # OpenAI — paper summarization
│ ├── comparator.ts # OpenAI — paper comparison
│ ├── conversation.ts # OpenAI — conversational follow-ups
│ ├── citation-tracker.ts # OpenAI — citation velocity
│ ├── whisper.ts # OpenAI Whisper — speech-to-text
│ ├── aggregator.ts # Deduplication & ranking
│ ├── search.ts # Multi-source search orchestration
│ ├── workflows.ts
│ ├── audio-utils.ts
│ ├── citation-tracker.ts
│ ├── comparator.ts
│ ├── conversation.ts
│ ├── email-utils.ts
│ ├── intent-parser.ts # GPT-4o query parsing
│ ├── pdf-utils.ts
│ ├── search.ts # Multi-source search engine
│ ├── summarizer.ts
│ ├── tinyfish.ts # TinyFish SSE client
│ ├── types.ts
│ └── workflows.ts
└── .env.local.example
│ └── types.ts
├── .env.example
└── package.json
```

---
## Constraint Checklist

## Architecture
| Constraint | Status |
|---|---|
| External database used? | NO (pure in-memory) |
| Scraping parallel? | YES (`Promise.allSettled` across 8 portals) |
| Bot-protected sites handled? | YES (Google Scholar + IEEE use `browser_profile: 'stealth'`) |
| SDK errors surfaced? | YES (typed `TinyFishError` with code — no silent `null` returns) |
| Voice input? | YES (OpenAI Whisper transcription) |
| BibTeX export? | YES |
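The bot-protection row can be illustrated with a small sketch of per-portal profile selection. The `'stealth'` value and the two portals come from the checklist, and `'lite'` appears as the default in the earlier README snippet; the `profileFor` helper and the lookup set are hypothetical and may not match how `lib/search.ts` chooses a profile.

```ts
type BrowserProfile = "stealth" | "lite";

// Portals the checklist names as bot-protected, stored lowercased for matching.
const STEALTH_PORTALS: ReadonlySet<string> = new Set(["google scholar", "ieee xplore"]);

// Hypothetical helper: pick the browser profile to request for a given portal.
function profileFor(portal: string): BrowserProfile {
  return STEALTH_PORTALS.has(portal.trim().toLowerCase()) ? "stealth" : "lite";
}
```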

```mermaid
graph TD
User((User)) -->|Voice/Text| UI[Search Interface]
UI -->|Intent| Parser[Intent Parser GPT-4o]
Parser -->|Plan| Engine[Search Engine]
Engine -->|Dispatch| Agent1[TinyFish Agent: ArXiv]
Engine -->|Dispatch| Agent2[TinyFish Agent: PubMed]
Engine -->|Dispatch| Agent3[TinyFish Agent: Scholar]
Agent1 -->|Scraping| Web[Live Web DOM]
Agent2 -->|Scraping| Web
Agent3 -->|Scraping| Web
Web -->|Result| Aggregator[Synthesis & Deduplication]
Aggregator -->|JSON Payload| UI
UI -->|Visuals| Terminal[Live Log Terminal]
```
## Tech Stack

- **Framework:** Next.js (App Router), TypeScript, Tailwind CSS
- **Browser Agents:** TinyFish SDK (`client.agent.stream`)
- **LLM:** OpenAI (gpt-4o-mini)
- **Speech-to-text:** OpenAI Whisper
- **Icons:** Lucide React
- **Deployment:** Vercel
4 changes: 2 additions & 2 deletions research-sentry/app/api/citations/track/route.ts
@@ -1,5 +1,5 @@
import { NextRequest, NextResponse } from 'next/server';
-import { analyzeCitationTrend } from '@/lib/citation-tracker';
+import { analyzeCitationNetwork } from '@/lib/citation-tracker';

export async function POST(req: NextRequest) {
try {
@@ -9,7 +9,7 @@ export async function POST(req: NextRequest) {
return NextResponse.json({ error: 'Paper data required' }, { status: 400 });
}

-const trackedData = await analyzeCitationTrend(paper);
+const trackedData = await analyzeCitationNetwork(paper);

// In a real app, we would save this to a database here

13 changes: 10 additions & 3 deletions research-sentry/app/api/conversation/route.ts
@@ -1,5 +1,5 @@
import { NextRequest, NextResponse } from 'next/server';
-import { generateConversationResponse } from '@/lib/conversation';
+import { continueConversation, Message } from '@/lib/conversation';

export const maxDuration = 60;

@@ -11,9 +11,16 @@ export async function POST(req: NextRequest) {
return NextResponse.json({ error: 'Invalid history format' }, { status: 400 });
}

-const response = await generateConversationResponse(history, context);
+// Build messages array — prepend context as system message if provided
+const messages: Message[] = [];
+if (context) {
+messages.push({ role: 'system', content: `Research context: ${JSON.stringify(context)}` });
+}
+messages.push(...(history as Message[]));
+
+const response = await continueConversation(messages);

-return NextResponse.json(response);
+return NextResponse.json({ response });
} catch (error) {
console.error('Conversation API Error:', error);
return NextResponse.json({ error: 'Failed to generate response' }, { status: 500 });
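The route above casts `history` to `Message[]` after a format check that this hunk does not show. As a hedged illustration, a runtime guard plus the same message-building step might look like the sketch below. The `Message` shape is an assumption (the real type is exported from `@/lib/conversation` and is not visible in this diff), and `isMessageArray` and `buildMessages` are hypothetical helper names.

```ts
// Assumed shape; the real `Message` type is defined in '@/lib/conversation'.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

const ROLES = ["system", "user", "assistant"];

// Runtime guard so the cast to Message[] is backed by an actual check.
function isMessageArray(value: unknown): value is Message[] {
  return (
    Array.isArray(value) &&
    value.every((m) => {
      if (typeof m !== "object" || m === null) return false;
      const candidate = m as { role?: unknown; content?: unknown };
      return (
        typeof candidate.role === "string" &&
        ROLES.indexOf(candidate.role) !== -1 &&
        typeof candidate.content === "string"
      );
    })
  );
}

// Mirrors the hunk: optional research context goes first as a system message.
function buildMessages(history: Message[], context?: unknown): Message[] {
  const messages: Message[] = [];
  if (context) {
    messages.push({ role: "system", content: `Research context: ${JSON.stringify(context)}` });
  }
  messages.push(...history);
  return messages;
}
```

Note that the response body changes from the bare value to `{ response }` in this hunk, so any client reading the old shape would need to read the `response` field instead.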