From 50fa5f2c366b668d9bfc314d219162c8357cc6b2 Mon Sep 17 00:00:00 2001
From: heznpc
Date: Sat, 11 Apr 2026 20:09:44 +0900
Subject: [PATCH 1/2] feat: insights-parser tests + EMPTY_DATA + README replace
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- parser.ts: surface EMPTY_DATA when the zip validates as an Instagram
  export but parseable records are all empty. The UI now tells the user
  "this looks like an IG export but the format may have changed" instead
  of rendering a blank dashboard. Applied to both parseInstagramZip and
  parseFileFull.
- parser.test.ts: the "empty followers_and_following directory" case
  flipped: it now asserts EMPTY_DATA rather than accepting a zero-result
  success.
- locales/{en,ko}.json: new EMPTY_DATA error copy, framed as a call to
  action ("request a fresh export, open an issue on GitHub").
- insights-parser.test.ts (new): regression guards for the fragile HTML
  parsers (likedPosts / savedPosts / profileSearches / wordSearches /
  loginActivity / chatList) with both KO and EN label fixtures so a
  class-name or locale change in Instagram's export is caught before
  hitting users. Fixtures intentionally ship the extra wrapper noise IG
  emits.
- README.md: replace the create-next-app template with an actual
  description of what followprint does, how to get the IG export, how to
  run tests, character classification table, and the
  Instagram-format-change playbook.
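
A minimal caller-side sketch of how the new EMPTY_DATA code might be turned into user-facing copy. It assumes the parser rejects with an `Error` whose `message` is the code, which is what `rejects.toThrow("EMPTY_DATA")` in the tests suggests. The `messageForError` helper, the `errorCopy` map and the `FALLBACK` constant are hypothetical names that do not appear in this patch; only the keys mirror the error entries in `src/locales/en.json`, and the strings are shortened placeholders.

```typescript
// Hypothetical helper, not part of the patch. Keys mirror the error codes
// in src/locales/en.json; the message text is abbreviated for illustration.
const FALLBACK = "Something went wrong. Please try again.";

const errorCopy = new Map<string, string>([
  ["INVALID_ZIP", "This doesn't look like an Instagram data export."],
  ["EMPTY_DATA", "We could read the ZIP, but it contained no follower data."],
]);

function messageForError(err: unknown): string {
  // Assumption: the parser throws Error objects whose message is the code.
  const code = err instanceof Error ? err.message : "";
  // Unknown codes and non-Error values fall back to the default copy.
  return errorCopy.get(code) ?? FALLBACK;
}
```

With a lookup like this, an unrecognized code degrades to the generic message instead of showing a raw identifier, which matches the role of the `default` entry in the locale files.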
---
 README.md                                 |  86 ++++++---
 src/lib/__tests__/insights-parser.test.ts | 205 ++++++++++++++++++++++
 src/lib/__tests__/parser.test.ts          |  13 +-
 src/lib/parser.ts                         |  27 ++-
 src/locales/en.json                       |   1 +
 src/locales/ko.json                       |   1 +
 6 files changed, 304 insertions(+), 29 deletions(-)
 create mode 100644 src/lib/__tests__/insights-parser.test.ts

diff --git a/README.md b/README.md
index e215bc4..776021c 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,80 @@
-This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).
+# followprint
 
-## Getting Started
+> Drop a single Instagram data export ZIP onto the page and followprint analyzes
+> your follow relationships and activity patterns on the spot. **All processing happens in the browser: no server, no upload, no login.**
 
-First, run the development server:
+## What it shows
+
+| Area | Contents |
+| --- | --- |
+| Relationship analysis | Mutual follows (`mutual`) / accounts only I follow (`nonMutual`) / fans only (`fansOnly`) / pending / recently unfollowed / close friends / blocked / restricted |
+| Character card | 6 character types (Influencer / Butterfly / Observer / Selective / Explorer / Minimalist) + 4 scores (Social / Loyalty / Curiosity / Selectivity) + active hours + monthly follow rate |
+| Insights | Top 20 most-liked accounts, Top 20 saved posts, profile search / word search history, 24-hour login distribution, chat partners |
+
+## How to get the data
+
+1. Instagram app → **Settings → Your information and permissions → Download your information**
+2. Format: **JSON** (HTML is also supported)
+3. Data types: everything, or only `followers_and_following + activity`
+4. Drag the downloaded ZIP file onto the followprint page
+
+## Privacy
+
+- Every file in the ZIP is **unpacked and parsed directly in the browser with `JSZip`**
+- Network requests: **zero**, apart from fonts and static assets
+- Every HTML parsing step reaches DOMParser only after passing through an explicit `DOMPurify` allowlist (a, div, span, p, td, tr, table, ...)
+- On refresh, the data disappears from memory
+
+## Tech stack
+
+- **Next.js 16** (App Router, `output: "export"`, i.e. a static site)
+- **React 19** + TypeScript strict
+- **Tailwind v4**
+- **JSZip** + **DOMPurify** + **vitest** + **jsdom**
+- **i18n**: Korean / English toggle; parses both the KO and EN date formats of the Instagram export
+
+## Development
 
 ```bash
-npm run dev
-# or
-yarn dev
-# or
-pnpm dev
-# or
-bun dev
+npm install
+npm run dev # dev server
+npm run build # static site build (output lands in out/)
+npm test # vitest run
+npm run lint # eslint
 ```
 
-Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
+## Tests
+
+The vitest cases live in `src/lib/__tests__/`:
 
-You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file.
+- `parser.test.ts`: JSON / HTML formats + mutual / nonMutual / fansOnly calculation + INVALID_ZIP / UNSUPPORTED_FORMAT / malformed entries + 7-category classification (pending / unfollowed / closeFriends / blocked / restricted)
+- `parse-utils.test.ts`: KO / EN dates (AM, PM, and 12 o'clock boundaries) + DOMPurify XSS regression (script / onclick stripping)
+- `character.test.ts`: classification of the 6 character types + scores within the 0 to 100 range + highlight matching + empty input / tie cases
+- `insights-parser.test.ts`: regression guards for likedPosts / savedPosts / profileSearches / wordSearches / loginActivity / chatList
 
-This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Geist](https://vercel.com/font), a new font family for Vercel.
+CI (`.github/workflows/ci.yml`) runs them automatically on every push / PR.
 
-## Learn More
+## Character classification criteria
 
-To learn more about Next.js, take a look at the following resources:
+| Type | Condition |
+| --- | --- |
+| **Influencer** | followers / following ratio > 3 AND followers > 500 |
+| **Selective** | following < 200 AND mutual / following > 0.6 |
+| **Explorer** | pending / (pending + following) > 0.1 |
+| **Butterfly** | following > 300 AND mutual / following > 0.5 (or default with mutualRate > 0.5) |
+| **Observer** | following > 300 AND mutual / following < 0.3 (or default) |
+| **Minimalist** | following < 100 AND followers < 100 |
 
-- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
-- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
+Defined in `src/lib/character.ts`.
 
-You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!
+## Handling Instagram format changes
 
-## Deploy on Vercel
+Instagram occasionally changes the export directory structure and HTML class names. When a regression occurs,
+`src/lib/__tests__/parser.test.ts` and `insights-parser.test.ts` break first, and
+if `validateInstagramZip` in `parser.ts` does not accept the new path pattern,
+users are shown `INVALID_ZIP` or `EMPTY_DATA`. If either of the two functions
+fails, suspect a change in the IG export format.
 
-The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
+## License
 
-Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
+MIT
diff --git a/src/lib/__tests__/insights-parser.test.ts b/src/lib/__tests__/insights-parser.test.ts
new file mode 100644
index 0000000..e76c798
--- /dev/null
+++ b/src/lib/__tests__/insights-parser.test.ts
@@ -0,0 +1,205 @@
+// Regression guards for the insights HTML parsers. The Instagram export
+// format is not stable (class names and label text change every few months),
+// and these parsers are the most fragile surface in the project. Any test
+// that goes red here is a strong signal that IG changed their layout.
+//
+// The fixtures below are minimal extracts of real exports, simplified to the
+// shape that each parser actually walks. They intentionally include the
+// extra wrapper divs and class noise that IG ships, so that selector changes
+// (e.g. dropping `_2piu`) are caught.
+
+import { describe, it, expect } from "vitest";
+import JSZip from "jszip";
+import { parseInsights } from "@/lib/insights-parser";
+
+async function buildZip(files: Record<string, string>): Promise<JSZip> {
+  const zip = new JSZip();
+  for (const [path, content] of Object.entries(files)) {
+    zip.file(path, content);
+  }
+  // Round-trip through generateAsync so that the resulting JSZip behaves the
+  // same as one loaded from disk (file metadata, not just in-memory shortcut).
+  const blob = await zip.generateAsync({ type: "blob" });
+  return JSZip.loadAsync(blob);
+}
+
+describe("parseInsights — likedPosts (KO label)", () => {
+  it("extracts usernames from `사용자 이름` rows", async () => {
+    const html = `
+
+
+ + +
사용자 이름alice
+
+
+ + +
사용자 이름bob
+
+
+    `;
+    const zip = await buildZip({
+      "your_instagram_activity/likes/liked_posts.html": html,
+    });
+    const insights = await parseInsights(zip);
+    const names = insights.topLikedAccounts.map((r) => r.name).sort();
+    expect(names).toEqual(["alice", "bob"]);
+  });
+});
+
+describe("parseInsights — likedPosts (EN label)", () => {
+  it("extracts usernames from `Username` rows", async () => {
+    const html = `
+
+
+
+
+
Usernamecarol
Usernamedave
+
+    `;
+    const zip = await buildZip({
+      "your_instagram_activity/likes/liked_posts.html": html,
+    });
+    const insights = await parseInsights(zip);
+    const names = insights.topLikedAccounts.map((r) => r.name).sort();
+    expect(names).toEqual(["carol", "dave"]);
+  });
+});
+
+describe("parseInsights — savedPosts (h2 usernames)", () => {
+  it("collects single-token h2 entries", async () => {
+    const html = `
+
+

spaceship_one

+

not a username

+

cometchaser

+

this_is_too_long_to_be_a_real_instagram_handle_xxxxxxxx

+
+    `;
+    const zip = await buildZip({
+      "your_instagram_activity/saved/saved_posts.html": html,
+    });
+    const insights = await parseInsights(zip);
+    const names = insights.topSavedAccounts.map((r) => r.name).sort();
+    // "not a username" rejected (whitespace), 50+ char string rejected.
+    expect(names).toEqual(["cometchaser", "spaceship_one"]);
+  });
+});
+
+describe("parseInsights — profileSearches", () => {
+  it("returns h2 names with extracted timestamps", async () => {
+    const html = `
+
+
+

searched_user_1

+
3월 16, 2026 6:41 오후
+
+
+

searched_user_2

+
4월 1, 2026 9:00 오전
+
+
+    `;
+    const zip = await buildZip({
+      "your_instagram_activity/recent_searches/profile_searches.html": html,
+    });
+    const insights = await parseInsights(zip);
+    expect(insights.profileSearches).toHaveLength(2);
+    expect(insights.profileSearches[0].name).toBe("searched_user_1");
+    expect(insights.profileSearches[0].timestamp).toBeGreaterThan(0);
+  });
+});
+
+describe("parseInsights — wordSearches", () => {
+  it("extracts query text from 검색 / Search rows", async () => {
+    const html = `
+
+
+
+
+
+
검색
코딩
3월 16, 2026 6:41 오후
+ + + + +
Search
music
4월 1, 2026 9:00 오전
+
+    `;
+    const zip = await buildZip({
+      "your_instagram_activity/recent_searches/word_or_phrase_searches.html": html,
+    });
+    const insights = await parseInsights(zip);
+    const queries = insights.wordSearches.map((r) => r.name).sort();
+    expect(queries).toEqual(["music", "코딩"]);
+  });
+});
+
+describe("parseInsights — loginActivity", () => {
+  it("counts ISO timestamps in h2 elements per hour", async () => {
+    const html = `
+
+

2026-04-01T09:23:00Z

+

2026-04-01T09:45:00Z

+

2026-04-01T18:01:00Z

+
+    `;
+    const zip = await buildZip({
+      "security_and_login_information/login_activity.html": html,
+    });
+    const insights = await parseInsights(zip);
+    expect(insights.loginHours[9]).toBe(2);
+    expect(insights.loginHours[18]).toBe(1);
+    expect(insights.loginHours.reduce((a, b) => a + b, 0)).toBe(3);
+  });
+
+  it("counts KO 오전/오후 cells in 12-hour clock", async () => {
+    const html = `
+
+
+
+
+
+
3월 16, 2026 6:41 오후
3월 16, 2026 6:50 오후
3월 16, 2026 9:00 오전
+
+    `;
+    const zip = await buildZip({
+      "security_and_login_information/login_activity.html": html,
+    });
+    const insights = await parseInsights(zip);
+    expect(insights.loginHours[18]).toBe(2);
+    expect(insights.loginHours[9]).toBe(1);
+  });
+});
+
+describe("parseInsights — chats", () => {
+  it("extracts chat partner names from h2 a", async () => {
+    const html = `
+
+

alice

+

bob

+
+    `;
+    const zip = await buildZip({
+      "your_instagram_activity/messages/chats.html": html,
+    });
+    const insights = await parseInsights(zip);
+    expect(insights.chatNames.sort()).toEqual(["alice", "bob"]);
+  });
+});
+
+describe("parseInsights — empty / missing files", () => {
+  it("returns zeros when none of the source files exist", async () => {
+    const zip = await buildZip({
+      "followers_and_following/followers_1.html": "",
+    });
+    const insights = await parseInsights(zip);
+    expect(insights.topLikedAccounts).toEqual([]);
+    expect(insights.topSavedAccounts).toEqual([]);
+    expect(insights.profileSearches).toEqual([]);
+    expect(insights.wordSearches).toEqual([]);
+    expect(insights.chatNames).toEqual([]);
+    expect(insights.loginHours).toEqual(new Array(24).fill(0));
+  });
+});
diff --git a/src/lib/__tests__/parser.test.ts b/src/lib/__tests__/parser.test.ts
index 8ce6f7f..cf26671 100644
--- a/src/lib/__tests__/parser.test.ts
+++ b/src/lib/__tests__/parser.test.ts
@@ -126,17 +126,16 @@ describe("parseInstagramZip", () => {
     await expect(parseInstagramZip(file)).rejects.toThrow("INVALID_ZIP");
   });
 
-  it("handles empty followers_and_following directory", async () => {
+  it("rejects an Instagram-shaped zip with no actual records as EMPTY_DATA", async () => {
+    // Validation passes (path contains "followers") but no parseable data
+    // exists. The parser surfaces this as EMPTY_DATA so the UI can tell the
+    // user "this looks like an IG export but the format may have changed",
+    // which is more actionable than rendering an empty dashboard.
     const zip = new JSZip();
-    // Directory marker exists but no actual data files inside
     zip.file("followers_and_following/readme.txt", "empty export");
     const file = await zipToFile(zip);
 
-    const result = await parseInstagramZip(file);
-
-    expect(result.followers).toHaveLength(0);
-    expect(result.following).toHaveLength(0);
-    expect(result.mutual).toHaveLength(0);
+    await expect(parseInstagramZip(file)).rejects.toThrow("EMPTY_DATA");
   });
 
   it("parses pending, unfollowed, closeFriends, blocked, restricted", async () => {
diff --git a/src/lib/parser.ts b/src/lib/parser.ts
index 61ee6eb..8446724 100644
--- a/src/lib/parser.ts
+++ b/src/lib/parser.ts
@@ -151,6 +151,18 @@ function validateInstagramZip(zip: JSZip): void {
   if (!isInstagram) throw new Error("INVALID_ZIP");
 }
 
+function isAnalysisEmpty(a: AnalysisResult): boolean {
+  return (
+    a.followers.length === 0 &&
+    a.following.length === 0 &&
+    a.pendingRequests.length === 0 &&
+    a.recentlyUnfollowed.length === 0 &&
+    a.closeFriends.length === 0 &&
+    a.blockedAccounts.length === 0 &&
+    a.restrictedAccounts.length === 0
+  );
+}
+
 // ── Main entry ──
 
 export async function parseInstagramZip(
@@ -158,7 +170,11 @@ export async function parseInstagramZip(
 ): Promise<AnalysisResult> {
   const zip = await JSZip.loadAsync(file);
   validateInstagramZip(zip);
-  return analyzeZip(zip);
+  const analysis = await analyzeZip(zip);
+  if (isAnalysisEmpty(analysis)) {
+    throw new Error("EMPTY_DATA");
+  }
+  return analysis;
 }
 
 export async function parseFileFull(file: File): Promise {
@@ -172,5 +188,14 @@ export async function parseFileFull(file: File): Promise {
     parseInsights(zip),
   ]);
 
+  // The validate step only checks that *some* path mentions followers /
+  // following (that catches "you uploaded the wrong zip"), but it can still
+  // produce 0 records if Instagram changed their export schema. Surface that
+  // as a distinct error so the UI can tell the user "this looks like an IG
+  // export but the format may have changed" instead of an empty dashboard.
+  if (isAnalysisEmpty(analysis)) {
+    throw new Error("EMPTY_DATA");
+  }
+
   return { analysis, insights };
 }
diff --git a/src/locales/en.json b/src/locales/en.json
index ca10d19..aeb9c06 100644
--- a/src/locales/en.json
+++ b/src/locales/en.json
@@ -48,6 +48,7 @@
       "INVALID_ZIP": "This doesn't look like an Instagram data export. Make sure you downloaded the ZIP from Instagram.",
       "UNSUPPORTED_FORMAT": "Please upload a .zip file from Instagram's data export.",
       "FILE_TOO_LARGE": "File is too large. Maximum allowed size is 500 MB.",
+      "EMPTY_DATA": "We could read the ZIP, but it didn't contain any followers or following data. Instagram may have changed their export format — please request a fresh export, and if the problem persists, open an issue on GitHub.",
       "default": "Something went wrong. Please try again with a valid Instagram data export."
     }
   },
diff --git a/src/locales/ko.json b/src/locales/ko.json
index 7b4ab85..e3643b4 100644
--- a/src/locales/ko.json
+++ b/src/locales/ko.json
@@ -48,6 +48,7 @@
       "INVALID_ZIP": "인스타그램 데이터 내보내기 파일이 아닌 것 같습니다. 인스타그램에서 다운로드한 ZIP 파일인지 확인해주세요.",
       "UNSUPPORTED_FORMAT": "인스타그램 데이터 내보내기에서 받은 .zip 파일을 올려주세요.",
       "FILE_TOO_LARGE": "파일이 너무 큽니다. 최대 허용 크기는 500 MB입니다.",
+      "EMPTY_DATA": "ZIP은 정상적으로 읽었지만 팔로워/팔로잉 데이터가 들어있지 않습니다. 인스타그램이 내보내기 형식을 바꿨을 수 있습니다 — 다시 다운로드해보시고, 그래도 같은 문제라면 GitHub 이슈로 알려주세요.",
       "default": "문제가 발생했습니다. 유효한 인스타그램 데이터 파일로 다시 시도해주세요."
     }
   },

From f8608dddee8ca3294831c402022df6ce8e9ce68b Mon Sep 17 00:00:00 2001
From: heznpc
Date: Sat, 11 Apr 2026 20:11:20 +0900
Subject: [PATCH 2/2] chore: ignore scripts/ in eslint

scripts/debug-zip.js and scripts/generate-test-data.js are local
CommonJS one-offs (never bundled into the site) and the
`@typescript-eslint/no-require-imports` rule was failing CI on their
`require()` usage. Add scripts/** to globalIgnores so the Next lint
rules only gate actual product code.
---
 eslint.config.mjs | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/eslint.config.mjs b/eslint.config.mjs
index 05e726d..12cc271 100644
--- a/eslint.config.mjs
+++ b/eslint.config.mjs
@@ -12,6 +12,9 @@ const eslintConfig = defineConfig([
     "out/**",
     "build/**",
     "next-env.d.ts",
+    // Local debug / fixture generation scripts: CommonJS one-offs that
+    // are never bundled into the site and don't need the Next lint rules.
+    "scripts/**",
   ]),
 ]);