
fix(route/cw): Deprecate obsolete puppeteer method. #21158

Open
dzx-dzx wants to merge 2 commits into DIYgod:master from dzx-dzx:cw-0215

Conversation

@dzx-dzx
Contributor

dzx-dzx commented Feb 15, 2026

Involved Issue

Close #

Example for the Proposed Route(s)

/cw/today

New RSS Route Checklist

  • New Route
  • Anti-bot or rate limit
    • If yes, does your code account for it?
  • Date and time
    • Parsed
    • Correct time zone
  • New package added
  • Puppeteer

Note

Copilot AI review requested due to automatic review settings February 15, 2026 14:53
@github-actions github-actions bot added the route label Feb 15, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the /cw/today route implementation to stop using the deprecated Puppeteer helper and instead use the newer getPuppeteerPage helper.

Changes:

  • Switch CW utils from browser.newPage() to getPuppeteerPage() for cookie acquisition and page rendering.
  • Refactor CW parsing helpers to avoid passing a Puppeteer browser into getCookie/parseItems.
  • Simplify /cw/today handler by removing manual browser creation/closure.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File | Description
lib/routes/cw/utils.ts | Migrates CW Puppeteer usage to getPuppeteerPage and adjusts helper function signatures.
lib/routes/cw/today.ts | Removes explicit browser lifecycle management and calls the updated parsePage.
Comments suppressed due to low confidence (1)

lib/routes/cw/utils.ts:59

  • getPuppeteerPage creates a new browser and only auto-closes it via a 30s timeout. Here the page is closed, but the browser is never closed/destroyed, which can leak resources under load. Destructure destory from getPuppeteerPage(...) and ensure it is called in a finally block after you finish using the page/UA.
    const { page, browser } = await getPuppeteerPage(pageUrl, { noGoto: true });
    await setCookies(page, cookie, 'cw.com.tw');
    await page.goto(pageUrl, {
        waitUntil: 'domcontentloaded',
    });

    await page.waitForSelector('.caption');
    const response = await page.evaluate(() => document.documentElement.innerHTML);
    await page.close();
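A minimal sketch of the cleanup this comment asks for, assuming only the { page, destory } shape the reviewer describes. fakeGetPage, openBrowsers and renderWithCleanup are hypothetical names, and nothing here launches Chromium: the "browser" is just a counter, so the snippet runs without Puppeteer.

```typescript
// Minimal stand-ins for what getPuppeteerPage hands back (hypothetical shapes).
interface FakePage {
    close(): Promise<void>;
    content(): Promise<string>;
}

interface PageHandle {
    page: FakePage;
    destory: () => Promise<void>; // spelled as in the review comment
}

// Counts simulated browsers that have been "launched" but not yet destroyed.
let openBrowsers = 0;

// Hypothetical replacement for getPuppeteerPage: "launches" a browser and returns a page.
async function fakeGetPage(html: string): Promise<PageHandle> {
    openBrowsers++;
    let closed = false;
    return {
        page: {
            close: async () => {},
            content: async () => html,
        },
        destory: async () => {
            if (!closed) {
                closed = true;
                openBrowsers--;
            }
        },
    };
}

// Render a page and always release the browser, even if parsing throws.
async function renderWithCleanup(html: string, parse: (doc: string) => string): Promise<string> {
    const { page, destory } = await fakeGetPage(html);
    try {
        const doc = await page.content();
        return parse(doc);
    } finally {
        await page.close();
        await destory();
    }
}
```

The finally block runs whether parsing succeeds or throws, so the simulated browser count returns to zero in both cases instead of waiting for a timeout.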


Comment on lines 49 to 55
logger.http(`Requesting ${pageUrl}`);
const cookie = await getCookie(cache.tryGet);
const { page, browser } = await getPuppeteerPage(pageUrl, { noGoto: true });
await setCookies(page, cookie, 'cw.com.tw');
await page.goto(pageUrl, {
    waitUntil: 'domcontentloaded',
});

Copilot AI Feb 15, 2026


The previous implementation intercepted requests and aborted non-document/script resources; switching to getPuppeteerPage without setting request interception can significantly increase bandwidth/latency for these pages. Consider using getPuppeteerPage(..., { onBeforeLoad }) to enable request interception (or otherwise block images/fonts/styles) before navigation to keep behavior/performance consistent.
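The interception rule the pre-PR code used (let document and script requests through, abort everything else) can be factored into a reusable listener, roughly as below. This is a sketch under assumptions: InterceptedRequest models only resourceType(), continue() and abort(), which Puppeteer's HTTPRequest provides, while makeResourceFilter and FakeRequest are hypothetical helpers so the snippet runs without a browser.

```typescript
// The subset of Puppeteer's HTTPRequest that the filter relies on.
interface InterceptedRequest {
    resourceType(): string;
    continue(): unknown;
    abort(): unknown;
}

// Build a 'request' listener that lets only the allowed resource types through.
function makeResourceFilter(allowed: readonly string[]) {
    const allow = new Set(allowed);
    return (request: InterceptedRequest): void => {
        if (allow.has(request.resourceType())) {
            request.continue();
        } else {
            request.abort();
        }
    };
}

// Hypothetical request object that records what the filter decided.
class FakeRequest implements InterceptedRequest {
    decision: 'continue' | 'abort' | null = null;
    private readonly type: string;
    constructor(type: string) {
        this.type = type;
    }
    resourceType(): string {
        return this.type;
    }
    continue(): void {
        this.decision = 'continue';
    }
    abort(): void {
        this.decision = 'abort';
    }
}

// Same allowlist as the pre-PR code in lib/routes/cw/utils.ts.
const legacyFilter = makeResourceFilter(['document', 'script']);
```

In the route, such a listener would be registered inside an onBeforeLoad callback after page.setRequestInterception(true), so it is active before navigation starts.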

Copilot uses AI. Check for mistakes.
Comment on lines 30 to 32
 async function handler(ctx) {
-    const browser = await puppeteer();
-
-    const { $, items } = await parsePage('today', browser, ctx);
-
-    await browser.close();
+    const { $, items } = await parsePage('today', ctx);


Copilot AI Feb 15, 2026


This file still imports puppeteer (top of file) but no longer uses it after switching to parsePage('today', ctx). Since ESLint enforces @typescript-eslint/no-unused-vars, remove the unused import to avoid CI/lint failures.

Comment on lines 35 to 40
cookie = await tryGet('cw:cookie', async () => {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.on('request', (request) => {
        request.resourceType() === 'document' || request.resourceType() === 'script' ? request.continue() : request.abort();
    });
    logger.http(`Requesting ${baseUrl}/user/get/cookie-bar`);
    await page.goto(`${baseUrl}/user/get/cookie-bar`, {
        waitUntil: 'domcontentloaded',
    });
    const { page } = (await getPuppeteerPage(`${baseUrl}/user/get/cookie-bar`));
    cookie = await getCookies(page);
    await page.close();
    return cookie;

Copilot AI Feb 15, 2026


getCookie uses getPuppeteerPage(...) but only closes the page; the underlying browser is left open until the helper's 30s timeout. Please also close/destroy the browser returned by getPuppeteerPage (e.g. via its destory function) to avoid accumulating Chromium processes on cache misses.
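The cache-miss concern can be illustrated with a toy cache-aside helper: the expensive work runs only when the key is missing, and its teardown sits in a finally block so nothing outlives the fetch. tryGetSketch is a simplified in-memory stand-in for cache.tryGet, and the launches/closes counters are hypothetical placeholders for starting a browser and calling destory().

```typescript
// Simplified in-memory stand-in for cache.tryGet: compute on miss, reuse on hit.
const store = new Map<string, string>();

async function tryGetSketch(key: string, compute: () => Promise<string>): Promise<string> {
    const hit = store.get(key);
    if (hit !== undefined) {
        return hit;
    }
    const value = await compute();
    store.set(key, value);
    return value;
}

// Counters standing in for Chromium processes being started and torn down.
let launches = 0;
let closes = 0;

// Hypothetical cookie fetch: "launch" a browser, read cookies, always tear it down.
async function fetchCookieOnMiss(): Promise<string> {
    return tryGetSketch('cw:cookie', async () => {
        launches++;
        try {
            return 'sessionid=abc'; // placeholder for getCookies(page)
        } finally {
            closes++; // placeholder for destory()
        }
    });
}
```

This toy helper does not deduplicate concurrent misses: two simultaneous first calls would each run the compute function.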

await page.goto(`${baseUrl}/user/get/cookie-bar`, {
    waitUntil: 'domcontentloaded',
});
const { page } = (await getPuppeteerPage(`${baseUrl}/user/get/cookie-bar`));

Copilot AI Feb 15, 2026


Previously this cookie-fetch step aborted non-essential resource types via request interception; the new getPuppeteerPage call will load all subresources by default. Consider setting up request interception via onBeforeLoad (or equivalent) to keep the cookie fetch lightweight.

Suggested change
-const { page } = (await getPuppeteerPage(`${baseUrl}/user/get/cookie-bar`));
+const { page } = await getPuppeteerPage(`${baseUrl}/user/get/cookie-bar`, {
+    onBeforeLoad: async (page) => {
+        await page.setRequestInterception(true);
+        page.on('request', (request) => {
+            const resourceType = request.resourceType();
+            if (resourceType === 'document' || resourceType === 'xhr' || resourceType === 'fetch') {
+                request.continue();
+            } else {
+                request.abort();
+            }
+        });
+    },
+});

Comment on lines +46 to +51
const parsePage = async (path, ctx) => {
    const pageUrl = `${baseUrl}${pathMap[path].pageUrl(ctx.req.param('channel'))}`;

    const cookie = await getCookie(browser, cache.tryGet);
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.on('request', (request) => {
        request.resourceType() === 'document' || request.resourceType() === 'script' ? request.continue() : request.abort();
    });
    await setCookies(page, cookie, 'cw.com.tw');
    logger.http(`Requesting ${pageUrl}`);
    const cookie = await getCookie(cache.tryGet);
    const { page, browser } = await getPuppeteerPage(pageUrl, { noGoto: true });

Copilot AI Feb 15, 2026


parsePage signature was changed to (path, ctx), but other CW route handlers (e.g. lib/routes/cw/author.ts, master.ts, sub.ts) still call parsePage(path, browser, ctx). This will cause TypeScript compile failures; either update those call sites to the new signature, or keep backward compatibility (e.g. accept an optional browser parameter).
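The backward-compatible option the reviewer mentions could look roughly like the sketch below: parsePage keeps the new (path, ctx) form but still accepts the legacy (path, browser, ctx) call and ignores the browser. LegacyBrowser, Ctx, the channel field and the string return value are hypothetical placeholders, not the route's real types.

```typescript
// Placeholder types; the real route uses Puppeteer's Browser and the framework's context.
interface LegacyBrowser {
    newPage(): unknown;
}
interface Ctx {
    channel?: string;
}

// Overloads: the new two-argument form and the legacy three-argument form.
function parsePage(path: string, ctx: Ctx): Promise<string>;
function parsePage(path: string, browser: LegacyBrowser, ctx: Ctx): Promise<string>;
async function parsePage(path: string, second: Ctx | LegacyBrowser, third?: Ctx): Promise<string> {
    // A third argument means the caller used the legacy form; the browser is ignored
    // because the page is now obtained inside the helper.
    const ctx: Ctx = third ?? (second as Ctx);
    return `${path}:${ctx.channel ?? 'none'}`;
}
```

Updating author.ts, master.ts and sub.ts to call parsePage(path, ctx) avoids the overloads entirely and is likely the simpler fix; the overloads only help if other code still passes a browser.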

@github-actions
Contributor

Successfully generated as follows:

http://localhost:1200/cw/today - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
FetchError: [GET] "https://www.cw.com.tw/article/5139772": 403 Forbidden
Route: /cw/today
Full Route: /cw/today
Node Version: v24.13.1
Git Hash: 11ff3131

@github-actions github-actions bot added the auto: not ready to review Users can't get the RSS feed output according to automated testing results label Feb 15, 2026
@github-actions
Contributor

github-actions bot commented Feb 15, 2026

Auto Review

  • [Rule 51] lib/routes/cw/utils.ts:82-83: Unnecessary await inside map() function. The async/await wrapper is redundant since Promise.all() already handles promise resolution. Suggested fix: Keep it as list.map((item) => tryGet(item.link, async () => {...})) without the outer async/await.

  • [Code Quality] lib/routes/cw/today.ts:2: The import puppeteer is now unused after the handler refactoring. Suggested fix: Remove the unused import statement.
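Rule 51 concerns returning the promise directly from the map callback. Both forms below give the same result once passed to Promise.all; the second simply drops the redundant async wrapper. tryGet here is a hypothetical stand-in that ignores its key and runs the loader, and loadItem is a placeholder that derives a title instead of fetching an article.

```typescript
interface Item {
    link: string;
    title?: string;
}

// Hypothetical stand-in for cache.tryGet: ignores the key and just runs the loader.
function tryGet<T>(_key: string, load: () => Promise<T>): Promise<T> {
    return load();
}

// Placeholder loader: derives a title from the link instead of fetching the article.
const loadItem = async (item: Item) => ({ ...item, title: item.link.slice(-1) });

const list: Item[] = [{ link: 'https://example.com/a' }, { link: 'https://example.com/b' }];

// Form flagged by Rule 51: the async/await wrapper only re-wraps the same promise.
const withWrapper = () => Promise.all(list.map(async (item) => await tryGet(item.link, () => loadItem(item))));

// Suggested form: return the promise from tryGet directly.
const direct = () => Promise.all(list.map((item) => tryGet(item.link, () => loadItem(item))));
```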

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@github-actions github-actions bot removed the auto: not ready to review Users can't get the RSS feed output according to automated testing results label Feb 15, 2026
@github-actions
Contributor

Successfully generated as follows:

http://localhost:1200/cw/today - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message:
FetchError: [GET] "https://www.cw.com.tw/article/5139772": 403 Forbidden
Route: /cw/today
Full Route: /cw/today
Node Version: v24.13.1
Git Hash: d3c976cb

@github-actions github-actions bot added the auto: not ready to review Users can't get the RSS feed output according to automated testing results label Feb 15, 2026
@github-actions
Contributor

This PR is stale because it has been open for more than 3 weeks with no activity. Comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale No feedback from OP label Mar 10, 2026

Labels

  • auto: not ready to review (Users can't get the RSS feed output according to automated testing results)
  • route
  • Stale (No feedback from OP)


2 participants