Open
Labels
bug: Something isn't working
Description
When using multiple providers (via `providers.all()` or an array), `combineResults` in `src/archive.ts:115-155` merges all pages and sorts by timestamp, but doesn't deduplicate.
The same URL archived at the same time can appear from both Wayback and CommonCrawl (they share CDX-based data). The result array ends up with near-identical entries that only differ in `_meta.provider`.
This matters when users rely on `.pages.length` or iterate over results: they count or process the same snapshot twice.
A reasonable dedup key would be `url + timestamp` (or `url + snapshot`), keeping the first occurrence per provider ordering. Could also be opt-in via an option if preserving all provider entries is wanted in some cases.
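A rough sketch of the `url + timestamp` approach, in case it helps the discussion. The `ArchivedPage` shape, the `dedupePages` name, the provider strings, and the timestamp format are all assumptions for illustration; the actual types in `src/archive.ts` may differ. It assumes the input is already in provider order, so the first occurrence of each key is the one kept.

```typescript
// Hypothetical page shape; the real type in src/archive.ts may differ.
interface ArchivedPage {
  url: string;
  timestamp: string;
  _meta: { provider: string };
}

// Assumed helper name, not part of the library's API.
// Keeps the first page seen for each (url, timestamp) pair and
// preserves the input order otherwise.
function dedupePages(pages: ArchivedPage[]): ArchivedPage[] {
  const seen = new Set<string>();
  const unique: ArchivedPage[] = [];
  for (const page of pages) {
    // JSON-encode the pair so a URL containing the separator
    // cannot collide with a different (url, timestamp) pair.
    const key = JSON.stringify([page.url, page.timestamp]);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(page);
  }
  return unique;
}

// Example input: the first two entries differ only in _meta.provider.
const merged: ArchivedPage[] = [
  { url: "https://example.com/", timestamp: "20240101000000", _meta: { provider: "wayback" } },
  { url: "https://example.com/", timestamp: "20240101000000", _meta: { provider: "commoncrawl" } },
  { url: "https://example.com/", timestamp: "20240201000000", _meta: { provider: "wayback" } },
];

const deduped = dedupePages(merged);
```

Here `deduped` has two entries, and the duplicate snapshot keeps its `wayback` entry because that one comes first. If this were made opt-in, `combineResults` could call a helper like this only when the option is set, leaving the current behavior as the default.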