Stratified validation reporting #6

dosumis · 2025-11-06T12:53:14Z

No description provided.

Phase 2 web scraping

…lit_agent into stratified-validation-reporting

- Revert the validation demo to a 100-URL sample for cost control (the full 377-URL run cost 2.36x more than expected)
- Fix the unit test assertion to expect PDF_EXTRACTION instead of API_LOOKUP for PDFExtractor
- Keep the benefits of stratified validation reporting while keeping performance manageable

The full 377-URL corpus proved much more complex than the sampled URLs:
- More PDFs requiring LLM extraction
- More web scraping fallbacks for failed Phase 1 extractions
- Research data repositories and paywalled content

100-URL sampling provides statistically valid stratified analysis without the performance penalty.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
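
The commit keeps stratified reporting while shrinking the sample to 100 URLs. A minimal sketch of proportional stratified sampling, assuming each URL is already tagged with a stratum label; the `stratified_sample` helper and the stratum names are hypothetical, not lit_agent's code. Quotas are floored and the leftover slots go to the strata with the largest remainders, so the sample size is exactly `n`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_sample(
    items: Sequence[Tuple[str, str]], n: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Draw n (url, stratum) pairs with proportional allocation per stratum.

    Hypothetical helper: strata could be e.g. "pdf", "html", "repository".
    """
    if n > len(items):
        raise ValueError("sample size exceeds corpus size")

    # Group the corpus by stratum label.
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for url, stratum in items:
        groups[stratum].append((url, stratum))

    total = len(items)
    # Exact proportional quotas, floored; leftover slots go to the largest remainders.
    quotas = {s: n * len(g) / total for s, g in groups.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1

    # Seeded RNG so a validation run can be reproduced.
    rng = random.Random(seed)
    sample: List[Tuple[str, str]] = []
    for s in sorted(groups):  # sorted for a deterministic stratum order
        sample.extend(rng.sample(groups[s], alloc[s]))
    return sample
```

For example, a hypothetical 377-URL corpus of 300 HTML pages and 77 PDFs yields 80 HTML URLs and 20 PDFs at `n=100`, preserving the PDF share that drove the extra cost.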

- Add a domain blacklist for 8 problematic publishers (ScienceDirect, MDPI, Oxford Academic, etc.)
- Implement a two-stage web search: a general search plus a targeted PubMed search
- Add a fallback system based on URL fragment extraction and metadata parsing
- Integrate NCBI E-utilities to retrieve the full identifier set (PMID, DOI, PMC)
- Give the 66 URLs that previously failed on blocked domains a route to success
- Update the validation demo path to validation_workspace/demo_reports

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
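
The fallback path described in this commit (blocked-domain check, identifier recovery from URL fragments, NCBI E-utilities lookup) can be sketched as follows. This is a minimal illustration under stated assumptions, not lit_agent's implementation: the domain set covers only the three named publishers, `academic.oup.com` is an assumed host for Oxford Academic, and the regexes and helper names are hypothetical. The E-utilities `esummary` endpoint is real; the sketch only builds the request URL and makes no network call.

```python
import re
from typing import Dict
from urllib.parse import unquote, urlencode, urlparse

# Hypothetical, partial blacklist: only 3 of the 8 publishers are named in the commit.
# "academic.oup.com" is an assumed host for Oxford Academic.
BLOCKED_DOMAINS = {"sciencedirect.com", "mdpi.com", "academic.oup.com"}

# Hypothetical patterns for identifiers that often survive in URL paths.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s?#]+")
PMID_RE = re.compile(r"pubmed\.ncbi\.nlm\.nih\.gov/(\d+)")
PMC_RE = re.compile(r"\b(PMC\d+)\b")

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blacklisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def ids_from_url(url: str) -> Dict[str, str]:
    """Recover DOI / PMID / PMC identifiers embedded in a URL, without fetching it."""
    text = unquote(url)
    ids: Dict[str, str] = {}
    doi = DOI_RE.search(text)
    if doi:
        ids["doi"] = doi.group(0).rstrip("/.")
    pmid = PMID_RE.search(text)
    if pmid:
        ids["pmid"] = pmid.group(1)
    pmc = PMC_RE.search(text)
    if pmc:
        ids["pmc"] = pmc.group(1)
    return ids


def esummary_url(pmid: str) -> str:
    """Build an NCBI E-utilities esummary request for one PubMed ID (JSON output)."""
    return ESUMMARY + "?" + urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
```

In this sketch a blocked URL is never scraped; whatever identifiers its path exposes are passed on to a metadata lookup instead, and a recovered PMID can be expanded to the other identifiers through `esummary`.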

Removed Python 3.9 support

dosumis and others added 5 commits November 6, 2025 12:13

Merge pull request #5 from dosumis/phase-2-web-scraping

adc6bfb

Phase 2 web scraping

Merge branch 'stratified-validation-reporting' of github.com:dosumis/…

48cb7ec

…lit_agent into stratified-validation-reporting

Update test.yml

d1cb097

Removed Python 3.9 support

dosumis merged commit 08afc09 into main Nov 7, 2025
4 checks passed

dosumis deleted the stratified-validation-reporting branch November 7, 2025 06:52

2 participants
