OER-Forge/wikiaccess


WikiAccess 📚

Transform DokuWiki into Accessible WCAG 2.1 Compliant Documents

WikiAccess converts DokuWiki pages into accessible HTML and Word documents with comprehensive accessibility testing, image processing, and broken link detection.


🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Node.js & npm (for accessibility testing)
  • Pandoc 2.9+

Installation

# Clone and setup
git clone https://github.com/OER-Forge/wikiaccess.git
cd wikiaccess

python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -r requirements.txt
npm install pa11y

📋 Complete Test Workflow

Step 1: Create URLS.txt

Create a file named URLS.txt with one DokuWiki URL per line:

https://msuperl.org/wikis/pcubed/doku.php?id=183_notes:scalars_and_vectors
https://msuperl.org/wikis/pcubed/doku.php?id=183_notes:displacement_and_velocity
https://msuperl.org/wikis/pcubed/doku.php?id=183_notes:modeling_with_vpython
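
The URL list can also be inspected before conversion. The following is a minimal sketch, not WikiAccess's own loader: it assumes blank lines should be ignored and that every entry uses the doku.php?id=... form shown above. The helper names read_seed_urls and split_dokuwiki_url are hypothetical.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def read_seed_urls(path="URLS.txt"):
    """Return the non-empty lines of the URL list, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def split_dokuwiki_url(url):
    """Split a doku.php URL into (wiki base URL, page id).

    Raises ValueError when the URL carries no ?id= parameter.
    """
    parsed = urlparse(url)
    ids = parse_qs(parsed.query).get("id")
    if not ids:
        raise ValueError(f"no page id in {url!r}")
    # Drop the trailing /doku.php segment to recover the wiki base path.
    base_path = parsed.path.rsplit("/", 1)[0]
    return f"{parsed.scheme}://{parsed.netloc}{base_path}", ids[0]
```

For the first URL above, split_dokuwiki_url yields the base https://msuperl.org/wikis/pcubed and the page id 183_notes:scalars_and_vectors, the same pair that convert_wiki_page takes as wiki_url and page_name under Advanced Usage.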

Step 2: Convert Seed Pages

python3 convert_from_file_list.py

This will:

  • ✅ Convert all seed URLs to HTML, DOCX, and Markdown
  • 📥 Download all images with alt-text
  • 📊 Test accessibility (WCAG 2.1 AA/AAA)
  • 📁 Organize output in output/ directory
  • 🔍 Auto-discover pages referenced by broken links
  • 📊 Generate initial reports

Step 3: Review Discovered Pages

python3 review_discoveries.py

This will:

  • 📊 Show statistics on discovered pages
  • 🔗 List pages found from broken links
  • ⭐ Let you approve/reject each discovery
  • 💾 Save approved pages for conversion

To approve every discovered page without reviewing each one:

python3 review_discoveries.py --bulk-approve

Step 4: Convert Approved Discovered Pages

python3 convert_approved.py

This will:

  • ✅ Convert all approved discovered pages
  • 📥 Download their images
  • 📊 Test accessibility
  • 🔍 Auto-discover more pages (next depth)
  • 🔄 Update discovery status in database

Step 5: Verify Conversion & Regenerate Reports

python3 test_full_workflow.py

This will:

  • 📊 Show updated database statistics
  • 🔄 Regenerate all accessibility reports
  • 📈 Show discovery workflow progress
  • 📁 List all generated output files

Step 6: Repeat Discovery Cycle (Optional)

If more pages were discovered in Step 4, repeat Steps 3-5 until no new pages are found.
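
The repeat-until-done cycle can be scripted. This is only a sketch under stated assumptions: it uses --bulk-approve, so it skips the manual review in Step 3, and it relies on a caller-supplied count_pending function (hypothetical, not part of WikiAccess) to report how many discovered pages still await conversion. The max_rounds cap is likewise an assumption added to guard against an endless loop.

```python
import subprocess
import sys

# Steps 3-5 from this README; --bulk-approve replaces the interactive review.
CYCLE_COMMANDS = [
    [sys.executable, "review_discoveries.py", "--bulk-approve"],
    [sys.executable, "convert_approved.py"],
    [sys.executable, "test_full_workflow.py"],
]


def run_command(cmd):
    """Run one workflow script and raise if it exits with a non-zero code."""
    subprocess.run(cmd, check=True)


def repeat_discovery(count_pending, run=run_command, max_rounds=10):
    """Repeat Steps 3-5 until count_pending() returns 0 or max_rounds is hit.

    count_pending is a hypothetical callable supplied by the caller.
    Returns the number of rounds executed.
    """
    rounds = 0
    while rounds < max_rounds and count_pending() > 0:
        for cmd in CYCLE_COMMANDS:
            run(cmd)
        rounds += 1
    return rounds
```

Passing run as a parameter keeps the loop testable without launching the real scripts.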

Step 7: Analyze Final Broken Links

python3 test_broken_links.py

This will:

  • 🔗 Identify any remaining broken internal wiki links
  • 📊 Show conversion coverage
  • 💯 Report final statistics

📁 Output Structure

output/
├── html/                    # Accessible HTML pages
├── docx/                    # Microsoft Word documents
├── markdown/                # Editable Markdown sources
├── images/                  # Downloaded media assets
├── reports/                 # Accessibility compliance reports
│   ├── index.html          # Hub with all reports
│   ├── accessibility_report.html      # WCAG 2.1 scores
│   ├── image_report.html              # Image analysis
│   ├── broken_links_report.html       # Broken links
│   └── [page]_accessibility.html      # Per-page reports
└── conversion_history.db    # SQLite database with all metadata

🗂️ Database Features

WikiAccess tracks all conversions in SQLite:

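# Open the database in the sqlite3 shell (run this from your terminal),
# then enter the queries below at the sqlite> prompt
sqlite3 output/conversion_history.db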

-- Check page conversions
SELECT COUNT(*) FROM pages;

-- Check image downloads
SELECT status, COUNT(*) FROM images GROUP BY status;

-- Check link status
SELECT status, COUNT(*) FROM links GROUP BY status;
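
The same three queries can be run from Python with the standard sqlite3 module. This sketch assumes only what the queries above show: tables named pages, images, and links, with a status column on the last two. Everything else about the schema is unknown here, and the helper names are hypothetical.

```python
import sqlite3


def status_counts(conn, table):
    """Return {status: count} for a table that has a `status` column."""
    # Table names cannot be bound as SQL parameters, so allow only known ones.
    if table not in {"images", "links"}:
        raise ValueError(f"unexpected table: {table}")
    cursor = conn.execute(f"SELECT status, COUNT(*) FROM {table} GROUP BY status")
    return dict(cursor.fetchall())


def summarize(conn):
    """Collect the page count and per-status image and link counts."""
    return {
        "pages": conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0],
        "images": status_counts(conn, "images"),
        "links": status_counts(conn, "links"),
    }
```

A typical call would be summarize(sqlite3.connect("output/conversion_history.db")).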

📊 Output Formats

HTML

  • Semantic HTML5 structure
  • MathJax 3 equations
  • Responsive design
  • Dark mode support
  • Interactive navigation

Word (DOCX)

  • Native OMML equations
  • Embedded images
  • Accessibility metadata
  • Editable formatting
  • Print-friendly layout

Reports

  • Accessibility Dashboard: WCAG 2.1 AA/AAA scores
  • Image Report: Alt-text quality, download status, statistics
  • Broken Links Report: Missing page references
  • Individual Page Reports: Detailed accessibility issues per page

🎯 Key Features

♿ Accessibility Testing

  • WCAG 2.1 AA/AAA Compliance: Powered by pa11y
  • Comprehensive Scoring: 50+ accessibility rules
  • Interactive Reports: Click-through dashboards with fix recommendations
  • Progress Tracking: Historical trends and aggregate statistics
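
For a sense of what a pa11y check looks like outside WikiAccess, here is a minimal sketch that calls the pa11y CLI directly and tallies issues by type. It assumes pa11y was installed locally with npm (so npx can find it) and uses pa11y's JSON reporter. It does not reproduce WikiAccess's scoring, which this README does not specify; the function names are hypothetical.

```python
import json
import subprocess
from collections import Counter


def pa11y_command(target, standard="WCAG2AA"):
    """Build a pa11y CLI call that prints its issues as a JSON array."""
    return ["npx", "pa11y", "--standard", standard, "--reporter", "json", target]


def count_issue_types(report_json):
    """Count issues per type (for example "error") in pa11y's JSON output."""
    issues = json.loads(report_json)
    return Counter(issue["type"] for issue in issues)


def audit(target, standard="WCAG2AA"):
    """Run pa11y on a URL or local HTML file and return issue counts."""
    # pa11y exits with code 2 when it finds issues, so check=True is not used.
    proc = subprocess.run(
        pa11y_command(target, standard), capture_output=True, text=True
    )
    return count_issue_types(proc.stdout)
```

Calling audit("output/html/my_page.html", standard="WCAG2AAA") would run the stricter AAA ruleset on one converted page; my_page.html is a placeholder file name.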

🖼️ Image Processing

  • Auto-Download: Fetches all images from wiki
  • Alt-Text Extraction: Preserves accessibility metadata
  • YouTube Support: Auto-generates thumbnails
  • Status Tracking: Identifies failed downloads
  • Analytics: Reports image usage statistics
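
One common way to obtain a YouTube thumbnail is to build the image URL from the video id. The sketch below illustrates that general technique and is not necessarily how WikiAccess does it. It handles only the watch, embed, and youtu.be URL shapes, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Extract the video id from common YouTube URL shapes, or return None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in {"youtube.com", "m.youtube.com", "youtube-nocookie.com"}:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):
            return parsed.path.split("/")[2] or None
    return None


def youtube_thumbnail_url(url):
    """Return the static thumbnail URL for a video link, or None."""
    video_id = youtube_video_id(url)
    if video_id is None:
        return None
    return f"https://img.youtube.com/vi/{video_id}/hqdefault.jpg"
```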

🔗 Link Management

  • Internal Link Resolution: Converts wiki links to full URLs
  • Broken Link Detection: Identifies pages not yet converted
  • Link Analytics: Shows which pages are most referenced
  • Discovery Integration: Suggests missing pages for conversion
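
Internal link resolution can be pictured as follows. This is a simplified sketch rather than WikiAccess's implementation: it handles plain [[page]] and [[page|label]] links, approximates DokuWiki's id normalisation by lowercasing and replacing spaces with underscores, and ignores relative namespaces and interwiki shortcuts. The names resolve_link and find_links are hypothetical.

```python
import re
from urllib.parse import quote

# Matches [[target]] and [[target|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def resolve_link(target, wiki_url):
    """Return a full URL for an internal page id; leave absolute URLs alone."""
    target = target.strip()
    if SCHEME_RE.match(target):
        return target
    page_id, _, anchor = target.partition("#")
    # Rough approximation of DokuWiki's page id clean-up (an assumption).
    page_id = page_id.strip().lower().replace(" ", "_")
    url = f"{wiki_url.rstrip('/')}/doku.php?id={quote(page_id, safe=':_-.')}"
    return f"{url}#{anchor}" if anchor else url


def find_links(wikitext, wiki_url):
    """Yield (full URL, label) pairs for every [[...]] link in the text."""
    for match in LINK_RE.finditer(wikitext):
        target, label = match.group(1), match.group(2)
        yield resolve_link(target, wiki_url), (label or target).strip()
```

Comparing the resolved page ids against the set of pages already converted would then give the candidates for the broken links report.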

📊 Database Tracking

  • Conversion History: Complete audit trail
  • Incremental Updates: Skips already-converted pages
  • Batch Management: Track conversion runs
  • Statistics Export: CSV reports for stakeholders
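
If a custom CSV is needed, any query against the database can be written out with Python's csv module. This is a generic sketch and not the project's built-in export. The function name export_query_to_csv is hypothetical, and it assumes the caller passes a trusted SELECT statement.

```python
import csv
import sqlite3


def export_query_to_csv(conn, query, csv_path):
    """Write the rows of a SELECT statement to a CSV file with a header row."""
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is its name.
    headers = [column[0] for column in cursor.description]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        writer.writerows(cursor)
    return csv_path
```

For example, with conn = sqlite3.connect("output/conversion_history.db"), the query "SELECT status, COUNT(*) AS total FROM links GROUP BY status" would produce a two-column summary of link statuses.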

🔧 Advanced Usage

Convert Single Page

from wikiaccess import convert_wiki_page

result = convert_wiki_page(
    wiki_url="https://msuperl.org/wikis/pcubed",
    page_name="183_notes:scalars_and_vectors",
    output_dir="output"
)

print(f"HTML: {result['html_path']}")
print(f"WCAG AA Score: {result['aa_score']}%")

Edit & Re-Convert (No Re-Scraping)

# Edit markdown
nano output/markdown/my_page.md

# Re-convert without fetching from wiki
python3 convert_from_markdown.py output/markdown/my_page.md

Check Specific Page Accessibility

python3 -c "
from wikiaccess.database import ConversionDatabase
db = ConversionDatabase()
pages = db.get_all_pages_with_scores()
for p in pages:
    if 'scalars' in p['page_id']:
        print(f\"{p['page_id']}: AA={p['aa_score']}%, AAA={p['aaa_score']}%\")
"

📚 Documentation


🛠️ Technical Stack

  • Python: BeautifulSoup4, python-docx, Pillow, requests
  • Accessibility: pa11y engine (50+ WCAG rules)
  • Document Conversion: Pandoc
  • Database: SQLite3
  • Equations: LaTeX → MathJax (HTML) / OMML (Word)

📄 License

MIT License - see LICENSE file for details


Made with ❤️ for accessible education and documentation
