Transform DokuWiki into Accessible WCAG 2.1 Compliant Documents
WikiAccess converts DokuWiki pages into accessible HTML and Word documents with comprehensive accessibility testing, image processing, and broken link detection.
- Python 3.8+
- Node.js & npm (for accessibility testing)
- Pandoc 2.9+
# Clone and setup
git clone https://github.com/OER-Forge/wikiaccess.git
cd wikiaccess
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
npm install pa11y

Create a file named URLS.txt with one DokuWiki URL per line:
https://msuperl.org/wikis/pcubed/doku.php?id=183_notes:scalars_and_vectors
https://msuperl.org/wikis/pcubed/doku.php?id=183_notes:displacement_and_velocity
https://msuperl.org/wikis/pcubed/doku.php?id=183_notes:modeling_with_vpython
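
To make the URL format concrete, here is a minimal sketch of how a line of URLS.txt can be split into a wiki base URL and a page id. The helper names `parse_url_line` and `read_url_list` are hypothetical and not part of WikiAccess, and skipping blank lines and `#` comment lines is an assumption, not documented behavior.

```python
from pathlib import Path
from urllib.parse import urlparse, parse_qs


def parse_url_line(line):
    """Return (wiki_base, page_id) for a DokuWiki URL, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):  # assumption: such lines are ignored
        return None
    parsed = urlparse(line)
    page_ids = parse_qs(parsed.query).get("id")
    if not page_ids:
        raise ValueError(f"No 'id' query parameter in URL: {line}")
    # Drop the trailing /doku.php segment to recover the wiki base URL.
    base_path = parsed.path.rsplit("/", 1)[0]
    return f"{parsed.scheme}://{parsed.netloc}{base_path}", page_ids[0]


def read_url_list(path="URLS.txt"):
    """Parse every usable line of the URL list file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [entry for entry in map(parse_url_line, lines) if entry is not None]
```

For the first URL above this yields the base `https://msuperl.org/wikis/pcubed` and the page id `183_notes:scalars_and_vectors`.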
python3 convert_from_file_list.py

This will:
- Convert all seed URLs to HTML, DOCX, and Markdown
- Download all images with alt-text
- Test accessibility (WCAG 2.1 AA/AAA)
- Organize output in the output/ directory
- Auto-discover pages referenced by broken links
- Generate initial reports
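
For readers curious about the accessibility test in the list above, the following sketch shows one way to call the pa11y command line tool from Python and tally its findings. How WikiAccess invokes pa11y internally is not shown in this README, so the function names `run_pa11y` and `summarize_issues` are hypothetical, and the sketch assumes pa11y was installed locally with npm so that `npx` can find it.

```python
import json
import subprocess
from collections import Counter


def run_pa11y(target, standard="WCAG2AA"):
    """Run the pa11y CLI against a URL or local file and return its issue list."""
    completed = subprocess.run(
        ["npx", "pa11y", "--standard", standard, "--reporter", "json", target],
        capture_output=True,
        text=True,
        check=False,  # pa11y exits with code 2 when it finds issues
    )
    return json.loads(completed.stdout or "[]")


def summarize_issues(issues):
    """Count issues by their 'type' field (error, warning, notice)."""
    return dict(Counter(issue.get("type", "unknown") for issue in issues))
```

A call such as `summarize_issues(run_pa11y("output/html/my_page.html"))` would give a per-type count for a single page; turning those counts into a percentage score is a separate step that this sketch does not attempt.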
python3 review_discoveries.py

This will:
- Show statistics on discovered pages
- List pages found from broken links
- Let you approve/reject each discovery
- Save approved pages for conversion
To approve all discoveries at once:
python3 review_discoveries.py --bulk-approve

python3 convert_approved.py

This will:
- Convert all approved discovered pages
- Download their images
- Test accessibility
- Auto-discover more pages (next depth)
- Update discovery status in database
python3 test_full_workflow.py

This will:
- Show updated database statistics
- Regenerate all accessibility reports
- Show discovery workflow progress
- List all generated output files
If converting the approved pages (Step 4) discovered more pages, repeat Steps 3-5 (review discoveries, convert approved pages, regenerate reports) until no new pages are found.
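
The repeat-until-done loop described here is a breadth-first crawl by depth. The sketch below illustrates the idea only: `crawl_by_depth`, `get_links`, and `approve` are hypothetical stand-ins for the conversion, link extraction, and review scripts, not WikiAccess functions.

```python
def crawl_by_depth(seed_pages, get_links, approve=lambda page: True):
    """Convert pages level by level until no new pages are discovered.

    get_links(page) returns the page ids that `page` links to (a stand-in
    for conversion plus link extraction); approve(page) stands in for the
    manual review step.
    """
    converted = set()
    seen = set(seed_pages)
    frontier = list(seed_pages)
    rounds = 0
    while frontier:
        discovered = []
        for page in frontier:
            converted.add(page)
            for target in get_links(page):
                if target not in seen:
                    seen.add(target)
                    discovered.append(target)
        # Review step: only approved discoveries enter the next round.
        frontier = [page for page in discovered if approve(page)]
        rounds += 1
    return converted, rounds
```

Rejected pages stay in `seen`, so they are not proposed again in later rounds. The loop ends when a round discovers nothing new or nothing is approved.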
python3 test_broken_links.py

This will:
- Identify any remaining broken internal wiki links
- Show conversion coverage
- Show final statistics
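
As an illustration of what the broken-link check and coverage figure above measure, here is a small self-contained sketch. The function `link_report` is hypothetical, and "coverage" is assumed here to mean the share of known pages (converted or merely referenced) that have been converted; the actual script may define it differently.

```python
def link_report(converted_pages, links):
    """Summarize broken internal links and conversion coverage.

    links is an iterable of (source_page, target_page) pairs.
    """
    links = list(links)
    converted = set(converted_pages)
    # A link is broken when its target page has not been converted yet.
    broken = sorted({(src, dst) for src, dst in links if dst not in converted})
    known = {dst for _, dst in links} | converted
    coverage = 100.0 * len(converted) / len(known) if known else 100.0
    return {"broken": broken, "coverage_percent": round(coverage, 1)}
```

With two converted pages that both link to a third, unconverted page, the report lists two broken links and a coverage of two pages out of three.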
output/
├── html/                          # Accessible HTML pages
├── docx/                          # Microsoft Word documents
├── markdown/                      # Editable Markdown sources
├── images/                        # Downloaded media assets
├── reports/                       # Accessibility compliance reports
│   ├── index.html                 # Hub with all reports
│   ├── accessibility_report.html  # WCAG 2.1 scores
│   ├── image_report.html          # Image analysis
│   ├── broken_links_report.html   # Broken links
│   └── [page]_accessibility.html  # Per-page reports
└── conversion_history.db          # SQLite database with all metadata
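
A quick way to sanity-check a run against the layout above is to count the files in each folder. This is a minimal sketch using only the standard library; `inventory` is a hypothetical helper, and the folder names are taken from the tree shown above.

```python
from pathlib import Path


def inventory(output_dir="output"):
    """Count files in each expected subdirectory of the output tree."""
    root = Path(output_dir)
    counts = {}
    for name in ("html", "docx", "markdown", "images", "reports"):
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            counts[name] = 0  # folder missing: nothing was written there
    counts["database_present"] = (root / "conversion_history.db").is_file()
    return counts
```

A folder count of zero after a conversion run is a hint that the corresponding step did not produce output.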
WikiAccess tracks all conversions in SQLite:
-- View database stats
sqlite3 output/conversion_history.db
-- Check page conversions
SELECT COUNT(*) FROM pages;
-- Check image downloads
SELECT status, COUNT(*) FROM images GROUP BY status;
-- Check link status
SELECT status, COUNT(*) FROM links GROUP BY status;

- Semantic HTML5 structure
- MathJax 3 equations
- Responsive design
- Dark mode support
- Interactive navigation
- Native OMML equations
- Embedded images
- Accessibility metadata
- Editable formatting
- Print-friendly layout
- Accessibility Dashboard: WCAG 2.1 AA/AAA scores
- Image Report: Alt-text quality, download status, statistics
- Broken Links Report: Missing page references
- Individual Page Reports: Detailed accessibility issues per page
- WCAG 2.1 AA/AAA Compliance: Powered by pa11y
- Comprehensive Scoring: 50+ accessibility rules
- Interactive Reports: Click-through dashboards with fix recommendations
- Progress Tracking: Historical trends and aggregate statistics
- Auto-Download: Fetches all images from wiki
- Alt-Text Extraction: Preserves accessibility metadata
- YouTube Support: Auto-generates thumbnails
- Status Tracking: Identifies failed downloads
- Analytics: Reports image usage statistics
- Internal Link Resolution: Converts wiki links to full URLs
- Broken Link Detection: Identifies pages not yet converted
- Link Analytics: Shows which pages are most referenced
- Discovery Integration: Suggests missing pages for conversion
- Conversion History: Complete audit trail
- Incremental Updates: Skips already-converted pages
- Batch Management: Track conversion runs
- Statistics Export: CSV reports for stakeholders
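
The statistics export mentioned in the list above can be approximated with the standard library alone. The sketch below reuses the `status` column of the `images` and `links` tables from the SQL queries earlier; the function `status_counts_csv` and the CSV layout are hypothetical and not the project's own export format.

```python
import csv
import io
import sqlite3


def status_counts_csv(conn, table):
    """Return a CSV string of per-status row counts for `table`."""
    # Table names cannot be bound as SQL parameters, so restrict them explicitly.
    if table not in {"images", "links"}:
        raise ValueError(f"Unsupported table: {table}")
    rows = conn.execute(
        f"SELECT status, COUNT(*) FROM {table} GROUP BY status ORDER BY status"
    ).fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["status", "count"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Typical use would be `status_counts_csv(sqlite3.connect("output/conversion_history.db"), "images")`, writing the returned string to a file for sharing.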
from wikiaccess import convert_wiki_page

result = convert_wiki_page(
    wiki_url="https://msuperl.org/wikis/pcubed",
    page_name="183_notes:scalars_and_vectors",
    output_dir="output"
)
print(f"HTML: {result['html_path']}")
print(f"WCAG AA Score: {result['aa_score']}%")

# Edit markdown
nano output/markdown/my_page.md
# Re-convert without fetching from wiki
python3 convert_from_markdown.py output/markdown/my_page.md

python3 -c "
from wikiaccess.database import ConversionDatabase
db = ConversionDatabase()
pages = db.get_all_pages_with_scores()
for p in pages:
    if 'scalars' in p['page_id']:
        print(f\"{p['page_id']}: AA={p['aa_score']}%, AAA={p['aaa_score']}%\")
"

- DATABASE.md - Database schema and queries
- docs/MODULE_DOCUMENTATION.md - Full API reference
- docs/ACCESSIBILITY_SCORING.md - WCAG 2.1 details
- Python: BeautifulSoup4, python-docx, Pillow, requests
- Accessibility: pa11y engine (50+ WCAG rules)
- Document Conversion: Pandoc
- Database: SQLite3
- Equations: LaTeX → MathJax (HTML) / OMML (Word)
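
To show how the equation handling in the stack above maps onto Pandoc, here is a sketch that builds a Pandoc command for either target. The options WikiAccess actually passes are not documented in this README, so `pandoc_command` and `convert` are hypothetical helpers that use only standard Pandoc flags.

```python
import subprocess


def pandoc_command(markdown_path, output_path):
    """Build a pandoc command line for an HTML or DOCX target."""
    cmd = [
        "pandoc", str(markdown_path),
        "--from", "markdown",
        "--standalone",
        "--output", str(output_path),
    ]
    if str(output_path).endswith(".html"):
        # Render TeX math in the browser with MathJax.
        cmd.append("--mathjax")
    # For .docx targets pandoc writes TeX math as native Word equations (OMML).
    return cmd


def convert(markdown_path, output_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(markdown_path, output_path), check=True)
```

The `--mathjax` flag also accepts an optional URL, which is one way to pin a specific MathJax build.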
MIT License - see LICENSE file for details
Made with ❤️ for accessible education and documentation