Latex Docs Build Broken #10314


Closed
FoamyGuy opened this issue May 6, 2025 · 1 comment · Fixed by #10322

Comments

@FoamyGuy
Collaborator

FoamyGuy commented May 6, 2025

Here is an example of an action that failed due to this issue: https://github.com/adafruit/circuitpython/actions/runs/14861943022/job/41734534030?pr=10309#step:10:2610

I believe the root cause is an anti-scraping measure implemented by the server that hosts one of the SVG files downloaded as part of the PDF docs build.

The error trace in the Actions log above notes an error parsing an SVG file. I ran a LaTeX build locally and found that, in place of that SVG file, there is actually an HTML file:

<!doctype html><html lang="en"><head><title>Making sure you&#39;re not a bot!</title><link rel="stylesheet" href="/.within.website/x/xess/xess.min.css?cachebuster=1.17.1"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="robots" content="noindex,nofollow"><style>
        body,
        html {
            height: 100%;
            display: flex;
            justify-content: center;
            align-items: center;
            margin-left: auto;
            margin-right: auto;
        }

        .centered-div {
            text-align: center;
        }

        #status {
            font-variant-numeric: tabular-nums;
        }

        #progress {
            display: none;
            width: min(20rem, 90%);
            height: 2rem;
            border-radius: 1rem;
            overflow: hidden;
            margin: 1rem 0 2rem;
            outline-color: #b16286;
            outline-offset: 2px;
            outline-style: solid;
            outline-width: 4px;
        }

        .bar-inner {
            background-color: #b16286;
            height: 100%;
            width: 0;
            transition: width 0.25s ease-in;
        }
    </style><script id="anubis_version" type="application/json">"1.17.1"
</script><script id="anubis_challenge" type="application/json">{"challenge":"974fcba5abc68da6dbfaf08d55f7a505964aee2399a5a0597c5a1963982d830d","rules":{"difficulty":4,"report_as":4,"algorithm":"fast"}}
</script><script id="anubis_base_prefix" type="application/json">""
</script></head><body id="top"><main><center><h1 id="title" class=".centered-div">Making sure you&#39;re not a bot!</h1></center><div class="centered-div"><img id="image" style="width:100%;max-width:256px;" src="/.within.website/x/cmd/anubis/static/img/pensive.webp?cacheBuster=1.17.1"> <img style="display:none;" style="width:100%;max-width:256px;" src="/.within.website/x/cmd/anubis/static/img/happy.webp?cacheBuster=1.17.1"><p id="status">Loading...</p><script async type="module" src="/.within.website/x/cmd/anubis/static/js/main.mjs?cacheBuster=1.17.1"></script><div id="progress" role="progressbar" aria-labelledby="status"><div class="bar-inner"></div></div><details><summary>Why am I seeing this?</summary><p>You are seeing this because the administrator of this website has set up <a href="https://github.com/TecharoHQ/anubis">Anubis</a> to protect the server against the scourge of <a href="https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/">AI companies aggressively scraping websites</a>. This can and does cause downtime for the websites, which makes their resources inaccessible for everyone.</p><p>Anubis is a compromise. Anubis uses a <a href="https://anubis.techaro.lol/docs/design/why-proof-of-work">Proof-of-Work</a> scheme in the vein of <a href="https://en.wikipedia.org/wiki/Hashcash">Hashcash</a>, a proposed proof-of-work scheme for reducing email spam. 
The idea is that at individual scales the additional load is ignorable, but at mass scraper levels it adds up and makes scraping much more expensive.</p><p>Ultimately, this is a hack whose real purpose is to give a "good enough" placeholder solution so that more time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering) so that the challenge proof of work page doesn't need to be presented to users that are much more likely to be legitimate.</p><p>Please note that Anubis requires the use of modern JavaScript features that plugins like <a href="https://jshelter.org/">JShelter</a> will disable. Please disable JShelter or other such plugins for this domain.</p></details><noscript><p>Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have changed the social contract around how website hosting works. A no-JS solution is a work-in-progress.</p></noscript><div id="testarea"></div></div><footer><center><p>Protected by <a href="https://github.com/TecharoHQ/anubis">Anubis</a> from <a href="https://techaro.lol">Techaro</a>. Made with ❤️ in 🇨🇦.</p><p>Mascot design by <a href="https://bsky.app/profile/celphase.bsky.social">CELPHASE</a>.</p></center></footer></main></body></html>

This is, of course, not a valid SVG, which is consistent with the error.
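For context, the `anubis_challenge` JSON embedded in the page above describes a Hashcash-style proof of work: the browser must find a nonce such that a SHA-256 digest derived from the challenge string meets the stated difficulty. The sketch below illustrates that general scheme only; it is not Anubis's exact algorithm, which I have not verified:

```python
import hashlib
import itertools

def solve_pow(challenge: str, difficulty: int) -> int:
    """Find a nonce so that sha256(challenge + nonce) starts with
    `difficulty` zero hex digits (Hashcash-style proof of work).
    Illustrative only; not necessarily Anubis's exact construction."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify_pow(challenge: str, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash, which is the point of the scheme:
    cheap to check, increasingly expensive to solve as difficulty grows."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

At difficulty 4 (as in the challenge above), solving takes on the order of 16^4 ≈ 65k hash attempts on average, which is negligible for one human visitor but adds up for a mass scraper.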

I searched the repo for "svg-badge.svg" and found that the only references are in this URL: https://hosted.weblate.org/widgets/circuitpython/-/svg-badge.svg. I loaded that URL in Firefox and, sure enough, for a few seconds I saw a "verifying you're human" page, presumably the same one as the HTML above. After a few seconds I was automatically forwarded to the SVG badge.

It seems that something within the Sphinx build / LaTeX PDF generation is trying to fetch this SVG, presumably to include it in the PDF, but the server is returning this "are you a human" challenge page instead of the SVG.

I'm not really sure how we resolve that, though. The HTML notes that they are trying to combat scraping by AI companies, which we are not doing. But we are making automated requests to their server for this file. If they are going to keep blocking all automated traffic, I don't see a way for us to resolve this: we need to fetch the file, and the thing fetching it will always be automated traffic, since the request comes from inside the Sphinx / LaTeX build somewhere rather than from a human in a browser.
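One small mitigation, independent of the server's policy, would be to fail fast with a clear message when the downloaded "SVG" is actually an HTML interstitial, instead of letting LaTeX choke on it later. A heuristic sketch (this helper is hypothetical, not part of the CircuitPython build tooling):

```python
def looks_like_svg(body: bytes) -> bool:
    """Heuristic check that a downloaded payload is SVG rather than an
    HTML interstitial such as the Anubis challenge page. Illustrative
    only; a hypothetical helper, not existing build code."""
    head = body.lstrip()[:512].lower()
    # The challenge page starts with an HTML doctype, never an SVG root.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return False
    # Real badges start with an <svg> root, possibly after an XML prolog.
    return b"<svg" in head
```

A check like this would at least turn a confusing LaTeX parse error into an actionable "server returned a challenge page instead of the badge" failure.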

@tannewt tannewt added the bug label May 7, 2025
@tannewt tannewt added this to the 10.0.0 milestone May 7, 2025
@tannewt
Member

tannewt commented May 7, 2025

Maybe there is a way to skip adding the badge to the PDF. It doesn't make much sense there anyway.
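If the badge is pulled in via an `image` directive in the docs source, one way to do this is Sphinx's `only` directive, which restricts content to particular builders. A sketch, assuming the badge lives in an RST file (the `:target:` and `:alt:` values here are illustrative, and the actual location in the docs source would need to be confirmed):

```rst
.. only:: html

   .. image:: https://hosted.weblate.org/widgets/circuitpython/-/svg-badge.svg
      :target: https://hosted.weblate.org/engage/circuitpython/
      :alt: Weblate translation status
```

With the image excluded for non-HTML builders, the LaTeX build should never attempt to fetch the badge, sidestepping the challenge page entirely.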
