Please add site https://buntls.com #1646

Open
Tenome opened this issue Jan 23, 2025 · 1 comment

Tenome commented Jan 23, 2025

Provide URL for web page that contains Table of Contents (list of chapters) of a typical story on the site

https://buntls.com/story/artist-who-paints-dungeon/

Did you try using the Default Parser for the site? If not, why not?

It looks like the website keeps the text scrambled until the page is loaded, so the text comes out jumbled when you try to scrape it. This is probably also why the website is so painfully slow...

What settings did you use? What didn't work?

  • URL of first chapter
    https://buntls.com/chapter/chapter-1-%f0%9f%96%bc/
  • CSS selector for element holding content to put into EPUB
    #chapter-content
  • CSS selector for element holding Title of Chapter
    #paragraph-0
  • CSS selector for element(s) to remove

dteviot (Owner) commented Jan 26, 2025

@Tenome
You can use EpubEditor https://github.com/dteviot/EpubEditor with a script like this to decrypt the epub once you create it.

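// Substitution table: each character of "crypt" (as it appears in the Epub)
// maps to the character at the same position in "clear" (the real text).
// Both strings must stay the same length; the spaces sit at the same positions
// in both, so they only keep the groups aligned and map to themselves.
let decryptTable = new Map();
let crypt = "abcde fghij klmno pqrst uvwxyz ABCDE FGHIJ KLMNO PQRST UVWXYZ";
let clear = "tonqu erzla wicvf jpsyh gdmkbx JKABR UDQZC THFVL IWNEY PSXGOM";
for(let i = 0; i < crypt.length; ++i) {
    decryptTable.set(crypt[i], clear[i]);
}
// Characters not in the table (digits, punctuation, etc.) are left unchanged.
let decryptChar = c => decryptTable.get(c) ?? c;
let decryptString = cypherText => cypherText.split("").map(c => decryptChar(c)).join("");
// Decrypt every paragraph of the chapter. Note that assigning textContent
// replaces any inline markup inside the <p> (e.g. <i>, <b>) with plain text.
for(let e of dom.querySelectorAll("p")) {
    e.textContent = decryptString(e.textContent);
}
return true;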
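To try out a candidate "clear" string before running it over the whole Epub, the same substitution can be pulled out into a plain function that runs in Node or a browser console, without the `dom` object the EpubEditor script works on. This is only a sketch and not part of EpubEditor: `makeDecryptor` is a made-up name, and the length check is an addition that catches the easy mistake of adding or dropping a character in one of the two strings, which would shift every mapping after it.

```javascript
// Hypothetical helper (not part of EpubEditor): same table-building logic as
// the script above, returned as a function so it can be tested on a sample.
function makeDecryptor(crypt, clear) {
  // A length mismatch would silently misalign every later pair, so refuse it.
  if (crypt.length !== clear.length) {
    throw new Error(`crypt has ${crypt.length} chars but clear has ${clear.length}`);
  }
  const table = new Map();
  for (let i = 0; i < crypt.length; ++i) {
    table.set(crypt[i], clear[i]);
  }
  // Characters missing from the table (punctuation, digits) pass through unchanged.
  return text => Array.from(text, c => table.get(c) ?? c).join("");
}

const crypt = "abcde fghij klmno pqrst uvwxyz ABCDE FGHIJ KLMNO PQRST UVWXYZ";
const clear = "tonqu erzla wicvf jpsyh gdmkbx JKABR UDQZC THFVL IWNEY PSXGOM";
const decrypt = makeDecryptor(crypt, clear);

console.log(decrypt("abc"));       // prints "ton"
console.log(decrypt("Mhxozldnh")); // prints "Fzkfxiqvz"
```

With the table from the script, the first word of the example chapter further down decodes to "Fzkfxiqvz" instead of "Meanwhile", which fits the note that the site seems to use more than one substitution.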

Please note: you'll probably need to work out the correct values for the "clear" string yourself, because it looks like the site might be using a number of different character substitutions.
Basically, the steps are:

  1. Create an Epub.
  2. Open a chapter in the Epub.
  3. Open the same chapter on the web site.

Compare the same string in both, and you can start to fill in the missing values.
For example, in the Epub:

Mhxozldnh, Grnnhgcdro hmenrphht zlr uhghdwhb clh txmh rvvdgdxn orcdgh xt Glx Xux zhuh whup ehuenhshb.

On the web site:

Meanwhile, Collection employees who received the same official notice as Cha Ara were very perplexed.

So, comparing the two, we can see that:
M -> M
h -> e
x -> a
G -> C
X -> A

etc.
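Reading the pairs off by hand works, but when you have exactly the same sentence from both places the comparison can also be automated. A rough sketch, assuming the two samples line up character for character (same spacing and punctuation); `inferMapping` is a hypothetical helper, not something EpubEditor provides. It also flags any Epub letter that appears to stand for two different real letters, which would mean the samples are misaligned or the passage mixes substitutions.

```javascript
// Hypothetical helper (not part of EpubEditor): learn crypt -> clear pairs
// from the same sentence copied from the Epub and from the web site.
function inferMapping(cipherText, clearText) {
  const cipher = Array.from(cipherText);
  const plain = Array.from(clearText);
  // Each scrambled letter replaces exactly one real letter, so lengths must match.
  if (cipher.length !== plain.length) {
    throw new Error("samples differ in length; copy exactly the same sentence");
  }
  const mapping = new Map();
  const conflicts = [];
  for (let i = 0; i < cipher.length; ++i) {
    const c = cipher[i];
    const p = plain[i];
    // Only ASCII letters are in the "crypt" table; spaces and punctuation are skipped.
    if (!/[A-Za-z]/.test(c)) continue;
    const known = mapping.get(c);
    if (known === undefined) {
      mapping.set(c, p);
    } else if (known !== p) {
      // The same Epub letter stands for two different real letters here.
      conflicts.push({ index: i, cipherChar: c, first: known, now: p });
    }
  }
  return { mapping, conflicts };
}

// The example sentence from this comment.
const epub = "Mhxozldnh, Grnnhgcdro hmenrphht zlr uhghdwhb clh txmh rvvdgdxn orcdgh xt Glx Xux zhuh whup ehuenhshb.";
const site = "Meanwhile, Collection employees who received the same official notice as Cha Ara were very perplexed.";
const { mapping, conflicts } = inferMapping(epub, site);

console.log(mapping.get("h"), mapping.get("x"), mapping.get("G")); // prints "e a C"
console.log(conflicts.length); // prints 0
```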
So, for each pair, look up the Epub character in the "crypt" string and change the character at the same position in the "clear" string.
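That lookup-and-replace step can be sketched the same way. `applyMapping` is again a hypothetical helper: it takes the learned pairs as a `Map` (Epub letter to real letter), overwrites the matching positions of "clear", and lists any real letter that now appears twice. Assuming the site uses a one-to-one letter substitution, a finished table uses each letter once, so duplicates point at entries still left over from the old table; they are normal while the table is only partly filled in.

```javascript
// Hypothetical helper (not part of EpubEditor): write learned pairs into "clear".
function applyMapping(crypt, clear, mapping) {
  const out = Array.from(clear);
  for (const [cipherChar, clearChar] of mapping) {
    const i = crypt.indexOf(cipherChar);
    if (i === -1) continue; // not a character the table covers
    out[i] = clearChar;     // same position in "clear" as in "crypt"
  }
  // Collect real letters that are now assigned to more than one Epub letter.
  const seen = new Set();
  const duplicates = new Set();
  for (const ch of out) {
    if (ch === " ") continue; // group separators, not mappings
    if (seen.has(ch)) duplicates.add(ch);
    else seen.add(ch);
  }
  return { clear: out.join(""), duplicates: [...duplicates] };
}

const crypt = "abcde fghij klmno pqrst uvwxyz ABCDE FGHIJ KLMNO PQRST UVWXYZ";
const clear = "tonqu erzla wicvf jpsyh gdmkbx JKABR UDQZC THFVL IWNEY PSXGOM";
// Pairs read off the example sentence above.
const learned = new Map([["M", "M"], ["h", "e"], ["x", "a"], ["G", "C"], ["X", "A"]]);
const result = applyMapping(crypt, clear, learned);

console.log(result.clear);
// prints "tonqu erela wicvf jpsyh gdmabx JKABR UCQZC THMVL IWNEY PSXAOM"
console.log(result.duplicates); // letters still used twice, left to fix
```

The updated string can be pasted back into the "clear" line of the EpubEditor script; repeating this with more sample sentences should shrink the duplicates list.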
