Thanks for making this useful library!
Where can I find the docs? I'd like to see how to use it with each data source.
Btw, it lists web scraping, but what about reading .htm/.html files or images from disk?
And what about formats that are closely related to .htm files, like .chm and .epub?
In my use case, I need to ingest a large number of .htm files from disk, plus .chm files, images, and PDF files whose schematics and tables are embedded as images. I want to convert all of these into vector embeddings, turning each image into alt text, or into a Markdown table if it contains a table.
I'm also curious: can it read DjVu files (which are similar to scanned PDFs)?
Thanks!