Question: Use directly ZIM files? #42
Comments
This could be very useful because, with an appropriate client, I think local ZIM files (fetched from IPFS) can be searched, etc. |
cross-post from derhuerst/build-wikipedia-feed#2 (comment) :
|
@lidel Would be really interested to refresh that ticket. Let me know if you want to discuss this. Pretty motivated to help this project go forward. |
Potential path to using ZIM directly

Publishing ZIM on IPFS

Something we could start doing today, is publishing

@kelson42 Is this something Kiwix would be interested in trying to do?

Web-based reader

This one is a long shot, but it opens exciting possibilities: if a ZIM reader in pure JS existed, a web browser would be the only software required to browse a distributed mirror. The reader would be just a set of static HTML+CSS+JS files published on IPFS along with the ZIM archives, making it a self-contained proposition.

While it would be possible to read ZIM from an HTTP Gateway via range requests, a more decentralized option would be to run an embedded JS IPFS node on the page and request specific byte ranges via something like ipfs.cat('/ipfs/QmHash/wiki.zim', { offset: x, length: y })

Question: how feasible is this from the perspective of existing ZIM readers?

Concerns / Unknowns

I am new to this endeavor, but from quick eyeballing it looks like ZIM is a flat file optimized for random seeks within it. At some level it is similar to files put on IPFS: they get chunked and the produced blocks are assembled into balanced trees (Merkle-DAGs) optimized for random access. I am not sure what performance would look like if we put ZIM on IPFS and tried to fetch it over the network, but we can experiment with this.

Data deduplication (dedicated chunker for ZIM?)

A potential problem with using ZIM directly is deduplication. When we put an unpacked mirror on IPFS, a lot of data does not change between snapshots. All media assets such as images, audio files, etc. get deduplicated across the entire IPFS swarm (all snapshots and all websites using the same image are cohosting it). iiuc ZIM does internal compression of Clusters (>1MB) of data, which means each ZIM file is a different stream of bytes, defeating the deduplication provided by IPFS.

My understanding is that good deduplication is not possible unless ZIM Cluster compression is deterministic across snapshots (it always compresses the same assets, and compressing the same assets produces exactly the same array of bytes) AND/OR we add ZIM to IPFS using a custom chunker that is aware of its internal structure, enabling deduplication of the same content across snapshots. This could also be a neat demo of what is possible with https://ipld.io

Update: created #71 to benchmark the level of deduplication we can get with regular

Please let me know if I missed something here. |
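To make the deduplication concern concrete, here is a minimal Node.js sketch that estimates how many fixed-size chunks two snapshots have in common. It is not the benchmark from #71 and not how IPFS is implemented internally: `chunkDigests` and `sharedChunkRatio` are hypothetical helper names, and the 256 KiB default chunk size is an assumption chosen to resemble a plain fixed-size chunker.

```javascript
const { createHash } = require('node:crypto');

// Split a buffer into fixed-size chunks and collect the SHA-256 digest of
// each chunk. Identical chunks collapse to a single entry in the Set.
function chunkDigests(buf, chunkSize) {
  const digests = new Set();
  for (let off = 0; off < buf.length; off += chunkSize) {
    const chunk = buf.subarray(off, off + chunkSize);
    digests.add(createHash('sha256').update(chunk).digest('hex'));
  }
  return digests;
}

// Fraction of distinct chunks in `next` that already exist in `prev`.
// chunkSize defaults to 256 KiB (an assumption, see the note above).
function sharedChunkRatio(prev, next, chunkSize = 256 * 1024) {
  const seen = chunkDigests(prev, chunkSize);
  const fresh = chunkDigests(next, chunkSize);
  if (fresh.size === 0) return 0;
  let shared = 0;
  for (const digest of fresh) {
    if (seen.has(digest)) shared++;
  }
  return shared / fresh.size;
}
```

With a tiny chunk size of 4 bytes, `sharedChunkRatio(Buffer.from('aaaabbbbccccdddd'), Buffer.from('aaaabbbbxxxxyyyy'), 4)` gives 0.5, while shifting the same content by a single byte (`'zaaaabbbbccccddd'`) gives 0. That sensitivity to shifted or recompressed bytes is the reason a structure-aware chunker might help.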
@lidel Distributing ZIM files via IPFS would be interesting and I would volunteer to do it if the process is not too complex. We have two ZIM readers in JavaScript:
So far, Kiwix-JS is not able to read a ZIM file online, see kiwix/kiwix-js#356. What I had in mind originally, though, was to provide a server-side service (so not just HTML files) able to read the ZIM files on demand and provide the content via IPFS. This would simplify the publication process by avoiding the step of extracting data from the ZIM files. I am not sure this is technically possible. |
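For reference, reading part of a remote ZIM file "online" through an HTTP gateway comes down to a standard HTTP Range request. The sketch below is not Kiwix-JS code: `rangeHeader` and `readRange` are hypothetical helper names, and the gateway must actually support range requests for this to work.

```javascript
// Build the HTTP Range header value for `length` bytes starting at `offset`.
// HTTP byte ranges are inclusive on both ends, hence the "- 1".
function rangeHeader(offset, length) {
  if (!Number.isSafeInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  if (!Number.isSafeInteger(length) || length <= 0) {
    throw new RangeError('length must be a positive integer');
  }
  return `bytes=${offset}-${offset + length - 1}`;
}

// Fetch one byte range of a remote file and return it as a Uint8Array.
// `fetchImpl` is injectable so the function can be tested without a network.
async function readRange(url, offset, length, fetchImpl = fetch) {
  const res = await fetchImpl(url, {
    headers: { Range: rangeHeader(offset, length) },
  });
  // 206 Partial Content means the server honoured the requested range.
  if (res.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

A call such as `readRange('https://ipfs.io/ipfs/QmHash/wiki.zim', 0, 80)` would then request the first 80 bytes, where `QmHash` is the placeholder used earlier in this thread and not a real content identifier.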
Another possibility would be to compile https://github.com/dignifiedquire/zim to WebAssembly for use in a browser |
@eminence This is basically possible with libzim/libkiwix as well... but it looks like the result is not able to handle files over 4GB :( |
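The thread does not say where the 4GB limit comes from. One general pitfall with large archives is handling file positions as 32-bit integers, which cannot address anything past 4 GiB. As a small illustration, and assuming (as far as I know) that ZIM stores file positions as 64-bit little-endian integers, a JavaScript reader could decode them as below. `readUint64LE` is a hypothetical helper name.

```javascript
// Read an unsigned 64-bit little-endian integer at `pos` and return it as a
// Number, refusing values that a double cannot represent exactly.
function readUint64LE(bytes, pos) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const value = view.getBigUint64(pos, true); // true = little-endian
  if (value > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError('offset exceeds Number.MAX_SAFE_INTEGER');
  }
  return Number(value);
}
```

For the bytes `00 00 00 40 01 00 00 00` this returns 5368709120 (5 GiB), whereas reading only the low 32 bits would give 1073741824 and silently point at the wrong place in the file.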
Hi friends, I've published a draft of a devgrant for adding IPFS support to kiwix-js: ipfs/devgrants#49 It tries to define the steps needed to have kiwix-js read Wikipedia .zim archives from IPFS. Right now we are looking for people with the bandwidth and interest to create a PoC that tests the feasibility of that approach. Feel free to comment on ipfs/devgrants#49 |
(quick update on the state of things for drive-by readers) The current process of unpacking ZIM and tweaking HTML on a per-case basis is very, very wasteful, impossible to automate across different languages, and not sustainable. Every time something breaks, we need to sink a lot of time into fixing the build scripts. If we had allocated that time to a web-based ZIM reader, we would already be there. I believe our effort should go into putting ZIM files on IPFS and then reading them from IPFS via a web browser (as I elaborated in the original idea draft, and as seen in the latest research, tracked in the past in kiwix/kiwix-js#595 and now continued in kiwix/kiwix-js#659). |
tl;dr we need a web-based reader for ZIM archives: NOTE: the link above is an old devgrant, we can now do it better with
|
A lot has changed since we last looked into this. Many new opportunities and protocols exist now that did not before. |
Would that be possible without having to unpack it into millions of small single files?
Would software like kiwix-serve (or any other ZIM reader) be able to serve content through IPFS?
More information about kiwix-serve:
http://wiki.kiwix.org/wiki/Kiwix-serve