
How to search for articles? #76

Open
SeanPedersen opened this issue Jan 31, 2021 · 9 comments
Labels
dif/expert Extensive knowledge (implications, ramifications) required effort/weeks Estimated to take multiple weeks help wanted P2 Medium: Good to have, but can wait until someone steps up

Comments

@SeanPedersen

How can I use ipns://en.wikipedia-on-ipfs.org/wiki/ effectively? I see no option to search for an article. How am I supposed to find the content I am looking for?

@SeanPedersen SeanPedersen added the need/triage Needs initial labeling and prioritization label Jan 31, 2021
@mburns

mburns commented Feb 11, 2021

Search isn't directly supported, but theoretically could be in the future.

One option is to search for site:en.wikipedia-on-ipfs.org SEARCH TERMS in your preferred search engine to discover new pages.

@ipfs ipfs deleted a comment from welcome bot Feb 18, 2021
@lidel lidel added dif/expert Extensive knowledge (implications, ramifications) required effort/days Estimated to take multiple days, but less than a week help wanted P2 Medium: Good to have, but can wait until someone steps up effort/weeks Estimated to take multiple weeks and removed need/triage Needs initial labeling and prioritization effort/days Estimated to take multiple days, but less than a week labels Feb 18, 2021
@lidel
Member

lidel commented Feb 18, 2021

We have some prior art in #44
The code is 4 years old, but it could be a good starting point if someone has the bandwidth to help with this.

@lidel
Member

lidel commented Mar 10, 2021

In case someone wants to pick this up before I have spare bandwidth: simply reuse the existing UI from mobile Wikipedia (https://en.m.wikipedia.org/), which already has subtle branding and a search box:

[Screenshot (2021-03-10): mobile Wikipedia header with branding and search box]

The hamburger menu could be replaced with our icon, and clicking it would jump to the footer explaining the mirror project.

@RuiNtD

RuiNtD commented May 28, 2021

Both Google and DDG have methods of adding a custom website search bar to your website:

https://cse.google.com/
https://duckduckgo.com/search_box

I tested both, and unfortunately the DDG results all lead to 404 errors because it appends .html to the URLs. If you want to try both, here are some links:

https://cse.google.com/cse?cx=230751f5750677644
https://duckduckgo.com/search.html?site=en.wikipedia-on-ipfs.org&prefill=Search%20Wikipedia%20on%20IPFS

EDIT: It's also worth noting that both engines show ads above the actual results. Ads (and branding) can be removed on DDG with URL parameters, but doing so violates the ToS except for personal use.

@ngbrown

ngbrown commented May 29, 2021

There has recently been some work on hosting a full-text search engine in WebAssembly for very large data sets. This was directly influenced by IPFS's hosting of Wikipedia.

The key feature is to pull only the data needed to execute the search from the static index to the client. For example, a full-text search over a 14 GByte index takes 2 seconds and needs to download only ~1.5 MByte of the index.

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust. It was initially designed to run on a server, but Rust code can be compiled to WebAssembly. Pull request quickwit-oss/tantivy#1067 adapts Tantivy to run fully in WebAssembly on the client side.

See the pull request for a demo using the Wikipedia dataset.
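The core idea, downloading only the slices of a large static index that a query actually touches, can be illustrated with a much simpler structure than Tantivy's. The sketch below is hypothetical: it assumes an index made of sorted, fixed-width (term, offset) records and a `fetch_range(start, end)` callback standing in for an HTTP Range request to a gateway. It is not Tantivy's file format, only the general pattern: a binary search reads roughly log2(n) records instead of the whole file.

```python
from typing import Callable, Optional

# Hypothetical toy format: each record is a 24-byte NUL-padded term
# followed by an 8-byte big-endian offset into a postings file.
TERM_SIZE = 24
RECORD_SIZE = 32


def encode_index(entries: dict[str, int]) -> bytes:
    """Serialize (term, offset) pairs as sorted fixed-width records."""
    out = bytearray()
    for term in sorted(entries):  # UTF-8 byte order matches code point order
        raw = term.encode("utf-8")
        if len(raw) > TERM_SIZE:
            raise ValueError(f"term too long: {term!r}")
        out += raw.ljust(TERM_SIZE, b"\0")
        out += entries[term].to_bytes(RECORD_SIZE - TERM_SIZE, "big")
    return bytes(out)


def lookup(term: str, total_size: int,
           fetch_range: Callable[[int, int], bytes]) -> Optional[int]:
    """Binary-search the remote index, fetching one record per probe."""
    key = term.encode("utf-8").ljust(TERM_SIZE, b"\0")
    lo, hi = 0, total_size // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        record = fetch_range(mid * RECORD_SIZE, (mid + 1) * RECORD_SIZE)
        probe = record[:TERM_SIZE]
        if probe == key:
            return int.from_bytes(record[TERM_SIZE:], "big")
        if probe < key:
            lo = mid + 1
        else:
            hi = mid
    return None


class CountingRemote:
    """Stands in for a gateway and counts the bytes a query downloads."""

    def __init__(self, blob: bytes) -> None:
        self.blob = blob
        self.bytes_fetched = 0

    def fetch_range(self, start: int, end: int) -> bytes:
        # end is exclusive; as an HTTP header this would be bytes=start-(end-1)
        chunk = self.blob[start:end]
        self.bytes_fetched += len(chunk)
        return chunk


if __name__ == "__main__":
    entries = {f"term{i:06d}": i * 100 for i in range(100_000)}
    remote = CountingRemote(encode_index(entries))
    print(lookup("term042424", len(remote.blob), remote.fetch_range))  # 4242400
    print(remote.bytes_fetched, "of", len(remote.blob), "bytes fetched")
```

In a browser, each `fetch_range` call would typically become a request with a `Range` header; practical engines fetch larger blocks and cache them to reduce round trips over a high-latency gateway.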

@fulmicoton

fulmicoton commented Jun 3, 2021

(tantivy creator/maintainer and quickwit CEO)
quickwit (https://github.com/quickwit-inc/quickwit) aims precisely to enable client-side search over distant, high-latency storage. We are in the process of open-sourcing our code under the AGPL license. Once this is done, we'd be happy to help.

@unbeatable-101

unbeatable-101 commented Jun 3, 2021

Speaking of which, what license is the distributed-wikipedia-mirror code under? If it isn't GPLv3 too, it won't be able to use quickwit.

@johnsonjsyuen
Copy link

johnsonjsyuen commented Jun 5, 2021

This shows how it can be done with a statically hosted SQLite database that serves as the index, since SQLite supports full-text search: Sqlite static hosted
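A minimal local sketch of the SQLite side, using Python's built-in `sqlite3` module and SQLite's FTS5 full-text extension. It assumes the bundled SQLite was compiled with FTS5 (common, but not guaranteed), and the `articles` table with `title`/`body` columns is hypothetical. It only builds and queries the index locally; in the statically hosted setup, the client would read the same database file over the network instead.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, articles: list[tuple[str, str]]) -> None:
    """Create a hypothetical FTS5 table of (title, body) rows and fill it."""
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS articles USING fts5(title, body)")
    conn.executemany("INSERT INTO articles(title, body) VALUES (?, ?)", articles)
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    """Return matching titles, best match first (FTS5's default bm25 rank)."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real index would be a file on disk
    build_index(conn, [
        ("InterPlanetary File System", "IPFS is a peer-to-peer protocol for storing and sharing data."),
        ("SQLite", "SQLite is an embedded SQL database engine."),
        ("WebAssembly", "WebAssembly is a binary instruction format for a stack-based virtual machine."),
    ])
    print(search(conn, "database"))  # ['SQLite']
```

The index is just an ordinary database file, so it could be added to IPFS alongside the articles without a search server.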

@lidel lidel pinned this issue Nov 29, 2021
@DmitriyShepelev
Copy link

DmitriyShepelev commented Sep 11, 2022

In Brave Browser, you can define a site-search shortcut: a keyword that turns whatever you type after it in the address bar into a search query. This can be used to search for IPFS Wikipedia pages. In Brave, go to "Settings > Search engine > Manage search engines and site search > Add", which opens a dialog for adding a search engine. For example, to use Brave's search engine to search for IPFS Wikipedia pages, enter https://search.brave.com/search?q=site%3Aen.wikipedia-on-ipfs.org %s as the URL, with %s in place of the query (and whatever you want for the Search engine and Shortcut fields).
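The %s placeholder works as a plain string substitution: the typed terms are URL-encoded and dropped into the template. A minimal sketch of that idea, plus an equivalent fully encoded `site:` query URL; the helper names are hypothetical, and the exact encoding a browser applies (for example `+` versus `%20` for spaces) is an assumption.

```python
from urllib.parse import quote_plus, urlencode

SITE = "en.wikipedia-on-ipfs.org"


def expand_template(template: str, terms: str) -> str:
    """Mimic a site-search shortcut: put the encoded terms where %s is."""
    return template.replace("%s", quote_plus(terms))


def site_search_url(terms: str, engine: str = "https://search.brave.com/search") -> str:
    """Build a fully encoded query restricted to the mirror with site:."""
    return engine + "?" + urlencode({"q": f"site:{SITE} {terms}"})


if __name__ == "__main__":
    print(site_search_url("Alan Turing"))
    # https://search.brave.com/search?q=site%3Aen.wikipedia-on-ipfs.org+Alan+Turing
```

The same `site:` restriction works with other engines by changing `engine`, which is the approach suggested earlier in the thread.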

@ipfs ipfs deleted a comment from VIP0000fa Aug 21, 2024

9 participants