Min-Hash Signatures Generator

Judging how similar one object is to another is a common task in everyday life. Recognizing these similarities can yield valuable insight into the problem at hand.

The min-Hash algorithm was introduced as a time- and space-efficient way to estimate the similarity between sets. Broder introduced the use of min-Hash for calculating the resemblance and containment of documents in 1997 [1]. In 2000 [2], he focused on one specific part of it, min-wise permutations, which were essential to the algorithm the AltaVista web index used to find similar web pages in a huge collection. An index such as AltaVista has no need for redundant copies of so many documents, so an efficient algorithm was required to find duplicate or near-duplicate documents.
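
To make the idea concrete, here is a minimal sketch of min-Hash signatures in Python. It is not the code in this repository: the function names (`stable_hash`, `make_hash_params`, `minhash_signature`, `estimate_jaccard`), the prime modulus, and the use of random linear hash functions as a stand-in for min-wise permutations are all assumptions made for illustration.

```python
import hashlib
import random

# Assumption: a large Mersenne prime as the modulus for the linear hash family.
PRIME = (1 << 61) - 1


def stable_hash(item):
    """Map a string to an integer deterministically (unlike built-in hash())."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Return one minimum hash value per hash function for a non-empty set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    hashed = [stable_hash(x) for x in items]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate set similarity as the fraction of matching signature positions."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    doc_a = {str(i) for i in range(0, 100)}
    doc_b = {str(i) for i in range(50, 150)}
    params = make_hash_params(256)  # both sets must share the same hash functions
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("exact:   ", jaccard(doc_a, doc_b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

Each signature has a fixed length regardless of the size of the set, so two large documents can be compared by comparing two short lists. Using more hash functions generally tightens the estimate at the cost of longer signatures.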

Getting Started

Executing the example:

python main.py

Authors

Omri Lahav

E-mail: [email protected]
LinkedIn: https://www.linkedin.com/in/omri-lahav-a89b1957

License

This software can be used free of charge. Please cite and reference.

References

[1] A. Z. Broder, “On the Resemblance and Containment of Documents,” Proc. Compression Complex. Seq. 1997, pp. 21–29, 1997.
[2] A. Z. Broder, “Min-wise independent permutations: Theory and practice,” Autom. Lang. Program., vol. 1853, p. 808, 2000.
