The vector space model is an algebraic model that represents text documents as vectors. Cosine similarity is used to score how similar each document is to a query. Results are filtered with a threshold of alpha = 0.005, so documents whose score falls below it are dropped. The model is used in information retrieval, indexing, and relevancy ranking.
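The scoring and filtering step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the TF-IDF weighting, the whitespace tokenizer, the strict `>` comparison against alpha, and every function name and sample document here are assumptions made for the example.

```python
import math
from collections import Counter

ALPHA = 0.005  # threshold from the description above


def tokenize(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop stop words (assumed tokenizer)."""
    return [t for t in text.lower().split() if t not in stop_words]


def build_index(docs, stop_words=frozenset()):
    """Build TF-IDF vectors (assumed weighting) for a dict of {name: text}."""
    tokens = {name: tokenize(text, stop_words) for name, text in docs.items()}
    n = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # document frequency: count each term once per doc
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for name, toks in tokens.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf, stop_words=frozenset(), alpha=ALPHA):
    """Return (name, score) pairs scoring above alpha, best first."""
    q_tf = Counter(tokenize(query, stop_words))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scored = [(name, cosine(q_vec, vec)) for name, vec in vectors.items()]
    hits = [(name, s) for name, s in scored if s > alpha]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection, for illustration only.
docs = {
    "a.txt": "the king loved the queen",
    "b.txt": "the river water was cold",
    "c.txt": "a king rode to the river",
}
stop = frozenset({"the", "a", "to", "was"})
vectors, idf = build_index(docs, stop)
for name, score in search("king queen", vectors, idf, stop):
    print(f"{name} - {score:.5f}")
```

A document that shares no terms with the query scores 0 and is removed by the alpha filter, which is why each sample query below lists only a handful of the 50 stories.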
Files in this project include:
- ShortStories (50)
- Stop-words list
Sample Queries:
-
q= "love hate"
4 documents
43.txt - 0.03291
6.txt - 0.01589
1.txt - 0.00635
9.txt - 0.00531
-
q= "lodie"
1 document
24.txt - 0.12751
-
q= "travel water"
3 documents
21.txt - 0.02184
19.txt - 0.01530
11.txt - 0.01274
-
q= "king queen"
8 documents
31.txt - 0.25304
7.txt - 0.01565
34.txt - 0.01546
43.txt - 0.01163
49.txt - 0.00764
40.txt - 0.00744
25.txt - 0.00667
40.txt - 0.00614