Show top concepts for canonical docs #6

Open
kcarnold opened this issue Oct 5, 2010 · 1 comment

kcarnold commented Oct 5, 2010

When a canonical document is selected, the info pane should show a list of commonly occurring concepts in the documents that are most similar to it.

These should probably be pre-SVD counts, not post-SVD similarity scores. For example, counting the number of times the words "chinese" and "thai" occur, weighted by which documents they are in, but not weighted by anything involving the "chinese" and "thai" concept vectors themselves. This would reassure users that the data reflects reality, even if the SVD comes out kind of weird.

Probably the best way to report these values would be as percentages.
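
A rough sketch of the calculation described above, assuming raw (pre-SVD) concept counts per document and a similarity score between each document and the selected canonical document are already available. The function name `top_concepts` and both input dictionaries are hypothetical, not part of the existing code:

```python
from collections import Counter


def top_concepts(doc_counts, similarities, n=5):
    """Return the top-n concepts as (concept, percentage) pairs.

    doc_counts:   dict mapping document name -> Counter of raw concept counts
                  (hypothetical input; these are pre-SVD counts)
    similarities: dict mapping document name -> similarity of that document
                  to the canonical document (hypothetical input)
    """
    weighted = Counter()
    for doc, counts in doc_counts.items():
        # Weight by the document only; concept vectors are never consulted.
        weight = max(similarities.get(doc, 0.0), 0.0)
        if weight == 0.0:
            continue
        for concept, count in counts.items():
            weighted[concept] += weight * count

    total = sum(weighted.values())
    if total == 0:
        return []
    # Report each concept's share of the total weighted count as a percentage.
    return [(concept, 100.0 * value / total)
            for concept, value in weighted.most_common(n)]
```

Clamping negative similarities to zero is a choice made for this sketch so that the percentages stay non-negative; dissimilar documents simply do not contribute.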

[This bug transferred from Launchpad]


kcarnold commented Oct 5, 2010

[from sgt101 on Launchpad]
Related to this, it would be very useful to put the info on canonical documents into a structured file such as .xls or .csv, separate from the general concepts file that is in the results now.

I would like to see the following records produced:

filename.txt,polarity_value, concept 1, concept 2, concept 3

The concepts should not be word occurrences but generalised terms from the analogy space...
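
A minimal sketch of how records in that shape could be written as CSV, using only Python's standard `csv` module. The function name `write_canonical_report`, the header row, and the shape of `rows` are assumptions made for illustration; how the polarity value and the generalised concepts are obtained is left out:

```python
import csv


def write_canonical_report(rows, out):
    """Write one CSV record per canonical document.

    rows: iterable of (filename, polarity_value, concepts) tuples, where
          concepts is a list of concept names (hypothetical input shape)
    out:  any writable text file object
    """
    writer = csv.writer(out, lineterminator="\n")
    # The header row is an addition for readability, not part of the request.
    writer.writerow(["filename", "polarity_value",
                     "concept_1", "concept_2", "concept_3"])
    for filename, polarity, concepts in rows:
        # Pad or truncate so every record has exactly three concept columns.
        padded = (list(concepts) + ["", "", ""])[:3]
        writer.writerow([filename, polarity] + padded)
```

When writing to a real file, opening it with `newline=""` is what the `csv` documentation recommends.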
