0BSD detected over ISC #4
Comments
Hey @frapposelli, thank you for reporting this case. I believe the current approach to pre-processing and hash-based similarity detection is already known to have a number of cases where it fails to identify the correct license. The approach taken by this tool focuses on constant-time predictions specifically fit for batch workloads of large-scale repository mining (based on an approximation of Jaccard similarity over bag-of-words representations of the license documents). The way to debug it would be to compare the weighted bags of words for these two licenses and see if they differ (to rule out a bug in preprocessing), and then check the similarity scores by running it with "LICENSE_DEBUG=1".
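To make the suggested comparison concrete, here is a minimal sketch of diffing two weighted bags of words and computing their exact weighted Jaccard similarity. It is not this tool's implementation: the tokenization is a simplifying assumption, the function names (`bag_of_words`, `weighted_jaccard`, `bag_diff`) are hypothetical, and the tool approximates the similarity with hashing rather than computing it exactly. The two sample strings are illustrative stand-ins, not the actual license texts.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens (assumed normalization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Exact weighted Jaccard: sum of per-token minima over sum of per-token maxima."""
    keys = a.keys() | b.keys()
    union = sum(max(a[k], b[k]) for k in keys)
    if union == 0:
        return 1.0  # two empty documents are treated as identical
    return sum(min(a[k], b[k]) for k in keys) / union


def bag_diff(a: Counter, b: Counter) -> dict:
    """Tokens whose counts differ, mapped to (count_in_a, count_in_b)."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() | b.keys()) if a[k] != b[k]}


if __name__ == "__main__":
    # Stand-in texts: the longer one adds a clause that the shorter one lacks.
    short_doc = "permission to use copy modify and distribute this software is granted"
    long_doc = short_doc + " provided that the above notice appears in all copies"
    short_bag, long_bag = bag_of_words(short_doc), bag_of_words(long_doc)
    print("differing tokens:", bag_diff(short_bag, long_bag))
    print("weighted Jaccard:", weighted_jaccard(short_bag, long_bag))
```

If the two bags come out identical for texts that are known to differ, the normalization step is the first suspect; if they differ but the score is still very high, the short text is simply close to a subset of the long one, which is the situation described in this issue.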
Hello, I have the same issue with the https://github.com/davecgh/go-spew project as @frapposelli described in his link above. I urgently need the right license information detected for a customer. Approximately when will this issue be resolved?
@wami4262 you are free to submit a PR to fix this issue.
👋🏻 |
Hi! 👋🏻
I'm using this library as part of wwhrd, which is used to detect licenses in Go-based projects.
One of the users of wwhrd found an interesting issue (frapposelli/wwhrd#40) where, even when presented with a verbatim ISC license, the library detects a 0BSD license with 95% probability.
I was previously using the v3 version of the library, which reported a 93% probability of being 0BSD and 84% of being ISC, which is still wrong but slightly more accurate.
Although the 0BSD one is shorter, the licenses are very similar, with 0BSD missing a critical sentence in the first part.
Happy to help with the debug process 👐🏻