0BSD detected over ISC #4

Closed
frapposelli opened this issue Nov 24, 2020 · 4 comments

Comments

@frapposelli

Hi! 👋🏻

I'm using this library as part of wwhrd, which is used to detect licenses in Go-based projects.

One of the users of wwhrd found an interesting issue (frapposelli/wwhrd#40) where even when presented with a verbatim ISC license, the library detects a 0BSD license with 95% probability.

I was previously using v3 of the library, which reported a 93% probability of 0BSD and an 84% probability of ISC. That is still wrong, but slightly more accurate.

The two licenses are very similar: the 0BSD text is shorter and is missing a critical clause in the first part (ISC's condition that the copyright notice and permission notice appear in all copies).

Happy to help with the debug process 👐🏻

@bzz
Member

bzz commented Dec 15, 2020

Hey @frapposelli - thank you for reporting this case.

I believe the current approach to pre-processing and hash-based similarity detection is already known to have a number of cases where it fails to identify the correct license.

The current approach taken by this tool focuses on constant-time predictions, specifically suited to batch workloads in large-scale repository mining. It is based on an approximation of Jaccard similarity over bag-of-words representations of the license documents.
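For illustration, here is a minimal sketch of exact weighted Jaccard similarity over bags of words, applied to the grant sentences of the two licenses. It is not this library's code: the tokenizer, the names `bagOfWords` and `weightedJaccard`, and the choice to compute the score exactly instead of approximating it with hashes are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Grant sentences only; the disclaimer paragraph that follows them is omitted.
const (
	iscGrant     = "Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies."
	zeroBSDGrant = "Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted."
)

// bagOfWords lower-cases the text, splits it on every rune that is not a
// letter or digit, and counts the occurrences of each token.
// This tokenizer is a simplification chosen for the example.
func bagOfWords(text string) map[string]int {
	bag := make(map[string]int)
	tokens := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, tok := range tokens {
		bag[tok]++
	}
	return bag
}

// weightedJaccard returns sum(min(a, b)) / sum(max(a, b)) over all tokens.
func weightedJaccard(a, b map[string]int) float64 {
	minSum, maxSum := 0, 0
	for tok, ca := range a {
		cb := b[tok] // zero when tok is absent from b
		if ca < cb {
			minSum += ca
			maxSum += cb
		} else {
			minSum += cb
			maxSum += ca
		}
	}
	// Tokens that occur only in b contribute to the maximum alone.
	for tok, cb := range b {
		if _, inA := a[tok]; !inA {
			maxSum += cb
		}
	}
	if maxSum == 0 {
		return 1 // two empty bags are treated as identical
	}
	return float64(minSum) / float64(maxSum)
}

func main() {
	isc := bagOfWords(iscGrant)
	bsd := bagOfWords(zeroBSDGrant)
	fmt.Printf("weighted Jaccard(ISC, 0BSD) = %.3f\n", weightedJaccard(isc, bsd))
}
```

Every token of the 0BSD sentence also occurs in the ISC sentence, so the only thing separating the two bags is the extra ISC clause. In the full license texts both are followed by the same long disclaimer paragraph, which pushes the overlap higher still.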

The way to debug it would be to compare the weighted bags of words for these two licenses and see if they differ (to check for a bug in preprocessing), and then check the similarity scores by running it with "LICENSE_DEBUG=1".
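The bag comparison could be sketched along these lines. It is a hedged example, not the library's preprocessing: `bagOfWords`, `diffBags`, and `countDiff` are hypothetical helpers, and the two input strings are short placeholders where the real license texts would go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// countDiff records a token whose count differs between two bags.
type countDiff struct {
	Token string
	A, B  int
}

// bagOfWords lower-cases the text, splits it on every rune that is not a
// letter or digit, and counts each token (a simplified tokenizer).
func bagOfWords(text string) map[string]int {
	bag := make(map[string]int)
	tokens := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, tok := range tokens {
		bag[tok]++
	}
	return bag
}

// diffBags lists every token whose count is not the same in a and b,
// sorted by token so the output is stable.
func diffBags(a, b map[string]int) []countDiff {
	var out []countDiff
	for tok, ca := range a {
		if cb := b[tok]; ca != cb {
			out = append(out, countDiff{Token: tok, A: ca, B: cb})
		}
	}
	for tok, cb := range b {
		if _, inA := a[tok]; !inA {
			out = append(out, countDiff{Token: tok, A: 0, B: cb})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Token < out[j].Token })
	return out
}

func main() {
	// Placeholder inputs; substitute the full ISC and 0BSD texts here.
	a := bagOfWords("is hereby granted, provided that the above copyright notice appear in all copies")
	b := bagOfWords("is hereby granted")
	for _, d := range diffBags(a, b) {
		fmt.Printf("%-12s a=%d b=%d\n", d.Token, d.A, d.B)
	}
}
```

An empty result means the two bags are identical under this tokenizer; a non-empty one shows exactly which words separate the two texts.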

@wami4262

wami4262 commented Mar 18, 2021

Hello, I have the same issue with the https://github.com/davecgh/go-spew project as @frapposelli described in his link above. I urgently need the right license information detected for a customer. Approximately when will this issue be resolved?

@lafriks
Contributor

lafriks commented Mar 19, 2021

@wami4262 you are free to submit a PR to fix this issue.

@frapposelli
Author

👋🏻 wwhrd moved to a different library in its latest version. Closing this, as it seems to be a known issue with the approach this library uses.
