Non deterministic license identification for github.com/magiconair/properties #41

Closed
breml opened this issue Nov 19, 2020 · 5 comments · Fixed by #42

Comments

@breml
Contributor

breml commented Nov 19, 2020

I have been using wwhrd for more than three years now, and I am thankful for your work.

Lately, I started to observe some issues though:

The package github.com/magiconair/properties (v1.8.1) is a dependency of the widely used github.com/spf13/viper and github.com/spf13/cobra packages. Unfortunately the license for this package is not identified with the same value in every run. After ~20 runs, I got the following results:

  • 0BSD
  • BSD-1-Clause
  • BSD-2-Clause
  • BSD-3-Clause
  • BSD-4-Clause
  • BSD-Protection

$ for i in `seq 1 20`; do wwhrd list --no-color 2>&1 | grep github.com/magiconair/properties; done
time="2020-11-19T07:44:12+01:00" level=info msg="Found License" license=BSD-Protection package="github.com/magiconair/properties" 
time="2020-11-19T07:44:16+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:21+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:25+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:30+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:34+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:39+01:00" level=info msg="Found License" license=BSD-2-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:43+01:00" level=info msg="Found License" license=BSD-3-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:48+01:00" level=info msg="Found License" license=BSD-3-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:44:53+01:00" level=info msg="Found License" license=0BSD package="github.com/magiconair/properties" 
time="2020-11-19T07:44:58+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:03+01:00" level=info msg="Found License" license=BSD-3-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:07+01:00" level=info msg="Found License" license=BSD-4-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:12+01:00" level=info msg="Found License" license=BSD-1-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:17+01:00" level=info msg="Found License" license=BSD-2-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:22+01:00" level=info msg="Found License" license=BSD-2-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:27+01:00" level=info msg="Found License" license=0BSD package="github.com/magiconair/properties" 
time="2020-11-19T07:45:32+01:00" level=info msg="Found License" license=BSD-2-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:37+01:00" level=info msg="Found License" license=BSD-2-Clause package="github.com/magiconair/properties" 
time="2020-11-19T07:45:42+01:00" level=info msg="Found License" license=BSD-2-Clause package="github.com/magiconair/properties"

Based on my manual check of the license I would classify the license as BSD-2-Clause.

I see two problems:

  • When wwhrd is used in a CI pipeline, it is very unfortunate if the detection is not deterministic.
  • These licenses differ considerably (especially BSD-Protection), so I wonder how the detection can go that wrong.

This issue might be related to #40. Also, I guess it is not really an issue with wwhrd itself but rather with the package used for license detection (gopkg.in/src-d/go-license-detector.v3).

Update:

  • Added version information for the package in question (github.com/magiconair/properties v1.8.1)

@frapposelli
Owner

Hi @breml, and thanks for filing this issue!

I was able to reproduce with the library included in v0.3.0, and it looks like it's a bug in the license evaluation logic in the library.

I've tested with an updated version of the library and the problem has been fixed:

❯❯❯ for i in (seq 1 20); wwhrd list --no-color 2>&1 | grep github.com/magiconair/properties; end
time="2020-11-24T10:23:07+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:10+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:12+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:15+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:17+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:20+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:22+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:24+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:27+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:30+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:32+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:35+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:37+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:40+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:42+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:45+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:47+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:50+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:52+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties
time="2020-11-24T10:23:55+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/magiconair/properties

I'll cut a patch version very soon.

@frapposelli
Owner

v0.3.1 is now available and should fix this issue 👍🏻

@breml
Contributor Author

breml commented Nov 25, 2020

@frapposelli I just tested with v0.3.1 and can confirm that my issue is resolved. Thank you very much.

@breml
Contributor Author

breml commented Nov 25, 2020

@frapposelli I have bad news for you. Another package now shows the same problem:

$ for i in (seq 1 20); wwhrd list --no-color 2>&1 | grep github.com/DATA-DOG/go-sqlmock; end
time="2020-11-25T11:46:28+01:00" level=info msg="Found License" license=BSD-3-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:32+01:00" level=info msg="Found License" license=BSD-1-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:36+01:00" level=info msg="Found License" license=0BSD package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:40+01:00" level=info msg="Found License" license=BSD-1-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:44+01:00" level=info msg="Found License" license=BSD-3-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:48+01:00" level=info msg="Found License" license=BSD-4-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:51+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:55+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:46:59+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:03+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:07+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:12+01:00" level=info msg="Found License" license=BSD-4-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:16+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:20+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:24+01:00" level=info msg="Found License" license=BSD-3-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:28+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:33+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:37+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:41+01:00" level=info msg="Found License" license=BSD-2-Clause package=github.com/DATA-DOG/go-sqlmock
time="2020-11-25T11:47:46+01:00" level=info msg="Found License" license=BSD-Protection package=github.com/DATA-DOG/go-sqlmock
$ wwhrd -v
version 0.3.1
commit b4cd831edb8a6779055c34a3dfa8bb151bb40e97
date 2020-11-24T09:58:52Z

Do you prefer a new issue for this, or do you want to re-open this one?

This is very likely also related to #40 and #43.

@breml
Contributor Author

breml commented Nov 25, 2020

I added a debug statement (fmt.Println(fpath, l[0].Matches)) to the wwhrd code and I see the following return from licensedb.Analyse(fpath):

vendor/github.com/DATA-DOG/go-sqlmock [{BSD-4-Clause 0.5} {BSD-Protection 0.5} {BSD-1-Clause 0.5} {BSD-3-Clause 0.5} {BSD-2-Clause 0.5} {0BSD 0.5} {BSD-3-Clause-Attribution 0.33333334} {BSD-2-Clause-FreeBSD 0.33333334} {BSD-3-Clause-LBNL 0.33333334} {BSD-2-Clause-Patent 0.33333334} {BSD-2-Clause-NetBSD 0.33333334} {BSD-3-Clause-Clear 0.33333334} {BSD-Source-Code 0.33333334} {BSD-4-Clause-UC 0.33333334}]

So, to get deterministic results, matches with the same confidence level should at least be sorted, e.g. by license identifier, so that the result is always the same.

That being said, it comes as a surprise to me that all the BSD licenses have the same confidence level and, additionally, that this confidence is only 0.5.
