(w02_t03) Nemenyi Test Unclear #15

Open
jakob-r opened this issue Nov 18, 2020 · 5 comments

jakob-r (Collaborator) commented Nov 18, 2020

The following things should be made clear:

  • Which distribution does the test statistic q follow? (Suggestion: the Studentized range distribution with parameter k = number of algorithms, but I haven't found anything on the degrees of freedom.)
  • Does the post-hoc Nemenyi test control the FWER? (I suppose yes.)
  • How to derive the critical difference (suggestion: q.alpha = qtukey(1 - 0.05, k, Inf) / sqrt(2L); cd.nemenyi = q.alpha * sqrt(k * (k + 1L) / (6L * n)); see the sketch after this list)
  • I guess the test statistic q only consists of the absolute value of R_j1 - R_j2.
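
A minimal R sketch of the suggested computation (the values of k and n below are made up for illustration):

```r
# Nemenyi critical difference: k algorithms ranked on n data sets.
# k and n are placeholder values for illustration.
k <- 5
n <- 30
alpha <- 0.05

# Critical value: upper-alpha quantile of the Studentized range
# distribution (k means, infinite degrees of freedom), divided by sqrt(2).
q.alpha <- qtukey(1 - alpha, nmeans = k, df = Inf) / sqrt(2)

# Critical difference for the Nemenyi test.
cd.nemenyi <- q.alpha * sqrt(k * (k + 1) / (6 * n))

q.alpha     # ~2.728 for k = 5, matching Table 5(a) in Demšar (2006)
cd.nemenyi
```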

UgurKap commented Apr 17, 2021

From Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." The Journal of Machine Learning Research 7 (2006): 1-30:

Critical values q_\alpha are based on the Studentized range statistic divided by √2.

[screenshot: critical values table from Demšar (2006)]
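
A quick sanity check of that relation in R: for k = 2 the Studentized range quantile divided by √2 reduces to the two-sided standard normal critical value, which is the first entry of Table 5(a):

```r
# For two algorithms the Studentized range statistic divided by sqrt(2)
# is equivalent to a standard normal z statistic.
qtukey(0.95, nmeans = 2, df = Inf) / sqrt(2)  # 1.959964
qnorm(0.975)                                  # 1.959964
```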

I think this should be made clearer on the Post-Hoc Test II page, as the critical difference is actually what we are comparing against. So we compute a mean rank for each algorithm, and then connect two algorithms in the graph if the difference of their mean ranks is less than the critical difference. If two algorithms are not connected, their performance differs significantly.
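
As a sketch of that decision rule in R (the mean ranks and the critical difference below are invented for illustration):

```r
# Hypothetical mean ranks of four algorithms across the data sets.
mean.ranks <- c(A = 1.8, B = 2.1, C = 3.0, D = 3.1)
cd <- 1.0  # critical difference, e.g. computed as in the first comment

# Pairwise absolute differences of mean ranks.
diffs <- abs(outer(mean.ranks, mean.ranks, "-"))

# TRUE where two algorithms would be connected in the CD diagram,
# i.e. where their mean ranks do not differ significantly.
connected <- diffs < cd
connected
```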

In the slides, it is stated that a lower rank can be considered better, but I think "lower rank" is an ambiguous term, as it is more intuitive to think of rank 1 as better than rank 2. Maybe it should say that the algorithm whose rank is closer to 1 is the better one.


UgurKap commented Apr 17, 2021

I think the difference (or similarity?) between the Nemenyi and the Bonferroni-Dunn tests should be explained in more detail.

Again from Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." The Journal of Machine Learning Research 7 (2006): 1-30:

The tests differ in the way they adjust the value of α to compensate for multiple comparisons. The Bonferroni-Dunn test (Dunn, 1961) controls the family-wise error rate by dividing α by the number of comparisons made (k−1, in our case). The alternative way to compute the same test is to calculate the CD using the same equation as for the Nemenyi test, but using the critical values for α/(k−1) (for convenience, they are given in Table 5(b)).

[screenshot: Table 5(b) from Demšar (2006)]
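
If it helps: assuming the Table 5(b) entries are two-sided standard normal quantiles at the Bonferroni-corrected level α/(k−1) (this matches the entries I checked, e.g. ~2.394 for k = 4, but it is my reading rather than a statement from the paper), the Bonferroni-Dunn CD could be computed as:

```r
# Bonferroni-Dunn: same CD formula as for Nemenyi, but the critical value
# is taken at the corrected level alpha / (k - 1); the extra factor of 2
# accounts for the two-sided test. k and n are placeholder values.
k <- 4
n <- 30
alpha <- 0.05

q.bd <- qnorm(1 - alpha / (2 * (k - 1)))    # ~2.394 for k = 4, cf. Table 5(b)
cd.bd <- q.bd * sqrt(k * (k + 1) / (6 * n))
cd.bd
```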

mlindauer (Collaborator) commented:

Thanks for providing this valuable feedback.

@larskotthoff @berndbischl is this already addressed in the new slides of w02_t03? Or can/should we point to further material here?

larskotthoff (Contributor) commented:

This is not addressed -- can we do this for the next iteration? It doesn't sound like it's super urgent.

mlindauer (Collaborator) commented:

Sure.
