* Update README.md
Update to include configuration description, matching the description of arguments in the GOSDTClassifier object defined in src/gosdt/_classifier.py
* Update README.md
(include hyperparameters from fit, crucially including the cost matrix and the reference for lower bound guessing)
* Update README.md
README.md (+77, -1)
@@ -22,6 +22,7 @@ This work builds on a number of innovations for scalable construction of optimal
 ## Table of Contents

 - [Installation](#installation)
+- [Configuration](#configuration)
 - [Example](#example)
 - [Frequently Asked Questions](#faq)
 - [How to build the project](#how-to-build-the-project)
@@ -39,7 +40,81 @@ pip3 install gosdt
```

 Note: Our x86_64 wheels all use modern ISA extensions such as AVX to perform fast bitmap operations.
-If you're running on an older system where that's not possible, we recommend that you build from source following the [instructions bellow](#how-to-build-the-project).
+If you're running on an older system where that's not possible, we recommend that you build from source following the [instructions below](#how-to-build-the-project).
+
+## Configuration
+
+When initializing a `GOSDTClassifier` object, the following hyperparameters can be specified:
+
+regularization : float, default=0.05
+The regularization penalty incurred for each leaf in the model. We
+highly recommend setting the regularization to a value larger than
+1 / (# of samples). A small regularization will lead to a longer
+training time. If a smaller regularization (than 1 / (# of samples)) is
+preferred, you must set the parameter `allow_small_reg` to True, which
+by default is False.
+
+allow_small_reg : bool, default=False
+Boolean flag for allowing a regularization that's less than 1 / (# of samples).
+If False, the effective regularization is bounded below by 1 / (# of samples).
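The interaction between `regularization` and `allow_small_reg` described above can be sketched in a few lines. This is a minimal illustration of the stated rule, not the library's code; the function name `effective_regularization` is hypothetical.

```python
def effective_regularization(regularization: float, n_samples: int,
                             allow_small_reg: bool = False) -> float:
    """Per-leaf penalty after applying the 1 / n_samples floor (illustrative only)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if allow_small_reg:
        # The caller explicitly opted in to a penalty below the floor.
        return regularization
    # Otherwise the penalty is bounded below by 1 / (# of samples).
    return max(regularization, 1.0 / n_samples)


# With 100 samples the floor is 0.01, so a requested 0.001 is raised to the floor
# unless allow_small_reg is set.
floored = effective_regularization(0.001, n_samples=100)
unfloored = effective_regularization(0.001, n_samples=100, allow_small_reg=True)
```

Under this reading, a value already above the floor (such as the default 0.05 with 100 samples) passes through unchanged.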
+
+depth_budget : int | None, default=None
+Sets the maximum tree depth for a solution model, counting a tree with just
+the root node as a tree of depth 0.
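To make the depth convention concrete, here is a small sketch that measures depth the same way, with a root-only tree at depth 0. The nested-dict tree layout (`"prediction"`, `"true"`, `"false"` keys) is an assumption made for this example and is not the library's model format.

```python
def tree_depth(node: dict) -> int:
    """Depth of a binary tree of nested dicts; a root-only tree has depth 0."""
    if "prediction" in node:  # hypothetical leaf marker
        return 0
    return 1 + max(tree_depth(node["true"]), tree_depth(node["false"]))


def within_budget(node: dict, depth_budget=None) -> bool:
    """True if the tree respects the budget; None means no limit."""
    return depth_budget is None or tree_depth(node) <= depth_budget


leaf = {"prediction": 1}
stump = {"feature": 0, "true": {"prediction": 1}, "false": {"prediction": 0}}
```

Here `leaf` has depth 0 and `stump` (one split) has depth 1, so a `depth_budget` of 0 would admit only the single-leaf model.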
+
+time_limit : int | None, default=None
+A time limit, in seconds, after which the algorithm terminates. If
+the time limit is reached before a solution is found, the algorithm terminates with an error.
+
+balance : bool, default=False
+A boolean flag that overrides the sample importance by equalizing the importance of each class present in the data.
+
+cancellation : bool, default=True
+A boolean flag enabling the propagation of task cancellations up the dependency graph.
+
+look_ahead : bool, default=True
+A boolean flag enabling the one-step look-ahead bound implemented via scopes.
+
+similar_support : bool, default=True
+A boolean flag enabling the similar support bound implemented via a distance index.
+
+rule_list : bool, default=False
+A boolean flag enabling rule-list constraints on models.
+
+non_binary : bool, default=False
+A boolean flag enabling non-binary model trees.
+#todo(Ilias: Our tree parser does not currently handle this flag)
+
+diagnostics : bool, default=False
+A boolean flag enabling printing of diagnostic traces when an error is encountered.
+This is intended for debugging the C++ logic and is not intended for end-user use.
+
+model_limit : int, default=1
+The maximum number of optimal models to extract, in the case of multiple optima.
+
+debug : bool, default=False
+A boolean flag that enables saving the state of the optimization, so that it can be
+inspected or run again in the future. This is intended for debugging the C++ logic and
+is not intended for end-user use.
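A sketch of how these hyperparameters might be collected and checked before building a classifier follows. The helper names (`DOCUMENTED_HYPERPARAMETERS`, `build_config`, `make_classifier`) and the chosen values are illustrative. That `GOSDTClassifier` is importable from the top-level `gosdt` package is an assumption, so the import is deferred until the classifier is actually requested.

```python
# Names taken from the hyperparameter list above.
DOCUMENTED_HYPERPARAMETERS = {
    "regularization", "allow_small_reg", "depth_budget", "time_limit",
    "balance", "cancellation", "look_ahead", "similar_support",
    "rule_list", "non_binary", "diagnostics", "model_limit", "debug",
}


def build_config(n_samples: int) -> dict:
    """Example settings (hypothetical values); rejects misspelled parameter names."""
    config = {
        "regularization": max(0.05, 1.0 / n_samples),  # stay at or above 1 / n_samples
        "depth_budget": 5,
        "time_limit": 60,  # seconds
        "similar_support": False,
    }
    unknown = set(config) - DOCUMENTED_HYPERPARAMETERS
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    return config


def make_classifier(config: dict):
    """Build the classifier; assumes `gosdt` is installed and exports GOSDTClassifier."""
    from gosdt import GOSDTClassifier  # deferred so the sketch loads without gosdt
    return GOSDTClassifier(**config)


config = build_config(n_samples=1000)
```

Any hyperparameter left out of `config` keeps the default listed above.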
+
+When calling `fit`, the following arguments are available:
+
+X : array-like of shape (n_samples, n_features)
+The training input samples. Boolean values are expected.
+
+y : array-like of shape (n_samples,)
+The target values. The target values can be binary or multiclass.
+
+input_features : array-like of shape (n_features,) | None, default=None
+The feature names for the input data. If None, the feature names will be set to ["x0", "x1", ...].
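Because `X` is expected to hold Boolean values, numeric data has to be binarized first. The sketch below shows one simple way to do that with fixed thresholds, producing matching feature names that could be passed as `input_features`. The `binarize` helper, the threshold values, and the naming scheme are all assumptions for illustration; the library may provide its own binarization tools.

```python
def binarize(rows, thresholds):
    """Encode numeric rows as Boolean features of the form `column <= threshold`.

    rows:       list of equal-length numeric rows
    thresholds: dict mapping column index -> list of cut points (hypothetical input)
    """
    # Fix the column/threshold order once so names and values stay aligned.
    pairs = [(j, t) for j, cuts in sorted(thresholds.items()) for t in cuts]
    names = [f"x{j}<={t}" for j, t in pairs]
    encoded = [[row[j] <= t for j, t in pairs] for row in rows]
    return encoded, names


rows = [[1.0, 10.0], [3.0, 20.0]]
X_bool, feature_names = binarize(rows, {0: [2.0], 1: [15.0]})
```

Each numeric column can contribute several Boolean columns, one per threshold, so `n_features` in the shapes above refers to the binarized width.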
+
+y_ref : array-like of shape (n_samples,) | None, default=None
+These represent the predictions made by some black-box model, which will be used to guide optimization.
+The reference labels can be binary or multiclass, but must have the same classes and shape as y.
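A quick pre-flight check on reference labels might look like the sketch below. It reads "same classes and shape" as: equal length, and every reference label is one of the classes present in `y`. That reading, and the `check_reference_labels` helper itself, are assumptions and not the library's validation logic.

```python
def check_reference_labels(y, y_ref) -> float:
    """Validate y_ref against y and return the fraction of labels that agree."""
    if len(y) == 0:
        raise ValueError("y must not be empty")
    if len(y) != len(y_ref):
        raise ValueError("y_ref must have the same shape as y")
    unknown = set(y_ref) - set(y)
    if unknown:
        raise ValueError(f"y_ref contains classes not present in y: {sorted(unknown)}")
    # Agreement rate is only informational: a rough sense of how closely the
    # black-box predictions track the true labels.
    return sum(a == b for a, b in zip(y, y_ref)) / len(y)


agreement = check_reference_labels([0, 1, 1, 0], [0, 1, 0, 0])
```

The returned agreement rate is not required by `fit`; it is a convenient sanity check on the reference model before using it to guide the search.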
+
+cost_matrix : array-like of shape (n_classes, n_classes) | None, default=None
+The cost matrix for the optimization. If None, a cost matrix will be created based on
+the number of classes and whether a balanced cost matrix is requested.
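The text above does not spell out the default matrix, so the following is only one plausible construction, offered to show the shape and meaning of a `cost_matrix` argument. The assumptions: rows index the true class, columns the predicted class, correct predictions cost nothing, each error costs `1 / n_samples` in the unbalanced case, and in the balanced case costs are scaled so every class carries equal total weight. The `example_cost_matrix` helper is hypothetical and may differ from what the library builds.

```python
from collections import Counter


def example_cost_matrix(y, balance: bool = False):
    """Build an (n_classes x n_classes) misclassification cost matrix (illustrative)."""
    classes = sorted(set(y))
    counts = Counter(y)
    n_samples, n_classes = len(y), len(classes)
    matrix = []
    for true_cls in classes:
        if balance:
            # Misclassifying every sample of a class costs 1 / n_classes in total.
            cost = 1.0 / (n_classes * counts[true_cls])
        else:
            # Every misclassified sample costs the same.
            cost = 1.0 / n_samples
        matrix.append([0.0 if pred == true_cls else cost for pred in classes])
    return matrix


y = [0, 0, 0, 1]
plain = example_cost_matrix(y)
balanced = example_cost_matrix(y, balance=True)
```

With three samples of class 0 and one of class 1, the balanced variant charges more for missing the rare class, which is the usual motivation for passing a custom matrix.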