Commit e58fac9

Add configuration to ReadMe (#22)
* Update README.md: include configuration description, matching the description of arguments in the GOSDTClassifier object defined in src/gosdt/_classifier.py
* Update README.md: include hyperparameters from `fit`, crucially including the cost matrix and the reference for lower bound guessing
* Update README.md
1 parent ab482cc commit e58fac9

File tree

1 file changed: README.md (+77 −1 lines)
@@ -22,6 +22,7 @@ This work builds on a number of innovations for scalable construction of optimal
 ## Table of Contents
 
 - [Installation](#installation)
+- [Configuration](#configuration)
 - [Example](#example)
 - [Frequently Asked Questions](#faq)
 - [How to build the project](#how-to-build-the-project)
@@ -39,7 +40,81 @@ pip3 install gosdt
 ```
 
 Note: Our x86_64 wheels all use modern ISA extensions such as AVX to perform fast bitmap operations.
-If you're running on an older system where that's not possible, we recommend that you build from source following the [instructions bellow](#how-to-build-the-project).
+If you're running on an older system where that's not possible, we recommend that you build from source following the [instructions below](#how-to-build-the-project).
+
+## Configuration
+
+When initializing a `GOSDTClassifier` object, the following hyperparameters can be specified:
+
+regularization : float, default=0.05
+    The regularization penalty incurred for each leaf in the model. We highly
+    recommend setting the regularization to a value larger than 1 / (# of samples).
+    A small regularization will lead to a longer training time. If a regularization
+    smaller than 1 / (# of samples) is preferred, you must set the parameter
+    `allow_small_reg` to True, which by default is False.
+
+allow_small_reg : bool, default=False
+    Boolean flag for allowing a regularization that's less than 1 / (# of samples).
+    If False, the effective regularization is bounded below by 1 / (# of samples).
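The regularization floor described above is easy to compute up front. A minimal pure-Python sketch (the helper name is ours, not part of the gosdt API):

```python
def min_recommended_regularization(n_samples: int) -> float:
    """Smallest regularization recommended without setting allow_small_reg=True.

    The README recommends a regularization larger than 1 / (# of samples).
    """
    return 1.0 / n_samples

# With 1000 training samples, any regularization below 0.001 would
# require allow_small_reg=True.
floor = min_recommended_regularization(1000)
```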
+depth_budget : int | None, default=None
+    Sets the maximum tree depth for a solution model, counting a tree with just
+    the root node as a tree of depth 0.
+
+time_limit : int | None, default=None
+    A time limit (in seconds) upon which the algorithm will terminate. If the
+    time limit is reached without a solution being found, the algorithm will
+    terminate with an error.
+
+balance : bool, default=False
+    A boolean flag enabling overriding the sample importance by equalizing the
+    importance of each present class.
+
+cancellation : bool, default=True
+    A boolean flag enabling the propagation of task cancellations up the
+    dependency graph.
+
+look_ahead : bool, default=True
+    A boolean flag enabling the one-step look-ahead bound implemented via scopes.
+
+similar_support : bool, default=True
+    A boolean flag enabling the similar support bound implemented via a distance index.
+
+rule_list : bool, default=False
+    A boolean flag enabling rule-list constraints on models.
+
+non_binary : bool, default=False
+    A boolean flag enabling non-binary model trees.
+    #todo(Ilias: Our tree parser does not currently handle this flag)
+
+diagnostics : bool, default=False
+    A boolean flag enabling printing of diagnostic traces when an error is
+    encountered. This is intended for debugging the C++ logic and is not
+    intended for end-user use.
+
+model_limit : int, default=1
+    The maximum number of optimal models to extract, in the case of multiple optima.
+
+debug : bool, default=False
+    A boolean flag that enables saving the state of the optimization, so that
+    it can be inspected or run again in the future. This is intended for
+    debugging the C++ logic and is not intended for end-user use.
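Taken together, initialization might look like the following sketch. The parameter names follow the list above; the gosdt import and constructor call are shown commented out so the snippet stands alone, and the chosen values are illustrative only:

```python
# from gosdt import GOSDTClassifier  # assumes the gosdt package is installed

# Illustrative hyperparameter choices; names match the list above.
params = {
    "regularization": 0.1,   # larger than 1 / n_samples, so no allow_small_reg
    "depth_budget": 5,       # a root-only tree counts as depth 0
    "time_limit": 600,       # errors out if no solution is found in 10 minutes
    "balance": False,        # keep the original per-sample importance
    "model_limit": 1,        # extract a single optimal model
}

# clf = GOSDTClassifier(**params)
```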
+When calling `fit`, the following arguments are available:
+
+X : array-like of shape (n_samples, n_features)
+    The training input samples. Boolean values are expected.
+
+y : array-like of shape (n_samples,)
+    The target values. The target values can be binary or multiclass.
+
+input_features : array-like of shape (n_features,) | None, default=None
+    The feature names for the input data. If None, the feature names will be
+    set to ["x0", "x1", ...].
+
+y_ref : array-like of shape (n_samples,) | None, default=None
+    These represent the predictions made by some blackbox model, which will be
+    used to guide optimization. The reference labels can be binary or
+    multiclass, but must have the same classes and shape as y.
+
+cost_matrix : array-like of shape (n_classes, n_classes) | None, default=None
+    The cost matrix for the optimization. If None, a cost matrix will be
+    created based on the number of classes and whether a balanced cost matrix
+    is requested.
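The README does not spell out how the default cost matrix is built, so the following is an illustrative sketch only, not gosdt's actual construction: entry `cost[i][j]` is the cost of predicting class `j` when the true class is `i`, with a zero diagonal, plus a hypothetical "balanced" variant that weights each true class inversely to its frequency:

```python
def uniform_cost_matrix(n_classes):
    """Unit misclassification cost everywhere off the diagonal."""
    return [[0.0 if i == j else 1.0 for j in range(n_classes)]
            for i in range(n_classes)]

def balanced_cost_matrix(class_counts):
    """Hypothetical balanced variant: rows for rare true classes cost more,
    so minority classes are not ignored. Not necessarily gosdt's scheme."""
    total = sum(class_counts)
    n = len(class_counts)
    return [[0.0 if i == j else total / (n * class_counts[i])
             for j in range(n)]
            for i in range(n)]
```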
 
 ## Example
 
@@ -98,6 +173,7 @@ print(f"Training accuracy: {clf.score(X_train_guessed, y_train)}")
 print(f"Test accuracy: {clf.score(X_test_guessed, y_test)}")
 ```
 
+
 ## FAQ
 
 - **Does GOSDT (implicitly) restrict the depth of the resulting tree?**
