|`bench`:`taskset`| None || Value for `-c` argument of `taskset` utility used over benchmark subcommand. |
|`bench`:`vtune_profiling`| None || Analysis type for `collect` argument of Intel(R) VTune* Profiler tool. Linux* OS only. |
|`bench`:`vtune_results_directory`|`_vtune_results`|| Directory path to store Intel(R) VTune* Profiler results. |
|`bench`:`n_runs`|`10`|| Number of runs for measured entity. |
|`bench`:`time_limit`|`3600`|| Time limit in seconds after which the benchmark stops early. |
|`bench`:`memory_profile`| False || Profiles memory usage of benchmark process. |
|`bench`:`flush_cache`| False || Flushes cache before every time measurement if enabled. |
|`bench`:`cpu_profile`| False || Profiles average CPU load during benchmark run. |
|`bench`:`distributor`| None | None, `mpi`| Library used to handle distributed algorithm. |
|`bench`:`mpi_params`| Empty dict || Parameters for `mpirun` command of MPI library. |
|<h3>Data parameters</h3>||||
|`data`:`cache_directory`|`data_cache`|| Directory path to store cached datasets for fast loading. |
|`data`:`raw_cache_directory`|`data`:`cache_directory` + "raw" || Directory path to store downloaded raw datasets. |
|`data`:`dataset`| None || Name of dataset to use from implemented dataset loaders. |
|`data`:`source`| None |`fetch_openml`, `make_regression`, `make_classification`, `make_blobs`| Data source to use for loading or synthetic generation. |
|`data`:`id`| None || OpenML data id for `fetch_openml` source. |
|`data`:`preprocessing_kwargs`:`replace_nan`|`median`|`median`, `mean`| Value to replace NaNs in preprocessed data. |
|`data`:`preprocessing_kwargs`:`category_encoding`|`ordinal`|`ordinal`, `onehot`, `drop`, `ignore`| How to encode categorical features in preprocessed data. |
|`data`:`preprocessing_kwargs`:`normalize`| False || Enables normalization of preprocessed data. |
|`data`:`preprocessing_kwargs`:`force_for_sparse`| True || Forces preprocessing for sparse data formats. |
|`data`:`split_kwargs`| Empty `dict` or default split from dataset description || Data split parameters for `train_test_split` function. |
|`data`:`format`|`pandas`|`pandas`, `numpy`, `cudf`| Data format to use in benchmark. |
|`data`:`order`|`F`|`C`, `F`| Memory layout of data used in benchmark: C-contiguous (row-major, `C`) or Fortran-contiguous (column-major, `F`). |
|`data`:`dtype`|`float64`|| Data type to use in benchmark. |
|`data`:`distributed_split`| None | None, `rank_based`| Split type used to distribute data between machines in a distributed algorithm. `None` uses all data on every machine without splitting. `rank_based` splits the data equally between machines, with the split sequence based on the rank id from MPI. |
|`data`:`dataset`| all |`all_named`| Sets datasets to use as list of all named datasets available in loaders. |
|`data`:`generation_kwargs`:`n_informative`| all |*float* value in [0, 1] range | Sets `n_informative` to the given ratio of the number of features. |
|`bench`:`taskset`| all | Specification of NUMA nodes in `numa:{numa_node_0}[\|{numa_node_1}...]` format | Sets CPU affinity using the `taskset` utility. |
|`algorithm`:`estimator_params`:`n_jobs`| sklearn_estimator |`physical_cpus`, `logical_cpus`, or a ratio of either in `{type}_cpus:{ratio}` format, where `ratio` is a float | Sets the `n_jobs` parameter of an estimator to the number of physical or logical CPUs, or to a ratio of that number. |
|`algorithm`:`estimator_params`:`scale_pos_weight`| sklearn_estimator |`auto`| Sets `scale_pos_weight` parameter to `sum(negative instances) / sum(positive instances)` value for estimator. |
|`algorithm`:`estimator_params`:`n_clusters`| sklearn_estimator |`auto`| Sets `n_clusters` parameter to number of clusters or classes from dataset description for estimator. |
|`algorithm`:`estimator_params`:`eps`| sklearn_estimator |`distances_quantile:{quantile}` format where quantile is *float* value in [0, 1] range | Computes `eps` parameter as quantile value of distances in `x_train` matrix for estimator. |
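
As a rough illustration of how three of the special values in the table above could be resolved, here is a minimal Python sketch. It is not the `scikit-learn_bench` implementation: the function names, the rounding rule for `n_jobs`, the use of Euclidean distance, and the linear interpolation for the quantile are all assumptions made for this example.

```python
from itertools import combinations
from math import dist, floor


def resolve_n_jobs(value, physical_cpus, logical_cpus):
    """Turn 'physical_cpus', 'logical_cpus' or '{type}_cpus:{ratio}' into an int."""
    kind, _, ratio = value.partition(":")
    counts = {"physical_cpus": physical_cpus, "logical_cpus": logical_cpus}
    if kind not in counts:
        raise ValueError(f"unknown n_jobs special value: {value!r}")
    # Assumption: round down, but never go below one job.
    return max(1, int(counts[kind] * (float(ratio) if ratio else 1.0)))


def resolve_scale_pos_weight(y):
    """'auto': sum(negative instances) / sum(positive instances) for 0/1 labels."""
    positives = sum(1 for label in y if label == 1)
    negatives = len(y) - positives
    # Raises ZeroDivisionError if there are no positive instances.
    return negatives / positives


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def resolve_eps(value, x_train):
    """'distances_quantile:{quantile}': quantile of pairwise distances in x_train."""
    prefix, _, q = value.partition(":")
    if prefix != "distances_quantile":
        raise ValueError(f"unknown eps special value: {value!r}")
    # Assumption: Euclidean distance over all unordered pairs of rows.
    distances = sorted(dist(a, b) for a, b in combinations(x_train, 2))
    return quantile(distances, float(q))
```

For example, under these assumptions `resolve_n_jobs("logical_cpus:0.5", 8, 16)` yields half of the logical CPU count, and `resolve_eps("distances_quantile:0.5", x_train)` yields the median pairwise distance. Computing all pairwise distances is quadratic in the number of rows, so a real implementation would likely subsample.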

## Range of Values

You can define some parameters as a range of values with the `[RANGE]` prefix in string value:
Refer to [`Benchmarking Config Specification`](BENCH-CONFIG-SPEC.md) for details on how to read and write benchmarking configs in `scikit-learn_bench`.