Commit 4320e3f

andrewdalpino, MacGruber91, and exzachlyvv authored
0.4.0 (#163)
* RBX file format (#158)
* Initial commit
* Fix coding style
* Shorten revision number and header checksum
* Filesystem test uses RBX format
* Recurse into object tree
* Check for header/body class match
* Fix coding style
* Variable compression levels
* Remove length header
* Outline of specification
* Convert checksums to HMACs
* Spiff up a little bit
* Tidy up
* Use Native base serializer under the hood
* Better error messages
* Base serializer now injectable
* Introduce encrypted variant
* Appease Stan
* Add beef to RBXP payload hash
* Rename RBX portable to RBX standard
* Use password digests by default
* Appease Stan
* More appeasement
* Unrestricted digest length
* Benchmark serializers
* Change default base serializers
* Switch payload HMAC to sha256
* Fix hmac
* Fix mkdocs nav
* RBX use checksums instead of HMACs
* No default password
* Tidy up
* Remove PHP 8.0 from CI due to 3rd party incompatibility
* Move RBXE to Extras package
* Appease Stan
* Dynamic column width in console output (#149)
* Added dynamic column size in console Refactoring Console.php
* Cast columnSize to int
* Replace array_reduce with foreach
* Mark tests as skipped
* GitHub CLA
* GitHub CLA check
* GitHub CLA check with other username
* Embed library version in RBX format
* Appease Stan
* Added custom class revision mismatch exception
* Add RBX stuff to the user guide
* Deprecate Igbinary serializer
* New Transformer: Boolean Converter (#159)
* add a boolean converter which converts true to 1 and false to 0.
* updating the BooleanConverter too accept a customizable true/false value. Also updated docs to include the BooleanConverter
* fix up the PHPdoc. Failed static analysis.
* working through static analysis failures.
* using single quotes
* improvements per Andrew's comments on the PR
* Add windows latest to CI build environments
* Add fileinfo to required CI extensions
* add a boolean converter which converts true to 1 and false to 0.
* updating the BooleanConverter too accept a customizable true/false value. Also updated docs to include the BooleanConverter
* fix up the PHPdoc. Failed static analysis.
* working through static analysis failures.
* using single quotes
* improvements per Andrew's comments on the PR

Co-authored-by: Andrew DalPino <[email protected]>

* Appease Stan
* Remove debug.log that should have been ignored by Git
* Appease Stan
* Update changelog
* Tighten up RBX format
* Initial commit (#162)
* Deprecate explainedVar() and noiseVar() methods on PCA and LDA
* Add missing extension specification and exception
* Rename Autotrack Revisions
* No need to sort singular values
* Implement transformer conduits
* Add return transformers method to Conduit
* Clean up
* Revert conduits
* Single-threaded by default
* Bump Tensor version requirement
* Polish up for release

Co-authored-by: Vladimir Stepanov <[email protected]>
Co-authored-by: Zachary Vander Velden <[email protected]>
1 parent 61de61f commit 4320e3f

128 files changed, +1604 -362 lines changed

.php_cs.dist

Lines changed: 1 addition & 1 deletion
@@ -122,4 +122,4 @@ return Config::create()->setRules([
     'trim_array_spaces' => true,
     'unary_operator_spaces' => true,
     'whitespace_after_comma_in_array' => true,
-])->setFinder($finder);
+])->setFinder($finder);

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,13 @@
+- 0.4.0
+    - Added Truncated SVD transformer
+    - Added Rubix Object File (RBX) format serializer
+    - Added class revision() method to the Persistable interface
+    - Added custom class revision mismatch exception
+    - Add Boolean Converter transformer
+    - Deprecated Igbinary serializer and move to Extras package
+    - Deprecate explainedVar() and noiseVar() methods on PCA and LDA
+    - Added missing extension specification and exception
+
 - 0.3.2
     - Fix t-SNE momentum gain bus error when using Tensor extension
     - Optimize t-SNE matrix instantiation
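
The commit message describes the new Boolean Converter as a transformer that "converts true to 1 and false to 0" and that was later changed to accept customizable true/false values. A minimal sketch of that behaviour, assuming nothing about the library's actual class: `convertBooleans()` and its parameter names are hypothetical, and the defaults of `1` and `0` follow the commit message rather than the released API.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not the library's BooleanConverter: replaces every
 * boolean in each sample with a stand-in value and leaves other values as-is.
 *
 * @param array<int, array<int, mixed>> $samples
 * @param string|int $trueValue  stand-in for true (assumed default: 1)
 * @param string|int $falseValue stand-in for false (assumed default: 0)
 * @return array<int, array<int, mixed>>
 */
function convertBooleans(array $samples, $trueValue = 1, $falseValue = 0) : array
{
    return array_map(function (array $sample) use ($trueValue, $falseValue) : array {
        return array_map(function ($value) use ($trueValue, $falseValue) {
            // Only strict booleans are converted; strings and numbers pass through.
            if (is_bool($value)) {
                return $value ? $trueValue : $falseValue;
            }

            return $value;
        }, $sample);
    }, $samples);
}

$samples = [
    [true, 'red', 3.5],
    [false, 'blue', 1.2],
];

// Default stand-ins.
print_r(convertBooleans($samples));

// Custom stand-ins.
print_r(convertBooleans($samples, 'yes', 'no'));
```

Converting booleans up front matters because the rest of a pipeline typically expects each feature to be either categorical (string) or continuous (numeric).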

README.md

Lines changed: 2 additions & 3 deletions
@@ -24,11 +24,10 @@ $ composer require rubix/ml
 #### Optional
 
 - [Extras Package](https://github.com/RubixML/Extras) for experimental features
-- [SVM extension](https://php.net/manual/en/book.svm.php) for Support Vector Machine engine (libsvm)
-- [Mbstring extension](https://www.php.net/manual/en/book.mbstring.php) for fast multibyte string manipulation
 - [GD extension](https://php.net/manual/en/book.image.php) for image manipulation
+- [Mbstring extension](https://www.php.net/manual/en/book.mbstring.php) for fast multibyte string manipulation
+- [SVM extension](https://php.net/manual/en/book.svm.php) for Support Vector Machine engine (libsvm)
 - [Redis extension](https://github.com/phpredis/phpredis) for persisting to a Redis DB
-- [Igbinary extension](https://github.com/igbinary/igbinary) for binary serialization of persistables
 
 ## Documentation
 Read the latest docs [here](https://docs.rubixml.com).
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+<?php
+
+namespace Rubix\ML\Benchmarks\Persisters\Serializers;
+
+use Rubix\ML\Datasets\Generators\Blob;
+use Rubix\ML\Classifiers\KNearestNeighbors;
+use Rubix\ML\Datasets\Generators\Agglomerate;
+use Rubix\ML\Persisters\Serializers\Gzip;
+
+/**
+ * @Groups({"Serializers"})
+ * @BeforeMethods({"setUp"})
+ */
+class GzipBench
+{
+    protected const TRAINING_SIZE = 2500;
+
+    /**
+     * @var \Rubix\ML\Persisters\Serializers\Gzip
+     */
+    protected $serializer;
+
+    /**
+     * @var \Rubix\ML\Persistable
+     */
+    protected $persistable;
+
+    public function setUp() : void
+    {
+        $generator = new Agglomerate([
+            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
+            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
+            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
+        ]);
+
+        $training = $generator->generate(self::TRAINING_SIZE);
+
+        $estimator = new KNearestNeighbors(5, true);
+
+        $estimator->train($training);
+
+        $this->persistable = $estimator;
+
+        $this->serializer = new Gzip();
+    }
+
+    /**
+     * @Subject
+     * @revs(10)
+     * @Iterations(5)
+     * @OutputTimeUnit("milliseconds", precision=3)
+     */
+    public function serializeUnserialize() : void
+    {
+        $encoding = $this->serializer->serialize($this->persistable);
+
+        $persistable = $this->serializer->unserialize($encoding);
+    }
+}
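
The benchmark above times one `serialize()` call followed by one `unserialize()` call on a trained estimator. The same round trip can be sketched without the library, assuming a compressing serializer amounts to native PHP serialization followed by gzip compression. That layering is an assumption made for illustration, `encodeCompressed()` and `decodeCompressed()` are hypothetical names, and the array payload merely stands in for a trained model. The sketch requires the zlib extension.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a compressing serializer: native serialization
 * followed by gzip compression at the given level (0-9).
 *
 * @param mixed $value
 */
function encodeCompressed($value, int $level = 1) : string
{
    $encoded = gzencode(serialize($value), $level);

    if ($encoded === false) {
        throw new RuntimeException('Failed to compress data.');
    }

    return $encoded;
}

/**
 * Reverses encodeCompressed(). Do not call unserialize() on untrusted input.
 *
 * @return mixed
 */
function decodeCompressed(string $data)
{
    $decoded = gzdecode($data);

    if ($decoded === false) {
        throw new RuntimeException('Failed to decompress data.');
    }

    return unserialize($decoded);
}

// Stand-in payload: 2500 identical 4-feature samples plus two settings.
$model = [
    'k' => 5,
    'weighted' => true,
    'samples' => array_fill(0, 2500, [5.0, 3.42, 1.46, 0.24]),
];

$start = microtime(true);

$encoding = encodeCompressed($model);

$restored = decodeCompressed($encoding);

$elapsed = (microtime(true) - $start) * 1000.0;

printf("Round trip took %.3f ms for %d bytes\n", $elapsed, strlen($encoding));
```

Timing both directions together, as the benchmark does, gives a single figure per serializer but cannot show whether encoding or decoding dominates.
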
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+<?php
+
+namespace Rubix\ML\Benchmarks\Persisters\Serializers;
+
+use Rubix\ML\Datasets\Generators\Blob;
+use Rubix\ML\Classifiers\KNearestNeighbors;
+use Rubix\ML\Datasets\Generators\Agglomerate;
+use Rubix\ML\Persisters\Serializers\Native;
+
+/**
+ * @Groups({"Serializers"})
+ * @BeforeMethods({"setUp"})
+ */
+class NativeBench
+{
+    protected const TRAINING_SIZE = 2500;
+
+    /**
+     * @var \Rubix\ML\Persisters\Serializers\Native
+     */
+    protected $serializer;
+
+    /**
+     * @var \Rubix\ML\Persistable
+     */
+    protected $persistable;
+
+    public function setUp() : void
+    {
+        $generator = new Agglomerate([
+            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
+            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
+            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
+        ]);
+
+        $training = $generator->generate(self::TRAINING_SIZE);
+
+        $estimator = new KNearestNeighbors(5, true);
+
+        $estimator->train($training);
+
+        $this->persistable = $estimator;
+
+        $this->serializer = new Native();
+    }
+
+    /**
+     * @Subject
+     * @revs(10)
+     * @Iterations(5)
+     * @OutputTimeUnit("milliseconds", precision=3)
+     */
+    public function serializeUnserialize() : void
+    {
+        $encoding = $this->serializer->serialize($this->persistable);
+
+        $persistable = $this->serializer->unserialize($encoding);
+    }
+}
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+<?php
+
+namespace Rubix\ML\Benchmarks\Persisters\Serializers;
+
+use Rubix\ML\Datasets\Generators\Blob;
+use Rubix\ML\Classifiers\KNearestNeighbors;
+use Rubix\ML\Datasets\Generators\Agglomerate;
+use Rubix\ML\Persisters\Serializers\RBX;
+
+/**
+ * @Groups({"Serializers"})
+ * @BeforeMethods({"setUp"})
+ */
+class RBXBench
+{
+    protected const TRAINING_SIZE = 2500;
+
+    /**
+     * @var \Rubix\ML\Persisters\Serializers\RBX
+     */
+    protected $serializer;
+
+    /**
+     * @var \Rubix\ML\Persistable
+     */
+    protected $persistable;
+
+    public function setUp() : void
+    {
+        $generator = new Agglomerate([
+            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
+            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
+            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
+        ]);
+
+        $training = $generator->generate(self::TRAINING_SIZE);
+
+        $estimator = new KNearestNeighbors(5, true);
+
+        $estimator->train($training);
+
+        $this->persistable = $estimator;
+
+        $this->serializer = new RBX();
+    }
+
+    /**
+     * @Subject
+     * @revs(10)
+     * @Iterations(5)
+     * @OutputTimeUnit("milliseconds", precision=3)
+     */
+    public function serializeUnserialize() : void
+    {
+        $encoding = $this->serializer->serialize($this->persistable);
+
+        $persistable = $this->serializer->unserialize($encoding);
+    }
+}
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+<?php
+
+namespace Rubix\ML\Benchmarks\Transformers;
+
+use Rubix\ML\Datasets\Generators\Blob;
+use Rubix\ML\Transformers\NumericStringConverter;
+
+/**
+ * @Groups({"Transformers"})
+ * @BeforeMethods({"setUp"})
+ */
+class NumericStringConverterBench
+{
+    protected const DATASET_SIZE = 100000;
+
+    /**
+     * @var \Rubix\ML\Datasets\Dataset
+     */
+    public $dataset;
+
+    /**
+     * @var \Rubix\ML\Transformers\NumericStringConverter
+     */
+    protected $transformer;
+
+    public function setUp() : void
+    {
+        $generator = new Blob([0.0, 0.0, 0.0, 0.0]);
+
+        $this->dataset = $generator->generate(self::DATASET_SIZE)
+            ->transformColumn(1, 'strval')
+            ->transformColumn(3, 'strval');
+
+        $this->transformer = new NumericStringConverter();
+    }
+
+    /**
+     * @Subject
+     * @Iterations(3)
+     * @OutputTimeUnit("milliseconds", precision=3)
+     */
+    public function apply() : void
+    {
+        $this->dataset->apply($this->transformer);
+    }
+}
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+<?php
+
+namespace Rubix\ML\Benchmarks\Transformers;
+
+use Rubix\ML\Datasets\Generators\Blob;
+use Rubix\ML\Datasets\Generators\Agglomerate;
+use Rubix\ML\Transformers\TruncatedSVD;
+
+/**
+ * @Groups({"Transformers"})
+ * @BeforeMethods({"setUp"})
+ */
+class TruncatedSVDBench
+{
+    protected const DATASET_SIZE = 10000;
+
+    /**
+     * @var \Rubix\ML\Datasets\Labeled
+     */
+    public $dataset;
+
+    /**
+     * @var \Rubix\ML\Transformers\TruncatedSVD
+     */
+    protected $transformer;
+
+    public function setUp() : void
+    {
+        $generator = new Agglomerate([
+            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
+            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
+            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
+        ]);
+
+        $this->dataset = $generator->generate(self::DATASET_SIZE);
+
+        $this->transformer = new TruncatedSVD(1);
+    }
+
+    /**
+     * @Subject
+     * @Iterations(3)
+     * @OutputTimeUnit("milliseconds", precision=3)
+     */
+    public function apply() : void
+    {
+        $this->dataset->apply($this->transformer);
+    }
+}

composer.json

Lines changed: 2 additions & 4 deletions
@@ -36,7 +36,7 @@
         "ext-json": "*",
         "amphp/parallel": "^1.3",
         "psr/log": "^1.1",
-        "rubix/tensor": "^2.0",
+        "rubix/tensor": "^2.2",
         "symfony/polyfill-mbstring": "^1.0",
         "symfony/polyfill-php73": "^1.20",
         "symfony/polyfill-php80": "^1.17",
@@ -45,7 +45,7 @@
     "require-dev": {
         "friendsofphp/php-cs-fixer": "2.18.*",
         "league/flysystem-memory": "^2.0",
-        "phpbench/phpbench": "1.0.0-alpha4",
+        "phpbench/phpbench": "1.0.0-alpha6",
         "phpstan/extension-installer": "^1.0",
         "phpstan/phpstan": "0.12.*",
         "phpstan/phpstan-phpunit": "0.12.*",
@@ -96,8 +96,6 @@
         "sort-packages": true,
         "process-timeout": 3000
     },
-    "minimum-stability": "dev",
-    "prefer-stable": true,
     "funding": [
         {
             "type": "github",

docs/hyper-parameter-tuning.md

Lines changed: 3 additions & 3 deletions
@@ -2,22 +2,22 @@
 Hyper-parameter tuning is an experimental process that incorporates [cross-validation](cross-validation.md) to guide hyper-parameter selection. When choosing an estimator for your project it often helps to fine-tune its hyper-parameters in order to get the best accuracy and performance from the model.
 
 ## Manual Tuning
-In a manual scenario, a user will train an estimator with one set of hyper-parameters, obtain a validation score, and then use that as a baseline to make future adjustments. The goal at each iteration is to determine whether the adjustments improve accuracy or cause it to decrease. We can consider a model to be *fully* tuned when adjustments to the hyper-parameters can no longer make improvements to the validation score. In the example below, we'll tune the *radius* parameter of [Radius Neighbors Regressor](regressors/radius-neighbors-regressor.md) by iterating over the following block of code with a different setting each time. At first, we can start by choosing radius from a set of values and then honing in on the best value once we have obtained the settings with the highest [SMAPE](cross-validation/metrics/smape.md) score.
+When actively tuning a model, we will train an estimator with one set of hyper-parameters, obtain a validation score, and then use that as a baseline to make future adjustments. The goal at each iteration is to determine whether the adjustments improve accuracy or cause it to decrease. We can consider a model to be *fully* tuned when adjustments to the hyper-parameters can no longer make improvements to the validation score. With practice, we'll develop an intuition for which parameters need adjusting. Refer to the API documentation for each learner for a description of each hyper-parameter. In the example below, we'll tune the *radius* parameter of [Radius Neighbors Regressor](regressors/radius-neighbors-regressor.md) by iterating over the following block of code with a different setting each time. At first, we can start by choosing radius from a set of values and then honing in on the best value once we have obtained the settings with the highest [SMAPE](cross-validation/metrics/smape.md) score.
 
 ```php
 use Rubix\ML\Regressors\RadiusNeighborsRegressor;
 use Rubix\ML\CrossValidation\Metrics\SMAPE;
 
 [$training, $testing] = $dataset->randomize()->split(0.8);
 
-$metric = new SMAPE();
-
 $estimator = new RadiusNeighborsRegressor(0.5); // 0.1, 0.5, 1.0, 2.0, 5.0
 
 $estimator->train($training);
 
 $predictions = $estimator->predict($testing);
 
+$metric = new SMAPE();
+
 $score = $metric->score($predictions, $testing->labels());
 
 echo $score;
