Possible issue for IsolationForest Division by zero #353

Open
indianalalal opened this issue Mar 5, 2025 · 1 comment
Comments

@indianalalal

Hello,

I'm trying to use IsolationForest.

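use Rubix\ML\AnomalyDetectors\IsolationForest;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Transformers\OneHotEncoder;

$dataset = new CSV('test.csv', false);
$dataset = Unlabeled::fromIterator($dataset);

$oneHotEncoder = new OneHotEncoder();
$dataset->apply($oneHotEncoder);

$estimator = new IsolationForest(2, 0.1, 0.5);
$estimator->train($dataset);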

Here is an example of the CSV content. In reality I have more than 1000 lines.

1,1,46,3,1
1,1,58,6,3
2,5,52,3,1
1,11,52,3,3
2,11,52,6,3
3,1,46,3,1
3,1,58,6,3
3,11,52,3,3

And I get this error:

In IsolationForest.php line 311:
Division by zero  

Is there a problem with the code or with my data?
Thx

@andrewdalpino
Member

Hey @indianalalal, there's no need to one-hot encode your categorical features with IsolationForest, since it is compatible with categorical features. The division by zero occurs because a split of the samples ended up with every value in the randomly selected continuous column equal to zero. This could be related to your one-hot encoding.
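
A minimal sketch of that suggestion, assuming Rubix ML is installed via Composer and the autoloader is already loaded. It is the snippet from the original post with the OneHotEncoder step removed, so the raw CSV columns are passed to the estimator as they are. The `trainWithoutEncoding` helper is a hypothetical name used only for illustration, not part of the library, and this has not been checked against the full 1000+ line file.

```php
<?php

// Assumes vendor/autoload.php has already been required by the calling script.
use Rubix\ML\AnomalyDetectors\IsolationForest;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;

/**
 * Hypothetical helper (not part of the library): train an Isolation Forest
 * directly on the raw CSV columns, with no one-hot encoding step.
 */
function trainWithoutEncoding(string $path): IsolationForest
{
    // Headerless CSV, as in the original snippet.
    $dataset = Unlabeled::fromIterator(new CSV($path, false));

    // Same constructor arguments as the original post.
    $estimator = new IsolationForest(2, 0.1, 0.5);
    $estimator->train($dataset);

    return $estimator;
}
```

Calling `trainWithoutEncoding('test.csv')` should return the trained estimator. If the division by zero no longer appears, that would support the idea that the one-hot encoded columns were the trigger.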
