
[ML] overall_accuracy is low for an imbalanced classification #1955

Open
@wwang500

Description


When using an imbalanced sklearn dataset (imbalance ratio 99:1), our data frame analytics (DFA) classification job performs poorly: overall_accuracy is 0.02.
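A quick sanity check before importing: the sketch below counts the labels in the label column and reports the accuracy of a trivial model that always predicts the majority class. It assumes imbalance.csv has a header row in which the label column is named 30 and that labels are numeric (for example 0.0 and 1.0); the helper name label_stats and the toy sample are illustrative only. With a 99:1 ratio that baseline is about 0.99, which puts an overall_accuracy of 0.02 in context.

```python
import csv
import io
from collections import Counter
from typing import Tuple


def label_stats(csv_text: str, label_column: str = "30") -> Tuple[Counter, float]:
    """Count the labels in one CSV column and return them with the majority-class baseline."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Parse via float so "1.0" and "1" count as the same class, mirroring the
    # double -> long mapping change made during the import.
    counts = Counter(str(int(float(row[label_column]))) for row in rows)
    total = sum(counts.values())
    # Accuracy of a trivial model that always predicts the most common class.
    majority_baseline = counts.most_common(1)[0][1] / total
    return counts, majority_baseline


# Hypothetical toy data: two feature columns plus the label column "30".
sample = "0,1,30\n0.5,1.2,0.0\n0.3,0.8,0.0\n0.9,0.1,1.0\n"
counts, baseline = label_stats(sample)
print(counts, baseline)
```

To check the real file, pass open("imbalance.csv", newline="").read() to label_stats instead of the toy sample.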

Steps to reproduce:

On the latest master build (July 19):

  1. In Data Visualizer, import the attached file imbalance.csv into the index imbalance.

  2. During the import, change the mapping of column 30 from double to long:

"30": {
  "type": "long"
}
  3. Create and start the DFA job from the Dev Tools console:
PUT _ml/data_frame/analytics/imbalance
{
  "source": {
    "index": [
      "imbalance"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest-imbalance",
    "results_field": "ml"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "30",
      "class_assignment_objective": "maximize_minimum_recall",
      "num_top_classes": 2,
      "prediction_field_name": "30_prediction",
      "training_percent": 80.0,
      "randomize_seed": 4642014714383011104,
      "early_stopping_enabled": true
    }
  },
  "model_memory_limit": "1gb",
  "allow_lazy_start": false,
  "max_num_threads": 1
}

POST _ml/data_frame/analytics/imbalance/_start
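To script this step instead of using the console, a minimal sketch with the Python standard library follows. It sends the same PUT and _start calls, then polls GET _ml/data_frame/analytics/<id>/_stats until the job is stopped with every progress phase at 100%. It assumes an unsecured cluster at http://localhost:9200 and a stats response shaped as {"data_frame_analytics": [{"state": ..., "progress": [{"phase": ..., "progress_percent": ...}]}]}; the helper names es_request, job_finished and run_job are hypothetical.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without security enabled


def es_request(method, path, body=None):
    """Build (but do not send) a request equivalent to a Dev Tools console call."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def job_finished(stats):
    """True once the job is stopped and every progress phase reports 100%."""
    job = stats["data_frame_analytics"][0]
    return job["state"] == "stopped" and all(
        phase["progress_percent"] == 100 for phase in job.get("progress", [])
    )


def run_job(job_id, config, poll_seconds=10.0, timeout=3600.0):
    """Create and start the job, then poll _stats until it finishes or fails."""
    base = f"/_ml/data_frame/analytics/{job_id}"
    urllib.request.urlopen(es_request("PUT", base, config)).close()
    urllib.request.urlopen(es_request("POST", base + "/_start")).close()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(es_request("GET", base + "/_stats")) as resp:
            stats = json.load(resp)
        if stats["data_frame_analytics"][0]["state"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if job_finished(stats):
            return stats
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
```

Usage would be run_job("imbalance", job_config), where job_config is the PUT body above written as a Python dict (true becomes True). Checking progress_percent as well as state avoids treating a freshly created, not-yet-started job as finished.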

  4. Once the job finishes, run the evaluation:
POST _ml/data_frame/_evaluate
{
  "index": "dest-imbalance",
  "query": {
    "term": {
      "ml.is_training": {
        "value": "false"
      }
    }
  },
  "evaluation": {
    "classification": {
      "actual_field": "30",
      "predicted_field": "ml.30_prediction",
      "metrics": {
        "accuracy": {}
      }
    }
  }
}
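Accuracy alone does not show which class is being misclassified. The sketch below builds the same evaluation body in Python and additionally requests per-class recall (the quantity maximize_minimum_recall tries to keep high for every class) and the confusion matrix. It assumes the build under test supports the recall and multiclass_confusion_matrix classification metrics; the helper name classification_eval_body is hypothetical. The boolean False in the term query matches the same documents as the string "false" used above.

```python
import json


def classification_eval_body(index, actual_field, predicted_field,
                             metrics=("accuracy", "recall", "multiclass_confusion_matrix")):
    """Build a _evaluate request body restricted to the test split (ml.is_training == false)."""
    return {
        "index": index,
        "query": {"term": {"ml.is_training": {"value": False}}},
        "evaluation": {
            "classification": {
                "actual_field": actual_field,
                "predicted_field": predicted_field,
                # Each metric is requested with default (empty) parameters.
                "metrics": {name: {} for name in metrics},
            }
        },
    }


body = classification_eval_body("dest-imbalance", "30", "ml.30_prediction")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after POST _ml/data_frame/_evaluate in the console.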

Result:

{
  "classification" : {
    "accuracy" : {
      "classes" : [
        {
          "class_name" : "0",
          "value" : 0.02
        },
        {
          "class_name" : "1",
          "value" : 0.02
        }
      ],
      "overall_accuracy" : 0.02
    }
  }
}
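The three identical values are consistent with per-class accuracy being computed one-vs-rest, as (TP + TN) / N for each class: in a binary problem every correct prediction is a true positive for one class and a true negative for the other, so both classes must equal overall_accuracy. The sketch below reproduces that arithmetic on a hypothetical confusion matrix (the counts are invented for illustration, not taken from this job) in which a 99:1 test set of 1,000 documents is almost entirely predicted as the minority class.

```python
from typing import Dict, Tuple

# (actual class, predicted class) -> number of test documents
Matrix = Dict[Tuple[str, str], int]


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of documents whose predicted class equals the actual class."""
    total = sum(cm.values())
    correct = sum(n for (actual, predicted), n in cm.items() if actual == predicted)
    return correct / total


def per_class_accuracy(cm: Matrix, cls: str) -> float:
    """One-vs-rest accuracy for `cls`: (TP + TN) / N."""
    total = sum(cm.values())
    # Correct for this class when it is a true positive (both sides are cls)
    # or a true negative (neither side is cls).
    correct = sum(n for (actual, predicted), n in cm.items()
                  if (actual == cls) == (predicted == cls))
    return correct / total


# Hypothetical: 990 actual "0" (only 10 predicted "0"), 10 actual "1" (all predicted "1").
cm = {("0", "0"): 10, ("0", "1"): 980, ("1", "1"): 10}
print(overall_accuracy(cm), per_class_accuracy(cm, "0"), per_class_accuracy(cm, "1"))
```

In this made-up matrix all three numbers come out at 0.02, the same pattern as the result above.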
