DropoutNet - Use official config & sample data but AUC and loss worsen with more training steps #513

Open
@martin0258

Description

I trained the official DropoutNet model using the provided sample Taobao dataset and the sample configuration file. During training, the AUC decreased and the losses increased as the training steps progressed. My understanding is that the opposite should happen: the AUC should increase and the losses should decrease as training continues.

Steps to reproduce

OS: Ubuntu 20.04
GPU: 1 NVIDIA RTX 3090
Python: 3.10.16
TensorFlow: 2.14.0 with CUDA

  1. Clone the EasyRec repo (commit SHA: 4b0b1f5).
  2. Install EasyRec.
  3. Download the sample Taobao dataset:
wget http://easyrec.oss-cn-beijing.aliyuncs.com/data/git_oss_sample_data/data_test_tb_data_b1579db090d72b3b70b59ba3c7692701 -O tb_data.tar.gz
tar -zxf tb_data.tar.gz
  4. Run the training with the sample DropoutNet config and sample dataset:
python -m easy_rec.python.train_eval --pipeline_config_path samples/model_config/dropoutnet_on_taobao.config

Actual training result

TensorBoard:

tensorboard --logdir experiments/dropoutnet_taobao_ckpt/eval_val

image

Initial AUC and loss:
image

Final AUC and loss:
image

Expected behavior

  • AUC should increase with more training steps.
  • Losses should decrease with more training steps.
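
To make the two expectations above checkable rather than judged by eye from the TensorBoard plots, here is a minimal sketch that fits a least-squares slope to a metric series and compares its sign with the expected direction. The helper names `slope` and `matches_expectation` are hypothetical, and the numbers are synthetic placeholders, not values from this run; the real (step, value) pairs would have to be exported from the eval logs.

```python
from statistics import mean


def slope(points):
    """Least-squares slope of value against step for (step, value) pairs."""
    steps = [s for s, _ in points]
    values = [v for _, v in points]
    s_bar, v_bar = mean(steps), mean(values)
    denom = sum((s - s_bar) ** 2 for s in steps)
    if denom == 0:
        raise ValueError("need at least two distinct steps")
    return sum((s - s_bar) * (v - v_bar) for s, v in points) / denom


def matches_expectation(points, should_increase):
    """True if the fitted trend points in the expected direction."""
    k = slope(points)
    return k > 0 if should_increase else k < 0


# Synthetic placeholder series (NOT measurements from this issue).
auc_points = [(1000, 0.60), (2000, 0.58), (3000, 0.55)]
loss_points = [(1000, 0.50), (2000, 0.55), (3000, 0.62)]

# AUC is expected to rise, loss is expected to fall.
print(matches_expectation(auc_points, should_increase=True))
print(matches_expectation(loss_points, should_increase=False))
```

A fitted slope is used instead of comparing only the first and last points so that a single noisy evaluation does not decide the result.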

Could you please confirm if this is expected behavior or if there might be an issue with the sample configuration or dataset? If additional debugging information is needed, I am happy to provide more details.

Thank you!
