DropoutNet - Use official config & sample data but AUC and loss worsen with more training steps #513

Open
martin0258 opened this issue Jan 7, 2025 · 1 comment

martin0258 commented Jan 7, 2025

Description

I attempted to train the official DropoutNet model using the provided sample Taobao dataset and the sample configuration file. However, during training, I observed that the AUC decreased and the losses increased as the training steps progressed. Based on my understanding, the expected behavior is that the AUC should increase and the losses should decrease as training continues.

Steps to reproduce

OS: Ubuntu 20.04
GPU: 1 NVIDIA RTX 3090
Python: 3.10.16
TensorFlow: 2.14.0 with CUDA

  1. git clone the EasyRec repo (commit SHA: 4b0b1f5)
  2. install EasyRec
  3. download the sample Taobao dataset:
     wget http://easyrec.oss-cn-beijing.aliyuncs.com/data/git_oss_sample_data/data_test_tb_data_b1579db090d72b3b70b59ba3c7692701 -O tb_data.tar.gz
     tar -zxf tb_data.tar.gz
  4. run the training with the sample DropoutNet config and sample dataset:
     python -m easy_rec.python.train_eval --pipeline_config_path samples/model_config/dropoutnet_on_taobao.config

Actual training result

TensorBoard:

tensorboard --logdir experiments/dropoutnet_taobao_ckpt/eval_val

image

Initial AUC and loss:
image

Final AUC and loss:
image

Expected behavior

  • AUC should increase with more training steps.
  • Losses should decrease with more training steps.
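
To make the expectation above checkable against exported eval scalars, here is a minimal sketch that fits a least-squares slope to (step, value) pairs and tests its sign. The function names and the sample numbers are hypothetical and are not taken from the run in this issue; the real values would come from the TensorBoard scalars.

```python
from typing import Sequence, Tuple

Point = Tuple[int, float]  # (global step, metric value)


def linear_slope(points: Sequence[Point]) -> float:
    """Least-squares slope of the metric value with respect to the step."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two (step, value) points")
    mean_x = sum(step for step, _ in points) / n
    mean_y = sum(value for _, value in points) / n
    var_x = sum((step - mean_x) ** 2 for step, _ in points)
    if var_x == 0:
        raise ValueError("all points share the same step")
    cov = sum((step - mean_x) * (value - mean_y) for step, value in points)
    return cov / var_x


def matches_expectation(auc: Sequence[Point], loss: Sequence[Point]) -> bool:
    """True when AUC trends upward and loss trends downward overall."""
    return linear_slope(auc) > 0 and linear_slope(loss) < 0


# Hypothetical numbers for illustration only (not from this issue's run).
auc = [(500, 0.60), (1000, 0.58), (1500, 0.55)]
loss = [(500, 0.70), (1000, 0.74), (1500, 0.80)]
print(matches_expectation(auc, loss))  # False: AUC falls while loss rises
```

A slope test only captures the overall direction; a noisy curve that dips briefly and then recovers would still pass.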

Could you please confirm if this is expected behavior or if there might be an issue with the sample configuration or dataset? If additional debugging information is needed, I am happy to provide more details.

Thank you!

martin0258 (Author) commented

FYR: after increasing the number of training steps (2500 -> 25000), I could not reproduce the same AUC/loss curve.

image
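
For anyone repeating the longer run, below is a minimal sketch of changing the step count in a copy of the config. It assumes the training length is set by a `num_steps` field in the pipeline config text, which this thread does not confirm; the helper `set_num_steps` and the config fragment are hypothetical.

```python
import re


def set_num_steps(config_text: str, num_steps: int) -> str:
    """Return config_text with the first `num_steps: <int>` value replaced."""
    pattern = re.compile(r"(\bnum_steps\s*:\s*)\d+")
    # A callable replacement avoids ambiguity between group references and digits.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(num_steps), config_text, count=1
    )
    if count == 0:
        raise ValueError("no num_steps field found in config text")
    return new_text


# Hypothetical fragment; the real sample config may be laid out differently.
sample = """train_config {
  num_steps: 2500
}
"""
print(set_num_steps(sample, 25000))
```

Writing the result to a new file, rather than editing the sample config in place, keeps the original 2500-step setup available for comparison.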
