GradMatch data subset selection method making training slow #78
Can you point out which version of GradMatch you are using? Ideally, subset selection should make training faster unless something is wrong with the experimental setup. Please attach the log files so that I can analyze them and figure out the issue.
@krishnatejakk [06/27 16:40:56] train_sl INFO: DotMap(setting='SL', is_reg=True, dataset=DotMap(name='cifar10', datadir='../storage', feature='dss', type='image'), dataloader=DotMap(shuffle=True, batch_size=256, pin_memory=True, num_workers=8), model=DotMap(architecture='ResNet50_224', type='pre-defined', numclasses=10), ckpt=DotMap(is_load=False, is_save=True, dir='results/', save_every=20), loss=DotMap(type='CrossEntropyLoss', use_sigmoid=False), optimizer=DotMap(type='sgd', momentum=0.9, lr=0.01, weight_decay=0.0005, nesterov=False), scheduler=DotMap(type='cosine_annealing', T_max=300), dss_args=DotMap(type='GradMatch', fraction=0.3, select_every=5, lam=0.5, selection_type='PerClassPerGradient', v1=True, valid=False, kappa=0, eps=1e-100, linear_layer=True), train_args=DotMap(num_epochs=300, device='cuda', print_every=1, results_dir='results/', print_args=['val_loss', 'val_acc', 'tst_loss', 'tst_acc', 'time'], return_args=[]))
@krishnatejakk [Screenshots of training results attached: Full dataset, GradMatch, CRAIG]
@shiyf129 What is the resolution of the images you are using while training? I am using 224x224.
@animesh-007 @shiyf129 I am working on the issue. We recently updated the OMP version in the GradMatch code, which improves its performance further. However, the new OMP version is making it slower in this case, and I will debug why. For faster training, one option is to use GradMatchPB (i.e., the per-batch version) or to revert to the previous OMP version in the GradMatch strategy code as follows:
In the import statement, remove the _V1 suffix to revert to the previous version of the OMP code (see the sketch below).
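A minimal sketch of what that import change might look like; the exact module path and solver names are assumptions and may differ across CORDS versions, so check the strategy file in your installed copy:

```python
# Hypothetical sketch of the import inside the GradMatch strategy code.
# The module path is an assumption; only the _V1 suffix removal is the point.

# Newer OMP solver (slower in this case):
# from cords.selectionstrategies.helpers.omp_solvers import OrthogonalMP_REG_Parallel_V1 as omp_solver

# Reverted (previous) OMP solver: same name without the _V1 suffix
from cords.selectionstrategies.helpers.omp_solvers import OrthogonalMP_REG_Parallel as omp_solver
```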
I use the original CIFAR-10 dataset with 32x32 image size.
@krishnatejakk I tested the GradMatchPB algorithm and set v1=False to use the previous OMP version. I compared the first 10 epochs of training between the GradMatchPB algorithm and full-dataset training; the results show that GradMatchPB takes longer and its average accuracy is relatively low. Do you know the reason for this?
GradMatchPB, beginning 10 epochs of training: [table attached]
Full dataset, beginning 10 epochs of training: [table attached]
@shiyf129 Why is subset selection happening every epoch? We usually set it to 20. Subset selection takes some time, and you don't need to select a subset every time. Furthermore, training with a 10% subset should be 10x faster than full-dataset training; from your logs, it doesn't seem that way. Can you check whether training on a fixed 10% subset of the dataset for one epoch is 10x faster than a full-dataset epoch?
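A minimal sketch of that sanity check, assuming a standard PyTorch setup; `train_set`, `model`, `criterion`, `optimizer`, and `device` are placeholders for objects already defined in your training script, not CORDS APIs:

```python
import time
import numpy as np
from torch.utils.data import Subset, DataLoader

# Assumes train_set, model, criterion, optimizer, and device already exist
# in the surrounding training script (placeholder names, not CORDS APIs).

def time_one_epoch(dataset, batch_size=256):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=8, pin_memory=True)
    start = time.time()
    model.train()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return time.time() - start

# Fixed random 10% subset -- no subset selection strategy involved,
# so this isolates pure training time from selection overhead.
idxs = np.random.choice(len(train_set), size=len(train_set) // 10, replace=False)
subset = Subset(train_set, idxs.tolist())

print("full epoch: %.1f s" % time_one_epoch(train_set))
print("10%% epoch:  %.1f s" % time_one_epoch(subset))
```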
@krishnatejakk I modified the code to select a subset every 20 epochs.
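For reference, a sketch of the relevant dss_args changes, following the field names in the DotMap config above; the 'GradMatchPB' type string and v1=False flag are assumptions about the CORDS config, so verify them against your installed version:

```python
# Sketch of the subset-selection arguments, mirroring the DotMap config above.
# 'GradMatchPB' and v1=False are assumptions about the CORDS config options.
dss_args = dict(
    type='GradMatchPB',             # per-batch variant, cheaper selection
    fraction=0.3,                   # 30% subset, as in the original config
    select_every=20,                # re-select the subset every 20 epochs
    lam=0.5,
    selection_type='PerClassPerGradient',
    v1=False,                       # fall back to the previous OMP solver
    valid=False,
    kappa=0,
    eps=1e-100,
    linear_layer=True,
)
```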
I tried to run some experiments as follows:
I am using scaled-resolution images of CIFAR-10, i.e., 224x224 resolution, and defined the ResNet-50 architecture accordingly.
Can you let me know how to speed up experiments 2 and 3? In general, a subset selection method should speed up the whole training process, right?