
[AestheticAnalisis] Could you please tell me how to train the initModel? #10

Open
yt605155624 opened this issue Jun 29, 2017 · 9 comments


@yt605155624

Dear Shu Kong,
I am a student from China; my name is Yuan Tian, and I have recently been studying deep image aesthetic analysis on my own. I have read the paper "Photo Aesthetics Ranking Network with Attributes and Content Adaptation" and run the Python script "caffe_prediction.py", but I would like to know how to train the network on my own dataset. I noticed that "initModel.prototxt" contains no data layer and no loss layers, so I am unsure what the input should look like. Is it LevelDB, LMDB, or HDF5? Since the network takes the 11 attributes into account, should 12 labels be fed into the net, i.e. 11 for the attributes and 1 for the final score? Does the input consist of one file of images and 12 text files of labels? I am not clear on this.
So, could you please tell me how to train "initModel.prototxt", and what the input, data layer, and loss layers look like during training? Alternatively, could you send me the "train_val.prototxt", "solver.prototxt", and "deploy.prototxt" corresponding to "initModel.prototxt", and I will study them myself.
Thank you very much!
Best wishes!
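For reference, a minimal sketch of how an HDF5 input with one data blob and 12 labels could be packed with h5py for Caffe's HDF5Data layer; the file and dataset names (train.h5, label_attr*, label_score) are illustrative assumptions, not the authors' actual layout:

```python
# Minimal sketch (not from the original repo) of an HDF5 input file for Caffe's
# HDF5Data layer: one "data" blob plus one dataset per target (11 attributes
# and 1 overall score). Names and sizes below are illustrative assumptions.
import h5py
import numpy as np

num_images = 100                                                  # hypothetical training-set size
images = np.zeros((num_images, 3, 227, 227), dtype=np.float32)   # preprocessed images, NCHW
scores = np.zeros((num_images, 12), dtype=np.float32)            # 11 attribute scores + 1 overall score

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=images)
    # One dataset per label so each can feed its own EuclideanLoss layer;
    # alternatively a single 12-wide dataset could be sliced inside the network.
    for k in range(11):
        f.create_dataset('label_attr%d' % k, data=scores[:, k:k + 1])
    f.create_dataset('label_score', data=scores[:, 11:12])

# Caffe's HDF5Data layer reads a text file listing the .h5 file paths:
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')
```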

@aimerykong
Owner

aimerykong commented Jun 29, 2017 via email

@yt605155624
Author

Hello,
I would like to know: when you train the attribute branches, do you
(1) use the 11 attribute scores + 1 final score to train fc8s, fc10, and fc11 jointly, or
(2) use only the 11 attributes to train fc8s first, then delete the 11 losses and add a loss after fc11 (using the final score) to train fc10 and fc11?
I hope for your early reply!

@aimerykong
Owner

aimerykong commented Jul 12, 2017 via email

@yt605155624
Author

yt605155624 commented Jul 13, 2017

Thank you very much. Here is what I did when training:
First I trained the regression model as fc8new -> fc9new -> loss and got rho = 0.6522.
Then I trained the 12 scores at once as fc8s -> losses + fc11_score -> loss (my HDF5 input has 1 data blob and 12 labels). Although I am not sure how Caffe back-propagates through the multiple parallel labels, I got rho = 0.6668, as reported in your paper (att + reg + rank, though I have not added the rank loss yet...).
Next I would like to try other backbones, for example VGG or ResNet, which may give better scores.
Thank you very much; I have learned a lot from your paper, dataset, and code.
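For reference, the rho quoted here is Spearman's rank correlation between predicted and ground-truth aesthetic scores; a minimal evaluation sketch (using placeholder arrays rather than the actual model outputs) could be:

```python
# Minimal sketch of the rho evaluation: Spearman's rank correlation between
# predicted and ground-truth aesthetic scores. The arrays below are
# placeholders, not real model outputs or dataset labels.
import numpy as np
from scipy.stats import spearmanr

predicted = np.array([0.42, 0.71, 0.55, 0.90, 0.33])      # hypothetical model scores
ground_truth = np.array([0.40, 0.80, 0.50, 0.95, 0.30])   # hypothetical labels

rho, p_value = spearmanr(predicted, ground_truth)
print('Spearman rho = %.4f (p = %.4g)' % (rho, p_value))
```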

@aimerykong
Owner

aimerykong commented Jul 13, 2017 via email

@JiyuanLi

JiyuanLi commented Jul 30, 2019

Hi Shu Kong,

I am trying to follow "initModel.prototxt" to reproduce your work using the images in "datasetImages_warp256.zip" and the labels in "imgListFiles_Label.zip", but currently I can only get rho = 0.56... I have some questions about the implementation and hope you can help me:

  1. The final score ranges over [0, 1], but in "initModel.prototxt" the output layer is an InnerProduct layer, which can produce scores in (-inf, inf). Is a sigmoid function needed at the end? (See the sketch after this comment.)
  2. My training process:
    1> First I trained all the layers from conv1 to fc7 (initialized from the pre-trained AlexNet) plus fc8new and fc9new with the regression loss, but after 2000 epochs I only reach rho = 0.56 on the entire validation set.
    2> Then I trained the whole network with the attribute losses at fc9s (fc9new deleted) plus the final score loss at fc11, but very little improvement was achieved.
    I am not sure whether my training steps are correct and whether more epochs are needed.
  3. In the paper the input image is resized to 256x256 and randomly cropped to a 227x227 patch, but in "caffe_prediction.py" the image seems to be resized to 227x227 directly. Which method should I use?

Looking forward to your reply~ :)
Best Regards,
Jiyuan
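On question 1 above (whether the unbounded InnerProduct output needs a squashing nonlinearity), here is a minimal sketch of the two head variants, written in PyTorch purely for illustration; the layer width and names are assumptions, not taken from initModel.prototxt:

```python
# Minimal sketch of the two output-head options discussed above (illustrative
# only; layer sizes and names are assumptions, not from initModel.prototxt).
import torch
import torch.nn as nn

features = torch.randn(4, 128)          # placeholder fc10 activations for a batch of 4

# Option A: plain linear (InnerProduct) head -- output is unbounded.
linear_head = nn.Linear(128, 1)
print(linear_head(features).flatten())  # values may fall outside [0, 1]

# Option B: linear head followed by a sigmoid -- output squashed into (0, 1),
# matching the label range but potentially harder to fit for regression.
sigmoid_head = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())
print(sigmoid_head(features).flatten())  # values guaranteed to lie in (0, 1)
```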

@aimerykong
Owner

aimerykong commented Jul 30, 2019 via email

@JiyuanLi

Hello Jiyuan,

Thanks for trying all these. When I did this project, I first trained for binary classification on the aesthetic label by fine-tuning AlexNet. Then, on top of the classification model, I built new layers for regression and fine-tuned the model. Based on the regression model, I added the rank loss, then built in the branches for attribute learning and fine-tuned further. My impression was that it is hard to train directly for aesthetics regression from AlexNet.

As for your first question on whether the sigmoid layer is necessary -- I am not totally sure that it is necessary after these years. From more experience since then, I would not use a sigmoid layer now. I feel that sigmoid makes regression a little harder, while a linear layer (convolution or relu) seems easier.

As for the input image size -- if I remember correctly, 256x256 is the resized size while 227x227 is the actual resolution of the input image to the model. During training with Caffe, images are first resized to 256x256, then randomly cropped to 227x227 as input. So for testing/demo, 227x227 is the input size. If you evaluate the model, results always improve by randomly cropping multiple 227x227 patches from the single image -- this is called test augmentation -- but it is secondary.

Hope it helps.
Regards,
Shu
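A minimal sketch of the resize-then-crop pipeline described above, using PIL and NumPy as an approximation of the original Caffe data layer (the interpolation method and crop policy are assumptions):

```python
# Minimal sketch of the preprocessing described above: resize to 256x256,
# then take 227x227 crops (random crops during training, several random crops
# averaged for test-time augmentation). This approximates, not reproduces,
# the original Caffe data layer.
import numpy as np
from PIL import Image

def load_resized(path, size=256):
    """Load an image and resize it to size x size (RGB)."""
    return np.asarray(Image.open(path).convert('RGB').resize((size, size)), dtype=np.float32)

def random_crop(img, crop=227):
    """Randomly crop a crop x crop patch, as used during training."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

def test_augment(img, predict_fn, n_crops=10, crop=227):
    """Average the model's score over several random crops (test augmentation)."""
    return float(np.mean([predict_fn(random_crop(img, crop)) for _ in range(n_crops)]))

# Example usage with a dummy scoring function (a real model would go here):
# img = load_resized('some_photo.jpg')
# score = test_augment(img, predict_fn=lambda x: x.mean() / 255.0)
```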


Hi Shu,

Thanks a lot for your reply. I actually tried re-implementing the network described in "initModel.prototxt" in PyTorch and loaded the weights from "initModel.caffemodel" directly, but the final rho on the entire test set was only around 0.43... I am not sure whether these weights are the final training results?

I also noticed that a per-pixel mean is subtracted at the pre-processing stage to obtain the zero-mean image, but there are two sets of mean values in "initModel.caffemodel"... The one I used is "mean_warp", but I am not sure what the difference between them is...
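For reference, a minimal sketch of the per-pixel mean subtraction being discussed, with placeholder arrays; the 256x256x3 shape, HWC layout, and the point at which subtraction happens are assumptions, not confirmed details of the repo:

```python
# Minimal sketch of per-pixel mean subtraction: the stored mean has the same
# spatial shape as the resized image and is subtracted elementwise, as opposed
# to subtracting a single per-channel mean. Shapes and layout are assumptions.
import numpy as np

resized = np.zeros((256, 256, 3), dtype=np.float32)     # image after resizing (placeholder)
mean_image = np.zeros((256, 256, 3), dtype=np.float32)  # e.g. loaded from the provided mean blob

zero_mean = resized - mean_image                         # per-pixel mean subtraction

# By contrast, a per-channel mean is a length-3 vector broadcast over the image:
channel_mean = mean_image.mean(axis=(0, 1))              # shape (3,)
zero_mean_channel = resized - channel_mean
```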

Best Regards,
Jiyuan

@SAB95852

Hi. I am doing a garbage classification project for detecting biodegradable and non-biodegradable waste, using the bvlc_alexnet model. How do I label only two classes (bio, non-bio) in the output layer?
