
[AestheticAnalisis] Could you please tell me how to train the initModel? #10

Open
yt605155624 opened this issue Jun 29, 2017 · 9 comments


@yt605155624

Dear Shu Kong,
I am a student from China; my name is Yuan Tian, and I have recently been studying deep image aesthetic analysis on my own. I have read the paper "Photo Aesthetics Ranking Network with Attributes and Content Adaptation" and run the Python script "caffe_prediction.py", but I would like to know how to train the network on my own dataset. I noticed that "initModel.prototxt" contains no data layer and no loss layers, so I am unsure what the input should look like. Is it LevelDB, LMDB, or HDF5? Since the network takes the 11 attributes into account, should 12 labels be fed into the net, i.e. 11 for the attributes and 1 for the final score? Does the input consist of one file of images and 12 text files of labels? I am not clear on this.
So, could you please tell me how to train "initModel.prototxt", and what the input, data layer, and loss layers look like during training? Alternatively, could you send me the "train_val.prototxt", "solver.prototxt", and "deploy.prototxt" corresponding to "initModel.prototxt", and I will study them myself.
Thank you very much!
Best wishes!
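For reference, a minimal sketch of how an HDF5 input with one data blob and 12 labels could be packed with h5py for Caffe's HDF5Data layer; the file and dataset names (train.h5, label_attr*, label_score) are illustrative assumptions, not the authors' actual layout:

```python
# Minimal sketch (not from the original repo) of an HDF5 input file for Caffe's
# HDF5Data layer: one "data" blob plus one dataset per target (11 attributes
# and 1 overall score). Names and sizes below are illustrative assumptions.
import h5py
import numpy as np

num_images = 100                                                  # hypothetical training-set size
images = np.zeros((num_images, 3, 227, 227), dtype=np.float32)   # preprocessed images, NCHW
scores = np.zeros((num_images, 12), dtype=np.float32)            # 11 attribute scores + 1 overall score

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=images)
    # One dataset per label so each can feed its own EuclideanLoss layer;
    # alternatively a single 12-wide dataset could be sliced inside the network.
    for k in range(11):
        f.create_dataset('label_attr%d' % k, data=scores[:, k:k + 1])
    f.create_dataset('label_score', data=scores[:, 11:12])

# Caffe's HDF5Data layer reads a text file listing the .h5 file paths:
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')
```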

@aimerykong
Owner

aimerykong commented Jun 29, 2017 via email

@yt605155624
Author

Hello,
I would like to know: when you train the attribute branches, do you
(1) use the 11 attribute scores + 1 final score to train fc8s, fc10, and fc11 jointly, or
(2) use only the 11 attributes to train fc8s first, then delete the 11 losses and add a loss after fc11 (using the final score) to train fc10 and fc11?
I hope for your early reply!

@aimerykong
Owner

aimerykong commented Jul 12, 2017 via email

@yt605155624
Author

yt605155624 commented Jul 13, 2017

Thank you very much. Here is what I did when training:
First I trained the regression model as fc8new -> fc9new -> loss and got rho = 0.6522.
Then I trained the 12 scores at once as fc8s -> losses + fc11_score -> loss (my HDF5 input has 1 data blob and 12 labels). Although I am not sure how Caffe back-propagates through the multiple parallel labels, I got rho = 0.6668, as reported in your paper (att + reg + rank, though I have not added the rank loss yet...).
Next I would like to try other backbones, for example VGG or ResNet, which may give better scores.
Thank you very much; I have learned a lot from your paper, dataset, and code.
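For reference, the rho quoted here is Spearman's rank correlation between predicted and ground-truth aesthetic scores; a minimal evaluation sketch (using placeholder arrays rather than the actual model outputs) could be:

```python
# Minimal sketch of the rho evaluation: Spearman's rank correlation between
# predicted and ground-truth aesthetic scores. The arrays below are
# placeholders, not real model outputs or dataset labels.
import numpy as np
from scipy.stats import spearmanr

predicted = np.array([0.42, 0.71, 0.55, 0.90, 0.33])      # hypothetical model scores
ground_truth = np.array([0.40, 0.80, 0.50, 0.95, 0.30])   # hypothetical labels

rho, p_value = spearmanr(predicted, ground_truth)
print('Spearman rho = %.4f (p = %.4g)' % (rho, p_value))
```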

@aimerykong
Owner

aimerykong commented Jul 13, 2017 via email

@JiyuanLi

JiyuanLi commented Jul 30, 2019

Hi Shu Kong,

I am trying to follow "initModel.prototxt" to reproduce your work using the images in "datasetImages_warp256.zip" and the labels in "imgListFiles_Label.zip", but currently I can only get rho = 0.56... I have some questions about the implementation and hope you can help me:

  1. The final score ranges over [0, 1], but in "initModel.prototxt" the output layer is an InnerProduct layer, which can produce scores in (-inf, inf). Is a sigmoid function needed at the end? (See the sketch after this comment.)
  2. My training process:
    1> First I trained all the layers from conv1 to fc7 (initialized from the pre-trained AlexNet) plus fc8new and fc9new with the regression loss, but after 2000 epochs I only reach rho = 0.56 on the entire validation set.
    2> Then I trained the whole network with the attribute losses at fc9s (fc9new deleted) plus the final score loss at fc11, but very little improvement was achieved.
    I am not sure whether my training steps are correct and whether more epochs are needed.
  3. In the paper the input image is resized to 256x256 and randomly cropped to a 227x227 patch, but in "caffe_prediction.py" the image seems to be resized to 227x227 directly. Which method should I use?

Looking forward to your reply~ :)
Best Regards,
Jiyuan
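On question 1 above (whether the unbounded InnerProduct output needs a squashing nonlinearity), here is a minimal sketch of the two head variants, written in PyTorch purely for illustration; the layer width and names are assumptions, not taken from initModel.prototxt:

```python
# Minimal sketch of the two output-head options discussed above (illustrative
# only; layer sizes and names are assumptions, not from initModel.prototxt).
import torch
import torch.nn as nn

features = torch.randn(4, 128)          # placeholder fc10 activations for a batch of 4

# Option A: plain linear (InnerProduct) head -- output is unbounded.
linear_head = nn.Linear(128, 1)
print(linear_head(features).flatten())  # values may fall outside [0, 1]

# Option B: linear head followed by a sigmoid -- output squashed into (0, 1),
# matching the label range but potentially harder to fit for regression.
sigmoid_head = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())
print(sigmoid_head(features).flatten())  # values guaranteed to lie in (0, 1)
```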

@aimerykong
Owner

aimerykong commented Jul 30, 2019 via email

@JiyuanLi

Hello Jiyuan,

Thanks for trying all these. When I did this project, I first trained for binary classification on the aesthetic label by fine-tuning AlexNet. Then, on top of the classification model, I built new layers for regression and fine-tuned the model. Based on the regression model, I added the rank loss, then built in the branches for attribute learning and fine-tuned further. My impression was that it is hard to train directly for aesthetics regression from AlexNet.

As for your first question on whether the sigmoid layer is necessary -- I am not totally sure that it is necessary after these years. From more experience since then, I would not use a sigmoid layer now. I feel that sigmoid makes regression a little harder, while a linear layer (convolution or relu) seems easier.

As for the input image size -- if I remember correctly, 256x256 is the resized size while 227x227 is the actual resolution of the input image to the model. During training with Caffe, images are first resized to 256x256, then randomly cropped to 227x227 as input. So for testing/demo, 227x227 is the input size. If you evaluate the model, results always improve by randomly cropping multiple 227x227 patches from the single image -- this is called test augmentation -- but it is secondary.

Hope it helps.
Regards,
Shu
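A minimal sketch of the resize-then-crop pipeline described above, using PIL and NumPy as an approximation of the original Caffe data layer (the interpolation method and crop policy are assumptions):

```python
# Minimal sketch of the preprocessing described above: resize to 256x256,
# then take 227x227 crops (random crops during training, several random crops
# averaged for test-time augmentation). This approximates, not reproduces,
# the original Caffe data layer.
import numpy as np
from PIL import Image

def load_resized(path, size=256):
    """Load an image and resize it to size x size (RGB)."""
    return np.asarray(Image.open(path).convert('RGB').resize((size, size)), dtype=np.float32)

def random_crop(img, crop=227):
    """Randomly crop a crop x crop patch, as used during training."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

def test_augment(img, predict_fn, n_crops=10, crop=227):
    """Average the model's score over several random crops (test augmentation)."""
    return float(np.mean([predict_fn(random_crop(img, crop)) for _ in range(n_crops)]))

# Example usage with a dummy scoring function (a real model would go here):
# img = load_resized('some_photo.jpg')
# score = test_augment(img, predict_fn=lambda x: x.mean() / 255.0)
```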


Hi Shu,

Thanks a lot for your reply. I actually tried re-implementing the network described in "initModel.prototxt" in PyTorch and loaded the weights from "initModel.caffemodel" directly, but the final rho on the entire test set was only around 0.43... I am not sure whether these weights are the final training results?

I also noticed that a per-pixel mean is subtracted at the pre-processing stage to obtain the zero-mean image, but there are two sets of mean values in "initModel.caffemodel"... The one I used is "mean_warp", but I am not sure what the difference between them is...
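For reference, a minimal sketch of the per-pixel mean subtraction being discussed, with placeholder arrays; the 256x256x3 shape, HWC layout, and the point at which subtraction happens are assumptions, not confirmed details of the repo:

```python
# Minimal sketch of per-pixel mean subtraction: the stored mean has the same
# spatial shape as the resized image and is subtracted elementwise, as opposed
# to subtracting a single per-channel mean. Shapes and layout are assumptions.
import numpy as np

resized = np.zeros((256, 256, 3), dtype=np.float32)     # image after resizing (placeholder)
mean_image = np.zeros((256, 256, 3), dtype=np.float32)  # e.g. loaded from the provided mean blob

zero_mean = resized - mean_image                         # per-pixel mean subtraction

# By contrast, a per-channel mean is a length-3 vector broadcast over the image:
channel_mean = mean_image.mean(axis=(0, 1))              # shape (3,)
zero_mean_channel = resized - channel_mean
```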

Best Regards,
Jiyuan

@SAB95852

Hi. I am doing a garbage classification project for detecting biodegradable and non-biodegradable waste, using the bvlc_alexnet model. How do I label only two classes (bio, non-bio) in the output layer?
