[AestheticAnalisis] Could you please tell me how to train the initModel? #10
Hi Yuan,
Thank you for your interest!
When using caffe to train models, people usually keep separate prototxt files for training and for testing/deployment. I didn't include the training prototxt here, and sadly I can't find it on my desktop anymore :(
If you train your own model, I suggest training in stages -- first train the regression model, then add the rank loss or the attribute branches, then combine all of these for the final model.
To train my models, I modified AlexNet (
https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet). If you
train your model in a similar way, you can follow those prototxt files.
Hope this helps:)
Regards,
Shu
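For readers wondering what the rank loss mentioned above might look like, here is a minimal numpy sketch of a margin-based pairwise ranking loss on predicted aesthetic scores. The margin value and function names are illustrative, not the paper's verbatim implementation:

```python
import numpy as np

def pairwise_rank_loss(score_hi, score_lo, margin=1.0):
    """Hinge-style rank loss: penalize pairs where the image that should
    rank higher (score_hi) does not beat the lower one by at least `margin`."""
    return np.maximum(0.0, margin - (score_hi - score_lo))

# Toy scores for three image pairs; the first image of each pair
# is the one that the ground-truth ranking says should score higher.
hi = np.array([0.9, 0.4, 0.8])
lo = np.array([0.2, 0.6, 0.7])
losses = pairwise_rank_loss(hi, lo, margin=0.5)
```

A correctly ordered pair with a comfortable gap (0.9 vs 0.2) contributes zero loss; a mis-ordered pair (0.4 vs 0.6) contributes the most.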
…On Wed, Jun 28, 2017 at 8:08 PM, yt605155624 ***@***.***> wrote:
Dear Shu Kong:
I am a student from China; my name is Yuan Tian, and I have recently been
studying deep image aesthetic analysis on my own. I have read the paper [Photo
Aesthetics Ranking Network with Attributes and Content Adaptation] and run
the Python code "caffe_prediction.py", but I'd like to know how I can train
the net on my own datasets. I found that "initModel.prototxt" has no data
layer and no loss layers, so I want to know what the input looks
like. Is it LevelDB, LMDB, or HDF5? I found that this net takes
the 11 attributes into account, so do 12 labels have to be fed into
the net -- 11 for the 11 attributes and 1 for the final score? Does the
input consist of one file of images and 12 txt files of labels? I am not clear on this...
So, could you tell me how to train
"initModel.prototxt", and what the input, input layer, and loss
layers look like during training? Or could you send me the
"train_val.prototxt", "solver.prototxt", and "deploy.prototxt" of
"initModel.prototxt" so I can study it myself?
Thank you very much!
Best wishes!
When training the attribute branches, I fixed the model and only learned
new layers for the new branches. In practice, as you guessed in (2), I
trained each attribute branch one at a time and WITHOUT the aesthetics score,
then I merged all the branches together.
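The "fix the model, only learn the new branch layers" recipe above can be illustrated with a deliberately tiny numpy sketch. The frozen random projection stands in for the pre-trained layers (in Caffe this would be `lr_mult: 0`); the shapes and toy data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed projection standing in for the pre-trained
# layers that receive NO parameter updates during branch training.
W_frozen = rng.normal(size=(16, 8))

def features(x):
    return np.tanh(x @ W_frozen)  # frozen: never updated below

# Toy regression data for one attribute branch.
X = rng.normal(size=(64, 16))
y = rng.normal(size=(64,))

# New branch head: the ONLY parameters we learn.
w = np.zeros(8)
lr = 0.05
F = features(X)  # backbone outputs can even be precomputed once
loss_history = []
for _ in range(200):
    pred = F @ w
    err = pred - y
    loss_history.append(float(np.mean(err ** 2)))
    w -= lr * (2.0 / len(y)) * (F.T @ err)  # gradient step on the head only
```

Because the backbone is fixed, its features can be computed once and cached, which is one practical benefit of training branches this way.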
…On Wed, Jul 12, 2017 at 12:52 AM, yt605155624 ***@***.***> wrote:
Hello:
I would like to know: when you train the attribute branches, do you
(1) use the 11 attribute scores + 1 final score to train fc8s, fc10, and fc11,
(2) or just use the 11 attributes to train fc8s first, then delete the 11
losses and add a loss after fc11 (using the final score) to train fc10 and fc11?
Hoping for your early reply!
Great to see it works on your end!
It's true that one might get better or worse results due to some
randomness, like data preparation.
I noticed that some people used the very deep Inception model and got a huge
boost on the AVA dataset. I believe that if you want better performance,
turning to a deeper model is a good starting point.
…On Wed, Jul 12, 2017 at 7:18 PM, yt605155624 ***@***.***> wrote:
Thank you very much. When I trained:
first I trained the seg model as fc8new -> fc9new -> loss and got rho = 0.6522;
then I trained the 12 scores at a time as fc8s -> losses + fc11_score -> losses
(my HDF5 input has 1 data blob and 12 labels); though I don't know how caffe
backpropagates the multiple labels at once, I can get rho = 0.6668 as your paper reports
(att+seg+rank, though I haven't added rank...).
Next time I would like to try new models, for example VGG or ResNet; maybe I
will get better scores.
Thank you very much; I have learned a lot through your paper, dataset,
and code.
Hello, Jiyuan,
Thanks for trying all these.
When I did this project, I first trained for binary classification on the
aesthetic label by fine-tuning AlexNet. Then, on top of the classification model,
I built new layers for regression and fine-tuned the model. Based on the
regression model, I added the rank loss and then built the branches
for attribute learning and fine-tuned further.
My impression was that it's hard to train directly for aesthetics
regression from AlexNet.
As for your first question on whether the sigmoid layer is necessary -- I
am not totally sure after all these years. With more
experience since then, I would not use a sigmoid layer now. I feel that
sigmoid makes regression a little harder, while a linear layer
(convolution or ReLU) seems easier.
As for the input image size -- if I remember correctly, 256x256 is the
resized size, while 227x227 is the actual resolution of the image fed to the
model. During training with caffe, images are first resized to 256x256,
then randomly cropped to 227x227 as input. So for testing/demo, 227x227 is the
input size. When evaluating the model, results always improve if you randomly
crop multiple 227x227 patches from the single image -- this is called test-time
augmentation -- but it is secondary.
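The resize-then-crop pipeline described above can be sketched in numpy. The resize step itself is omitted here -- assume the image has already been warped to 256x256, as in datasetImages_warp256; the function names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
CROP = 227  # AlexNet-style input resolution

def random_crop(img, size=CROP):
    """Randomly crop a size x size patch, as done at training time."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def center_crop(img, size=CROP):
    """Deterministic crop for single-view testing/demo."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

# A stand-in for an image already resized to 256 x 256 x 3.
img = rng.random((256, 256, 3))
train_view = random_crop(img)
test_view = center_crop(img)
```

Test-time augmentation would simply average the model's predictions over several `random_crop` views of the same image.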
Hope it helps.
Regards,
Shu
…On Mon, Jul 29, 2019 at 8:38 PM JiyuanLi ***@***.***> wrote:
Hi Shu Kong,
I am trying to follow "initModel.prototxt" to reproduce your work based on
the images in "datasetImages_warp256.zip" and the labels in
"imgListFiles_Label.zip", and currently I can only get rho = 0.56... I
have some questions about the implementation and hope you can help me with
them:
1. The final score ranges from 0 to 1, but in "initModel.prototxt" the
output layer is an InnerProduct layer, which can produce scores from -inf to inf;
I just wonder whether a sigmoid function is needed at the end?
2. In my training process:
1> I first trained all the layers, based on pre-trained AlexNet,
from *conv1* to *fc7* plus *fc8new* and *fc9new*, with the regression
loss, but after 2000 epochs only rho = 0.56 on the entire validation set
was reached;
2> secondly, I trained the whole network with all the attribute losses at
*fc9s* (fc9new is deleted) plus the final-score loss at *fc11*, but it seems
very little improvement was achieved.
I am not sure whether my training steps are correct and whether more epochs
are needed.
3. In the paper the input image is resized to 256*256 and randomly
cropped to get a 227*227 patch, but in "caffe_prediction.py" it
seems the image is resized to 227*227 directly; which method should I use
then?
Looking forward to your reply~ :)
Best Regards,
Jiyuan
Hi Shu,
Thanks a lot for your reply. Actually, I tried re-implementing the network described in "initModel.prototxt" in PyTorch and adopting the coefficients in "initModel.caffemodel" directly, but the final rho on the entire test set was only around 0.43... I'm not sure whether these coefficients are the final training results? I also noticed that "Per Pixel Mean" is used at the pre-processing stage to get the "Zero Mean Image", but there are two sets of mean vectors in "initModel.caffemodel"... The one I used is "mean_warp", but I'm just not sure about the difference...
Best Regards,
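On the "Per Pixel Mean" point: a per-pixel mean image has the same spatial shape as the input (unlike a single per-channel mean vector), and zero-meaning is a plain elementwise subtraction. A minimal numpy sketch, assuming "mean_warp" is such an H x W x C mean blob (its exact contents are not documented in this thread, so the values below are placeholders):

```python
import numpy as np

def zero_mean(img, mean_image):
    """Subtract a per-pixel mean image of the same H x W x C shape,
    as opposed to subtracting one mean value per channel."""
    assert img.shape == mean_image.shape
    return img.astype(np.float32) - mean_image.astype(np.float32)

# Hypothetical stand-ins for an input image and the dataset mean blob
# (e.g. something like the "mean_warp" array mentioned above).
mean_image = np.full((227, 227, 3), 120.0)
img = np.full((227, 227, 3), 128.0)
centered = zero_mean(img, mean_image)
```

If two mean blobs are stored, the safe check is to compare their shapes against the input resolution the model actually expects.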
Hi. I am doing a garbage classification project for detecting biodegradable and non-biodegradable waste. I am using the bvlc_alexnet model. How do I label only two classes (bio, non-bio) in the output layer?
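Though off-topic for this issue: the usual Caffe recipe for a two-class setup on bvlc_alexnet is to rename the last InnerProduct layer (so the 1000-way pre-trained fc8 weights are not loaded) and set `num_output: 2`, with integer labels 0 and 1 in the data source. A hedged prototxt sketch -- layer and blob names other than the standard AlexNet ones are illustrative:

```protobuf
layer {
  name: "fc8_garbage"        # renamed so pre-trained fc8 weights are not copied in
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_garbage"
  inner_product_param {
    num_output: 2            # two classes: 0 = biodegradable, 1 = non-biodegradable
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8_garbage"
  bottom: "label"
  top: "loss"
}
```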