In which file are the 11(+1) attribute scores recorded with image IDs? #13

Open
spearsem opened this issue Aug 23, 2017 · 5 comments

spearsem commented Aug 23, 2017

Thank you so much for sharing your data and code! I have a simple question, if you can spare some time to help me out.

I downloaded the full data set and the file imgListFiles_label.zip, but I cannot determine which files contain the actual labeled scores for each image.

There are files named "TestRegression...", "TestNewRegression...", "TrainRegression...", and "ValidationRegression...".

I am just looking for the (raw or averaged) individual scores for each aesthetic component and the overall score. Not any post-processing outputs that are specific to your model's input.

Assuming these represent splits of the data, should I concatenate all of these files together if I want a list for all of the AADB images? Also, what is the difference between "TestRegression" and "TestNewRegression"? Do I need both, or only one of them?

Thanks!

Owner

aimerykong commented Aug 24, 2017 via email

Author

spearsem commented Aug 27, 2017

Hi Shu,

Thanks for following up. When I follow your suggestion, I find a difference of 127 image file names between the set of 9958 images in the 'datasetImages' directory and the list of images obtained by concatenating the 'TestNew', 'Train', and 'Validation' files.

For example, if I pick a single attribute, like 'DoF', then all 9958 images should be accounted for by the rows of `imgListTestNewRegression_DoF.txt`, `imgListTrainRegression_DoF.txt`, and `imgListValidationRegression_DoF.txt`.

However, if I separately make a set from all image file names in the 'datasetImages' folder, the two sets are not equal.

In [10]: filename_set = set(os.listdir("../datasetImages"))

In [11]: filenames_from_annotations = set(df.image_name.values)

In [12]: filename_set.difference(filenames_from_annotations)
Out[12]: 
{'farm1_257_19551457934_78009e3cdf_b.jpg',
 'farm1_257_20081367568_9a5e46c52d_b.jpg',
 'farm1_258_19977693740_69a64c722c_b.jpg',
 'farm1_258_19998330681_7947141b9a_b.jpg',
 'farm1_258_20153488196_8430392dfd_b.jpg',
 'farm1_260_20011542710_18512a2e49_b.jpg',
 'farm1_261_19650341253_739d80b488_b.jpg',
 'farm1_261_19661237724_ca01339159_b.jpg',
 'farm1_261_20025642938_efb5355efe_b.jpg',
 'farm1_263_20112620711_1c82b95850_b.jpg',
 'farm1_264_20112107155_d1f42b858a_b.jpg',
 'farm1_264_20191045435_20ec509328_b.jpg',
 'farm1_265_19537402954_f14313e970_b.jpg',
 'farm1_265_20001813849_5306c57b05_b.jpg',
 'farm1_265_20154984142_7b03f70b4b_b.jpg',
 'farm1_265_20182300971_b73b9544bc_b.jpg',
 'farm1_267_20002219440_4532b202c1_b.jpg',
 'farm1_268_20188568811_697ba6146b_b.jpg',
 'farm1_269_19987423131_934dde76f2_b.jpg',
 'farm1_270_20055053966_49218d4b2b_b.jpg',
 'farm1_272_20282861605_161f622564_b.jpg',
 'farm1_274_20071314641_4f74bed682_b.jpg',
 'farm1_275_19976062488_db63a20c8f_b.jpg',
 'farm1_276_20010653219_3b2fdd31bb_b.jpg',
 'farm1_277_19923603720_834f9acc23_b.jpg',
 'farm1_277_20282094821_24bc8bbb50_b.jpg',
 'farm1_278_20101808021_71ab53801c_b.jpg',
 'farm1_278_20156668495_5384aed590_b.jpg',
 'farm1_279_20082259019_0f83066639_b.jpg',
 'farm1_281_20277374661_efc9d5cbb2_b.jpg',
 'farm1_282_19639704434_344f38bc28_b.jpg',
 'farm1_282_20073012902_4f496354f3_b.jpg',
 'farm1_283_20145205466_d31c079919_b.jpg',
 'farm1_285_19794630170_6ebcce2e3f_b.jpg',
 'farm1_285_20181900336_ab89701c63_b.jpg',
 'farm1_285_20210153511_b1ca18f614_b.jpg',
 'farm1_287_20198450001_579a228325_b.jpg',
 'farm1_287_20205812381_e88e4ed07c_b.jpg',
 'farm1_288_20218571861_392e702708_b.jpg',
 'farm1_289_19474429064_c6c95ca5f8_b.jpg',
 'farm1_293_19561491214_56756bc9fa_b.jpg',
 'farm1_293_20183043961_17b5521f09_b.jpg',
 'farm1_301_20174123975_4660281e14_b.jpg',
 'farm1_304_19537298213_42a785d534_b.jpg',
 'farm1_307_20194160235_412611ca37_b.jpg',
 'farm1_310_19440925454_e17979bfe9_b.jpg',
 'farm1_310_19545127013_6f9ee1e594_b.jpg',
 'farm1_310_20008499479_ec470a400c_b.jpg',
 'farm1_310_20156587556_748d5c2a95_b.jpg',
 'farm1_313_20276894925_cdaa0aeddf_b.jpg',
 'farm1_314_20109140896_c842328513_b.jpg',
 'farm1_315_19555734304_e2c3fe5045_b.jpg',
 'farm1_316_19806425039_cd3d8d6481_b.jpg',
 'farm1_321_19928964810_58fc6afff5_b.jpg',
 'farm1_321_20268434772_a42fedb758_b.jpg',
 'farm1_322_19509281824_a28e71ac42_b.jpg',
 'farm1_322_20177766422_2a6a383a0b_b.jpg',
 'farm1_323_19512878813_df9e5f80a9_b.jpg',
 'farm1_323_19642688713_f540f8e28a_b.jpg',
 'farm1_323_20131578256_0a0035c0a4_b.jpg',
 'farm1_323_20188849836_e3d4f8c4c0_b.jpg',
 'farm1_324_19903715209_da2412b794_b.jpg',
 'farm1_325_19999445778_f9894b0ac2_b.jpg',
 'farm1_327_19995572685_80e5e27670_b.jpg',
 'farm1_328_19972035690_01ba0f3ac3_b.jpg',
 'farm1_328_19988362780_e5b8e5dbac_b.jpg',
 'farm1_329_20088721718_b4d794c353_b.jpg',
 'farm1_330_20100146912_225be2471c_b.jpg',
 'farm1_331_19969233476_f2f4eabf76_b.jpg',
 'farm1_333_20108487141_98b794abc4_b.jpg',
 'farm1_335_20016713269_0cf280cb2a_b.jpg',
 'farm1_335_20019567978_b0c1954a64_b.jpg',
 'farm1_336_19546620374_f489bf5820_b.jpg',
 'farm1_339_19708725023_b585ea2968_b.jpg',
 'farm1_340_20066952012_2b02827e07_b.jpg',
 'farm1_341_20099301301_aaee7e576f_b.jpg',
 'farm1_341_20170995106_f91c81e04a_b.jpg',
 'farm1_341_20190091625_87e842697e_b.jpg',
 'farm1_341_20204595795_fd5b05bbb5_b.jpg',
 'farm1_342_20176919346_64b1afc730_b.jpg',
 'farm1_343_19923860928_7817f0a2e4_b.jpg',
 'farm1_346_20107559161_96e716418b_b.jpg',
 'farm1_349_20155857456_423018d405_b.jpg',
 'farm1_356_20021866788_37d48f6cc4_b.jpg',
 'farm1_358_19552188594_ac3e5ddcb2_b.jpg',
 'farm1_358_20120536361_26b988773d_b.jpg',
 'farm1_362_19482800604_6b55aa36bf_b.jpg',
 'farm1_362_19513797184_9e80bf33af_b.jpg',
 'farm1_367_19574973194_0813255782_b.jpg',
 'farm1_367_19579961904_2afe3d61f2_b.jpg',
 'farm1_372_20182354826_288048e80f_b.jpg',
 'farm1_373_20051791466_465c2f0cd7_b.jpg',
 'farm1_378_19994962928_4d1e3b5273_b.jpg',
 'farm1_386_20077151338_565eca8245_b.jpg',
 'farm1_391_19798851399_148ffae7bb_b.jpg',
 'farm1_392_20149304556_66089212d3_b.jpg',
 'farm1_392_20238659696_ab64fea61c_b.jpg',
 'farm1_397_20133335455_6ac76df185_b.jpg',
 'farm1_399_19477069424_487c07b573_b.jpg',
 'farm1_403_20178595996_91aa9c535f_b.jpg',
 'farm1_411_20078230839_df5d0bc006_b.jpg',
 'farm1_411_20147625392_b9af7bf64e_b.jpg',
 'farm1_415_19918210110_b64ca0c8ba_b.jpg',
 'farm1_422_19805700400_a9a92b5640_b.jpg',
 'farm1_428_19999459841_fe1801bcb9_b.jpg',
 'farm1_430_20284364465_0d8c21e1be_b.jpg',
 'farm1_437_19967443660_7d78f0f45e_b.jpg',
 'farm1_439_20270730081_41d0a0ef74_b.jpg',
 'farm1_448_20254011462_26901f13df_b.jpg',
 'farm1_451_19647580264_fa094e4e18_b.jpg',
 'farm1_457_19998705569_768dbff33a_b.jpg',
 'farm1_470_20167424041_68c6fa7226_b.jpg',
 'farm1_477_20208306105_13f7718315_b.jpg',
 'farm1_486_20002923240_770ed2f527_b.jpg',
 'farm1_493_19470626354_474a3f8453_b.jpg',
 'farm1_514_19888436879_c91d7a9bc9_b.jpg',
 'farm1_530_20093590000_52d0b7495a_b.jpg',
 'farm1_533_20009980668_db03e9fdf0_b.jpg',
 'farm1_544_19980254768_938d15ac44_b.jpg',
 'farm1_547_19983479270_9f267ff96c_b.jpg',
 'farm1_567_19709308274_d7a0697d8a_b.jpg',
 'farm4_3668_19478079044_8b1075a121_b.jpg',
 'farm4_3675_19534648023_6b02413da6_b.jpg',
 'farm4_3676_19516813253_a6796867bc_b.jpg',
 'farm4_3750_20121023921_2493f86a5c_b.jpg',
 'farm4_3828_20273296635_0f1039c8c1_b.jpg',
 'farm4_3832_19557447513_3fc826aff4_b.jpg'}

In the session above, df is a pandas.DataFrame that I construct by concatenating the files for a single attribute, like DoF, so that the image_name column contains all 9958 image names.
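For reference, here is a minimal sketch of how such a DataFrame could be put together. It assumes each list file holds one whitespace-separated "image name, score" pair per line, which I have not confirmed against the dataset documentation. The function name `load_attribute` and the `split` column are my own, not part of the released code.

```python
from pathlib import Path

import pandas as pd


def load_attribute(label_dir, attribute, splits=("TestNew", "Train", "Validation")):
    """Concatenate the per-split list files for one attribute into one DataFrame."""
    frames = []
    for split in splits:
        path = Path(label_dir) / f"imgList{split}Regression_{attribute}.txt"
        # Assumed line format: "<image_name> <score>", separated by whitespace.
        frame = pd.read_csv(path, sep=r"\s+", header=None,
                            names=["image_name", attribute])
        frame["split"] = split  # remember which file each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

With this, `df = load_attribute(".", "DoF")` gives the frame used in the comparison above, and the extra `split` column shows which list file a mismatched name came from.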

Is there a location with the 127 "new" image files that I should add to the 'datasetImages' folder?

I want to be sure I am getting the file names correct so I can build one CSV file with 9958 rows and a separate column for each attribute score.
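To make the target concrete, this is roughly the alignment I have in mind, sketched with the standard library only. It assumes the scores for each attribute have already been read into a `{image_name: score}` mapping. The names `build_score_table` and `write_score_table` are hypothetical helpers, not anything from the repository.

```python
import csv


def build_score_table(scores_by_attribute):
    """Align per-attribute {image_name: score} mappings into one row per image."""
    attributes = sorted(scores_by_attribute)
    all_names = set()
    for mapping in scores_by_attribute.values():
        all_names.update(mapping)
    rows = []
    for name in sorted(all_names):
        row = {"image_name": name}
        for attribute in attributes:
            # A score that is missing for this image stays empty, not guessed.
            row[attribute] = scores_by_attribute[attribute].get(name, "")
        rows.append(row)
    return ["image_name"] + attributes, rows


def write_score_table(path, scores_by_attribute):
    """Write the aligned table to a CSV file with one column per attribute."""
    header, rows = build_score_table(scores_by_attribute)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
```

If the file names are consistent across attributes, the table should have exactly as many rows as there are distinct image names, and no empty cells. Any empty cell would point to a name that appears in one attribute's files but not in another's.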

spearsem reopened this Aug 27, 2017
Owner

aimerykong commented Aug 28, 2017 via email

Author

spearsem commented Aug 28, 2017

Thanks again for your help. I am sorry I am missing some detail preventing me from understanding. When I look at the number of images listed in any of the "TestNew" files, it is 1000, just like for the plain "Test" files. For example:

In [4]: with open("imgListTestRegression_BalacingElements.txt", 'r') as _f:
   ...:     from_test_regression = _f.read().splitlines()

In [5]: with open("imgListTestNewRegression_BalacingElements.txt", 'r') as _f:
   ...:     from_test_new_regression = _f.read().splitlines()

In [6]: len(from_test_regression)
Out[6]: 1000

In [7]: len(from_test_new_regression)
Out[7]: 1000  # <-- why is this still 1000?

In [8]: with open("imgListTrainRegression_BalacingElements.txt", 'r') as _f:
   ...:     from_train_regression = _f.read().splitlines()

In [9]: with open("imgListValidationRegression_BalacingElements.txt", 'r') as _f:
   ...:     from_validation_regression = _f.read().splitlines()

In [10]: len(from_train_regression)
Out[10]: 8458

In [11]: len(from_validation_regression)
Out[11]: 500

In [12]: 1000 + 8458 + 500
Out[12]: 9958

So, if I use the TestNewRegression_... files, I still always see 9958 images. The only difference is that they contain a new set of 127 image names that do not appear in the datasetImages folder.

I also see 9958 images in the datasetImages folder:

ely@nortonstowe:~/Downloads/aadb/imgListFiles_label$ ls ../datasetImages/*.jpg | wc -l
9958

In summary, whether I use the "TestNew" files or the "Test" files, the total number of rows for a fixed attribute, across the {"TestNew" or "Test"}, "Train", and "Validation" files, is always 9958 (never 9831), and the number of images inside datasetImages is also 9958.

I guess my question is: why are there 1000 entries in the "TestNew" files -- it sounds like you are saying only 783 of those entries should be valid, and the others are expected to be incorrect (missing) image file names that do not exist in datasetImages?
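In case it helps to reproduce the check, here is a small sketch that counts how many entries of a list file resolve to a file on disk. It assumes the image name is the first whitespace-separated token on each line. The helper names `read_image_names` and `compare_with_folder` are made up for this example.

```python
import os


def read_image_names(list_path):
    """Return the set of image names in a list file (first token of each line)."""
    names = set()
    with open(list_path) as handle:
        for line in handle:
            parts = line.split()
            if parts:  # skip blank lines
                names.add(parts[0])
    return names


def compare_with_folder(list_path, image_dir):
    """Report how many listed names exist in image_dir and which are missing."""
    listed = read_image_names(list_path)
    on_disk = set(os.listdir(image_dir))
    return {
        "listed": len(listed),
        "present": len(listed & on_disk),
        "missing": sorted(listed - on_disk),
    }
```

Running `compare_with_folder("imgListTestNewRegression_BalacingElements.txt", "../datasetImages")` and the same call on the plain "Test" file should show directly whether the two lists differ only in the entries that have no matching image.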

Owner

aimerykong commented Aug 28, 2017 via email
