pilots
folder contains the raw data exported from Toloka for all our pilots
saliency_maps.ipynb
contains code for generating saliency maps
stage-3-fine-tuned-res50.pkl
is the pretrained model. It is loaded in the saliency_maps.ipynb notebook; use it for inference tasks or CAM generation
qc.py
includes functions for running majority voting (the fixed_annotations and free_text functions) and CrowdTruth metrics (helper function) on Toloka labeling output. Because the JSON outputs vary widely (free text vs. fine-grained vs. coarse-grained), different annotation-extraction parts are commented out within these functions; uncomment the part that matches the output type you are processing.
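For readers unfamiliar with majority voting over crowd labels, the idea can be sketched as follows. This is a minimal illustration and not the code in qc.py: the function name `majority_vote`, the `(task_id, worker_id, label)` tuple layout, and the choice to return `None` on a tie are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Pick the most frequent label per task.

    `judgments` is an iterable of (task_id, worker_id, label) tuples
    (a hypothetical layout, not the Toloka export format).
    Returns {task_id: label}, with None where the top labels are tied.
    """
    votes = defaultdict(Counter)
    for task_id, _worker_id, label in judgments:
        votes[task_id][label] += 1

    result = {}
    for task_id, counts in votes.items():
        ranked = counts.most_common()
        best_label, best_count = ranked[0]
        # A tie exists when the runner-up has as many votes as the winner.
        tied = len(ranked) > 1 and ranked[1][1] == best_count
        result[task_id] = None if tied else best_label
    return result


# Toy data: three workers agree 2-to-1 on img1, two workers split on img2.
example = [
    ("img1", "w1", "cat"),
    ("img1", "w2", "cat"),
    ("img1", "w3", "dog"),
    ("img2", "w1", "dog"),
    ("img2", "w2", "cat"),
]
aggregated = majority_vote(example)
```

Returning `None` for ties keeps ambiguous tasks visible so they can be re-labeled or inspected, rather than being resolved arbitrarily.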
phase2_results
folder contains the post-processed outcomes of phase 2, computed with CrowdTruth
phase2_analysis.ipynb
generates graphs from the post-processed data
phase3_results
folder contains the phase 3 checkbox outcomes (exported directly from Toloka): 2 files for the two completed pools
phase3_aggregation.ipynb
contains the aggregation based on the final results
- pytorch
- fastai
- slugify
- stringcase
- matplotlib
- scikit-image
- tqdm
- pandas
- deep_translator
- CrowdTruth
Our project makes use of the CrowdTruth framework:
@article{CrowdTruth2,
author = {Anca Dumitrache and Oana Inel and Lora Aroyo and Benjamin Timmermans and Chris Welty},
title = {CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement},
year = {2018},
url = {https://arxiv.org/abs/1808.06080},
}