Click-through rate (CTR) prediction is a critical task for various industrial applications such as online advertising, recommender systems, and sponsored search. FuxiCTR provides an open-source library for CTR prediction, with key features in configurability, tunability, and reproducibility. We hope this project could promote reproducible research and benefit both researchers and practitioners in this field.
- Configurable: Both data preprocessing and models are modularized and configurable.
- Tunable: Models can be automatically tuned through easy configurations.
- Reproducible: All the benchmarks can be easily reproduced.
- Extensible: It can be easily extended to any new models, supporting both the PyTorch and TensorFlow frameworks.
We have benchmarked FuxiCTR models on a set of open datasets as follows:
- Benchmark datasets for CTR prediction
- Benchmark settings and running steps
- Benchmark leaderboard for CTR prediction
FuxiCTR has the following dependencies:
- python 3.9+
- pytorch 1.10.0--2.1.2 (only required for PyTorch models)
- tensorflow 2.1 (only required for TensorFlow models)
Please install the other required packages via `pip install -r requirements.txt`.
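The version constraints above can be checked before running anything. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `in_range`, `check_environment`) are hypothetical and not part of FuxiCTR, and only the Python and torch constraints are checked.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.1.2+cu118' into a tuple like (2, 1, 2)."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    numbers = (numbers + ["0", "0", "0"])[:3]  # pad short versions such as '1.10'
    return tuple(int(n) for n in numbers)


def in_range(version, low, high):
    """Return True if low <= version <= high, comparing numeric tuples."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


def check_environment():
    """Collect human-readable warnings about the requirements listed above."""
    warnings = []
    if sys.version_info < (3, 9):
        warnings.append("Python 3.9+ is required")
    try:
        torch_version = metadata.version("torch")
        if not in_range(torch_version, "1.10.0", "2.1.2"):
            warnings.append(f"torch {torch_version} is outside 1.10.0--2.1.2")
    except metadata.PackageNotFoundError:
        warnings.append("torch is not installed (only needed for torch models)")
    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```

An empty list from `check_environment()` means no problem was detected for the constraints it covers.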
- Run the demo examples
Examples are provided in the demo directory to show some basic usage of FuxiCTR. Users can run the examples for quick start and to understand the workflow.
    cd demo
    python example1_build_dataset_to_parquet.py
    python example2_DeepFM_with_parquet_input.py
- Run a model on tiny data
Each model in the model zoo can be run with commands like those below, which use DCN as an example. Users can also modify the dataset config and model config files to run on their own datasets or with new hyper-parameters. More details can be found in the README.
    cd model_zoo/DCN/DCN_torch
    python run_expid.py --expid DCN_test --gpu 0
    # Change `MODEL` according to the target model name
    cd model_zoo/MODEL
    python run_expid.py --expid MODEL_test --gpu 0
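When several models need to be launched from a script, the same command can be assembled in Python. This is a small sketch, not part of FuxiCTR: `build_run_command` and `run_model` are hypothetical helpers that only reuse the `run_expid.py` script and the `--expid` and `--gpu` flags shown in the commands above.

```python
import subprocess
from pathlib import Path


def build_run_command(expid, gpu=0):
    """Build the argument list for run_expid.py with the flags used above."""
    return ["python", "run_expid.py", "--expid", expid, "--gpu", str(gpu)]


def run_model(model_dir, expid, gpu=0, dry_run=True):
    """Run (or, by default, only print) the experiment from inside model_dir."""
    command = build_run_command(expid, gpu)
    if dry_run:
        print(f"(cd {model_dir} && {' '.join(command)})")
        return command
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(command, cwd=Path(model_dir), check=True)
    return command
```

For example, `run_model("model_zoo/DCN/DCN_torch", "DCN_test")` prints the DCN command without executing it; pass `dry_run=False` to actually start the run.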
- Run a model on benchmark datasets (e.g., Criteo)
Users can follow the benchmark section to get benchmark datasets and running steps for reproducing the existing results. Please see an example here: https://github.com/reczoo/BARS/tree/main/ranking/ctr/DCNv2/DCNv2_criteo_x1
- Implement a new model
The FuxiCTR library is designed to be modular, so that every component can be overridden by users according to their needs. In many cases, only the model class needs to be implemented for a new customized model. If the existing data preprocessing or data loader is not directly applicable, one can also implement a new one through the core APIs. We show a concrete example that implements our model FinalMLP, which was published at AAAI 2023.
- Tune hyper-parameters of a model
FuxiCTR currently supports fast grid search over the hyper-parameters of a model using multiple GPUs. The following example shows a grid search of 8 experiments with 4 GPUs.
    cd experiment
    python run_param_tuner.py --config config/DCN_tiny_parquet_tuner_config.yaml --gpu 0 1 2 3 0 1 2 3
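To illustrate what a grid search of 8 experiments over the GPU list `0 1 2 3 0 1 2 3` amounts to, the sketch below expands a search space and pairs each combination with a GPU. It is an illustration only: the hyper-parameter names and values are hypothetical, and the round-robin assignment is an assumption that may differ from how `run_param_tuner.py` actually schedules experiments.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the listed hyper-parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def assign_gpus(experiments, gpu_slots):
    """Pair each experiment with a GPU slot in round-robin order (an assumption)."""
    return [
        (gpu_slots[i % len(gpu_slots)], params)
        for i, params in enumerate(experiments)
    ]


# Hypothetical search space: 2 x 2 x 2 = 8 combinations.
grid = {
    "learning_rate": [1e-3, 5e-4],
    "embedding_dim": [16, 32],
    "batch_size": [4096, 8192],
}
gpu_slots = [0, 1, 2, 3, 0, 1, 2, 3]  # mirrors `--gpu 0 1 2 3 0 1 2 3`

schedule = assign_gpus(list(expand_grid(grid)), gpu_slots)
for gpu, params in schedule:
    print(f"gpu {gpu}: {params}")
```

Because each GPU id appears twice in the list, every one of the 4 GPUs is paired with two of the 8 experiments in this sketch.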
If you find our code or benchmarks helpful in your research, please cite the following papers.
- Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He. Open Benchmarking for Click-Through Rate Prediction. The 30th ACM International Conference on Information and Knowledge Management (CIKM), 2021.
- Jieming Zhu, Quanyu Dai, Liangcai Su, Rong Ma, Jinyang Liu, Guohao Cai, Xi Xiao, Rui Zhang. BARS: Towards Open Benchmarking for Recommender Systems. The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022.
You are welcome to join our WeChat group for questions and discussion. If you are interested in research and practice in recommender systems, please reach out to us there.