RTX 3090 with the following versions:
Windows 10
Python 3.9.5
TensorFlow 2.5
CUDA 11.2.2 (PATH set)
cuDNN 8.1
fp32 works:
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=resnet50 --variable_update=parameter_server
fp16 does not:
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=resnet50 --variable_update=parameter_server --use_fp16
error:
Internal: cuDNN launch failure : input shape ([128,112,112,64])
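
In case it helps with reproducing this, here is a rough sketch for dumping the versions that the Python interpreter and TensorFlow actually report. The helper name collect_environment is made up for illustration. Note that tf.sysconfig.get_build_info() gives the CUDA/cuDNN versions the TensorFlow binary was built against, which may differ from the ones installed locally.

```python
import platform
import sys


def collect_environment():
    """Gather version details relevant to the fp16 cuDNN failure (hypothetical helper)."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "tensorflow": None,
        "cuda_build": None,
        "cudnn_build": None,
        "gpus": [],
    }
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable from this interpreter; report what we have.
        return info

    info["tensorflow"] = tf.__version__
    # Build info describes what the TensorFlow binary was compiled against,
    # not necessarily the CUDA/cuDNN libraries found on PATH.
    build = tf.sysconfig.get_build_info()
    info["cuda_build"] = build.get("cuda_version")
    info["cudnn_build"] = build.get("cudnn_version")
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running it in the same environment as tf_cnn_benchmarks.py and pasting the output alongside the error should make version mismatches easier to spot.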