Understanding TPU efficiency in the examples #1408

The runtime comparison for ImageNet is what we usually see when comparing GPUs and TPUs. However, since the TPU uses bfloat16 for matrix multiplications, this can in some extreme cases affect training stability, which is probably what happens in PixelCNN++. In that example we expect the test loss to be below 2.92, which requires a very precise setup, and using a TPU here actually slows down training.

Please note the slowness in PixelCNN++ is a known issue (#458). Copying @j-towns's latest response on that issue:

"Based on some other generative modelling work which I've been doing on TPU lately, it seems the precision parameter to layers like Conv makes a small but noticable difference to train…

Answer selected by marcvanzee