For the code in 11.56 and 11.57, I had been trying to replicate the results using the pm.math.softmax method, however, I was getting errors during sampling due to it producing negative values.
Adjusting p_ = at.nnet.softmax(s) to be p = pm.math.softmax(s, axis=1) resolved the problem.