I would have assumed "two random choices" would have given a better lower average load.
Given that you're already tracking latency information for epsilon-greedy, implementing two random choices should be fairly straight-forward.
For more information on "two random choices", see http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf .