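It might be possible to store the Jacobian in single precision: we can still compute the difference terms in double precision and cast to single precision only when storing the result. Whether this is safe really depends on the magnitude and dynamic range of the Jacobian elements (and that might vary by application), since single precision keeps only about 7 significant decimal digits. But this could roughly halve the GPU memory used for the Jacobian values.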
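To make the idea concrete, here is a minimal CPU sketch in NumPy, not the actual GPU code. It assumes a dense Jacobian built by forward differences; the function name `jacobian_fd_float32`, the step size `h`, and the test function `f` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def jacobian_fd_float32(f, x, h=1e-6):
    """Forward-difference Jacobian: differences in float64, storage in float32."""
    x = np.asarray(x, dtype=np.float64)
    f0 = np.asarray(f(x), dtype=np.float64)
    # Single-precision storage: 4 bytes per element instead of 8.
    J = np.empty((f0.size, x.size), dtype=np.float32)
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        # The subtraction and division are done in double precision;
        # the assignment casts the finished column to float32.
        J[:, j] = (np.asarray(f(xp), dtype=np.float64) - f0) / h
    return J

# Hypothetical test function with a known analytic Jacobian.
def f(x):
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) * x[1]])

x = np.array([1.0, 2.0])
J32 = jacobian_fd_float32(f, x)
J64 = np.array([[2 * x[0], 1.0],
                [np.cos(x[0]) * x[1], np.sin(x[0])]])  # analytic, float64

print(J32.dtype, J32.nbytes, J64.nbytes)
print(np.max(np.abs(J32 - J64)))
```

The point of casting last is that the cancellation in `f(x + h) - f(x)` happens in double precision, so the only extra error from single-precision storage is one rounding per element. The range question still applies: elements larger than about 3.4e38 would overflow to `inf` in float32, and elements far smaller than about 1e-38 would lose precision or flush to zero, so an application with that kind of spread would probably need scaling or would have to stay in double precision.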