- Add support for 3-4 channel vector texture sampling
- Add new function for sampling with zero outside of bounds
- Rename nonlinear_metric -> nl_metric
- Disallow ncc (will restore in future)
- Implement -warp and -transformed options
Avoid threaded_copy overhead when dropping alpha by forcing 2 inner loop axes for the Extract1D RGB copy path.
It doesn't work well and introduces too much shear in deformations
We compute an initial SVF from the affine transform and then feed into the nonlinear registration pipeline at the coarsest level.
The previous nonlinear LNCC update used a closed-form LCC-demons style solve. It was also seemingly too aggressive and produced deformations with less uniform Jacobian values. Switch the LNCC update to a first-order force obtained by differentiating the local correlation directly and mapping that intensity derivative through the moving-image spatial gradient. This gives a simpler update step and produces deformations that yield the same increase in overlap but with a smaller volume change (tested on the NIREP and MindBoggle-101 datasets). We also perform an update renormalisation based on a target maximum step in voxel units, so update magnitudes remain tunable after moving to the gradient force.
- The search now evaluates a real grid of candidates instead of a thin diagonal through axis-angle space.
- Previously, the search only replaced the rotation parameters and left translation unchanged. That is fine only if the search pivot happens to be the true rotation centre. The new code keeps an anchor point fixed while rotating.
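Keeping an anchor point fixed under a candidate rotation amounts to recomputing the translation part of the affine. A small illustration of that bookkeeping (my own sketch, not the PR's code; `rotation_about_anchor` is a hypothetical name):

```python
import numpy as np

def rotation_about_anchor(R, a):
    """Return the 4x4 affine applying rotation R while keeping anchor a fixed.

    x' = R @ (x - a) + a, so the translation is t = a - R @ a.
    """
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = a - R @ a
    return T
```

With this construction, every candidate rotation in the grid leaves the anchor point invariant, so the search no longer implicitly assumes the pivot is the true rotation centre.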
This work is a from-scratch implementation of non-linear scalar image registration on the GPU. The two main objectives are to improve upon mrregister both in terms of the quality of the warps generated and in terms of runtime performance. The code is able to register two large images (~256^3) in well under 20 seconds on an Apple M2 Max (at least an order of magnitude faster than mrregister and ANTs). It also outperforms mrregister both in terms of increased (DICE) overlap and better-behaved Jacobian statistics in my tests.

My original intent was to introduce this PR in multiple stages to make it easier to review, but due to time constraints I'm posting it as is.
The implementation is usable and has been tested on various machines, but it may need refinements in terms of code quality (and possibly architectural decisions) in order to be merged.
My hope is that this will be useful for someone in the future to improve upon or at least take inspiration from to use the GPU API we have in place.
The registration algorithm is an implementation of the Log-Demons-style approach originally introduced by Vercauteren et al. in their paper Symmetric Log-Domain Diffeomorphic Registration: A Demons-based Approach. It supports two different metrics: SSD (sum of squared differences) and LNCC (local normalised cross-correlation). I have primarily focused on tuning the code and parameters to work with LNCC since, generally speaking, it achieves higher-quality results than the naive SSD metric. The implementation also ensures that the generated warps are diffeomorphic and symmetric.
For reference (see the original paper for more details), here is a brief description of the algorithm:
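In outline, each iteration exponentiates the current stationary velocity field (via scaling and squaring) to obtain forward and backward warps, computes symmetric demons forces from the warped images, applies fluid-like smoothing to the update, folds it into the velocity field via a first-order Lie-algebra (BCH) step, and finally applies diffusion-like smoothing. A toy 2D NumPy sketch of this loop using an SSD force (my own simplified reconstruction of the paper's structure, not the PR's GPU code; all names are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(img, disp):
    """Sample img at identity + disp (disp shape: (2, H, W))."""
    grid = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    return map_coordinates(img, grid + disp, order=1, mode='nearest')

def exponentiate(v, n=6):
    """Scaling and squaring: exp(v) as a displacement field."""
    disp = v / (2.0 ** n)
    for _ in range(n):
        # Self-composition: d <- d + d o (id + d), component-wise.
        disp = disp + np.stack([warp(disp[k], disp) for k in range(2)])
    return disp

def ssd_force(fixed, warped):
    """Demons-style SSD force: residual times fixed-image gradient."""
    return (fixed - warped) * np.array(np.gradient(fixed))

def register_level(fixed, moving, n_iters=20, s_fluid=1.0, s_diff=1.5, step=0.5):
    v = np.zeros((2,) + fixed.shape)
    for _ in range(n_iters):
        u_fwd = ssd_force(fixed, warp(moving, exponentiate(v)))
        u_bwd = ssd_force(moving, warp(fixed, exponentiate(-v)))
        # Fluid-like smoothing of the symmetric update (preconditioning).
        u = 0.5 * (gaussian_filter(u_fwd, (0, s_fluid, s_fluid))
                   - gaussian_filter(u_bwd, (0, s_fluid, s_fluid)))
        v = v + step * u                              # first-order BCH update
        v = gaussian_filter(v, (0, s_diff, s_diff))   # diffusion-like smoothing
    return v
```

The real implementation runs this loop within a coarse-to-fine multi-resolution pyramid and on 3D GPU textures rather than NumPy arrays.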
Below are some thoughts about the implementation and its possible future extensions.
LNCC update formula
Originally, the LNCC update step was adapted from LCC-Demons: A robust and accurate symmetric diffeomorphic registration algorithm. That paper extends the original Log-Demons formulation of Vercauteren et al. by replacing SSD with a local cross-correlation metric. However, its update rule is derived in a midway-space formulation rather than the Lie-algebra averaging framework used here. In the initial version of my implementation, I therefore adapted the update expression from Appendix B of the paper to the asymmetric setting so that it could still be combined using Lie-algebra averaging.
However, despite a significant amount of tuning (and rechecking of the derivation math), I wasn't able to push the algorithm to provide the same deformation quality as the LNCC implementation in ANTs (measured via DICE overlap and Jacobian statistics of the warps using the NIREP and MindBoggle-101 datasets). I had various (untested) theories on why that would be the case, one of them being that an update mechanism based on a stationary velocity field may lack the degrees of freedom of a more general framework like SyN.
However, reading Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain (p. 34), I came across this statement:
This inspired me to replace the second-order approximated LCC-Demons force with a first-order analytic gradient force. The implementation in local_lncc_update.slang computes the local LNCC moments over a window, writing the sums of $R$, $M$, $R^2$, $M^2$, and $RM$ ($R$ for reference intensity and $M$ for moving intensity), from which the local means, covariance, and variances are recovered. Writing the local correlation as

$$\rho = \frac{A}{\sqrt{B C}},$$

with $A$ the local covariance and $B$, $C$ the local variances of the fixed and moving intensities, the update is obtained by differentiating $\rho$ with respect to the moving intensity at the centre voxel and then using the first-order relation

$$\delta M(x) \approx \nabla M(x) \cdot u(x).$$

This gives a local ascent direction proportional to

$$\frac{1}{\sqrt{B C}} \left( R_c - \frac{A}{C}\, M_c \right) \nabla M,$$

where $R_c$ and $M_c$ are the centred fixed and moving intensities. In the code, this becomes a scalar coefficient multiplying the moving-image spatial gradient in scanner coordinates. The full derivation is a bit long, but I'd be happy to write it out here if needed.
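To make the coefficient-times-gradient structure concrete, here is a NumPy sketch of the force (my own CPU reconstruction with hypothetical names, not the actual local_lncc_update.slang shader; a box window stands in for the shader's moment window, and a small eps guards the divisions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lncc_force(fixed, moving, radius=2, eps=1e-8):
    """First-order LNCC ascent direction: scalar coefficient * grad(moving)."""
    size = 2 * radius + 1
    mean = lambda img: uniform_filter(img, size=size)
    mu_r, mu_m = mean(fixed), mean(moving)
    # Local covariance and variances recovered from the windowed moments.
    A = mean(fixed * moving) - mu_r * mu_m   # covariance
    B = mean(fixed * fixed) - mu_r * mu_r    # fixed variance
    C = mean(moving * moving) - mu_m * mu_m  # moving variance
    Rc, Mc = fixed - mu_r, moving - mu_m     # centred intensities
    coeff = (Rc - (A / (C + eps)) * Mc) / np.sqrt(B * C + eps)
    grad = np.stack(np.gradient(moving), axis=-1)  # spatial gradient
    return coeff[..., None] * grad
```

When the two images already match, $A = B = C$ and $R_c = M_c$, so the coefficient (and hence the force) vanishes, as expected of an ascent direction at an optimum.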
Surprisingly, this replacement (plus a small renormalisation step) proved to yield much better registrations, arguably similar in quality to those produced by ANTs in terms of overlap improvement and Jacobian statistics.
The old update scheme is still reachable in the git history of the changes here.
Regularisation
As mentioned above, the original log-demons approach uses two forms of regularisation: a fluid-like regularisation of the update field and a diffusion-like regularisation of the velocity field. The former acts mainly as a preconditioning step, whereas the latter can be viewed as the primary regularisation mechanism. In both cases, regularisation is implemented through Gaussian smoothing.
The recent MMORF paper shows the importance of biologically plausible warp regularisation for achieving high registration accuracy without excessive shape or volume distortion.
While Gaussian smoothing regularises the deformation field, it does so only indirectly: it encourages spatial smoothness, but does not deliberately optimise for minimal deformation or for maximal metric improvement at a given deformation cost.
I have very briefly researched an alternative regularisation scheme applied to the update step in the current implementation. In contrast to MMORF, where regularisation is defined directly on the displacement field and can therefore penalise local volume and shape changes more explicitly, doing the same in a stationary velocity field (SVF) based framework is less straightforward. In an SVF parameterisation, the optimised quantity is the velocity field rather than the final displacement, so quantities such as local volume change and shear are only related to the transformation indirectly, through the exponential map.
One possible approximation in the SVF setting is to regularise first-order derivatives of the velocity update rather than Gaussian-smoothing the field directly. In particular, one can form the infinitesimal strain-rate tensor,

$$\varepsilon(u) = \tfrac{1}{2} \left( \nabla u + \nabla u^{\mathsf{T}} \right),$$

which captures the local symmetric part of the deformation induced by the update field $u$. Its trace, $\nabla \cdot u$, corresponds to infinitesimal volume change, while its deviatoric part,

$$\mathrm{dev}\,\varepsilon(u) = \varepsilon(u) - \tfrac{1}{3} (\nabla \cdot u)\, I,$$

captures shear-like distortion. A regulariser that penalises $(\nabla \cdot u)^2$ and $\|\mathrm{dev}\,\varepsilon(u)\|^2$ would therefore discourage local expansion/contraction and shear, respectively. Although this is only an indirect proxy in an SVF framework, it should provide a more targeted alternative to Gaussian smoothing.
A relevant article would be Constrained H^1-regularization schemes for diffeomorphic image registration by Mang and Biros.
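As a sanity check on the definitions, the penalty can be evaluated on a discrete field with finite differences. A NumPy sketch (my own illustration; `strain_penalty` and its weights are hypothetical names, not anything in this PR):

```python
import numpy as np

def strain_penalty(u, w_vol=1.0, w_shear=1.0):
    """Mean of w_vol*(div u)^2 + w_shear*|dev eps(u)|^2 for u of shape (3, X, Y, Z)."""
    # grad[i, j] = d u_i / d x_j via central differences.
    grad = np.array([np.gradient(u[i]) for i in range(3)])
    eps = 0.5 * (grad + grad.transpose(1, 0, 2, 3, 4))  # strain-rate tensor
    div = np.trace(eps)                                 # infinitesimal volume change
    dev = eps - (div / 3.0) * np.eye(3)[:, :, None, None, None]
    return (w_vol * div ** 2 + w_shear * (dev ** 2).sum(axis=(0, 1))).mean()
```

As expected, a rigid field (constant translation) incurs zero penalty, while the pure scaling field $u(x) = x$ has $\nabla \cdot u = 3$, zero deviatoric part, and therefore a penalty of $9\,w_{\mathrm{vol}}$.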
FOD reorientation
I haven't given much thought to how FOD reorientation could be implemented in the current framework, but it's worth mentioning that the current algorithm relies entirely on GPU 3D textures (to take advantage of hardware linear interpolation). This will obviously be a limitation when it comes to dealing with FOD images, where each voxel contains a set of spherical harmonic coefficients rather than a single scalar value. For that purpose, one would need to devise a way to upload the images as regular buffer objects on the GPU and then wire up the appropriate sampling mechanisms on the shader side. For optimal performance, it might be worth considering the use of space-filling curves.
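To illustrate the space-filling-curve idea: a Morton (Z-order) layout interleaves the bits of the voxel coordinates, so that spatially nearby voxels tend to land close together in the linear buffer, which helps cache behaviour when sampling neighbourhoods. A generic sketch of the indexing (not part of this PR):

```python
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of (x, y, z) into a single Z-order index."""
    code = 0
    for i in range(bits):
        code |= (((x >> i) & 1) << (3 * i)) \
              | (((y >> i) & 1) << (3 * i + 1)) \
              | (((z >> i) & 1) << (3 * i + 2))
    return code
```

On the shader side, the inverse mapping (or a lookup per texel block) would be needed to translate a sampled coordinate into a buffer offset.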
Other
There are probably bugs that still need to be squashed. The code is also missing targeted unit tests.
As mentioned earlier, the code has mainly been tuned to work with LNCC as the metric. Changes might be needed to make it work best with SSD.
Multi-contrast non-linear registration is not supported.
The affine code path and the non-linear code were developed practically independently. Not much thought has been given to ensuring consistency in naming and code structure.
Shader code currently resides in cpp/core/gpu/shaders/registration and is symlinked into the build directory. A proper installation strategy is required; see Handle the installation of shader files for GPU commands #3289.

For multi-modal settings, a metric like mutual information may be required. A possible approach is described in Diffeomorphic Demons using Normalised Mutual Information by Modat et al. However, given that mutual information is a global measure and non-linear registration involves local deformations, an approach such as MIND may be worth considering (I believe there is an implementation of this approach in niftyreg).