Abstract: Medical image processing in general and brain image processing in
particular are computationally intensive tasks. Fortunately, techniques such as GPU
programming can make them more widely accessible. In this article we study NiftyReg, a
brain image processing library with a GPU implementation using CUDA, and analyse
possible ways of further optimising the existing code. We will focus on fully
using the memory hierarchy and on exploiting the computational power of the CPU. The
ideas that led us to each attempt to change and optimise the code will
be presented as hypotheses, which we will then test empirically using the results obtained
from running the application. Finally, for each set of related optimisations we will study
the validity of the obtained results in terms of both performance and the accuracy of
the resulting images.