Digital Terrain Models (DTMs) can be accurately obtained from LiDAR point clouds, but the processing time
for these clouds can be prohibitive. This paper describes several optimization techniques that have been
applied to the Overlap Window Method (OWM), a key component of DTM applications. OWM was
originally implemented in R, which severely limits the size of the LiDAR point
cloud that can be processed. We have ported the code to C++, significantly optimized the data structure to
minimize memory accesses, and developed parallel implementations for commodity CPU and GPU devices using
oneAPI libraries and tools. This results in CPU and GPU versions that are up to 19x and 83x faster, respectively,
than an OpenMP baseline that uses eight CPU cores. Most importantly, the proposed optimizations for CPU
and GPU can be key to getting the most out of other LiDAR-based algorithms, in which the careful selection
of the right data structure, parallelization strategy, and memory access reduction techniques can yield
significant performance improvements.
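
As an illustration of the kind of oneAPI/SYCL parallelization referred to above, the sketch below computes the lowest elevation per grid cell of a point cloud on whatever device is available. It is a minimal example only: the grid layout, the array names (z, cellOf, cellMinZ), and the per-cell minimum reduction are assumptions made for illustration and do not reproduce the paper's OWM implementation.

#include <sycl/sycl.hpp>
#include <cstdio>
#include <limits>
#include <vector>

int main() {
  // Hypothetical sizes: one million points binned into a 64x64 grid.
  const size_t nPoints = 1'000'000;
  const size_t nCells  = 64 * 64;

  std::vector<float> z(nPoints, 10.0f);     // point elevations (placeholder data)
  std::vector<int>   cellOf(nPoints, 0);    // precomputed grid cell of each point
  std::vector<float> cellMinZ(nCells, std::numeric_limits<float>::max());

  sycl::queue q;                            // default selector: GPU if present, else CPU

  {
    sycl::buffer<float> zBuf(z.data(), sycl::range<1>(nPoints));
    sycl::buffer<int>   cBuf(cellOf.data(), sycl::range<1>(nPoints));
    sycl::buffer<float> mBuf(cellMinZ.data(), sycl::range<1>(nCells));

    q.submit([&](sycl::handler &h) {
      sycl::accessor zAcc(zBuf, h, sycl::read_only);
      sycl::accessor cAcc(cBuf, h, sycl::read_only);
      sycl::accessor mAcc(mBuf, h, sycl::read_write);

      // One work-item per point: atomically keep the lowest z seen in each cell.
      h.parallel_for(sycl::range<1>(nPoints), [=](sycl::id<1> i) {
        sycl::atomic_ref<float, sycl::memory_order::relaxed,
                         sycl::memory_scope::device>
            cellMin(mAcc[cAcc[i]]);
        cellMin.fetch_min(zAcc[i]);
      });
    });
  } // buffer destructors copy the per-cell minima back to cellMinZ

  std::printf("lowest z in cell 0: %f\n", cellMinZ[0]);
  return 0;
}

The same source compiles for both CPU and GPU targets, which is one reason a single oneAPI code base can serve as a common starting point for the two optimized versions mentioned above.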