This paper addresses non-invasive identification of people from their gait. Traditional methods rely on gait signatures derived from binary energy maps, whose extraction introduces noise. Instead, the authors work directly from raw pixel data and compare different Convolutional Neural Network (CNN) architectures across three modalities: gray pixels, optical flow, and depth maps. Evaluated on the TUM-GAID and CASIA-B datasets, the study finds that (i) raw pixel values are competitive with traditional silhouette-based features, (ii) combining gray pixels with optical flow and depth maps yields state-of-the-art results even at lower image resolutions, and (iii) the choice of CNN architecture significantly impacts performance.
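To make finding (ii) concrete, the following is a minimal sketch of one common way to combine modalities: late fusion by a weighted average of per-modality class probabilities. The summary above does not say which fusion scheme the authors use, so the scheme, the function names (`softmax`, `late_fusion`, `predict_identity`), and the example scores are all illustrative assumptions rather than the paper's method or data.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(per_modality_scores, weights=None):
    """Weighted average of per-modality softmax outputs.

    per_modality_scores: dict mapping a modality name to a list of raw
    class scores, e.g. the output of one CNN per modality (assumed).
    weights: optional dict of non-negative weights; uniform if omitted.
    """
    names = list(per_modality_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_classes = len(per_modality_scores[names[0]])
    fused = [0.0] * n_classes
    for name in names:
        probs = softmax(per_modality_scores[name])
        for k in range(n_classes):
            fused[k] += weights[name] / total_weight * probs[k]
    return fused


def predict_identity(per_modality_scores, weights=None):
    """Return the index of the subject with the highest fused probability."""
    fused = late_fusion(per_modality_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three subjects; not taken from the paper.
scores = {
    "gray": [2.0, 1.0, 0.1],
    "optical_flow": [0.5, 2.5, 0.3],
    "depth": [0.2, 2.0, 0.4],
}
print(predict_identity(scores))
```

In this toy input the gray-pixel scores alone favor subject 0, while the fused prediction follows the agreement between optical flow and depth. This shows how additional modalities can correct a single-modality error, under the assumptions stated above.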