A time series is a chronologically ordered sequence of samples of a real-valued variable, and it can contain millions of observations. Time series analysis aims to extract models in a wide variety of domains [31], such as epidemiology, DNA analysis, economics, geophysics, and speech recognition. In particular, motif [4] (similarity) and discord [13] (anomaly) discovery have become two of the most frequently used primitives in time series data mining [20], [2], [32], [7], [34], [1]. Both tasks can be posed as an all-pairs similarity search (also known as a similarity join): given a time series broken down into subsequences, retrieve the most similar subsequences (motifs) and the most dissimilar ones (discords).
One of the state-of-the-art methods for motif and discord discovery is the Matrix Profile [35]. It solves the similarity join problem while keeping the computation tractable even for very large time series. In this work, we focus on this technique, which can be used to detect similarities and anomalies and to predict outcomes. It provides full joins without requiring a similarity threshold, which is very difficult to choose in this domain. The matrix profile is itself a time series that stores, for each subsequence, the distance to its most similar (nearest-neighbor) subsequence elsewhere in the series. Its minimum values therefore point to the motifs, and its maximum values highlight the most dissimilar subsequences (discords).
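To make the definition concrete, the following is a minimal brute-force sketch of a matrix profile in plain Python. It is not the algorithm of [35] or of any matrix profile library; it simply compares every pair of subsequences in O(n²m) time. It assumes z-normalized Euclidean distance, an exclusion zone of ⌈m/4⌉ positions to skip trivial self-matches, and zero vectors for constant subsequences. These choices, and the function names `znorm`, `matrix_profile`, and `motif_and_discord`, are hypothetical illustrations.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence (zero mean, unit variance).

    Assumption: a constant subsequence (zero variance) maps to all zeros.
    """
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((x - mu) ** 2 for x in seq) / len(seq))
    if sd == 0:
        return [0.0] * len(seq)
    return [(x - mu) / sd for x in seq]


def matrix_profile(ts, m):
    """Brute-force matrix profile for subsequence length m.

    Returns (profile, index): profile[i] is the z-normalized Euclidean
    distance from subsequence i to its nearest non-trivial neighbor,
    and index[i] is that neighbor's start position.
    """
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    # Assumed exclusion zone: ignore neighbors closer than ceil(m/4)
    # positions, since overlapping subsequences trivially match.
    excl = max(1, math.ceil(m / 4))
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < excl:
                continue  # skip trivial matches
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


def motif_and_discord(profile):
    """Start positions of the lowest (motif) and highest (discord) profile values."""
    positions = range(len(profile))
    motif = min(positions, key=profile.__getitem__)
    discord = max(positions, key=profile.__getitem__)
    return motif, discord
```

If a pattern occurs twice in `ts`, the profile is zero at both occurrences and each occurrence's `index` entry points to the other, so `motif_and_discord` reports one of them as the motif. No threshold is needed at any step, which mirrors the threshold-free property described above.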