Description
In this thesis, we study the following three types of periodic string patterns and some of their variants.
Firstly, we consider maximal d-repetitions. These are substrings that are at least 2+d times as long as their minimum period.
Secondly, we consider 3-cadences. These are arithmetic subsequences of three equal characters.
Lastly, we consider maximal pairs. These are pairs of identical substrings.
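To make the first two definitions concrete, the following is a minimal brute-force sketch, not an algorithm from the thesis. It reads the definitions literally as stated above: a string qualifies as a d-repetition if its length is at least 2+d times its minimum period, and a 3-cadence is any triple of equally spaced positions carrying the same character. The function names `min_period`, `is_d_repetition`, and `three_cadences` are hypothetical. Maximality (that an occurrence cannot be extended to the left or right) and any further conditions the full definitions may impose are deliberately not checked.

```python
def min_period(s: str) -> int:
    """Smallest p >= 1 such that s[i] == s[i + p] for every valid i."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[i] == s[i + p] for i in range(n - p)):
            return p
    return n  # only reached for the empty string


def is_d_repetition(s: str, d: float) -> bool:
    """Length test only: |s| >= (2 + d) * min_period(s).

    Assumption: maximality within a longer text is not checked here.
    """
    return len(s) > 0 and len(s) >= (2 + d) * min_period(s)


def three_cadences(s: str) -> list[tuple[int, int, int]]:
    """All (i, j, k) with i < j < k, j - i == k - j and s[i] == s[j] == s[k]."""
    n = len(s)
    return [
        (i, i + gap, i + 2 * gap)
        for i in range(n)
        for gap in range(1, (n - 1 - i) // 2 + 1)
        if s[i] == s[i + gap] == s[i + 2 * gap]
    ]
```

For example, `abaabaab` has minimum period 3 and length 8, so it passes the length test for d = 0.5 but not for d = 1, and `abaca` contains exactly one 3-cadence, at positions 0, 2, and 4.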
Maximal d-repetitions and maximal pairs of uncompressed strings are already well-researched. However, no non-trivial upper bounds for distinct occurrences of these patterns that take the compressed size of the underlying strings into account were known prior to this research.
We provide upper bounds for several variants of these two patterns that depend on the compressed size of the string, the logarithm of the string's length, the highest allowed power and d.
These results also lead to upper bounds and new insights for the compacted directed acyclic word graph and the run-length encoded Burrows-Wheeler transform.
We prove that cadences with three elements can be efficiently counted in uncompressed strings and can even be efficiently detected on grammar-compressed binary strings. We also show that even slightly more difficult variants of this problem are already NP-hard on compressed strings.
Along the way, we extend the underlying geometry of the convolution from rectangles to arbitrary polygons. We also prove that this non-rectangular convolution can still be efficiently computed.
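As an illustration of how counting triples of equal, equally spaced characters relates to a convolution, here is a small sketch. It is not the thesis's algorithm and does not touch the non-rectangular convolution. It only shows the standard (rectangular) reduction under the simplified reading of a 3-cadence as three equal characters at equally spaced positions. The function name `count_three_cadences` is hypothetical. The self-convolution is computed with a plain double loop to keep the example dependency-free; an FFT-based convolution would be substituted to make this step fast.

```python
def count_three_cadences(s: str) -> int:
    """Count triples i < j < k with j - i == k - j and s[i] == s[j] == s[k]."""
    n = len(s)
    total = 0
    for c in set(s):
        # Indicator vector of the positions holding character c.
        a = [1 if ch == c else 0 for ch in s]

        # conv[m] = number of ordered pairs (i, k) with i + k == m
        # and s[i] == s[k] == c. This quadratic loop is a stand-in
        # for an FFT-based convolution (assumption for illustration).
        conv = [0] * (2 * n - 1)
        for i in range(n):
            if a[i]:
                for k in range(n):
                    if a[k]:
                        conv[i + k] += 1

        # A middle position j pairs with every (i, k) where i + k == 2 * j.
        # Remove the degenerate pair (j, j), then halve to keep only i < k.
        for j in range(n):
            if a[j]:
                total += (conv[2 * j] - 1) // 2
    return total
```

The key observation is that positions i < j < k are equally spaced exactly when i + k = 2j, so the entry of the self-convolution at index 2j counts the candidate outer pairs for the middle position j.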