# «Reconstructing words from a ﬁxed palindromic length sequence∗ Alexandre Blondin Mass´1, Sreˇko Brlek1, Andrea Frosini2, S´bastien Labb´1, ...»

Reconstructing words from a ﬁxed palindromic

length sequence∗

Alexandre Blondin Mass´1, Sreˇko Brlek1, Andrea Frosini2, S´bastien Labb´1,

e c e e

and Simone Rinaldi2

1 Laboratoire de Combinatoire et d’Informatique Math´matique,

e

Universit´ du Qu´bec ` Montr´al,

e e a e

C. P. 8888 Succursale “Centre-Ville”, Montr´al (QC), CANADA H3C 3P8,

e

Brlek.Srecko@uqam.ca, [blondin masse.alexandre, Labbe.Sebastien]@courrier.uqam.ca 2 Dip. di Scienze Matematiche ed Informatiche Roberto Magari, Universit` degli Studi di Siena a Pian dei Mantellini 44, 53100 Siena, Italy [frosini, rinaldi]@unisi.it Abstract. To every word w is associated a sequence Gw built by computing at each position i the length of its longest palindromic suﬃx. This sequence is then used to compute the palindromic defect of a ﬁnite word w deﬁned by D(w) = |w| + 1 − |Pal(w)| where Pal(w) is the set of its palindromic factors.

In this paper we exhibit some properties of this sequence and introduce the problem of reconstructing a word from Gw. In particular we show that up to a relabelling the solution is unique for 2-letter alphabets.

Key words: Palindromic complexity, defect, lacunas, reconstruction.

1 Introduction Among the many ways of measuring the information content of a ﬁnite word, counting the number of its distinct factors or subwords of given length has been widely used and known as its complexity. A reﬁnement of this notion amounts to restrict the factors to palindromes. The motivations for the study of palindromic complexity comes from many areas ranging from the study of Schr¨dinger operators in physics [4, 7, 20] to number theory [6] and combinao torics on words where it appears as a powerful tool for understanding the local structure of words. It has been recently studied in various classes of inﬁnite words, an account of which may be found in the survey provided by Allouche et al. [5].

In particular, the palindromic factors give an insight on the intrinsic structure, due to its connection with the usual complexity, of many classes of words.

For instance, they completely characterize Sturmian words [23], and for the class of smooth words they provide a connection with the notion of recurrence [12, 13].

∗ with the support of NSERC (Canada) 102 A. Blondin Mass´, S. Brlek, A. Frosini, S. Labb´ and S. Rinal

The problem of reconstructing words from partial information arise naturally.

**We mention a few of them that have been solved for a ﬁxed alphabet Σ:**

Some set A of factors is ﬁxed. Find the shortest words containing the set A of all factors of given length k. This leads to the De Bruijn sequences [15, 17, 19] whose construction uses a graph Gk where vertices are the given words of length k, and where edges model the scanning of the word by a window of size k. The solution is then obtained by computing all Eulerian cycles in the graph. It is worth noting that ﬁnding the lexicographically smallest such word is much easier: it is given by the lexicographic concatenation of Lyndon words on Σ whose lengths are divisible by k (See Fredericksen et al. [18]).

Some set A is ﬁxed along with some suitable hypothesis. Construct all words w such that the set of its factors A = F (w). The technique used for this problem is based on constructing a set of minimal forbidden words, that is the extensions of words in A that do not belong to A [9]. That technique was also used in [11] to construct words whose language of palindromes is a ﬁxed set P. It turns out that it is a rational language. Concerning multisets of subsequences, instead of factors we mention a general result. If the set A contains suﬃciently many subsequences of length k, then the solution is unique [26]: indeed, for a word w of lengh n 7 and k ≥ [n/2] the subsequences uniquely determine w, and for k log2 n they do not. See also an interesting combinatorial approach depending on the Burrows-Wheeler transform (See Mantaci et al. [25]).

Fixed complexity. The most famous example is that of Sturmian words (see Lothaire [22] for a substantial review) which are characterized by the complexity P (n) = n + 1 established by M. Morse [28]. Sturmian words are the discretization of lines with irrational slopes, and they are easily constructed from the continued fraction expansion corresponding to the irrational slope. The complexity is therefore not enough to characterize completely a word. However, in the case of the Thue-Morse complexity [10, 24], there are essentially only two such words [1, 2].

In this paper we introduce the problem of reconstructing a word from sequences describing its palindromic complexity. Droubay, Justin and Pirillo [16] noted that the palindrome complexity |Pal(w)| of a word w is bounded by |w| + 1, and observed that it is computed by a sequential algorithm listing the ﬁrst occurrences of longest palindromic suﬃxes, called unioccurrent in [16]. For our study we need the following two auxiliary functions on words. Given a word of length n, w : [0..(n − 1)] −→ Σ, we deﬁne two functions Gw, Hw : N −→ N by Gw (i) = |LPS(w[0..i])| and

We ﬁrst exhibit some combinatorial properties of the palindromic factors in words (Section 3) and use them in order to obtain properties of the sequences Reconstructing words from a ﬁxed palindromic length sequence 103 G and H (Section 4). Finally we study the problem of reconstructing words from given sequences, and establish conditions for unicity on 2-letter alphabets.

2 Preliminaries In what follows, Σ is a ﬁnite alphabet whose elements are called letters. By word we mean a ﬁnite sequence of letters w : [0..(n − 1)] −→ Σ, where n ∈ N. The length of w is |w| = n and w[i] or wi denote its i-th letter. The set of n-length words over Σ is denoted Σ n. By convention, the empty word is denoted ε and its length is 0. The free monoid generated by Σ is deﬁned by Σ ∗ = n≥0 Σ n.

The set of right inﬁnite words is denoted by Σ ω and we set Σ ∞ = Σ ∗ ∪ Σ ω.

Given a word w ∈ Σ ∞, a factor f of w is a word f ∈ Σ ∗ satisfying ∃x ∈ Σ ∗, ∃y ∈ Σ ∞, w = xf y.

If x = ε (resp. y = ε ) then f is called preﬁx (resp. suﬃx). The set of all factors of w is denoted by Fact(w), those of length n is Factn (w) = Fact(w) ∩ Σ n, and Pref(w) is the set of all preﬁxes of w. The number of occurrences of a factor f in w is denoted |w|f. A period of a word w is an integer p |w| such that w[i] = w[i + p], for all i |w| − p. If w = pu, with |w| = n and |p| = k, then p−1 w = w[k..(n−1)] = u is the word obtained by erasing p. A word is said to be primitive if it is not a power of another word. Two words u and v are conjugate when there are words x, y such that u = xy and v = yx. The conjugacy class of a word w is denoted by [w]; note that the length is invariant under conjugacy.

For a given word w of length n, any of its conjugates is obtained by cyclic permutation, that is σ i (w) = w[i..(n − 1)]w[0..(i − 1)].

The reversal of u = u0 u1 · · · un−1 ∈ Σ n is the word u = un−1 un−2 · · · u0, and a palindrome is a word p such that p = p. Since every word contains palindromes, the letters and ε being necessarily part of them, the set of its palindromic factors is Pal(w), and its palindromic complexity is denoted by |Pal(w)|. Conjugacy is an equivalence relation having numerous properties and for our purpose we need the following one easily obtained by induction: let p and q be two palindromes, then σ i (pq) = p′ q ′, for some palindromes p′ and q ′.

We start by quoting Lemma 1 of [8] in order to establish a useful combinatorial property.

Lemma 1 (Blondin Mass´ et al. [8]) Assume that w = xy = yz. Then for e some u, v, and some i ≥ 0 we have from [21]

(iii) w is a palindrome;

(iv) xyz is a palindrome.

Moreover, if one of the equivalent conditions above holds then (v) y is a palindrome.

As a consequence we have the following proposition.

Proposition 1 Assume that w = xp = qz where p and q are palindromes such that |q| |x|. Then w has period |x| + |z|, and xz is a product of two palindromes.

Proof. Since |q| |x|, there exists a non-empty word y such that q = xy and p = yz. It follows that

** w x = q z x = x y z x = x p x = x p x = x z y x = x z q = x z q.**

Considering qzx = xzq, we obtain from Equation (2) that |xz| is a period of wx. From Lemma 1 (iii), there exist palindromes u, v such that xz = uv. ⊓ In order to compute the palindromic complexity we need the function LPS : Σ ∗ −→ Σ ∗ which associates to any word w its longest palindromic suﬃx LPS(w).

Droubay, Justin and Pirillo [16] noted that the palindrome complexity |Pal(w)| of a word w is bounded by |w| + 1, and that ﬁnite Sturmian (and even episturmian) words realize the upper bound. Moreover they implicitly show that the palindrome complexity is computed by an algorithm listing the longest palindromic suﬃxes which amounts to compute for a word w the functions Gw, Hw : N −→ N deﬁned by

We often omit the subscript w in Gw and Hw when the context is clear. As an

**example let w = aababbaababaaabaab. Then we have the following table :**

i w a a b a b b a a b a baaabaab G H A position in the word w where H vanishes is called a lacuna in [8]. For instance the set of lacunas for w in the example above is {7, 9, 10, 17}. Equivalent words, that is words obtained by relabelling of the alphabet, have obviously the same functions G and H. For instance, on the 2-letter alphabet {a, b}, we have Gw = Gw and Hw = Hw, where ( ) is the morphism deﬁned by a = b, b = a.

Reconstructing words from a ﬁxed palindromic length sequence 105 The palindromic defect of a ﬁnite word w is deﬁned in Brlek et al. [11] by D(w) = |w| + 1 − |Pal(w)|, and words for which D(w) = 0, that is, such that H does not vanish for any index are called full. In that paper it is also shown that there exist periodic full words, and an optimal algorithm is provided to check if an inﬁnite periodic word is full or not. Moreover, a characterization by means of a rational language is given for the language LP of words whose palindromic factors belong to a ﬁxed and ﬁnite set P of palindromes.

3 Properties of the functions G and H First observe that a word w is full if and only if Gw = Hw. Now we describe the shortest words having a ﬁxed defect value d. For instance, on a 2-letter alphabet, the shortest words having one lacuna, i.e. when d = 1, are w1 = aababbaa, w2 = aabbabaa, w3 = bbabaabb and w4 = bbaababb.

Observe that this set is closed under reversal (w1 = w2 ; w3 = w4 ) and complementation (w1 = w3 ; w2 = w4 ). On the other hand, one of the shortest words having two lacunas is the following.

i w b a a b a b b a a b G H The example above extends to the inﬁnite periodic word W = (baab.ab)ω

has defect value d, we have M (k, d) ≥ d + k. Now, consider the inﬁnite periodic word w = (α1 α2 · · · αk )ω, where αi is a letter. Observe that Pal(w) = Σ, so that each preﬁx of length n ≥ k + 1 has defect value n − k. Hence M (k, d) = d + k for k ≥ 3. ⊓ Lemma 2 Let w be a nonempty word, and let W = wω. Then we have (i) Gw (0) = 1, and if Hw (i) = 1 then w[i] is the ﬁrst occurrence of a letter;

(ii) if w = pq is primitive with p, q ∈ Pal(Σ ∗ ) then limn−→∞ GW (n) = ∞;

(iii) if w is not the product of two palindromes then GW is eventually periodic.

Proof. (i) Obvious. (ii) In this case by Theorem 4 of [11] the palindromic language of W is inﬁnite. Since for all k ≥ 0, (pq)k p is a palindromic preﬁx of W, there are inﬁnitely many palindromic preﬁxes of W. Moreover, we have GW (i) = GW (i − |w|) + |w| for i ≥ 2|w|.

(iii) Here again by Theorem 4 of [11], the palindromic language of W is ﬁnite. Therefore, let u be the shortest preﬁx of W containing all the palindromes, and let k be the smallest integer such that u ∈ Pref(wk ) then we have GW (i) = GW (i + k|w|). ⊓

Moreover they are all full, since G and H coincide.

Another periodic example illustrating Lemma 2 (ii) is W = (aba.cbc)ω. Its palindromic language is inﬁnite and W has inﬁnitely many palindromic prexes, and we have <

Observe also that there are non periodic words U such that GU is periodic.

Indeed, take any nonperiodic word, for instance the Fibonacci word deﬁned as Reconstructing words from a ﬁxed palindromic length sequence 107

Deﬁne the morphism θ : {a, b} −→ {a, b, c, d}∗ by a → abcd; b → acbd. Then the word W = θ(F ) is nonperiodic, but GW = (1111)ω. Nevertheless we have the following result showing a local periodical behaviour.

Lemma 3 Let w ∈ Σ ∗. If there exists i such that G(i) = G(i + k) = l, with l ≥ k, then the factor f = w[(i − l + 1)..(i + k)] has period 2k, and any factor of length 2k of f is the product of two palindromes.

Proof. Assume that q and p are the longest palindromic suﬃxes of length l at positions respectively i and i + k. Then there exist x and z such that f = qz = xp. In the case l = k, we have f = qp and the claim is true. If l k there exists a non-empty word y such that q = xy and p = yz. It follows from Proposition 1 that 2|x| is a period of f, and xz is a product of two palindromes. Therefore, any factor of length 2k is the product of two palindromes since it is a conjugate of xz. ⊓ The function G satisﬁes the following properties

**Proposition 3 For any ﬁnite word w ∈ Σ ∗, the following properties hold :**

(i) G(i) ≤ max{|p| : p ∈ Pal(w[0..i])} ≤ i + 1 (ii) G(j) ≤ G(i) + 2(j − i), for all j ≥ i;

(iii) G(i+1) = G(i) =⇒ G(i) and G(i+1) are odd, and G(i+2) ∈ {G(i)+2, 2};

(iv) G(i + 1) = G(i) + 1 =⇒ LPS(w[0..i]) = αG(i)+1, for some α ∈ Σ;

Proof. (i) is obvious. (ii) First, note that G(i + 1) ≤ G(i) + 2 since the longest palindromic suﬃx at position i + 1 contains a palindrome of length G(i + 1) − 2 ending at position i. The result follows by induction.