# «Reconstructing words from a ﬁxed palindromic length sequence∗ Alexandre Blondin Mass´1, Sreˇko Brlek1, Andrea Frosini2, S´bastien Labb´1, ...»

(iii) Follows from Lemma 3. (iv) Let p and q be the respective palindromes at positions i and (i + 1). Then we have q = pα for some α ∈ Σ, and we conclude by using Proposition 1. ⊓ Lemma 4 Let i ≤ k. If G(k) = G(i) + 2(k − i), then G(j) = G(i) + 2(j − i) for all i ≤ j ≤ k.

Proof. G(k) − 2(k − j) ≤ G(j) ≤ G(i) + 2(j − i) and the left term is equal to

4 Reverse engineering the functions G and H Here we tackle the following problems. Given a (ﬁnite or inﬁnite) sequence s of integers, does there exist a word w such that Hw = s or Gw = s ? If such a word w exists, under which conditions is it unique up to permutation of the letters ?

We say that a ﬁnite/inﬁnite sequence s is G-consistent (resp. H-consistent) on Σ if there is at least one nonempty word w ∈ Σ ∞ such that for all i, Gw (i) = s[i] (resp. Hw (i) = s[i]). If there is only one such word (up to permutation of the

**letters) then s is said to be unambiguous. A ﬁrst simple result follows:**

Proposition 5 Let Σ be an alphabet of at most 3 letters. Then any Gconsistent sequence on Σ is unambiguous.

Proof. Let s be a G-consistent sequence. We proceed by induction on the length of s. Then s[0] = 1 so that the base of the induction is trivially satisﬁed by choosing one letter in Σ. Assume that s[0..i] is unambiguous. Then there exist

**a word w, such that Gw [0..i] = s[0..i]. Two cases arise:**

(a) s[i + 1] 1: we set w[i + 1] = w[i + 2 − s[i + 1]].

(b) s[i + 1] = 1: if |s|1 = 2 then |Σ| = 2, so that w[i + 1] ∈ Σ \ {w[0]}.

**If |s|1 2 then |Σ| = 3 and we have to consider two cases:**

- if |s[0..i]|1 = 2, then we set w[i + 1] to the remaining letter;

- if |s[0..i]|1 2, then w[0..i] = pγβ k αl where Σ = {α, β, γ}, p ∈ Σ ∗, and k, l ≥ 1, and we set w[i + 1] = γ. ⊓ Observe that for larger alphabets, that is when |Σ| 3, G-consistent sequences are not necessarily unambiguous, as shown in the following examples.

while two diﬀerent words are consistent with s[0..5], a fact that follows from Lemma 2(i).

One can easily see that the previous ambiguity is related with the presence of more than three 1’s in the sequence s. However here is a word w having four occurrences of 1, but uniquely determined by G as well.

Example. Let Σ = {a, b, c, d} and let s = [1, 1, 1, 3, 1, 3, 5, 7, 9]. There is a

**unique word which is G-consistent with s:**

Reconstructing words from a ﬁxed palindromic length sequence 109

The situation is clearly explained by the following statement Proposition 6 Let s be a G-consistent sequence. If there exist two distinct words w, w′ consistent with s, then there exists i such that Gw (i) = Gw′ (i) = 1 and Hw (i) = 0 or Hw′ (i) = 0.

Proof. Indeed, if s[i] = 1, then w[i] is either a new letter or, a previously encountered letter such that the longest palindromic LPS(w[0..i]) is the letter itself. ⊓ Consider now the same problem for the function H. Since the functions G and H coincide for full words, we have immediately the next result.

Corollary 1 Any full word (thus any Sturmian word) is uniquely determined by the function H.

So, the function Gw encodes all the information on w, but this is no longer true for the function Hw. Indeed, there exist H-consistent sequences that are not unambiguous as shown in the following example: consider the word w = abbabbbabaabb. Then, we have

The proof is similar to that of proof of Proposition 5. As an example, consider the following H-consistent sequence on Σ = {a, b} i 0 1 2 3 4 5 6 7 8 9 10 11 s 121246433006 waabbaababxyz where s1 = [ 1, 2, 1, 2, 4, 6, 4, 3, 3], m = 6, and k = 2. The last three elements of w can be uniquely determined since the factor b a b x y z has to be a palindrome, that is x = z = b, and y = a.

Note that the bound k on the length of a subsequence of 0’s in s given in Proposition 5 does not depend on the cardinality of Σ. On the other hand, observe that if | Σ | = 2 and k = 1 the sequence s is still uniquely determined.

For instance, consider the sequence s[0..12] = [1, 2, 1, 3, 3, 2, 4, 6, 5, 3, 5, 0, 3],

**with Σ = {a, b}, and therefore m = 3:**

i 0 1 2 3 4 5 6 7 8 9 10 11 12 s 1213324653503 waababbabbbaaa Here the cardinality of the alphabet allows only one possible choice of the letter consistent with Hw (11) = 0, consequently m ̸ 2k + 1 = 3, but the word w is uniquely determined as well.

4.1 Inﬁnite words In the case of inﬁnite words, the situation is similar and ambiguous H can also occur. We start by recalling some facts. From [8, 11], we know that, when analyzing the defect and the lacunas of an inﬁnite word, it can present (a) an inﬁnite palindromic complexity with a ﬁnite number of lacunas;

(b) a ﬁnite palindromic complexity with an inﬁnite number of lacunas;

(c) both inﬁnite palindromic complexity and number of lacunas.

In general, in none of the three cases the function H is unambiguous, as it is shown in the following examples.

Case (a): consider two words U and V having the same preﬁx of length 23

and since their period is the product of two palindromes, the palindromic complexity of both u and v is inﬁnite. Finally, an easy check reveals that the suﬃx sequence of the function H, for n 2, is

All the terms after position 22 are equal to 0 since the words U and V are eventually periodic, with a period which is not the product of two palindromes.

Case (c): ﬁnally, consider the words U and V deﬁned as follows (using again

**w = abbabbbabaabb):**

U = w · ab · bbaaa · baab · baabba · (baab)2 · baabba... (baab)n · baabba...

V = w · ba · bbaaa · baab · baabba · (baab)2 · baabba... (baab)n · baabba....

The sequences HU and HV coincide and their ﬁrst terms are

The two sequences have an inﬁnite number of new palindromes since the palindromic factor baab is repeated an increasing number of times at each step. At the same time the set of lacunas is inﬁnite since the factor baabba, which is not the product of two palindromes, occurs inﬁnitely many times.

5 Further work

Consistency. Deciding if a given ﬁnite sequence s of numbers is G-consistent (resp. H-consistent) may be easily achieved. Indeed, let k = |s|1. This implies that the smallest alphabet Σ we have to consider contains at most k letters (exactly k for H-consistency, by virtue of Lemma 2 (i)). Taking an order on the letters of Σ permits to restrict the study to classes of words equivalent under permutations of letters. A close look to the proof of Proposition 5 reveals all

**the information in order to construct sequentially all words consistent with s:**

indeed, it suﬃces to check at each position i, if LPS(w[0..i]) = s[i].

Random and exhaustive generation. The algorithms described above may be used for constructing trees of words. Indeed, at each step i one constructs a trie of words having height i + 1 and satisfying G[i] = s[i + 1] (resp. H[i] = s[i + 1]).

The process stops if either it is impossible to construct the next step, or ends successfully if i = |s|. In case of a successful termination it is easy to check if every non leaf node has a unique son, solving the unambiguity problem of the sequence s. These are the basic tools for constructing randomly or exhaustively many classes of words, for instance all full words of length n.

Enumeration. Counting classes of G-consistent or H-consistent sequences follows naturally. For instance, given a ﬁxed length n, it amounts to count for a ﬁxed alphabet Σ the set { Hw : w ∈ Σ n }. Indeed, a greedy algorithm can be implemented to obtain the ﬁrst values: it suﬃces to generate all words in Σ n, and to compute H for each such word.

The enumeration formula of the ﬁnite Sturmian words is known [27]. Since they are full, a closely related counting problem is that of determining a formula for the number of non-Sturmian full words on the alphabet {a, b}∗. Determining the number of words having a ﬁxed number of lacunas is also challenging.

Characterization of special classes of G or H functions. For instance, given a 2-letter alphabet Σ = {a, b} one might look for a description of the following

**sets of functions:**

** G = {Gw : w ∈ {a, b}∗} H = {Hw : w ∈ {a, b}∗ such that w is full}.**

In another direction it would be interesting to describe inﬁnite words on ﬁxed alphabets whose G (or H) sequence is automatic.

Constrained reconstruction. Given a ﬁnite set of palindromes P, how can we determine the shortest full word containing all the palindromes of P and only those palindromes? The answer is based on Theorem 1 of [11]. Indeed, the language of words having exactly P as palindromic factors is rational. Therefore there exists a deterministic minimal automaton recognizing all these words. For each palindrome q in P, there is a unique path starting from the initial state whose trace is q. Collecting the target states T(P ) of all paths computing P, it suﬃces then to compute the shortest path starting from the initial state and containing all states in T(P ). It may or may not exist, and if it does not, one might relax the conditions by allowing some extra palindromes in order to ﬁnd a solution.

Reconstructing words from a ﬁxed palindromic length sequence 113 Structure of full words. Let w be a ﬁnite full word on a 2-letter alphabet Σ.

One can easily prove that Hw and Hw have the same elements, while Hw = Hw if and only if w = w or w = w. The two sets of longest unioccurrent palindromic suﬃxes of w and w naturally deﬁne a permutation on the set {1, 2,..., |w|}.

More precisely, let p1, p2,..., p|w| be the longest palindromic suﬃxes of w in order of their ﬁrst occurrence in w and let xi be the position of the last occurrence of pi in w. We deﬁne the permutation πw on {1, 2,..., |w|} by

** πw (i) = |w| + |pi | − |xi |.**

Now, let q1, q2,..., q|w| be the longest palindromic suﬃxes of w in order of their ﬁrst occurrence in w. Then pi = qπw (i), for i = 1, 2,..., |w|. We illustrate this fact by an example: let w = ababbabab, so that w = bababbaba. Then we have the following table showing that Hw ̸= Hw, i LPSU(w) a b aba bab bb abba babbab ababbaba babab LPSU(w) b a bab aba babab bb abba babbab ababbaba and the permutation πw is (2, 1, 4, 3, 6, 7, 8, 9, 5).

We would like to study the combinatorial properties of πw in relation with those of the word w. In particular we are interested in characterizing the permutations πw associated with full words. A similar study can also be performed on arbitrary alphabets provide one replaces the ( ) operation by an arbitrary permutation of the alphabet Σ.

Acknowledgements We are grateful to the anonymous referees for their careful reading and useful comments.

References

6. Allouche, J.P., and Shallit, J. (2000) Sums of digits, overlaps, and palindromes, Disc.

Math. and Theoret. Comput. Sci. 4:1–10.

7. Baake, M. (1999) A note on palindromicity, Lett. Math. Phys. 49:217–227.

8. Blondin Mass´, A., Brlek, S., and Labb´, S. (2008) Palindromic lacunas of the Thuee e Morse word, GASCOM 2008 (To appear)

9. Fici, G., Mignosi, F., Restivo, A. and Sciortino, M. (2006) Word assembly through minimal forbidden words,Theoret. Comput. Sci., 359/1-3: 214–230.

10. Brlek, S. (1989) Enumeration of factors in the Thue-Morse word, Disc. Appl. Math.

24:83–96.

11. Brlek, S., Hamel, S., Nivat, M., and Reutenauer, C. (2004) On the Palindromic Complexity of Inﬁnite Words, in J. Berstel, J. Karhum¨ki, D. Perrin, Eds, Combinatorics on a Words with Applications, Int. J. of Found. of Comput. Sci., 15/2:293–306

12. Brlek, S., and Ladouceur, A. (2003) A note on diﬀerentiable palindromes, Theoret. Comput. Sci. 302:167–178.

13. Brlek, S., Jamet, D., and Paquin, G., (2008) Smooth Words on 2-letter alphabets having same parity, Theoret. Comput. Sci. 393/1-3:166181.

14. Brlek, S., Dulucq, S., Ladouceur, A. and Vuillon L. (2006) Combinatorial properties of smooth inﬁnite words, Theoret. Comput. Sci. 352/1-3:306–317.

15. de Bruijn N. G. (1946) A Combinatorial Problem, Koninklijke Nederlandse Akademie v. Wetenschappen 49: 758764.

16. Droubay, X., Justin, J., and Pirillo, G. (2001) Episturmian words and some constructions of de Luca and Rauzy, Theoret. Comput. Sci. 255:539–553.

17. Flye Sainte-Marie, C. (1894). Question 48, L’Intermdiaire Math. 1: 107110.

18. Fredericksen, H. and Maiorana, J. (1978) Necklaces of beads in k colors and k-ary de Bruijn sequences Disc. Math. 23/3, 207–210

19. Good, I. J. (1946) Normal recurring decimals, J. London Math. Soc. 21 (3): 167–169.

20. Hof, A., Knill, O., and Simon, B. (1995) Singular continuous spectrum for palindromic Schr¨dinger operators, Commun. Math. Phys. 174:149–159.

o

21. Lothaire M. (1983) Combinatorics on words, Addison-Wesley.

22. Lothaire, M. (2002) Algebraic Combinatorics on words, Cambridge University Press.

23. de Luca, A. (1997) Sturmian words: structure, combinatorics, and their arithmetics, Theoret. Comput. Sci. 183:45–82.

24. de Luca, A., and Varricchio, S. (1989) Some combinatorial properties of the Thue-Morse sequence, Theoret. Comput. Sci. 63:333–348.

25. Mantaci, S., Restivo, A., Rosone, G. and Sciortino, M. (2008) A New Combinatorial Approach to Sequence ComparisonTheory Comput. Syst. 42/3:411–429.

26. Manvel, B., Meyerowitz, A., Schwenk*, A., Smith, K. and Stockmeyer, P. (1991) Reconstruction of sequences, Disc. Math. 94/3: 209–219.

27. Mignosi, F. (1991) On the number of factors of Sturmian words, Theoret. Comput. Sci.

82/1: 71–84.

28. Morse, M. and Hedlund, G. (1940) Symbolic Dynamics II. Sturmian trajectories, Amer.