# ETOOBUSY đźš€ minimal blogging for the impatient

# PWC081 - Common Base String

**TL;DR**

Itâ€™s Perl Weekly Challenge time again, now on issue #081 TASK #1.

I feel like the challenges can vary *a lot* in difficulty. I just donâ€™t
know if itâ€™s me (like: I complicate my life with some solutions) or they
are indeed of very different difficulty.

# The challenge

You are given 2 strings,

`$A`

and`$B`

. Write a script to find out common base strings in`$A`

and`$B`

. A substring of a string`$S`

is called base string if repeated concatenation of the substring results in the string.

# The questions

The examples that follow the challenge make it pretty clear that the
challenge requires *every* possible *base string*, not just the shortest
one. So my first question would beâ€¦ really? Iâ€™d ask because this makes
this challenge seem like itâ€™s *two* challenges. But I digress.

Other than that, Iâ€™d only be curious whether there is a specific order that the solutions should be provided back. It seems to be â€śby lengthâ€ť from one example, but it might just as well be random.

Itâ€™s fair to assume that the two strings are already in memory, so that should not be an issue. It might be worth to ask if itâ€™s OK to create sub-copies of the strings or, for example, if theyâ€™re too long and itâ€™s better to avoid.

# The solution

The first instict is to go with the brute force. At leastâ€¦ most of the times, itâ€™s surviving. Get the job done, then evaluate. Which means that this comes to mind:

```
1 sub common_bases_brute_force ($A, $B) {
2 my ($lA, $lB) = (length($A), length($B));
3 ($A, $B, $lA, $lB) = ($B, $A, $lB, $lA) if $lA > $lB;
4 my @retval;
5 CANDIDATE:
6 for my $l (1 .. int($lA / 2), $lA) {
7 next CANDIDATE if ($lA % $l) || ($lB % $l);
8 my $base = substr $A, 0, $l;
9 for my $s ($A, $B) {
10 next CANDIDATE if $s ne $base x (length($s) / $l);
11 }
12 push @retval, $base;
13 }
14 return @retval;
15 }
```

The gist is: iterate over all possible base strings (i.e. all possible lengths for a base string) and check them against both strings.

Then you have a few refinements:

- inputs are swapped to make
`$A`

be shorter, or at most the same length as`$B`

(line 3). This is because it makes sense to look for bases for the shorter ones in the first place; - we rule out all lengths that are more than one half of the shortest string, except for the whole string. This is the sense of the values for the iteration in line 6, which goes through all possible base string lengths;
- if the length of either string is not a multiple of the candidate base
`$l`

we donâ€™t even bother checking (line 7); - if any of the two string is
*not*generated by the candidate base we skip to the next candidate (line 10)

If all checks are fine, we get the base and move on. Yay!

Then, of course, the lurking monster gets into the stage. Is this really needed? Isnâ€™t there a better, more efficient way?

The answer isâ€¦ depends. If the input strings are short and the test is performed rarely, who cares? The solution above is so easy that anything else would be a waste of precious programmer cycles. Itâ€™s even so easy that it documents itself (more or less).

But, for sake of discussionâ€¦ letâ€™s say we want to complicate our lives. Then we could think something like this:

- first, find the
*minimal*base string for both; - if these two differ, donâ€™t bother moving forward;
- otherwise, find all possible repetitions of this minimal common base that would fit both strings. This can be done without string comparisons, only by playing with numbers.

Fact is that I *suspect* this might be more efficient, but I didnâ€™t
bother doing any benchmark! Anyway, letâ€™s look at it.

```
1 sub proper_factors ($n) { grep { $n % $_ == 0} (2 .. int($n/2))}
2
3 sub min_period ($string) {
4 my $n = length $string;
5
6 CANDIDATE:
7 for my $k (1, proper_factors($n)) {
8 my $m = $n / $k; # sub-sequences we have to test
9 for my $i (0 .. $k - 1) { # sub-sequence iterator
10 my $char = substr $string, $i, 1;
11 for my $s (1 .. $m - 1) { # sequence iterator
12 next CANDIDATE if $char ne substr $string, $k * $s + $i, 1;
13 }
14 }
15 # yay!
16 return $k;
17 }
18
19 # nothing found, minimum period is the string's length
20 return $n;
21 }
```

We start with a function to find the minimal base string, which corresponds to finiding the minimum period of a string. This is quite similar to the previous brute force attack:

- we iterate over
`1`

or*proper factors*only (line 7); - we do a check that the length indeed yields a base string (lines 9 to 14);
- we return the length of the whole string if nothing better was found in the iteration (line 20).

This function `min_period`

can now be used to find (if any) the minimal
common base:

```
1 sub min_common_base ($A, $B) {
2 my $pA = min_period($A);
3 my $pB = min_period($B);
4 return if $pB != $pA; # they must be equal
5 my $candidate = substr $A, 0, $pA;
6 return $candidate if $candidate eq substr $B, 0, $pB;
7 return;
8 }
```

This is just a fancy way to say: these two minimum bases *must* be equal
or thereâ€™s no deal.

Now weâ€™re ready for the higher level function:

```
1 sub common_bases ($A, $B) {
2 defined(my $b = min_common_base($A, $B)) or return;
3
4 my $l = length $b;
5 my ($rA, $rB) = map {length($_) / $l} ($A, $B);
6 ($rA, $rB) = ($rB, $rA) if $rA > $rB;
7
8 return map { $rB % $_ ? () : $b x $_ } (1, proper_factors($rA), $rA);
9 }
```

If the two strings have no minimal common base, no need to search further (line 2).

If they *do* have the same minimal common string, though, we iterate
over all possible arrangement that would fit the shorter one and check
them against the longer one. This can be done by simply looking at the
*lengths* of the strings, because we know that any base string must fit
into the generated string an integral number of times.

So we first find the number of *repetitions* that fit the minimal common
base into both `$A`

and `$B`

(line 5), then we make sure that we work on
the shorter one (line 6, swaps the two repetition numbers to make sure
that `$rA`

is the lower of the two).

Line 8 is an iteration over all possible candidates according to
`$A`

/`$rA`

(i.e. all factors of `$rA`

, both proper and improper) and
check if they would fit also `$B`

(by a check of divisibility).

For very long stringsâ€¦ this *hopefully* goes faster than the brute
force attack!

If you want to play with the whole scriptâ€¦

```
#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;
sub proper_factors ($n) { grep { $n % $_ == 0} (2 .. int($n/2))}
sub min_period ($string) {
my $n = length $string;
CANDIDATE:
for my $k (1, proper_factors($n)) {
my $m = $n / $k; # sub-sequences we have to test
for my $i (0 .. $k - 1) { # sub-sequence iterator
my $char = substr $string, $i, 1;
for my $s (1 .. $m - 1) { # sequence iterator
next CANDIDATE if $char ne substr $string, $k * $s + $i, 1;
}
}
# yay!
return $k;
}
# nothing found, minimum period is the string's length
return $n;
}
sub min_common_base ($A, $B) {
my $pA = min_period($A);
my $pB = min_period($B);
return if $pB != $pA; # they must be equal
my $candidate = substr $A, 0, $pA;
return $candidate if $candidate eq substr $B, 0, $pB;
return;
}
sub common_bases ($A, $B) {
defined(my $b = min_common_base($A, $B)) or return;
my $l = length $b;
my ($rA, $rB) = map {length($_) / $l} ($A, $B);
($rA, $rB) = ($rB, $rA) if $rA > $rB;
return map { $rB % $_ ? () : $b x $_ } (1, proper_factors($rA), $rA);
}
sub common_bases_brute_force ($A, $B) {
my ($lA, $lB) = (length($A), length($B));
($A, $B, $lA, $lB) = ($B, $A, $lB, $lA) if $lA > $lB;
my @retval;
CANDIDATE:
for my $l (1 .. int($lA / 2), $lA) {
next CANDIDATE if ($lA % $l) || ($lB % $l);
my $base = substr $A, 0, $l;
for my $s ($A, $B) {
next CANDIDATE if $s ne $base x (length($s) / $l);
}
push @retval, $base;
}
return @retval;
}
for my $input (
['abcdabcd', 'abcdabcdabcdabcd'],
['aaa', 'aa'],
){
say '(', join(', ', map {qq{"$_"}} common_bases($input->@*)), ')';
}
```

I guess this is it!