The challenge

You are given two strings $S and $T. Write a script to find out count of different unique subsequences matching $T without changing the position of characters.

The questions

I have to admit that I didn’t understand the wording in the first place, thinking that a “subsequence” had somehow to be formed by consecutive letters. The examples, though, make it pretty clear that this is not the case.

The solution

I coded two different solutions to this challenge because it gave me two itches to scratch.

On the one hand, I immediately thought of a recursive implementation that should work and be reasonably easy and readable. So… why not?

On the other hand, there’s matching involved, and the TASK #1 was about matching too, with a few substitutions. So… why not?

The (explicitly) recursive solution

The following function implements the recursive solution, at least the explicit one:

sub unique_subsequence ($S, $T) {
   my $lenT = length $T or return 1;
   my $lenS = length $S or return 0;
   my $first = substr $T, 0, 1, '';
   --$lenT;
   my $s = 0; # sum
   my $p = 0; # search start position
   while (($p < $lenS) && (my $i = index $S, $first, $p) >= $p) {
      $s += unique_subsequence(substr($S, $i), $T) if $lenS - $i >= $lenT;
      $p = $i + 1;
   }
   return $s;
}

As any good recursive function, the first lines are devoted to checking the corner cases where the recursion should be stopped and a decision taken.

If we get an empty $T, it means that we were able to match all characters in the initial $T, so it’s a match - return 1.

Otherwise, if we get an empty $S… we didn’t manage to match the whole pattern $T, so it’s a failure - return 0.

At this point, we concentrate on the first character in $T, removing it. This is the next character we’re off to look for in $S; actually, we will look for all of its instances in $S, which is why the search (using index) is in a while loop.

For each one we find, we potentially recurse and accumulate the result. Here, potentially means that if we already know that we don’t have enough characters, it makes no sense to recurse.

At each loop we also advance our starting position for searching via index, i.e. $p.

At the end of the loop, $s holds all successes we got, so we can return them up.

The regular expression solution

If $T is abc, matching it actually means matching a.*b.*c in regular expressions terms. So why not use it?

Well… regular expressions are normally used to establish if one match exist, not to count how many of them are there. If only existed a way to count them all…

Well, it seems that there is a way.

Which leads us straight to our solution:

sub unique_subsequence_rx ($S, $T) {
   $T = join '.*', split m{}mxs, $T;
   my $count = 0;
   1 while $S =~ m{$T(?{++$count})(?!)};
   return $count;
}

The basic trick here is to leverage a match with a (?{}) block, which executes some Perl code (in our case, incrementing the counter) and then immediately fail with a (?!), forcing the regular expressions engine to try another attept at matching, until it will have none left.

Neat!

The whole thing

Here’s the whole script, should you be curious:

#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;

sub unique_subsequence ($S, $T) {
   my $lenT = length $T or return 1;
   my $lenS = length $S or return 0;
   my $first = substr $T, 0, 1, '';
   --$lenT;
   my $s = 0; # sum
   my $p = 0; # search start position
   while (($p < $lenS) && (my $i = index $S, $first, $p) >= $p) {
      $s += unique_subsequence(substr($S, $i), $T) if $lenS - $i >= $lenT;
      $p = $i + 1;
   }
   return $s;
}

sub unique_subsequence_rx ($S, $T) {
   $T = join '.*', split m{}mxs, $T;
   my $count = 0;
   1 while $S =~ m{$T(?{++$count})(?!)};
   return $count;
}

my $string = shift // 'littleit';
my $subsequence = shift // 'lit';
say unique_subsequence($string, $subsequence);
say unique_subsequence_rx($string, $subsequence);

Stay safe!

ETOOBUSY 🚀 minimal blogging for the impatient

PWC099 - Unique Subsequence