# ETOOBUSY ðŸš€ minimal blogging for the impatient

# PWC111 - Search Matrix

**TL;DR**

Here we are with TASK #1 from the Perl Weekly Challenge #111. Enjoy!

# The challenge

You are given 5x5 matrix filled with integers such that each row is sorted from left to right and the first integer of each row is greater than the last integer of the previous row.

Write a script to find a given integer in the matrix using an efficient search algorithm.

Example`Matrix: [ 1, 2, 3, 5, 7 ] [ 9, 11, 15, 19, 20 ] [ 23, 24, 25, 29, 31 ] [ 32, 33, 39, 40, 42 ] [ 45, 47, 48, 49, 50 ] Input: 35 Output: 0 since it is missing in the matrix Input: 39 Output: 1 as it exists in the matrix`

# The questions

A few questions that might pop up in an interview:

- Representation?
- Array of arrays

- Generalization to an $n \times m$ matrix?
- Not necessary but nice to have

- How many searches are we planning to do over the same matrix?
- Surprise usâ€¦

# The solution

The first *mind-reflex* to address this challenge is to implement a form
of binary search. This is possible because the matrix is just a
long, sorted array in disguise, which can be obtained back by lining up
all rows in their order.

Hence, for a matrix holding $N = n \cdot m$ elements (where $n$ is the number of rows and $m$ is the number of columns), we would end up with an asyntotic complexity of $O(log(N))$.

Incidentally, this made me think if this is better instead:

- apply a binary search on the first item of each row and pinpoint one of them, then
- look for the item in the remaining row.

because this would end up having complexity $O(log(n) + log(m))$. Now, if $n = m$, this becomes $O(2 log(n)) = O(log(n))$, that seems better than $O(log(N)) = O(log(n^2))$, right?

Wellâ€¦ wrong. As $log(n^2) = 2 log(n)$, we end up in the same exact
situation. *Well tried, youâ€™ll be luckier the next time, letâ€™s move on*.

OK, letâ€™s see this solution then:

```
sub search_matrix ($M, $x) {
my $n_rows = $M->@* or return 0;
my $n_cols = $M->[0]->@* or return 0;
my ($lo, $hi) = (0, $n_rows * $n_cols - 1);
while ('necessary') {
my $mid = int(($lo + $hi) / 2);
my $v = $M->[$mid / $n_cols][$mid % $n_cols];
return 1 if $v == $x;
return 0 if $lo == $hi;
if ($v < $x) { $lo = ($mid == $lo) ? $lo + 1 : $mid }
else { $hi = $mid }
} ## end while ('necessary')
} ## end sub search_matrix
```

Itâ€™s a basic approach to binary search, with a couple of caveats:

- weâ€™re actually handed over a matrix, so we have to think in terms of
*row*and*column*instead of a single index over an array. For this reason, we*mostly*deal with linear indices`$lo`

,`$hi`

and`$mid`

, just to transform them to a*row*/*column*pair when accessing the matrix`$M`

; - the rounding when calculating
`$mid`

always goes to the lowest number, so we might end up being stuck with`$lo`

one less than`$hi`

. For this reason, if the last checked values from`$M`

is less than the target (i.e. we have to advance`$lo`

) and`$mid`

is equal to`$lo`

, then we move`$lo`

ahead of one place (i.e. make it the same as`$hi`

), otherwise we assign`$mid`

to`$lo`

as in the normal case for binary search; - the issue above does not apply to
`$hi`

, so if the value is greater than the target we just set`$hi`

to be`$mid`

.

Now, suppose that our problem statement is later refined like follows:

Oh, by the way! You might be asked to assess the presence of multiple values in the matrix, generally around half of the elements in the matrix itself, or more.

Well wellâ€¦ this changes the thing a bit, because our overall approach
would not become $O(N \cdot log(N))$, which means thatâ€¦ *we might do
better* (*on average* and at the expense of some memory).

When we have to check for the presence of one item in the matrix
multiple times for multiple candidates, the problem quickly turns into
answering a *somewhat* different question: is this item inside this set?

Now, the best and quickest approximation that we have of a set in Perl is by using a hash, so we can code this:

```
sub matrix_searcher ($M) {
my %is_item = map { map { $_ => 1 } $_->@* } $M->@*;
return sub ($x) { exists $is_item{$x} ? 1 : 0 };
}
#...
my $ms = matrix_searcher(\@matrix);
say $ms->($_) ? "$_ is present" : "$_ is absent" for @targets;
```

Why is this *better on average*? Itâ€™s because insertion and search into
a hash table both have constant cost on average, so:

- the cost of creating the hash in
`%is_item`

in`matrix_searcher`

is $O(N)$, done at the beginning only, and - the cost of looking for all items in
`@target`

is $O(N)$

which gives us an overall complexity ofâ€¦ $O(N)$. Yay!

Should we stick to the simpler one-shot version, this is the full program:

```
#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;
sub search_matrix ($M, $x) {
my $n_rows = $M->@* or return 0;
my $n_cols = $M->[0]->@* or return 0;
my ($lo, $hi) = (0, $n_rows * $n_cols - 1);
while ('necessary') {
my $mid = int(($lo + $hi) / 2);
my $v = $M->[$mid / $n_cols][$mid % $n_cols];
return 1 if $v == $x;
return 0 if $lo == $hi;
if ($v < $x) { $lo = ($mid == $lo) ? $lo + 1 : $mid }
else { $hi = $mid }
} ## end while ('necessary')
} ## end sub search_matrix
my @matrix = (
[1, 2, 3, 5, 7],
[9, 11, 15, 19, 20],
[23, 24, 25, 29, 31],
[32, 33, 39, 40, 42],
[45, 47, 48, 49, 50],
);
my $target = shift || 35;
say search_matrix(\@matrix, $target)
? "$target is present"
: "$target is absent";
```

Last, all these considerations apply to the generalized $n \times m$ case. Is it reasonable?

Well, arguablyâ€¦ no. The challenge is clearly talking about a $5 \times
5$ matrix, so why bother with bigger ones? At this point, I guess, the
*hash-based* solution is still going to win if we have to check against
*a lot* of possible candidates, but Iâ€™m not 100% sure that the binary
search is still better than a simple search. I mean, there are a lot of
operations involved and they might take a noticeable toll.

Hence, the *right thing* at this point would be to make a
Benchmarkâ€¦ but Iâ€™ve inflicted enough wan**AHEM**speculations on
you poor readers and Iâ€™ll stop here. Stay safe!