ETOOBUSY đźš€ minimal blogging for the impatient
Removing loops from a path
TL;DR
An interesting and, after all, straightforward algorithm to remove loops in a path inside a graph.
Reading the Wikipedia page on algorithmic maze generation, I stumbled upon the concept of a Looperased random walk (or LERW), which are at the base of one of the algorithms.
While the random walk is surely very important for the algorithm (which we will discuss in due time), the loop erasure part intrigued me and here we are.
A data structure for the path
Our path will be defined as an ordered, finite sequence of nodes,
starting from a source node S
and ending in a target node T
.
In practical terms, we can think of an array of node identifiers (in
the following we will â€śconfuseâ€ť these identifiers with the nodes
themselves).
This path may have loops or not. In our representation, itâ€™s alway possible to detect a loop in the path, because at least one node identifier will be repeated two or more times. If all entries in our array contain a different identifier, the path is loopfree.
Erasing loops
Erasing loops in our representation means producing a new array that is loopfree, i.e. where all items are different from one another and, of course, appear in the same â€śconnectedâ€ť order as the starting path.
Any subpath that starts with a node identifier X
and ends in a node
identifier X
is a loop. Hence, all subpaths marked with arrows the
following picture are also loops:
... X ... X ... X ... X ...
   
>>>
 loop  loop  loop 
  
> 
 a loop too 
 
>
and also this...
The longest subpath that starts at the first occurrence of X
and ends
at the last occurrence of X
and is a loop itself:
+ first occurrence of X + last occurrence of X
 
v v
... w X ... X ... v X y ...
 
>
longest loop crossing at X
We can remove from the first X
up to the element immediately before the
last one:
+ first occurrence of X + last occurrence of X
 
v v
... w X ... X ... v X y ...
 
>
removed elements
\_________________________________________/


V
_________________________________________
/ \
... w X y ...
^

+ only occurrence of X left
This guarantees:
 the elimination of all loops crossing at
X
, because weâ€™re only leaving one singleX
;  that the output path is unbroken, because
w
is still followed byX
andX
is still followed byy
as in the original path.
Repeating this until each identifier only appears once will eventually guarantee that the final path is loopfree.
Sweeping the path from the beginning and erasing loops as described above is said to follow a chronological order. This terminology stems from the fact that loop erasure is often associated to random walks, in which the initial path (potentially containing loops) is generated by doing a random, stepwise visit of the graph (so thereâ€™s an association between the position of an identifier in the sequence and the time it was inserted).
Pseudocode time!
Letâ€™s see the pseudocode for the looperasure algorithm:
1 # input_path is a sequence of node identifiers, 0based
2 # output_path is a (loopfree) sequence of node identifiers, 0based
3 # i, j, N are integers
4
5 output_path = ()
6 N = size of input_path
7 i = 0
8 while i < N
9 j = i + 1
10 while j < N
11 if input_path[i] is the same as input_path[j]
12 i = j
13 j = j + 1
14 output_path = (output_path, input_path[i])
15 i = i + 1
The input_path
is sweeped from the beginning, always going on until it
hits the last element. This is done using the integer indexing variable i
,
that starts from 0
(first index inside input_path
) up to one less than
N
(number of elements in input_path
).
When we are at position i
, we are pointing at identifier X
of the
previous section. To remove loops crossing at X
, we have to find the
latest occurrence of X
, which is done in the while
in lines 9 to 13. At
the end of this loop, i
will be the index of the last occurrence of X
in
the input_path
sequence.
As we saw in the previous section, we can at this point add this last
element to the output_path
(line 14) and move on to look for other loops
starting from the following position (line 15, increasing i
by 1).
Itâ€™s easy to see that moving incrementally from the beginning of the
input_sequence
guarantees that the identifier input_path[i]
in lines 9
to 13 is not already contained in output_path
, thus ensuring that
there will be no residual loop at the end of the main while
at lines 8 to
15.
An example implementation in Perl
The following Perl subroutine implements the algorithm:
# compute the loopfree path from $input_path
# return as anonymous array
sub path_loop_erasure ($input_path) {
my @output_path;
my $i = 1;
my $N = @$input_path;
while (++$i < $N) {
# find latest occurrence of $input_path>[$i]
my $j = $i;
while (++$j < $N) {
# "advance" $i if the corresponding item is found
# later in the array
$i = $j if $input_path>[$i] eq $input_path>[$j];
}
# whatever, this item fits into the output
push @output_path, $input_path>[$i];
}
return \@output_path;
}
This implementation assumes that identifiers are strings (weâ€™re using eq
),
you might want to adapt it to your needs. It is mostly a direct translation
of the pseudocode in the previous section, apart for a couple stylistic
differences (mainly in how indexes are initialized and incremented).
Timeâ€™s up
The path looperasure algorithm can be a bit daunting when you read it in its full abstract, formal shape, but it starts to make a lot of sense as soon as you jot down a couple examples to understand what does it mean for a path to have loops. I hope the explanation above is clear, otherwise please comment!
I have to confess that this post is actually a preparation for A RANDOM Maze with Curses  providing algorithmic generation of mazes and extending the previous post A Maze with Curses. Donâ€™t miss them!