Removing loops from a path

TL;DR

An interesting and, after all, straightforward algorithm to remove loops in a path inside a graph.

Reading the Wikipedia page on algorithmic maze generation, I stumbled upon the concept of a Loop-erased random walk (or LERW), which are at the base of one of the algorithms.

While the random walk is surely very important for the algorithm (which we will discuss in due time), the loop erasure part intrigued me and here we are.

A data structure for the path

Our path will be defined as an ordered, finite sequence of nodes, starting from a source node S and ending in a target node T. In practical terms, we can think of an array of node identifiers (in the following we will “confuse” these identifiers with the nodes themselves).

This path may have loops or not. In our representation, it’s alway possible to detect a loop in the path, because at least one node identifier will be repeated two or more times. If all entries in our array contain a different identifier, the path is loop-free.

Erasing loops

Erasing loops in our representation means producing a new array that is loop-free, i.e. where all items are different from one another and, of course, appear in the same “connected” order as the starting path.

Any sub-path that starts with a node identifier X and ends in a node identifier X is a loop. Hence, all sub-paths marked with arrows the following picture are also loops:

... X ...  X ...  X ...  X ...     
    |      |      |      |
    |----->|----->|----->|
    | loop | loop | loop |
    |             |      |
    |------------>|      |
    | a loop too         |
    |                    |
    |------------------->|
       and also this...

The longest sub-path that starts at the first occurrence of X and ends at the last occurrence of X and is a loop itself:

      +-- first occurrence of X     +-- last occurrence of X
      |                             |
      v                             v
... w X     ...     X     ...     v X y ...     
      |                             |
      |---------------------------->|
        longest loop crossing at X

We can remove from the first X up to the element immediately before the last one:

      +-- first occurrence of X     +-- last occurrence of X
      |                             |
      v                             v
... w X     ...     X     ...     v X y ...     
      |                           |
      |-------------------------->|
            removed elements
\_________________________________________/

                   |
                   |
                   V
 _________________________________________
/                                         \
... w                               X y ...
                                    ^
                                    |
                                    +-- only occurrence of X left

This guarantees:

  • the elimination of all loops crossing at X, because we’re only leaving one single X;
  • that the output path is unbroken, because w is still followed by X and X is still followed by y as in the original path.

Repeating this until each identifier only appears once will eventually guarantee that the final path is loop-free.

Sweeping the path from the beginning and erasing loops as described above is said to follow a chronological order. This terminology stems from the fact that loop erasure is often associated to random walks, in which the initial path (potentially containing loops) is generated by doing a random, step-wise visit of the graph (so there’s an association between the position of an identifier in the sequence and the time it was inserted).

Pseudo-code time!

Let’s see the pseudo-code for the loop-erasure algorithm:

 1 # input_path  is a sequence of node identifiers, 0-based
 2 # output_path is a (loop-free) sequence of node identifiers, 0-based
 3 # i, j, N are integers
 4
 5 output_path = ()
 6 N = size of input_path
 7 i = 0
 8 while i < N
 9    j = i + 1
10    while j < N
11       if input_path[i] is the same as input_path[j]
12          i = j
13       j = j + 1
14    output_path = (output_path, input_path[i])
15    i = i + 1

The input_path is sweeped from the beginning, always going on until it hits the last element. This is done using the integer indexing variable i, that starts from 0 (first index inside input_path) up to one less than N (number of elements in input_path).

When we are at position i, we are pointing at identifier X of the previous section. To remove loops crossing at X, we have to find the latest occurrence of X, which is done in the while in lines 9 to 13. At the end of this loop, i will be the index of the last occurrence of X in the input_path sequence.

As we saw in the previous section, we can at this point add this last element to the output_path (line 14) and move on to look for other loops starting from the following position (line 15, increasing i by 1).

It’s easy to see that moving incrementally from the beginning of the input_sequence guarantees that the identifier input_path[i] in lines 9 to 13 is not already contained in output_path, thus ensuring that there will be no residual loop at the end of the main while at lines 8 to 15.

An example implementation in Perl

The following Perl subroutine implements the algorithm:

# compute the loop-free path from $input_path
# return as anonymous array
sub path_loop_erasure ($input_path) {
    my @output_path;
    my $i = -1;
    my $N = @$input_path;
    while (++$i < $N) {

        # find latest occurrence of $input_path->[$i]
        my $j = $i;
        while (++$j < $N) {
            # "advance" $i if the corresponding item is found
            # later in the array
            $i = $j if $input_path->[$i] eq $input_path->[$j];
        }

        # whatever, this item fits into the output
        push @output_path, $input_path->[$i];
    }
    return \@output_path;
}

This implementation assumes that identifiers are strings (we’re using eq), you might want to adapt it to your needs. It is mostly a direct translation of the pseudo-code in the previous section, apart for a couple stylistic differences (mainly in how indexes are initialized and incremented).

Time’s up

The path loop-erasure algorithm can be a bit daunting when you read it in its full abstract, formal shape, but it starts to make a lot of sense as soon as you jot down a couple examples to understand what does it mean for a path to have loops. I hope the explanation above is clear, otherwise please comment!

I have to confess that this post is actually a preparation for A RANDOM Maze with Curses - providing algorithmic generation of mazes and extending the previous post A Maze with Curses. Don’t miss them!


Comments? Octodon, , GitHub, Reddit, or drop me a line!