« Abraham Minkowski Controversy Grin PS structures »

Perl Weekly Challenge 96.

My solutions (task 1 and task 2) to the The Weekly Challenge - 096.

Task 1: Reverse Words

Submitted by: Mohammad S Anwar You are given a string $S.

Write a script to reverse the order of words in the given string. The string may contain leading/trailing spaces. The string may have more than one space between words in the string. Print the result without leading/trailing spaces and there should be only one space between words. Example 1:

Input: $S = "The Weekly Challenge"
Output: "Challenge Weekly The"
Example 2:
Input: $S = "    Perl and   Raku are  part of the same family  "
Output: "family same the of part are Raku and Perl"

This seems a very simple task using the Perl’s reverse function. Then, the task may be performed in a oneliner.

perl -E 'say "\"", join(" ", reverse split " ", $_),"\"" foreach(@ARGV)' \
"The Weekly Challenge" \
"    Perl and   Raku are  part of the same family  "

Results:

"Challenge Weekly The"
"family same the of part are Raku and Perl"

Here I used that leading spaces are removed when splitting on a whitespace character. Nevertheless, I guess not only spaces, but all punctuation (non-words) ought to be removed. Thus I guess it is better to split on /\W+/.

perl -E 'say "\"", join(" ", reverse split /\W+/, $_),"\"" foreach(@ARGV)' \
"The Weekly Challenge" \
"    Perl and   Raku are  part of the same family  "

Results:

"Challenge Weekly The"
"family same the of part are Raku and Perl "

Unfortunately, splitting this way produces a null first element when the string starts with whitespace. That is the reason the last result above ended in a space. Thus, I filter out null elements.

perl -E 'say "\"", join(" ", grep $_, reverse split /\W+/, $_),"\"" foreach(@ARGV)' \
"The Weekly Challenge" \
"    Perl and   Raku are  part of the same family  "

Results:

"Challenge Weekly The"
"family same the of part are Raku and Perl"

Thus the complete program would be

# Perl weekly challenge 096
# Task 1: Reverse words
# Print words in reverse order separated by one space
# Remove all non-word characters
# See https://wlmb.github.io/2021/01/18/PWC096/#task-1-reverse-words

use warnings;
use strict;
use v5.10;

say "\"", join(" ", grep $_, reverse split /\W+/, $_),"\"" foreach(@ARGV);

I try it out with a few examples:

./ch-1.pl \
         "The Weekly Challenge" \
         "    Perl and   Raku are  part of the same family  " \
	 " What happens with punctuation signs? "

Results:

"Challenge Weekly The"
"family same the of part are Raku and Perl"
"signs puntuation with happens What"

Task 2: Edit Distance

Submitted by: Mohammad S Anwar You are given two strings $S1 and $S2.

Write a script to find out the minimum operations required to convert $S1 into $S2. The operations can be insert, remove or replace a character. Please check out Wikipedia page for more information.

Example 1:

Input: $S1 = "kitten"; $S2 = "sitting"
Output: 3

Operation 1: replace 'k' with 's'
Operation 2: replace 'e' with 'i'
Operation 3: insert 'g' at the end

Example 2:

Input: $S1 = "sunday"; $S2 = "monday"
Output: 2

Operation 1: replace 's' with 'm'
Operation 2: replace 'u' with 'o'

This is a very interesting and useful task. As such, there are several modules in CPAN that perform it. For example, Text::Fuzzy returns the distance and the edits encoded as a string with the letters k for keep, d for delete, r for replace and i for insert. With it, the task becomes a one liner.

perl -MText::Fuzzy=distance_edits -E '($x,$y)=@ARGV; ($distance,$edits)=distance_edits($x,$y);' \
     -E 'say "Strings: $x $y\nOutput: $distance\nOperations: $edits"' kitten sitting
perl -MText::Fuzzy=distance_edits -E '($x,$y)=@ARGV; ($distance,$edits)=distance_edits($x,$y);' \
     -E 'say "Strings: $x $y\nOutput: $distance\nOperations: $edits"' sunday monday

Results:

Strings: kitten sitting
Output: 3
Operations: rkkkrki
Strings: sunday monday
Output: 2
Operations: rrkkkk

Nevertheless, it is a nice exercise to implement the algorithm from scratch. I start by reading the strings from @ARGV, splitting them and counting their characters.

# Perl weekly challenge 096
# Task 2: Edit distance.
# Calculate the number of editions required to edit a string and convert it into another.
# See https://wlmb.github.io/2021/01/11/PWC096/#task-2-edit-distance

  use warnings;
  use strict;
  use v5.10;

  # Get the two strings to compare from @ARGV
  sub usage {
      say "Usage:\n\t./ch-2.pl from_string to_string\n\tComputes the cost to edit one string and produce the other";
      exit 1;
  }
  usage() unless @ARGV==2;
  my ($from, $to)=@ARGV;
  my @from=split '', $from;
  my @to=split '', $to;
  my $from_size=@from;
  my $to_size=@to;

I follow the Wagner–Fischer algorithm from the Wikipedia. To that end I build a 2D array $costs each of whose entries ij correspond to the optimal conversion of the leading i characters of one string to the leading j characters of the other. I also build a 2D array $operations with an indicator for the last optimal operation performed to convert the corresponding substrings. I first initialize these arrays with the $costs->[$i][0]~ to convert substrings of $from to a null string, and the $costs->[0][$j] to convert a null string to substrings of $to.

my $costs;
my $operations;
# initialize the costs of $from->'' and ''->$to
$costs->[$_][0]=$_ foreach(0..$from_size); #deletions
$operations->[$_][0]='d' foreach(1..$from_size);
$costs->[0][$_]=$_ foreach(0..$to_size); #insertions
$operations->[0][$_]='i' foreach(1..$to_size);

Then I iteratively choose the optimal strategy for converting a substring of $from to a substring of $to

#Build costs matrix and choose best operations for each pair of substrings.
foreach my $j(1..$to_size){
    foreach my $i(1..$from_size){
	my $subcost=$from[$i-1] eq $to[$j-1]?0:1; # substitution cost
	my($cost,$operation);
	($cost,$operation)=($costs->[$i-1][$j-1]+$subcost,
				  $subcost?"r":"k"); #for keep or substitute
	($cost,$operation)=($costs->[$i][$j-1]+1, "i")
	    if $costs->[$i][$j-1]+1 < $cost; #insertion
	($cost,$operation)=($costs->[$i-1][$j]+1, "d")
	    if $costs->[$i-1][$j]+1 < $cost; #deletion
	$costs->[$i][$j]=$cost;
	$operations->[$i][$j]=$operation;
    }
}

The next step is to find the total cost and the sequence of operations to perform by following the $operations array backwards using the fact that replacements don’t alter the size of strings, while insertions and deletions change it by plus or minus 1.

my $total_cost=$costs->[$from_size][$to_size];
my @operations;
my ($i, $j)=($from_size, $to_size);
# Obtain optimum sequence of operations by examining the $operations array
while($i>0 || $j>0){
    my $operation=$operations->[$i][$j];
    if($operation eq 'k'){
	unshift @operations, "(Keep $from[$i-1])";
	--$i;
	--$j;
	next;
    }
    if($operation eq 'r'){
	unshift @operations, "Replace $from[$i-1] by $to[$j-1]";
	--$i;
	--$j;
	next;
    }
    if($operation eq 'i'){
	unshift @operations, "Insert $to[$j-1]";
	--$j;
	next;
    }
	if($operation eq 'd'){
	unshift @operations, "Delete $from[$i-1]";
	--$i;
	next;
    }
    die "Wrong operation!"; # Shouldn't happen
}

Finally, I output the results.

say "Inputs: \"$from\" -> \"$to\"\nOutput: $total_cost\n";
say "Operation $_: $operations[$_-1]" foreach 1..@operations;
say;

Example 1:

./ch-2.pl "kitten" "sitting"

Results:

Inputs: "kitten" -> "sitting"
Output: 3

Operation 1: Replace k by s
Operation 2: (Keep i)
Operation 3: (Keep t)
Operation 4: (Keep t)
Operation 5: Replace e by i
Operation 6: (Keep n)
Operation 7: Insert g

Example 2:

./ch-2.pl "sunday" "monday"

Results:

Inputs: "sunday" -> "monday"
Output: 2

Operation 1: Replace s by m
Operation 2: Replace u by o
Operation 3: (Keep n)
Operation 4: (Keep d)
Operation 5: (Keep a)
Operation 6: (Keep y)

Other examples to check on null strings

./ch-2.pl "" ""
./ch-2.pl "abc" ""
./ch-2.pl "" "abc"

Results:

Inputs: "" -> ""
Output: 0


Inputs: "abc" -> ""
Output: 3

Operation 1: Delete a
Operation 2: Delete b
Operation 3: Delete c

Inputs: "" -> "abc"
Output: 3

Operation 1: Insert a
Operation 2: Insert b
Operation 3: Insert c

Written on January 18, 2021