[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

Snobol Patterns in Prolog II: Span with Count

In Arbno, the Cursor, and Snobol Patterns in Prolog, I showed how to implement Snobol pattern matching with Prolog Definite Clause Grammars (DCGs), listing the Prolog code for "any", "span", the "$" immediate-assignment operator, and two attempts at "arbno". Here is another pattern I've found useful, a "span" variant that also counts the characters matched. And I'd like to ask: what higher-order abstraction would you use to implement both this and the normal "span"?

The code for my pattern, which I've named "span_count", is a simple variation on the "span" that I listed in Arbno, the Cursor, and Snobol Patterns in Prolog, but some may find it useful. First, here are some calls to it, in SWI-Prolog:

3 ?- phrase( span_count("abc",Len), "abccba", Rest ).
Len = 6,
Rest = [].

4 ?- phrase( span_count("abc",Len,Matched), "abccba", Rest ).
Len = 6,
Matched = [97, 98, 99, 99, 98, 97],
Rest = [].

5 ?- phrase( span_count("abcXYZ",Len,Matched), "abccba", Rest ).
Len = 6,
Matched = [97, 98, 99, 99, 98, 97],
Rest = [].

6 ?- phrase( span_count("abcXYZ",Len,Matched), "ab", Rest ).
Len = 2,
Matched = [97, 98],
Rest = [].

In these calls, I'm using the built-in phrase/3 predicate to invoke "span_count". Its second argument is the complete string to be matched — what Snobol calls the subject. For example "abccba". Its third argument, Rest, is what remains unmatched.

The first argument of span_count represents a set of characters S. The third argument, if present, is the substring it matches. This is the longest non-empty initial subsequence of the subject composed only of elements of S. And span_count's second argument, Len, is the length of this subsequence.

Here is my definition of span_count. There's nothing particularly special about it, though you may wish to note that it calls an auxiliary predicate, span_count/4, which uses accumulator arguments LenIn and LenOut for the length:

span_count( Chars, Len ) -->
  span_count( Chars, Len, _Matched ).


span_count( Chars, Len, Matched ) -->
  span_count( Chars, 0, Len, Matched ).


span_count( Chars, LenIn, LenOut, [ C | RestMatched ] ) -->
  any( Chars, [C] ),
  { LenNext is LenIn + 1 },
  span_count_tail( Chars, LenNext, LenOut, RestMatched ).


span_count_tail( Chars, LenIn, LenOut, Matched ) -->
  span_count( Chars, LenIn, LenOut, Matched ), !.

span_count_tail( _Chars, Len, Len, [] ) -->
  [].

The first clause of span_count/4 calls a simple pattern named "any". This imitates the Snobol pattern function "any", which returns a pattern that matches any character from a set passed as argument. Here are some calls to my Prolog "any":

7 ?- phrase( any("abc",Matched), "ab", Rest ).
Matched = [97],
Rest = [98].

8 ?- phrase( any("abc",Matched), "cX", Rest ).
Matched = [99],
Rest = [88].

9 ?- phrase( any("abc",Matched), "dX", Rest ).
false.

And this is "any"'s code:

any( Chars ) -->
  any( Chars, _Matched ).


any( Chars, [C] ) -->
  [ C ],
  { memberchk( C, Chars ) }, !.

If you look back at my essay Arbno, the Cursor, and Snobol Patterns in Prolog, you will see that span_count is, as I said earlier, a simple variation on the "span" I showed there. So is there a single abstraction with which I could implement both? Specifically, is there a higher-order predicate — one that takes another predicate as argument — with which I could implement both? That's what I'll talk about next.