The spelling rules



The final problem in morphological generation is what to do once we have a list of morphemes. So far, I have ignored the question of spelling rules. Despite the irregularities of English spelling, some regularities can easily be stated. One is that an e is inserted before the s when pluralising some nouns, or when making some verbs into their third-person singular form. Thus box/boxes; go/goes.

It would complicate the lexicon unduly if this had to be dealt with in the lexical entries, e.g. by having both an s and an es plural suffix. Instead, when analysing a word, the MA works on a form which has been ``cleaned up'' by certain spelling rules. The file d-sp contains the rules used with our lexicon. One of these, also listed in section 6.2 of [MA], is ``epenthesis'', which describes the spelling change above. The MA applies this rule when it tries to segment a word into morphemes. So, given boxes, it (in effect) finds itself trying to segment boxs, which it can do by looking up the standard suffix s.
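To make the analysis direction concrete, here is a minimal Python sketch of undoing epenthesis before segmentation. It is not the rule from d-sp and does not use the MA's rule notation: the function name and the regular expression are my own illustrative assumptions, and the pattern deliberately overgenerates, since the candidates would still have to be checked against the lexicon.

```python
import re

# Toy stand-in for an epenthesis rule. ASSUMPTION: this is not the rule
# in d-sp. It treats an "e" as epenthetic when it sits between a
# sibilant-like ending (s, x, z, ch, sh) or "o" and a word-final "s".
_EPENTHESIS = re.compile(r"^(.*(?:s|x|z|ch|sh|o))es$")

def candidate_underlying_forms(surface):
    """Return the strings a segmenter might try for one surface word."""
    candidates = [surface]                 # the word exactly as spelled
    match = _EPENTHESIS.match(surface)
    if match:
        # Drop the inserted "e", e.g. "boxes" -> "boxs".
        candidates.append(match.group(1) + "s")
    return candidates
```

Given boxes, the sketch proposes boxs alongside the original spelling; a segmenter could then split boxs into box and the ordinary suffix s.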

For morphological generation, once we have generated a set of morphemes, we need to run these rules backwards. Otherwise, if we just concatenated the morphemes, we'd end up with deviant forms like (*) boxs, (*) moveed, (*) traveling. Luckily, although the MA provides nothing for running the word-grammar rules backwards, it already has a function for doing this with the spelling rules: D-MorphemeConcat. Its first argument is a list of morphemes (which will be derived from our skeleton); the second is a list of flags.
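The difference between plain concatenation and concatenation with spelling rules can be sketched in Python. This is only an illustration of the idea, not how D-MorphemeConcat works: the name concat_morphemes, the "+" boundary marker, and the three regular-expression rules are all assumptions of mine, chosen to cover just the three deviant forms mentioned above.

```python
import re

# Hypothetical toy rules over a string in which "+" marks a morpheme
# boundary. ASSUMPTION: these are not the MA's rules, only rough
# approximations that handle box+s, move+ed and travel+ing.
RULES = [
    # Epenthesis: insert "e" between a sibilant or "o" and a final "s".
    (re.compile(r"(s|x|z|ch|sh|o)\+s$"), r"\1es"),
    # E-deletion: drop a stem-final "e" before a vowel-initial suffix.
    (re.compile(r"e\+(?=[aeiou])"), ""),
    # L-doubling: double a stem-final "l" that follows consonant + vowel
    # when the suffix begins with a vowel.
    (re.compile(r"(?<=[^aeiou][aeiou])l\+(?=[aeiou])"), "ll"),
]

def concat_morphemes(morphemes):
    """Join morphemes, applying the toy spelling rules at boundaries."""
    form = "+".join(morphemes)
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    # Any boundary that no rule touched is simply erased.
    return form.replace("+", "")
```

With these rules, the lists ["box", "s"], ["move", "ed"] and ["travel", "ing"] come out as boxes, moved and travelling, where naive concatenation would give the starred forms.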

At present, the only flag allowed is NONULLS, which suppresses the null characters, written 0, in the result. If you omit it, forms like super0man are returned. In the version of the MA that I started with, this flag didn't work, and I had to remove the zeros by another means. However, Alan Black has supplied a fix, which I've applied to my MA.
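For readers whose copy of the MA lacks the fix, removing the nulls afterwards is straightforward. The Python sketch below shows one way it might be done; it is not the workaround I actually used, the function name is hypothetical, and it assumes that 0 never occurs as a genuine character in a word.

```python
def strip_nulls(form, null_char="0"):
    """Remove null markers from a concatenated form.

    ASSUMPTION: the null is always written as the character "0" and
    no real word in the lexicon contains that character.
    """
    return form.replace(null_char, "")
```

Applied to super0man, this gives superman.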

D-MorphemeConcat will sometimes return more than one possible concatenation. The only case in which I have seen this happen is hyphenation. If you call it on the list ("super-" "man"), then the result includes both superman and super-man. This is reasonable, since hyphens are ignored on analysis, but it does mean that the Assistant must be prepared for both variants.
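One way for calling code to cope with several returned forms is to group those that differ only in hyphenation. The Python sketch below illustrates this; the function name is hypothetical, and it assumes the results have already been converted into a list of plain strings, which is not something the MA does for you.

```python
def group_hyphen_variants(forms):
    """Group forms that differ only in hyphenation, preserving order.

    The key of each group is the form with all hyphens removed.
    """
    groups = {}
    for form in forms:
        key = form.replace("-", "")
        groups.setdefault(key, []).append(form)
    return groups
```

For the two forms superman and super-man, this yields a single group keyed on superman, so the caller can treat them as one word with two spellings.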





Jocelyn Ireson-Paine
Wed Feb 14 17:12:29 GMT 1996