The final problem in morphological generation is what to do once we have a list of morphemes. So far, I have ignored the question of spelling rules. Despite the irregularities of English spelling, some regularities can easily be stated. One is that when pluralising some nouns, or putting some verbs into their third-person singular form, an e is added: thus box/boxes and go/goes.
It would complicate the lexicon unduly if this had to be dealt with in
the lexical entries, e.g. by having both an s and an es
plural suffix. Instead, when analysing a word, the MA works on a form
which has been "cleaned up" by certain spelling rules. The file d-sp
contains the rules used with our lexicon. One of these, also
listed in section 6.2 of [MA], is "epenthesis", which
describes the spelling change above. On trying to segment a word into
morphemes, the MA applies this rule. So, given boxes, it (in
effect) finds itself trying to segment boxs. This can be done by
looking up the standard suffix s.
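
The analysis step just described can be sketched roughly as follows. This is a toy model in Python, not the MA's own code: the lexicon, the function names and the list of contexts in which an e may be undone are all assumptions made for illustration, and the real epenthesis rule in d-sp is stated differently.

```python
# Toy lexicon (hypothetical; not the contents of the real MA lexicon).
STEMS = {"box", "go", "cat"}
SUFFIXES = {"s"}

# Stem endings after which a surface "es" may hide a lexical "s".
# A simplification assumed for this sketch.
EPENTHESIS_CONTEXTS = ("s", "x", "z", "ch", "sh", "o")


def undo_epenthesis(surface):
    """Return candidate 'cleaned up' forms for a surface word.

    The surface form itself is always a candidate; if it ends in
    "es" after a suitable context, the form without the e is too.
    """
    candidates = [surface]
    if surface.endswith("es") and surface[:-2].endswith(EPENTHESIS_CONTEXTS):
        candidates.append(surface[:-2] + "s")
    return candidates


def segment(surface):
    """Segment a word into (stem, suffix) pairs using the toy lexicon."""
    results = []
    for form in undo_epenthesis(surface):
        for suffix in SUFFIXES:
            stem = form[: -len(suffix)]
            if form.endswith(suffix) and stem in STEMS:
                results.append((stem, suffix))
    return results
```

Given boxes, the sketch also tries boxs, which is where the lookup of the standard suffix s succeeds; the lexicon itself only ever needs the one plural suffix.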
For morphological generation, once we have generated a set of morphemes,
we need to run these rules backwards. Otherwise, if we just concatenated
the morphemes, we'd end up with deviant forms like (*) boxs, (*)
moveed, (*) traveling. Luckily, although this is not true of the
word-grammar rules, the MA already has a function for running the
spelling rules backwards: D-MorphemeConcat. Its first argument is a list
of morphemes (which will be derived from our skeleton); the second is a
list of flags.
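
To make "running the rules backwards" concrete, here is a minimal sketch in Python of concatenation with three much-simplified spelling rules, one for each deviant form above. It is not D-MorphemeConcat and does not use the rules in d-sp: the function names, the rule conditions and the absence of a flags argument are all assumptions made for illustration.

```python
VOWELS = set("aeiou")

# Stem endings after which an e is inserted before the suffix "s".
# A simplification assumed for this sketch.
EPENTHESIS_CONTEXTS = ("s", "x", "z", "ch", "sh", "o")


def join_pair(stem, suffix):
    """Join a stem and a suffix, applying simplified spelling rules."""
    # Epenthesis: box + s -> boxes
    if suffix == "s" and stem.endswith(EPENTHESIS_CONTEXTS):
        return stem + "e" + suffix
    if suffix and suffix[0] in VOWELS:
        # E-deletion: move + ed -> moved
        if stem.endswith("e"):
            return stem[:-1] + suffix
        # Doubling of a final l after a single vowel: travel + ing -> travelling
        if (len(stem) >= 3 and stem[-1] == "l"
                and stem[-2] in VOWELS and stem[-3] not in VOWELS):
            return stem + "l" + suffix
    # No rule applies: plain concatenation.
    return stem + suffix


def concat_morphemes(morphemes):
    """Concatenate a list of morphemes from left to right."""
    if not morphemes:
        return ""
    word = morphemes[0]
    for morpheme in morphemes[1:]:
        word = join_pair(word, morpheme)
    return word
```

With these rules, the lists box + s, move + ed and travel + ing come out as boxes, moved and travelling rather than the deviant forms. The conditions are deliberately crude (the e-deletion rule, for example, would mishandle see + ing), which is a good reason to rely on the MA's own rules rather than restating them.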
At present, the only flag allowed is NONULLS, which suppresses 0
characters in the result. If you omit this, you get forms like
super0man returned. In the version of the MA that I
started with, this flag didn't work, and I had to remove the zeros by
another means. However, Alan Black has supplied a fix, which I've
applied to my MA.
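
For anyone working with an unfixed MA, removing the zeros afterwards can be as simple as the following Python sketch. It is only an illustration of the idea, not the method I actually used: the function name is hypothetical, and it assumes that 0 appears in a returned form only as the null character.

```python
NULL_CHAR = "0"  # assumed to occur only as the null marker


def strip_nulls(forms):
    """Remove null characters from each returned concatenation."""
    return [form.replace(NULL_CHAR, "") for form in forms]
```

Applied to a result containing super0man, this gives superman.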
D-MorphemeConcat will sometimes return more than one possible
concatenation. The only occasion I have seen this happen is with
hyphenation. If you call it on the list ("super-" "man"), then
the result includes both superman and super-man. This is
fair, since hyphens are ignored on analysis, but it does mean that the
Assistant must be prepared for both variants.
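
One way the Assistant might cope with both variants is to compare forms with their hyphens removed, as in this Python sketch. The function names are hypothetical and the approach is only one possibility; it assumes that a hyphen is the only difference between the variants returned.

```python
def strip_hyphens(form):
    """Return the form with all hyphens removed."""
    return form.replace("-", "")


def group_hyphen_variants(forms):
    """Group forms that differ only in hyphenation, in first-seen order."""
    groups = {}
    for form in forms:
        groups.setdefault(strip_hyphens(form), []).append(form)
    return list(groups.values())


def matches_any(candidate, forms):
    """True if candidate equals one of the forms, ignoring hyphens."""
    key = strip_hyphens(candidate)
    return any(strip_hyphens(form) == key for form in forms)
```

Here superman and super-man fall into a single group, so either can be shown to the user or matched against what the user typed, mirroring the way hyphens are ignored on analysis.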