I end this section with a few points that need checking with a linguist.
The present version of the generator does not quite do all that might be
expected. Firstly, it can't deal with multiple irregular forms. As
mentioned in section 3.2.6, if a feature-set matches more than one lexical
entry, then an error is raised. This guards against insufficiently
specific feature-sets. Presumably, any feature-set that comes straight
from D-LookUp or D-Morpheme is going to contain enough
features that it will match only one entry. However, in the
Assistant, feature-sets are translated to a different form, and may then
be modified before translating back. It's possible that some essential
features might be lost during this process, so a bug-check seems worth
including.
The best way to handle this, I think, is for any lexical entry for an
irregular form that has an alternative irregular to be explicitly
marked, perhaps with a (GEN ALTERNATIVE-IRREGULARS) feature. If
the generator finds that two such alternatives both match, it could then
suppress the error-check; otherwise, it should raise the error as it does
now. I haven't implemented this because I don't know whether there are any
such forms in English.
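
The marker check described above could look something like the following. This is only a sketch in Python rather than the generator's own code: the representation of entries as (surface, features) pairs, the names select_irregulars and AmbiguousFeatureSet, and the placeholder forms are all assumptions made for illustration.

```python
class AmbiguousFeatureSet(Exception):
    """Raised when several unmarked entries match one feature-set."""

# Hypothetical marker, following the (GEN ALTERNATIVE-IRREGULARS) suggestion.
ALT_IRREGULARS = ("GEN", "ALTERNATIVE-IRREGULARS")

def select_irregulars(entries, wanted):
    """Return the surface forms of every entry matching `wanted`.

    entries: list of (surface_form, frozenset_of_features) pairs (assumed layout).
    wanted:  set of features the caller asks for.
    Several matches are accepted only if every one carries the marker.
    """
    hits = [(form, feats) for form, feats in entries if wanted <= feats]
    if len(hits) > 1 and not all(ALT_IRREGULARS in feats for _, feats in hits):
        # Same protection as now: the feature-set was not specific enough.
        raise AmbiguousFeatureSet([form for form, _ in hits])
    return [form for form, _ in hits]

# Placeholder forms, since it is an open question whether English has any.
past = ("TENSE", "PAST")
marked = [("form-a", frozenset({past, ALT_IRREGULARS})),
          ("form-b", frozenset({past, ALT_IRREGULARS}))]
unmarked = [("form-a", frozenset({past})),
            ("form-b", frozenset({past}))]
```

With the marked entries both forms come back; with the unmarked ones the error is raised exactly as it is at present.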
Secondly, it can't deal with words which have both an irregular and
a regular form, such as hung/hanged or lay/lied. (Note that
the root does not have any meaning marked, so one can't distinguish
on the basis of word-sense.) In these cases, we would want it to
generate the irregular by scanning the IFI, and then go on to
generate a regular, by affixation. However, we can't allow it to do
this in general, or it would always generate regular forms, no
matter what the root. The solution here appears to be another
marker. If an irregular form also has a regular alternative, then mark
its lexical entry with (e.g.) a (GEN ALTERNATIVE-REGULAR)
feature. If the generator finds such an entry, then it also performs
regular affixation; otherwise it returns the irregular form only. I
haven't implemented this because it would need a thorough
examination of the lexicon; best to check with a linguist first.
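
As a rough illustration of the second marker, here is a Python sketch. It assumes the IFI can be treated as a mapping from root to (surface, features) pairs, and it uses a toy regular_past function in place of the real affixation rules; neither assumption is taken from the actual implementation.

```python
# Hypothetical marker, following the (GEN ALTERNATIVE-REGULAR) suggestion.
ALT_REGULAR = ("GEN", "ALTERNATIVE-REGULAR")

def regular_past(root):
    """Toy stand-in for the affixation rules: add 'd' after a final e, else 'ed'."""
    return root + ("d" if root.endswith("e") else "ed")

def generate_with_alternatives(root, wanted, ifi):
    """Return irregular forms, plus the regular form where it is allowed.

    ifi: dict mapping root -> list of (surface, frozenset_of_features),
         an assumed layout for the irregular-forms index.
    """
    forms = []
    also_regular = False
    for surface, feats in ifi.get(root, []):
        if wanted <= feats:
            forms.append(surface)
            # Remember whether any matching irregular permits a regular too.
            also_regular = also_regular or ALT_REGULAR in feats
    # Affixation runs if nothing irregular was found, or if the marker says so.
    if not forms or also_regular:
        forms.append(regular_past(root))
    return forms

past = ("TENSE", "PAST")
ifi = {
    "hang": [("hung", frozenset({past, ALT_REGULAR}))],
    "go": [("went", frozenset({past}))],
}
```

Here "hang" yields both hung and hanged, "go" yields only went, and a root absent from the index, such as "walk", falls through to regular affixation.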
Thirdly, the IFI is bigger than necessary. It includes all the
non-inflectable words like prepositions, as well as noun singulars
and other things whose surface form is the same as their root. Since
the IFI index is held in core (regardless of the incore-flag), this
wastes space. It would be better to say that if a word can't be
found in the IFI, and no generational rules apply, then we just
return the root form unchanged. This is actually done at the moment,
for reasons discussed below. So we need a way to detect entries with
an identical surface form and root, and avoid entering them into the
IFI during a call to D-MakeLexicon. This could be done, I think,
just by comparing the two fields and not creating an IFI entry if
they are equal.
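
The comparison suggested above is simple enough to sketch. The Python below assumes the lexicon can be read as (surface, root, features) triples and that the index is keyed by root; build_ifi is a hypothetical name, not the interface of D-MakeLexicon.

```python
def build_ifi(lexicon):
    """Build an index of inflected forms, skipping trivial entries.

    lexicon: iterable of (surface, root, features) triples (assumed layout).
    Entries whose surface form equals their root are left out, on the
    understanding that the generator returns the root unchanged by default.
    """
    ifi = {}
    for surface, root, feats in lexicon:
        if surface == root:
            continue  # preposition, noun singular, etc.: nothing to index
        ifi.setdefault(root, []).append((surface, feats))
    return ifi

lexicon = [
    ("of", "of", frozenset({("CAT", "P")})),
    ("mouse", "mouse", frozenset({("CAT", "N"), ("NUM", "SING")})),
    ("mice", "mouse", frozenset({("CAT", "N"), ("NUM", "PLUR")})),
]
index = build_ifi(lexicon)
```

Only the entry for mice survives in this example; the other two would be recovered by the return-the-root default.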
Fourthly, derivational affixes. If the generator tries to pluralise
motherhood, it will use the skeleton ("mother" "+hood") and take
the right-hand morpheme. Finding no irregular forms for the plural
of this, it will add a plural s. If instead it were to generate
the singular, then it would first look in the IFI for a lexical
entry for +hood which matches the feature-set for a noun. Of
course there are none, because +hood is not a noun. So it tries
the affixation rules, and none match because there are none for noun
singulars. So it returns the root unchanged: this is why I added
that default.
Is this always going to work? I think so, but again I'd like to check.
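
The motherhood case can be traced with a small Python sketch of the three steps: irregular lookup on the right-hand morpheme, then affixation, then the default. The data layout, the rule format, and the way morphemes are joined (dropping the leading +) are assumptions for illustration only.

```python
def join(morphemes):
    """Concatenate morphemes, dropping the '+' that marks an affix."""
    return "".join(m.lstrip("+") for m in morphemes)

def generate_from_skeleton(skeleton, wanted, ifi, rules):
    """Generate a surface form from a skeleton such as ["mother", "+hood"].

    ifi:   dict root -> list of (surface, features)   (assumed layout)
    rules: list of (features, suffix) pairs standing in for affixation rules
    """
    *left, head = skeleton
    # 1. Look for an irregular form of the right-hand morpheme.
    for surface, feats in ifi.get(head, []):
        if wanted <= feats:
            return join(left + [surface])
    # 2. Try the affixation rules.
    for feats, suffix in rules:
        if feats <= wanted:
            return join(skeleton) + suffix
    # 3. Default: return the root unchanged.
    return join(skeleton)

rules = [(frozenset({("NUM", "PLUR")}), "s")]  # no rule for noun singulars
plural = {("CAT", "N"), ("NUM", "PLUR")}
singular = {("CAT", "N"), ("NUM", "SING")}
skeleton = ["mother", "+hood"]
```

With an empty index, the plural request produces motherhoods through step 2 and the singular request produces motherhood through step 3, which is the behaviour described above.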