A note on multiple forms and other problems

I end this section with a few points that need checking with a linguist. The present version of the generator does not quite do all that might be expected. Firstly, it can't deal with multiple irregular forms. As mentioned in section 3.2.6, if a feature-set matches more than one lexical entry, an error is raised. This protects against insufficiently specific feature-sets. Presumably, any feature-set that comes straight from D-LookUp or D-Morpheme will contain enough features to match only one entry. However, in the Assistant, feature-sets are translated to a different form, and may then be modified before being translated back. Some essential features might be lost during this process, so a bug-check seems worth including.

The best way to handle this, I think, is to mark explicitly any lexical entry for an irregular form that has an alternative irregular, perhaps with a (GEN ALTERNATIVE-IRREGULARS) feature. If the generator finds that two such alternatives both match, it could then suppress the error-check; otherwise, it should raise the error as it does now. I haven't implemented this because I don't know whether there are any such forms in English.
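As a rough sketch of the proposed check, here is a small Python illustration. It is not the MA's own code: the entry layout, the `generate_irregular` function, the `AmbiguousFeatureSet` error, the (TENSE PAST) feature and the placeholder roots and forms are all assumptions made for the example. Only the (GEN ALTERNATIVE-IRREGULARS) marker comes from the proposal above.

```python
# Hypothetical encoding of the proposed (GEN ALTERNATIVE-IRREGULARS) feature.
ALT_IRREGULARS = ("GEN", "ALTERNATIVE-IRREGULARS")


class AmbiguousFeatureSet(Exception):
    """Raised when a feature-set matches several entries that are not marked as alternatives."""


def generate_irregular(index, root, features):
    """Return the surface forms of every irregular entry matching root and features."""
    # An entry matches if it has the same root and contains all requested features.
    hits = [e for e in index
            if e["root"] == root and features <= e["features"]]
    if len(hits) > 1 and not all(ALT_IRREGULARS in e["features"] for e in hits):
        # The existing bug-check: the feature-set was not specific enough.
        raise AmbiguousFeatureSet(f"{root}: {len(hits)} entries match")
    return [e["surface"] for e in hits]


PAST = ("TENSE", "PAST")  # made-up feature, for illustration only

# Placeholder entries: r1 has two marked alternatives, r2 has two unmarked ones.
index = [
    {"root": "r1", "surface": "r1-a", "features": frozenset({PAST, ALT_IRREGULARS})},
    {"root": "r1", "surface": "r1-b", "features": frozenset({PAST, ALT_IRREGULARS})},
    {"root": "r2", "surface": "r2-a", "features": frozenset({PAST})},
    {"root": "r2", "surface": "r2-b", "features": frozenset({PAST})},
]

print(generate_irregular(index, "r1", frozenset({PAST})))
```

With these placeholder entries, the request for r1 returns both marked alternatives, while the same request for r2 still raises the error, since its two entries carry no marker.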

Secondly, it can't deal with words which have both an irregular and a regular form, such as hung/hanged or lay/lied. (Note that the root does not have any meaning marked, so one can't distinguish on the basis of word-sense.) In these cases, we would want it to generate the irregular by scanning the IFI, and then go on to generate a regular by affixation. However, we can't allow it to do this in general, or it would always generate regular forms, no matter what the root. The solution here appears to be another marker. If an irregular form also has a regular counterpart, then mark its lexical entry with (e.g.) a (GEN ALTERNATIVE-REGULAR) feature. If the generator finds such an entry, it also performs regular affixation; otherwise it returns the irregular form only. I haven't implemented this because it would need a thorough examination of the lexicon; best to check with a linguist first.
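The marker scheme might look something like the following Python sketch. Everything in it other than the (GEN ALTERNATIVE-REGULAR) marker is an assumption for illustration: the dictionary standing in for the IFI, the `generate_past` function, and the toy `regular_past` rule, which is far cruder than the MA's real affixation rules.

```python
# Hypothetical encoding of the proposed (GEN ALTERNATIVE-REGULAR) feature.
ALT_REGULAR = ("GEN", "ALTERNATIVE-REGULAR")


def regular_past(root):
    """Toy stand-in for regular affixation: add -d after a final e, else -ed."""
    return root + ("d" if root.endswith("e") else "ed")


def generate_past(irregular_index, root):
    """Return all past forms for root: irregular first, then regular if allowed."""
    entry = irregular_index.get(root)
    if entry is None:
        # No irregular entry at all: ordinary affixation.
        return [regular_past(root)]
    forms = [entry["surface"]]
    if ALT_REGULAR in entry["features"]:
        # Only marked entries also get the regular form.
        forms.append(regular_past(root))
    return forms


# A stand-in for the IFI, keyed by root (assumed layout).
irregular_index = {
    "hang": {"surface": "hung", "features": {ALT_REGULAR}},
    "sing": {"surface": "sang", "features": set()},
}

for root in ("hang", "sing", "walk"):
    print(root, generate_past(irregular_index, root))
```

Here hang yields both hung and hanged, sing yields only sang, and walk, which has no irregular entry, yields only walked.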

Thirdly, the IFI is bigger than necessary. It includes all the non-inflectable words like prepositions, as well as noun singulars and other things whose surface form is the same as their root. Since the IFI index is held in core (regardless of the incore-flag), this wastes space. It would be better to say that if a word can't be found in the IFI, and no generation rules apply, then we just return the root form unchanged. This default is in fact already implemented, for reasons discussed below. So we need a way to detect entries with an identical surface form and root, and avoid entering them into the IFI during a call to D-MakeLexicon. This could be done, I think, just by comparing the two fields and not creating an IFI entry if they are equal.
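The comparison could be as simple as the following Python sketch. The `build_ifi` function, the (surface, root, features) tuple layout and the feature names are assumptions for the example; they do not reflect how D-MakeLexicon actually stores entries.

```python
def build_ifi(lexicon):
    """Index lexicon entries by root, skipping those whose surface form equals the root."""
    ifi = {}
    for surface, root, features in lexicon:
        if surface == root:
            # Generation falls back to the unchanged root, so no entry is needed.
            continue
        ifi.setdefault(root, []).append((surface, features))
    return ifi


# Assumed entry layout: (surface form, root, feature-set).
lexicon = [
    ("of", "of", frozenset({"P"})),
    ("mouse", "mouse", frozenset({"N", "SG"})),
    ("mice", "mouse", frozenset({"N", "PL"})),
    ("cat", "cat", frozenset({"N", "SG"})),
]

ifi = build_ifi(lexicon)
print(ifi)
```

Of the four sample entries, only mice reaches the index; the preposition and the two noun singulars are left out.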

Fourthly, derivational affixes. If the generator tries to pluralise motherhood, it will use the skeleton ("mother" "+hood") and take the right-hand morpheme. Finding no irregular form for the plural of this, it will add a plural s. If instead it were to generate the singular, then it would first look in the IFI for a lexical entry for +hood which matches the feature-set for a noun. Of course there is none, because +hood is not a noun. So it tries the affixation rules, and none match because there are none for noun singulars. So it returns the root unchanged: this is why I added that default.

Is this always going to work? I think so, but again I'd like to check.
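The motherhood walk-through can be traced with a small Python sketch. It only illustrates the order of the three steps (IFI lookup, affixation rules, default); the `generate` function, the rule and index layouts, and the feature names are assumptions, not the MA's actual data structures.

```python
def generate(skeleton, features, ifi, affix_rules):
    """Inflect the right-hand morpheme of a skeleton, defaulting to the root unchanged."""
    *stem, head = skeleton
    prefix = "".join(m.lstrip("+") for m in stem)
    head_text = head.lstrip("+")
    # Step 1: look for an irregular form of the right-hand morpheme.
    for surface, entry_features in ifi.get(head, []):
        if features <= entry_features:
            return prefix + surface
    # Step 2: try the affixation rules; a rule applies if all its features are requested.
    for rule_features, suffix in affix_rules:
        if rule_features <= features:
            return prefix + head_text + suffix
    # Step 3: nothing matched, so return the root unchanged.
    return prefix + head_text


# Assumed data: one irregular entry and a single plural rule, none for singulars.
ifi = {"mouse": [("mice", frozenset({"N", "PL"}))]}
affix_rules = [(frozenset({"N", "PL"}), "s")]

skeleton = ("mother", "+hood")
print(generate(skeleton, frozenset({"N", "PL"}), ifi, affix_rules))
print(generate(skeleton, frozenset({"N", "SG"}), ifi, affix_rules))
```

The plural request falls through to the plural rule and gives motherhoods; the singular request finds neither an IFI entry nor a rule and falls through to the default, giving motherhood.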

Jocelyn Ireson-Paine
Wed Feb 14 17:12:29 GMT 1996