
Nordic Journal of African Studies 19(1): 1–22 (2010)

Finite State Methods in Morphological Analysis of Runyakitara Verbs

Fridah KATUSHEMERERWE

Makerere University, Uganda

&

Thomas HANNEFORTH

Potsdam University, Germany

ABSTRACT

Previously, there has been no automatic analyser and generator for the word forms of Runyakitara. In this paper, we present a computational model for grammatical Runyakitara verbs. This model, RUNYAGRAM, is based on freely available, open-source finite-state methods and, in particular, the fsm2 interpreter. It captures the morphotactic structures with non-recursive context-free grammars supported by fsm2, and the morphophonological alternations with a finite composition of commonly used context-dependent string rewriting rules. Their combination results in a finite-state transducer that can be exported and used on numerous software development platforms. The resulting transducer is an important building block that can be employed in comprehensive morphological analysers, syntactic parsers, spell-checkers, text-to-speech synthesizers, and machine translation systems. Currently, 86% of the verb forms are recognized. It is possible to increase the coverage or, alternatively, to adapt the approach of the RUNYAGRAM system to related languages.

Keywords: morphological analysis, finite state methods, Runyakitara verb.

1. INTRODUCTION

One of the core enabling technologies required in natural language processing applications is a morphological analyzer. It is an established fact in computational linguistics that a morphological analyzer is a starting point for many natural language processing applications (Pretorius & Bosch, 2003; Yona & Wintner, 2005).

Computational morphology deals with automatic word-form recognition and generation. The general challenges facing a computational morphological analyzer, as described by Pretorius and Bosch (2003), are twofold:

• Morphemes that make up words cannot combine at random, but are restricted to certain combinations and orders. A morphological analyzer needs to know which combinations of morphemes (morphotactics) are valid.


• Morphemes may be realized in different ways depending on their context. A morphological analyzer needs to recognize the morphophonological changes between lexical and surface forms (morphophonological alternation).

Automatic morphological analyzers and generators must take both of these issues into consideration.

Comprehensive morphological analyzers are available for well-documented languages such as English, Swedish, German, Arabic, and Finnish (Karttunen & Beesley, 2005:77). Considerable work has also been done on applying finite-state methods to Bantu languages: the Kiswahili morphological analyzer (Hurskainen, 1992; 1996; 2004), the Zulu analyzer prototype (Pretorius & Bosch, 2003), Lingala verb morphology (Karttunen, 2003), Ekegusii verb morphology (Elwell, 2005), Kinyarwanda (Muhirwe & Trosterud, 2008), and Setswana verb morphology (Pretorius, Berg, & Pretorius, 2009).

However, given that there are more than five hundred Bantu languages, almost all of them remain untreated. Although Bantu languages are classified as largely agglutinative and exhibit significant inherent structural similarity, they differ substantially in their phonological features, which implies that each Bantu language requires an independent morphological analyzer.

Runyakitara is one of those under-resourced Bantu languages with no computational morphology. Bernsten (1998) splits Runyakitara into four major dialects: Runyankore, Rukiga, Runyoro, and Rutooro. Guthrie (1967) groups these four dialects into two languages belonging to the Narrow Bantu branch of the Niger-Congo family, Nyankore-Kiga (E.13) and Nyoro-Ganda (E.11). For the purposes of this paper, Runyakitara will be taken to mean the two major language clusters mentioned above, Runyoro-Rutooro and Runyankore-Rukiga, denoted by R-R in the following.

Runyakitara is spoken by approximately six and a half million (6,500,000) people in nineteen districts of Western Uganda. As a major language in Uganda and in some parts of Tanzania and the Democratic Republic of Congo, R-R deserves computational attention: it has a large number of speakers, it is a language of the media in western Uganda (two regular newspapers, one of them online), and it has a rich history and culture which should be preserved. Besides, the language is a medium of instruction in the lower levels of primary education in Western Uganda, and we shall consider how computational efforts may add value to language education. The morphology of the verb in R-R, as has been stressed by other Bantu researchers (Hurskainen, 1992; Elwell, 2005), is among the complex morphological systems known, which means that it needs special attention.


2. RUNYAKITARA VERB MORPHOLOGY AND THE COMPUTATIONAL CHALLENGE

A verb in a typical Bantu language will take on many prefixes and suffixes. The Runyakitara verb morphology poses the following challenges to computational modeling: a. number of morphemes, b. morpheme order, c. morpheme combination, d. allomorphs, and e. vowel harmony. These are discussed in the sub-sections below.





2.1 NUMBER OF MORPHEMES INVOLVED

The Bantu verb template described in many studies suggests about 8 to 15 morpheme slots, as follows:


Table 1. Bantu Verb Template (Nurse & Philippson, 2003).

The above generic template raises many questions when considered with respect to R-R morphology. What counts as a morpheme on the template? If the verb extension (in Slot 7) is a morpheme, does that mean that extensions such as the causative, applicative, passive, etc. are allomorphs of the same morpheme? These and many other questions prompted us to devise an R-R verb template that caters more specifically for the morphemes present in the language.

Many morphemes are involved in the formation of R-R verbs; therefore, it is important to expand the template. These morphemes can be broadly classified as prefixes (morphemes to the left of Slot 0), the root (Slot 0) and suffixes (morphemes after Slot 0). The following template shows the morphemes involved in the formation of Runyakitara verbs:


Table 2. Runyakitara Verb Template.

Note: Slot 0 represents the root; slots to the right of 0 are suffixes to the root. Slot 1 is for verb extensions: Ca – causative, Apl – applicative, Rec – reciprocal, Pas – passive, Int – intensive, Stat – stative, Rev – reversive. Slot 2 represents the verb end: Ind – indicative, subj – subjunctive, past – past tense. Slot 3 indicates post-final morphemes: pf1 – post-final 1, pf2 – post-final 2. To the left of zero: -1 Asp – aspect; -2 – object pronouns; -3 – tense/aspect markers; -4 Ng2 – negative 2; -5 Sp – subject prefix; -6 Asp – aspect; -7 Ng1 – negative 1. For a more detailed description and examples, refer to Appendix A.
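The slot numbering in the note to Table 2 can be sketched as a small lookup table together with an ordering check. This is only an illustration and not part of RUNYAGRAM: the dictionary layout and the function name follows_template are hypothetical, and the slot numbers assigned to the example morphs (ni in slot -6, ka in slot -3, ire as a verb end in slot 2) are assumptions read off the note and the examples in Section 2.

```python
# Slot labels follow the note to Table 2; the data layout itself is an assumption.
SLOTS = {
    -7: "Ng1 (negative 1)",
    -6: "Asp (aspect)",
    -5: "Sp (subject prefix)",
    -4: "Ng2 (negative 2)",
    -3: "tense/aspect marker",
    -2: "object pronoun",
    -1: "Asp (aspect)",
    0: "root",
    1: "verb extension",
    2: "verb end",
    3: "post-final",
}


def follows_template(analysis):
    """Check a list of (slot, morph) pairs against the template.

    The pairs must use known slots, appear in non-decreasing slot order
    (a slot may repeat, e.g. two object markers or several extensions),
    and contain a root in slot 0.
    """
    slots = [slot for slot, _ in analysis]
    known = all(slot in SLOTS for slot in slots)
    ordered = all(a <= b for a, b in zip(slots, slots[1:]))
    return known and ordered and 0 in slots


# ni-ba-mu-reeb-a (they are seeing him); slot assignments are assumed.
progressive = [(-6, "ni"), (-5, "ba"), (-2, "mu"), (0, "reeb"), (2, "a")]
# mu-mu-n-kwat-ire (you grab/hold him for me); two object markers share slot -2.
double_object = [(-5, "mu"), (-2, "mu"), (-2, "n"), (0, "kwat"), (2, "ire")]

print(follows_template(progressive))
print(follows_template(double_object))
```

A check of this kind covers only slot order; it says nothing about which morphemes may co-occur, which is the harder problem taken up in the following sub-sections.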

Runyakitara has the typical characteristics of template morphology as outlined by Spencer (1991), who observes that template morphology poses a computational challenge. According to Spencer, template morphology is a morphological system in which a verb stem or root takes obligatory affixes as well as a set of optional affixes. The combinations of morphemes make automatic analysis difficult because one first has to sort out which affixes combine with the root to form specific verb forms.

Adding to the number of morphemes involved, subject and object pronouns mark agreement with the noun classes in question. Where the subject is not expressed, they serve as subject and object pronouns. These markers appear as prefixes to the verb root. R-R has eighteen (18) noun classes; therefore, subject and object pronouns add up to 18 in each case. In addition, R-R is a type 3 language according to the classification given by Maho (2007), which means that it allows two or more objects in a construction. Evidence from Runyakitara shows that the language allows double object constructions, that is, a verb can carry markers for both the direct and the indirect object in the same construction. An example is mu-mu-n-kwat-ire (you grab/hold him for me), where mu-n marks the two objects, representing him and me. This adds to the number of morphemes, which is large enough to pose a challenge.

2.3 MORPHEME COMBINATION

Although some studies have been carried out on the combination of morphemes in Bantu languages (Hayman, 2007), limited research is available on Runyakitara morpheme combination, specifically with regard to verb extensions. As noted by Hayman (2007), verb extensions are difficult to analyze mainly because they have various functions, are numerous, and often occur in long successions. Runyakitara has seven (7) verbal extensions, which can be added to the root individually or in combination. For example, one can have a verb with verb extensions such as:

reeb-a (see)
reeb-es-a (see with)
reeb-an-a (see each other)
reeb-w-a (be seen)
reeb-es-an-a (make each other to see)
reeb-an-is-a (make to see each other)
reeb-es-an-is-ibw-a (be made to make them see each other)

In the examples above, es, an, w, is, and ibw are all verb extensions playing different roles. The causative morphs es and is occupy different positions in the last example, and no available study has established which combinations of verbal extensions Runyakitara allows or the order in which they can follow one another.

2.2 MORPHEME ORDER

Although the Bantu verb template is presumed to present a fixed order of morphemes, and provides Slot 4 in Table 1, for example, as the slot for tense/aspect markers, some morphemes in Runyakitara violate this order. Specific cases are the progressive ni, the reflexive e and the past ire. As indicated on the Runyakitara template in Table 2, ni comes before the subject marker, while other tense/aspect markers follow the subject marker, e.g.

ni-ba-mu-reeb-a (they are seeing him)
ba-ka-mu-reeb-a (they saw him [last year or some months back])
ba-mu-reeb-ire (they saw him [yesterday])

In the above verb constructions, ni, ka, and ire are tense/aspect markers but appear in different positions with respect to the root.

Also, the order of verb extensions on the template is not necessarily the order in which they appear in a given construction. That is to say, extensions attach to verbs depending on the argument structure, so there is no fixed order in which they must appear in the verb. For example, with the verb root reeb:

reeb-a (see)
reeb-es-a (see with)
reeb-an-a (see each other)
reeb-es-an-a (make each other to see)
reeb-an-is-a (make … to see each other)
reeb-an-is-ibw-a (be made to make … see each other)
reeb-er-a (see for)
reeb-er-an-a (see for each other)

All this indicates that there is a lot of flexibility regarding which morphemes precede and follow one another, given that is and es are both causatives.

2.4 ALLOMORPHY

Runyakitara has various allomorphs, that is, different realizations of the same morpheme. A case in point is the causative morpheme, which has five different realizations [es/is/iz/s/y]. The applicative, passive, stative and reversive morphemes are no exception. All these pose a challenge to computational modeling.


2.5 VOWEL HARMONY

Katamba (1984) analyses vowel harmony of verb extensions in Luganda, a language closely related to Runyakitara. His analysis, which classifies the vowels involved in harmony as mid and nonmid, establishes the existence of vowel harmony in the language but does not help much when it comes to formalizing morphemes for computational purposes, because it is difficult to identify the location of mid and nonmid vowels in the string. The suggestion by Morris and Kirwan (1972) to use the penultimate syllable is useful here. The penultimate syllable is the syllable preceding the final one, so it makes the vowel in question easy to locate. For example, in the word bo-ro-go-ta (flow of water), the penultimate syllable is 'go'. This led to the observation that when the vowel of the penultimate syllable is e or o, the causative extension is es; when it is a, i or u, the causative extension is is or iz. The same applies to the applicative, intensive and stative extensions.
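The harmony generalization can be sketched as a small selection function. This is a simplified illustration and not the RUNYAGRAM implementation. It assumes that the vowel of the penultimate syllable of the unextended form (stem plus final vowel) is the last vowel of the stem, it does not model the choice between is and iz, and the names last_vowel and causative_allomorph are hypothetical.

```python
VOWELS = "aeiou"


def last_vowel(stem):
    """Return the last vowel of the stem, or None if the stem has no vowel."""
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    return None


def causative_allomorph(stem):
    """Pick the causative extension from the last stem vowel.

    e or o selects "es"; a, i or u selects "is" (the is/iz choice is left open).
    """
    vowel = last_vowel(stem)
    if vowel is None:
        raise ValueError(f"no vowel found in stem {stem!r}")
    return "es" if vowel in "eo" else "is"


# Stems taken from the examples in Section 2: reeb-es-a and reeb-an-is-a.
for stem in ("reeb", "reeban"):
    print(f"{stem}-{causative_allomorph(stem)}-a")
```

On the stems from Section 2 this reproduces the attested sequences reeb-es-a and reeb-an-is-a; forms outside those examples would need to be checked against the language.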

Given the nature of Runyakitara morphology, it was important to carefully select a formalization approach appropriate to its structure. A phrase structure grammar was therefore identified to handle the concatenative nature of Runyakitara morphology. Rules proposed by Selkirk (Spencer, 1991) were applied, written as W+A for suffixing and A+W for prefixing. These rules are helpful for Runyakitara concatenative morphology; however, they account only for the concatenative part of the morphology. It was therefore important to also find a way of handling morpho-phonological and orthographical processes.

3. FORMALIZATION AND IMPLEMENTATION

Given the nature of Runyakitara morphology, it was important to carefully select an appropriate approach. The concatenative nature of Runyakitara morphology can be captured with Phrase Structure Grammar (PSG) along the lines of Selkirk (Spencer, 1991), who proposes phrase-structure-like rules written as W+A for suffixing and A+W for prefixing. However, it was clear that the rules Selkirk proposed account only for concatenative morphology. It was therefore important to also find a way of handling morpho-phonological and orthographical processes. Because recursion is not needed, we describe both the concatenative rules and the phonological processes in the framework of finite-state acceptors (FSA) and transducers (FST). Our approach relies heavily on the closure properties of these automata under intersection, composition, and substitution (see Hopcroft & Ullman, 1979; Kaplan & Kay, 1994).
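The idea of composing rewriting rules into a single mapping from lexical to surface forms can be sketched in plain Python, with regular-expression substitutions standing in for transducers. This is not fsm2 code and does not use its rule syntax. The intermediate notation (^ as a morpheme boundary, IS as an underspecified causative) and all function names are assumptions made for illustration; the harmony rule is the generalization from Section 2.5.

```python
import re
from functools import reduce


def harmony(form):
    """Rewrite IS as es when the nearest preceding vowel is e or o."""
    return re.sub(r"([eo][^aeiouA-Z]*)IS", r"\1es", form)


def default_causative(form):
    """Rewrite any remaining IS as is."""
    return form.replace("IS", "is")


def drop_boundaries(form):
    """Delete the morpheme boundary symbol to obtain the surface string."""
    return form.replace("^", "")


def compose(*rules):
    """Chain rules left to right into a single lexical-to-surface function."""
    return lambda form: reduce(lambda acc, rule: rule(acc), rules, form)


to_surface = compose(harmony, default_causative, drop_boundaries)

for lexical in ("reeb^IS^a", "reeb^an^IS^a", "reeb^es^an^IS^ibw^a"):
    print(lexical, "->", to_surface(lexical))
```

The order of the rules matters here: the default rule has to apply after the harmony rule, otherwise every IS would surface as is. A true transducer composition also yields the inverse mapping from surface to lexical forms, which this one-directional sketch does not provide.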


