
Morphology


Morphotactics. Morphotactics represents the ordering restrictions placed on morphemes. Etymologically, it can be translated as "the set of rules that define how morphemes (morpho) can touch (tactics) each other". Example of morphotactic rules (in English): Plural ^s follows Noun; ^z cannot follow Noun. Common morphotactic model: Finite-state machines and graphs are the two models which are often used as a [?]
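As a rough illustration of the finite-state model mentioned above, the rule "Plural ^s follows Noun" can be sketched as a tiny acceptor over morpheme categories. The state names, the category labels NOUN and PLURAL, and the function is_well_formed are all assumptions made for this example, not part of any standard toolkit.

```python
# Hypothetical morphotactic acceptor: states and category labels are illustrative.
TRANSITIONS = {
    ("start", "NOUN"): "noun",     # a word may begin with a noun stem
    ("noun", "PLURAL"): "plural",  # the plural suffix may follow a noun
}
ACCEPTING = {"noun", "plural"}     # a bare noun or a noun + plural is complete


def is_well_formed(categories):
    """Return True if the sequence of morpheme categories obeys the ordering rules."""
    state = "start"
    for category in categories:
        state = TRANSITIONS.get((state, category))
        if state is None:          # no transition: the ordering is not allowed
            return False
    return state in ACCEPTING


print(is_well_formed(["NOUN", "PLURAL"]))  # cat + s
print(is_well_formed(["PLURAL", "NOUN"]))  # s + cat
```

Under these assumptions the first call accepts and the second rejects, because the table has no transition that lets a plural morpheme precede its noun.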

References: Morphology and Computation, by Richard William Sproat. Lexicon. Stemming. Stemming programs are commonly referred to as stemming algorithms or stemmers.

Stemming

Examples: A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", and "fisher" to the root word, "fish".
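The "fish" and "cat" examples can be reproduced with a deliberately naive suffix stripper. This is a minimal sketch, not the Porter algorithm: the suffix list, the minimum stem length, and the function name strip_suffix are assumptions chosen only to cover the words quoted above.

```python
# Illustrative suffix list; a real stemmer uses many more rules and conditions.
SUFFIXES = ("ing", "ed", "er", "s")


def strip_suffix(word, suffixes=SUFFIXES, min_stem=3):
    """Remove the first matching suffix if at least min_stem characters remain."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no rule applied; the word is returned unchanged


for word in ("fishing", "fished", "fisher", "cats"):
    print(word, "->", strip_suffix(word))
```

A sketch this simple has obvious limits: it turns "stemming" into "stemm" rather than "stem", because it has no rule for doubled consonants.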

On the other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating the case where the stem is not itself a word or root) but "argument" and "arguments" reduce to the stem "argument". History: Porter Stemming Algorithm. Porter Stemmer description. Morpheme. Classification of morphemes: Free vs. bound. Every morpheme can be classified as either free or bound.[3] These categories are mutually exclusive, and as such, a given morpheme will belong to exactly one of them. Combining Morphemes. Clitic. Clitics can belong to any grammatical category, although they are commonly pronouns, determiners, or adpositions.

Clitic

Note that orthography is not always a good guide for distinguishing clitics from affixes: clitics may be written as separate words, but sometimes they are joined to the word on which they depend (like the Latin clitic que, meaning "and"), or separated by special characters such as hyphens or apostrophes (like the English clitic ’s). The word "clitic" is often used loosely for what may be better described as an affix or word.

Classification: Clitics fall into various categories depending on their position in relation to the word to which they are connected.[2] Proclitic: A proclitic appears before its host.[2] It is common in Romance languages. Enclitic: An enclitic appears after its host.[2] Latin: Senatus Populusque Romanus "Senate people-and Roman" = "The Senate and people of Rome". Ancient Greek: ánthrōpoí (te) theoí te. Inflection. Inflection of the Portuguese or Spanish lexeme for "cat", which produces the forms gato, gata, gatos and gatas. Blue represents masculine gender, pink represents feminine gender, grey represents the form used for mixed gender; green represents plural number, while singular number is unmarked. Compound (linguistics). Compound formation rules vary widely across language types.
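The gato/gata/gatos/gatas paradigm described above can be sketched as a stem plus a gender suffix plus a number suffix. The stem "gat", the feature names, and the function inflect are assumptions made for this illustration; the unmarked singular is modelled as an empty suffix.

```python
# Illustrative suffix tables for the "cat" paradigm only.
GENDER_SUFFIX = {"masculine": "o", "feminine": "a"}
NUMBER_SUFFIX = {"singular": "", "plural": "s"}  # singular is unmarked


def inflect(stem, gender, number):
    """Build an inflected form as stem + gender suffix + number suffix."""
    return stem + GENDER_SUFFIX[gender] + NUMBER_SUFFIX[number]


for gender in GENDER_SUFFIX:
    for number in NUMBER_SUFFIX:
        print(gender, number, inflect("gat", gender, number))
```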

In a synthetic language, the relationship between the elements of a compound may be marked with a case or other morpheme. For example, the German compound Kapitänspatent consists of the lexemes Kapitän (sea captain) and Patent (license) joined by an -s- (originally a genitive case suffix); and similarly, the Latin lexeme paterfamilias contains the archaic genitive form familias of the lexeme familia (family). Conversely, in the Hebrew language compound, the word בֵּית סֵפֶר bet sefer (school), it is the head that is modified: the compound literally means "house-of book", with בַּיִת bayit (house) having entered the construct state to become בֵּית bet (house-of).

This latter pattern is common throughout the Semitic languages, though in some it is combined with an explicit genitive case, so that both parts of the compound are marked. Agglutinative languages tend to create very long words with derivational morphemes. Dutch: Derivation (linguistics) In linguistics, derivation is the process of forming a new word on the basis of an existing word, e.g. happiness and unhappy from happy, or determination from determine. It often involves the addition of a morpheme in the form of an affix, such as -ness, un- and -ation in the preceding examples.

Derivation stands in contrast to the process of inflection, which means the formation of grammatical variants of the same word, as with determine/determines/determining/determined.[1] Examples of English derivational patterns and their suffixes: Derivation that results in a noun may be called nominalization. Affix. Positional categories of affixes: Affixes are divided into several categories, depending on their position with reference to the stem. Prefix and suffix are extremely common terms.

Infix and circumfix are less so, as they are not important in European languages. The other terms are uncommon. Prefix and suffix may be subsumed under the term adfix in contrast to infix. When marking text for interlinear glossing, as in the third column in the chart above, simple affixes such as prefixes and suffixes are separated from the stem with hyphens. Lexical affixes: Lexical affixes (or semantic affixes) are bound elements that appear as affixes, but function as incorporated nouns within verbs and as elements of compound nouns. Lexical affixes are relatively rare.

The lexical suffixes of these languages often show little to no resemblance to free nouns with similar meanings. Lexical suffixes when compared with free nouns often have a more generic or general meaning. Word stem. In linguistics, a stem is a part of a word. The term is used with slightly different meanings. In a slightly different usage, which is adopted in the remainder of this article, a word has a single stem, namely the part of the word that is common to all its inflected variants.[2] Thus, in this usage, all derivational affixes are part of the stem.

For example, the stem of friendships is friendship, to which the inflectional suffix -s is attached. Part-of-speech tagging. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags.

Part-of-speech tagging

POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms. Principle: Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. Finite state transducer. A finite state transducer (FST) is a finite state machine with two tapes: an input tape and an output tape.

Finite state transducer

This contrasts with an ordinary finite state automaton (or finite state acceptor), which has a single tape. Overview: An automaton can be said to recognize a string if we view the content of its tape as input. In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton generates strings, which means viewing its tape as an output tape. Each string-to-string finite state transducer defines a relation R on Σ*×Γ*, where Σ is the input alphabet and Γ is the output alphabet.
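The difference between an acceptor and a transducer can be sketched side by side. Both machines below are toy, deterministic examples over the input alphabet {a, b}; the transition tables, the output symbols x and y, and the function names run_acceptor and run_transducer are assumptions made for this illustration.

```python
def run_acceptor(transitions, start, accepting, string):
    """One tape: map a string to 1 (recognized) or 0 (rejected)."""
    state = start
    for symbol in string:
        state = transitions.get((state, symbol))
        if state is None:  # no transition defined for this symbol
            return 0
    return 1 if state in accepting else 0


def run_transducer(transitions, start, accepting, string):
    """Two tapes: read the input string and write an output string, or None if rejected."""
    state, output = start, []
    for symbol in string:
        step = transitions.get((state, symbol))
        if step is None:
            return None
        state, emitted = step
        output.append(emitted)
    return "".join(output) if state in accepting else None


# Acceptor for the language (ab)*: state 0 is both the start and the accepting state.
ACCEPTOR = {(0, "a"): 1, (1, "b"): 0}
# Transducer with the same states that also rewrites a -> x and b -> y.
TRANSDUCER = {(0, "a"): (1, "x"), (1, "b"): (0, "y")}

print(run_acceptor(ACCEPTOR, 0, {0}, "abab"))
print(run_transducer(TRANSDUCER, 0, {0}, "abab"))
```

Because this toy transducer is deterministic, it relates each accepted input to exactly one output; a general FST defines a relation and may pair one input with several outputs.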