Morphotactics. morphotactics (noun, uncountable): the restrictions on the ordering of morphemes. Related terms: morphology. Reference: Morphology and Computation, by Richard William Sproat.
Morphotactics. Morphotactics are the restrictions on the order in which morphemes may be combined. Etymologically, the term can be glossed as "the set of rules that define how morphemes (morpho) can touch (tactics) each other".
Examples of morphotactic rules (in English): plural ^s follows Noun; ^z cannot follow Noun. Common morphotactic models: finite-state machines and graphs are the two models most often used.
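A rule such as "plural ^s follows Noun" can be sketched as a small finite-state machine over morpheme classes. This is only a minimal illustration: the tiny lexicon, the state names, and the function `is_well_formed` are assumptions made for the example, not part of any cited source.

```python
# Hypothetical morpheme lexicon mapping each morpheme to its class (assumption).
MORPHEME_CLASS = {"cat": "Noun", "dog": "Noun", "^s": "Plural"}

# Transitions of the finite-state machine: (state, morpheme class) -> next state.
TRANSITIONS = {
    ("start", "Noun"): "noun",
    ("noun", "Plural"): "plural",
}

# A morpheme sequence is accepted if it ends in one of these states.
ACCEPTING = {"noun", "plural"}


def is_well_formed(morphemes):
    """Return True if the morpheme sequence obeys the ordering rules."""
    state = "start"
    for morpheme in morphemes:
        cls = MORPHEME_CLASS.get(morpheme)      # None for unknown morphemes
        state = TRANSITIONS.get((state, cls))   # None if no transition exists
        if state is None:
            return False
    return state in ACCEPTING
```

With these assumed rules, `is_well_formed(["cat", "^s"])` is accepted, while `is_well_formed(["^s", "cat"])` is rejected because the plural morpheme has no noun to follow.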
Lexicon. Formally, in linguistics, a lexicon is a language's inventory of lexemes. The word "lexicon" derives from the Greek λεξικόν (lexicon), neuter of λεξικός (lexikos) meaning "of or for words".[1] Linguistic theories generally regard human languages as consisting of two parts: a lexicon, essentially a catalogue of a language's words (its wordstock); and a grammar, a system of rules that allows those words to be combined into meaningful sentences. The lexicon is also thought to include bound morphemes, which cannot stand alone as words (such as most affixes).
In some analyses, compound words and certain classes of idiomatic expressions and other collocations are also considered to be part of the lexicon. Lexicalization and other mechanisms in the lexicon: the mechanisms are not mutually exclusive.[4] In complex words, constituents may be dropped. Besides word formation, there are also mechanisms of lexeme change.
Stemming. Stemming programs are commonly referred to as stemming algorithms or stemmers. Examples: a stemmer for English should identify the string "cats" (and possibly "catlike", "catty", etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem".
A stemming algorithm reduces the words "fishing", "fished", and "fisher" to the root word "fish". On the other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating the case where the stem is not itself a word or root), but "argument" and "arguments" reduce to the stem "argument". History: the first published stemmer was written by Julie Beth Lovins in 1968.[1] This paper was remarkable for its early date and had great influence on later work in this area. A later stemmer was written by Martin Porter and was published in the July 1980 issue of the journal Program.
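The idea of reducing inflected forms to a common stem can be illustrated with a deliberately naive suffix stripper. This is a sketch only: it is not the Lovins or Porter algorithm, and the suffix list, the `min_stem` threshold, and the name `strip_suffix` are assumptions chosen for the example.

```python
# Hypothetical suffix list, tried in order (assumption; real stemmers use
# many more rules plus conditions on the remaining stem).
SUFFIXES = ("ing", "ed", "er", "s")


def strip_suffix(word, suffixes=SUFFIXES, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# The three forms from the text collapse to the same stem.
stems = {strip_suffix(w) for w in ("fishing", "fished", "fisher")}
```

Here `stems` contains only "fish", and `strip_suffix("cats")` gives "cat". The sketch also shows why real stemmers need more care: `strip_suffix("stemming")` yields "stemm" rather than "stem", because no rule undoes the doubled consonant.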
Porter Stemming Algorithm. Porter Stemmer description.
Morpheme. Classification of morphemes. Free vs. bound: every morpheme can be classified as either free or bound.[3] These categories are mutually exclusive, so a given morpheme belongs to exactly one of them. Bound morphemes can be further classified as derivational or inflectional. Content vs. function: content morphemes express a concrete meaning or content, while function morphemes have more of a grammatical role. Content morphemes include free morphemes that are nouns, adverbs, adjectives, and verbs.
Additional notes: first, roots are composed of only one morpheme, while stems can be composed of more than one morpheme. Finally, monomorphemic words, which contain only one morpheme, should not be a source of confusion. Morphological analysis: in natural language processing for Korean, Japanese, Chinese and other languages, morphological analysis is the process of segmenting a sentence into a sequence of morphemes.
Combining Morphemes.
Clitic. Clitics can belong to any grammatical category, although they are commonly pronouns, determiners, or adpositions. Note that orthography is not always a good guide for distinguishing clitics from affixes: clitics may be written as separate words, but sometimes they are joined to the word on which they depend (like the Latin clitic -que, meaning "and"), or separated by special characters such as hyphens or apostrophes (like the English clitic ’s).
The word "clitic" is often used loosely for what may be better described as an affix or word. Classification: clitics fall into various categories depending on their position in relation to the word to which they are connected.[2] Proclitic: a proclitic appears before its host.[2] It is common in Romance languages. Enclitic: an enclitic appears after its host.[2] Latin: Senatus Populusque Romanus, "Senate people-and Roman" = "The Senate and people of Rome". Ancient Greek: ánthrōpoí (te) theoí te.
Inflection. [Figure: inflection of the Portuguese or Spanish lexeme for "cat", which produces the forms gato, gata, gatos and gatas. Blue represents masculine gender, pink represents feminine gender, and grey represents the form used for mixed gender; green represents plural number, while singular number is unmarked.]
In grammar, inflection or inflexion is the modification of a word to express different grammatical categories such as tense, mood, voice, aspect, person, number, gender and case.
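The four forms named in the figure caption can be generated from a stem plus gender and number markers. The sketch below assumes the stem "gat" and the suffix tables shown; the function name `inflect` and the category labels are hypothetical and cover only this regular pattern.

```python
# Assumed suffix tables for this one regular paradigm (illustration only).
GENDER_SUFFIX = {"masculine": "o", "feminine": "a"}
NUMBER_SUFFIX = {"singular": "", "plural": "s"}  # singular is unmarked


def inflect(stem, gender, number):
    """Build an inflected form as stem + gender marker + number marker."""
    return stem + GENDER_SUFFIX[gender] + NUMBER_SUFFIX[number]


# Enumerate the whole paradigm for the assumed stem "gat".
paradigm = [
    inflect("gat", gender, number)
    for number in ("singular", "plural")
    for gender in ("masculine", "feminine")
]
```

With these tables, `paradigm` lists gato, gata, gatos and gatas, with the gender marker always preceding the number marker.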
The inflection of verbs is also called conjugation, and the inflection of nouns, adjectives and pronouns is also called declension. An inflection expresses one or more grammatical categories with a prefix, suffix or infix, or another internal modification such as a vowel change.[1] For example, the Latin verb ducam, meaning "I will lead", includes the suffix -am, expressing person (first), number (singular), and tense (future).
The use of this suffix is an inflection.
Compound (linguistics). Compound formation rules vary widely across language types. In a synthetic language, the relationship between the elements of a compound may be marked with a case or other morpheme. For example, the German compound Kapitänspatent consists of the lexemes Kapitän (sea captain) and Patent (license) joined by an -s- (originally a genitive case suffix); similarly, the Latin lexeme paterfamilias contains the archaic genitive form familias of the lexeme familia (family). Conversely, in the Hebrew compound בֵּית סֵפֶר bet sefer (school), it is the head that is modified: the compound literally means "house-of book", with בַּיִת bayit (house) having entered the construct state to become בֵּית bet (house-of).
This latter pattern is common throughout the Semitic languages, though in some it is combined with an explicit genitive case, so that both parts of the compound are marked. Agglutinative languages tend to create very long words with derivational morphemes.
Derivation (linguistics). In linguistics, derivation is the process of forming a new word on the basis of an existing word, e.g. happiness and unhappy from happy, or determination from determine. It often involves the addition of a morpheme in the form of an affix, such as -ness, un- and -ation in the preceding examples.
Derivation stands in contrast to the process of inflection, which forms grammatical variants of the same word, as with determine/determines/determining/determined.[1] Derivation produces a new word (a distinct lexeme), whereas inflection does not. Derivation that results in a noun may be called nominalization. Derivation can also be contrasted with other types of word formation, such as compounding.
References: Crystal, David (1999). The Penguin Dictionary of Language. Penguin Books, England. Sobin, Nicholas (2011).
Affix. Positional categories of affixes: affixes are divided into several categories, depending on their position with reference to the stem. Prefix and suffix are extremely common terms. Infix and circumfix are less common, as they are not important in European languages.
The other terms are uncommon. Prefix and suffix may be subsumed under the term adfix, in contrast to infix. When marking text for interlinear glossing, as in the third column in the chart above, simple affixes such as prefixes and suffixes are separated from the stem with hyphens. Lexical affixes: lexical affixes (or semantic affixes) are bound elements that appear as affixes, but function as incorporated nouns within verbs and as elements of compound nouns.
Lexical affixes are relatively rare. The lexical suffixes of these languages often show little to no resemblance to free nouns with similar meanings. Compared with free nouns, lexical suffixes often have a more generic or general meaning.
Word stem. In linguistics, a stem is a part of a word. The term is used with slightly different meanings. In one usage, which is adopted in the remainder of this article, a word has a single stem, namely the part of the word that is common to all its inflected variants.[2] Thus, in this usage, all derivational affixes are part of the stem. For example, the stem of friendships is friendship, to which the inflectional suffix -s is attached. Citation forms and bound morphemes: in languages with very little inflection, such as English and Chinese, the stem is usually not distinct from the "normal" form of the word (the lemma, citation or dictionary form). However, in other languages, stems may rarely or never occur on their own.
In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. Paradigms and suppletion: tall (positive); taller (comparative); tallest (superlative).
Part-of-speech tagging. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic.
E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms. Principle: part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. This is not rare: in natural languages (as opposed to many artificial languages), a large percentage of word-forms are ambiguous.
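The reason a plain word list is not enough can be sketched with a toy lexicon in which a word form maps to every tag it may carry. The lexicon entries, the tag names, and the helper functions below are assumptions for illustration; a real tagger would then use context (rules or probabilities) to pick one tag per word.

```python
# Hypothetical toy lexicon: each word form maps to the set of tags it can carry.
LEXICON = {
    "the": {"DET"},
    "sailor": {"NOUN"},
    "dogs": {"NOUN", "VERB"},
    "hatch": {"NOUN", "VERB"},
}


def possible_tags(word):
    """Return every tag the lexicon allows for a word (empty set if unknown)."""
    return LEXICON.get(word.lower(), set())


def ambiguous_words(sentence):
    """List the words of a sentence that have more than one possible tag."""
    return [w for w in sentence.split() if len(possible_tags(w)) > 1]
```

For the sentence "The sailor dogs the hatch", `ambiguous_words` returns "dogs" and "hatch": lookup alone leaves both undecided, which is the gap a tagging algorithm has to close.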
For example, even "dogs", which is usually thought of as just a plural noun, can also be a verb: The sailor dogs the hatch.
Finite state transducer. A finite state transducer (FST) is a finite state machine with two tapes: an input tape and an output tape. This contrasts with an ordinary finite state automaton (or finite state acceptor), which has a single tape. Overview: an automaton can be said to recognize a string if we view the content of its tape as input.
In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton generates strings, which means viewing its tape as an output tape. On this view, the automaton generates a formal language, which is a set of strings. Each string-to-string finite state transducer relates the input alphabet Σ to the output alphabet Γ: it defines a relation R on Σ*×Γ*, that is, a set of pairs of input and output strings.
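Formal construction: formally, a finite transducer T is a 6-tuple (Q, Σ, Γ, I, F, δ) such that: Q is a finite set, the set of states; Σ is a finite set, the input alphabet; Γ is a finite set, the output alphabet; I is a subset of Q, the set of initial states; F is a subset of Q, the set of final states; and δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q (where ε is the empty string) is the transition relation. We can view (Q, δ) as a labeled directed graph, known as the transition graph of T: the set of vertices is Q, and (q, a, b, r) ∈ δ means that there is a labeled edge going from vertex q to vertex r. Define the extended transition relation δ* as the smallest set such that: δ ⊆ δ*; (q, ε, ε, q) ∈ δ* for all q ∈ Q; and whenever (q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ, then (q, xa, yb, s) ∈ δ*.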
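The 6-tuple view of a transducer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: every transition consumes exactly one input symbol (no ε on the input side), the empty string "" stands for ε on the output side, and the class name `FST`, the method `transduce`, and the example machine `boundary_deleter` are all hypothetical.

```python
from collections import deque


class FST:
    """Finite state transducer given by initial states, final states and delta."""

    def __init__(self, initial, final, delta):
        self.initial = set(initial)
        self.final = set(final)
        self.delta = set(delta)  # tuples (q, a, b, r): state, input, output, state

    def transduce(self, word):
        """Return the set of all output strings the machine relates to word."""
        results = set()
        seen = set()
        queue = deque((q, 0, "") for q in self.initial)
        while queue:
            config = queue.popleft()
            if config in seen:
                continue
            seen.add(config)
            state, pos, out = config
            if pos == len(word):
                if state in self.final:
                    results.add(out)
                continue
            for (q, a, b, r) in self.delta:
                if q == state and a == word[pos]:
                    queue.append((r, pos + 1, out + b))
        return results


# Example: one state that copies lowercase letters and deletes the
# morpheme-boundary symbol "^" (output "" plays the role of epsilon).
letters = "abcdefghijklmnopqrstuvwxyz"
delta = {(0, ch, ch, 0) for ch in letters} | {(0, "^", "", 0)}
boundary_deleter = FST(initial={0}, final={0}, delta=delta)
```

Under these assumptions, `boundary_deleter.transduce("cat^s")` relates the input to the single output "cats", while an input containing a symbol with no transition is related to nothing and yields the empty set. Returning a set rather than a single string reflects that a transducer defines a relation, not necessarily a function.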