Natural language processing

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.

History
The history of NLP generally starts in the 1950s, although work can be found from earlier periods. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. Up to the 1980s, most NLP systems were based on complex sets of hand-written rules.

Coreference resolution

HTML5 in mobile devices
In mobile devices, HTML5 is often used for mobile websites and mobile applications on mobile operating systems such as Firefox OS, Tizen, and Ubuntu Touch. It provides developers with tools such as Offline Web Storage, GeoLocation API, Canvas Drawing, CSS3, and many more.

Key features for mobile devices

Offline support
The AppCache and database make it possible for mobile developers to store data locally on the device, so that interruptions in connectivity do not prevent users from getting their work done.[2] Offline support helps browsers cache static pages. To provide offline support, a cache manifest file should be created to specify the offline application's resources, i.e. its pages, images, and other files needed to run offline.

CACHE MANIFEST
# Version 0.1
offline.html
/iui/iui.js
/iui/iui.css
/iui/loading.gif
/iui/toolbar.png
/iui/whiteButton.png
/images/gymnastics.jpg
/images/soccer.png
/images/gym.jpg
/images/soccer.jpg
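The manifest shown above is plain text: a "CACHE MANIFEST" header line, an optional comment, and one resource per line. As a minimal sketch, assuming the resource list is already known, such a file could be generated with a few lines of Python. The function name `build_cache_manifest` and its parameters are hypothetical and chosen only for illustration.

```python
def build_cache_manifest(resources, version):
    """Return the text of a cache manifest listing the given resources.

    Hypothetical helper: `resources` is an iterable of paths or URLs,
    `version` is any label written into a comment line.
    """
    lines = ["CACHE MANIFEST", f"# Version {version}"]
    # One resource per line, in the order given.
    lines.extend(resources)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    manifest = build_cache_manifest(
        ["offline.html", "/iui/iui.js", "/iui/iui.css", "/images/soccer.png"],
        version="0.1",
    )
    print(manifest, end="")
```

Keeping the version in a comment line means the manifest text changes whenever the version label is bumped, even if the list of resources stays the same.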

Machine translation
Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT), or interactive translation), is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another. On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.[1] The progress and potential of machine translation have been much debated throughout its history.
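To illustrate why word-for-word substitution falls short, here is a toy sketch in Python. The two lookup tables, the function names, and the Spanish example sentence are all assumptions made for this illustration; they do not come from any real MT system. The first function substitutes word by word, while the second tries the longest matching phrase first before falling back to single words.

```python
# Toy Spanish-to-English tables (hypothetical, for illustration only).
WORDS = {"la": "the", "casa": "house", "blanca": "white"}
PHRASES = {("casa", "blanca"): "white house"}


def translate_word_by_word(sentence):
    """Replace each word independently; unknown words pass through."""
    return " ".join(WORDS.get(tok, tok) for tok in sentence.lower().split())


def translate_with_phrases(sentence, max_len=3):
    """Prefer the longest known phrase at each position, else a single word."""
    tokens = sentence.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try phrases of decreasing length (at least two tokens).
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASES:
                out.append(PHRASES[phrase])
                i += n
                break
        else:
            # No phrase matched here: fall back to single-word lookup.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(translate_word_by_word("la casa blanca"))  # the house white
    print(translate_with_phrases("la casa blanca"))  # the white house
```

The word-by-word version keeps the source word order, while the phrase lookup recovers the expected English order for the one phrase it knows. Real systems learn such correspondences from corpora rather than from a hand-written table.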

Middleware
Middleware is computer software that provides services to software applications beyond those available from the operating system. It can be described as "software glue".[1] Middleware makes it easier for software developers to perform communication and input/output, so they can focus on the specific purpose of their application.

Middleware in distributed applications
The term is most commonly used for software that enables communication and management of data in distributed applications. In this more specific sense, middleware can be described as the dash in client-server, or the -to- in peer-to-peer. ObjectWeb defines middleware as: "The software layer that lies between the operating system and applications on each side of a distributed computing system in a network."

Other examples of middleware
The term middleware is used in other contexts as well.

Origins
Middleware is a relatively new addition to the computing landscape.

Morphology (linguistics)
The discipline that deals specifically with the sound changes occurring within morphemes is morphophonology. The history of morphological analysis dates back to the ancient Indian linguist Pāṇini, who formulated the 3,959 rules of Sanskrit morphology in the text Aṣṭādhyāyī by using a constituency grammar. The Greco-Roman grammatical tradition also engaged in morphological analysis. Studies in Arabic morphology, conducted by Marāḥ al-arwāḥ and Aḥmad b. ‘alī Mas‘ūd, date back to at least 1200 CE.[1] The term "morphology" was coined by August Schleicher in 1859.[2]

Here are examples from other languages of the failure of a single phonological word to coincide with a single morphological word form.

kwixʔid-i-da bəgwanəmai-χ-a q'asa-s-isi t'alwagwayu

Morpheme by morpheme translation:
kwixʔid-i-da = clubbed-PIVOT-DETERMINER
bəgwanəma-χ-a = man-ACCUSATIVE-DETERMINER
q'asa-s-is = otter-INSTRUMENTAL-3SG-POSSESSIVE
t'alwagwayu = club

"the man clubbed the otter with his club."

(Notation notes:

Comparison of application servers
Application servers are system software upon which web applications or desktop applications run. Application servers consist of web server connectors, computer programming languages, runtime libraries, database connectors, and the administration code needed to deploy, configure, manage, and connect these components on a web host. An application server runs behind a web server (e.g. Apache or Microsoft IIS) and (almost always) in front of an SQL database (e.g. PostgreSQL, MySQL, or Oracle). There are many application servers, and the choice impacts the cost, performance, reliability, scalability, and maintainability of a web application. Proprietary application servers provide system services in a well-defined but proprietary manner. An opposite but analogous case is the Java EE platform discussed below. Java EE application servers provide system services in a well-defined, open, industry standard.

Natural language generation
Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations. An NLG system can be seen as a translator that converts a computer-based representation into a natural language representation. However, the methods used to produce the final language are different from those of a compiler, due to the inherent expressivity of natural languages. NLG may be viewed as the opposite of natural language understanding: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. Simple examples are systems that generate form letters.

Strategic business unit
In business, a strategic business unit (SBU) is a profit center which focuses on product offering and market segment. SBUs typically have a discrete marketing plan, analysis of competition, and marketing campaign, even though they may be part of a larger business entity.

Commonalities
An SBU is generally defined by what it has in common, as well as the traditional aspects defined by McKinsey: separate competitors and a profitability bottom line. Commonalities include:
Revenue SBU: like marketing
Cost SBU: like operations/HR
Profit SBU: like sales, judged on net sales, not gross

Success factors
There are three factors that are generally seen as determining the success of an SBU:
the degree of autonomy given to each SBU manager,
the degree to which an SBU shares functional programs and facilities with other SBUs, and
the manner in which the corporation responds to new changes in the market.

Optical character recognition
Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded, computer-readable text. It is widely used as a form of data entry from an original paper data source, whether passport documents, invoices, bank statements, receipts, business cards, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, key data extraction, and text mining. Early versions needed to be programmed with images of each character, and worked on one font at a time.

Top-down and bottom-up design
Top-down and bottom-up are both strategies of information processing and knowledge ordering, used in a variety of fields including software, humanistic and scientific theories (see systemics), and management and organization. In practice, they can be seen as a style of thinking and teaching. A top-down approach (also known as stepwise design and in some cases used as a synonym of decomposition) is essentially the breaking down of a system to gain insight into its compositional sub-systems. In a top-down approach an overview of the system is formulated, specifying but not detailing any first-level subsystems. A bottom-up approach is the piecing together of systems to give rise to more complex systems, thus making the original systems sub-systems of the emergent system.

Product design and development
During the design and development of new products, designers and engineers rely on both a bottom-up and top-down approach.

Stemming
Stemming programs are commonly referred to as stemming algorithms or stemmers.

Examples
A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", and "fisher" to the root word, "fish". On the other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating the case where the stem is not itself a word or root) but "argument" and "arguments" reduce to the stem "argument".

History
The first published stemmer was written by Julie Beth Lovins in 1968.[1] This paper was remarkable for its early date and had great influence on later work in this area. A later stemmer was written by Martin Porter and was published in the July 1980 issue of the journal Program.
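The reductions in the examples above ("fishing", "fished", "fisher" to "fish"; "stemming", "stemmed", "stemmer" to "stem") can be approximated with a very small suffix-stripping routine. The following Python sketch is an assumption-laden illustration and is neither the Lovins nor the Porter algorithm: the suffix list, the minimum stem length, and the name `naive_stem` are all hypothetical choices.

```python
# Suffixes tried in order; "es" is listed before "s" so the longer one wins.
SUFFIXES = ("ing", "ed", "er", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip one common English suffix (a crude sketch, not Porter's stemmer)."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "stemm" -> "stem".
            # Doubled l, s, z and doubled vowels are left alone.
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word


if __name__ == "__main__":
    for w in ["cats", "fishing", "fished", "fisher", "stemming", "stemmed", "stemmer"]:
        print(w, "->", naive_stem(w))
```

A rule list this short has clear limits. For example, it leaves "argue" unchanged rather than producing "argu", and it knows nothing about derivational suffixes such as "-like". Published stemmers use much larger, ordered rule sets with conditions on the remaining stem.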