Natural language processing
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input; others involve natural language generation.

History
The history of NLP generally starts in the 1950s, although work can be found from earlier periods. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. Up to the 1980s, most NLP systems were based on complex sets of hand-written rules.

NLP using machine learning
Major tasks in NLP
Parsing
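
As a hedged illustration of such tasks, the minimal sketch below tokenizes a sentence and tags parts of speech with NLTK, a common preprocessing step before parsing; the library choice and the example sentence are assumptions, not part of the article above.

# Minimal sketch (assumed library: NLTK) of two basic NLP tasks:
# tokenization and part-of-speech tagging, common precursors to parsing.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

sentence = "The Georgetown experiment translated sixty Russian sentences into English."
tokens = nltk.word_tokenize(sentence)   # ['The', 'Georgetown', 'experiment', ...]
print(nltk.pos_tag(tokens))             # [('The', 'DT'), ('Georgetown', 'NNP'), ...]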

Cognitive model
A cognitive model is an approximation to animal cognitive processes (predominantly human) for the purposes of comprehension and prediction. Cognitive models can be developed within or without a cognitive architecture, though the two are not always easily distinguishable.

History
Cognitive modeling historically developed within cognitive psychology/cognitive science (including human factors), and has received contributions from the fields of machine learning and artificial intelligence, to name a few.

Box-and-arrow models
A number of key terms are used to describe the processes involved in the perception, storage, and production of speech.

Computational models
A computational model is a mathematical model in computational science that requires extensive computational resources to study the behavior of a complex system by computer simulation.

Symbolic
Expressed in characters, usually nonnumeric, that require translation before they can be used.

Subsymbolic

Web sémantique
W3C logo for the Semantic Web
The Semantic Web (French: Web sémantique, or toile sémantique[1]) is an extension of the Web standardized by the World Wide Web Consortium (W3C)[2]. These standards encourage the use of normalized data formats and exchange protocols on the Web, building on the Resource Description Framework (RDF) model. Some describe the Semantic Web as Web 3.0. While its detractors have questioned its feasibility, its proponents argue that applications built by researchers in industry, biology, and the humanities have already proved the validity of this new concept[5].

History
Tim Berners-Lee originally expressed his vision of the Semantic Web as follows:
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web — the content, links, and transactions between people and computers.
— Tim Berners-Lee, Weaving the Web[13]
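
To make the RDF model concrete, here is a minimal sketch, assuming the rdflib Python library and a made-up example.org namespace; none of this comes from the article above.

# Minimal sketch (assumed library: rdflib) of the RDF model the W3C
# standards build on: data expressed as subject-predicate-object triples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
EX = Namespace("http://example.org/")          # hypothetical namespace

tim = URIRef("http://example.org/TimBernersLee")
g.add((tim, RDF.type, FOAF.Person))
g.add((tim, FOAF.name, Literal("Tim Berners-Lee")))
g.add((tim, EX.authored, Literal("Weaving the Web")))

print(g.serialize(format="turtle"))            # human-readable Turtle output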

ECML/PKDD'02 Tutorial on Text Mining and Internet Content Filtering
José María Gómez Hidalgo, Departamento de Inteligencia Artificial, Universidad Europea de Madrid
In recent years, we have witnessed an impressive growth in the availability of information in electronic format, mostly in the form of text, due to the Internet and the increasing number and size of digital and corporate libraries. Text mining (TM) is an emerging research and development field that addresses the information overload problem, borrowing techniques from data mining, machine learning, information retrieval, natural-language understanding, case-based reasoning, statistics, and knowledge management to help people gain rapid insight into large quantities of semi-structured or unstructured text. A prototypical application of TM techniques is Internet information filtering.

Outline
The goal of this tutorial is to make the audience familiar with the emerging area of Text Mining in a practical way. The tutorial is divided into two main parts. In particular, the tutorial will cover the following topics: 1.
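
As one concrete example of the machine-learning techniques TM borrows for content filtering, here is a minimal sketch assuming scikit-learn and a few made-up training documents; it is an illustration only, not material from the tutorial.

# Minimal sketch (assumed library: scikit-learn) of text filtering:
# bag-of-words features + naive Bayes, a common baseline in text mining.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = unwanted content, 0 = acceptable content.
docs = [
    "cheap pills buy now limited offer",
    "meeting agenda for the project review",
    "win money fast click this link",
    "please find attached the quarterly report",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)

print(clf.predict(["click now to win a cheap offer"]))   # likely [1]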

Stochastic process
Stock market fluctuations have been modeled by stochastic processes. In probability theory, a stochastic process /stoʊˈkæstɪk/, or sometimes random process (widely used), is a collection of random variables; this is often used to represent the evolution of some random value, or system, over time. This is the probabilistic counterpart to a deterministic process (or deterministic system). Instead of describing a process which can only evolve in one way (as in the case, for example, of solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve.

Formal definition and basic properties
Definition
Given a probability space $(\Omega, \mathcal{F}, P)$ and a measurable space $(S, \Sigma)$, an S-valued stochastic process is a collection of S-valued random variables on $\Omega$, indexed by a totally ordered set $T$ ("time"); that is, a collection $\{ X_t : t \in T \}$ where each $X_t$ is an S-valued random variable on $\Omega$.
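
To illustrate the indeterminacy described above, here is a minimal sketch, assuming NumPy, that simulates a few sample paths of a simple random walk from the same starting point; the parameters are arbitrary and purely illustrative.

# Minimal sketch (assumed library: NumPy): several sample paths of a
# simple random walk started from the same initial condition X_0 = 0.
import numpy as np

rng = np.random.default_rng(seed=0)
n_steps, n_paths = 100, 5

steps = rng.choice([-1, 1], size=(n_paths, n_steps))   # +-1 increments
paths = np.cumsum(steps, axis=1)                        # X_t = sum of increments

# Same starting point, yet every trajectory ends somewhere different.
print(paths[:, -1])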

montylingua :: a free, commonsense-enriched natural language understander

Recent bugfixes
Version 2.1 (6 Aug 2004) - includes the new MontyNLGenerator component, which generates sentences and summaries.
Version 2.0.1 - fixes an API bug in version 2.0 which prevented the Java API from being callable.

What is MontyLingua?
MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Version 2.0 is substantially FASTER, MORE ACCURATE, and MORE RELIABLE than version 1.3.1. MontyLingua differs from other natural language processing tools because: MontyLingua performs the following tasks over text: MontyTokenizer - tokenizes raw English text (sensitive to abbreviations) and resolves contractions, e.g.
* free for non-commercial use; please see the MontyLingua Version 2.0 License Terms of Use.
Author: Hugo Liu <hugo@media.mit.edu>
Project Page: <

Documentation
New in version 2.0 (29 Jul 2004)

Download MontyLingua
READ THIS if you are running ML on Mac OS X, or Unix
William W. L.

Probability matching
Probability matching is a suboptimal decision strategy in which predictions of class membership are proportional to the class base rates. Thus, if in the training set positive examples are observed 60% of the time and negative examples are observed 40% of the time, then an observer using a probability-matching strategy will predict (for unlabeled examples) a class label of "positive" on 60% of instances and a class label of "negative" on 40% of instances. The optimal Bayesian decision strategy (to maximize the number of correct predictions, see Duda, Hart & Stork (2001)) in such a case is to always predict "positive" (i.e., predict the majority category in the absence of other information), which is correct 60% of the time, rather than matching, which is correct only 52% of the time: where $p$ is the probability of a positive realization, the expected accuracy of matching is $p^2 + (1-p)^2$, here $0.6^2 + 0.4^2 = 0.52$.
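
A minimal sketch of this comparison, assuming NumPy and the 60/40 base rates from the example above; the simulation is illustrative and not part of the article.

# Minimal sketch (assumed library: NumPy): accuracy of probability matching
# versus always predicting the majority class when P(positive) = 0.6.
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.6
n = 100_000

labels = rng.random(n) < p                    # true class: positive with prob p
matching_preds = rng.random(n) < p            # predict positive on ~60% of trials
majority_preds = np.ones(n, dtype=bool)       # always predict the majority class

print("matching accuracy:", (matching_preds == labels).mean())   # ~= 0.52
print("majority accuracy:", (majority_preds == labels).mean())   # ~= 0.60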

Free and linked data: RDF export of the data
Keywords: Data, Linked Data, RDF, Export, XML
Open Food Facts data was already open and free (open data, as it is called), and now it is also linked. Yes, both free and linked at the same time! Free, because the open licence allows the data to be used by anyone and for any purpose; linked, because the data is now connected not only internally but also to other datasets, via DBPedia. In plain terms: there is now a large file containing Open Food Facts data on products, their ingredients, and their nutritional composition. Thanks to this file, OFF data is now part of what is called the "Web of Data". Soon, Open Food Facts data cross-referenced with many other datasets?
Technical details: the RDF export is here: (in XML/RDF)
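
A minimal sketch of how such an RDF/XML export could be consumed, assuming rdflib and a hypothetical local copy of the dump; the file name and the property URI are made-up placeholders, not taken from the actual export.

# Minimal sketch (assumed library: rdflib) of consuming an RDF/XML dump.
# The file name and the property URI are hypothetical placeholders.
from rdflib import Graph

g = Graph()
g.parse("off_export.rdf", format="xml")       # hypothetical local copy of the dump

query = """
SELECT ?product ?name WHERE {
    ?product <http://example.org/off/name> ?name .
} LIMIT 10
"""
for product, name in g.query(query):
    print(product, name)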

ConceptNet
What is ConceptNet?
ConceptNet is a freely available commonsense knowledgebase and natural-language-processing toolkit which supports many practical textual-reasoning tasks over real-world documents right out of the box (without additional statistical training), including topic-jisting (e.g. a news article containing the concepts "gun," "convenience store," "demand money" and "make getaway" might suggest the topics "robbery" and "crime"), affect-sensing (e.g. this email is sad and angry), analogy-making (e.g. "scissors," "razor," "nail clipper," and "sword" are perhaps like a "knife" because they are all "sharp," and can be used to "cut something"), text summarization, contextual expansion, causal projection, cold document classification, and other context-oriented inferences. The ConceptNet knowledgebase is a semantic network presently available in two versions: concise (200,000 assertions) and full (1.6 million assertions).

Papers about ConceptNet:
Download ConceptNet
S.
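
The sketch below is a toy illustration, not ConceptNet's actual API or data format, of how commonsense assertions in a semantic network might be stored and queried for an analogy-style lookup; the concepts and relations are chosen only to echo the examples above.

# Toy illustration of a semantic network of commonsense assertions.
# This is NOT ConceptNet's actual API or schema; the assertions here
# are invented purely for illustration.
from collections import defaultdict

assertions = [
    ("scissors", "CapableOf", "cut something"),
    ("knife", "CapableOf", "cut something"),
    ("knife", "HasProperty", "sharp"),
    ("scissors", "HasProperty", "sharp"),
    ("gun", "UsedFor", "robbery"),
]

# Index assertions by concept so simple analogy-style lookups are possible.
by_concept = defaultdict(list)
for subj, rel, obj in assertions:
    by_concept[subj].append((rel, obj))

# Concepts that share a (relation, object) pair with "knife" are "like" it.
knife_facts = set(by_concept["knife"])
similar = {c for c, facts in by_concept.items()
           if c != "knife" and knife_facts & set(facts)}
print(similar)   # {'scissors'}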

ACT-R
Most of ACT-R's basic assumptions are also inspired by progress in cognitive neuroscience, and ACT-R can be seen and described as a way of specifying how the brain itself is organized in a way that enables individual processing modules to produce cognition.

Inspiration
What ACT-R looks like
This means that any researcher may download the ACT-R code from the ACT-R website, load it into a Lisp distribution, and gain full access to the theory in the form of the ACT-R interpreter. This also enables researchers to specify models of human cognition in the form of a script in the ACT-R language. The language primitives and data types are designed to reflect the theoretical assumptions about human cognition. Like a programming language, ACT-R is a framework: for different tasks (e.g., Tower of Hanoi, memory for text or for lists of words, language comprehension, communication, aircraft control), researchers create "models" (i.e., programs) in ACT-R.

Brief outline

Le modèle DISC (The DISC model)
27 January 2006
How can we come to an arrangement with other people so as to work under the best possible conditions, even and especially if we are different? William Marston thought that human beings behave along two axes: whether they tend to be rather "active" or "passive," and whether they perceive a human or factual environment as hostile or favourable. Behind each of these styles, the theory claims, lie different fundamental needs and a different value system. The Dominant type feels the need to make decisions and to reach his goals; his main drivers are performance and responsibility. At the opposite end, the Steady type above all wants to be appreciated and accepted; teamwork and dialogue are what get him moving. The Influential type feels a pressing need to be recognized and congratulated.

The DISC reading grid
Using the DISC model with the situational leadership model
