NLP

Software similar to LingPipe (8 in total) _fantastic. LingPipe is an open-source Java toolkit for natural language processing.

LingPipe already offers a rich set of features, including APIs for topic classification, named entity recognition, part-of-speech tagging, sentence detection, query spell checking, interesting phrase detection, clustering, character language modeling, MEDLINE download, parsing and indexing, database text mining, Chinese word segmentation, sentiment analysis, and language identification.

Software similar to LingPipe (8 in total)

Chinese NLP toolkit FudanNLP: FudanNLP is a toolkit developed mainly for Chinese natural language processing. It also includes the machine learning algorithms and datasets used to implement these tasks.

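FudanNLP currently implements the following: Chinese processing tools: Chinese word segmentation, part-of-speech tagging ...

Natural language processing tool OpenNLP: OpenNLP is a machine learning toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, and parsing.

Natural Language Processing and Text Analytics.

Learning LingPipe: Spelling Correction (1) - fancyerII's column. LingPipe is a solid NLP tool, an NLP system developed by Alias-i. It contains many models commonly used in NLP, such as maximum entropy (ME), CRFs, LDA, and SVMs, and you can use them for many NLP tasks: word segmentation, part-of-speech tagging, sentiment analysis, text classification, and so on.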

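Even better, the tool is thoroughly documented (perhaps the developers really did intend it as a commercial system, unlike many systems that researchers release merely to demonstrate a paper). So I am using it to review material I studied before; time permitting, I will also try other NLP tools, such as Mallet and some of the Stanford NLP Group's tools.

This post discusses spell correction. Spell correction is also called spell checking; a related task is spell suggestion. Let's use common search engines such as Google as examples to understand the two tasks.

Spell suggestion: while the user is typing, suggest possible words or phrases.

Spell correction: the user's input may contain errors. For example, if a query returns very few results, we infer that the input may be misspelled. Searching Google for "brec baldwin", for instance, makes Google ask: Did you mean "breck baldwin"?

Besides search engines, spell correction has other uses; text editors such as Word also offer correction features.

The two tasks are similar, but spell suggestion is more real-time: it proposes likely results while the user is still typing, which helps prevent spelling errors. Spell correction, by comparison, is wisdom after the fact: it usually relies on the query log or the number of search results to judge whether the user has probably made a spelling error.

Note the subtle differences between Baidu, Google Chinese, and Google English. Type anazon (presumably not a word; the correct spelling is amazon). Baidu's suggestion is really just prefix matching and cannot correct the error, whereas Google's is somewhat "smarter". For Chinese pinyin, Baidu and Google behave similarly, except that Google Chinese suggests Chinese characters while Google English suggests pinyin.

By now you should have a feel for how the problem is defined and how some search engines handle it.

Just extract lingpipe-3.9.3.tar.gz:
|--- src   The source code. Compiling it depends on nothing else (the test code, of course, depends on JUnit4).
… easier to browse.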
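To make the difference between the two tasks concrete, here is a small plain-Java sketch. It is not LingPipe's spell-checking API. It assumes a hypothetical in-memory query log (`QUERY_LOG`, with made-up counts), uses pure prefix matching for suggestion (the Baidu-style behavior described for anazon), and uses Levenshtein edit distance plus log frequency for after-the-fact correction. The names `suggestByPrefix`, `didYouMean`, `minCount` and `maxDist` are illustrative assumptions.

```java
import java.util.*;

public class SpellDemo {
    // Hypothetical query log: query -> number of times it was searched (made-up counts).
    static final Map<String, Integer> QUERY_LOG = Map.of(
            "amazon", 1000,
            "amazon prime", 400,
            "anaconda", 80,
            "breck baldwin", 50);

    // Spell suggestion as pure prefix matching: complete what the user has typed so far,
    // most frequent logged queries first (ties broken alphabetically).
    static List<String> suggestByPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (String q : QUERY_LOG.keySet()) {
            if (q.startsWith(prefix)) out.add(q);
        }
        out.sort((x, y) -> {
            int byCount = Integer.compare(QUERY_LOG.get(y), QUERY_LOG.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return out;
    }

    // Classic Levenshtein distance: insertions, deletions and substitutions all cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Spell correction after the fact: a query seen fewer than minCount times is treated
    // as a likely misspelling, and the closest *different* logged query within maxDist
    // edits (most frequent on ties) is offered as "Did you mean".
    static Optional<String> didYouMean(String query, int minCount, int maxDist) {
        if (QUERY_LOG.getOrDefault(query, 0) >= minCount) return Optional.empty();
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : QUERY_LOG.entrySet()) {
            int d = editDistance(query, e.getKey());
            if (d == 0 || d > maxDist) continue;
            if (d < bestDist || (d == bestDist && e.getValue() > bestCount)) {
                best = e.getKey();
                bestDist = d;
                bestCount = e.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(suggestByPrefix("ama"));              // [amazon, amazon prime]
        System.out.println(suggestByPrefix("anazon"));           // []
        System.out.println(didYouMean("anazon", 10, 2));         // Optional[amazon]
        System.out.println(didYouMean("brec baldwin", 10, 2));   // Optional[breck baldwin]
        System.out.println(didYouMean("amazon", 10, 2));         // Optional.empty
    }
}
```

Prefix matching returns nothing for anazon, while the edit-distance lookup recovers amazon, and a query that is already frequent in the log is left alone. Ranking candidates by raw log counts is a simplifying choice made for this sketch only.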

Ontology (information science)

In computer science and information science, an ontology formally represents knowledge as a hierarchy of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts.[1][2] Ontologies are the structural frameworks for organizing information and are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework. The term ontology has its origin in philosophy and has been applied in many different ways. The word element onto- comes from the Greek ὤν, ὄντος ("being", "that which is"), present participle of the verb εἰμί ("be").
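As a rough illustration of "a hierarchy of concepts within a domain" with properties, the Java sketch below stores each concept's is-a parent and its directly declared properties, then answers subsumption queries and collects inherited properties. The class `TinyOntology` and the publication concepts are hypothetical, and the sketch assumes single inheritance and an acyclic hierarchy; real ontology languages are considerably richer.

```java
import java.util.*;

public class TinyOntology {
    // Each concept maps to its direct parent (is-a link); null marks a root concept.
    // Assumes the hierarchy is acyclic and each concept has at most one parent.
    private final Map<String, String> parentOf = new HashMap<>();
    // Properties declared directly on each concept (not including inherited ones).
    private final Map<String, Set<String>> propertiesOf = new HashMap<>();

    void addConcept(String concept, String parent, String... properties) {
        parentOf.put(concept, parent);
        propertiesOf.put(concept, new LinkedHashSet<>(Arrays.asList(properties)));
    }

    // True if `concept` equals `ancestor` or reaches it by following is-a links upward.
    boolean isA(String concept, String ancestor) {
        for (String c = concept; c != null; c = parentOf.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // Properties are inherited down the hierarchy: collect them from the concept and all ancestors.
    Set<String> allProperties(String concept) {
        Set<String> result = new TreeSet<>();
        for (String c = concept; c != null; c = parentOf.get(c)) {
            result.addAll(propertiesOf.getOrDefault(c, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        TinyOntology o = new TinyOntology();
        o.addConcept("Publication", null, "title", "year");
        o.addConcept("Article", "Publication", "journal");
        o.addConcept("JournalArticle", "Article", "volume");
        System.out.println(o.isA("JournalArticle", "Publication")); // true
        System.out.println(o.isA("Publication", "Article"));        // false
        System.out.println(o.allProperties("JournalArticle"));      // [journal, title, volume, year]
    }
}
```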

According to Gruber (1993): Common components of ontologies include:

What is an Ontology? This definition was originally proposed in 1992 and posted as shown below.

See an updated definition of ontology (computer science) that accounts for the literature before and after that posting, with links to further readings. Short answer: An ontology is a specification of a conceptualization. The word "ontology" seems to generate a lot of controversy in discussions about AI.

It has a long history in philosophy, in which it refers to the subject of existence. In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. What is important is what an ontology is for. This definition is given in the article: T. A more detailed description is given in T. With an excerpt attached. A body of formally represented knowledge is based on a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them (Genesereth & Nilsson, 1987).
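The distinction between a conceptualization (the objects assumed to exist and the relationships that hold among them) and a specification of it (the declared vocabulary) can be sketched in Java. This is a toy model under stated assumptions, not Gruber's or Genesereth & Nilsson's formalism: `Conceptualization` is a hypothetical class, the "specification" is reduced to relation names with arities, and the blocks-world objects are made up.

```java
import java.util.*;

public class Conceptualization {
    // Declared vocabulary: relation name -> arity. A toy stand-in for the "specification".
    private final Map<String, Integer> vocabulary = new HashMap<>();
    // Objects assumed to exist in the area of interest.
    private final Set<String> objects = new HashSet<>();
    // Relationships that hold among those objects, stored as tuples per relation name.
    private final Map<String, Set<List<String>>> facts = new HashMap<>();

    void declareRelation(String name, int arity) {
        vocabulary.put(name, arity);
    }

    void addObject(String object) {
        objects.add(object);
    }

    // Records a fact only if it uses a declared relation, with the declared arity, over known objects.
    void assertFact(String relation, String... args) {
        Integer arity = vocabulary.get(relation);
        if (arity == null || arity != args.length) {
            throw new IllegalArgumentException("undeclared relation or wrong arity: " + relation);
        }
        for (String a : args) {
            if (!objects.contains(a)) throw new IllegalArgumentException("unknown object: " + a);
        }
        facts.computeIfAbsent(relation, k -> new HashSet<>()).add(List.of(args));
    }

    // True if the given tuple was recorded for the relation.
    boolean holds(String relation, String... args) {
        return facts.getOrDefault(relation, Set.of()).contains(List.of(args));
    }

    public static void main(String[] args) {
        Conceptualization c = new Conceptualization();
        c.declareRelation("on", 2);
        c.declareRelation("clear", 1);
        for (String block : List.of("a", "b", "c")) c.addObject(block);
        c.assertFact("on", "a", "b");
        c.assertFact("clear", "a");
        System.out.println(c.holds("on", "a", "b")); // true
        System.out.println(c.holds("on", "b", "a")); // false
    }
}
```

Rejecting facts that use undeclared relations is what lets the vocabulary act as a (very weak) specification in this sketch; a real ontology would typically also carry definitions and axioms constraining how its terms may be interpreted.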

Notes. A Translation Approach to Portable Ontology Specifications. Search, don't integrate. Ontology.