
Semantic web


Phonetic tehtar for English. Tolkien's Elvish writing system (Tengwar) can be used to represent a variety of languages, including English. The standard set of Tengwar letters is repurposed to represent the sounds of each particular language. The sound of a given letter is mostly consistent across languages, but since the phonemes used in each language vary, the character-to-sound mappings are necessarily somewhat language-specific. In the Tengwar writing system there are 24 standard consonant letters, each of which may be modified by one of the accents called tehtar (signs), which represent vowel sounds. The tehtar are markings written above (and sometimes below) one of the tengwar consonants. In English there are a number of common vowel sounds not represented directly by one of the five vowels of the Latin alphabet.
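As a side note, the five basic vowels map onto conventional tehtar shapes; a minimal lookup-table sketch in Java (the class name and informal descriptions are mine, and the exact shapes vary between Tengwar modes):

    import java.util.Map;

    public class TehtarSketch {
        // Illustrative only; descriptions are informal, and the shapes
        // (especially for o/u) differ between Tengwar modes.
        static final Map<Character, String> BASIC_TEHTAR = Map.of(
                'a', "three dots above the consonant",
                'e', "acute accent above the consonant",
                'i', "single dot above the consonant",
                'o', "curl above the consonant (opening direction varies by mode)",
                'u', "curl above the consonant (mirrored relative to o)");

        public static void main(String[] args) {
            for (char v : "aeiou".toCharArray()) {
                System.out.println(v + " -> " + BASIC_TEHTAR.get(v));
            }
        }
    }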

An obvious approach to writing in Elvish is to map English vowels directly to their tehtar counterparts; this is the phonetic mode. BMC Genomics | Full text | Automated extraction and semantic analysis of mutation impacts from the biomedical literature. In order to comprehensively extract mutation impacts, the detection of several named entities and their relations, in particular mutations and protein properties, is required. As an example, consider the following text segment (formatting used in the original: bold face: Mutation; underlined: Impact expression; underlined non-italics: Protein property; underlined bold: Physical quantity) [17]: "Several single mutants (Q15K, Q15R, W37K, and W37R), double mutants (Q15K-W37K, Q15K-W37R, Q15R-W37K, and Q15R-W37R), and triple mutants (Q15K-D36A-W37R and Q15K-D36S-W37R) were prepared and expressed as glutathione S-transferase (GST) fusion proteins in Escherichia coli and purified by GSH-agarose affinity chromatography.

Mutant Q15K-W37R and mutant Q15R-W37R showed comparable activity for NAD and NADP with an increase in activity of nearly 3-fold over that of the wild type." In this example, we need to extract increase as an impact that is caused "comparably" by two mutation pairs, Q15K-W37R and Q15R-W37R. Open Mutation Miner (OMM) | semanticsoftware.info. Overview: mutations as sources of evolution have long been the focus of attention in the biomedical literature. Access to mutation information and its impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology.
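Before looking at OMM's components, here is a minimal, purely illustrative sketch (not OMM's actual implementation) of detecting point-mutation mentions such as Q15K or W37R with a regular expression in Java:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MutationMentionFinder {
        // Wild-type amino acid, position, mutant amino acid, e.g. Q15K or W37R.
        private static final Pattern POINT_MUTATION =
                Pattern.compile("\\b([ACDEFGHIKLMNPQRSTVWY])(\\d+)([ACDEFGHIKLMNPQRSTVWY])\\b");

        public static List<String> find(String text) {
            List<String> mentions = new ArrayList<>();
            Matcher m = POINT_MUTATION.matcher(text);
            while (m.find()) {
                mentions.add(m.group());
            }
            return mentions;
        }

        public static void main(String[] args) {
            // Prints [Q15K, W37K, Q15K, W37R]
            System.out.println(find("double mutants (Q15K-W37K, Q15K-W37R) were prepared"));
        }
    }

Real systems also have to resolve such mentions against the protein sequence and link them to the impact expressions, which is where the named-entity and relation extraction described above comes in.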

Returning to the example, we want to automatically extract increase as an impact that is caused "comparably" by two mutation pairs (mutation series comprising two SNPs each), Q15K-W37R and Q15R-W37R. OMM Impacts Annotation Example. Open Mutation Miner (OMM) is a component-based system that integrates multiple sub-systems, including OMM Mutations for mutation series detection. The OpenCCG Homepage. SolrUIMA. The Solr 3.1 UIMA contrib enables enhancing Solr documents using the Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of Analysis Engines which incrementally add metadata to the document via annotations. SolrUIMA UpdateRequestProcessor: the SolrUIMA UpdateRequestProcessor is a custom UpdateRequestProcessor that takes documents being indexed, sends them through a UIMA pipeline, and then returns the documents enriched with the specified metadata.
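For orientation, running a UIMA Analysis Engine from plain Java (outside Solr) looks roughly like the following sketch; the descriptor file name is hypothetical:

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.analysis_engine.AnalysisEngineDescription;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;
    import org.apache.uima.util.XMLInputSource;

    public class UimaPipelineSketch {
        public static void main(String[] args) throws Exception {
            // Parse an analysis engine descriptor (file name is illustrative only).
            XMLInputSource in = new XMLInputSource("MyAnnotatorDescriptor.xml");
            AnalysisEngineDescription desc =
                    UIMAFramework.getXMLParser().parseAnalysisEngineDescription(in);
            AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc);

            // Feed a document through the pipeline; annotators add metadata as annotations.
            JCas jcas = ae.newJCas();
            jcas.setDocumentText("Several single mutants (Q15K, Q15R) were prepared.");
            ae.process(jcas);

            // Inspect the annotations the pipeline added.
            for (Annotation a : jcas.getAnnotationIndex()) {
                System.out.println(a.getType().getName() + ": " + a.getCoveredText());
            }
            ae.destroy();
        }
    }

SolrUIMA wires exactly this kind of pipeline into the indexing path, mapping the resulting annotations onto Solr fields.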

Installation: go to dev/solr/contrib/uima and run 'ant clean dist'; take the resulting apache-solr-uima-4.0-SNAPSHOT.jar together with the jars under the dev/solr/contrib/uima/lib directory and copy everything into one of the lib directories of your Solr instance (defined inside the solrconfig.xml). Configuration: all of the SolrUIMA configuration is placed inside a <uimaConfig> element in solrconfig.xml; see SOLR-2129 for the full configuration. Other sections cover the UIMA components used, using other UIMA components, and Solrcas. Automatic metadata assignment for enterprise taxonomy and content management. Competition. LingPipe's Competition: on this page, we break our competition down into academic toolkits and industrial toolkits. We only consider software that is available for linguistic processing, not companies that rely on linguistic processing in an application but do not sell that technology.

How does LingPipe compare to the offerings below? A few key points to keep in mind as you browse the offerings: we are a Geek2Geek business; nearly every sale we have ever made was started by a programmer with a problem to solve. Academic and Open Source Competition: the following is a list of ongoing large-scale, multi-function natural language toolkits that are built and distributed by academics.

ABNER is a statistical named entity recognizer "using linear-chain conditional random fields (CRFs) with a variety of orthographic and contextual features." BANNER is a named entity recognition system, primarily intended for biomedical text. Other toolkits include FreeLing, The Dragon Toolkit, Ellogon, Apache Lucene, Mahout, MaltParser, and MinorThird. dk.brics.automaton - finite-state automata and regular expressions for Java. This Java package contains a DFA/NFA (finite-state automata) implementation with a Unicode alphabet (UTF16) and support for the standard regular expression operations (concatenation, union, Kleene star) and a number of non-standard ones (intersection, complement, etc.).

In contrast to many other automaton/regexp packages, this package is fast, compact, and implements real, unrestricted regular operations. It uses a symbolic representation based on intervals of Unicode characters. The full source code and documentation are available under the BSD license. View the online javadoc API specifications. The central classes are Automaton and RegExp.
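A brief usage sketch (assuming the dk.brics.automaton jar is on the classpath; the patterns are arbitrary examples):

    import dk.brics.automaton.Automaton;
    import dk.brics.automaton.RegExp;
    import dk.brics.automaton.RunAutomaton;

    public class BricsAutomatonSketch {
        public static void main(String[] args) {
            // Build automata from regular expressions.
            Automaton identifiers = new RegExp("[A-Za-z][A-Za-z0-9]*").toAutomaton();
            Automaton keywords = new RegExp("if|else|while").toAutomaton();

            // Non-standard operation: identifiers that are not keywords.
            Automaton nonKeywordIds = identifiers.minus(keywords);

            // Compile to a table-driven matcher for fast repeated matching.
            RunAutomaton matcher = new RunAutomaton(nonKeywordIds);
            System.out.println(matcher.run("counter")); // true
            System.out.println(matcher.run("while"));   // false
        }
    }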

Mats Lindh » Blog Archive » Modifying a Lucene Snowball Stemmer. This post is written for advanced users. If you do not know what SVN (Subversion) is, or if you're not ready to get your hands dirty, there might be something more interesting to read on Wikipedia. As usual. This is an introduction to getting a Lucene development environment running, then a Solr environment, and lastly to creating your own Snowball stemmer. Read on if that seems interesting. When indexing data in Lucene (a full-text document search library) and Solr (which uses Lucene), you may provide a stemmer (a piece of code responsible for "normalizing" words to their common form: horses => horse, indexing => index, etc.) to give your users better and more relevant results when they search. Through Snowball, Lucene is able to provide a nice collection of default stemmers for several languages, and these work as they should in most cases.
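Calling one of the Snowball-generated stemmers directly from Java might look like the following sketch (assuming Lucene's snowball jar, which bundles the org.tartarus.snowball classes, is on the classpath):

    import org.tartarus.snowball.ext.EnglishStemmer;

    public class SnowballSketch {
        public static void main(String[] args) {
            EnglishStemmer stemmer = new EnglishStemmer();
            for (String word : new String[] {"horses", "indexing", "searches"}) {
                stemmer.setCurrent(word);   // word to be stemmed
                stemmer.stem();             // run the Snowball algorithm in place
                System.out.println(word + " => " + stemmer.getCurrent());
            }
        }
    }

The same classes back Lucene's SnowballFilter, which is how the stemmer ends up in an analysis chain; the blog post goes on to modify the Norwegian stemmer in this family.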

One: elektriker; several: elektrikere; those: elektrikerene. Let's compare this to another word, such as "Bus". So far, everything has gone as planned. Understanding Language Resource Components. This topic is organized as follows: About Language Resources. Figure 1 illustrates the role of language resources for Windows Search during the index creation process. The first table lists the actions and corresponding results for an example sentence; the second lists the actions and corresponding results for the query "apples, oranges, and pears."

The expanded query terms increase the likelihood that the query will find documents that match the intent of the original query. Text that the word breaker or stemmer generates at query time is not stored on disk. Word Breaking and Stemming: some languages require that inflected terms be generated at both index time and query time, for both standard and variant inflections.

Normalization and Noise Words: noise words, also known as stop words, are words that are not significant indicators for content. Noise words act as placeholders in phrase queries. Data Set « The Electronic Discovery Reference Model. EDRM Forensic Files Testing Project. The EDRM Data Set Project provides industry-standard, reference data sets of electronically stored information (ESI) and software files that can be used to test various aspects of e-discovery software and services. These files may contain viruses, as can be the case with any set of files collected during discovery; appropriate caution should be used when handling the files. These files may also contain personally identifiable information, in spite of efforts to remove that information. This initiative collects, evaluates, and publishes ESI data sets for use in testing e-discovery software and services. EDRM Enron Email v1 Data Set: an updated set of Enron e-mail messages and attachments.

EDRM File Format Data Set: 381 files covering 200 file formats. EDRM Internationalization Data Set: a snapshot of selected Ubuntu localization mailing list archives covering 23 languages in 724 MB of email. Ron Bekkerman. Ron Bekkerman joined the Department of Information and Knowledge Management as a Senior Lecturer (Assistant Professor) in October 2013. In 2012-2013 he served as Chief Data Officer of Carmel Ventures, the leading Israeli VC fund. He is an Advisory Board member of a number of Israeli startup companies. Prior to that, Ron worked as a Senior Research Scientist at LinkedIn, where he was among the founding members of LinkedIn's Data Science team. Before LinkedIn, he was a Research Scientist at HP Labs in Palo Alto, CA. Over the past 14 years, Ron's research has spanned the areas of Data Mining and Machine Learning, which aim to create novel models and algorithms suitable for a variety of practical tasks in statistical data analysis.

Ron's earlier research work was on text categorization. While working at LinkedIn, Ron was involved in connecting data analytics and business decision making. Research Projects: Improving Clustering Stability and Accuracy using Deliberation (2009). Andrew T. Fiore :: About me. Finding Hebrew lemmas (HebMorph, part 2) « Code972. As shown in the previous post, building a Hebrew-aware search engine is not trivial. Several attempts (mainly commercial) have been made to deal with this. In this post I'm going to try to draw a complete picture of what they did, and show other routes that may exist. In the next post I'll discuss HebMorph itself. Kudos to Robert Muir for several fruitful discussions and ideas.

A Few Definitions: in the IR world, Precision and Recall are numbers used to measure the retrieval quality of a given search engine. What to Index? This is the most important question of all. The raw term we have already ruled out as a possibility. There may be different variations of these, which we'll discuss later in this post.
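For reference, the two measures have standard definitions; a generic sketch (not HebMorph-specific code):

    public class RetrievalMetrics {
        // Precision: fraction of retrieved documents that are relevant.
        static double precision(int relevantRetrieved, int retrieved) {
            return retrieved == 0 ? 0.0 : (double) relevantRetrieved / retrieved;
        }

        // Recall: fraction of all relevant documents that were retrieved.
        static double recall(int relevantRetrieved, int relevantTotal) {
            return relevantTotal == 0 ? 0.0 : (double) relevantRetrieved / relevantTotal;
        }

        public static void main(String[] args) {
            // Example: 8 of 10 retrieved documents are relevant, out of 20 relevant overall.
            System.out.println(precision(8, 10)); // 0.8
            System.out.println(recall(8, 20));    // 0.4
        }
    }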

Hebrew NLP Methods: the most common approach assumes that in order to provide relevant search results, the correct lemma has to be indexed. Dictionary-based: the word is looked up in a list of words compiled by hand, from a corpus, or using expanding algorithms. NLP-based Text Retrieval. Welcome to Ko[Gloss]. The project devises, tests, and documents a language teaching method. The method puts into practice the didactic concept of interactive teaching, with the goal of learning through teaching. This enables learners from higher educational establishments and further vocational training to analyse authentic text material using professional language software, to structure the results collaboratively and in response to requirements, and to make these results accessible to other learners and vocational users.

Teachers guarantee the quality and sustainability of the results. Analysis of the texts focuses on verbal constructions such as 'to commission an investigation', 'to discuss a problem', on which the argumentative coherence of a text depends. The methodological objective of the project is independent of the target languages and of the specialist fields of the text material. Summarization with Lucene. You may have noticed that over the last couple of months, I haven't been writing too much about text mining. So I got sidetracked a bit - there's life beyond text mining, y'know :-). In any case, I am back on track to working through the remaining chapters of my TMAP book. This week, I describe a summarizer application built using Lucene. Summarization involves reading a body of text, and summarizing it in your own words, and if done algorithmically, requires a fair amount of AI code and domain knowledge (about the text being summarized).

I have seen this in action (as a consumer of the end product) at my previous job at CNET, and I can tell you that a lot of work goes into something of this sort. My goals are far less ambitious: I just want to find the "best" (most relevant) few sentences from a body of text and call it my summary. A similar approach is taken by the two open-source summarizer applications I looked at, namely Classifier4J (C4J) and Open Text Summarizer (OTS).
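A minimal sketch of that idea, using plain term-frequency scoring rather than Lucene's index statistics (illustrative only, not the book's implementation):

    import java.util.*;
    import java.util.stream.Collectors;

    public class SentenceScoringSummarizer {
        // Score each sentence by the summed frequency of its terms in the text; keep the top N.
        public static List<String> summarize(String text, int topN) {
            String[] sentences = text.split("(?<=[.!?])\\s+");
            Map<String, Integer> termFreq = new HashMap<>();
            for (String s : sentences) {
                for (String t : s.toLowerCase().split("\\W+")) {
                    if (!t.isEmpty()) termFreq.merge(t, 1, Integer::sum);
                }
            }
            return Arrays.stream(sentences)
                    .sorted(Comparator.comparingInt((String s) ->
                            Arrays.stream(s.toLowerCase().split("\\W+"))
                                  .mapToInt(t -> termFreq.getOrDefault(t, 0))
                                  .sum()).reversed())
                    .limit(topN)
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            String text = "Lucene is a full-text search library. Summarization selects the most "
                    + "relevant sentences. Lucene term statistics can score sentences for relevance.";
            System.out.println(summarize(text, 2));
        }
    }

Replacing the raw counts with Lucene's term and document frequencies gives the flavour of approach the post describes.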

Apache Whirr includes Mahout support - Blog - SearchWorkings.org. In a previous blog I showed you how to use Apache Whirr to launch a Hadoop cluster in order to run Mahout jobs. This blog shows you how to use the Mahout service from the brand new Whirr 0.7.0 release to automatically install Hadoop and the Mahout binary distribution on a cloud provider such as Amazon. Introduction: if you are new to Apache Whirr, check out my previous blog, which covers Whirr 0.4.0; a lot has changed since then. How to use the Mahout service: the Mahout service in Whirr defines the mahout-client role. Step 1, create a node template: create a file called mahout-cluster.properties and add the following:

    whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode+mahout-client,2 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=TOP_SECRET
    whirr.credential=TOP_SECRET

This setup configures two Hadoop datanode/tasktracker nodes and one Hadoop namenode/jobtracker/mahout-client node.
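With the properties file in place, the cluster is typically launched with Whirr's launch-cluster command, along these lines (the script path and property file name depend on your installation):

    bin/whirr launch-cluster --config mahout-cluster.properties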

The mahout-client role downloads the Mahout binary distribution from Apache and installs it under /usr/local/mahout. Zaizi present Alfresco Semantic Solution at IKS Semantic Enterprise Technology Workshop. Zaizi recently participated in the 7th IKS workshop that took place in Salzburg, Austria on 12-13 June. We joined the IKS community to share our experience in the use of Semantic Technologies within the enterprise and how the IKS technology stack can enhance the functionality of Enterprise Content Management systems and make it easier to interact with and find the right content. We saw great demonstrations from IKS early adopters showing how to build semantic real-world enterprise applications using Stanbol and VIE.

It was a very interesting conference where we had the chance to network, share knowledge, and come back with a lot of new ideas for our own products and solutions. For us, it was a pleasure to present our Semantic Search Tool in Alfresco, which also uses Stanbol as a Semantic Services provider. We are currently developing a complete Semantic Search Tool in Alfresco. Our semantic search solution involves distinct search techniques using semantic data: Using VIE in Alfresco Share. Text REtrieval Conference (TREC) 2011 Proceedings. I-SEMANTICS. The Semantic Web and the Modern Enterprise. Quasar: Quality Assurance of Semantic Annotations. W3C Linked Data Platform Working Group Charter. Semantic Web Case Studies and Use Cases.

BabelNet. Weka - home. Data mining and machine learning are used to help you understand the business better and also improve future performance through predictive analytics. Weka Project: Pentaho Data Integration. Weka 3 - Data Mining with Open Source Machine Learning Software in Java. Rada Mihalcea: Downloads. Babelmonkeys.

Rdf-neo4j

Academic Video Search. DBpedia « Griff's Graphs. Silk - A Link Discovery Framework for the Web of Data. Prof. Dr. Christian Bizer - Research Group Data and Web Science. Running UIMA Engines in Stanbol using the same JVM. Getting Started with Apache Stanbol Enhancement Engine. iswc2011.semanticweb.org/fileadmin/iswc/Papers/Industry/iswc2011it_submission_11.pdf. Fluid Operations | flexibility comes first! LMF - Linked Media Framework. Semantic Interaction Framework - VIE.js. Interactive Knowledge Stack - Semantic CMS - Open Source | IKS - The Semantic CMS Community - Open Source. Stanbol - Welcome to Apache Stanbol!

Welcome to the Mulgara Project! Perception | science fiction. Semantic tools. Semantic technologies. XML Editor/Validator/Designer with CAMV | Free Development software downloads. Word Sense Tutorial. Semantic Modelling (Semantische Modellierung). Guiding Principles for the Open Semantic Enterprise. Recommender Systems – Winfwiki. University of Rochester Computer Science (URCS) WordNet Browser. Wnbrowser, A Graphical WordNet Browser. Introduction - mb-pde - Step-by-step explanation of how to use PDE, including a video tutorial - Java Software Design Pattern Detection Engine. The Protégé Ontology Editor and Knowledge Acquisition System. The Software Ontology - homepage. Vivo. VIVO | connect - share - discover. VIVO: enabling the discovery of research and scholarship. The Voice of Semantic Web Business.

Semanticommunity.info. Topic: Open Data Blog.