From the Semantic Web to the Web of Data: ten years of linking up
dcow/jPearltrees README.md: jPearltrees Java Library. Author: David Cowden. Date: 03 July 2012. Purpose: jPearltrees is a Java library that provides an interface for handling the RDF/XML data exported from a Pearltrees account.
jPearltrees/src/me/dcow/pearltrees at master · dcow/jPearltrees
Google dataset linking strings and concepts. Tim Finin, 11:02am, 19 May 2012. Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text. “We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web.
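The idea of collecting anchor strings that point to a concept can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample links are invented rather than taken from the Google dataset, the lowercasing normalization is an assumption, and the function names `build_dictionary` and `concept_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (anchor text, target URL) pairs, invented for illustration only.
links = [
    ("Jaguar", "https://en.wikipedia.org/wiki/Jaguar"),
    ("jaguar", "https://en.wikipedia.org/wiki/Jaguar"),
    ("Jaguar", "https://en.wikipedia.org/wiki/Jaguar_Cars"),
    ("jaguar ", "https://en.wikipedia.org/wiki/Jaguar"),
]


def normalize(text):
    # Assumption: strings are compared after trimming and lowercasing.
    return text.strip().lower()


def build_dictionary(pairs):
    """Count how often each anchor string points to each concept URL."""
    counts = defaultdict(Counter)
    for text, url in pairs:
        counts[normalize(text)][url] += 1
    return counts


def concept_probabilities(counts, text):
    """Return the share of links with this anchor string per concept URL."""
    per_concept = counts.get(normalize(text), Counter())
    total = sum(per_concept.values())
    if total == 0:
        return {}
    return {url: n / total for url, n in per_concept.items()}


dictionary = build_dictionary(links)
probabilities = concept_probabilities(dictionary, "Jaguar")
```

With these sample links, three of the four anchors point to the first URL, so a lookup for "Jaguar" ranks that concept first. An anchor string that never occurs yields an empty result.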
Index of /pubs/crosswikis-data.tar.bz2
Discovery Hub | Beta
Content extraction with Apache Tika
Apache Tika - a content analysis toolkit The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page.
Semantic tools for big data
Thesaurus -> Ontology
Robust Hyperlinks Proceedings of Digital Documents and Electronic Publishing (DDEP00), Munich, Germany, 13-15 September 2000 In Springer-Verlag Lecture Notes in Computer Science. Copyright © 2000 Springer-Verlag Thomas A.
About ConceptNet ConceptNet is a semantic network containing lots of things computers should know about the world, especially when understanding text written by people. It is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people's goals.
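A network of concept nodes joined by labeled relationships can be modeled roughly as follows. This is a minimal sketch, not the ConceptNet API: the `Edge` and `SemanticNetwork` names are hypothetical, and the sample concepts and relation labels are chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    start: str     # concept expressed as a word or short phrase
    relation: str  # label describing how the two concepts relate
    end: str


class SemanticNetwork:
    """Toy in-memory graph of concepts and labeled relationships."""

    def __init__(self):
        self._outgoing = defaultdict(list)

    def add(self, start, relation, end):
        edge = Edge(start, relation, end)
        self._outgoing[start].append(edge)
        return edge

    def related(self, start, relation=None):
        """List concepts reachable from `start`, optionally by one relation."""
        return [
            edge.end
            for edge in self._outgoing.get(start, [])
            if relation is None or edge.relation == relation
        ]


# Illustrative facts only; not data taken from ConceptNet.
network = SemanticNetwork()
network.add("dog", "IsA", "animal")
network.add("dog", "CapableOf", "bark")
network.add("knife", "UsedFor", "cutting")
```

Querying `network.related("dog", "IsA")` returns the concepts linked to "dog" by that label, while omitting the label returns every neighbor in insertion order.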
UIMA - Standard for unstructured information. UIMA is a component software architecture, originally developed by IBM, for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and their integration with search technologies. The source code for a reference implementation of this framework was made available on SourceForge and later on the website of the Apache Software Foundation. UIMA is also used in medical systems that analyze clinical notes, such as the Clinical Text Analysis and Knowledge Extraction System (cTAKES).
"Watson" est un superordinateur de la firme IBM. Il associe la puissance matérielle (quantitative) à la puissance logicielle (qualitative). Au plan matériel, Watson dispose d'un système d'exploitation GNU-Linux, composé de 10 racks contenant chacun 9 serveurs Power 750 montés en réseau. Chaque serveur possède 32 coeurs qui peuvent gérer un total de 128 tâches en parallèle. "Watson", qui compte donc au total 2 880 coeurs pouvant effectuer 11 520 tâches en parallèle, possède une mémoire vive de 15 000 Go (gigaoctets) et une puissance totale de 80 Tflop (téraflops). Supercalculateur sémantique