
Universalis files for bankruptcy: the end of an accident of history?
The Encyclopædia Universalis stand at the Salon de l'éducation in Paris, 2004 (SIMON ISABELLE/SIPA)
Encyclopædia Universalis has filed for bankruptcy, Le Monde reports. The administrative court of Nanterre placed the company in receivership on October 30 for a period of six months. The news will tug at the heartstrings of the privileged few who own an encyclopedia (for my part, it was a birth present, and throughout my childhood it seemed to me objective proof that knowledge was intimidating), and will leave indifferent those who thought it had died long ago. Encyclopædia Universalis is not dead, but things are looking grim. The evening paper analyzes the situation from an economic standpoint: the repeated investments of its owner, Jacqui Safra, through the parent company Britannica; the attempts at a new model (a subscription-based online encyclopedia, selling its content to schools); and so on.

API:Query
The action=query module allows you to get most of the data stored in a wiki, including tokens for editing. The query module has many submodules (called query modules), each with a different function. There are three types of query modules: meta (information about the wiki and the logged-in user), prop (properties of pages, such as revisions and content), and list (lists of pages that match certain criteria).

The unofficial homepage of Tim Dwyer
I have a new position: Senior Lecturer and Larkins Fellow at Monash University, Australia.
Dissertations
Tim Dwyer (2005): "Two and a Half Dimensional Visualisation of Relational Networks", PhD Thesis, The University of Sydney. (23MB pdf)
Tim Dwyer (2001): "Three Dimensional UML using Force Directed Layout", Honours Thesis, The University of Melbourne. (TR download)
Technical Reports
T.
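The action=query module from the API:Query entry can be exercised with a plain HTTP GET. The following is a minimal sketch, assuming English Wikipedia's api.php endpoint and Python's standard library; `build_search_url` and `titles_from_search` are hypothetical helper names. It uses the `list=search` query module (parameters `srsearch` and `srlimit`) and reads titles from the `query.search` array of the JSON response. It only builds the URL and parses an already-decoded response, so the network call itself is left out.

```python
from urllib.parse import urlencode

# Assumed endpoint: English Wikipedia's api.php (other MediaWiki wikis expose the same module).
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term: str, limit: int = 5) -> str:
    """Build an action=query request that uses the list=search query module."""
    params = {
        "action": "query",
        "list": "search",   # a "list"-type query module
        "srsearch": term,   # text to search for
        "srlimit": limit,   # maximum number of results
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def titles_from_search(response: dict) -> list[str]:
    """Pull page titles out of an already-decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

For example, `build_search_url("web scraping", 2)` yields `https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=web+scraping&srlimit=2&format=json`; fetching that URL and passing the decoded JSON to `titles_from_search` gives the matching titles.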

Hook into Wikipedia using Java and the MediaWiki API
The MediaWiki API makes it possible for web developers to access, search and integrate all Wikipedia content into their applications. Given that Wikipedia is the ultimate online encyclopedia, there are dozens of use cases in which this might be useful. I used to post a lot of articles on this blog about using the web service APIs of third-party sites.

Database download
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages.
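The database downloads are large compressed XML files, so for offline use it is common to stream them instead of loading them whole. A minimal sketch, assuming a `*-pages-articles.xml.bz2` file in MediaWiki's XML export format and using only Python's `bz2` and `xml.etree.ElementTree.iterparse`; `iter_pages` is a hypothetical helper that keeps only each page's title and text and ignores all other fields.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def _local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that the MediaWiki export schema puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (title, wikitext) for each <page> in a bz2-compressed XML dump.

    `source` may be a filename or an open binary file object (bz2.open accepts both).
    """
    with bz2.open(source, "rb") as fh:
        context = ET.iterparse(fh, events=("start", "end"))
        _, root = next(context)  # the <mediawiki> root element
        title, text = None, ""
        for event, elem in context:
            if event != "end":
                continue
            name = _local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, ""
                root.clear()  # discard finished pages so memory stays roughly constant
```

Clearing the root after each page is what keeps memory use flat on multi-gigabyte dumps; without it, iterparse would still build the whole tree.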

Ways to process and use Wikipedia dumps – Prashanth Ellina
Wikipedia is a superb resource for reference (taken with a pinch of salt, of course). I spend hours at a time spidering through its pages and always come away amazed at how much information it hosts. In my opinion, this ranks among the defining milestones of mankind's advancement. Apart from being available online, the data is provided for download so that you can create a local mirror for quicker access.

DataMachine - jwpl - Documentation of the JWPL DataMachine - Java-based Wikipedia Library
Learn about the different ways to get JWPL and choose the one that is right for you! (You might want to get fatjars with built-in dependencies instead of the download package on Google Code.)
Download the Wikipedia data from the Wikimedia Download Site. You need 3 files:
- [LANGCODE]wiki-[DATE]-pages-articles.xml.bz2 OR [LANGCODE]wiki-[DATE]-pages-meta-current.xml.bz2
- [LANGCODE]wiki-[DATE]-pagelinks.sql.gz
- [LANGCODE]wiki-[DATE]-categorylinks.sql.gz
Note: If you want to add discussion pages to the database, use [LANGCODE]wiki-[DATE]-pages-meta-current.xml.bz2; otherwise, [LANGCODE]wiki-[DATE]-pages-articles.xml.bz2 suffices.
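The JWPL file-naming pattern is mechanical enough to compute. A small sketch with the hypothetical helpers `jwpl_dump_files` and `dump_url`; the download URL assumes the usual dumps.wikimedia.org directory layout of `<lang>wiki/<date>/<file>`, and the language code and date shown below are example values only.

```python
# Assumed download-site layout: https://dumps.wikimedia.org/<lang>wiki/<date>/<file>
DUMP_BASE = "https://dumps.wikimedia.org"


def jwpl_dump_files(langcode: str, date: str, include_discussions: bool = False) -> list[str]:
    """Return the three dump filenames JWPL's DataMachine expects for one wiki and date."""
    # Discussion pages are only in pages-meta-current; pages-articles suffices otherwise.
    pages = "pages-meta-current" if include_discussions else "pages-articles"
    prefix = f"{langcode}wiki-{date}"
    return [
        f"{prefix}-{pages}.xml.bz2",
        f"{prefix}-pagelinks.sql.gz",
        f"{prefix}-categorylinks.sql.gz",
    ]


def dump_url(langcode: str, date: str, filename: str) -> str:
    """Build a download URL for one dump file (the directory layout is an assumption)."""
    return f"{DUMP_BASE}/{langcode}wiki/{date}/{filename}"
```

For instance, `[dump_url("en", "20240101", f) for f in jwpl_dump_files("en", "20240101")]` lists the three URLs to fetch for a hypothetical 1 January 2024 English dump.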

Web scraping
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.[1] Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Getting Started with HtmlUnit
Introduction
The dependencies page lists all the jars that you will need to have in your classpath. The class com.gargoylesoftware.htmlunit.WebClient is the main starting point. This simulates a web browser and will be used to execute all of the tests. Most unit testing will be done within a framework like JUnit, so all the examples here will assume that we are using that. In the first sample, we create the web client and have it load the homepage from the HtmlUnit website.
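HtmlUnit is a Java library, so the following is not its API. It is a hedged Python standard-library sketch of the core extraction step in web scraping: parse a fetched page and collect the links it contains. `LinkExtractor` and `extract_links` are hypothetical names, and fetching the HTML is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute href targets of <a> tags in an HTML document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL the page was fetched from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every link found in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Resolving against the page's own URL matters because Wikipedia article links are relative (`/wiki/...`); anchors without an `href` are skipped.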

Wikipedia crawler (wikicrawler)
Purpose: wikicrawler is designed to crawl Wikipedia pages. It crawls pages in the specified languages and stores them in a local directory.
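The behavior described for wikicrawler (crawl pages in given languages, store them locally) can be sketched as a breadth-first crawl. This is not wikicrawler's code: it is a hypothetical Python sketch in which `crawl`, the `/wiki/` link regex, and the `<out_dir>/<lang>/<title>.html` layout are all assumptions. The `fetch` function is passed in so the crawler can be exercised without network access.

```python
import os
import re
from collections import deque
from typing import Callable

# Matches article links like href="/wiki/Some_Title"; titles containing ':' (File:, Category:, ...) are skipped.
WIKI_LINK = re.compile(r'href="/wiki/([^":#]+)"')


def crawl(lang: str, seed: str, fetch: Callable[[str], str], out_dir: str, max_pages: int = 10) -> list[str]:
    """Breadth-first crawl of one language edition, saving pages as <out_dir>/<lang>/<title>.html."""
    base = f"https://{lang}.wikipedia.org/wiki/"
    target = os.path.join(out_dir, lang)
    os.makedirs(target, exist_ok=True)
    queue, seen, saved = deque([seed]), {seed}, []
    while queue and len(saved) < max_pages:
        title = queue.popleft()
        html = fetch(base + title)
        # The title becomes a filename; replace '/' so the file stays inside `target`.
        path = os.path.join(target, title.replace("/", "_") + ".html")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(html)
        saved.append(title)
        for link in WIKI_LINK.findall(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

To cover several languages, call `crawl` once per language code with its own seed title. A real crawler would also need to respect Wikipedia's robots.txt and request limits, which this sketch does not handle.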