
Web-Harvest Project Home Page


Web Data Harvesting: Web Scraping Software
Web scraping software is an innovative tool that makes gathering large amounts of information relatively easy. It has numerous applications for anyone who needs to collect comparable information from various locations and put it into a usable context. Finding extensive information in a short period of time this way is cost-effective. Such applications are used every day in business, medicine, meteorology, government, and law enforcement. The software is user-friendly and can be operated by anyone from non-technical data collectors to experienced Web designers. Programs are available for purchase in stores or online. The user starts the software and begins by programming an "agent", the component that retrieves the requested information. Web scraping software can provide customer information, marketing information, and competitor information. There have been legal ramifications, as some have complained of intrusion and copyright infringement.

Screen Scraper
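The "agent" described under Web Data Harvesting can be illustrated with a minimal sketch using only Python's standard library. It assumes, purely for illustration, that the target pages mark each item with elements whose `class` attribute is `name` or `price`; the class `PriceAgent` and the functions `scrape` and `scrape_url` are hypothetical names, not part of any product mentioned here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PriceAgent(HTMLParser):
    """Hypothetical agent: collects text from elements with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.records = []     # completed {"name": ..., "price": ...} records
        self._field = None    # field currently being captured, if any
        self._current = {}    # partially filled record

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        self._field = None    # stop capturing when any element closes

    def handle_data(self, data):
        if self._field is None:
            return
        self._current[self._field] = data.strip()
        if "name" in self._current and "price" in self._current:
            self.records.append(self._current)
            self._current = {}


def scrape(html: str) -> list:
    """Run the agent over an HTML string and return the extracted records."""
    agent = PriceAgent()
    agent.feed(html)
    agent.close()
    return agent.records


def scrape_url(url: str) -> list:
    """Fetch a page over HTTP, decode it with its declared charset, then scrape it."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return scrape(resp.read().decode(charset))


sample = ('<div><span class="name">Widget</span><span class="price">9.99</span></div>'
          '<div><span class="name">Gadget</span><span class="price">4.50</span></div>')
print(scrape(sample))  # -> [{'name': 'Widget', 'price': '9.99'}, {'name': 'Gadget', 'price': '4.50'}]
```

Separating parsing (`scrape`) from fetching (`scrape_url`) keeps the extraction logic testable on saved pages without network access; a real agent would need selectors adapted to each site's markup.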

IRobotSoft -- Visual Web Scraping and Web Automation Tool for FREE

XELOPES - prudsys
The prudsys XELOPES (eXtEnded Library fOr Prudsys Embedded Solutions) is a platform- and data-source-independent business intelligence library that unites classical data mining methods and new real-time analytics. The library can be used as standalone software, offering prefabricated solutions to fundamental analytics problems; it can also be integrated into other software products, where it shows its full capacity as an embedded analytical tool. Especially for new and complex problems, the numerous algorithms of prudsys XELOPES, which can be combined in modules, make it possible to develop adequate solutions.
Data mining standards
prudsys XELOPES supports essential BI standards.
Stream access
Since classical data mining processes must generally handle extremely large data matrices, the streaming concept for data access was implemented in prudsys XELOPES.
Analytical functions
prudsys XELOPES combines a number of classical data mining models.
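The streaming concept for data access mentioned for XELOPES can be illustrated generically: rows are read one at a time and fed to an incremental algorithm, so the full data matrix never has to sit in memory. The sketch below is not the XELOPES API; `stream_rows` and `RunningStats` are hypothetical names, and Welford's one-pass algorithm for per-column mean and variance stands in for a data mining model.

```python
import csv
import io
import math
from typing import Iterable, Iterator, List


def stream_rows(lines: Iterable[str]) -> Iterator[List[float]]:
    """Yield one numeric row at a time, so the full matrix is never held in memory."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the first (header) row
    for row in reader:
        yield [float(value) for value in row]


class RunningStats:
    """Per-column mean and sample variance via Welford's one-pass algorithm."""

    def __init__(self, n_cols: int):
        self.count = 0
        self.mean = [0.0] * n_cols
        self._m2 = [0.0] * n_cols  # running sum of squared deviations from the mean

    def update(self, row: List[float]) -> None:
        self.count += 1
        for j, x in enumerate(row):
            delta = x - self.mean[j]
            self.mean[j] += delta / self.count
            self._m2[j] += delta * (x - self.mean[j])

    def variance(self) -> List[float]:
        if self.count < 2:
            return [math.nan] * len(self._m2)  # undefined for fewer than two rows
        return [m2 / (self.count - 1) for m2 in self._m2]


# Usage: an in-memory CSV stands in for a large file or database cursor.
data = io.StringIO("a,b\n1,10\n2,20\n3,30\n")
stats = RunningStats(n_cols=2)
for row in stream_rows(data):
    stats.update(row)
print(stats.mean, stats.variance())  # -> [2.0, 20.0] [1.0, 100.0]
```

Because `stream_rows` is a generator, the same loop works unchanged on a multi-gigabyte file opened with `open(...)`; only one row is materialized at a time.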

Little-known technologies for self-sufficient living
Many people would like to have, just in case, the means to get by if something happens, anything at all: a major conflict, a large power outage, a natural disaster, a financial crash, anything that could radically change the way we live. In this period of instability around the planet, every scenario is conceivable… For others, living in a simpler, more traditional way is no longer merely an option; it has become a goal. Yet leaving this system (voluntarily or not) means running into difficulties, and in that respect every lead is interesting and deserves to be explored. That is why I am passing this list on to you: perhaps some of the ideas will seem interesting and usable, and perhaps you will even find solutions you had never thought of! Who knows…

GoodRelations - semanticweb.org.edu
GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web. GoodRelations is the only OWL DL ontology officially supported by both Google and Yahoo. It provides a standard vocabulary for expressing statements such as: a particular Web site describes an offer to sell cellphones of a certain make and model at a certain price; a piano house offers maintenance for pianos that weigh less than 150 kg; or a car rental company leases out cars of a certain make and model from a particular set of branches across the country. Most, if not all, commercial and functional details of e-commerce scenarios can also be expressed, e.g. eligible countries, payment and delivery options, quantity discounts, and opening hours. The GoodRelations ontology is available under the Creative Commons Attribution 3.0 license.
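To make the cellphone example concrete, here is a small sketch that emits a GoodRelations description in Turtle syntax. It uses plain string building rather than an RDF library so it stays self-contained; the base URI, company name, product name, price, and regions are hypothetical placeholders, and only a few core GoodRelations terms (`gr:BusinessEntity`, `gr:offers`, `gr:Offering`, `gr:hasBusinessFunction`, `gr:eligibleRegions`, `gr:UnitPriceSpecification`) are shown.

```python
GR_PREFIXES = (
    "@prefix gr: <http://purl.org/goodrelations/v1#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def turtle_literal(text: str) -> str:
    """Quote a plain string as a Turtle literal (escapes backslashes and double quotes)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def offer_to_turtle(base: str, company: str, product: str,
                    price: float, currency: str, regions: list) -> str:
    """Describe one 'sell' offer by a business entity using GoodRelations terms."""
    lines = [
        GR_PREFIXES,
        f"<{base}#company> a gr:BusinessEntity ;",
        f"    gr:legalName {turtle_literal(company)} ;",
        f"    gr:offers <{base}#offer> .",
        "",
        f"<{base}#offer> a gr:Offering ;",
        f"    gr:name {turtle_literal(product)} ;",
        "    gr:hasBusinessFunction gr:Sell ;",
    ]
    if regions:
        # Eligible countries as ISO 3166-1 codes, written as a Turtle object list
        codes = ", ".join(turtle_literal(r) for r in regions)
        lines.append(f"    gr:eligibleRegions {codes} ;")
    lines += [
        "    gr:hasPriceSpecification [",
        "        a gr:UnitPriceSpecification ;",
        f"        gr:hasCurrency {turtle_literal(currency)} ;",
        f'        gr:hasCurrencyValue "{price:.2f}"^^xsd:float',
        "    ] .",
    ]
    return "\n".join(lines)


print(offer_to_turtle("http://example.org/shop", "Example Phones Ltd",
                      "Example cellphone, model X", 199.0, "EUR", ["DE", "AT"]))
```

For production use, an RDF library would handle escaping and serialization more robustly; the string-based approach here only aims to show how the vocabulary's terms fit together.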

Open Source Data Mining Tools | Elastic Web Mining | Bixo Labs
Below is a report on the open source data mining tools session at the ACM data mining unconference this past Sunday (01 Nov 2009). It covers only tools that the panelists had used, so it is not a survey of the available tools. See Jeff Dalton's blog post on Java open source NLP and text mining tools for a more complete list of a closely related group of tools.
Weka
Paul O'Rorke talked about Weka, a collection of machine learning algorithms for data mining tasks. An attendee mentioned MOA.
R Language
David Smith talked about R. An attendee asked how Matlab and R compare in terms of viability in a production environment. An attendee said many people use R for prototyping and generating models, but use something else in production. Paul mentioned that R provides a very compact representation of data mining tasks.
KNIME
Nicolas Cebron talked about KNIME (pronounced "naim"), a modular data exploration platform. An attendee asked about the long-term viability of KNIME.
Mahout
Hadoop
Bixo

Stat eXplorer
Interactive Statistical Visualization using Adobe Flash
Statistics eXplorer integrates many common InfoVis and GeoVis methods required to make sense of statistical data, uncover patterns of interest, gain insight, tell a story, and finally communicate knowledge. Statistics eXplorer was built on a component architecture and includes a wide range of visualization techniques enhanced with various interaction techniques and interactive features to support better data exploration and analysis. It also supports multiple linked views and integrated storytelling, with a snapshot mechanism that captures discoveries made during exploratory data analysis so that the knowledge gained can be shared. The eXplorer applications are available on the NCVA/LiU website for educational and research use only. Learn more about eXplorer through these two videos: Introduction to eXplorer; eXplorer Data Management.
Explore, present and communicate
Read paper about: Statistikatlas (SCB)
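The multiple linked views and snapshot mechanism described for Statistics eXplorer follow a common pattern: a selection made in one view is broadcast to every other view, and snapshots record the selection state so a discovery can be replayed later. The following Python sketch shows that pattern only; it is not how eXplorer (an Adobe Flash application) is implemented, and `View`, `LinkedViews`, and the region codes are hypothetical.

```python
class View:
    """One visualization (map, scatter plot, table...) that highlights selected items."""

    def __init__(self, name: str):
        self.name = name
        self.highlighted = set()

    def on_selection(self, ids) -> None:
        self.highlighted = set(ids)


class LinkedViews:
    """Broadcasts a selection to every registered view and records snapshots of it."""

    def __init__(self):
        self.views = []
        self.selection = set()
        self.snapshots = []  # list of (note, frozen selection) pairs

    def add(self, view: View) -> None:
        self.views.append(view)
        view.on_selection(self.selection)  # bring a newly added view in sync

    def select(self, ids) -> None:
        self.selection = set(ids)
        for view in self.views:
            view.on_selection(self.selection)

    def snapshot(self, note: str) -> None:
        self.snapshots.append((note, frozenset(self.selection)))

    def restore(self, index: int) -> str:
        note, selection = self.snapshots[index]
        self.select(selection)
        return note


linked = LinkedViews()
world_map, scatter = View("map"), View("scatter plot")
linked.add(world_map)
linked.add(scatter)
linked.select({"SE", "NO"})  # selecting in one view highlights in all views
linked.snapshot("Nordic countries stand out")
linked.select({"FR"})
print(linked.restore(0), sorted(scatter.highlighted))  # -> Nordic countries stand out ['NO', 'SE']
```

Storing the selection as a `frozenset` keeps each snapshot immutable, so later selections cannot alter a recorded discovery.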

The R Project for Statistical Computing

Rand Hindi, the man who wants to make technology disappear
"Shall we drop the formalities?" Even before the first handshake, Rand Hindi, barely 30 years old, comes across as relaxed. His grey, ripped jeans and his grey T-shirt, open at the neck over a silver pendant, reinforce this cool image. "Our goal is to make technology disappear in the long term." Nothing less!
Context awareness
Neither a prophet of doom nor a guru, this enthusiast of mathematics, data management (big data) and computer science (he did a PhD in bioinformatics at University College London) has assembled a team in Paris specializing in artificial intelligence. "The day connected objects are smart enough to no longer be intrusive, we will be able to add as many as we like: they will no longer increase friction but will instead bring value," he says.
Magnetism
His undeniable magnetism and his self-assurance are his first assets in serving this big idea.
Turning down big cheques
From fifteen to thirty-five employees
