
Data Curation


The shape of our future book — Satellite.

“... the more one was lost in unfamiliar quarters of distant cities, the more one understood the other cities he had crossed to arrive there.” (Marco Polo, speaking to Kublai Khan, in Italo Calvino’s Invisible Cities)

Note: This essay is the extended version of the talk I gave at the dConstruct conference in Brighton, UK, in September 2011.

Over the last ten months of working at a startup, I’ve noticed that it’s very easy to lose perspective. That is, it’s easy to forget to stick your head up and get context for the work you’re cranking away at. This is especially true while working on front-line digital content design problems. The notion of a “new,” digital kind of book scares a lot of folks because such a rich fabric of romanticism, nostalgia, and myth has built up around the physical book. As designers working with ebooks, we are at a point of special convergence: many of the promises of digital books (promises that have been made for decades) are coming to fruition.

And then: Philosophy | DataCurate. “Only connect.” DataCurate’s mission is to connect people with the stuff they really want by helping content providers create and maintain better data. The rapid evolution of the web-based marketplace has transformed the data needs of publishers, content providers, the publisher supply chain, and libraries. We have all seen tremendous growth in information about, and access to, content that was previously hidden. We are also experiencing an explosion of new and converted content, often available from multiple providers and in multiple formats.

All players on the continuum, from content acquisition and production to discovery and use, struggle to manage information about content and to compete for user attention in this crowded landscape. It is becoming harder to be sure that marketing strategies are effective in connecting people to the right content for them. And it is especially difficult to identify what goes unfound because the associated metadata was not strong enough.

REGARDS SUR LE NUMERIQUE: Blog. Bernard Stiegler: open data is “an event of a magnitude comparable to the appearance of the alphabet.” RSLNmag is published by Microsoft and devoted to analyzing and deciphering the digital world.

RSLN: What does the development of open data represent in the great digital adventure?

Bernard Stiegler: It is the culmination of a major rupture already well under way, one that has nothing in common with previous ones. All the technologies monopolized for a century by the culture industry, in the broad sense of the term, are passing into the hands of citizens. It is an event of a magnitude comparable to the appearance of the alphabet, which, as a technique of publication, that is, of making things public, is the foundation of the res publica; and comparable to what unfolded after Gutenberg and the Reformation, which generalized access to printed writing and to knowledge.

Today, all industrial, cultural, and scientific activities leave a digital trace that anyone can exploit with increasingly accessible tools. The stakes are more than major: this is a change of epoch. Different ideologies. Bernard Stiegler: That is true.

Digital Curation = the process of establishing and developing long-term repositories of digital assets for current and future reference by researchers, scientists, historians, and scholars generally. "The advent of affordable global digital connectivity of unprecedented scale and scope has created opportunities not only for more effective and efficient research, but also for new, better, faster, and previously impossible research.

Curation and management of research results are seen as the active management and appraisal of digital content during the entire life-cycle of scholarly and scientific interest; they are paramount to reproducibility and re-use over periods longer than 20 years." JP Rangaswami: "Digital curation seems to be a richer form of curation than its analog equivalent. Here’s what I think it consists of: authenticity, veracity, access, relevance, consume-ability, produce-ability."

Abrams. Abstract: The effective long-term curation of digital content requires expert analysis, policy setting, and decision making, as well as a robust technical infrastructure that can effect and enforce curation policies and implement appropriate curation activities.

Since the number, size, and diversity of content under curation management will undoubtedly continue to grow over time, and the state of curation understanding and best practices relative to that content will undergo a similarly constant evolution, one of the overarching design goals of a sustainable curation infrastructure is flexibility. To provide the necessary flexibility of deployment and configuration in the face of potentially disruptive changes in technology, institutional mission, and user expectation, a useful design metaphor is the Unix pipeline, in which complex behavior is an emergent property of the coordinated action of a number of simple, independent components.

Orbital Content. We are on the cusp of a complete overhaul of the way we interact with online content, and I think you should be a hell of a lot more excited about it than you currently are.
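The Unix-pipeline metaphor from the Abrams abstract can be sketched in a few lines: each stage is a small, independent component, and curation behavior emerges from their composition. This is only an illustration; the stage names (`ingest`, `characterize`, `audit`) and the record fields are hypothetical, not taken from any real curation system.

```python
def ingest(paths):
    """Source stage: yield one raw record per digital object."""
    for path in paths:
        yield {"id": path, "format": path.rsplit(".", 1)[-1]}

def characterize(records):
    """Filter stage: attach technical metadata to each record."""
    for rec in records:
        rec["well_formed"] = rec["format"] in {"pdf", "tiff", "xml"}
        yield rec

def audit(records):
    """Sink stage: collect records needing curatorial attention."""
    return [rec for rec in records if not rec["well_formed"]]

# Stages compose like `ingest | characterize | audit` in a shell pipeline.
flagged = audit(characterize(ingest(["report.pdf", "scan.tiff", "notes.docx"])))
print(flagged)  # only the .docx record is flagged for review
```

Because each stage only consumes and yields records, a stage can be replaced (say, a new characterization tool) without disturbing the others, which is exactly the flexibility the abstract argues for.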

Bookmarklet apps like Instapaper, Svpply, and Readability are pointing us toward a future in which content is no longer entrenched in websites but floats in orbit around users. This transformation of our relationship with content will force us to rethink existing reputation, distribution, and monetization models, and all for the better.

Content today. Most online content today is stuck. It has roots firmly planted in one of the many sites and applications around the web. In this system, the sites are the gravitational center and we, the users, orbit them, reaching out for a connection whenever we want to interact with the content. Websites have responded quickly to these new demands. Publishers have had the ability to make their content flexible for over a decade. Content shifting.

Data journalism: a vector of meaning and of profit. Faced with the avalanche of information, data-mining techniques make it possible to extract meaning from databases.

Trust is becoming the scarce, value-creating resource. And the media can seize it. This post takes up a reflection begun with Mirko Lorenz and Geoff McGhee in an article entitled Media Companies Must Become Trusted Data Hubs and presented at the re:publica XI conference.

Every day we produce two or three exabytes of data (an exabyte being a million terabytes). Meanwhile, Facebook and its 600 million users produce 70 terabytes on their own, barely 0.007% of the total. For comparison, a traditional print newspaper weighs between 1 and 50 megabytes. To condense all the information produced into something digestible for the end user, you have to summarize by a factor of roughly 100 billion. Equipped with the right tools, making masses of data speak becomes possible. All information is data. And what if Wikileaks were metadata?
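The orders of magnitude quoted above can be checked in a few lines of arithmetic, using the article's approximate figures as inputs:

```python
# Rough arithmetic behind the figures above; all inputs are the
# article's approximate values, expressed as exact integers.
EXABYTE = 10**18
TERABYTE = 10**12
MEGABYTE = 10**6

daily_output = 25 * EXABYTE // 10     # ~2-3 exabytes produced per day
facebook_daily = 70 * TERABYTE        # Facebook's daily contribution
newspaper = 25 * MEGABYTE             # a print newspaper, 1-50 MB

# Facebook's share of a single exabyte really is about 0.007 percent:
print(100 * facebook_daily / EXABYTE)   # → 0.007

# Condensing a day's output to newspaper size is a reduction by
# roughly a factor of 100 billion (10**11):
print(daily_output // newspaper)        # → 100000000000
```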

Which curation tool for your brand? — Curiouser. Methodology: To produce this infographic, we favored tools that allow a brand to present itself as an expert by proposing its own editorial line; we therefore set aside automatic curation tools, since they do not let the user control what gets published. In the course of our observations, we found that among the very large number of existing tools, few are actually directly usable by a brand, because few offer genuine editorialization aimed at publication: that is, the ability to choose exactly which content to include; to categorize, tag, annotate, or comment on it; and also to control how it is presented.

We evaluated the tools we judged suitable against several criteria: the degree of editorial control available, and the level of reader interaction possible. In curaclusion, Alix for Curiouser.

Retrieval, Analysis, and Presentation Toolkit for usage of Online Resources (RAPTOR). Given the current economic climate and the likelihood of tightening funding, understanding the usage of e-resources is becoming increasingly important: it allows an institution to see which resources it needs to keep subscribing to, and which it may wish to unsubscribe from, potentially resulting in real-world cost savings.

This project will therefore build a software toolkit for reporting e-resource usage statistics (from Shibboleth IdPs and EZProxy) in a user-friendly manner suitable for non-technical staff. Aims and objectives: understand institutional and national accounting and reporting requirements around statistics of e-resource access via Shibboleth and EZProxy; build a free-to-use, open-source software toolkit that presents statistical accounting information about e-resource usage to non-technical users and makes basic aggregated usage information available to external organisations.
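To make the aim concrete, here is a minimal sketch of the kind of aggregation such a toolkit performs. The tab-separated log format and the resource names are invented for illustration; real Shibboleth IdP and EZProxy logs have their own formats, which the actual RAPTOR toolkit parses.

```python
from collections import Counter

# Hypothetical access-log lines: "timestamp<TAB>user<TAB>resource".
LOG_LINES = [
    "2011-09-01T10:02\talice\tJSTOR",
    "2011-09-01T10:05\tbob\tJSTOR",
    "2011-09-01T11:40\talice\tScienceDirect",
    "2011-09-02T09:12\tcarol\tJSTOR",
]

def usage_by_resource(lines):
    """Count accesses per e-resource: the basic statistic behind
    subscribe/unsubscribe decisions."""
    counts = Counter()
    for line in lines:
        _, _, resource = line.split("\t")
        counts[resource] += 1
    return counts

print(usage_by_resource(LOG_LINES))  # JSTOR accessed 3 times, ScienceDirect once
```

A real report would also aggregate by time period and department, but the core of "usage statistics for non-technical staff" is exactly this kind of counting, presented readably.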

Project methodology: technology and standards used.

Business intelligence for research data curation? Gaining business intelligence from user activity data was the topic of a JISC workshop I attended this week, and a hot topic, if the activity of JISC programme managers is anything to go by: I counted five, plus a large contingent of JISC service people, and of course Deputy Chair Professor David Baker, who chaired the event.

The ‘business intelligence’ on the agenda was wide-ranging: from Amazon-style recommendations based on other users’ online behaviour, to the potential for hard-pressed senior managers to make better decisions about which services and resources to select or dispose of by mining anonymised user activity data extracted from library systems and VLEs. There are some parallels here with the possibilities of ‘community curation’ for finding new approaches to valuing datasets, approaches that rely less on the expense of an expert committee. However, the focus of the workshop was elsewhere. The implications for data curation are worth speculating on.
