
Bigdata



CrateDB vs Other Databases

There are many different databases on the market. This general CrateDB comparison will help you understand what makes CrateDB unique and whether it’s a good fit for you.

Ideal use case: CrateDB is ideal for real-time machine data and other applications that require:

- SQL access – CrateDB is accessed via ANSI SQL.
- High-velocity INSERTs – scales linearly to handle millions of inserts per second.
- Easy scaling – shared-nothing architecture automatically replicates and redistributes data as the cluster grows.
- Data type variety – manages structured and unstructured data in the same database.
- Fast distributed queries, JOINs, aggregations – an innovative query engine delivers real-time performance, even for complex SQL queries.
- Open source economics – CrateDB is free to use under the Apache 2.0 license.

Read on if you’d like to learn more about CrateDB differentiators.

Other differentiators covered in the comparison include a masterless, shared-nothing architecture (whereas some databases have a strong master-slave model) and a standard SQL API.
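To make the "SQL access" point concrete, here is a minimal sketch using the CrateDB Python client. It is an illustration only: the host address and the sensor_readings table and its columns are assumptions, not taken from the comparison above.

```python
# Minimal sketch of talking to CrateDB over plain SQL (DB-API 2.0 style).
# Assumptions: the `crate` client package is installed, a node answers on
# http://localhost:4200, and the sensor_readings table is hypothetical.
from crate import client

conn = client.connect("http://localhost:4200")
cursor = conn.cursor()

# Structured and semi-structured data side by side: a dynamic OBJECT column
# holds JSON-like payloads next to regular columns.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS sensor_readings (
        sensor_id TEXT,
        ts TIMESTAMP,
        payload OBJECT(DYNAMIC)
    )
""")

# High-velocity ingest would normally be batched; a single INSERT shown here.
cursor.execute(
    "INSERT INTO sensor_readings (sensor_id, ts, payload) VALUES (?, ?, ?)",
    ("sensor-1", 1672531200000, {"temperature": 21.5, "status": "ok"}),
)

# Ordinary SQL aggregation over the (distributed) table.
cursor.execute("SELECT sensor_id, count(*) FROM sensor_readings GROUP BY sensor_id")
print(cursor.fetchall())
```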

20 Big Data Repositories You Should Check Out

Data Science Central, by Mirko Krivanek, Aug 4, 2015. This is an interesting listing created by Bernard Marr.


Operationalizing Spark Streaming (Part 1)

For those looking to run Spark Streaming in production, this two-part article contains tips and best practices collected from the front lines during a recent exercise in taking Spark Streaming to production. For my use case, Spark Streaming serves as the core processing engine for a new real-time Lodging Market Intelligence system used across the Lodging Shopping stack on Expedia.com, Hotels.com and other brands. The system integrates with Kafka, S3, Aurora and Redshift and processes 500 msg/sec on average, with spikes up to 2,000 msg/sec.
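As a rough illustration of the kind of pipeline described above (not the author’s actual code), here is a minimal PySpark Structured Streaming sketch that reads from a Kafka topic and counts messages per minute; the broker address and the lodging-events topic name are assumptions.

```python
# Minimal sketch, not the Expedia system: read a Kafka topic with Spark
# Structured Streaming and count messages per 1-minute window.
# Assumes the spark-sql-kafka connector is on the classpath and a broker
# is reachable at localhost:9092; the topic name is hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = (SparkSession.builder
         .appName("streaming-throughput-sketch")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "lodging-events")
          .load())

# The Kafka source exposes key/value plus metadata such as the record timestamp.
counts = (events
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())

query.awaitTermination()
```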

The evolution of decision-support (BI) architectures with Big Data

We live in remarkable times. Looking back at the history of computing, we learn that capacities, whether RAM, disk, or CPU, have faithfully tracked Moore’s law in the everyday sense of the term ("something" that doubles every eighteen months). These gains would be pointless if prices did not follow the opposite trend (divided by 200,000 over 30 years for disk storage, for example). Put that way, you might think our ambitions need know no limit, and that it is enough to upgrade the RAM, disk, or CPU to absorb the explosion in the volume of data to be processed, which, broadly speaking, also follows Moore’s law.
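A quick back-of-the-envelope calculation makes the order of magnitude concrete (the 18-month doubling and the 200,000 figure come from the paragraph above; the rest is simple arithmetic):

```python
# Capacity growth under an 18-month doubling period, over 30 years.
years = 30
doubling_period = 1.5  # years

doublings = years / doubling_period      # 20 doublings
growth = 2 ** doublings                  # about 1,000,000x

print(f"{doublings:.0f} doublings -> roughly {growth:,.0f}x more capacity")
# For comparison, the text cites disk prices divided by ~200,000 over the
# same 30 years.
```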

[Figure 1: Hardware evolution, 2011]

So where is the problem? What is it that makes today’s decision-support architectures, not content with costing more and more, also unable to scale to terabytes or petabytes of data?

[Figure 2: Evolution of hard-disk throughput. Source: Wikipedia]

Setting up a Big Data project in the enterprise

Big Data is an opportunity for the enterprise. By using all the data coming from its social networks, its websites, and its databases, a company can improve its knowledge of customers and prospects. It can also optimize its costs or innovate. But to set up a Big Data project, the company must also rethink the way it operates, adopt suitable technical solutions, and be ready to follow a new strategy.

Designing your Big Data platform. Introduction. Anscombe’s quartet.

List of Physical Visualizations

This list currently has 254 entries. While data sculptures date back to the 1990s, the very first sculptures were Venus figurines: a Venus figurine is any Upper Paleolithic statuette portraying a woman with exaggerated physical features. The oldest ones are about 35,000 years old. The earliest data visualizations were likely physical, built by arranging stones or pebbles and, later, clay tokens. Quipus were complex assemblies of knotted ropes that were used in South America as a data storage device and played an important role in the Inca administration. An orrery is a mechanical model of the solar system. A seismoscope is a qualitative indicator of seismic activity, as opposed to seismographs, which show quantitative data, typically through line graphs.

The first terrain/city models date back to the 16th century and were created for military purposes.

Bokeh Docs. 50 external machine learning / data science resources and articles.

Is Big Data Still a Thing? (The 2016 Big Data Landscape) – Matt Turck

In a tech startup industry that loves its shiny new objects, the term “Big Data” is in the unenviable position of sounding increasingly “3 years ago”. While Hadoop was created in 2006, interest in the concept of “Big Data” reached fever pitch sometime between 2011 and 2014. This was the period when, at least in the press and on industry panels, Big Data was the new “black”, “gold” or “oil”. However, at least in my conversations with people in the industry, there’s an increasing sense of having reached some kind of plateau. 2015 was probably the year when the cool kids in the data world (to the extent there is such a thing) moved on to obsessing over AI and its many related concepts and flavors: machine intelligence, deep learning, etc. Beyond semantics and the inevitable hype cycle, our fourth annual “Big Data Landscape” (scroll down) is a great opportunity to take a step back, reflect on what’s happened over the last year or so and ponder the future of this industry.


Your First Machine Learning Project in Python Step-By-Step

Do you want to do machine learning using Python, but you’re having trouble getting started? In this post you will complete your first machine learning project using Python. In this step-by-step tutorial you will:

- Download and install Python SciPy and get the most useful package for machine learning in Python.
- Load a dataset and understand its structure using statistical summaries and data visualization.
- Create 6 machine learning models, pick the best and build confidence that the accuracy is reliable.

If you are a machine learning beginner and looking to finally get started using Python, this tutorial was designed for you.

Let’s get started! Update Jan/2017: Updated to reflect changes to the scikit-learn API in version 0.18.
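Here is a minimal sketch in the spirit of the tutorial (not its exact code): it uses scikit-learn’s built-in iris dataset, spot-checks three models rather than all six, and compares them with 10-fold cross-validation before a final check on a held-out split.

```python
# Sketch of the workflow: load data, spot-check a few models with
# cross-validation, then evaluate one of them on held-out data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
}

# 10-fold cross-validation accuracy on the training split.
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")

# Refit one model (KNN here, purely as an example) and check it on the
# held-out test split.
final_model = KNeighborsClassifier().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, final_model.predict(X_test)))
```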