
Databases



How CockroachDB Does Distributed, Atomic Transactions

One of the headline features of CockroachDB is its full support for ACID transactions across arbitrary keys in a distributed database. CockroachDB transactions apply a set of operations to the database while maintaining some key properties: Atomicity, Consistency, Isolation, and Durability (ACID). In this post, we'll focus on how CockroachDB provides atomic transactions without using locks.

Atomicity can be defined as: for a group of database operations, either all of the operations are applied or none of them are applied. Without atomicity, an interrupted transaction may write only a portion of the changes it intended to make, which can leave your database in an inconsistent state.

Strategy

The strategy CockroachDB uses to provide atomic transactions follows these basic steps:

- Switch: Before modifying the value of any key, the transaction creates a switch, which is a writeable value distinct from any of the real values being changed in the batch. (A toy sketch of the idea is shown below.)
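To make the switch concrete, here is a minimal, single-process sketch in Python. The names (TxnRecord, Store, the intents map) are my own, not CockroachDB's, and the real system does this across distributed ranges with MVCC and consensus; the point is only that flipping one value atomically makes every staged write visible together.

    import threading

    class TxnRecord(object):
        """The 'switch': one value whose single atomic flip commits everything."""
        def __init__(self):
            self.status = "PENDING"   # PENDING -> COMMITTED or ABORTED

    class Store(object):
        def __init__(self):
            self.lock = threading.Lock()
            self.committed = {}       # key -> last committed value
            self.intents = {}         # key -> (staged value, TxnRecord)

        def txn_write(self, txn, key, value):
            # Stage a provisional value (a "write intent") that points at the switch.
            with self.lock:
                self.intents[key] = (value, txn)

        def commit(self, txn):
            # The only atomic step: flipping the switch commits all staged writes.
            txn.status = "COMMITTED"

        def read(self, key):
            with self.lock:
                if key in self.intents:
                    value, txn = self.intents[key]
                    if txn.status == "COMMITTED":
                        # Resolve the intent lazily now that the switch is flipped.
                        self.committed[key] = value
                        del self.intents[key]
                        return value
                    # Switch still PENDING (or ABORTED): ignore the staged value.
                return self.committed.get(key)

    store, txn = Store(), TxnRecord()
    store.txn_write(txn, "a", 1)
    store.txn_write(txn, "b", 2)
    print(store.read("a"), store.read("b"))  # nothing visible: switch not flipped
    store.commit(txn)
    print(store.read("a"), store.read("b"))  # 1, 2: both writes appear together

Because readers consult the switch before trusting a staged value, a crash between the two writes can never expose one without the other.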

The Detailed Transaction Process

Take Flight and Relax with DB Migrations

All Data Are Belong to AWS: Streaming upload via Fluentd

I've got a special treat for you today! Kiyoto Tamura of Treasure Data wrote a really interesting guest post to introduce you to Fluentd and to show you how you can use it with a wide variety of AWS services to collect, store, and process data. -- Jeff;

Data Storage Is Cheap. Data Collection Is Not!

Data storage has become incredibly cheap. Cheaper storage means that our ideas are no longer bound by how much data we can store. However, data collection is still a major challenge: data does not magically originate inside storage systems or organize itself, so many ad hoc scripts get written to parse and load it. This is the problem Fluentd tries to solve: scalable, flexible data collection in real time.

Fluentd: Open Source Data Collector for High-volume Data Streams

Fluentd is an open source data collector originally written at Treasure Data.

Inputs and Outputs

At the highest level, Fluentd consists of inputs and outputs: inputs collect events (from application logs, syslog, and so on) and outputs deliver them to destinations such as Amazon S3. -- Kiyoto
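As an illustration of the input side, here is how an application might push structured events into a local Fluentd daemon from Python using the fluent-logger package. The tag "myapp" and the record fields are made up for this sketch, and Fluentd itself would need a matching forward input plus an output (for example, the S3 output plugin) configured to ship the events on to AWS.

    # pip install fluent-logger
    from fluent import sender

    logger = sender.FluentSender("myapp", host="localhost", port=24224)

    # Each call emits one structured, tagged event; Fluentd buffers it and
    # routes it to whatever outputs match the tag.
    logger.emit("follow", {"from": "userA", "to": "userB"})
    logger.emit("purchase", {"item": "book", "price": 12.5})

    logger.close()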

Bloomd: Serving those Flowers Fast

Bloomd is a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.

Features

- Scalable non-blocking core allows for many connected clients and concurrent operations
- Implements scalable bloom filters, allowing dynamic filter sizes
- Supports asynchronous flushes to disk for persistence
- Supports non-disk backed bloom filters for high I/O
- Automatically faults cold filters out of memory to save resources
- Dead simple to start and administer
- FAST, FAST, FAST
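Since the protocol is plain ASCII over TCP, a client needs little more than a socket. The sketch below assumes bloomd's default port (8673) and the create/set/check commands from its protocol documentation; treat the exact wire details as an assumption and prefer an existing client library for real use.

    import socket

    class BloomdClient(object):
        def __init__(self, host="localhost", port=8673):
            # Text-mode file wrapper over the TCP connection.
            self.conn = socket.create_connection((host, port)).makefile("rw")

        def command(self, line):
            # One text command out, one line of response back.
            self.conn.write(line + "\n")
            self.conn.flush()
            return self.conn.readline().strip()

    client = BloomdClient()
    print(client.command("create flowers"))       # "Done"
    print(client.command("set flowers rose"))     # "Yes" -- newly set
    print(client.command("check flowers rose"))   # "Yes"
    print(client.command("check flowers tulip"))  # "No"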


OOS: Oops for ORM

OOS is an object-relational mapping (ORM) framework written in C++. It aims to encapsulate all of the database backend work: you don't have to deal with database backends or SQL statements, nor with mapping data types or serializing objects. It provides an easy-to-use API, and as a unique feature it comes with one container for all objects: the object store. Given this container, you have a centralized point of storage for all objects, but with the ability to create views on concrete object types and to link or filter them.
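To illustrate the object-store idea (in Python rather than OOS's actual C++ API, so every name here is hypothetical): all objects live in one container, and a "view" is just a typed, optionally filtered window onto it.

    class ObjectStore(object):
        def __init__(self):
            self._objects = []            # the single container for all objects

        def insert(self, obj):
            self._objects.append(obj)
            return obj

        def view(self, cls, where=lambda o: True):
            # A view over one concrete object type, optionally filtered.
            return [o for o in self._objects if isinstance(o, cls) and where(o)]

    class Artist(object):
        def __init__(self, name):
            self.name = name

    class Album(object):
        def __init__(self, title):
            self.title = title

    store = ObjectStore()
    store.insert(Artist("Miles Davis"))
    store.insert(Album("Kind of Blue"))
    print([a.name for a in store.view(Artist)])   # only the Artist objects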

LinkedIn: Creating a Low Latency Change Data Capture System with Databus

This is a guest post by Siddharth Anand, a senior member of LinkedIn's Distributed Data Systems team. Over the past 3 years, I've had the good fortune to work with many emerging NoSQL products in the context of supporting the needs of a high-traffic, customer-facing web site. In 2010, I helped Netflix successfully transition its web-scale use cases from Oracle to SimpleDB, AWS' hosted database service. On completion of that migration, we started a second migration, this time from SimpleDB to Cassandra. The first transition was key to our move from our own data center to AWS' cloud.

The second was key to our expansion from one AWS Region to multiple geographically distributed Regions -- today Netflix serves traffic out of two AWS Regions, one in Virginia, the other in Ireland (F1). In December 2011, I moved to LinkedIn's Distributed Data Systems (DDS) team. Having observed two high-traffic web companies solve similar problems, I cannot help but notice a set of wheel-reinventions.

What Makes a Good Data Scientist?

Another Step-by-Step SqlAlchemy Tutorial (part 1 of 2)

A long time ago (circa 2007, if Google serves me right), there was a Python programmer named Robin Munn who wrote a really nice tutorial on SqlAlchemy. It was originally based on the 0.1 release, but updated for the newer 0.2. Then Mr. Munn just disappeared and the tutorial was never updated. I have been kicking around the idea of releasing my own version of this tutorial for quite some time and finally decided to just do it. I hope you will find this article as helpful as I found the original.

Getting Started

SqlAlchemy is usually referred to as an Object Relational Mapper (ORM), although it is much more full-featured than any of the other Python ORMs I've used, such as SqlObject or the one built into Django.

This tutorial will be based on the latest released version of SqlAlchemy: 0.5.8. You can verify which version you have from a Python prompt:

    import sqlalchemy
    print sqlalchemy.__version__

Note: I'll also be using Python 2.5 on Windows for testing. To install SqlAlchemy, run one of the following:

    python setup.py install
    easy_install sqlalchemy

The sections that follow are Looking Deeper, Inserting, Selecting, and Wrapping Up.
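For flavor, here is a minimal, self-contained example in the 0.5-era "classical mapping" style this tutorial uses: an explicit Table, a plain class, and mapper() tying them together. The users table and User class are my own stand-ins, not the tutorial's demo code, and modern SQLAlchemy favors the declarative system instead.

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
    from sqlalchemy.orm import mapper, sessionmaker

    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()

    users_table = Table("users", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(40)))

    class User(object):
        def __init__(self, name):
            self.name = name

    mapper(User, users_table)       # classical mapping: class <-> table
    metadata.create_all(engine)     # emit CREATE TABLE

    Session = sessionmaker(bind=engine)
    session = Session()
    session.add(User("robin"))      # Inserting
    session.commit()
    print(session.query(User).filter_by(name="robin").count())  # Selecting: 1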

A step-by-step SQLAlchemy tutorial

About This Tutorial

This tutorial is for SQLAlchemy version 0.2. You may notice that some sections are marked "New in 0.2". If this is the first time you're reading this tutorial, you can safely skip those sections. On the other hand, if you read the previous version of this tutorial and are now trying to learn SQLAlchemy 0.2, then just search for the text "New in 0.2" and you'll have a lot less reading to do this time around. (Do make sure to update all the demo code, though: every single file of the demo code has changed, and you'll get errors if you try to run the old demo code under SQLAlchemy 0.2.)