Spanner

Rodrigo De Castro: Spanner: Google's Globally-Distributed Database. Today I read this recent paper by Google: Spanner: Google's Globally-Distributed Database. As a globally-distributed database, Spanner provides several interesting features. First, the replication configurations for data can be dynamically controlled at a fine grain by applications. Applications can specify constraints to control which datacenters contain which data, how far data is from its users (to control read latency), how far replicas are from each other (to control write latency), and how many replicas are maintained (to control durability, availability, and read performance).

Second, Spanner has two features that are difficult to implement in a distributed database: it provides externally consistent [16] reads and writes, and globally-consistent reads across the database at a timestamp. These features enable Spanner to support consistent backups, consistent MapReduce executions [12], and atomic schema updates, all at global scale, and even in the presence of ongoing transactions.

Google Spans Entire Planet With GPS-Powered Database | Wired Enterprise. Three years ago, a top Google engineer named Vijay Gill was asked what he would do if someone gave him a magic wand. At the time, Gill helped run the massive network of data centers that underpins Google’s online empire, and he was sitting on stage at a conference in downtown San Francisco, discussing the unique challenges facing this globe-spanning operation. Jonathan Heiliger, the man who oversaw Facebook’s data centers, sat a few seats away, and it was Heiliger who asked Gill what he would add to Google’s data centers if he had a magic wand.

Gill hesitated before answering. And when he did answer, he was coy. But he seemed to say he would use that magic wand to build a single system that could automatically and instantly juggle information across all of Google’s data centers.

‘The conventional wisdom is that time synchronization like that, on a global scale, that is accurate enough for such a big distributed database … just isn’t practical.’ (Andy Gross)

Alex Lloyd - Building Spanner.

Research at Google. Abstract: Many of the services that are critical to Google’s ad business have historically been backed by MySQL. We have recently migrated several of these services to F1, a new RDBMS developed at Google.

F1 implements rich relational database features, including a strictly enforced schema, a powerful parallel SQL query engine, general transactions, change tracking and notification, and indexing, and is built on top of a highly distributed storage system that scales on standard hardware in Google data centers. The store is dynamically sharded, supports transactionally-consistent replication across data centers, and is able to handle data center outages without data loss. The strong consistency properties of F1 and its storage system come at the cost of higher write latencies compared to MySQL.

Spanner: Google’s globally distributed database. Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions.

Key features

* Partitions data across many instances of Paxos state machines
* Automatically repartitions data across machines as the data volume increases or new servers are added. This feature is just awesome! Say goodbye to manual sharding!
* Scales up to trillions of database rows
* Supports general purpose transactions
* Provides a SQL based query language
* Configurable replication
* Externally consistent reads and writes
* Globally consistent reads across the database at a timestamp

Architecture

A single deployment of Spanner is referred to as a universe.

In practice there is usually one universe per environment. A single spanserver controls about 100-1000 instances of a data structure called a tablet, which implements a mapping of the form (key:string, timestamp -> string). Link to the original paper.

Untitled. Spanner: Google's Globally-Distributed Database. I and many others have been working for the last few years on building a large-scale storage system that can manage data across all of Google's datacenters. This system underlies Google's advertising system, among other products. We'll be presenting a paper describing the system (with 26 co-authors!) at OSDI 2012 next month. Feedback is welcome, of course. Here's the abstract of the paper: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database.

Google Spanner: our understanding of concepts and implications.

Research Publication: Spanner. Spanner: Google's Globally-Distributed Database. James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford.

Abstract: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty.
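To make "a time API that exposes clock uncertainty" a bit more concrete, here is a minimal Python sketch, not Google's implementation. The method names `now`, `after`, and `before` follow the TrueTime interface described in the paper; the class name `TrueTimeSketch`, the fixed 4 ms error bound, and the use of the local system clock are assumptions made purely for illustration. The `commit_wait` helper shows the idea of choosing a timestamp and then waiting until it is definitely in the past.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """An interval that is assumed to contain the true absolute time."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Hypothetical stand-in for a clock that reports bounded uncertainty.

    epsilon_s is an assumed, fixed error bound in seconds; a real system
    would derive it from its time sources rather than hard-code it.
    """

    def __init__(self, epsilon_s=0.004, clock=time.time):
        self.epsilon_s = epsilon_s
        self._clock = clock

    def now(self):
        # Return an interval instead of a single instant.
        t = self._clock()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, t):
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t):
        # True only if t has definitely not arrived yet.
        return self.now().latest < t


def commit_wait(tt, sleep=time.sleep):
    """Pick a timestamp, then block until it is definitely in the past."""
    s = tt.now().latest
    while not tt.after(s):
        sleep(tt.epsilon_s / 2)
    return s
```

With these assumptions, `commit_wait(TrueTimeSketch())` blocks for roughly twice the error bound before returning, which is why a smaller clock uncertainty translates directly into lower commit latency.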

Published in the Proceedings of OSDI'12: Tenth Symposium on Operating System Design and Implementation, Hollywood, CA, October 2012.

Slide 1. Spanner-osdi2012.
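The tablet mapping quoted in the architecture notes above, (key:string, timestamp -> string), together with "reads across the database at a timestamp", can be illustrated with a small multi-version map. This is a single-process Python sketch under stated assumptions: the class name `TabletSketch` and its `write`/`read` methods are hypothetical, timestamps are plain integers, and nothing here models replication, Paxos, or persistence.

```python
import bisect
from collections import defaultdict


class TabletSketch:
    """Hypothetical in-memory multi-version map: (key, timestamp) -> value."""

    def __init__(self):
        # For each key, a sorted list of timestamps and a parallel list of values.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, timestamp, value):
        ts = self._timestamps[key]
        i = bisect.bisect_left(ts, timestamp)
        if i < len(ts) and ts[i] == timestamp:
            # Same version written again: overwrite in place.
            self._values[key][i] = value
        else:
            # New version: insert while keeping timestamps sorted.
            ts.insert(i, timestamp)
            self._values[key].insert(i, value)

    def read(self, key, timestamp):
        """Return the newest value written at or before timestamp, else None."""
        ts = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts, timestamp)
        return self._values[key][i - 1] if i else None
```

Because older versions are kept rather than overwritten, a read at a past timestamp sees the same values no matter how many later writes have happened, which is the property a consistent backup or snapshot read relies on.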