Cassandra - frequent mistakes | NSFAQ (Not So Frequently Asked Questions)
Since we started working with Cassandra, I've noted down all the mistakes we made due to our inexperience with the application, so we don't repeat them.
I didn't talk about them much because I was really ashamed of some of them :D But recently I've seen a video talking about frequent mistakes with Cassandra, and almost all our mistakes were there! If only this video had existed when I started... *sigh* But hey, now that I've seen we are not dumb, because being wrong is part of learning Cassandra, I am not ashamed anymore, and I'll explain all the mistakes, just to help out anybody starting with Cassandra right now.
Here we go!
Mistake #1 - Using SAN or RAID 1/10/5
These systems solve nonexistent problems, because Cassandra is developed to be fault tolerant within its nodes.

[CASSANDRA-3870] Internal error processing batch_mutate: java.util.ConcurrentModificationException on CounterColumn.
I don't think that is the goal of that code.
We already have code for that (make sure a node doesn't get overwhelmed writing hints locally) in sendToHintedEndpoints. Missed the totalHintsInProgress check. So I'm wondering, do we really have a strong reason for waiting for hints during writes in the first place? IMHO no, other than CL ANY. I know it's different in 1.0 but HH provides weak guarantees.

Synchronizing Clocks In a Cassandra Cluster, Pt. 2: Solutions.
This article was originally written by Viliam Holub. This is the second part of a two-part series.
Before you read this, you should go back and read the original article, “Synchronizing Clocks In a Cassandra Cluster Pt. 1 – The Problem.” In it, I covered how important clocks are and how bad clocks can be in virtualized systems (like Amazon EC2) today. In today’s installment, I’m going to cover some disadvantages of off-the-shelf NTP installations, and how to overcome them.

Synchronizing Clocks In a Cassandra Cluster, Pt. 1: The Problem.
This article was originally written by Viliam Holub. Cassandra is a highly-distributable NoSQL database with tunable consistency.
What makes it highly distributable also makes it, in part, vulnerable: the whole deployment must run on synchronized clocks. It’s quite surprising that, given how crucial this is, it is not covered sufficiently in the literature. And, if it is, it simply refers to the installation of an NTP daemon on each node which – if followed blindly – leads to really bad consequences.

The History of Apache Cassandra.
HBase vs Cassandra.

Making Things Easier with Cassandra GUI 2.0.
Cassandra GUI has evolved from its first version, and the new version includes bug fixes and enhanced features.
New features. Complete pagination for Row view of explorer. Search rows by their names.

Cassandra-user - frequent client exceptions on 0.7.0.
Hello, We were occasionally experiencing client exceptions with 0.6.3, so we upgraded to 0.7.0 a couple weeks ago, but unfortunately we now get more client exceptions, and more frequently.
Also, occasionally nodetool ring will show a node Down even though Cassandra is still running and the node will be up again shortly. We run nodetool ring every half hour or so for monitoring, otherwise we probably would not have noticed. I'm trying to determine whether we are hitting some bugs, just don't have enough hardware for our application, or have made some error in configuration. I would be happy to provide any more information or run tests to narrow down the problem.
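
The periodic nodetool ring check described in this post could be automated along the following lines. This is only a minimal sketch under stated assumptions: the ring output has already been captured as text lines, columns are whitespace-separated, the first column is the node address, and the second column is the status (Up or Down). The class name, method name, and sample lines are hypothetical, and the real column layout may differ between Cassandra versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RingStatusCheck {

    // Returns the addresses of nodes whose status column reads "Down".
    // Assumption: each node line is whitespace-separated, with the address
    // first and the status ("Up" or "Down") second. Header and blank lines
    // are skipped because their second token is not "Down".
    static List<String> downNodes(List<String> ringLines) {
        List<String> down = new ArrayList<>();
        for (String line : ringLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2 && cols[1].equals("Down")) {
                down.add(cols[0]);
            }
        }
        return down;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not captured from a real cluster.
        List<String> sample = Arrays.asList(
            "Address Status State Load Owns Token",
            "10.0.0.1 Up Normal 1.2 GB 33.33% 0",
            "10.0.0.2 Down Normal 1.1 GB 33.33% 5671");
        System.out.println(downNodes(sample));
    }
}
```

A check like this only reports what one node saw at one moment, so a node that is briefly marked Down and recovers between runs would still go unnoticed.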

Cassandra Indexing: The good, the bad and the ugly.
Within NoSQL, the operations of indexing, fetching and searching for information are intimately tied to the physical storage mechanisms.
It is important to remember that rows are stored across hosts, but a single row is stored on a single host (with replicas). Column families are stored in sorted order, which makes querying a set of columns efficient (provided you are spanning rows).

Cloud Architecture Tutorial - Running in the Cloud (3of3)

Announcing Astyanax.
    AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
        .forCluster("ClusterName")
        .forKeyspace("KeyspaceName")

Compressed families not created on new node.
Cassandra NYC 2011: Nathan Milford - Cassandra for System Admins.

Cassandra for LOBS | Ruby Zone.
Database storage is expensive.
This is especially true if you build a traditional SAN-based M+N cluster. The cost of the storage array, Fibre Channel switches, and Fibre Channel interfaces drives the cost per terabyte into the thousands quite easily. And while storage costs in general are plummeting, SAN storage costs are falling at a slower rate, widening the gap between SAN and direct attached storage. Given the cost of SAN storage, it would be unfortunate to waste it, which is what we discovered we were doing. Our platform makes a lot of 3rd party service calls.

"Building on Quicksand" Paper for CIDR (Conference on Innovative Database Research) - PatHelland's WebLog.

Upgrading Cassandra: 0.8.x to 1.0.x | DataStax Cassandra 1.0 Documentation.
This section describes how to upgrade Cassandra 0.8.x to 1.0.x and how to upgrade between minor releases of Cassandra 1.0.x.
The procedures also apply to DataStax Community Edition.
Best Practices for Upgrading Cassandra
The following steps are recommended before upgrading Cassandra:

Upgrading Cassandra: 0.8.x to 1.0.x.

What’s new in Cassandra 1.0: Compression.
Cassandra 1.0 introduces support for data compression on a per-ColumnFamily basis, one of the most-requested features since the project started. Compression maximizes the storage capacity of your Cassandra nodes by reducing the volume of data on disk. In addition to the space-saving benefits, compression also reduces disk I/O, particularly for read-dominated workloads.
Compression benefits
Besides reducing data size, compression typically improves both read and write performance. Cassandra is able to quickly find the location of rows in the SSTable index, and only decompresses the relevant row chunks.

Netflix Benchmarks on AWS Show Cassandra NoSQL Still Has the Goods.
A little more than a year ago, Apache Cassandra's reputation was untouchable. It was blowing other NoSQL data stores out of the water in benchmarks and in our very own DZone popularity poll.
What else would you expect from the data solution that was originally designed to handle the data on Facebook? How could it not be the top solution out there?

The Apache Cassandra Project.
Intro — Hector v0.8.x documentation.
Sebgiroux/Cassandra-Cluster-Admin - GitHub.

Tuning Cassandra | DataStax Cassandra 0.8 Documentation.
Effective tuning depends not only on the types of operations your cluster performs most frequently, but also on the shape of the data itself.
For example, Cassandra’s memtables have overhead for index structures on top of the actual data they store. If the size of the values stored in the columns is small compared to the number of columns and rows themselves (sometimes called skinny rows), this overhead can be substantial. Thus, the optimal tuning for this type of data is quite different from the optimal tuning for a small number of columns with more data (fat rows).

Zznate/cassandra-stress - GitHub.
SLF4J.

NodeTool.
More and more instrumentation is being added to Cassandra via standard JMX APIs. The nodetool utility (nodeprobe in versions prior to 0.6) provides a simple command line interface to these exposed operations and attributes. See Operations for a more high-level view of when you would want to use the actions described here.

Cassandra Write Performance – A quick look inside Application Performance.
I was looking at Cassandra, one of the major NoSQL solutions, and I was immediately impressed with its write speed even on my notebook. But I also noticed that it was very volatile in its response time, so I took a deeper look at it.
First Cassandra Write Test
I did the first write tests on my local machine, but I had a goal in mind. I wanted to see how fast I could insert 150K data points each consisting of 3 values.

MX4J - Open Source Java Management Extensions.
Linux performance basics.
What’s new in Cassandra 0.7: expiring columns.
Contents | DataStax Cassandra 0.8 Documentation.
Clustering | DataStax Cassandra 0.7 Documentation.
Operations.
Cassandra load balancing.
Database design - What's The Best Practice In Designing A Cassandra Data Model.
Cassandra: RandomPartitioner vs OrderPreservingPartitioner « Dominic Williams.
Driftx/chiton - GitHub.
Apache Cassandra Glossary.
API.
User Guide - GitHub.
Hector – a Java Cassandra client | PrettyPrint.me.
Zznate/cassandra-tutorial - GitHub.
Up and running with cassandra.