Rules of Thumb for Migrating to NoSQL

27 Feb 2015 - Ethan Cerami

As covered in previous posts, it is difficult to determine if or when to migrate to a NoSQL database technology.

I have yet to find any concrete metrics on the matter, and have therefore started my own rules of thumb:

Rule of Thumb 1: If you are on the scale of a Google or Facebook, then you should definitely consider NoSQL. For example, according to MongoDB, Facebook ingests 500 terabytes of new data a day. Much of this is likely photos and videos, and it's not clear if this truly needs to be stored in a database versus a distributed file store, but you get the idea. If you have anything approaching this scale, go with NoSQL.

Rule of Thumb 2: You are probably not Google or Facebook. Keep reading the remaining rules of thumb.

Rule of Thumb 3: Many people, vendors, developers, etc. will try to convince you that you have "big data", that relational databases do not scale, and that you must therefore adopt NoSQL. Do not fall for this. Rather, take their advice with two large grains of salt: 1) companies such as MongoDB want to sell you something, and it's in their...

NoSQL Distilled by Pramod J. Sadalage and Martin Fowler

07 Jun 2014 - Ethan Cerami

For those of us in the bioinformatics and genomics space, the advent of NoSQL databases offers multiple opportunities for storing "Big Data". However, many of us are still grappling with the same set of questions: When (if ever) does it make sense to switch over to NoSQL? How much data does one need to justify a migration to NoSQL? What types of genomic data sets and applications are ripe for NoSQL? And, given the hundreds of NoSQL databases that now exist (see nosql-database.org), which do you go with?

None of these questions is easy, but if you are looking for a starting point for answering them on your own, I highly recommend NoSQL Distilled, by Pramod J. Sadalage and Martin Fowler. Martin Fowler is the author of several well-known computer software books, and a co-author of one of my favorites: Refactoring: Improving the Design of Existing Code.

At just 192 pages, NoSQL Distilled provides a concise,...