MongoDB and Riak

12/18/12 UPDATE

Since I am the only DevOps person working on this, and there are tons of other things requiring my attention, I had to drop Riak. The engineers only know MongoDB anyway, and they are reluctant to learn a new NoSQL database (Riak). Crap! So this project has been killed. Too bad.

I have some Python scripts that I wrote to copy MongoDB collections over to Riak. If I have time, I'll open source them.

======================

I’ve been working with MongoDB at my current $WORK and at previous jobs. It is (or used to be) the nice, shiny toy that everyone rushed to. I’ve run into numerous limitations in trying to scale it up. Operationally, it can be a nightmare if the architecture was not set up correctly at the beginning.

Mongo is also a PITA to scale. There are major sites running Mongos in the thousands, but at that point it becomes a matter of throwing hardware and money at the problem. That just seems stupid for startups.

So at my current $WORK, the team is testing MongoDB, but I wanted to look for an alternative before we became fully committed to yet another operational nightmare.

After a lot of googling, testing, experimenting, etc., I decided to try Riak from Basho.

Googling shows that a number of companies have migrated from MongoDB to Riak. Their experiences were useful, but I was looking for a more concrete HOWTO on moving a large MongoDB database over to Riak.

The first step, of course, was to get hands-on experience with Riak: install it, play with it, etc. Then I used the riak-python-client library to start migrating some data over. I wrote a script that works through all the collections in a Mongo DB; for each collection, it creates a Riak bucket and adds each Mongo doc to that bucket, using the Mongo _id as the key.
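My original scripts aren't published, so what follows is only a minimal sketch of that collection-to-bucket approach, not the code I actually ran. It assumes pymongo 3.7+ (for `list_collection_names()`) and the riak-python-client 2.x API talking Protocol Buffers on Riak's default port 8087; the database name `mydb` and the helper names `mongo_doc_to_riak` and `migrate` are hypothetical. Mongo docs can contain `ObjectId` and `datetime` values that aren't JSON-serializable, so the sketch stringifies them before the write.

```python
import json


def mongo_doc_to_riak(doc):
    """Split a MongoDB document into a (key, JSON-safe value) pair for Riak."""
    key = str(doc["_id"])  # an ObjectId becomes its 24-char hex string
    body = {k: v for k, v in doc.items() if k != "_id"}
    # Round-trip through JSON; default=str turns ObjectId/datetime values into strings.
    value = json.loads(json.dumps(body, default=str))
    return key, value


def migrate(collections, store):
    """Copy every doc: collection name -> bucket name, _id -> key.

    collections: mapping of collection name -> iterable of Mongo docs
    store: callable(bucket_name, key, value) that writes one object
    Returns a dict of collection name -> number of docs copied.
    """
    counts = {}
    for name, docs in collections.items():
        copied = 0
        for doc in docs:
            key, value = mongo_doc_to_riak(doc)
            store(name, key, value)
            copied += 1
        counts[name] = copied
    return counts


def main(mongo_uri="mongodb://localhost:27017", db_name="mydb",
         riak_host="127.0.0.1", pb_port=8087):
    # Imported here so the helpers above can be used without these packages.
    import riak
    from pymongo import MongoClient

    db = MongoClient(mongo_uri)[db_name]
    client = riak.RiakClient(protocol="pbc", host=riak_host, pb_port=pb_port)
    buckets = {}

    def store(bucket_name, key, value):
        # Riak has no explicit "create bucket" call; a bucket exists once written to.
        if bucket_name not in buckets:
            buckets[bucket_name] = client.bucket(bucket_name)
        buckets[bucket_name].new(key, data=value).store()

    collections = {name: db[name].find()
                   for name in db.list_collection_names()
                   if not name.startswith("system.")}
    return migrate(collections, store)


if __name__ == "__main__":
    print(main())
```

Keeping the Mongo and Riak connections behind a plain `store` callable makes the copy loop easy to dry-run (e.g. with a function that just counts writes) before pointing it at a live cluster.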

Right away, I ran into some issues with Riak. I have a 3-node Riak cluster (on 3 physical CentOS 5.8 servers). The MongoDB database I was copying over was large: about 2GB on disk and over a million records. Partway through the conversion, 2 Riak nodes crashed and died... WTF! No matter what I did, they wouldn’t start back up (the Riak log showed some kind of Erlang errors, but I don’t know Erlang). So I stopped the only node still running, ran ‘rm -rf /var/lib/riak/*’ (wiping all the Riak data) and ‘killall epmd’, then restarted all 3 nodes, and they came back up.
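Before restarting a conversion after a crash like this, it helps to confirm that every node is actually answering. A minimal sketch, assuming Riak's HTTP interface is enabled on its default port 8098, where `GET /ping` returns `OK`; the function names are hypothetical, and the `opener` parameter only exists so the check can be exercised without a live cluster.

```python
import urllib.request


def ping_node(host, port=8098, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the Riak node's HTTP /ping endpoint answers 'OK'."""
    url = "http://%s:%d/ping" % (host, port)
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read().strip() == b"OK"
    except OSError:  # connection refused, timeouts and URLError are all OSErrors
        return False


def all_nodes_up(hosts, **kwargs):
    """Map each host to True/False depending on whether it answered /ping."""
    return {host: ping_node(host, **kwargs) for host in hosts}
```

For example, `all_nodes_up(["riak1", "riak2", "riak3"])` (hypothetical hostnames) returns a dict showing which of the three nodes responded; running `riak ping` on each server gives a similar per-node answer from the command line.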

I didn’t have time to debug this problem, so I restarted the conversion with a smaller subset of the Mongo data. But this crash worries me. The erl_crash.dump shows that Riak ran into resource issues: it was unable to allocate heap memory. Hmm.
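One way to restart with a smaller subset is to cap how many docs per collection get copied and pause between batches. A minimal sketch; the generator name, batch size and pause length are hypothetical, and whether pausing actually avoids the heap allocation failure above is untested. With pymongo, `collection.find().limit(n)` can also apply the cap on the server side.

```python
import itertools
import time


def throttled_subset(docs, limit, batch_size=1000, pause_s=0.5, sleep=time.sleep):
    """Yield at most `limit` docs, pausing `pause_s` seconds after every `batch_size` docs."""
    for i, doc in enumerate(itertools.islice(docs, limit), start=1):
        yield doc
        if i % batch_size == 0:
            sleep(pause_s)  # give the Riak nodes a breather between batches
```

Wrapping each collection's cursor, e.g. `throttled_subset(db["users"].find(), limit=100000)`, keeps the rest of the copy loop unchanged.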

More on my adventures evaluating Riak vs. MongoDB in the future.