MongoDB replication pitfalls

Wed 23 October 2013

I use MongoDB almost daily as part of my data analytics workflow. It's a great tool for data collection and aggregation before you process the data into structured information. Until today, I was running an instance of MongoDB as a standalone server with no backups. This is mostly fine if you have other ways to back up the data, but I decided to enable replication for increased performance, reliability and peace of mind.

The process of converting a standalone instance to a replica set is pretty straightforward, but I did run into a few pitfalls, listed below:

  • Set the priority of the instance you want as primary higher than that of any other host. You do this in the configuration document that you pass as the argument to rs.initiate() or rs.reconfig().

  • Make sure the hostname() value matches what you specify in the configuration for all hosts - primary and secondary. For example, if hostname() returns myhost and you specify myhost.somedomain.com, it is likely not going to work. A quick workaround is to modify the /etc/hosts file to make myhost point to the right IP address.

  • Redirecting a public port to a different internal port (a common occurrence if you're on Windows Azure) confuses MongoDB. Make sure you map public port 27017 to private port 27017. This took me some time to figure out. By the way, MongoDB uses TCP.

  • You have to use keyFile for authentication between the members of the set. The credentials you created after enabling auth will still work on the primary (the original standalone), but you need to disable the auth option in the configuration and restart MongoDB.

  • The file that keyFile points to must have the right permissions. Give the mongodb user ownership of the file (chown mongodb /path/to/file) and lock it down by issuing chmod og-rxw /path/to/file.

  • If the instance you want as primary ends up being selected as secondary, change the priorities as mentioned above and then do a rs.reconfig(conf, {force: true}). This will work if the (unwanted) primary is currently down. Otherwise, follow instructions in this document.
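
To make the priority advice concrete, here is a minimal sketch of a configuration document of the kind you would pass to rs.initiate(). It is plain JavaScript, so it can be pasted into the mongo shell. The set name rs0, the host otherhost and the helper buildReplicaSetConfig are hypothetical names made up for illustration, and the priority values 2 and 1 are just one way to rank the members.

```javascript
// Hypothetical helper: builds a replica set configuration document in which
// the preferred primary has a higher priority than every other member.
function buildReplicaSetConfig(setName, hosts, preferredPrimary) {
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      return {
        _id: index,
        // Each host string should match what hostname() reports on that machine.
        host: host,
        priority: host === preferredPrimary ? 2 : 1
      };
    })
  };
}

// "rs0" and "otherhost" are placeholders; port 27017 is the one discussed above.
var conf = buildReplicaSetConfig(
  "rs0",
  ["myhost:27017", "otherhost:27017"],
  "myhost:27017"
);

// In the mongo shell, a new set would then be started with:
//   rs.initiate(conf)
// For an existing set, it is usually easier to fetch the live document with
// rs.conf(), change members[n].priority on it, and pass that to rs.reconfig().
```

Starting from rs.conf() for an existing set keeps the member _id values and the rest of the live configuration intact, so only the priorities change.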

After messing with these configurations (and then some), replication seems to be in progress. Keeping my fingers crossed for smooth sailing from this point on.