MongoDB gotchas for the unaware user
I like MongoDB, mostly because it’s so simple and natural to use from dynamic languages. I’ve used it in two of my projects so far (Encode and Sparrw) and, while I’m really happy with the choice, there were a couple of issues that caught me unaware and cost me a few additional hours of head-scratching or fixing. Some of these things are no-brainers if you have multiple machines and dedicate some of them to the database, but my use cases are low-traffic web apps on a single (virtual) server.
These are all simple and documented things, and are not bugs (well, depending on who you ask). If you’ve read all the docs, you’ve probably read about most of them at some point. So had I, but I didn’t remember them at the right moment and then had to fix things.
Use the 64-bit version. The 32-bit version is limited to about 2.5GB of stored data. Yeah, it’s probably enough for playing around. But when you start configuring your production (or staging) system, remember to choose the 64-bit flavor, since you can’t just “fix” that later on: you’ll have to reinstall everything.
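One way to catch a 32-bit install before it fills up is to ask the running server what it was built for. This is only a sketch: the function names are made up for illustration, and `db` is assumed to be an object that behaves like a pymongo `Database` (all it needs is a `command()` method). The `buildinfo` server command reports the architecture in its `bits` field.

```python
def server_bits(db):
    """Return the word size (32 or 64) the connected mongod was built for.

    db is assumed to behave like a pymongo Database object; the
    buildinfo command includes a "bits" field in its reply.
    """
    return db.command("buildinfo")["bits"]


def require_64bit(db):
    """Raise RuntimeError if the server is not a 64-bit build."""
    bits = server_bits(db)
    if bits != 64:
        raise RuntimeError("mongod is a %d-bit build" % bits)
```

Calling `require_64bit(db)` once at application start-up would turn a slow-burning storage limit into an immediate, obvious failure.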
Have a slave db on another machine. If your MongoDB instance crashes (or gets killed due to OOM, or the whole system crashes), there’s no guarantee about what state your data is in. You can run repair, but this is like running fsck or playing the lottery: you never know what you’re going to get. So you really want to have a slave (or a replica set), and you want that slave to be on another server. This is really cumbersome if one VPS is enough for all your (other) needs, but there’s no avoiding it if you value your data.
MongoDB 1.8 update: From 1.8 onwards, MongoDB supports journaling, making it safe to use on a single server. Journaling is not yet on by default, and it’s recommended to use journaling only on the 64-bit version.
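As a rough sketch of switching journaling on, the helper below builds a mongod command line that includes the `--journal` switch. The function name and the data directory used in the example are made up for illustration; only the `--dbpath` and `--journal` flags come from mongod itself.

```python
def mongod_command(dbpath, journal=True):
    """Build the argument list for starting mongod.

    dbpath is whatever data directory you use (hypothetical here);
    --journal is the switch that enables journaling on 1.8.
    """
    args = ["mongod", "--dbpath", dbpath]
    if journal:
        args.append("--journal")
    return args
```

For example, `mongod_command("/var/lib/mongodb")` gives a list that can be handed to `subprocess.Popen` or copied into an init script.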
Secure it. By default, MongoDB uses no authentication and listens on all network interfaces, meaning anyone can access your db from anywhere in the world. (This is true for the version you can get directly from their site; various Linux distributions, such as Debian and Ubuntu, have a sane default of binding to 127.0.0.1 only.) If you use it on a publicly visible server, this is a bit of a problem. You can either set up authentication or tell MongoDB to listen only on localhost. I prefer the latter because I’m the only user on my server anyway.
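A quick way to see whether your instance is exposed is to try a plain TCP connection to it from some other machine. The sketch below uses only the Python standard library; the function name is made up, and 27017 is MongoDB’s default port (adjust it if you changed yours).

```python
import socket


def is_reachable(host, port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, unroutable: nothing we can talk to.
        return False
    conn.close()
    return True
```

Run it from outside the server against the server’s public address: if it returns True, the port is open to the world and you should either bind to 127.0.0.1 or enable authentication. It only tests that the port accepts connections, not that authentication is off.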
Always use getLastError. Unless you need lightning speed, it pays to wait a little to be sure the database is ok with your changes, and that there were no errors modifying the data. If nothing else, you can log it in your app so you know something bad happened. Or, if you’re certain you don’t need getLastError(), at least never mix using and not using it on the same collection. MongoDB doesn’t guarantee that commands will be executed in the order given. In my test code, I had an “async” remove() call (i.e. I didn’t wait for it to finish) and was then inserting new entries, and the earlier remove() happily removed them (all of them, or some, or none, depending on the race). Those were a very confusing few hours.
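The “wait for the database to confirm” pattern can be sketched as a small wrapper. This is an illustration under assumptions, not the driver’s own API: `checked` and `WriteFailed` are made-up names, and `db` is assumed to behave like a pymongo `Database` (it needs a `command()` method). Note that getLastError reports on the previous operation on the same connection, so this only works if your driver keeps both calls on one connection; if the driver offers a built-in “safe” write option, that is likely the better choice.

```python
class WriteFailed(Exception):
    """Raised when getLastError reports a problem with the previous write."""


def checked(db, write):
    """Run write(), then ask the server whether it succeeded.

    db is assumed to behave like a pymongo Database object; write is
    any zero-argument callable that performs a single write.
    """
    result = write()
    status = db.command("getlasterror")
    if status.get("err"):
        raise WriteFailed(status["err"])
    return result


# Hypothetical usage with a collection called "entries":
#   checked(db, lambda: db.entries.remove({}))
#   checked(db, lambda: db.entries.insert({"title": "first"}))
```

Because each call returns only after the server has answered, the remove() is finished before the insert is sent, which avoids the race described above.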
There’s a lot of documentation online and a lot more info can be found on various forums, but it’s also good if you can get this information in a condensed form. For this I’ve found MongoDB: The Definitive Guide book and 10gen videos very helpful – for example, the deployment strategies video is great for a start.
I hope these few tips from my experience will help you avoid the mistakes I made while trying to use MongoDB.
Edit: there’s a lot of useful comments about this on Hacker News as well.