MongoDB gotchas for the unaware user

I like MongoDB, mostly because it’s so simple and natural to use from dynamic languages. I’ve used it in two of my projects so far (Encode and Sparrw) and, while I’m really happy with the choice, there were a couple of issues that caught me unaware and cost me a few additional hours of head-scratching or fixing. Some of these things are no-brainers if you have multiple machines and dedicate some of them to the database, but my use cases are low-traffic web apps on a single (virtual) server.

These are all simple and documented things, and are not bugs (well, depending on who you ask). If you’ve read all the docs, you’ve probably read about most of them at some point. So did I, but I didn’t remember them at the right moment and then had to fix things.

Use the 64-bit version. The 32-bit version has a limit of about 2.5GB of data stored. Yeah, it’s probably enough for playing around. But when you start configuring your production (or staging) system, remember to choose the 64-bit flavor, since you can’t just “fix” that later on; you’ll have to reinstall everything.

Have a slave db on another machine. If your MongoDB instance crashes (or gets killed due to OOM, or the whole system crashes), there’s no guarantee about what state your data is in. You can run repair, but this is like running fsck or playing the lottery – you never know what you’re going to get. So you really want to have a slave (or a replica set), and you want that slave to be on another server. This is really cumbersome if one VPS is enough for all your (other) needs, but there’s no avoiding it if you value your data.

MongoDB 1.8 update: From 1.8 onwards, MongoDB supports journaling, making it safe to use on a single server. Journaling is not on by default as of yet, and it’s recommended to use journaling only on a 64-bit version.

Secure it. By default, MongoDB uses no authentication and listens on all network interfaces (this is true for the version you can get directly from their site; various Linux distributions, such as Debian and Ubuntu, have a sane default of binding to 127.0.0.1 only), meaning anyone can access your db from anywhere in the world. If you use it on a publicly visible server, this is a bit of a problem. You could either set up authentication or tell MongoDB to only listen on localhost. I prefer the latter because I’m the only user on my server anyway.
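A quick way to see whether the localhost-only setup took effect (for instance after setting bind_ip to 127.0.0.1) is to try opening a TCP connection to the MongoDB port from another machine. The sketch below is only an illustration using Python’s standard library; the helper name is_port_open and the address 203.0.113.10 are placeholders I made up, and 27017 is assumed to be the port because it is MongoDB’s default.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder for your server's public address.
    for host in ("127.0.0.1", "203.0.113.10"):
        state = "reachable" if is_port_open(host, 27017) else "not reachable"
        print(f"{host}:27017 is {state}")
```

Run from a different machine against the public address, the port should come back as not reachable if MongoDB only listens on localhost. Note that this checks reachability only; it says nothing about whether authentication is enabled.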

Always use getLastError. Unless you need lightning speed, it pays to wait a little to be sure the database is ok with your changes, and that there were no errors modifying the data – if nothing else, then to log it in your app so you know something bad happened. Or, if you’re certain you don’t need getLastError(), at least never mix using and not using it on the same collection. MongoDB doesn’t guarantee that commands will be executed in the order given. In my test code, I had an “async” remove() call (i.e. I didn’t wait for it to finish) and was then inserting new entries, and the previous remove() happily removed them (all of them, or some, or none, depending on the race). Those were a very confusing few hours.
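As a rough sketch of the “wait for the database” approach in Python: in current pymongo versions, writes are acknowledged by default and the result objects expose an acknowledged flag, which plays the role the safe=True option and getLastError played in older drivers. The helper names (replace_all, get_entries_collection) and the database and collection names below are hypothetical, chosen only for illustration.

```python
def replace_all(collection, docs):
    """Remove every document, then insert docs (a non-empty list).

    Each step returns only after the server has acknowledged the write,
    so the remove cannot race with the inserts that follow it.
    """
    removed = collection.delete_many({})
    if not removed.acknowledged:
        raise RuntimeError("remove was not acknowledged by the server")
    inserted = collection.insert_many(docs)
    if not inserted.acknowledged:
        raise RuntimeError("insert was not acknowledged by the server")
    return removed.deleted_count, len(inserted.inserted_ids)


def get_entries_collection():
    """Hypothetical helper: connect to a local mongod (placeholder names)."""
    # Imported here so replace_all() can be exercised without pymongo installed.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://127.0.0.1:27017")
    # w=1 asks the server to acknowledge each write before the call returns.
    return client["mydb"].get_collection(
        "entries", write_concern=WriteConcern(w=1)
    )
```

With a running server, usage would look like replace_all(get_entries_collection(), [{"name": "a"}, {"name": "b"}]). With acknowledged writes, pymongo also raises an exception when the server reports a write error, so the acknowledged checks mainly guard against a collection that was configured with w=0.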

There’s a lot of documentation online and a lot more info can be found on various forums, but it’s also good if you can get this information in a condensed form. For this I’ve found MongoDB: The Definitive Guide book and 10gen videos very helpful – for example, the deployment strategies video is great for a start.

I hope these few tips from my experience will help you avoid the mistakes I made while trying to use MongoDB :-)

Edit: there are a lot of useful comments about this on Hacker News as well.

23 comments

  1. Thanks for the post. One clarification: if you send multiple operations (without getlasterror) on a single connection, they will be guaranteed to happen in order.

    The key is for a sequence of writes that go together to use the same db connection. If you have a driver that randomly gives you a connection from a pool on every operation, that may not happen automatically. What language are you working in for the most part?

    • Thanks for the clarification

      I’m using the Python driver (pymongo), and yes I believe it does have a pool of connections so that might be the cause of it.

  2. Thanks for this important summary.

  3. Check out MongoHQ for a hosted MongoDB solution.

  4. Thanks, good post.
    We’re using the php driver, I’m assuming the getLastError() you’re referring to maps to PHP’s MongoDB::lastError() method…

    I’m wondering if calling getLastError on a busy database can return you an error from another transaction? If so, it may not be safe to assume the current code’s transaction failed.

    In PHP, to force synchronous updates and removes I add the ‘safe=true’ option to the calls. This should help us catch errors related to that specific transaction. (I say “should” because even though I have the code in there, I haven’t done a negative test to ensure it’s working as desired.)

    example of using ‘safe’ in php:
    $someDB->update(array('_id' => $some_id),
        array('$push' => array('msgs' => $newMsg)),
        array('upsert' => false, 'safe' => true));

    As a side note we’re making heavy use of gridfs to store files.

    Any experiences with gridfs you can share?

    So far I’ve been totally delighted. Putting files in and then streaming them all the way back to the browser is a piece of cake!!!

    cheers!

    • Bindings implement their “safe” option using getLastError() internally. Eg. in Python I add safe=True to my calls, and I’m not explicitly calling getLastError (again).

      I assume the bindings are careful to reuse the same database connection for this, and as dm commented, the connection guarantees ordering, so you should be safe. If this weren’t guaranteed, it’d be a pretty much useless command.

      I haven’t needed to store files using gridfs, so can’t comment on it. Looks like a great solution for networked storage though, I’d probably prefer it to NFS if I needed something like that.

  5. If you want to risk your customer’s data and your reputation on MongoDB, then drink the MongoDB kool aid.

    64-bit OS is not a good choice for small RAM VPS. MongoDB uses more memory and is slower than MySQL on small VPS. MongoDB is good for all the million facebook killer apps out there. (sarcasm intended)

    MongoDB is impractical for small business without transactions for things like e-commerce. Data issues plague MongoDB much more so than traditional RDBMS. Schema-less is also a myth in real world use. The first thing most engineers do is create a pseudo-schema with their favorite ODM. Do we really have to worry about schemas in a framework like Rails with migrations?

  6. Good stuff. Excellent comments too.

    I have to try MongoDB out.

    Can you recommend some good performance benchmark for it?

    cheers!

  7. Maybe it’s just me, but when two of the 4 comments revolve around data instability… it doesn’t give the warm-n-fuzzies.

    Can’t go from 32-bit to 64-bit at will – design flaw. 32-bit is limited to 2GB databases? That’s tiny and it’s never been a problem for any other 32-bit only database I’ve ever seen or used. How is it that this isn’t considered a serious flaw? Go 64-bit from the start… ok… then why is a 32-bit version even offered? Particularly if you can’t switch later!?

    Run a slave on a separate machine so that when the master crashes you don’t lose all your data. Umm.. ok? This is normal? Why would it crashing cause you to lose all your data? That’s not a good thing for a database. :(

    • You can go to 64-bit at will (that might or might not involve dump & import of the data, I haven’t really looked into that), but as you can’t run a 64-bit version of MongoDB on a 32-bit operating system, you have to have your OS 64-bit as well (if you do that, using 64-bit MongoDB on it is a no-brainer anyways).

      The 32-bit limit, as well as the “no way of knowing which of the data in the database is ok and which is corrupt” problem, is a consequence of the way MongoDB stores the actual data – it uses memory-mapped files.

      I believe the developers are working on an alternative storage engine that will avoid these problems (but probably have some trade-offs), but that’s still a long way off.

  8. One thing that really tripped me up was not being able to use the same property in a query more than once. The shell doesn’t throw an error which can add to the confusion.
