Quick and dirty MongoDB backups

I’m using MongoLab via Heroku for data storage in my current projects. It’s very easy to set up, but backups aren’t free. When you’re still exploring ideas, you don’t necessarily want to spend $15/mo per Mongo instance on backups. And to restore a backup you still need to download it and run a bunch of commands manually anyway. So I wasn’t doing backups.

But then I accidentally cleared out the collections once or twice, and that was inconvenient for testers. I wanted a solution that’s cheap(er) or free. It should run most days, and it’s fine if it degrades performance a bit.

So I have a cron job that runs this bash script once a day from my laptop. It saves the POLLS and USERS collections from the dev and prod instances into folders named by date.

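#!/bin/bash

# Load the login environment; cron starts jobs with a minimal one.
source ~/.bashrc

SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
BACKUP_ROOT=${SCRIPT_DIR}/../backups
DATE=$(date +%Y-%m-%d)

# Export the POLLS and USERS collections of one database to
# ${BACKUP_ROOT}/<label>/<date>/<COLLECTION>.json
function backup_mongo {
    # Locals keep these from clobbering environment variables such as $USER.
    local host=$1
    local db=$2
    local user=$3
    local pass=$4
    local label=$5

    local outdir=${BACKUP_ROOT}/${label}/${DATE}
    mkdir -p "${outdir}"

    local collection
    for collection in "POLLS" "USERS"
    do
        mongoexport -h "${host}" -d "${db}" -c "${collection}" -u "${user}" -p "${pass}" -o "${outdir}/${collection}.json"
    done
}

backup_mongo "dev_host:dev_port" dev_db_name dev_backup_username dev_backup_password "dev"
backup_mongo "prod_host:prod_port" prod_db_name prod_backup_username prod_backup_password "prod"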
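
The cron side is a single line. Something like the following sketch would do it; the script path, the log path and the noon start time are placeholders rather than my exact setup:

```
# Added with `crontab -e`. Fields: minute hour day-of-month month day-of-week command
# Hypothetical paths; runs every day at 12:00 and appends all output to a log file.
0 12 * * * /path/to/scripts/backup_mongo.sh >> /path/to/backups/backup.log 2>&1
```

Redirecting the output to a log makes it easier to notice when mongoexport starts failing, since cron otherwise runs silently.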

If my laptop isn’t on, it doesn’t run. But most days I’ll get a backup. The documentation clearly doesn’t want me to use mongoexport for backups, but its limitations don’t affect the kind of data I’m storing. And there are side benefits to a JSON copy: it’s easy to open in an editor or load in Python.
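
Another small benefit: by default mongoexport writes one JSON document per line, so a line count doubles as a document count. Here is a minimal sketch of a sanity check built on that, assuming the folder layout from the script; count_docs is a hypothetical helper of mine, not something the backup script needs:

```shell
#!/bin/bash

# Print "<collection> <document count>" for each export in one backup folder.
# Relies on mongoexport's default output of one JSON document per line.
function count_docs {
    local dir=$1
    local file
    for file in "${dir}"/*.json
    do
        # If the folder has no exports the glob stays unexpanded; skip it.
        [ -e "${file}" ] || continue
        printf '%s %s\n' "$(basename "${file}" .json)" "$(grep -c '' "${file}")"
    done
}

# Hypothetical usage: check today's prod backup.
# count_docs "../backups/prod/$(date +%Y-%m-%d)"
```

A count of 0, or a collection missing from the output, is a cheap signal that the export failed or that the collection was already empty when the backup ran.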

I also made read-only users for doing backups for a tiny bit of added safety.
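
Restoring is roughly the mirror image, using mongoimport on the exported files. The sketch below is an assumption about how I would do it rather than a tested procedure: restore_mongo is a hypothetical function, the default backup location is a placeholder, and it needs a user with write access (the read-only backup users won’t work). It also assumes the collections were emptied first, as in my accidents, since mongoimport inserts documents and will complain about ones whose _id already exists.

```shell
#!/bin/bash

# Assumed backup location; the backup script derives this from its own path.
BACKUP_ROOT=${BACKUP_ROOT:-./backups}

# Re-import the POLLS and USERS exports from one dated backup folder.
# Arguments: host:port, db name, user with write access, password, label, date.
function restore_mongo {
    local host=$1
    local db=$2
    local user=$3
    local pass=$4
    local label=$5
    local date=$6

    local indir=${BACKUP_ROOT}/${label}/${date}

    local collection
    for collection in "POLLS" "USERS"
    do
        mongoimport -h "${host}" -d "${db}" -c "${collection}" -u "${user}" -p "${pass}" --file "${indir}/${collection}.json"
    done
}

# Hypothetical usage with placeholder credentials and date:
# restore_mongo "dev_host:dev_port" dev_db_name dev_write_username dev_write_password "dev" "YYYY-MM-DD"
```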

This isn’t a big triumph but it saves me $30/mo (2 dbs) and gives me easier access to the data.

In a nutshell, the documentation advises against this for good reasons: JSON can’t represent every MongoDB BSON type faithfully, the export isn’t guaranteed to capture a consistent state of your db, and it reads every document in each collection, so it puts heavy load on the server. But when things are just starting, you want something just barely good enough so you can move on to the important stuff.
