Beware of the dreaded passive arbiter!

The setup seemed simple enough: Two application servers running ASP.NET MVC, and a MongoDB replica set with two standard nodes and an arbiter.

If you’re familiar with MongoDB replica sets, feel free to skip the following paragraph:
When using MongoDB in a productive environment, two servers and an arbiter is much of the absolute minimum requirement. You cannot seriously run on a single server; MongoDB is not designed to provide good durability on just one machine. A two-node replica set provides replication of changes to a second machine. It also provides a failover server should one server go down. A replica set works much like classic master-slave replication, but the master is elected automatically: if the node that currently holds the master role (the “primary”) does not respond to a heartbeat, another node (a “secondary”) becomes the new primary. Which secondary is promoted is quite an interesting ceremony. An important aspect is that for a primary to be elected, it needs to “see” the majority of nodes in the replica set – otherwise, a broken network link could split the replica set in the middle, and there’d be two primarys (see a popular piece of fantasy literature on what happens in such a situation). A result of these rules is that a replica set with just two nodes cannot work – when one server goes down, the other won’t see a majority; even when the secondary node fails, the primary demotes itself because it finds itself alone.
This is where an arbiter comes in. An arbiter is a MongoDB instance that does not hald any data. It does not take part in replication. Its sole purpose is to provide a majority for the election of a new primary. In a two-node setup with an arbiter, one server can fail and the other still has a majority of two thirds of the replica set.
And then there’s priority. Priority is a MongoDB server setting that determines the order in which nodes are elected primary – a node with a higher priority wins. The most useful setting for priority however is 0. Setting a node’s priority to 0 makes it a “passive” node – it will never become elected the primary. This is useful if a machine is e.g. used just for backups, or if it’s in a remote data center. Otherwise, a passive node acts like a normal member of the replica set – data is replicated to it, and it can be used for running queries against it.

Alright. So we’ve got passive nodes, and arbiters. This is mutual exclusive – an arbiter cannot be a passive node. Passive nodes can be used for queries, arbiters can’t, they don’t hold data.

To query the status of a node in a replica set, you can run the command db.isMaster() against it. It returns a document that describes the shape of the replica set, and the role the queried server plays within it. Two fields that are returned are if the node is a passive, and if it’s an arbiter.

The C# driver for MongoDB (and probably other drivers as well) use the isMaster command to determine which servers are available, which is the primary, and qhich ones can be used for queries. If a query is allowed to be executed against a read-only replica of the database by setting the slaveOk setting, the driver looks for secondary and passive nodes. Which is correct. But this is where things start to go wrong – using this sequence of in itself harmless details.

MongoDB up to and including the current version 1.81 allow setting the priority on an arbiter node.

If you set the priority to 0, a node will report itself to be a passive. Yes, this includes arbiters. Ouch.

The driver looks for secondaries and passives to run queries. It does not exclude arbiters, because these normally aren’t secondaries or passives. Except when the server claims to be one, because someone set the priority to 0. Second ouch.

If you run a query against an arbiter, it will return an error that it’s neither primary nor secondary, and cannot be read from: { "$err" : "not master or secondary, can't read", "code" : 13436 } .Luckily, the driver recognizes the connection as faulty and removes it from the connection pool.

So what kept me busy for about ten working hours was this: an application server was reporting a MongoDB error once every time the IIS application pool had restarted. It turned out it was connecting to an arbiter, and the reason for that was that a DBA had set the priority on the sorry arbiter to 0, which was a leftover of its configuration before it became an arbiter.

First conclusion: don’t set the arbiter’s priority to 0. Just don’t. Arbiters don’t need a priority setting, or in fact, mostly no settings at all.

Second conclusion: I really love MongoDB for its simplicity, the data modelling possibilities, and its performance. But things like the dreaded passive arbiter really raise a question mark on its maturity – the problem would have been easily avoidable on either the server or the driver (or both, for better). What also needs to be admitted that 10gen is extremely responsive and helpful. They promised to fix the C# driver within a week(!) in release 1.2, and scheduled a fix for the server for version 1.92 – all within one day reaction time.

Advertisements
About

Christian is a software architect/developer. He lives in Germany, reads a lot, and likes cycling.

Tagged with: , ,
Posted in Coding

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: