MongoDB: when documents aren’t perfect, cope with it

MongoDB is a NoSQL database that organizes data as JSON documents in collections instead of as rows in tables. One major difference between a document in a collection and a row in a table is that while all rows in a table share the same schema, each document in a collection can be completely different. This approach brings huge advantages, but it also causes friction when strongly typed domain objects are deserialized from a collection: a document needs to match the shape of the class it is deserialized into.

The “official” MongoDB driver for .NET (the one offered and supported by 10gen, the company behind MongoDB) includes a serializer that can be told quite precisely how tolerant it should be when deserializing JSON documents into objects:

  • When a JSON document defines extra elements that are not present in the class that it is deserialized into, the serializer can either ignore them, stash them into a “catch-all” property, or throw an exception.
  • When properties of the class are missing from the JSON document, the serializer can set the property to null, set a default value, or throw an exception.

These options are well described in the official documentation.
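As a rough illustration, this is how those options map onto attributes from the driver's MongoDB.Bson.Serialization.Attributes namespace. The class and property names are made up for the example, the sketch assumes the MongoDB.Bson package is referenced, and the exact exception type thrown on a mismatch differs between driver versions.

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

// Hypothetical catalogue entry: elements in the stored document that have
// no matching property are silently ignored.
[BsonIgnoreExtraElements]
public class CatalogueEntry
{
    public ObjectId Id { get; set; }

    // A missing "Name" element makes deserialization throw.
    [BsonRequired]
    public string Name { get; set; }

    // A missing "Category" element falls back to this value.
    [BsonDefaultValue("uncategorized")]
    public string Category { get; set; }

    // No attribute: a missing "Description" element leaves the property null.
    public string Description { get; set; }
}

// Hypothetical variant: unknown elements are stashed in a catch-all property instead.
public class CatalogueEntryWithExtras
{
    public ObjectId Id { get; set; }

    public string Name { get; set; }

    [BsonExtraElements]
    public BsonDocument ExtraElements { get; set; }
}
```

Without either class-level setting, an element that has no matching property makes deserialization throw. The same settings can also be configured in code through BsonClassMap.RegisterClassMap instead of attributes.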

Handling deserialization errors in collections

The Mongo C# driver can deserialize single objects from the database as well as collections of objects. While, as we’ve seen, there is a lot of flexibility in defining a level of tolerance that matches your application’s requirements for single objects, the policy for collections is very simple: if any element fails to deserialize, the whole collection is not loaded. That is the right behaviour for scenarios that rely on data integrity, but applications that focus on availability risk a complete loss of functionality when just a single document in a collection does not match the deserialization requirements.

  • A scenario that relies on data integrity is a form in a line-of-business application that edits a list of closely related data, like the line items of an invoice. The line items all need to provide the same set of information; otherwise the invoice cannot be calculated reliably.
  • A scenario that focuses on availability is a search in the product catalogue of an online shop. If a single product is not correctly formatted in the database, it’s better to just not show that product than to let the whole search page become unavailable, which basically closes the shop until the ill-formatted product is identified and corrected.
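To make the difference between the two policies concrete, here is a small sketch that does not depend on the driver at all. LoadAllOrNothing behaves like the stock collection serializer (the first bad document aborts the whole load), while LoadTolerant skips bad documents and reports each one through a callback. The helper names and the toy "name;price" document format are hypothetical stand-ins for real BSON documents.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helpers (not part of the MongoDB driver) contrasting the two loading policies.
public static class LoadPolicies
{
    // Data-integrity policy: any failing document aborts the whole load.
    public static List<T> LoadAllOrNothing<T>(IEnumerable<string> rawDocuments, Func<string, T> deserialize)
    {
        var result = new List<T>();
        foreach (var raw in rawDocuments)
        {
            result.Add(deserialize(raw)); // an exception propagates and nothing is returned
        }
        return result;
    }

    // Availability policy: a failing document is reported through onError and skipped;
    // every other document is still returned.
    public static List<T> LoadTolerant<T>(
        IEnumerable<string> rawDocuments, Func<string, T> deserialize, Action<string, FormatException> onError)
    {
        var result = new List<T>();
        foreach (var raw in rawDocuments)
        {
            try
            {
                result.Add(deserialize(raw));
            }
            catch (FormatException exc)
            {
                onError?.Invoke(raw, exc);
            }
        }
        return result;
    }
}

// Toy "document" in the form "name;price", standing in for a stored product.
public sealed record Product(string Name, decimal Price)
{
    public static Product Parse(string raw)
    {
        var parts = raw.Split(';');
        if (parts.Length != 2)
        {
            throw new FormatException($"Expected 'name;price' but got '{raw}'.");
        }
        // decimal.Parse also throws FormatException for a malformed price
        return new Product(parts[0], decimal.Parse(parts[1], CultureInfo.InvariantCulture));
    }
}
```

Catching only FormatException (the toy stand-in for the driver's FileFormatException) is deliberate: a malformed document is skipped, while unrelated failures such as a lost connection still reach the caller.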

In the case of the search page, a better behaviour would be to exclude any document that cannot be deserialized and to report each problematic document through a callback. The approach I’ve taken is to copy the default implementation of the collection serializer and change the loop that iterates over the elements of the collection: each element is deserialized inside a try/catch block, and when deserialization fails, the exception is passed to an event handler and the reader skips ahead to the next element so that the iteration keeps running:

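// Inside the copied collection serializer (1.x driver API): bsonReader is positioned
// inside the array, list is the List<T> being filled, and discriminatorConvention
// has been looked up for typeof(T) before this loop.
while (bsonReader.ReadBsonType() != BsonType.EndOfDocument)
{
	var elementType = discriminatorConvention.GetActualType(bsonReader, typeof(T));
	var serializer = BsonSerializer.LookupSerializer(elementType);

	try
	{
		var element = (T)serializer.Deserialize(bsonReader, typeof(T), elementType, null);
		list.Add(element);
	}
	catch (FileFormatException exc)
	{
		// Pass the exception to the subscribed callback, if any
		var handler = HandleDeserializationError;
		if (handler != null)
		{
			handler(exc);
		}

		// Move the cursor past the rest of the faulted document,
		// assuming the exception left the reader somewhere inside it
		while (bsonReader.State != BsonReaderState.EndOfDocument)
		{
			if (bsonReader.State == BsonReaderState.Name) bsonReader.SkipName();
			else if (bsonReader.State == BsonReaderState.Value) bsonReader.SkipValue();
			else if (bsonReader.State == BsonReaderState.Type) bsonReader.ReadBsonType();
			else throw; // unexpected reader state: give up instead of looping forever
		}
		bsonReader.ReadEndDocument();
	}
}
bsonReader.ReadEndArray();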
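The cursor handling in the catch block is the subtle part. The following driver-independent sketch models the same recovery on a toy token stream. TokenReader is a hypothetical stand-in for the BSON reader; a document is a "{" token, some "key=value" tokens and a "}" token. When a document fails halfway through, the reader is advanced past that document's closing token, so the next document starts cleanly. Unlike SkipValue in the real reader, the toy does not handle nested documents.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the BSON reader: a flat list of tokens.
public sealed class TokenReader
{
    private readonly IReadOnlyList<string> _tokens;
    private int _position;

    public TokenReader(IReadOnlyList<string> tokens) => _tokens = tokens;

    public bool AtEnd => _position >= _tokens.Count;
    public string Peek() => _tokens[_position];
    public string Read() => _tokens[_position++];
}

public static class TolerantDocumentReader
{
    // Reads every document; a failing document is reported via onError and the
    // reader is moved past its closing "}" so the next document starts cleanly.
    public static List<Dictionary<string, int>> ReadAll(TokenReader reader, Action<FormatException> onError)
    {
        var documents = new List<Dictionary<string, int>>();
        while (!reader.AtEnd)
        {
            try
            {
                documents.Add(ReadDocument(reader));
            }
            catch (FormatException exc)
            {
                onError?.Invoke(exc);
                // Skip the rest of the faulted document...
                while (!reader.AtEnd && reader.Peek() != "}")
                {
                    reader.Read();
                }
                // ...and consume its end marker (the role of ReadEndDocument in the original).
                if (!reader.AtEnd)
                {
                    reader.Read();
                }
            }
        }
        return documents;
    }

    // Strict deserializer: every value must be an integer.
    private static Dictionary<string, int> ReadDocument(TokenReader reader)
    {
        if (reader.Read() != "{")
        {
            throw new FormatException("Expected the start of a document.");
        }
        var document = new Dictionary<string, int>();
        while (!reader.AtEnd && reader.Peek() != "}")
        {
            var parts = reader.Read().Split('=');
            if (parts.Length != 2)
            {
                throw new FormatException("Expected a key=value element.");
            }
            // int.Parse throws FormatException for a non-numeric value,
            // leaving the reader in the middle of the document.
            document[parts[0]] = int.Parse(parts[1]);
        }
        if (reader.AtEnd)
        {
            throw new FormatException("Unterminated document.");
        }
        reader.Read(); // consume "}"
        return document;
    }
}
```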

You can download the full source from Bitbucket. Please note that it’s not production quality: it has not been used in production yet. That could change over the next few weeks, in which case it will get an update with the real thing.

About

Christian is a software architect/developer. He lives in Germany, reads a lot, and likes cycling.

Posted in Coding