MongoDB: an example document store

The name comes from a contraction of humongous, an informal word meaning very large: MongoDB is primarily designed for handling very large sets of semi-structured data, typically referred to as Big Data. It is also mainly intended for use in building Web applications and seeks to take advantage of existing database knowledge. It therefore uses a syntax familiar to JavaScript developers and concepts from the field of relational databases.

Like other NoSQL models, MongoDB relies heavily on a single key field to retrieve data efficiently. The MongoDB _id field thus acts very much like a primary key in a relational database, but it is associated with a set of data values that are much less regular in structure than the rows of a relational table.

In a MongoDB database, data is stored as a collection of objects which are described using JavaScript Object Notation (JSON). The code snippet below shows an example of a JSON object which represents a student.

    { "_id" : 1,
       "matric" : 40000001,
       "name" : { "first" : "John", "last" : "Johnson" },
       "programme" : { "qualification" : "BEng", "title" : "Computing" },
       "modules" : [ {"code": "SET07102", "title": "Software Development 1", "Result": 75 },
                     {"code": "CSN07101", "title": "Computer Systems 1", "Result": 66 },
                     {"code": "SET08108", "title": "Software Development 2", "Result": null },
                     {"code": "INF08104", "title": "Database Systems", "Result": null }
                   ]
    }

The example illustrates several of the important features of JSON:

  • An object is enclosed in braces: { ... }
  • An object is composed of elements which are key-value pairs separated by a colon, e.g. "matric" : 40000001 on line 2
  • Elements are separated by commas
  • Objects can be nested, e.g. the "name" object on line 3
  • A value can be an array enclosed by square brackets: [ ... ], e.g. the "modules" array on lines 5 to 9

The example also illustrates the use of the MongoDB _id field (line 1) and the use of null values (lines 7 and 8).
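
As a minimal sketch of how these features look from a program, the snippet below parses the same student data in plain JavaScript and reads a nested value, an array element and the modules that have no mark yet. It assumes the missing marks are written as the JSON null value rather than as a string, and the variable names are chosen only for illustration.

```javascript
// The student document from the example, held as JSON text.
const text = `{
  "_id": 1,
  "matric": 40000001,
  "name": { "first": "John", "last": "Johnson" },
  "programme": { "qualification": "BEng", "title": "Computing" },
  "modules": [
    { "code": "SET07102", "title": "Software Development 1", "Result": 75 },
    { "code": "CSN07101", "title": "Computer Systems 1", "Result": 66 },
    { "code": "SET08108", "title": "Software Development 2", "Result": null },
    { "code": "INF08104", "title": "Database Systems", "Result": null }
  ]
}`;

// JSON.parse turns the text into an ordinary JavaScript object.
const student = JSON.parse(text);

// Nested objects are reached with dot notation, array elements by index.
const lastName = student.name.last;
const firstModule = student.modules[0].code;

// Modules without a mark carry the JSON null value, not the string "NULL".
const ungraded = student.modules
  .filter((m) => m.Result === null)
  .map((m) => m.code);

console.log(lastName, firstModule, ungraded);
```

Note that a string such as "NULL" would count as an ordinary value here, so a test for missing marks only works if the data uses a real null.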

In MongoDB, a collection of similar objects is the equivalent of a relational table. Unlike a table, however, objects in a collection do not necessarily have exactly the same structure, and there is no need to define the collection in advance of storing an object. This means there is essentially no independently defined schema.

Although MongoDB actually stores data in a binary extension of the JSON notation (Binary JSON, or BSON), the syntax makes it extremely easy to perform database operations in a JavaScript program. Operations are specified for a collection of objects in the current database using ordinary JavaScript method calls. The code snippet below illustrates the process of inserting a new object into the students collection, querying it, updating it and finally deleting it.

    db.students.insert( {
       matric: 40000001,
       name: {first: "John", last: "Johnson"},
       programme: {qualification: "BEng", title: "Computing"} } );

    db.students.find( {matric: 40000001} );          // query by matric number

    db.students.update( {matric: 40000001},
       {$set: {"programme.qualification": "BSc"}} ); // dotted field names must be quoted

    db.students.remove( {matric: 40000001} );        // delete the matching object

The correspondences between the MongoDB syntax and SQL are easy to see. For example, the query (line 6), update (lines 8-9) and remove (line 11) statements all contain the filter {matric: 40000001}, which performs the same function as the criteria in an SQL WHERE clause. Notice the use of the special operator $set in the update command to indicate which fields should be changed.
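
To make the WHERE-clause analogy concrete, here is a rough sketch in plain JavaScript of what an equality filter does. The helpers getPath and matches are hypothetical names written for this illustration and are not part of MongoDB, the second student is invented sample data, and only simple equality is handled.

```javascript
// Look up a possibly dotted path such as "programme.qualification" in a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// True when every field named in the filter equals the same field of the document.
// Query operators (comparison, logical, and so on) are deliberately left out.
function matches(doc, filter) {
  return Object.entries(filter).every(
    ([path, expected]) => getPath(doc, path) === expected
  );
}

// In-memory stand-in for the students collection (second entry is made up).
const students = [
  { matric: 40000001, name: { first: "John", last: "Johnson" },
    programme: { qualification: "BEng", title: "Computing" } },
  { matric: 40000002, name: { first: "Jane", last: "Jones" },
    programme: { qualification: "BSc", title: "Computing" } },
];

// Comparable to: SELECT * FROM students WHERE matric = 40000001
const byMatric = students.filter((s) => matches(s, { matric: 40000001 }));

// Several fields in one filter behave like conditions joined with AND.
const bscComputing = students.filter((s) =>
  matches(s, { "programme.qualification": "BSc", "programme.title": "Computing" })
);

console.log(byMatric.length, bscComputing[0].name.first);
```

The same filter object can be reused for querying, updating and removing, which is why the identical {matric: 40000001} fragment appears in all three statements.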

What is not so obvious is that MongoDB will add an _id field to every new object that does not already supply one, and populate it with a value which is unique across all objects in the collection. In addition, if the collection does not already exist, then just inserting a new object will create it. Further objects with a different structure (i.e. different key-value pairs) can be added to the same collection. In this way, MongoDB provides what it calls a flexible schema. This makes creating data very straightforward, but provides very loose control over the structure of the data.

Complex operations

Although missing in the early releases, MongoDB now provides support for several complex operations that were taken for granted in relational databases. For example, aggregation operations (group functions) are supported in the current version. Again, looking at an example, it is possible to see the correspondence with the SQL approach:

    db.students.aggregate( [
       { $match: { $and: [ {"programme.qualification": "BEng"}, {"programme.title": "Computing"} ] } },
       { $unwind: "$modules" },                      // one document per module, so $avg sees single marks
       { $group: { _id: "$matric",
                   average: { $avg: "$modules.Result" } } }
    ] )

In the example, the $match stage (line 2) corresponds to the WHERE clause in an SQL statement, and the $group stage (line 4) covers the job of the GROUP BY clause and the aggregate function. In effect, this statement returns a set of temporary documents with the fields _id and average, one for each student on the BEng Computing programme.
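
The following plain JavaScript sketch shows the result such a statement is meant to compute, without needing a database. The sample data is cut down and partly invented, the variable names are illustrative, and the sketch assumes that matric numbers are unique so that grouping by matric yields one result per student. Marks that are null are skipped, in the same way that $avg ignores non-numeric values.

```javascript
// In-memory stand-in for the students collection (second student is made up).
const students = [
  { matric: 40000001,
    programme: { qualification: "BEng", title: "Computing" },
    modules: [ { code: "SET07102", Result: 75 },
               { code: "CSN07101", Result: 66 },
               { code: "SET08108", Result: null } ] },
  { matric: 40000002,
    programme: { qualification: "BSc", title: "Computing" },
    modules: [ { code: "SET07102", Result: 58 } ] },
];

// Filtering step, like WHERE: keep only BEng Computing students.
const matched = students.filter(
  (s) => s.programme.qualification === "BEng" && s.programme.title === "Computing"
);

// Grouping step, like GROUP BY with AVG: one result per matric number,
// averaging only the numeric marks.
const results = matched.map((s) => {
  const marks = s.modules
    .map((m) => m.Result)
    .filter((r) => typeof r === "number");
  const average = marks.length > 0
    ? marks.reduce((sum, r) => sum + r, 0) / marks.length
    : null;                       // no numeric marks at all
  return { _id: s.matric, average };
});

console.log(results);
```

Each element of results has the same shape as the temporary documents described above: an _id holding the matric number and an average holding the mean mark.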

Indexing

Another concept imported into MongoDB from traditional database models is that of the index, which works in more or less the same way. An index is a copy of some subset of the data arranged into a structure that supports efficient searching, with links back to the original documents. An index is automatically created for the _id of a collection, just as the primary key of a relational table is automatically indexed. Database administrators can also define additional indexes on single or multiple fields in a collection. Indexes may be sparse in that they only include values for documents which contain the field in question. The implementation of indexes in MongoDB therefore maintains the support for flexible schemas.
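
The idea of a sparse index can be sketched in a few lines of plain JavaScript. This is only an illustration of the concept: a Map stands in for the tree structure a real database would use, buildSparseIndex is a hypothetical helper, and the email field is invented for the example. In the shell itself, an index of this kind would be requested with a call along the lines of db.students.createIndex({ email: 1 }, { sparse: true }).

```javascript
// Build an index on one field: a Map from field value to the positions of the
// documents holding that value. Documents without the field get no entry,
// which is what makes the index sparse.
function buildSparseIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!(field in doc)) return;          // skip documents lacking the field
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);        // link back to the original document
  });
  return index;
}

// Documents in one collection need not share the same fields.
const docs = [
  { _id: 1, matric: 40000001, email: "john@example.org" },
  { _id: 2, matric: 40000002 },                        // no email field
  { _id: 3, matric: 40000003, email: "ann@example.org" },
];

const emailIndex = buildSparseIndex(docs, "email");

// A lookup consults the index, then follows the stored positions to the documents.
const hits = (emailIndex.get("ann@example.org") || []).map((p) => docs[p]);

console.log(emailIndex.size, hits[0]._id);
```

The index holds two entries for three documents, which shows how a sparse index stays smaller than the collection when a field is optional.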

One of the advantages of MongoDB indexes is that they are meant to reside in RAM, which makes queries run extremely quickly. However, this is also a limitation, since in a large database it may not be possible to hold all indexes in RAM. If the index size exceeds the available RAM, parts of the indexes have to be read back from disk whenever they are needed. This leads to a significant performance reduction. Careful management of the number of indexes and the total index size is therefore a major concern for MongoDB administrators.

A cautionary tale

If you just read the marketing material from a NoSQL database provider like MongoDB, or you only focus on the theoretical advantages of the model, it will seem as if relational databases are dead and the sooner everyone migrates to a NoSQL model the better. However, it is important not to jump to conclusions. Often what seems like a good idea from a theoretical point of view does not prove to be so good in practice. The blog post by Sarah Mei from 2013 provides an excellent insight into the problems of using MongoDB in what seemed like the ideal development context, and highlights some of the big advantages of the relational model. The main messages that you should come away with are that no technical decision is black and white, careful thought needs to go into the choice between technological options, and there is no substitute for experience.

Further reading

What is MongoDB?

The MongoDB Manual

Big data