NoSQL in Azure: Cosmos DB

NoSQL technologies have been around for a while now; in the past I wrote about both MongoDB and Graph Databases.

Recently Microsoft introduced Cosmos DB within its Azure Cloud. Cosmos DB is not a Database in itself, but a set of Common Data Services for NoSQL DBs in the Cloud (such as scalability, distribution, partitioning, etc.). Microsoft describes it as a “globally distributed database service designed to enable you to elastically and independently scale throughput and storage across any number of geographical regions with a comprehensive SLA. You can develop document, key/value, or graph databases with Cosmos DB using a series of popular APIs and programming models”.

Azure Cosmos DB currently supports the following NoSQL DBs:
  • DocumentDB
  • MongoDB
  • Table API
  • Graph API

The pricing unit in Cosmos DB is called the Request Unit, which is defined as:

“A Request Unit (RU) is the measure of throughput in Azure Cosmos DB. 1 RU corresponds to the throughput of the GET of a 1KB item”.

There is an RU Calculator to estimate the cost of your Cosmos DB usage.
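
As a rough complement to the calculator, the throughput you need can be approximated by multiplying, for each type of operation, the operations per second by their RU cost and summing the results. The Python sketch below relies only on the "1 RU per 1KB read" figure quoted above; the function name, the workload numbers and the per-write cost are hypothetical placeholders, so real costs should come from the calculator or from measured request charges.

```python
def estimate_request_units(ops_per_second, ru_cost_per_op):
    """Sum (operations per second * RU cost) over every operation type.

    Both arguments are dicts keyed by an operation name chosen by the caller.
    """
    missing = set(ops_per_second) - set(ru_cost_per_op)
    if missing:
        raise ValueError(f"no RU cost given for: {sorted(missing)}")
    return sum(rate * ru_cost_per_op[op] for op, rate in ops_per_second.items())


# 1 RU per 1KB read comes from the definition above;
# the write cost is a made-up placeholder, not a published figure.
workload = {"read_1kb": 500, "write_1kb": 100}
costs = {"read_1kb": 1.0, "write_1kb": 5.0}

required_rus = estimate_request_units(workload, costs)
print(required_rus)
```

Passing the costs in as data keeps the arithmetic honest: when a measured cost replaces a placeholder, only the dictionary changes.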

More information on Cosmos DB can be found here.

A nice detail is that all experimenting with Cosmos DB can be done locally with the Azure Cosmos DB Emulator, without having to spend any money on Azure (at least during the initial development).

MongoDB

MongoDB is probably the most mature NoSQL DB on the market. It has been in use for years now, and offers flexible data storage and great performance (especially on Big Data).

MongoDB is a Document database which stores data in flexible, JSON-like (BSON) documents, meaning fields can vary from document to document and the data structure can change over time.

MongoDB is now part of the Cosmos DB offer, which makes it easier to integrate it in a Microsoft environment, especially on Azure.

An example of a MongoDB document (the equivalent of a Row in an RDBMS) is here:
{
  "id": "WakefieldFamily",
  "parents": [
    { "familyName": "Wakefield", "givenName": "Robin" },
    { "familyName": "Miller", "givenName": "Ben" }
  ],
  "children": [
    {
      "familyName": "Merriam",
      "givenName": "Jesse",
      "gender": "female",
      "grade": 1,
      "pets": [
        { "givenName": "Goofy" },
        { "givenName": "Shadow" }
      ]
    },
    {
      "familyName": "Miller",
      "givenName": "Lisa",
      "gender": "female",
      "grade": 8
    }
  ],
  "address": { "state": "NY", "county": "Manhattan", "city": "NY" },
  "creationDate": 1431620462,
  "isRegistered": false
}
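
Because fields can vary from one document to the next, code that reads such documents usually has to tolerate missing keys. Here is a minimal Python sketch of that idea, using a trimmed copy of the document above and only the standard json module (no MongoDB driver is involved, and the helper name pet_names is made up for illustration):

```python
import json

# Trimmed copy of the family document shown above.
raw = """
{
  "id": "WakefieldFamily",
  "children": [
    {"givenName": "Jesse", "grade": 1,
     "pets": [{"givenName": "Goofy"}, {"givenName": "Shadow"}]},
    {"givenName": "Lisa", "grade": 8}
  ],
  "isRegistered": false
}
"""


def pet_names(document):
    """Collect pet names per child; a child without a 'pets' field gets []."""
    return {
        child["givenName"]: [pet["givenName"] for pet in child.get("pets", [])]
        for child in document.get("children", [])
    }


family = json.loads(raw)
print(pet_names(family))
```

Lisa has no "pets" field at all, and the .get() calls with defaults are what let the same function handle both children.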

The main advantage of using MongoDB within Cosmos DB is the better integration in terms of development and deployment. Using MongoDB within Cosmos DB removes the need for a VM to host the MongoDB service, and Microsoft tools such as the Cosmos DB Emulator make it easier to build MongoDB solutions that run both locally and within Azure in a matter of clicks.

A basic tutorial on MongoDB in C# is here.

DocumentDB

DocumentDB is Microsoft Azure's alternative to MongoDB; a lot has been written about how the two compare, so that comparison is out of scope here.

The main advantage of using DocumentDB instead of MongoDB is the better integration with Microsoft tools and the required development libraries, even though, now that MongoDB is supported by Cosmos DB, this gap keeps narrowing.

For the RDBMS fans out there, it’s worth mentioning that DocumentDB introduced a feature called DocumentDB API SQL, which allows familiar SQL syntax to be used to query the DocumentDB NoSQL database (btw, see the contradiction?).
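
To give a feel for what such a query does, a statement like SELECT c.id FROM c WHERE c.address.state = "NY" filters documents on a nested property and returns one field of each match. The Python sketch below mimics that filter over in-memory dictionaries. It only illustrates the semantics and is not the DocumentDB SDK; the function name is hypothetical and every sample document except WakefieldFamily is invented.

```python
documents = [
    {"id": "WakefieldFamily", "address": {"state": "NY", "city": "NY"}},
    {"id": "ExampleFamily", "address": {"state": "WA", "city": "Seattle"}},  # invented
    {"id": "NoAddressFamily"},  # invented; a document lacking the field entirely
]


def select_ids_by_state(docs, state):
    """Rough equivalent of: SELECT c.id FROM c WHERE c.address.state = <state>."""
    return [
        {"id": doc["id"]}
        for doc in docs
        if doc.get("address", {}).get("state") == state
    ]


ny_families = select_ids_by_state(documents, "NY")
print(ny_families)
```

The document without an address is simply skipped rather than causing an error, which is the behaviour you want when the schema is not enforced.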

Documents (Rows of data) in DocumentDB are like those in MongoDB, except the Microsoft product works with plain JSON instead of BSON.

A basic tutorial on DocumentDB in C# is here.

Graph API

A graph is a structure that's composed of vertices and edges. Both vertices and edges can have an arbitrary number of properties. Vertices denote discrete objects such as a person, a place, or an event. Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, and have recently been at a location. Properties express information about the vertices and edges.
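
One way to picture this model is as two small record types, each carrying a free-form bag of properties. The Python sketch below is only an illustration of the data model, not how Cosmos DB stores graphs; the class names, the 'since' property and Ben's firstName are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str  # kind of object, e.g. 'person'
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str   # kind of relationship, e.g. 'knows'
    source: str  # id of the vertex the edge leaves
    target: str  # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


thomas = Vertex("thomas", "person", {"firstName": "Thomas", "age": 44})
ben = Vertex("ben", "person", {"firstName": "Ben"})

# 'since' is an invented property, showing that edges carry data too.
knows = Edge("knows", thomas.id, ben.id, {"since": 2010})
print(knows)
```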

Graph Databases have been around for some time now (especially since Social Media companies such as Facebook and Twitter became popular). A notable example of a Graph Database is Neo4J.

Azure Cosmos DB offers a Graph API as the Azure Graph DB offer; graphs are queried with Gremlin, the Apache TinkerPop graph traversal language, and can also be used from other TinkerPop-compatible graph systems like Apache Spark GraphX.

Again, the tooling integration for the Microsoft product is much better than that of its graph DB competitors. When it comes to the graphical representation of the graph data, something nicely supported by Neo4J out of the box, Microsoft offers an open-source client application called Graph Explorer, which allows easy querying and displaying of the data.

An example of a graphical representation of a Graph dataset.

A Gremlin query to create a Vertex looks like this:
g.addV('person');

A Vertex can have properties such as:
g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44);

You can add an Edge such as “knows”, for each friend of Thomas:
g.V('thomas').addE('knows').to(g.V('ben'));

You can get all Vertices and Edges by running these queries:
g.V(); g.E();

Then you can run a traversal query to show all the Friends of Thomas:
g.V('thomas').outE('knows').inV().hasLabel('person');

You can go as far as retrieving all the Friends of Friends of Thomas in a single query:
g.V('thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person');
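
To make the traversal semantics concrete, the two queries above can be mimicked over a plain adjacency mapping. This is a Python sketch of what following outE('knows').inV() once or twice means conceptually; it is not a Gremlin client, and the vertex ids other than thomas and ben are invented.

```python
# Outgoing 'knows' edges, keyed by the id of the source vertex.
knows = {
    "thomas": ["ben", "mary"],
    "ben": ["robin"],
    "mary": ["robin", "thomas"],
}


def friends(person):
    """Vertices reached by following one outgoing 'knows' edge."""
    return list(knows.get(person, []))


def friends_of_friends(person):
    """Vertices reached by following two outgoing 'knows' edges.

    Duplicates are kept, and the starting vertex can show up again
    when a friend knows them back.
    """
    return [fof for friend in friends(person) for fof in friends(friend)]


print(friends("thomas"))
print(friends_of_friends("thomas"))
```

In this sample data robin is reached twice (via ben and via mary) and thomas comes back via mary, so a real query would typically need to de-duplicate the results and exclude the starting vertex.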

CATCH: Edges are directional (one-way). For example, “PersonA knows PersonB” does not mean that “PersonB knows PersonA”, unless you add a second Edge to represent this.
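
The effect of directionality is easy to check with a toy edge set. In this Python sketch (the helper names are hypothetical and nothing here talks to Cosmos DB), a mutual relationship is modelled explicitly as two directed edges:

```python
edges = set()  # each edge is a (source, label, target) tuple


def add_edge(source, label, target):
    """Add a single directed edge."""
    edges.add((source, label, target))


def add_mutual(a, label, b):
    """Model a two-way relationship as two directed edges."""
    add_edge(a, label, b)
    add_edge(b, label, a)


def has_edge(source, label, target):
    return (source, label, target) in edges


add_edge("thomas", "knows", "ben")
# The edge exists from thomas to ben, but not the other way round.
print(has_edge("thomas", "knows", "ben"), has_edge("ben", "knows", "thomas"))

add_mutual("thomas", "knows", "robin")
print(has_edge("robin", "knows", "thomas"))
```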

For the usual RDBMS fans, you can read this introduction, and keep in mind that there is a nice document on how to “translate” SQL queries into Gremlin queries.

And here is the full Gremlin syntax documentation.

Table API



Azure Cosmos DB provides the Table API for applications that need a key-value store with flexible schema, predictable performance, global distribution, and high throughput. The Table API provides the same functionality as Azure Table storage, but leverages the benefits of the Azure Cosmos DB engine.
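
As an illustration of what "key-value store with flexible schema" means in practice, here is a toy in-memory table in Python. It borrows the partition key plus row key addressing used by Azure Table storage entities, but the class and its methods are invented for this sketch and are not the Table API SDK.

```python
class InMemoryTable:
    """Toy key-value table: entities are addressed by (partition_key, row_key)."""

    def __init__(self):
        self._entities = {}

    def upsert(self, partition_key, row_key, **properties):
        # No schema is enforced: each entity keeps whatever properties it is given.
        self._entities[(partition_key, row_key)] = dict(properties)

    def get(self, partition_key, row_key):
        return self._entities.get((partition_key, row_key))

    def query_partition(self, partition_key):
        return [
            entity
            for (pk, _), entity in self._entities.items()
            if pk == partition_key
        ]


table = InMemoryTable()
table.upsert("NY", "WakefieldFamily", isRegistered=False)
table.upsert("NY", "MillerFamily", isRegistered=True, pets=2)  # extra property
table.upsert("WA", "ExampleFamily")  # no properties at all

print(table.get("NY", "MillerFamily"))
print(len(table.query_partition("NY")))
```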

You can continue to use Azure Table storage for tables with high storage and lower throughput requirements. Azure Cosmos DB will introduce support for storage-optimized tables in a future update, and existing and new Azure Table storage accounts will be upgraded to Azure Cosmos DB.

More information about Table API can be found here.
