NoSQL in Azure: Cosmos DB
NoSQL technologies have been around for a while now; in the past I wrote about both MongoDB and Graph Databases.
Recently Microsoft introduced the Cosmos DB offer within its Azure Cloud. Cosmos DB is not a database itself, but rather a set of common data services for NoSQL DBs in the Cloud (such as scalability, distribution and partitioning), described as a “globally distributed database service designed to enable you to elastically and independently scale throughput and storage across any number of geographical regions with a comprehensive SLA. You can develop document, key/value, or graph databases with Cosmos DB using a series of popular APIs and programming models”.
Azure Cosmos DB currently supports the following NoSQL DBs:
- DocumentDB
- MongoDB
- Table API
- Graph API
The price unit in Cosmos DB is called the Request Unit, which is defined as:
“A Request Unit (RU) is the measure of throughput in Azure Cosmos DB. 1 RU corresponds to the throughput of the GET of a 1KB item”.
There is an RU Calculator to estimate the cost of your Cosmos DB.
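As a back-of-the-envelope illustration of how the RU definition turns into provisioned throughput, here is a minimal Python sketch. It assumes that a read costs 1 RU for every started KB of item size (a simplification of the 1 KB GET definition) and treats the per-KB write cost as a hypothetical parameter you must calibrate yourself; the RU Calculator remains the tool for real estimates.

```python
import math


def estimate_rus_per_second(reads_per_sec, writes_per_sec, item_size_kb,
                            read_ru_per_kb=1.0, write_ru_per_kb=5.0):
    """Rough RU/s estimate (a sketch, not the official pricing model).

    Assumptions:
    - a read costs read_ru_per_kb for every started KB of the item
      (1 RU for the GET of a 1 KB item, per the RU definition);
    - write_ru_per_kb is a hypothetical value you should replace with
      numbers from the RU Calculator or from measured request charges.
    """
    size_units = math.ceil(item_size_kb)  # round the item size up to whole KB
    read_rus = reads_per_sec * size_units * read_ru_per_kb
    write_rus = writes_per_sec * size_units * write_ru_per_kb
    return read_rus + write_rus


if __name__ == "__main__":
    # 500 reads/s and 100 writes/s of ~1 KB items
    print(estimate_rus_per_second(500, 100, 1))  # 1000.0
```

Keeping the write cost as an explicit parameter makes the assumption visible instead of hiding it inside a constant.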
More information on Cosmos DB can be found here.
A nice detail is that all experimenting with Cosmos DB can be done locally with the Azure Cosmos DB Emulator, without having to spend any money on Azure (at least during the initial development).
MongoDB
MongoDB is probably the most mature NoSQL DB on the market. It has been in use for years, and offers flexible data storage and great performance (especially on Big Data).
MongoDB is a Document database which stores data in flexible, JSON-like (BSON) documents, meaning fields can vary from document to document and the data structure can change over time.
MongoDB is now part of the Cosmos DB offer (through a MongoDB-compatible API), which makes it easier to integrate into a Microsoft environment, especially on Azure.
Here is an example of a MongoDB document (the equivalent of a row in an RDBMS):
{
  "id": "WakefieldFamily",
  "parents": [
    { "familyName": "Wakefield", "givenName": "Robin" },
    { "familyName": "Miller", "givenName": "Ben" }
  ],
  "children": [
    {
      "familyName": "Merriam",
      "givenName": "Jesse",
      "gender": "female",
      "grade": 1,
      "pets": [
        { "givenName": "Goofy" },
        { "givenName": "Shadow" }
      ]
    },
    {
      "familyName": "Miller",
      "givenName": "Lisa",
      "gender": "female",
      "grade": 8
    }
  ],
  "address": { "state": "NY", "county": "Manhattan", "city": "NY" },
  "creationDate": 1431620462,
  "isRegistered": false
}
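To make the "fields can vary from document to document" point concrete, here is a small Python sketch (standard library only, no MongoDB driver) that loads a trimmed copy of the document above and reports which fields only some of the child entries carry.

```python
import json

# Trimmed copy of the sample family document, keeping only "children"
family_json = """
{
  "id": "WakefieldFamily",
  "children": [
    {"familyName": "Merriam", "givenName": "Jesse", "gender": "female",
     "grade": 1, "pets": [{"givenName": "Goofy"}, {"givenName": "Shadow"}]},
    {"familyName": "Miller", "givenName": "Lisa", "gender": "female",
     "grade": 8}
  ]
}
"""

family = json.loads(family_json)


def field_sets(documents):
    """Return the set of top-level field names of each (sub-)document."""
    return [set(doc.keys()) for doc in documents]


children = family["children"]
for child, fields in zip(children, field_sets(children)):
    print(child["givenName"], sorted(fields))

# Fields present in some children but not in all of them
all_fields = set.union(*field_sets(children))
common_fields = set.intersection(*field_sets(children))
print("optional fields:", sorted(all_fields - common_fields))  # ['pets']
```

No schema had to be declared for the second child to omit "pets"; the store simply keeps whatever fields each document has.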
The main advantage of using MongoDB within Cosmos DB is better integration in terms of development and deployment: using MongoDB within Cosmos DB removes the need for a VM to host the MongoDB service, and Microsoft tools such as the Cosmos DB Emulator make it easier to build MongoDB solutions that run both locally and in Azure in a matter of clicks.
A basic tutorial on MongoDB in C# is here.
DocumentDB
DocumentDB is Microsoft Azure's alternative to MongoDB; a lot has already been written comparing the two, so that comparison is out of scope here.
The main advantage of using DocumentDB instead of MongoDB is the better integration with Microsoft tools and the required development libraries, although now that MongoDB is supported by Cosmos DB this gap keeps narrowing.
For the RDBMS fans out there, it's worth mentioning that DocumentDB introduced a feature called the DocumentDB SQL API, which allows a SQL-like syntax to be used to query the DocumentDB NoSQL database (by the way, see the contradiction?).
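As an illustration, a query in this SQL dialect might look like the one in the comment below (the `Families f` alias and the grade filter are made up for the example). The Python underneath is a hypothetical in-memory equivalent, written only to show what the query is meant to return, not how the service executes it.

```python
# Roughly equivalent DocumentDB SQL query (illustrative, not executed here):
#   SELECT c.givenName, c.grade
#   FROM Families f
#   JOIN c IN f.children
#   WHERE c.grade > 5

families = [
    {
        "id": "WakefieldFamily",
        "children": [
            {"givenName": "Jesse", "grade": 1},
            {"givenName": "Lisa", "grade": 8},
        ],
    },
]


def children_above_grade(docs, min_grade):
    """Flatten each family's children (the JOIN) and filter on grade (the WHERE)."""
    return [
        {"givenName": c["givenName"], "grade": c["grade"]}
        for f in docs
        for c in f.get("children", [])
        if c.get("grade", 0) > min_grade
    ]


print(children_above_grade(families, 5))  # [{'givenName': 'Lisa', 'grade': 8}]
```

Note that the JOIN here works inside a single document (over its `children` array), rather than across tables as in an RDBMS.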
Documents (rows of data) in DocumentDB are similar to those in MongoDB, except that the Microsoft product works with plain JSON instead of BSON.
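One practical consequence: BSON has native types (such as a date type) that plain JSON lacks, so in a JSON store dates are usually encoded as strings or numbers. The `creationDate` field in the sample document above appears to use Unix epoch seconds. A minimal standard-library Python sketch of that round trip:

```python
import json
from datetime import datetime, timezone

created = datetime(2015, 5, 14, 16, 21, 2, tzinfo=timezone.utc)

# Plain JSON has no date type: the standard encoder rejects datetime objects
try:
    json.dumps({"creationDate": created})
except TypeError as err:
    print("not JSON-serializable:", err)

# Common workaround: store the timestamp as Unix epoch seconds
doc = {"id": "WakefieldFamily", "creationDate": int(created.timestamp())}
encoded = json.dumps(doc)
print(encoded)  # {"id": "WakefieldFamily", "creationDate": 1431620462}

# ...and convert it back to a datetime when reading the document
decoded = json.loads(encoded)
restored = datetime.fromtimestamp(decoded["creationDate"], tz=timezone.utc)
print(restored.isoformat())  # 2015-05-14T16:21:02+00:00
```

The application, not the database, is responsible for knowing that this number means a date.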
A basic tutorial on DocumentDB in C# is here.
Graph API
A graph is a structure composed of vertices and edges. Both vertices and edges can have an arbitrary number of properties. Vertices denote discrete objects such as a person, a place, or an event. Edges denote relationships between vertices: for example, a person might know another person, be involved in an event, or have recently been at a location. Properties express information about the vertices and edges.
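To make this vocabulary concrete, here is a minimal in-memory Python model of a property graph: vertices, directed labelled edges, and free-form properties on both. It only illustrates the data model (it is not how Cosmos DB stores graphs), and the sample data is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str                      # e.g. 'person', 'place', 'event'
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str                      # e.g. 'knows', 'attended', 'visited'
    out_v: str                      # id of the source vertex
    in_v: str                       # id of the target vertex
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph: vertices plus labelled, directed edges."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, dict(props))
        return self.vertices[vid]

    def add_edge(self, label, out_v, in_v, **props):
        # Both endpoints must already exist, as in a real graph store
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist")
        edge = Edge(label, out_v, in_v, dict(props))
        self.edges.append(edge)
        return edge


# Hypothetical sample data: a person who has visited a place
g = PropertyGraph()
g.add_vertex("thomas", "person", firstName="Thomas", age=44)
g.add_vertex("nyc", "place", name="New York")
g.add_edge("visited", "thomas", "nyc", year=2017)
```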
Graph Databases have been around for some time now (especially since social media companies such as Facebook and Twitter became popular). A notable example of a Graph Database is Neo4j.
Azure Cosmos DB offers a Graph API as its graph DB offer; graphs in Azure Cosmos DB can be queried with Apache TinkerPop's graph traversal language, Gremlin, or with other TinkerPop-compatible graph systems.
Again, the tooling integration of the Microsoft product is much better than that of its graph DB competitors. When it comes to the graphical representation of graph data, something Neo4j supports nicely out of the box, Microsoft offers an open-source client application called Graph Explorer, which allows easy querying and display of the data.
An example of a graphical representation of a Graph dataset.
A Gremlin query to create a Vertex looks like this:
g.addV('person');
A Vertex can have properties, such as:
g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44);
You can add an Edge such as “knows” for each friend of Thomas:
g.V('thomas').addE('knows').to(g.V('ben'));
You can get all Vertices and Edges by running these queries:
g.V(); g.E();
Then you can run a traversal query to show all the Friends of Thomas:
g.V('thomas').outE('knows').inV().hasLabel('person');
You can even retrieve all the Friends of Friends of Thomas in a single query:
g.V('thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person');
CATCH: Edges are directional (one-way): “PersonA knows PersonB” does not mean that “PersonB knows PersonA”, unless you add a second Edge to represent this.
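The two traversals above, and the one-way nature of edges, can be mimicked in plain Python over an adjacency list. This is a rough sketch of what `outE('knows').inV()` does (follow outgoing `knows` edges to their target vertices); all vertices here are persons, so the `hasLabel('person')` filter is omitted, and `mary` and `robin` are hypothetical additions to the Thomas/Ben example.

```python
from collections import defaultdict

# Outgoing 'knows' edges only: an edge a -> b says nothing about b -> a
knows = defaultdict(set)


def add_knows(a, b):
    knows[a].add(b)


def friends(person):
    """Like g.V(person).outE('knows').inV(): targets of outgoing 'knows' edges."""
    return set(knows[person])


def friends_of_friends(person):
    """Two hops of outE('knows').inV(), mirroring the second Gremlin query."""
    result = set()
    for friend in friends(person):
        result |= friends(friend)
    return result


add_knows("thomas", "ben")
add_knows("thomas", "mary")   # hypothetical extra vertex
add_knows("ben", "robin")     # hypothetical extra vertex

print(sorted(friends("thomas")))             # ['ben', 'mary']
print(sorted(friends_of_friends("thomas")))  # ['robin']
print("thomas" in friends("ben"))            # False: edges are one-way

add_knows("ben", "thomas")    # the reverse edge must be added explicitly
print("thomas" in friends("ben"))            # True
```

Unlike this set-based sketch, the Gremlin traversal returns one result per path, so a friend reachable through two different friends would appear twice unless deduplicated.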
For the usual RDBMS fans, you can read this introduction, and keep in mind that there is a nice document on how to “translate” SQL queries into Gremlin queries.
And here is the full Gremlin syntax documentation.
Table API
Azure Cosmos DB provides the Table API for applications that need a key-value store with flexible schema, predictable performance, global distribution, and high throughput. The Table API provides the same functionality as Azure Table storage, but leverages the benefits of the Azure Cosmos DB engine.
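Azure Table storage (and therefore the Table API) identifies each entity by a PartitionKey plus a RowKey, while all other properties are schema-free. The sketch below mimics that shape with a plain Python dictionary; it is an in-memory illustration of the key-value model, not the Azure SDK, and the `InMemoryTable` class and sample entities are hypothetical.

```python
class InMemoryTable:
    """Hypothetical key-value table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._entities = {}

    def upsert(self, entity):
        # Insert or replace the entity stored under its composite key
        key = (entity["PartitionKey"], entity["RowKey"])
        self._entities[key] = dict(entity)  # store a copy

    def get(self, partition_key, row_key):
        # Point lookup by the full key; None if the entity does not exist
        return self._entities.get((partition_key, row_key))

    def query_partition(self, partition_key):
        """Return all entities that share the given partition key."""
        return [e for (pk, _), e in self._entities.items() if pk == partition_key]


table = InMemoryTable()
# Entities in the same table can carry different properties (flexible schema)
table.upsert({"PartitionKey": "Wakefield", "RowKey": "Robin", "role": "parent"})
table.upsert({"PartitionKey": "Wakefield", "RowKey": "Jesse", "grade": 1})
table.upsert({"PartitionKey": "Miller", "RowKey": "Lisa", "grade": 8})

print(table.get("Wakefield", "Jesse"))  # {'PartitionKey': 'Wakefield', 'RowKey': 'Jesse', 'grade': 1}
print(len(table.query_partition("Wakefield")))  # 2
```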
You can continue to use Azure Table storage for tables with high storage and lower throughput requirements. Azure Cosmos DB will introduce support for storage-optimized tables in a future update, and existing and new Azure Table storage accounts will be upgraded to Azure Cosmos DB.
More information about the Table API can be found here.