Geographically Distributed Replica Set with MongoDB and Azure


The company I work for is growing remarkably fast, opening stores worldwide on a monthly basis and expanding its IT workforce and infrastructure to keep pace.

One important project recently implemented in the company's Enterprise Architecture is the Geographically Distributed Replica Set with MongoDB and Azure (or ReadDB, its “simple” name).

Before this project, the main database used for this purpose was a SQL Server instance located in a data center in Amsterdam. Because of the distance, stores in the US, and even more so in Asia, experienced latency when retrieving data, which resulted in a poor user experience.

Since the main need of these stores is to present data to the customer (orders, customer details, etc.), a read-only database seemed to be the best solution.

Given the popularity that NoSQL databases have gained recently, and after some research, we opted to use MongoDB as the ReadDB, with its Replica Set feature as the way to distribute data geographically.

We also wanted to leverage the scalability and configuration features of Microsoft Azure as the underlying Cloud infrastructure.

So we started with a simple setup of the MongoDB Replica Set within one Azure Region: here we were able to successfully test the replication between the Primary and Secondary nodes of MongoDB.

The installation and configuration of a MongoDB Replica Set is relatively easy, as described in the official documentation.
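To give an idea of what that configuration looks like, here is a minimal sketch in Python of the document a Replica Set is initiated with. The set name `readdb` and the host names are hypothetical (the real ones are not part of this post), and the helper function is just an illustration, not the exact procedure we followed.

```python
def build_replica_set_config(set_name, data_hosts, arbiter_host=None):
    """Build a replica set configuration document.

    data_hosts: "host:port" strings for the data-bearing nodes.
    arbiter_host: optional "host:port" of a node that only votes in elections.
    """
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": set_name, "members": members}


# Hypothetical names, for illustration only.
config = build_replica_set_config(
    "readdb",
    ["mongo-primary:27017", "mongo-secondary:27017"],
    arbiter_host="mongo-arbiter:27017",
)
print(config)

# With PyMongo installed and the mongod processes started with the same
# replica set name, a document like this could be sent to one node through
# the replSetInitiate command, for example:
#   MongoClient("mongo-primary", 27017, directConnection=True) \
#       .admin.command("replSetInitiate", config)
```

The same document can also be passed to `rs.initiate()` in the mongo shell, which is the route the official documentation takes.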

To simplify the operations for this test, the initial Azure Virtual Machine containing the MongoDB installation was prepared using SysPrep and then reused within the same Region (the SysPrep image hardcodes the Azure Region information, so it cannot be used in different Regions).

When it comes to expanding the Replica Set geographically to other Azure Regions, quite a bit of infrastructure has to be created on Azure first:


  1. Create an Azure Virtual Network in the West EU Azure Region.
  2. Create an Azure Local Network in the West EU Azure Region, with the same address space as the Virtual Network, and configure the VPN Gateway.
  3. Create a second Azure Virtual Network in the West US Azure Region.
  4. Create a second Azure Local Network in the West US Azure Region, with the same address space as the Virtual Network, and configure the VPN Gateway.
  5. Create a third Azure Virtual Network in the East Asia Azure Region.
  6. Create a third Azure Local Network in the East Asia Azure Region, with the same address space as the Virtual Network, and configure the VPN Gateway.
  7. Follow this tutorial to create a VNet-to-VNet connection between the West EU Virtual Network and the West US Virtual Network.
  8. Follow this tutorial to create another VNet-to-VNet connection between the East Asia Virtual Network and the West EU Virtual Network.
  9. Create an Azure Virtual Machine image template to be reused for all MongoDB Nodes in the West EU Replica Set.
  10. Create a Primary Node Virtual Machine from the image template, and create an Azure Availability Set in the West EU Azure Region.
  11. Create an Arbiter Node Virtual Machine and add it to the set in the West EU Azure Region.
  12. Create a Secondary Node Virtual Machine and add it to the set in the West EU Azure Region.
  13. Configure and test the Replica Set.
  14. Create an Azure Virtual Machine image template to be reused for all MongoDB Nodes in the West US Replica Set.
  15. Create a second Secondary Node Virtual Machine in the West US Azure Region.
  16. Create an Azure Virtual Machine image template to be reused for all MongoDB Nodes in the East Asia Replica Set.
  17. Create a third Secondary Node Virtual Machine in the East Asia Azure Region.
  18. Configure and test the added Nodes in the Replica Set.
  19. Relax and enjoy the Replication.
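One detail worth checking before steps 7 and 8: VNet-to-VNet connections require that the connected Virtual Networks do not have overlapping address spaces. The sketch below uses Python's standard `ipaddress` module to verify that. The region keys and CIDR ranges are hypothetical placeholders, since the real ranges are not part of this post.

```python
import ipaddress
from itertools import combinations

# Hypothetical address spaces, for illustration only.
VNET_ADDRESS_SPACES = {
    "west-eu": "10.1.0.0/16",
    "west-us": "10.2.0.0/16",
    "east-asia": "10.3.0.0/16",
}


def find_overlaps(address_spaces):
    """Return the (name_a, name_b) pairs whose CIDR ranges overlap."""
    networks = {
        name: ipaddress.ip_network(cidr)
        for name, cidr in address_spaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


overlaps = find_overlaps(VNET_ADDRESS_SPACES)
print("overlapping pairs:", overlaps if overlaps else "none")
```

Running a check like this on the planned ranges before creating the networks is cheaper than finding the conflict when the gateways refuse to connect.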


After the entire Azure infrastructure was in place, we simply started inserting data into the Primary node and watched the replication synchronize all the Secondary nodes.



We then performed some tests (nearly 1 million rows inserted into the Primary node, with a record count executed on each Secondary node after every row), and we were very impressed with the performance of the MongoDB Replica Set: on all Secondary nodes the data was replicated in near real time, with only an occasional gap of a few milliseconds!
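The test described above can be sketched roughly as follows. This is not the script we actually ran: it is a minimal Python illustration that relies only on two methods PyMongo collections provide, `insert_one` and `count_documents`, and assumes the collections start empty. The `InMemoryCollection` class is a made-up stand-in so the sketch runs without a server; against a real Replica Set you would pass collections obtained from clients connected to the Primary and to each Secondary.

```python
import time


def wait_for_count(collection, expected, timeout_s=5.0, poll_s=0.001):
    """Poll until the collection holds at least `expected` documents.

    Returns the elapsed seconds, or None if the timeout expires first.
    """
    start = time.monotonic()
    while True:
        if collection.count_documents({}) >= expected:
            return time.monotonic() - start
        if time.monotonic() - start > timeout_s:
            return None
        time.sleep(poll_s)


def measure_replication(primary, secondaries, documents):
    """Insert each document on the primary, then time its arrival on
    every secondary. Returns {secondary_name: [seconds or None, ...]}."""
    lags = {name: [] for name in secondaries}
    for n, doc in enumerate(documents, start=1):
        primary.insert_one(doc)
        for name, coll in secondaries.items():
            lags[name].append(wait_for_count(coll, n))
    return lags


class InMemoryCollection:
    """Hypothetical stand-in exposing the two collection methods used above."""

    def __init__(self, store=None):
        self.store = [] if store is None else store

    def insert_one(self, doc):
        self.store.append(doc)

    def count_documents(self, query):
        return len(self.store)


# Sharing one list between "primary" and "secondary" simulates
# instantaneous replication.
shared = []
lags = measure_replication(
    InMemoryCollection(shared),
    {"west-us": InMemoryCollection(shared)},
    [{"n": i} for i in range(3)],
)
print(lags)
```

Note that the secondaries are polled one after another, so the figure recorded for a later secondary includes the time spent waiting on the earlier ones; it is an upper bound rather than an exact lag.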

So besides the known advantages of using a NoSQL database for this read-only solution, the MongoDB Replica Set proved to be a great choice for Geographically Distributed Data, allowing a smooth, easy-to-set-up, near-real-time synchronization.
