Geographically Distributed Replica Set with MongoDB and Azure
The company I work for is undergoing incredible growth, opening stores worldwide every month and expanding its IT workforce and infrastructure accordingly to cope with that growth.
One important project recently implemented in the company's Enterprise Architecture is the Geographically Distributed Replica Set with MongoDB and Azure ("ReadDB" for short).
Before this project, the main database serving the stores was a SQL Server instance in a data center in Amsterdam. Because of the physical distance, stores in the US, and even more so in Asia, experienced latency when retrieving data, which resulted in a non-optimal user experience.
Since these stores mainly need to present data to the customer (orders, customer details, etc.), a read-only database seemed to be the best solution.
Given the popularity NoSQL databases have gained recently, and after some research, we opted to use MongoDB as the ReadDB and its Replica Set feature as the way to distribute data geographically.
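The post does not say how the store applications connect to the ReadDB. One common approach, sketched here as an assumption rather than the project's actual setup, is a standard MongoDB connection string that lists the replica set members and sets the `replicaSet` and `readPreference=nearest` options, so the driver sends reads to a member with low network latency, typically the one in the store's own region. The host names, ports, the replica set name `rs0` and the helper `build_read_uri` are hypothetical.

```python
from urllib.parse import urlencode


def build_read_uri(hosts, replica_set, read_preference="nearest"):
    """Build a MongoDB connection string for read-only store clients.

    hosts: "host:port" strings for the replica set members
    (hypothetical names; the post does not list real ones).
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # urlencode keeps the dict's insertion order for the query string.
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options


if __name__ == "__main__":
    uri = build_read_uri(
        ["mongo-weu-1:27017", "mongo-wus-1:27017", "mongo-eas-1:27017"],
        "rs0",
    )
    print(uri)
```

Because any read preference other than `primary` may return data from a Secondary, reads can be very slightly stale, which suits a read-only store front-end like this one.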
We also wanted to leverage the scalability and configuration
features of Microsoft Azure as the
underlying Cloud infrastructure.
We started with a simple setup of the MongoDB Replica Set within a single Azure Region, where we successfully tested replication between the Primary and Secondary nodes of MongoDB.
Installing and configuring a MongoDB Replica Set is relatively easy, as described in the official documentation.
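For reference, a single-region set like this can be described by the configuration document passed to `rs.initiate()` in the mongo shell. The Python sketch below builds such a document for a Primary, a Secondary and an Arbiter (matching the West EU layout used later in the project). The replica set name `rs0`, the host names and the helper `build_replica_set_config` are hypothetical; the odd-voter check is this sketch's own safeguard, based on MongoDB's recommendation to keep an odd number of voting members.

```python
import json


def build_replica_set_config(name, primary, secondaries, arbiter=None):
    """Return a replica set configuration document.

    The intended Primary gets a higher priority so that, when healthy, it is
    the preferred candidate in elections. The Arbiter votes but holds no data
    (arbiterOnly) and, with priority 0, can never become Primary.
    """
    members = [{"_id": 0, "host": primary, "priority": 2}]
    for host in secondaries:
        members.append({"_id": len(members), "host": host, "priority": 1})
    if arbiter is not None:
        members.append(
            {"_id": len(members), "host": arbiter, "arbiterOnly": True, "priority": 0}
        )
    # Every member votes by default here; keep the number of voters odd.
    if len(members) % 2 == 0:
        raise ValueError("use an odd number of voting members")
    return {"_id": name, "members": members}


if __name__ == "__main__":
    config = build_replica_set_config(
        "rs0",
        primary="mongo-weu-primary:27017",
        secondaries=["mongo-weu-secondary:27017"],
        arbiter="mongo-weu-arbiter:27017",
    )
    # The printed document can be pasted into rs.initiate(...) on the intended Primary.
    print(json.dumps(config, indent=2))
```

Which member actually becomes Primary is decided by an election; the higher `priority` only makes the intended node the preferred candidate.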
To simplify the operations for this test, the initial Azure Virtual Machine containing the MongoDB installation was prepared using SysPrep and then reused within the same Region (the SysPrep image hardcodes the Azure Region information, so it cannot be used in other Regions).
Expanding the Replica Set geographically to other Azure Regions requires quite a lot of infrastructure to be created on Azure first:
- Create an Azure Virtual Network in the West EU Azure Region.
- Create an Azure Local Network in the West EU Azure Region, with the same address space as the Virtual Network, and configure the VPN Gateway.
- Create a second Azure Virtual Network in the West US Azure Region.
- Create a second Azure Local Network in the West US Azure Region, with the same address space as the Virtual Network, and configure the VPN Gateway.
- Create a third Azure Virtual Network in the East Asia Azure Region.
- Create a third Azure Local Network in the East Asia Azure Region, with the same address space as the Virtual Network, and configure the VPN Gateway.
- Follow this tutorial to create a VNet-to-VNet connection between the West EU Virtual Network and the West US Virtual Network.
- Follow this tutorial to create another VNet-to-VNet connection between the East Asia Virtual Network and the West EU Virtual Network.
- Create an Azure Virtual Machine image template to be reused for all MongoDB Nodes in the West EU Replica Set.
- Create a Primary Node Virtual Machine from the image template, and create an Azure Availability Set in the West EU Azure Region.
- Create an Arbiter Node Virtual Machine and add it to the set in the West EU Azure Region.
- Create a Secondary Node Virtual Machine and add it to the set in the West EU Azure Region.
- Configure and test the Replica Set.
- Create an Azure Virtual Machine image template to be reused for all MongoDB Nodes in the West US Replica Set.
- Create a second Secondary Node Virtual Machine in the West US Azure Region.
- Create an Azure Virtual Machine image template to be reused for all MongoDB Nodes in the East Asia Replica Set.
- Create a third Secondary Node Virtual Machine in the East Asia Azure Region.
- Configure and test the added Nodes in the Replica Set.
- Relax and enjoy the Replication.
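Azure VNet-to-VNet connections require the connected Virtual Networks to use address spaces that do not overlap, and each MongoDB node must sit inside its own region's network. A small pre-flight check with Python's standard `ipaddress` module can catch mistakes in that plan before any gateway is configured. The CIDR ranges, node names and IP addresses below are hypothetical examples, not the ones used in the project.

```python
import ipaddress
from itertools import combinations

# Hypothetical address spaces, one per regional Virtual Network.
VNETS = {
    "West EU": "10.1.0.0/16",
    "West US": "10.2.0.0/16",
    "East Asia": "10.3.0.0/16",
}

# Hypothetical private IPs of the MongoDB nodes: name -> (region, ip).
NODES = {
    "primary-weu": ("West EU", "10.1.0.4"),
    "arbiter-weu": ("West EU", "10.1.0.5"),
    "secondary-weu": ("West EU", "10.1.0.6"),
    "secondary-wus": ("West US", "10.2.0.4"),
    "secondary-eas": ("East Asia", "10.3.0.4"),
}


def find_overlaps(vnets):
    """Return (region_a, region_b) pairs whose address spaces overlap."""
    networks = {region: ipaddress.ip_network(cidr) for region, cidr in vnets.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


def misplaced_nodes(nodes, vnets):
    """Return names of nodes whose IP lies outside their region's address space."""
    return [
        name
        for name, (region, ip) in nodes.items()
        if ipaddress.ip_address(ip) not in ipaddress.ip_network(vnets[region])
    ]


if __name__ == "__main__":
    for a, b in find_overlaps(VNETS):
        print(f"Overlapping address spaces: {a} ({VNETS[a]}) and {b} ({VNETS[b]})")
    for name in misplaced_nodes(NODES, VNETS):
        print(f"Node outside its region's address space: {name}")
    print("Check finished.")
```

Running this against the real plan before creating the gateways is cheaper than discovering an overlap after the VNet-to-VNet connections fail.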
Once the entire Azure infrastructure was in place, we simply started inserting data into the Primary node and watched replication synchronize all the Secondary nodes.
We then performed some tests (nearly 1 million rows inserted into the Primary node, with a record count executed on each Secondary node after every inserted row), and we were very positively impressed by the performance of the MongoDB Replica Set: on all Secondary nodes the data was replicated in near real time, with only an occasional lag of a few milliseconds!
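The post does not include the test code. Below is a hedged sketch of the same idea: insert one row at a time on the Primary, then count records on each Secondary until the new row shows up, recording how long that took. The database calls are abstracted as plain callables so the timing logic stays driver-independent; in a real run, `insert_row` could wrap PyMongo's `insert_one` on a client connected to the Primary, and each counter could wrap `count_documents({})` on a client connected directly to one Secondary. Those wrappers, the helper name `measure_replication_delay` and the in-memory stand-ins in the usage example are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List


def measure_replication_delay(
    insert_row: Callable[[int], None],
    secondary_counts: Dict[str, Callable[[], int]],
    rows: int,
    timeout_s: float = 5.0,
    poll_s: float = 0.001,
) -> Dict[str, List[float]]:
    """Insert `rows` rows one by one on the Primary and, after each insert,
    poll every Secondary's record count until it includes the new row.

    Returns, per Secondary, the seconds between each insert returning and
    that Secondary first reporting the expected count. Assumes the
    collection is empty when the test starts; resolution is limited by
    `poll_s` and by how long each count call takes.
    """
    delays: Dict[str, List[float]] = {name: [] for name in secondary_counts}
    for i in range(rows):
        insert_row(i)
        inserted_at = time.monotonic()
        expected = i + 1
        pending = set(secondary_counts)
        # Poll all pending Secondaries in turn so one slow node
        # does not delay the measurement of the others.
        while pending:
            for name in list(pending):
                if secondary_counts[name]() >= expected:
                    delays[name].append(time.monotonic() - inserted_at)
                    pending.discard(name)
            if pending:
                if time.monotonic() - inserted_at > timeout_s:
                    raise TimeoutError(f"{sorted(pending)} did not reach {expected} rows")
                time.sleep(poll_s)
    return delays


if __name__ == "__main__":
    # In-memory stand-ins: the "Secondaries" read the Primary's list directly,
    # so replication here is instantaneous; real nodes would lag.
    primary_rows: List[int] = []
    secondaries = {
        region: (lambda: len(primary_rows))
        for region in ("West EU", "West US", "East Asia")
    }
    results = measure_replication_delay(primary_rows.append, secondaries, rows=1000)
    for region, samples in results.items():
        print(f"{region}: max delay {max(samples) * 1000:.3f} ms over {len(samples)} rows")
```

Polling every pending Secondary in a round-robin loop, rather than waiting on one node at a time, keeps each node's measured delay independent of the others.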
So, besides the known advantages of using a NoSQL database for this read-only solution, the MongoDB Replica Set proved to be a great choice for geographically distributed data, providing smooth, easy-to-set-up, near-real-time synchronization.