Hire the author: Olumuyiwa A

Check out the MongoDB Replica Set Image and GitHub repo for this guide.

TL;DR

You can achieve great results in production by building your application well and modeling your data correctly. This guide shows you how to do both, walking you through creating a well-tuned MongoDB replica set designed to serve up to 10,000 customers in a production environment.

Introduction

In the modern world of rapidly evolving applications with dynamic, data-intensive, real-time requirements, MongoDB has become the default choice for web developers. It is particularly popular in the JavaScript/Node.js ecosystem. This popularity can be attributed to two compelling reasons that make MongoDB stand out:

  1. Massive Scalability with Document-Based NoSQL: MongoDB is, by design, a document-based NoSQL database, which gives developers a big advantage in terms of scalability. By storing data in flexible, schema-less documents, MongoDB enables seamless horizontal scaling, allowing applications to process large amounts of data and adapt to growing workloads. This scalability is particularly beneficial for modern applications that demand high performance and responsiveness, making MongoDB a preferred choice for managing complex and evolving data structures.
  2. JSON Everywhere: MongoDB seamlessly integrates with the ubiquitous JSON (JavaScript Object Notation) data format. JSON is widely used for data interchange across the web. This integration ensures a natural workflow for developers. It is particularly beneficial for those working within the JavaScript/Node.js ecosystem. With MongoDB, developers can leverage the power of JSON throughout the entire application stack. This includes the frontend, backend, and communication between different services. The consistent use of JSON simplifies data manipulation with MongoDB. It also facilitates code reusability and enhances developer productivity. This makes MongoDB an ideal fit for JavaScript-centric development environments.

By combining the benefits of massive scalability through its document-based NoSQL approach and its seamless integration with JSON, MongoDB empowers web developers to efficiently tackle the challenges posed by modern, data-intensive applications. It has become a go-to database solution for JavaScript/Node.js developers, enabling them to build robust, scalable, and flexible applications that meet the demands of today’s dynamic digital landscape.

Justification

Running MongoDB locally or using a managed service like MongoDB Atlas is a simple process. But running a self-managed database in production requires a more nuanced approach. This guide aims to provide the information and resources you need to successfully self-manage MongoDB in a production environment.

There are numerous resources available on using MongoDB in a production environment. However, much of the information available online falls into one of the following categories:

  • outdated information
  • omission of key aspects of proper data modeling and query structuring
  • failure to lead to highly available (HA) and fault-tolerant (FT) deployments

The intention of this guide is to provide comprehensive and up-to-date information on self-managing MongoDB in production. The focus is on achieving HA and FT deployments while emphasizing the importance of proper data modeling and query structure.

You will learn how to deploy a production-grade HA/FT/SH MongoDB replica set that supports a web application scalable enough for your startup/SMB’s first 10,000 customers.

Assumptions

  • You have prior experience in web application development using Node.js, and a solid grasp of both Node.js and MongoDB fundamentals.
  • You prefer Mongoose as your ODM (object data modeling library) when interacting with a MongoDB database.
  • You are familiar with the basics of AWS and its services.
  • You recognize the valuable benefits that DevOps brings to application development and deployment.

Glossary

This guide provides working definitions of the terms used.

  • 10k:  A magical number indicating that a startup has enough revenue to have a chance at long-term success. 
  • Ansible: A popular configuration management (CM) tool used to ensure the state of servers matches the intent expressed in the configuration files. CM is a pillar of DevOps.
  • ASG: Auto Scaling Group. An AWS resource that allows servers to be automatically provisioned or de-provisioned according to defined rules. ASG is the foundation for HA/FT in AWS.
  • AZ: Availability Zone. One or more data centers in close geographic proximity connected by high-speed data links.
  • DIRTy: Data Intensive Real Time apps.
  • ELB: Elastic Load Balancers. These are AWS resources that provide highly available and massively scalable load-balancing services. The two primary subtypes are ALB (Application Load Balancer) and NLB (Network Load Balancer).
  • FT: Fault Tolerant. A system that can continue operating in the face of one or more component failures. In the context of AWS, FT is typically achieved by distributing servers across multiple Availability Zones. An FT system is also HA by extension.
  • HA: Highly Available. Services are designed so that there is no single point of failure and downtime is minimized. In the context of AWS, HA is typically achieved by load balancing across multiple servers, usually in different Availability Zones. A system can be HA without being FT.
  • LT: Launch Template. An AWS EC2 instance provisioning template used by ASGs to launch EC2 instances according to the parameters defined in the template.
  • Production: This is the set of systems real users interact with, in other words, where you bet the company’s future. Companies live and die by user experience in production.
  • Region: A geographic area made up of two or more availability zones (AZs) that are logically connected but physically separated.
  • Replica set: The minimum unit of deployment for a production-grade MongoDB database. It consists of three or more nodes that have a single write (primary) node and multiple read (secondary) nodes holding identical copies of the saved data.  
  • SH: Self Healing. This means that systems are designed and deployed in such a way that failures can be automatically repaired without human intervention. By implication, an SH system is also FT/HA.
  • VPC: Virtual Private Cloud. A user-defined, logically isolated segment of a cloud within which user resources can be deployed.

Step-by-step Procedure

This part of the guide has two subsections: one for application development and another for database administration. Our goal: Deploy a production application against a MongoDB replica set on AWS and support 10,000 recurring paying customers.

Section 1: Application Development

Step 1: Setting up the connections

Everything starts here. We must set up a reliable database connection as early as possible in the app lifecycle.

Pro tip: As the application will run in multiple environments, parameterize the connection URI by reading it from an environment variable.

When considering connections, there are three environments we should be aware of, each with its own peculiarities.

  • Local/Development: In this environment, MongoDB runs on your local box alongside the web application, so the connection is effectively instantaneous. Everything should work perfectly, which is necessary for rapid application development. During this phase, the focus should rightly be on feature development. However, do not expect conditions to be the same outside of development.
  • Staging/Test: In this environment, MongoDB runs on a free managed service, typically MongoDB Atlas, while the web application runs locally or in the cloud. A key takeaway at this stage is that connections over the network are not as reliable as local connections. The implication for application development is that the connection process to MongoDB should be more robust than when we use it on the local box.
  • Production:  A typical production target will be a public cloud. One of the key ideas we should keep in mind is that cloud-based services are subject to failure without warning. This means that during application development, we need to ensure sensible defaults for retries, timeouts, and failed query handling. These defaults should be built into the codebase.

Below is a gist that demonstrates a reliable and robust pattern for standing up the web app backend:


'use strict';
if(process.env.NODE_ENV != 'production') {
  require('dotenv').config();
}
/**
 * Module Dependencies
 */
const
  Http = require('http'),
  Mongoose = require('mongoose'),
  App = require('./app');
/**
 * Module variables
 */
const
  {dBURL} = process.env,
  PORT = process.env.PORT || 3030;
/**
 * Create Server Instance, pass App as the Listener
 */
const Server = Http.createServer(App);
/**
 * Config Mongoose
 */
Mongoose.set('strictQuery', true);
/**
 * Connect to MongoDB Database and initiate Server on connection success
 */
let attemptsCounter = 0;
const connectionOptions = {
  autoIndex: false,
  maxPoolSize: 50,
  minPoolSize: 5
};
async function main() {
  // give up after 5 failed connection attempts
  if(attemptsCounter == 5) {
    return process.exit(1);
  }
  try {
    await Mongoose.connect(dBURL, connectionOptions);
    console.log(`Successfully connected to ${dBURL}`);
    return Server.listen(PORT, () => console.log(`server UP on port: ${Server.address().port}`));
  }
  catch (err) {
    console.error('There was a db connection error');
    console.error(err.message);
    attemptsCounter += 1;
    return setTimeout(main, 1000);
  }
}
main();
process.on('SIGINT', async () => {
  await Mongoose.connection.close();
  console.error('dBase connection closed due to app termination');
  return process.exit(0);
});


Let’s walk through the code:

  • In the first part, we make sure the environment variables are populated. This allows us to parameterize the connection URI for different environments.
  • Next, we have dependencies. While Node.js has native tools for request handling, in practice Express.js or some other framework is used. Here, we will use an imported Express.js app.
  • The next interesting section shows a small workaround we need when using Mongoose: we explicitly set the strictQuery flag (to true here), which suppresses the deprecation warning Mongoose emits in preparation for Mongoose 7.
  • Next is connectionOptions. In this section, we define some sane defaults for Mongoose that allow it to function efficiently across environments. autoIndex: false is required for staging/production environments, since by default Mongoose automatically builds indexes based on the defined schemas; this is a performance penalty we should avoid, and there is no downside to setting the flag in development either. The maxPoolSize/minPoolSize flags matter because MongoDB allows only one operation at a time per socket. Our aim is to have just enough open sockets for app responsiveness without running into connection issues with MongoDB in production, and we can increase the maxPoolSize value as our customer base and app usage scale.
  • Next, we define a main function whose purpose is to attempt several connections to MongoDB. When an attempt succeeds, the app server is spun up. If an attempt fails, we log a useful error message, increment the attempt counter, and retry after a second; after five failed attempts, the process exits.
  • Finally, we gracefully shut things down if the shutdown command is issued (typically ctrl + c). This is a good way of cleaning things up when the app server is running locally.

You’ll notice that we’ve discussed everything except the actual connection URI (called dBURL). We’ll dig into this later, but for now, in the Local/Dev environment, this variable typically has a value like mongodb://localhost/somedbname.

Step 2: Addressing application type concerns

Determining the type of application you are developing is crucial. It helps in identifying whether the app is read-heavy or write-heavy. This affects the format of the MongoDB connection URI. It also influences the decision between a monolith or partitioning into two or more microservices. It’s essential to address these considerations before progressing beyond the initial app foundations.

A read-heavy app is one where the data flow involves fewer writes and mostly simple reads of data. Examples of read-heavy apps are e-commerce apps and social media feeds.

A write-heavy app has a more significant number of mutations to data. Examples of write-heavy apps are live tracking systems or card processing systems.

MongoDB behavior with different app types

  • Read-heavy: By default, MongoDB replica sets read from the primary node. For a read-heavy system, this approach can lead to performance degradation. The primary node becomes overwhelmed by the volume of reads and writes. The recommended solution is to set the read preference flag on the connection URI. This configuration enforces directing reads toward the secondary nodes. It effectively improves performance and alleviates the load on the primary node.
  • Write-heavy: All writes in MongoDB are processed by the primary; there’s no way around this. However, we have options to tweak system performance from the app’s perspective. If the writes are not time-sensitive (e.g. IoT or analytics systems), we can insert a queue to hold data and trickle it into the primary at a measured rate, which keeps the system responsive (a minimal queue sketch follows this list; if you would like assistance, the author and the LD Talent team are happy to lend a hand). On the other hand, if writes are time-sensitive (e.g. e-commerce or card processing systems), the solution is to decompose the app into two or more services, each with its own MongoDB replica set. This reduces the rate of writes against any single replica set and ensures that the system remains responsive for the first 10,000 customers.
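
Below is a minimal sketch of the queue idea for non-time-sensitive writes. It is illustrative only: the WriteQueue class, its batch size, and the flush interval are hypothetical choices, EventModel stands in for whichever Mongoose model receives the data, and an established Mongoose connection (as in Step 1) is assumed.

'use strict';
const Mongoose = require('mongoose');

// Hypothetical model receiving non-time-sensitive writes (e.g. IoT readings)
const EventModel = Mongoose.model('Event', new Mongoose.Schema({
  deviceId: String,
  payload: Object,
  receivedAt: {type: Date, default: Date.now}
}));

// A simple in-memory queue that trickles buffered writes into the primary
class WriteQueue {
  constructor({batchSize = 100, flushIntervalMs = 5000} = {}) {
    this.buffer = [];
    this.batchSize = batchSize;
    // flush at a measured rate so the primary is never flooded
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }
  enqueue(doc) {
    this.buffer.push(doc);
  }
  async flush() {
    if(this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.batchSize);
    try {
      // ordered: false lets MongoDB continue inserting even if one document fails
      await EventModel.insertMany(batch, {ordered: false});
    }
    catch (err) {
      console.error('batch insert failed, re-queueing', err.message);
      this.buffer.unshift(...batch);
    }
  }
}

// Usage: request handlers enqueue instead of writing directly
const queue = new WriteQueue({batchSize: 100, flushIntervalMs: 5000});
queue.enqueue({deviceId: 'sensor-1', payload: {temp: 22.5}});

A production-grade version would normally use a durable message broker rather than an in-process buffer, since an in-memory queue loses data if the process crashes.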

This example illustrates a MongoDB replica set URI format. It fulfills the requirements of both read-heavy and write-heavy applications:

mongodb://user:pwd@server0-dns,server1-dns,server2-dns/dbname?replicaSet=replicasetName&retryWrites=true&retryReads=true&w=majority&readPreference=secondaryPreferred

Let’s break this down:

  • user:pwd are credentials for the app connecting against the named database /dbname.
  • server0-dns,server1-dns,server2-dns are the three members of the replica set identified via DNS hostnames (we will discuss more on this later); MongoDB uses this list to discover the members of a replica set.
  • replicaSet=replicasetName is a flag that identifies the replica set name to be connected to.
  • retryWrites=true&retryReads=true enforces one retry of read/write; this is a basic hedge against transient network partitions in MongoDB.
  • w=majority specifies the write concern i.e. enforces writing of the data to a majority of the replica set nodes before an ack is returned from MongoDB to the app.
  • readPreference=secondaryPreferred directs MongoDB to read from secondary nodes when available rather than the primary, which is the default behavior. This flag is one of the ways we can tune MongoDB for high-throughput operations supporting read/write-heavy apps.

By combining the flags/parameters in the MongoDB connection URI and the Mongoose connection options, we obtain a robust set of tuned parameters for MongoDB. These are suitable for most environments and use cases. With these options, you can fine-tune MongoDB to meet your specific needs and achieve optimal performance.
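
If you prefer a shorter URI, most of these parameters can also be supplied through the options object passed to Mongoose.connect. A minimal sketch, merging them into the connectionOptions from Step 1 (the option names follow the MongoDB Node.js driver; the values are illustrative):

const connectionOptions = {
  autoIndex: false,
  maxPoolSize: 50,
  minPoolSize: 5,
  // equivalents of the URI flags discussed above
  replicaSet: 'replicasetName',
  retryWrites: true,
  retryReads: true,
  w: 'majority',
  readPreference: 'secondaryPreferred'
};
// passed exactly as in Step 1: Mongoose.connect(dBURL, connectionOptions)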

With the replica set and connection options in place, you can ensure high availability, fault tolerance, and optimized performance for your MongoDB deployment.

This sets the stage for a reliable and resilient application that can handle a wide range of use cases and scale with your business needs.

The next stage is proper Data Modeling.

Step 3: Designing the data model

The first step in performance optimization is to understand your application’s query patterns so that you design your data model and select the appropriate indexes accordingly.

MongoDB team

It is essential to store related data together, whenever feasible, by utilizing embedded documents. Mastering this fundamental principle plays a pivotal role in the performance of both applications and databases in a production environment. A flawed data model can lead to performance issues within the application, even if the database itself is finely tuned.

Let’s consider a photo gallery app as an example to provide a context for the discussion. This app enables users to upload any number of photos and save them to AWS S3. By default, the user dashboard displays a gallery with thumbnails of the 10 most recent photos. Users can access pagination/infinite scrolling to pull additional photos from the database. Users can view a full-size version of photos by clicking on the thumbnails and can delete or replace photos.

Basic data model

Keeping the app requirements in mind, an intuitive data model that seems to meet the app constraints would look like this:


'use strict';
/**
 * Module dependencies
 */
const
  Mongoose = require('mongoose'),
  argon2 = require('argon2');
/**
 * User Schema
 */
const UserSchema = new Mongoose.Schema({
  username: {
    type: String,
    required: true,
    unique: true
  },
  email: {
    type: String,
    required: true,
    unique: true
  },
  password: {
    type: String,
    required: true
  },
  hasPhotos: {
    type: Boolean,
    required: true,
    default: false
  },
  photos: [{
    filename: {
      type: String,
      default: ''
    },
    createdAt: {
      type: Date,
      default: Date.now,
      required: true
    },
    fullsize: {
      type: String
    },
    thumbnail: {
      type: String
    }
  }]
});
/**
 * User Schema Methods
 */
UserSchema.methods.generateHash = async function (password) {
  return await argon2.hash(password);
};
UserSchema.methods.validatePassword = async function (candidate) {
  return await argon2.verify(this.password, candidate);
};
/**
 * Create Schema Secondary Indexes
 */
UserSchema.index({"photos.filename": 1});
/**
 * Compile Schema to Model
 */
const UserModel = Mongoose.model('User', UserSchema);
/**
 * Export UserModel
 */
module.exports = UserModel;


Basic model review

At first glance, this data model may appear to be suitable for the requirements of the photo gallery app. The username and email fields are unique; this not only avoids duplication but also creates indexes on these fields, improving read performance. Additionally, photos is an array of objects, and a secondary index has been created against the filename property of objects within the photos array to improve searching. What could be wrong?

The primary issue is that the photos array is unbounded. A user who uploads enough photos could push the document past MongoDB's 16 MB document size limit.

Model refactoring

Let’s improve the model by refactoring it. Move the photos into a separate collection (let’s call it Albums). Also, use references (Mongoose’s populate, roughly analogous to a JOIN) as needed to populate the album field on the User model. Below, you can find the improved models:

Users:


'use strict';
/**
 * Module dependencies
 */
const
  Mongoose = require('mongoose'),
  argon2 = require('argon2');
/**
 * User Schema
 */
const UserSchema = new Mongoose.Schema({
  username: {
    type: String,
    required: true,
    unique: true
  },
  email: {
    type: String,
    required: true,
    unique: true
  },
  password: {
    type: String,
    required: true
  },
  hasPhotos: {
    type: Boolean,
    required: true,
    default: false
  },
  album: [{
    type: Mongoose.Schema.Types.ObjectId, // fully qualified; bare Schema is undefined here
    ref: 'Album'
  }]
});
/**
 * User Schema Methods
 */
UserSchema.methods.generateHash = async function (password) {
  return await argon2.hash(password);
};
UserSchema.methods.validatePassword = async function (candidate) {
  return await argon2.verify(this.password, candidate);
};
/**
 * Compile Schema to Model
 */
const UserModel = Mongoose.model('User', UserSchema);
/**
 * Export UserModel
 */
module.exports = UserModel;


Albums:


'use strict';
/**
 * Module dependencies
 */
const Mongoose = require('mongoose');
/**
 * Albums Schema
 */
const AlbumSchema = new Mongoose.Schema({
  owner: {
    type: Mongoose.Schema.Types.ObjectId, // fully qualified; bare Schema is undefined here
    ref: 'User'
  },
  photos: [{
    filename: {
      type: String,
      default: ''
    },
    createdAt: {
      type: Date,
      default: Date.now,
      required: true
    },
    fullsize: {
      type: String
    },
    thumbnail: {
      type: String
    }
  }]
});
/**
 * Create Schema Secondary Indexes
 */
AlbumSchema.index({"photos.filename": 1 });
/**
 * Compile Schema to Model
 * NB: the model name must match the 'Album' ref used in the User schema
 */
const AlbumModel = Mongoose.model('Album', AlbumSchema);
/**
 * Export AlbumModel
 */
module.exports = AlbumModel;


Model refactoring review

The User model references the Album model (Mongoose maps the models to the users and albums collections on MongoDB) by tracking the _id of each album document in its album field. Correspondingly, the Album model references the User model by tracking the _id of the user document in its owner field.

You might think that the job is complete, right? But we still have some work left to do. The problem of unbounded document size persists; it has merely shifted from User to Album. To make matters worse, Mongoose now has to issue two queries (one for the user, one to populate the album references) to return the user document with its album data. From a performance point of view, this extra query is suboptimal.

Optimal data model

The following two points lead to the solution:

  • By default, the app shows the 10 most recent photos
  • MongoDB’s preference for many small documents vs a few large ones

These points lead us to an optimal set of data models:

Users:


'use strict';
/**
 * Module dependencies
 */
const
  Mongoose = require('mongoose'),
  argon2 = require('argon2');
/**
 * User Schema
 */
const UserSchema = new Mongoose.Schema({
  username: {
    type: String,
    required: true,
    unique: true
  },
  email: {
    type: String,
    required: true,
    unique: true
  },
  password: {
    type: String,
    required: true
  },
  hasPhotos: {
    type: Boolean,
    required: true,
    default: false
  },
  photos: [{
    filename: {
      type: String,
      default: ''
    },
    createdAt: {
      type: Date,
      default: Date.now,
      required: true
    },
    fullsize: {
      type: String
    },
    thumbnail: {
      type: String
    }
  }]
});
/**
 * User Schema Methods
 */
UserSchema.methods.generateHash = async function (password) {
  return await argon2.hash(password);
};
UserSchema.methods.validatePassword = async function (candidate) {
  return await argon2.verify(this.password, candidate);
};
/**
 * Compile Schema to Model
 */
const UserModel = Mongoose.model('User', UserSchema);
/**
 * Export UserModel
 */
module.exports = UserModel;


Albums:


'use strict';
/**
 * Module dependencies
 */
const Mongoose = require('mongoose');
/**
 * Albums Schema
 */
const AlbumSchema = new Mongoose.Schema({
  owner: {
    type: String,
    required: true
  },
  photos: {
    filename: {
      type: String,
      default: ''
    },
    createdAt: {
      type: Date,
      default: Date.now,
      required: true
    },
    fullsize: {
      type: String
    },
    thumbnail: {
      type: String
    }
  }
});
/**
 * Create Schema Secondary Indexes
 */
AlbumSchema.index({owner: 1, "photos.filename": 1});
/**
 * Compile Schema to Model
 */
const AlbumModel = Mongoose.model('Albums', AlbumSchema);
/**
 * Export AlbumModel
 */
module.exports = AlbumModel;

Optimal data model review

While it may not be immediately apparent, a closer examination reveals that the optimized data models have significantly improved performance.

The User model retains the photos field, allowing a single GET request to return both the user information and their 10 most recent photos. As a result, this satisfies the basic constraints of the app and follows a fundamental design pattern in MongoDB.

To improve the efficiency of the Album model, we have replaced the array of photos with a single embedded photo document. A compound index on the owner and photos.filename fields makes retrieval of a specific user's photos faster and easier. These changes are in line with best practices in database design and represent a significant improvement in the app's performance and user experience.

To retrieve older photos, the app issues a single query against the Albums collection. Although it may take a little longer, users anticipate that older photos will become available after a short delay, so the slight wait is unlikely to affect their experience.

By implementing this approach, we are able to efficiently manage and retrieve a large volume of photos. We can do so while still maintaining a seamless user experience.

This approach results in a compact working set (indexes plus the most frequently accessed data) that fits inside RAM, ensuring that database performance is fast and reliable.
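
One piece the models themselves do not show is the write path that keeps the User document bounded. Below is a minimal sketch of a hypothetical upload handler, assuming the optimal models above and using MongoDB's $push with $each/$slice to retain only the 10 most recent photos on the user document:

const UserModel = require('<path to>/models/users');
const AlbumModel = require('<path to>/models/albums');

// Hypothetical handler invoked after a photo has been uploaded to S3
async function savePhoto(username, photo) {
  // photo = {filename, fullsize, thumbnail, createdAt}
  // 1. Archive the photo as its own small Album document (many small documents)
  await AlbumModel.create({owner: username, photos: photo});
  // 2. Keep only the 10 most recent photos embedded on the User document
  return UserModel.updateOne(
    {username},
    {
      $set: {hasPhotos: true},
      $push: {photos: {$each: [photo], $slice: -10}}
    }
  );
}

The $slice: -10 modifier keeps only the last ten elements of the array, so the embedded photos field never grows unbounded; older photos live only in the Albums collection.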

Data modeling conclusion

The preceding covers the basics for performant data modeling with Mongoose/MongoDB. Here are the example queries that satisfy the different types of models:


const UserModel = require('<path to>/models/users');

// query for the basic model (https://gist.github.com/oakinogundeji/dd36aa05972fb5d59b3dc3bd5834891d)
const naiveUser = await UserModel.findOne({username: 'username'}, {photos: {$slice: -10}, _id: 0});
// searches the Users collection for the matching document and returns only the last 10 elements
// of the photos array (the most recent, assuming photos are appended in order) while suppressing _id
// the query to retrieve the next n photos in the proper order is left to the reader as an exercise

// query for the refactored model (https://gist.github.com/oakinogundeji/bc1259a30233db082c5f91b1f582f8ff)
const refsUser = await UserModel.findOne({username: 'username'})
  .populate({path: 'album', select: 'photos', options: {sort: {"photos.createdAt": 'desc'}, limit: 10}});
// searches the Users collection for the matching document, then issues a second query to populate
// the 'album' references with their photos data, most recent first, limited to 10 album documents
// the query to retrieve the next n photos in the proper order is left to the reader as an exercise

// query for the optimal model (https://gist.github.com/oakinogundeji/448b09c5aa810e545d801a17947a8667)
const optimalUser = await UserModel.findOne({username: 'username'});
// searches the Users collection for the matching document; by design the 'photos' array holds
// at most the 10 most recent photos


With the preceding information, you will have a solid foundation. This foundation will enable you to build a responsive, scalable app deployed against MongoDB in production.

Pro tips:
1. Read-heavy app tips: Caching, setting read preference to secondaries, and increasing the number of secondaries will boost the performance of your read-heavy app against MongoDB.
2. Write-heavy app tips: Using message queues for write logs and decomposing your app into microservices will help boost the performance of your write-heavy app against MongoDB.
3. Use a process manager for your Node.js app, e.g. PM2; this is especially useful when running your app directly on EC2 (a minimal PM2 ecosystem file sketch follows this list). Note: for a containerized app, a process manager is not required.
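
Below is a minimal sketch of a PM2 ecosystem file for the connection.js entry point from Step 1. The app name, instance count, and other values are illustrative assumptions:

// ecosystem.config.js (hypothetical values)
module.exports = {
  apps: [{
    name: 'photo-gallery-api',   // illustrative app name
    script: './connection.js',   // the entry point built in Step 1
    instances: 2,                // 'max' would use all CPU cores
    exec_mode: 'cluster',        // cluster mode load-balances across instances
    env_production: {
      NODE_ENV: 'production',
      PORT: 3030
      // dBURL should be injected by the environment, not committed to source control
    }
  }]
};

// start with: pm2 start ecosystem.config.js --env production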

Section 2: Database Administration

An efficient data model that supports a well-architected app provides the foundation for high-performing MongoDB deployments.

In production, MongoDB is never deployed as a single-node service. The 2 recommended architectures are:

  • replica sets
  • sharded clusters

In order to support our initial customer base of 10,000 users, I strongly recommend prioritizing replica sets over sharded clusters. By focusing on replica sets, we can simplify our architecture. This ensures that we are able to provide a reliable, high-performance experience to our users. 

As explained by MongoDB’s Chief Solutions Architect, opting for a single replica set often represents the optimal choice for production environments.

Let’s learn about how to make this happen step by step.

Step 1: Reviewing our needs

To support our app’s development and deployment, DevOps will be a critical component of this section. We will focus on achieving the following goals:

  • Deploy replica set members as AWS EC2 instances on Ubuntu 22.04 LTS AMIs.
  • Use permanent DNS hostnames to identify replica set members. This eliminates the need to constantly update the connection URI when server instances are replaced, which happens regularly in cloud environments.
  • Ensure that EC2 instances are auto-configured with MongoDB and can automatically join the replica set if a failure occurs.
  • Enforce regular data backups.
  • When an instance is re-provisioned, it should automatically pull existing data to reduce replication traffic and synchronize with the existing members of the replica set.
  • Receive notifications for new nodes coming online, backups, and restores.
  • Automate everything such that, after the initial setup, human intervention is minimal.
  • Ensure that best practices are adhered to by design.
  • Additionally, make sure deployments are version controlled so we benefit from immutable infrastructure.

Step 2: Understanding the tools of the trade

To meet the requirements outlined above, we will leverage the following tools:

  • EC2 Launch Templates
  • EC2 ASGs
  • AWS Security Groups
  • AWS NLBs
  • IAM Roles
  • AWS S3
  • AWS SES
  • Bash scripts
  • Ansible
  • A Node.js notifier script

The goal of using these tools is to design and deploy an HA/FT/SH MongoDB replica set. AWS ASGs and NLBs provide HA/FT; Ansible, EC2 LTs, AWS S3, and Bash scripts provide SH; and AWS SES together with the Node.js script handles notifications.

Let’s dive in with the configuration of the tools.

Step 3: Setting up an S3 and a custom IAM Role

To lay the foundations, we first create an S3 bucket to save backups. Next, we create a custom IAM role that provides SES access to send emails and S3 access to read and put objects in the backup bucket. Both tasks can be performed in the AWS Console.
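
As an illustration, the permissions policy attached to that role could look roughly like the sketch below. The bucket name is a placeholder, and you may wish to scope the SES actions further:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BackupBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-mongo-backups",
        "arn:aws:s3:::my-mongo-backups/*"
      ]
    },
    {
      "Sid": "SendNotifications",
      "Effect": "Allow",
      "Action": ["ses:SendEmail", "ses:SendRawEmail"],
      "Resource": "*"
    }
  ]
}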

Step 4: Identifying requirements for the replica set

After the setup is complete, this phase begins by identifying requirements for the replica set. Specifically, we want to set up a 3-member replica set distributed across 3 AWS AZs. The node roles will be primary, secondary, and hidden.

  • The primary accepts writes to the replica set.
  • The purpose of the secondary is to serve as a read target to support high read/write throughput.
  • Finally, the hidden node is to be a dedicated backup node.

To support this architecture, we will need the following:

  • 2 Ansible playbooks: one to configure the primary/secondary nodes, the other to configure the hidden node. The playbooks will contain all the configuration commands to set up a production-grade MongoDB instance using sane defaults and embedded best practices.
  • 2 Bash scripts: one for backup and the other for restoration. The Ansible playbooks will auto-invoke the restore script to ensure that a fresh instance is brought up to date by pulling backed-up data from S3. On the hidden node, the Ansible playbook will create a cron job that runs the backup script to intermittently back up the database to S3.
  • Config files for tuning each replica set node for production. These include a production version of the MongoDB config file (mongodb.conf), a keyfile to enforce authenticated intra-replica-set member communication, and a systemd service file to disable transparent hugepages. A sketch of such a config file follows this list.
  • 2 AWS EC2 user-data scripts, which will be used by the EC2 LTs to install dependencies and auto-execute the Ansible playbooks: one for the primary/secondary nodes and one for the hidden node.
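
For orientation, a production-oriented MongoDB config file along the lines described above might look roughly like this sketch (the paths and replica set name are illustrative assumptions):

# mongodb.conf sketch (illustrative values)
storage:
  dbPath: /var/lib/mongodb
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 0.0.0.0               # reachable via the NLB; access restricted by the security group
replication:
  replSetName: replicasetName   # must match the replicaSet flag in the connection URI
security:
  authorization: enabled
  keyFile: /etc/mongodb/keyfile # shared keyfile for intra-replica-set authentication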

Step 5: Creating AWS EC2 NLBs and TGs

With the above step successfully completed, the next step is to create three AWS NLB and target group (TG) pairs.

In this deployment, we have opted to use NLBs instead of ALBs because MongoDB speaks a raw TCP wire protocol on port 27017, while ALBs only handle HTTP/HTTPS traffic. NLBs operate at the TCP layer, so they can balance and route network traffic to our MongoDB instances efficiently.

This ensures reliable and efficient communication between our application and its database. This approach helps optimize performance and scalability for your MongoDB deployment. The result is an application that can handle larger volumes of traffic and data with ease.

Follow these steps:

  • First, create 3 target groups. Each TG must be configured with the Instance target type, the TCP protocol, and port 27017, using the default VPC. Under Health checks, expand Advanced health check settings, override the health check port to 27017, set the unhealthy threshold to 5, and set both timeout and interval to 60 seconds. Then create the TGs. The reason for these settings is that MongoDB traffic is on 27017: if the TG probes the instance on any other port it won't get a response and will flag the instance as unhealthy. And since it takes a few minutes to configure MongoDB, the relaxed thresholds give instances enough time to become healthy.
  • After creating the TGs, create the NLBs. Each NLB must be internet-facing (for this exercise) and deployed to the default VPC. I recommend using an AWS region with at least 3 AZs; each NLB will be assigned to a single AZ. Configure the NLB listener to use TCP on port 27017 and select one of the previously created TGs to forward traffic to. Finally, create a security group with 2 rules: the first allowing SSH and the second allowing Custom TCP on port 27017, with the source set to 0.0.0.0/0.

Pro tip: Use the same naming scheme for the TG and NLB eg. mongo-pri-tg/mongo-pri-nlb, mongo-sec-tg/mongo-sec-nlb, etc

Step 6: Creating EC2 LTs

In the previous step, we ensured that the NLB and TG pairs are properly configured so that they are ready to be bound to the ASG. Additionally, we confirmed that the SG for the instances was set up properly. The major task in this step is to create 2 EC2 LTs (AWS is deprecating Launch Configs in favor of LTs). The procedure for the LTs follows:

  • Provide a descriptive name (skip the version).
  • Select auto-scaling guidance.
  • In the application and OS images section, select quickstart and choose Ubuntu (ensure the selected architecture is 64bit/x86).
  • Choose t2.micro as the instance type.
  • Select the key pair.
  • Under network settings select the SG created in Step 5.
  • Skip down to Advanced Details. In the Advanced Details section, select the IAM instance profile created in Step 4.
  • Still within the Advanced Details section, scroll down to User Data and paste the appropriate user data for the LT (either the hidden or the primary/secondary user data).
  • Create the LT.

Step 7: Creating ASGs

To create the replica set nodes, we will define three ASGs. One for the primary node, one for the secondary node, and one for the hidden node. By using ASGs, we can automatically scale our infrastructure to meet changing demands. Also, this approach ensures that we have enough capacity to maintain high availability and performance.

Use descriptive names for the ASGs, such as “mongo-pri-asg” for the primary node ASG. Using clear and consistent naming conventions simplifies managing and troubleshooting of the infrastructure over time.

  • Select the appropriate LT from the drop-down (remember there should be 2 usable LTs, one for both the primary/secondary nodes and the other for the hidden node), and click next.
  • Use the default VPC and choose one of the available AZs, and click next.
  • Select attach to an existing load balancer and choose the appropriate TG from the drop-down.
  • Under health checks ensure Turn on Elastic Load Balancing health checks is selected, and click next.
  • Ensure the ASG is set to maintain 1 instance for the desired, minimum, and maximum capacity.
  • Skip to review and create the ASG.

Repeat for the other 2 ASGs. At the end of this step, we will provision three MongoDB nodes. You will receive an email notification from SES when the nodes are fully set up.

Pro tip: Adopt a naming convention that associates each ASG and NLB with the same AZ. An example: “mongo-pri-nlb“/”mongo-pri-asg” for the NLB/ASG pair in “eu-west-1-a” AZ. This convention will make it easier to identify and manage resources as you scale your infrastructure over time.

Step 8: Configuring the replica set and creating an initial database

At the end of Step 7, 3 MongoDB nodes were provisioned and configured for production. Now all that's left is to set up the replica set (a one-off exercise).

  • The first step is to get the DNS names of the NLBs. If you followed the pro tips, you will have something like mongo-pri-nlb*, mongo-sec-nlb*,  and mongo-hid-nlb* corresponding to the NLBs for the primary, secondary, and hidden nodes.
  • Get the IP address of the primary node (select the primary NLB TG, select the TG’s target instance and you will be able to retrieve the IP address). SSH into the primary node.
  • Follow the steps here to initiate the replica set, and ensure that you replace the root user and pwd with your values.
  • Next, follow these steps to add the other replica set members using the proper values for the NLB DNS names for the secondary and hidden nodes.

When configuring the hidden node, it's important to follow the naming convention described above to avoid errors. Configuring the hidden node for its role also requires extra steps beyond those needed for the primary and secondary nodes. Furthermore, it's crucial to update the hostname of the primary node with the DNS name of its NLB. This ensures that the primary node is publicly resolvable, making the replica set available for writes. If this step is skipped, the node's default private DNS name will be used, leaving the primary unreachable from outside the VPC and resulting in write errors and potential downtime or data loss.
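
For orientation, the shell commands involved look roughly like the following sketch, run from a mongosh session on the primary node. The hostnames, replica set name, and credentials are placeholders, and the linked instructions take precedence:

// initiate the replica set, identifying the primary by its NLB DNS name
// so that it is publicly resolvable
rs.initiate({
  _id: 'replicasetName',   // must match the replicaSet flag in the connection URI
  members: [{_id: 0, host: 'mongo-pri-nlb-dns:27017'}]
});

// create the root user via the localhost exception (replace user/pwd with your own values)
db.getSiblingDB('admin').createUser({
  user: 'root',
  pwd: 'a-strong-password',
  roles: [{role: 'root', db: 'admin'}]
});
db.getSiblingDB('admin').auth('root', 'a-strong-password');

// add the secondary and the hidden (dedicated backup) members via their NLB DNS names
rs.add('mongo-sec-nlb-dns:27017');
rs.add({host: 'mongo-hid-nlb-dns:27017', priority: 0, hidden: true});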

During this step, we also create an initial database for our application and a user with appropriate permissions. In preparation for the next step of testing, we also inject some dummy data into the database.
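
The application database, user, and dummy data created in this step might look roughly like this (the database name, credentials, and document are illustrative):

// still in mongosh, authenticated as the root user
const appDB = db.getSiblingDB('gallerydb');   // illustrative database name
appDB.createUser({
  user: 'galleryapp',
  pwd: 'another-strong-password',
  roles: [{role: 'readWrite', db: 'gallerydb'}]
});

// inject a dummy document for the tests in Step 9
appDB.users.insertOne({username: 'testuser', email: 'test@example.com', hasPhotos: false});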

Step 9: Testing replica set performance

In this concluding step, we test the replica set performance:

  • First test: This app allows testing against the replica set by evaluating three conditions (a minimal test-script sketch follows this list).
    • We can connect to the replica set (this confirms that the replica set is configured correctly).
    • Data can be written to the replica set (this confirms that the primary is configured properly).
    • Data can be read from the replica set (this confirms that the URI is properly formatted to read from the secondary node).
  • Second test: When the first test is over, you should have received an email notification that the backup was successful. This is expected because the hidden-node playbook was configured to run backups as cron jobs every 5 minutes. When this email is received, check S3 to ensure that the expected file is visible in the bucket.
  • Third test: In the EC2 dashboard, terminate any one of the instances. After a few minutes, you should see a new instance standing up. You will receive an email notification regarding the startup of the new node, followed by a second email confirming that data has been restored. Finally, run the test app again to confirm that the URI does not need to be changed and the 3 test conditions are satisfied (affirmation of SH).
  • Fourth test: Modify the test app to run at intervals (between 120s and 480s). Terminate different EC2 instances at intervals, making sure a new node comes up before terminating the next one. Check the test app output to confirm that the replica set always responds to queries (affirmation of HA/FT).
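
A minimal sketch of such a test script, assuming the production connection URI from Section 1 is available as dBURL (the model and document shapes are illustrative):

'use strict';
const Mongoose = require('mongoose');

const PingSchema = new Mongoose.Schema({note: String, at: {type: Date, default: Date.now}});
const PingModel = Mongoose.model('Ping', PingSchema);

async function testReplicaSet() {
  // Condition 1: we can connect to the replica set
  await Mongoose.connect(process.env.dBURL, {serverSelectionTimeoutMS: 10000});
  console.log('connected to replica set');

  // Condition 2: data can be written (handled by the primary)
  const doc = await PingModel.create({note: 'write test'});
  console.log('write ok:', doc._id.toString());

  // Condition 3: data can be read (routed via readPreference=secondaryPreferred)
  const found = await PingModel.findById(doc._id);
  console.log('read ok:', found !== null);

  await Mongoose.connection.close();
}

testReplicaSet().catch(err => {
  console.error('replica set test failed:', err.message);
  process.exit(1);
});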

Thanks for making it all the way here.

This guide has provided the process for architecting the backend of your application for high performance against MongoDB in production. By following along, you have learned how to deploy a production-grade MongoDB replica set on AWS EC2. The replica set and app meet the constraints of high availability and fault tolerance. So, your application remains accessible and responsive even in the face of unexpected failures.

Learning Tools

If you want to gain deeper context on the topics covered in this guide, the following resources may be helpful:

HA/FT

SH

MongoDB Connection URI

MongoDB Schema Design

MongoDB Nodejs Developer Course

Learning Strategy

In 2018, a client assigned me the task of setting up a production-grade MongoDB replica set on AWS EC2.

To revise this guide for 2023, I had to read a lot of articles, documentation, and Q&As. I also put in considerable effort to decipher unclear instructions or outdated information. The Learning Tools section above features some of the most helpful resources for my assignment. 

Despite the difficulties, I believe this guide provides a comprehensive and up-to-date approach. It focuses on setting up a production-grade MongoDB replica set on AWS EC2.

Reflective Analysis

There is a significant difference between the approach I have taken for this guide and my prior work.

Previously, I employed an IP address approach to handle URI immutability. This required me to intercept ASG lifecycle events and bind static ENIs to newly spawned EC2 instances; the MongoDB URIs were bound to the ENI IP addresses.

A major drawback was the need to SSH into the current primary instance and reconfigure the replica set to bind the DNS name of the new instance to its IP address.

Trust me, it was a hassle.

This time around, the use of NLBs significantly reduces database admin tasks since the setup is now a one-time task. The DNS is always the same (NLBs are HA by default). So, there is no need for human intervention to add a new member to the replica set. This approach is vastly superior to my previous method.

Conclusions and Future Directions

In summary, MongoDB can serve as a reliable production database for various application types. Deploying it on AWS EC2 can lead to significant cost savings over time. This guide demonstrates how to achieve this with confidence.

An enhancement is to use infrastructure as code to auto-create the AWS resources and deploy the replica set. This would be a true DevOps approach. Terraform would be an excellent tool for achieving this.

Another improvement is to establish a custom VPC with public and private subnets to deploy the app and replica set. This would enhance the security of both the database and the app.

The author and LD Talent are available to deliver these enhancements at your request. Remember to check out the GitHub repository to see all the code used in this guide in one place. It’s a good idea to review the README before you start.

Hire the author: Olumuyiwa A