Introduction
Recently there was an Amazon sale, and I wanted to buy a product that I had been watching for a long time. But just as I was ready to buy it, I noticed that the price had increased, and I wondered whether the sale was even legitimate. I also noticed that most Amazon price trackers do not offer a way to visualize the data they collect or to reuse it for other purposes. So I figured that an application that lets users see how a price rises and falls over time would be a project I could be passionate about, and one that would improve my fluency in both Python and MongoDB.
We will develop an application that uses MongoDB, through PyMongo, to store data. The stored data can later be used for machine learning, which can be helpful in real-world tasks such as price comparison, job listings, research and development, and much more.
In this article, I'll show you how to build an application that stores data in MongoDB using PyMongo, so that users can visualize how the data rises and falls over time. Let's get started. You can find the code and config files on GitHub to follow along throughout the tutorial. You can also check out my previous article, since this one continues from it and we are going to import some of the files from that project here.
Glossary
- Python – Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components.
- MongoDB – MongoDB is a general-purpose, document-based, distributed database built for modern application developers and for the cloud era.
- PyMongo – PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python.
Prerequisites
I'll be developing an application that uses MongoDB, through PyMongo, to store data. We'll be using Visual Studio Code in this project. Ensure you have Python installed on your system. You can use any coding platform you wish, but I would recommend Visual Studio Code because of its simplicity. In Visual Studio Code, navigate to Extensions, search for Python, and install the recommended Python extension that is highlighted. You will also need a MongoDB server running locally, since the code connects to mongodb://localhost:27017/.
Step 1 – Creating some files for the project
- Open whichever directory you like and create a folder; name it Mongo_and_Pymongo, or anything you want.
- Now open the folder and create two files, tracker.py and db.py.
- I'll explain tracker.py later on; for now, open db.py, which is where we will write the code to store and retrieve data to/from the database.
Now, import some libraries that are required.
import datetime
import pymongo
pymongo is for connecting to MongoDB and datetime is to get a timestamp.
Next, we will connect to MongoDB and select amazon as our database.
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["amazon"]
Note: You may want to have NoSQLBooster or MongoDB Compass as an interface to monitor or modify the database easily, since they are GUI-based tools.
Step 2 – Installing the required packages
- Run this command to install pymongo (a driver to access MongoDB)
$ pip install pymongo
Step 3 – Create a function to add product details to the database
OK, so before we begin, let us look at the structure in which we will store the details.
We will have a collection (a table in SQL terms, read here) for products, and each product will be a document (a row, or tuple, in SQL terms, read here).
Here is an example of a document.
{
    'asin': 'B077Q42J32',
    'details': [
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19139.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 7, 9, 51, 35, 648000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19029.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 16, 23, 38, 749000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19028.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 17, 18, 26, 461000)
        }
    ]
}
'asin' is the Amazon Standard Identification Number, a 10-character alphanumeric code that is unique to each product. For example, in the URL https://www.amazon.in/dp/B077Q42J32 the asin is the last part, B077Q42J32.
'details' is an array of dictionaries, and each dictionary contains a name, price, deal, url and date for a product.
{
'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
'price': 19026.0,
'deal': False,
'url': 'https://www.amazon.in/dp/B077Q42J32',
'date': datetime.datetime(2019, 8, 9, 19, 20, 30, 858904)
}
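Because every entry in details is expected to carry the same keys, it can help to check a scraped dictionary before pushing it. The following is only a minimal sketch and not part of the original project: the helper name missing_detail_keys is hypothetical, and it assumes the scraper supplies name, price, deal and url, while date is added later in db.py.

```python
# Hypothetical helper (not in the original project): report which expected
# keys are absent from a scraped details dictionary.
REQUIRED_KEYS = ("name", "price", "deal", "url")  # "date" is added in db.py


def missing_detail_keys(details):
    """Return the required keys that are absent from `details`, in order."""
    return [key for key in REQUIRED_KEYS if key not in details]


sample = {
    "name": "Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)",
    "price": 19026.0,
    "deal": False,
    "url": "https://www.amazon.in/dp/B077Q42J32",
}
print(missing_detail_keys(sample))         # []
print(missing_detail_keys({"name": "x"}))  # ['price', 'deal', 'url']
```

An empty list means the dictionary has everything the document format above expects.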
Let us look at the function now.
def add_product_detail(details):
    new = db["products"]
    ASIN = details["url"][len(details["url"])-10:len(details["url"])]
    details["date"] = datetime.datetime.utcnow()
    try:
        new.update_one(
            {
                "asin": ASIN
            },
            {
                "$set": {
                    "asin": ASIN
                },
                "$push": {
                    "details": details
                }
            },
            upsert=True
        )
        return True
    except Exception as identifier:
        print(identifier)
        return False
Function to add product details to the database
The first thing that we do in the function is select the collection in MongoDB; in this case, it will be products.
new = db["products"]
Next, we will extract the asin from the URL.
ASIN = details["url"][len(details["url"])-10:len(details["url"])]
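Taking the last 10 characters works only when the URL ends with the ASIN. If the URL carries a trailing slash, as the one used later in tracker.py does, the slice would yield 077Q42J32/ instead. A hedged alternative is sketched below; the function name extract_asin is hypothetical, and the pattern assumes that product URLs contain /dp/ followed by the 10-character ASIN.

```python
import re

# Assumption: product URLs contain "/dp/" followed by the 10-character ASIN.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url):
    """Return the ASIN found after "/dp/" in `url`, or None if absent."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_asin("https://www.amazon.in/dp/B077Q42J32"))   # B077Q42J32
print(extract_asin("https://www.amazon.in/dp/B077Q42J32/"))  # B077Q42J32
print(extract_asin("https://www.amazon.in/"))                # None
```

Returning None for an unexpected URL lets the caller decide what to do, rather than storing a wrong key.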
After that, add a timestamp with the key date in the dictionary details.
details["date"] = datetime.datetime.utcnow()
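Note that datetime.datetime.utcnow() returns a naive datetime (no timezone attached) and has been deprecated since Python 3.12. One possible replacement, shown here only as a sketch, is a timezone-aware UTC timestamp; the helper name utc_timestamp is hypothetical.

```python
import datetime


def utc_timestamp():
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.datetime.now(datetime.timezone.utc)


details = {"url": "https://www.amazon.in/dp/B077Q42J32"}
details["date"] = utc_timestamp()
print(details["date"].tzinfo)  # UTC
```

The stored value still represents the same instant; the difference is that the timezone is stated explicitly instead of being implied.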
Now comes the try block, where we store the data in the database.
new.update_one(
    {
        "asin":ASIN
    },
    {
        "$set":
        {
            "asin":ASIN
        },
        "$push":
        {
            "details":details
        }
    },
    upsert=True
)
The call to update_one() takes three arguments.
{
"asin":ASIN
}
The first argument is the filter: it searches for a document with a matching asin.
{
    "$set":
    {
        "asin":ASIN
    },
    "$push":
    {
        "details":details
    }
}
The second argument is the update: $set writes the asin field, and $push appends the new details dictionary to the details array.
upsert=True
The third argument is an option: when it is set to True, update_one() creates a new document if the search doesn't find the asin; otherwise it just updates the existing one.
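To make the combined effect of upsert=True and $push concrete, here is a pure-Python imitation that uses a plain dictionary in place of the collection. It is not PyMongo and talks to no database; the name upsert_push is hypothetical and the sketch only mirrors the behaviour described above.

```python
# Pure-Python stand-in for the collection: a dict keyed by asin.
# This only imitates the upsert + $push behaviour; it is not PyMongo.
def upsert_push(store, asin, details):
    """Create the document for `asin` if missing, then append `details`."""
    document = store.setdefault(asin, {"asin": asin, "details": []})
    document["details"].append(details)
    return document


store = {}
upsert_push(store, "B077Q42J32", {"price": 19139.0})  # no match: inserts
upsert_push(store, "B077Q42J32", {"price": 19029.0})  # match: appends
print(len(store))                                            # 1
print([d["price"] for d in store["B077Q42J32"]["details"]])  # [19139.0, 19029.0]
```

Two calls for the same asin leave a single document whose details array has grown by one entry each time, which is what the tracker relies on.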
Step 4 – Create a function to retrieve data from the database
We need to retrieve the data stored in the database in order to build an Excel sheet, visualize the data, or do anything else you may think of. Now let us begin to code the function.
def get_product_history(asin):
    new = db["products"]
    try:
        find = new.find_one({"asin": asin}, {"_id": 0})
        if find:
            return find
    except Exception as identifier:
        print(identifier)
    return None
This function finds the document based on the asin. The output would be something like this.
{
    'asin': 'B077Q42J32',
    'details': [
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19139.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 7, 9, 51, 35, 648000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19029.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 16, 23, 38, 749000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19028.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 17, 18, 26, 461000)
        }
    ]
}
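A document of this shape can be turned into rows for a spreadsheet or a chart with the standard library alone. The sketch below is one possible approach, not the project's own code; the helper names history_to_rows and rows_to_csv are hypothetical, and the sample document is a shortened copy of the output above.

```python
import csv
import datetime
import io


def history_to_rows(history):
    """Return (ISO date, price) rows from a product document, oldest first."""
    entries = sorted(history["details"], key=lambda entry: entry["date"])
    return [(entry["date"].isoformat(), entry["price"]) for entry in entries]


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


history = {
    "asin": "B077Q42J32",
    "details": [
        {"price": 19029.0, "date": datetime.datetime(2019, 8, 8, 16, 23, 38)},
        {"price": 19139.0, "date": datetime.datetime(2019, 8, 7, 9, 51, 35)},
    ],
}
rows = history_to_rows(history)
print(rows_to_csv(rows))
print(min(price for _, price in rows))  # 19029.0
```

The CSV text can be written to a file and opened in Excel, and the same rows can be handed to any plotting library to draw the price over time.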
Step 5 – Creating the driver program
Open the file tracker.py that we created earlier.
Import db, and scraper from our previous project, since we need to combine the functions that we created earlier; also import the time library.
import db
import scraper
import time
Next, specify the URL
URL = "https://www.amazon.in/dp/B077Q42J32/"
Now, let us create a function track(), where we will combine the functions from scraper.py and db.py to get the data from the website and push it to the database.
The first thing to do inside the function is to get the details. You do that by calling the function get_product_details(URL) from scraper with the argument URL.
details = scraper.get_product_details(URL)
Now check whether the details returned by get_product_details(URL) is None or not.
If it is None, something went wrong and we have to return "not done".
if details is None:
    result = "not done"
If it is not None, push the details to the database.
else:
    inserted = db.add_product_detail(details)
Now check again whether add_product_detail(details) has returned True or False. True means the data was inserted, and False means it was not stored in the database.
    if inserted:
        result = "done"
    else:
        result = "not done"
return result
The last part is to call the function track() every minute (60 seconds), or at whatever interval suits your requirements.
Note: You should not do it every second. Doing so would cross the rate limit, and Amazon would block you, meaning you would not be able to scrape the data.
while True:
    print(track())
    time.sleep(60)
Here is the complete code.
import db
import scraper
import time

URL = "https://www.amazon.in/dp/B077Q42J32/"

def track():
    details = scraper.get_product_details(URL)
    result = ""
    if details is None:
        result = "not done"
    else:
        inserted = db.add_product_detail(details)
        if inserted:
            result = "done"
        else:
            result = "not done"
    return result

while True:
    print(track())
    time.sleep(60)
Driver program to track the price every 1 min
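The loop above runs forever with a fixed 60-second pause. A variation that stops after a set number of runs and adds a little random jitter to each pause is sketched below. This is only an illustration: run_tracker and its parameters are hypothetical names, the delay values are not known Amazon limits, and a stand-in track() is used so the example runs without the scraper or the database.

```python
import random
import time


def run_tracker(track, runs, base_delay=60.0, jitter=10.0, sleep=time.sleep):
    """Call `track` `runs` times, pausing base_delay plus random jitter between calls."""
    results = []
    for attempt in range(runs):
        results.append(track())
        if attempt < runs - 1:  # no need to wait after the final call
            sleep(base_delay + random.uniform(0, jitter))
    return results


# Demonstration with a stand-in track() and a sleep that only records delays.
delays = []
outcome = run_tracker(lambda: "done", runs=3, sleep=delays.append)
print(outcome)      # ['done', 'done', 'done']
print(len(delays))  # 2
```

Passing the sleep function as a parameter keeps the loop easy to try out quickly; in real use the default time.sleep would apply.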
Note: You should also rotate your user-agent, as Amazon may impose a hard rate limit of 1 request per minute on you. For more information on how to do that, check here.
Go and check your GUI-based software; you should see something like this.

Learning Tools
MongoDB is widely used across various web applications as the primary data store. In addition, it has become a popular choice for a highly scalable database. Moreover, it is currently used as the back-end data store of many well-known organizations such as IBM, Twitter, Zendesk, Forbes, Facebook, and Google.
Learning Strategy
I was able to learn MongoDB by working on a problem I actually had. Building an app that stores data in MongoDB using PyMongo made it easier for me to understand basic concepts that I had not known before.
Reflective Analysis
While developing this application, I found out more about how MongoDB is a powerful, highly scalable, free, and open-source NoSQL database. I also learned how to code with VS Code. Furthermore, I noticed that MongoDB belongs to a new crop of technologies that have emerged in response to the demands of modern applications, including a new class of databases known as NoSQL. Many organizations have chosen to take advantage of these new databases, such as MongoDB, and have been able to build applications that were previously either impossible or simply impractical.
Conclusion
I hope this blog was informative and has added value to your knowledge. It took me 8 hours to complete the entire write-up, including coding this application.
As a future direction for this project, I propose MongoDB Stitch as a backend-as-a-service platform. It offers a RESTful API in an effort to ease application development. Moreover, it lets users connect to cloud services and set real-time triggers in databases.
Here is the link to the GitHub repository to get started.
Future Directions
We can extend this concept to make the data interactive and to visualize data extracted from many more web pages.
MongoDB Mobile: a beta technology that extends MongoDB applications to mobile devices and Internet of Things equipment, with automatic synchronization of data to back-end databases.