Image Source: https://morioh.com/p/b34ff29f7207

Introduction

Recently there was an Amazon sale, and I wanted to buy a product I had been watching for a long time. But when I was ready to buy it, I noticed that the price had gone up, and I wondered whether the sale was even legitimate. I also noticed that most Amazon price trackers do not let you visualize the data they collect or reuse it for other purposes. So I decided to build an application that lets users visualize a product's price as it rises and falls over time. I am passionate about this project, and it would also improve my fluency in Python and MongoDB.

We will develop an application that uses MongoDB, through PyMongo, to store data. The collected data can later be used for machine learning, which can help with many real-world tasks such as price comparison, job listings, research and development, and more.

In this article, I'll show you how to build an application that stores data in MongoDB using PyMongo and lets users visualize how that data rises and falls over time. You can find the code and config files on GitHub to follow along with the tutorial. This project continues my previous article, so you may want to read it first to see what I'll be building on. We will also import some of the files from that project here. Let's get started.

Glossary

  • Python – Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components.
  • MongoDB – MongoDB is a general-purpose, document-based, distributed database built for modern application developers and for the cloud era.
  • PyMongo – PyMongo is a Python distribution containing tools for working with MongoDB, and it is the recommended way to work with MongoDB from Python.

Prerequisites

I'll be developing an application that uses PyMongo to store data in MongoDB. We'll use Visual Studio Code in this project. Make sure Python is installed on your system. You can use any editor you like, but I recommend Visual Studio Code because of its simplicity. Go to Extensions in Visual Studio Code, search for Python, and install the recommended Python extension. You will also need a MongoDB server running locally, because the code below connects to mongodb://localhost:27017/.

Step 1 – Creating some files for the project

  • Open whichever directory you like and create a folder named Mongo_and_Pymongo (or any name you want).
  • Open the folder and create two files: tracker.py and db.py.
  • I'll explain tracker.py later. For now, open db.py, where we will write the code that stores data in the database and retrieves it.

Now import the required libraries:

import datetime
import pymongo

pymongo is used to connect to MongoDB, and datetime is used to generate timestamps.

Next, connect to MongoDB and select amazon as our database:

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["amazon"]

Note: You may want to use a GUI tool such as NoSQLBooster or MongoDB Compass to monitor or modify the database easily.

Step 2 – Installing the required packages

  • Run this command to install pymongo (the driver used to access MongoDB):
$ pip install pymongo

Step 3 – Create a function to add product details to the database

Before we begin, let us look at the structure we will use to store the details.

We will have a collection for products (a collection is roughly a table in SQL terms), and each product will be a document (roughly a row in SQL terms).

Here is an example of a document.

{
    'asin': 'B077Q42J32',
    'details': [
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19139.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 7, 9, 51, 35, 648000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19029.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 16, 23, 38, 749000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19028.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 17, 18, 26, 461000)
        }
    ]
}

‘asin’ is the Amazon Standard Identification Number, a 10-character alphanumeric identifier that is unique to each product. For example, in the URL https://www.amazon.in/dp/B077Q42J32 the asin is the last part, B077Q42J32.

‘details’ is an array of dictionaries. Each dictionary contains the name, price, deal, url and date of the product at one point in time, for example:

{
  'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
  'price': 19026.0,
  'deal': False,
  'url': 'https://www.amazon.in/dp/B077Q42J32',
  'date': datetime.datetime(2019, 8, 9, 19, 20, 30, 858904)
}
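Every entry in details is expected to have the same fields. For that reason, it can help to check a scraped entry before pushing it. The sketch below uses a hypothetical helper, validate_detail, which is not part of the original project. It assumes the scraper returns name, price, deal and url, and it does not require date, because add_product_detail() adds that later.

```python
# Hypothetical helper (not from the original project): checks that a scraped
# entry has the fields stored in each element of the "details" array.
# "date" is not required because add_product_detail() adds it.
REQUIRED_FIELDS = {
    "name": str,
    "price": (int, float),
    "deal": bool,
    "url": str,
}


def validate_detail(detail):
    """Return a list of problems found in a details entry (empty if valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in detail:
            problems.append(f"missing field: {field}")
        elif not isinstance(detail[field], expected_type):
            problems.append(f"wrong type for {field}: {type(detail[field]).__name__}")
    return problems


if __name__ == "__main__":
    entry = {
        "name": "Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)",
        "price": 19026.0,
        "deal": False,
        "url": "https://www.amazon.in/dp/B077Q42J32",
    }
    print(validate_detail(entry))          # []
    print(validate_detail({"name": "x"}))  # three "missing field" messages
```

A caller could skip the database write whenever the returned list is non-empty.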

Let us look at the function now.

def add_product_detail(details):
    new = db["products"]
    ASIN = details["url"][len(details["url"])-10:len(details["url"])]
    details["date"] = datetime.datetime.utcnow()
    try:
        new.update_one(
            {
                "asin": ASIN
            },
            {
                "$set": {
                    "asin": ASIN
                },
                "$push": {
                    "details": details
                }
            },
            upsert=True
        )
        return True
    except Exception as identifier:
        print(identifier)
        return False

Function to add product details to the database

The first thing the function does is select the MongoDB collection, which in this case is products.

new = db["products"]

Next, we extract the asin from the URL by taking its last 10 characters:

ASIN = details["url"][len(details["url"])-10:len(details["url"])]
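The slice keeps only the last 10 characters, so it gives the right answer only when the URL ends exactly with the ASIN, as the stored url values above do. If a URL with a trailing slash reached this function, such as the tracker URL in Step 5, the slice would return "077Q42J32/". The sketch below shows a more tolerant alternative. It searches for the ASIN after /dp/ or /gp/product/ with a regular expression. The extract_asin name and the assumption that ASINs are 10 uppercase letters or digits are ours, not the original project's.

```python
import re

# Assumption: the ASIN follows /dp/ or /gp/product/ and is 10 uppercase
# letters or digits, ending at a "/", "?", "#" or the end of the URL.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url):
    """Return the ASIN found in url, or None if no ASIN-shaped segment exists."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("https://www.amazon.in/dp/B077Q42J32/"[-10:])              # 077Q42J32/
    print(extract_asin("https://www.amazon.in/dp/B077Q42J32/"))      # B077Q42J32
    print(extract_asin("https://www.amazon.in/dp/B077Q42J32?th=1"))  # B077Q42J32
```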

After that, add a UTC timestamp to the details dictionary under the key date.

details["date"] = datetime.datetime.utcnow()
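datetime.datetime.utcnow() returns a naive datetime, meaning one without tzinfo, and it has been deprecated since Python 3.12. On recent Python versions, a timezone-aware UTC timestamp is an alternative for the date field. The sketch below is a minimal version of that. The helper name utc_timestamp is hypothetical and not part of the original project.

```python
import datetime


def utc_timestamp():
    """Return the current time as a timezone-aware datetime in UTC."""
    # now(timezone.utc) attaches tzinfo, unlike the naive utcnow()
    return datetime.datetime.now(datetime.timezone.utc)


if __name__ == "__main__":
    ts = utc_timestamp()
    print(ts.tzinfo)  # UTC
```

With this helper you would write details["date"] = utc_timestamp(). PyMongo stores both kinds of datetime as UTC. By default it returns stored dates as naive UTC datetimes; passing tz_aware=True to MongoClient makes it return aware ones.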

Now comes the try block, where we store the data in the database.

new.update_one(
    {                
        "asin":ASIN
    },
    {
        "$set": 
        {
            "asin":ASIN
        },
        "$push":
        {
            "details":details
        }
    },             
    upsert=True
)

The call to update_one() has three parts.

{
    "asin":ASIN
} 

The first part is the filter: it finds the document whose asin matches ASIN.

{
    "$set": 
    {
        "asin":ASIN
    },
    "$push":
    {
        "details":details
    }
}

The second part is the update: $set sets the asin field, and $push appends the new details dictionary to the document's details array.

upsert=True

The third part, upsert=True, tells MongoDB to create a new document if no document matches the asin. Otherwise, the existing document is updated.
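To see what these three parts do together, here is a plain-Python model of the same update applied to a list of dictionaries. It is an illustration under our own assumptions: upsert_push is a hypothetical name, no MongoDB is involved, and this is not how MongoDB implements updates. It only mimics the effect of this one update_one() call.

```python
# In-memory model of the update_one() call above (no MongoDB involved).
def upsert_push(collection, asin, detail):
    """Find the document whose asin matches, creating it if missing (upsert),
    then apply $set on asin and $push of detail onto its details array."""
    for document in collection:                 # filter: {"asin": asin}
        if document.get("asin") == asin:
            break
    else:                                       # upsert=True: no match, so insert
        document = {"asin": asin}
        collection.append(document)
    document["asin"] = asin                     # "$set": {"asin": asin}
    document.setdefault("details", []).append(detail)  # "$push": {"details": detail}
    return document


if __name__ == "__main__":
    products = []
    upsert_push(products, "B077Q42J32", {"price": 19139.0})
    upsert_push(products, "B077Q42J32", {"price": 19029.0})
    print(len(products))                # 1
    print(len(products[0]["details"]))  # 2
```

The first call creates the document. The second call finds it and only appends to details, which is why each product keeps a single document with a growing price history.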

Step 4 – Create a function to retrieve data from the database

We need to retrieve the stored data to build an Excel sheet, visualize it, or use it for anything else you have in mind. Let us write the function.

def get_product_history(asin):
    new = db["products"]
    try:
        find = new.find_one({"asin": asin}, {"_id": 0})
        if find:
            return find
    except Exception as identifier:
        print(identifier)
    return None

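This function finds the document by its asin. The projection {"_id": 0} leaves out MongoDB's internal _id field. If no document matches or an error occurs, the function returns None. The output looks something like this: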

{
    'asin': 'B077Q42J32',
    'details': [
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19139.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 7, 9, 51, 35, 648000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19029.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 16, 23, 38, 749000)
        },
        {
            'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)',
            'price': 19028.0,
            'deal': False,
            'url': 'https://www.amazon.in/dp/B077Q42J32',
            'date': datetime.datetime(2019, 8, 8, 17, 18, 26, 461000)
        }
    ]
}
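Once you have a history document like the one above, you can show how the price rises and falls and export it for Excel with only the standard library. The sketch below is one way to do it. price_changes and to_csv are hypothetical helpers, and the sample history is trimmed to the price and date fields of the output above.

```python
import csv
import datetime
import io


def price_changes(history):
    """Return (date, price, change since previous entry) rows, oldest first."""
    entries = sorted(history["details"], key=lambda d: d["date"])
    rows = []
    previous = None
    for entry in entries:
        change = None if previous is None else entry["price"] - previous
        rows.append((entry["date"], entry["price"], change))
        previous = entry["price"]
    return rows


def to_csv(rows):
    """Render rows as CSV text that Excel or another spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "price", "change"])
    for date, price, change in rows:
        writer.writerow([date.isoformat(), price, "" if change is None else change])
    return buffer.getvalue()


if __name__ == "__main__":
    history = {
        "asin": "B077Q42J32",
        "details": [
            {"price": 19139.0, "date": datetime.datetime(2019, 8, 7, 9, 51, 35, 648000)},
            {"price": 19029.0, "date": datetime.datetime(2019, 8, 8, 16, 23, 38, 749000)},
            {"price": 19028.0, "date": datetime.datetime(2019, 8, 8, 17, 18, 26, 461000)},
        ],
    }
    rows = price_changes(history)
    print([change for _, _, change in rows])  # [None, -110.0, -1.0]
    print(to_csv(rows))
```

In the project, history would come from db.get_product_history("B077Q42J32"). Check for None first, since the function returns None when nothing is found.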

Step 5 – Creating the driver program

Open the tracker.py file that we created earlier.

Import db (the db.py file we just wrote) and scraper (scraper.py from the previous project), since we need to combine the functions created in both. Also import the time module.

import db
import scraper
import time

Next, specify the URL of the product to track:

URL = "https://www.amazon.in/dp/B077Q42J32/"

Now let us create a function, track(). It combines the functions from scraper.py and db.py to get the data from the website and push it to the database.

The first thing to do inside the function is to get the details by calling scraper's get_product_details() function with URL as the argument:

details = scraper.get_product_details(URL)

Next, check whether the details returned by get_product_details(URL) are None.

If they are None, something went wrong, so the result is “not done”.

if details is None:        
    result = "not done"

If they are not None, push the details to the database:

else:        
    inserted = db.add_product_detail(details)

Then check whether add_product_detail(details) returned True or False. True means the details were stored, and False means they were not stored in the database.

if inserted:            
    result = "done"        
else:            
    result = "not done"
return result

The last part is to call track() every minute (60 seconds). Change the interval to suit your needs.

Note: Do not poll every second. Doing so will exceed Amazon's rate limit, and Amazon may block you, which means you will no longer be able to scrape the data.

while True:    
    print(track())    
    time.sleep(60)
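The loop above waits 60 seconds whether or not the last attempt worked. One option, sketched below as our own variant rather than the original design, is to double the wait after each consecutive "not done" result, up to a ceiling, and return to the normal interval after a success. The poll function and its parameters are hypothetical. The sleep function is passed in so that the logic can be tried without actually waiting.

```python
import time


def poll(track, interval=60, max_interval=900, sleep=time.sleep, rounds=None):
    """Call track() repeatedly. Wait `interval` seconds after a success and
    double the wait (capped at max_interval) after each consecutive failure.

    rounds=None loops forever, like the original while True loop; a number
    stops after that many calls.
    """
    wait = interval
    count = 0
    while rounds is None or count < rounds:
        result = track()
        print(result)
        if result == "done":
            wait = interval                     # back to the normal pace
        else:
            wait = min(wait * 2, max_interval)  # slow down after a failure
        count += 1
        sleep(wait)
```

In tracker.py, poll(track) would replace the while True loop while keeping the one-minute pace when everything works.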

Here is the complete code.

import db
import scraper
import time

URL = "https://www.amazon.in/dp/B077Q42J32/"


def track():
    details = scraper.get_product_details(URL)
    result = ""
    if details is None:
        result = "not done"
    else:
        inserted = db.add_product_detail(details)
        if inserted:
            result = "done"
        else:
            result = "not done"
    return result


while True:
    print(track())
    time.sleep(60)

Driver program to track the price every 1 min

Note: You should also rotate your User-Agent header, because Amazon may impose a hard rate limit of one request per minute.
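We don't know exactly how scraper.py sends its requests, so the sketch below uses the standard library's urllib.request as a stand-in. It picks a random User-Agent for each request. The USER_AGENTS entries are placeholders, to be replaced with real browser strings, and build_request is a hypothetical helper. If scraper.py uses a different HTTP client, the same idea applies to that client's headers argument.

```python
import random
import urllib.request

# Placeholder values: substitute real browser User-Agent strings before use.
USER_AGENTS = [
    "ExampleBrowser/1.0 (placeholder A)",
    "ExampleBrowser/2.0 (placeholder B)",
    "ExampleBrowser/3.0 (placeholder C)",
]


def build_request(url, rng=random):
    """Return a urllib Request for url with a randomly chosen User-Agent.

    The request is only built here; nothing is sent until it is passed
    to urllib.request.urlopen().
    """
    user_agent = rng.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("https://www.amazon.in/dp/B077Q42J32/")
    # urllib normalizes header names with str.capitalize(), hence "User-agent"
    print(request.get_header("User-agent"))
```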

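Now open your GUI tool (NoSQLBooster or MongoDB Compass). In the amazon database, you should see a document for the product in the products collection, with a new entry added to its details array after every successful run.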

Learning Tools

MongoDB is widely used as the primary data store for web applications, and it has become a popular choice for a highly scalable database. It is also used as a backend data store by many well-known organizations, such as IBM, Twitter, Zendesk, Forbes, Facebook and Google.

Learning Strategy

I learned MongoDB by solving a problem I actually had. Building an app that stores data in MongoDB through PyMongo made it easier for me to understand basic concepts I had not known before.

Reflective Analysis

While developing this application, I learned more about why MongoDB is a powerful, highly scalable, free and open-source NoSQL database. I also learned how to code in VS Code. Furthermore, I noticed that MongoDB belongs to a new crop of technologies that emerged in response to modern application demands, including a new class of databases known as NoSQL. Many organizations that have adopted databases such as MongoDB have been able to build applications that were previously either impossible or simply impractical.

Conclusion

I hope this blog was informative and added value to your knowledge. It took me 8 hours to complete the entire write-up, including coding this application.

Future directions: I propose using MongoDB Stitch, a backend-as-a-service platform that offers a RESTful API to ease application development. It also lets users connect to cloud services and set real-time triggers on databases.

Here is the link to the GitHub repository to get started.