Introduction

Recently there was an Amazon sale, and I wanted to buy a product that I had been watching for a long time. But just as I was ready to buy it, I noticed that the price had increased, and I wondered whether the sale was even legitimate. So I figured that developing a Python app that monitors Amazon prices (an Amazon Price Tracker) would improve my fluency in Python, and it would be a project that I am passionate about.

Besides being used to develop an Amazon Price Tracker, Python can be used to pull large amounts of data from websites, which can then be helpful in real-world tasks such as price comparison, job listings, research and development, and much more.

In this article, I’ll show you how to build a price tracker for products that can let you know if the price has dropped by more than a certain value. Let’s get started. You can find the code and config files on GitHub to follow along throughout the tutorial.

Glossary

  • Python: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components.

Prerequisites

We’ll be developing the Amazon Price Tracker in Visual Studio Code. Ensure you have Python installed on your system. You can use any editor or coding platform you like, but I recommend Visual Studio Code because of its simplicity. In Visual Studio Code, open the Extensions view, search for Python, and install the recommended Python extension.

Step 1 – Creating files and folders for our Amazon Price Tracker

  • Open whichever directory you like and create a folder. Name it amazon_price_tracker, or anything you want.
  • Now open the folder and create one file, scraper.py.
  • That’s all for the first step. Now open the terminal in the project’s directory and head to the next step.

Step 2 – Creating a virtual environment with virtualenv (Optional)

This is an optional step to isolate the packages that are being installed. You can find more about virtualenv here.

i. Creating the virtual environment

If you are using Python 3.3 or newer, the venv module is the preferred way to create and manage virtual environments. venv is included in the Python standard library and requires no additional installation, so there is no need to install virtualenv separately.

On Mac OS and Linux:

python3 -m venv env

On Windows:

py -m venv env

ii. Activating a Virtual Environment

Before you can start installing or using packages in your virtual environment you’ll need to activate it. Activating a virtual environment will put the virtual environment-specific python and pip executables into your shell’s PATH.

On Mac OS and Linux:

source env/bin/activate

On Windows:

.\env\Scripts\activate

iii. Deactivating the Environment

deactivate

Step 3 – Installing the required packages for the Price Tracker (Amazon Price Monitor)

  • Install requests (a library to make HTTP requests) by running the command below:
pip install requests
  • To install BeautifulSoup4 (a library to scrape information from web pages), use this command:
pip install beautifulsoup4
  • Run this command to install html5lib (a modern HTML5 parser):
pip install html5lib

Find more about pip commands here

Step 4 – Starting to code the extract_url(URL) function for our Amazon Price Tracker

Now open scraper.py and import the packages that we installed previously.

import requests
from bs4 import BeautifulSoup

Now, let us create a function extract_url(url) that shortens the URL and verifies whether it is a valid www.amazon.in URL.

def extract_url(url):
    if url.find("www.amazon.in") != -1:
        index = url.find("/dp/")
        if index != -1:
            index2 = index + 14
            url = "https://www.amazon.in" + url[index:index2]
        else:
            index = url.find("/gp/")
            if index != -1:
                index2 = index + 22
                url = "https://www.amazon.in" + url[index:index2]
            else:
                url = None
    else:
        url = None
    return url

URL extraction function

This function takes a long Amazon URL and converts it to a shorter, more manageable URL. If the URL is not a valid www.amazon.in URL, it returns None.
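As a quick check, here is a small sketch of how the function behaves. The function is repeated in a condensed form so that the snippet runs on its own, and the long URL is a made-up example built around the product id that appears later in this tutorial. It assumes, like the original offsets of 14 and 22, that the product id is 10 characters long.

```python
def extract_url(url):
    # Same logic as above: keep only the product part of an amazon.in URL.
    if "www.amazon.in" not in url:
        return None
    index = url.find("/dp/")
    if index != -1:
        # "/dp/" (4 characters) + 10-character product id
        return "https://www.amazon.in" + url[index:index + 14]
    index = url.find("/gp/")
    if index != -1:
        # "/gp/product/" (12 characters) + 10-character product id
        return "https://www.amazon.in" + url[index:index + 22]
    return None


# Hypothetical long URL with extra path and query parts after the product id.
long_url = "https://www.amazon.in/Nokia-Iron-64GB-Storage/dp/B077Q42J32/ref=sr_1_1?keywords=nokia"
print(extract_url(long_url))  # https://www.amazon.in/dp/B077Q42J32

# A URL from another site is rejected.
print(extract_url("https://www.example.com/dp/B077Q42J32"))  # None
```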

Step 5 – What we need for the next function required for the Amazon Price Monitor

For the next function, google “my user agent”, copy your user agent, and assign it to the headers variable. It will look something like this:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36

Before we create a function to scrape details from the Amazon page, let us visit an Amazon product page like this one and find the elements that hold the name and the price of the product. We will need each element’s id to find it when we extract the data from the page.

Once the Amazon page is rendered, right-click on the name of the product and click on “Inspect”. This will show the element that holds the name of the product.

Page Rendering

We can see that the element has id=“productTitle”. Hold on to this id; we will use it later to scrape the name of the product from the Amazon page.

We will do the same for the price: right-click on the price on the Amazon page and click on “Inspect”.

Inspecting the Code

The <span> element with id=“priceblock_dealprice” has the price that we need. But this product is on sale, so its id is different from the normal id, which is id=“priceblock_ourprice”.

Step 6 – Creating the price converter function for the Amazon Price Tracker

If you look closely, the <span> element has the price, but it also has unwanted extras such as the ₹ rupee symbol, blank space, a comma separator, and a decimal point.

We just want the integer portion of the price, so we will create a price converter function that removes the unwanted characters and gives us the price as an integer.

Let us name this function get_converted_price(price)

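def get_converted_price(price):
    # Remove the rupee symbol, spaces and commas around the number
    stripped_price = price.strip("₹ ,")
    # Remove the comma separators inside the number
    replaced_price = stripped_price.replace(",", "")
    # Keep only the part before the decimal point, if there is one
    find_dot = replaced_price.find(".")
    to_convert_price = replaced_price if find_dot == -1 else replaced_price[0:find_dot]
    converted_price = int(to_convert_price)
    return converted_price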

Price conversion function

With some simple string manipulations, this function will give us the converted price in integer type.

Regex Command

UPDATE: As mentioned by @oorjahalt, we can simply use a regex to extract the price.

import re  # needed for re.sub; place it with the other imports at the top of scraper.py

def get_converted_price(price):
    # stripped_price = price.strip("₹ ,")
    # replaced_price = stripped_price.replace(",", "")
    # find_dot = replaced_price.find(".")
    # to_convert_price = replaced_price[0:find_dot]
    # converted_price = int(to_convert_price)
    # Keep only digits and the decimal point; note that the result is now a float
    converted_price = float(re.sub(r"[^\d.]", "", price))  # Thanks to https://medium.com/@oorjahalt
    return converted_price

Updated price function.
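To see what the regex version does, here is a minimal sketch that runs it on a few price strings. The sample strings are made up for illustration and are not taken from a real page.

```python
import re


def get_converted_price(price):
    # Drop everything except digits and the decimal point, then convert.
    return float(re.sub(r"[^\d.]", "", price))


# Sample price strings (made up for illustration).
for text in ["₹ 19,999.00", "₹ 1,29,900", "$1,299.99"]:
    print(text, "->", get_converted_price(text))
# ₹ 19,999.00 -> 19999.0
# ₹ 1,29,900 -> 129900.0
# $1,299.99 -> 1299.99
```

Because the pattern removes any character that is not a digit or a dot, the same function copes with both the ₹ and the $ symbol without further changes.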

NOTE: While this tracker is meant for www.amazon.in, it may very well be used for www.amazon.com or other similar websites with very minor changes, such as:

Converting the Price

To make this compatible with the global version of Amazon, simply do this:

  • Change the ₹ to $:
stripped_price = price.strip("$ ,")

  • Skip find_dot and to_convert_price entirely and convert the cleaned string directly:
converted_price = float(replaced_price)

Note that the price is then converted to a float rather than an integer.

This would make it compatible with www.amazon.com.
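The price format is not the only place where www.amazon.in appears: the extract_url function from Step 4 also checks for that domain. One possible way to handle other sites is sketched below. The domain parameter is a hypothetical addition for this sketch, not part of the original code, and the sketch still assumes a 10-character product id.

```python
def extract_url(url, domain="www.amazon.in"):
    # `domain` is a hypothetical parameter added for this sketch.
    if domain not in url:
        return None
    base = "https://" + domain
    # Each length is the marker path plus an assumed 10-character product id.
    for marker, length in (("/dp/", 14), ("/gp/", 22)):
        index = url.find(marker)
        if index != -1:
            return base + url[index:index + length]
    return None


print(extract_url("https://www.amazon.com/dp/B077Q42J32?th=1", domain="www.amazon.com"))
# https://www.amazon.com/dp/B077Q42J32
```

Called without the second argument, it behaves like the www.amazon.in version.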

Now that we are buckled up, we can finally proceed to creating the scraper function.

Step 7 – Onto the details scraper function

OK, so let us create a function that extracts the details of the product, such as its name and price, from the Amazon page and returns a dictionary that contains the name, price, and URL of the product. We will name this function get_product_details(url). This is the part that gives the Amazon Price Tracker its core functionality.

i. The get_product_details(url) function

The first two variables for this function are headers and details. headers will contain your user agent, and details is a dictionary that will contain the details of the product.

headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36"
}
details = {"name": "", "price": 0, "deal": True, "url": ""}

Another variable, _url, will hold the extracted URL, and we will check whether it is valid. extract_url returns None for an invalid URL; in that case we set details to None and return it at the end, so that we know something is wrong with the URL.

_url = extract_url(url)
if _url is None:
    details = None

Now, we come to the else part.

ii. The else part

This has 4 variables: page, soup, title and price.

The page variable will hold the requested product’s page.

The soup variable will hold the parsed HTML. With this we can do lots of things, like finding an element by its id and extracting its text, which is what we will do. You can find more about BeautifulSoup’s other functions here.

The title variable, as the name suggests, will hold the element that has the title of the product.

The price variable will hold the element that has the price of the product.

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")
title = soup.find(id="productTitle")
price = soup.find(id="priceblock_dealprice")

Now that we have the elements for title and price, we will do some checks.

Let us begin with price. As mentioned earlier, the id of the price can be either id=“priceblock_dealprice” on deals or id=“priceblock_ourprice” on normal days.

if price is None:
    price = soup.find(id="priceblock_ourprice")
    details["deal"] = False

Since we first check whether there is a deal price, the code falls back from the deal price to the normal price and sets details["deal"] to False if there is no deal price. This is done so that we know the price is the normal one.

Now, if even then we don’t get the price, something is wrong with the page: maybe the product is out of stock, maybe it is not released yet, or something else. The following code checks whether both the title and the price are present.

if title is not None and price is not None:
    details["name"] = title.get_text().strip()
    details["price"] = get_converted_price(price.get_text())
    details["url"] = _url

If the product has both a price and a title, we store them.

details["name"] = title.get_text().strip()

This will store the name of the product, but first we have to strip any unwanted leading and trailing whitespace from the title. The strip() method removes any leading and trailing whitespace.
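A tiny illustration of why the strip() call matters: text taken from an HTML element often comes with the surrounding newlines and indentation of the page source. The raw string below is a made-up example of such text.

```python
# Made-up example of title text as it might come out of get_text().
raw_title = "\n        Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)\n      "

clean_title = raw_title.strip()
print(repr(clean_title))  # 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)'
```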

details["price"] = get_converted_price(price.get_text())

This will store the price of the product. The get_converted_price(price) function that we created earlier gives us the converted price as a number (an integer, or a float if you use the regex version).

details["url"] = _url

This will store the extracted URL.

else:
    details = None
return details

We will set the details to None if the price or title doesn’t exist.

Finally, the function is complete and here is the complete code

def get_product_details(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
    }
    details = {"name": "", "price": 0, "deal": True, "url": ""}
    _url = extract_url(url)
    # extract_url returns None (not an empty string) for an invalid URL
    if _url is None:
        details = None
    else:
        page = requests.get(url, headers=headers)
        soup = BeautifulSoup(page.content, "html5lib")
        title = soup.find(id="productTitle")
        price = soup.find(id="priceblock_dealprice")
        # No deal price on the page: fall back to the normal price
        if price is None:
            price = soup.find(id="priceblock_ourprice")
            details["deal"] = False
        if title is not None and price is not None:
            details["name"] = title.get_text().strip()
            details["price"] = get_converted_price(price.get_text())
            details["url"] = _url
        else:
            details = None
    return details

Function to extract product details

Note: This code does not work for books, since books have a different productid, but you can make it work for books if you tweak the code.

Step 8 – Let us run scraper.py

To try it out, add the following line at the end of scraper.py:

print(get_product_details("Insert an Amazon URL"))

Open the terminal in the folder where you have your scraper.py file and run it like so:

$ python3 scraper.py

If you have done everything correctly you should get an output like this.

{'name': 'Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)', 'price': 19999, 'deal': False, 'url': 'https://www.amazon.in/dp/B077Q42J32'}
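With a dictionary like this in hand, checking for a price drop is a small step. The sketch below is one possible way to do it and is not part of the original scraper.py: has_price_dropped and target_price are hypothetical names, and it works on a sample dictionary instead of making a real request.

```python
def has_price_dropped(details, target_price):
    # `details` is the dictionary returned by get_product_details(), or None
    # when the URL was invalid or the page had no title or price.
    if details is None:
        return False
    return details["price"] <= target_price


# Sample dictionary in the shape shown above (no request is made here).
sample = {
    "name": "Nokia 8.1 (Iron, 4GB RAM, 64GB Storage)",
    "price": 19999,
    "deal": False,
    "url": "https://www.amazon.in/dp/B077Q42J32",
}

if has_price_dropped(sample, target_price=20000):
    print("Price is down to", sample["price"], "for", sample["name"])
```

Running a check like this on a schedule, with the real result of get_product_details, would be one way to turn the scraper into a tracker.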

Learning Tools

Python is a powerful and versatile language that can be used in several areas, such as data science, stock price analysis, and price tracking for products. I have yet to explore the possibilities of Python beyond developing an Amazon Price Tracker.

Learning Strategy

I was able to learn Python based on the problem I had. Implementing a Python app that monitors Amazon prices made it easier for me to understand basic concepts that I had not known before. I also hit a few hiccups along the way when I wanted to install the Python extension that was responsible for linting my code, but luckily I found a way out by installing a different version of the Python extension pack that was compatible with vscode.

Reflective Analysis

While developing the Amazon Price Tracker, I found out more about how Python is a powerful tool in the stock market. I also learned how to code with vscode. I also noticed that you can use web scraping to compare a variety of goods: services such as ParseHub use it to collect data from online shopping websites and compare the prices of products.

Conclusion

I hope this blog about an Amazon Price Tracker was informative and has added value to your knowledge. In this tutorial, we learned how to develop a Python app that monitors Amazon prices. It took me 16 hours to complete the entire writeup, including coding this application for price tracking for products.

Future directions with Python: Python is also used in the development of interactive games. There are libraries such as PySoy, a 3D game engine supporting Python 3, and PyGame, which provides functionality for game development. Games such as Civilization IV and Vega Strike use Python.

Here is the link to the GitHub repository to get started.