What is scraping?

Web scraping is the process of crawling websites and extracting specific information from their pages according to predefined rules. The extracted information is then formatted and stored, typically in a database, for use in business applications.
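The extract, format, and store steps in that definition can be sketched in a few lines. This is a minimal illustration, not part of the tutorial's project. It parses a hard-coded HTML snippet instead of downloading a page, and the sample markup, the CSS selectors, and the `products` table are all made up for the example.

```python
import sqlite3

from bs4 import BeautifulSoup

# Hypothetical page content; a real scraper would download this with requests
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Gate A</span> <span class="price">199.00</span></li>
  <li class="item"><span class="name">Gate B</span> <span class="price">249.50</span></li>
</ul>
"""


def extract(html):
    """Extract (name, price) pairs using a fixed 'specification' of CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.item"):
        name = item.select_one(".name").get_text(strip=True)
        # Format step: turn the price text into a number
        price = float(item.select_one(".price").get_text(strip=True))
        rows.append((name, price))
    return rows


def store(rows, conn):
    """Store the formatted rows in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store(extract(SAMPLE_HTML), conn)
print(conn.execute("SELECT name, price FROM products ORDER BY name").fetchall())
# [('Gate A', 199.0), ('Gate B', 249.5)]
```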

What is WooCommerce?

WooCommerce is an open-source e-commerce plugin for WordPress, designed for online merchants of all sizes who run their sites on WordPress. Installation instructions are in the official docs: https://docs.woocommerce.com/document/installing-uninstalling-woocommerce/

The aim of this tutorial

The goal of this project is to use web scraping to add products to your WooCommerce store.

What do you need installed to follow along?

  • BeautifulSoup (the beautifulsoup4 package)
  • lxml
  • requests

Set Up the Environment

First, we create an isolated Python environment for the scraper using virtualenv. Install it with:

pip install virtualenv

Then create the environment, here named scraping:

virtualenv scraping

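Activate it (source scraping/bin/activate on Linux and macOS, scraping\Scripts\activate on Windows), then install the dependencies from the requirements.txt file in the GitHub repo: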

pip install -r requirements.txt
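Before writing any scraping code, it can help to confirm that the activated environment actually provides the three packages. A small standard-library sketch follows; the function name `check_dependencies` is hypothetical, and note that BeautifulSoup is imported as `bs4`, which is the name checked here.

```python
import importlib.util

# Import names of the tutorial's dependencies (BeautifulSoup is imported as "bs4")
REQUIRED = ("bs4", "lxml", "requests")


def check_dependencies(modules=REQUIRED):
    """Return a mapping of module name -> True if it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'ok' if ok else 'MISSING (run pip install -r requirements.txt)'}")
```

Using `find_spec` checks that each package can be found without actually importing it.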

Let’s Code!

The first part is to import the required modules.

import codecs
import json

import requests
from bs4 import BeautifulSoup
...

You can pick any product page to scrape for this tutorial. I chose this product as our test subject:

https://fenceworkshop.com/product/athens-double-driveway-gate/

Let’s set our request headers and create the parser:

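# Send a browser-like User-Agent; some sites block the default python-requests one
hdr = {'User-Agent': 'Mozilla/5.0'}

def _scrapeProduct(url):
    # Download the page and parse its HTML with the lxml parser
    with requests.get(url, headers=hdr) as page_response:
        soup = BeautifulSoup(page_response.content, 'lxml')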
...

This code downloads the page and parses its HTML into a BeautifulSoup object, soup, which we can then search.
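The snippet above sets no timeout and parses whatever comes back, including error pages. A slightly more defensive fetch might look like the sketch below. `fetch_soup` is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and the optional `session` argument exists only so the function can be exercised without a network connection.

```python
import requests
from bs4 import BeautifulSoup

hdr = {"User-Agent": "Mozilla/5.0"}


def fetch_soup(url, session=None, timeout=10, parser="lxml"):
    """Download url and return it parsed as a BeautifulSoup tree (hypothetical helper)."""
    get = session.get if session is not None else requests.get
    # Without a timeout, requests can wait indefinitely on an unresponsive server
    response = get(url, headers=hdr, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.content, parser)
```

Usage would be `soup = fetch_soup("https://fenceworkshop.com/product/athens-double-driveway-gate/")`, after which the same `find_all` calls apply.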

def _scrapeProduct(url):
    product = {}
    with requests.get(url, headers=hdr) as page_response:
        soup = BeautifulSoup(page_response.content, 'lxml')
        # Each find_all() returns a list of matching tags; the selectors fit this site's markup
        price = soup.find_all("div", {"id": "product-addons-total"})
        product_title = soup.find_all("h1", {"class": "product_title"})
        description = soup.find_all("div", {"class": "content-text"})
        # The <a> tags inside the "images" container link to the product images
        images = soup.find_all("div", {"class": "images"})[0].find_all("a")
...

The code above finds the elements that hold the product's price, title, description, and images.

Next, we store these values in the product dictionary:

def _scrapeProduct(url):
    product = {}
    with requests.get(url, headers=hdr) as page_response:
        soup = BeautifulSoup(page_response.content, 'lxml')
        price = soup.find_all("div", {"id": "product-addons-total"})
        product_title = soup.find_all("h1", {"class": "product_title"})
        description = soup.find_all("div", {"class": "content-text"})
        images = soup.find_all("div", {"class": "images"})[0].find_all("a")
    # Join all image URLs into one comma-separated string
    imgs = [i["href"] for i in images]
    product["Images"] = ",".join(imgs)
    product["Name"] = product_title[0].text.strip()
    # Read the price from the div's data-price attribute
    product["Price"] = price[0]['data-price']
    product["Description"] = description[0].text.strip()
    return product
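Indexing with `[0]` raises an `IndexError` as soon as a selector matches nothing, for example after the shop changes its theme. One way to make failures easier to diagnose is to separate parsing from downloading and stop with a clear message. The sketch below keeps the same selectors and dictionary keys as `_scrapeProduct`; the names `parse_product` and `_first` and the sample HTML are made up for illustration and only approximate the real page's markup.

```python
from bs4 import BeautifulSoup


def _first(soup, name, attrs):
    """Return the first tag matching name/attrs, or raise a descriptive error."""
    tag = soup.find(name, attrs)
    if tag is None:
        raise ValueError(f"No <{name}> matching {attrs} found; has the page layout changed?")
    return tag


def parse_product(soup):
    """Build the same product dict as _scrapeProduct from an already-parsed page."""
    price = _first(soup, "div", {"id": "product-addons-total"})
    title = _first(soup, "h1", {"class": "product_title"})
    description = _first(soup, "div", {"class": "content-text"})
    gallery = _first(soup, "div", {"class": "images"})
    return {
        # Comma-separated image URLs, as in the original _scrapeProduct
        "Images": ",".join(a["href"] for a in gallery.find_all("a", href=True)),
        "Name": title.get_text().strip(),
        "Price": price["data-price"],
        "Description": description.get_text().strip(),
    }


# Hypothetical markup that mimics the selectors used in the tutorial
SAMPLE = """
<h1 class="product_title entry-title">Sample Gate</h1>
<div id="product-addons-total" data-price="100.00"></div>
<div class="content-text"><p>Sample description.</p></div>
<div class="images"><a href="https://example.com/a.jpg"></a><a href="https://example.com/b.jpg"></a></div>
"""
print(parse_product(BeautifulSoup(SAMPLE, "html.parser")))
```

Because `_scrapeProduct` already builds `soup`, it could end with `return parse_product(soup)` instead of indexing each list.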

To import the scraped data, we convert it to CSV so that the WooCommerce product importer can add it to our product list. The module that does the conversion is included in the GitHub project.

...

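# Scrape one product page into a list (one dict per product)
data = [_scrapeProduct("https://fenceworkshop.com/product/athens-double-driveway-gate/")]
print(data)
# Write the results as UTF-8 JSON, keeping non-ASCII characters readable
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
# api is the CSV-conversion module shipped with the GitHub project
api.json_to_csv('data.json', "out.csv")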

This part saves the scraped data to data.json and converts it to out.csv, which we can then import from the WooCommerce admin panel.
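The repository's `api` module is not shown in this post. To see what such a conversion involves, here is a stand-in `json_to_csv` built on the standard `csv` module. It is a hypothetical sketch, not the repo's code, and it assumes `data.json` holds a list of flat dictionaries like the one `_scrapeProduct` returns.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON list of flat dicts into a CSV file with one row per dict."""
    with open(json_path, encoding="utf-8") as f:
        rows = json.load(f)
    # Collect every key that appears in any row, keeping first-seen order
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Because the Images value contains commas, `csv.DictWriter` wraps it in double quotes, so the whole list of URLs still lands in a single column.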

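Once the CSV file is created, we import it into our store with the WooCommerce product importer (in the WordPress admin, go to Products and click Import).

Follow the importer's steps (upload the CSV, map its columns to WooCommerce fields, for example Price to Regular price, and run the import), and there it is: our first scraped product in the shop.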

Thanks for following along! Here is the GitHub repository: https://github.com/learningdollars/ahmedkhatab-wordpress-woocommerce