What is scraping?
Web scraping is the process of fetching web pages and extracting specific information from them according to predefined rules. The extracted information is then formatted and stored, for example in a database, for use in business applications.
What is WooCommerce?
WooCommerce is an open-source e-commerce plugin for WordPress, designed for small to large online merchants. Installation instructions are available in the official docs: https://docs.woocommerce.com/document/installing-uninstalling-woocommerce/
The aim of this tutorial
The goal of this project is to use scraping to add products to your WooCommerce website.
What you need to have installed to follow along
- BeautifulSoup
- lxml
- requests
Set Up the Environment
First, we need to install virtualenv, which we will use to create the scraping environment:
pip install virtualenv
Then we create our scraping environment:
virtualenv scraping
Next, activate the environment (on Linux or macOS):
source scraping/bin/activate
and install the dependencies from the requirements.txt in the GitHub repo:
pip install -r requirements.txt
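To confirm that the three dependencies are importable from the new environment, a quick check along these lines may help. It is only a sketch: the function name find_missing is hypothetical, and the module names are the import names of the packages listed above (BeautifulSoup is imported as bs4).

```python
import importlib.util

# Import names of the three dependencies listed above
# (BeautifulSoup is imported as "bs4").
REQUIRED_MODULES = ["bs4", "lxml", "requests"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If anything is reported as missing, re-running the pip install step inside the activated environment is the first thing to try.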
Let’s Code!
The first part is to import the required modules.
import codecs
import json

import requests
from bs4 import BeautifulSoup
...
You can choose any product page to scrape for this tutorial. I chose this product as our test subject:
https://fenceworkshop.com/product/athens-double-driveway-gate/
Let’s set our headers and create the parser:
# Send a browser-like User-Agent with the request
hdr = {'User-Agent': 'Mozilla/5.0'}

def _scrapeProduct(url):
    with requests.get(url, headers=hdr) as page_response:
        # Parse the downloaded HTML with the lxml parser
        soup = BeautifulSoup(page_response.content, 'lxml')
        ...
This part of the code simply downloads the page and parses it into a BeautifulSoup object.
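The request above has no timeout and does not check the HTTP status, so a slow or failing server can go unnoticed. A minimal sketch of a more defensive download step, assuming a hypothetical helper named fetch_page and an arbitrary 10-second timeout:

```python
import requests

# Same browser-like header as in the tutorial code.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_page(url, timeout=10):
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to BeautifulSoup in the same way as page_response.content.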
def _scrapeProduct(url):
    product = {}
    with requests.get(url, headers=hdr) as page_response:
        soup = BeautifulSoup(page_response.content, 'lxml')
        # Element holding the price in its data-price attribute
        price = soup.find_all("div", {"id": "product-addons-total"})
        product_title = soup.find_all("h1", {"class": "product_title"})
        description = soup.find_all("div", {"class": "content-text"})
        # Links inside the first image gallery block
        images = soup.find_all("div", {"class": "images"})[0].find_all("a")
        ...
The code above extracts the product data: the price, title, description, and image links.
After that, we need to store those values in the product dict.
def _scrapeProduct(url):
    product = {}
    with requests.get(url, headers=hdr) as page_response:
        soup = BeautifulSoup(page_response.content, 'lxml')
        price = soup.find_all("div", {"id": "product-addons-total"})
        product_title = soup.find_all("h1", {"class": "product_title"})
        description = soup.find_all("div", {"class": "content-text"})
        images = soup.find_all("div", {"class": "images"})[0].find_all("a")
        # Collect the image URLs and join them into one comma-separated string
        imgs = [i["href"] for i in images]
        product["Images"] = ",".join(imgs)
        product["Name"] = product_title[0].text
        # The price is stored in the element's data-price attribute
        product["Price"] = price[0]['data-price']
        product["Description"] = description[0].text
    return product
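The function above indexes [0] on every result, so a page that lacks a price or a title raises an IndexError. Below is a minimal defensive sketch that uses the same selectors but runs against a small inline HTML snippet invented for illustration. The helper names first_text and parse_product are hypothetical, and the standard-library html.parser backend is used only so that the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded product page.
SAMPLE_HTML = """
<h1 class="product_title">Sample Gate</h1>
<div id="product-addons-total" data-price="199.00"></div>
<div class="content-text"> A sample description. </div>
<div class="images">
  <a href="https://example.com/a.jpg"></a>
  <a href="https://example.com/b.jpg"></a>
</div>
"""


def first_text(soup, name, attrs):
    """Return the stripped text of the first matching tag, or '' if absent."""
    tag = soup.find(name, attrs)
    return tag.get_text(strip=True) if tag else ""


def parse_product(html):
    """Build the product dict, tolerating missing elements."""
    soup = BeautifulSoup(html, "html.parser")
    price_tag = soup.find("div", {"id": "product-addons-total"})
    gallery = soup.find("div", {"class": "images"})
    # Only keep links that actually carry an href attribute.
    links = gallery.find_all("a", href=True) if gallery else []
    return {
        "Name": first_text(soup, "h1", {"class": "product_title"}),
        "Price": price_tag.get("data-price", "") if price_tag else "",
        "Description": first_text(soup, "div", {"class": "content-text"}),
        "Images": ",".join(a["href"] for a in links),
    }


product = parse_product(SAMPLE_HTML)
print(product)
```

With this approach a missing element yields an empty string instead of an exception, which makes it easier to scrape many pages and review incomplete rows afterwards.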
To import the data we got from the website, we need to convert it to a CSV file so we can use the WooCommerce product importer to add the product to our product list. The conversion module is provided along with the GitHub project.
...
data = [_scrapeProduct("https://fenceworkshop.com/product/athens-double-driveway-gate/")]
print(data)
# Save the scraped products as UTF-8 encoded JSON
with codecs.open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
# api is presumably the conversion module from the GitHub project
api.json_to_csv('data.json', "out.csv")
This part saves the data to a JSON file and converts it to a CSV file so we can easily import it in the WooCommerce admin panel.
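The conversion module itself is not shown in this article. As a rough idea of what such a step involves, the sketch below uses only the standard library to turn a JSON list of product dicts into a CSV file with one column per key. The function name json_to_csv_sketch is hypothetical; it is not the api.json_to_csv from the GitHub project, whose implementation may differ.

```python
import csv
import json


def json_to_csv_sketch(json_path, csv_path):
    """Write a JSON list of flat dicts to a CSV file with a header row."""
    with open(json_path, encoding="utf-8") as f:
        rows = json.load(f)

    # Collect column names in first-seen order so every key gets a column.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

Calling json_to_csv_sketch("data.json", "out.csv") would produce the CSV from the JSON file written earlier. Because the csv module quotes any field that contains a comma, the comma-separated Images value stays in a single cell.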
Once the CSV file is created, we import it into our store.


We follow the importer's steps, and here we go: we have our first scraped product in our shop.

Thank you for following along. Here is the GitHub link: https://github.com/learningdollars/ahmedkhatab-wordpress-woocommerce