A post by Learning Dollars software engineer Ahmed K.
You can find this project on GitHub: https://github.com/learningdollars/ahmedkhatab-captchasolver
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. The acronym is a mouthful, but the idea is simple: it is a short test that determines whether the user is a human or a robot.
There are many CAPTCHA systems out there, and each one works differently. In this article we will use one technique to tackle one specific kind of CAPTCHA: a number on a noisy background.
Prerequisites:
- Pillow library installed
pip install pillow
- Tesseract and auxiliary libraries
pip install pytesseract
and
sudo apt install tesseract-ocr && sudo apt install libtesseract-dev
- requests
pip install requests
- Virtualenv
pip install virtualenv
- Scipy
pip install scipy
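Before going further, it can help to confirm that everything above is actually installed. The following is only a small sketch, not part of the original project: the helper name `missing_prerequisites` is made up for illustration, and it assumes the Tesseract executable is called `tesseract` and is on your PATH. Run it in whichever environment you installed the packages into.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above (Pillow is imported as "PIL").
REQUIRED_MODULES = ["PIL", "pytesseract", "requests", "scipy"]


def missing_prerequisites(modules=REQUIRED_MODULES, binary="tesseract"):
    """Return the names of modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # pytesseract is only a wrapper, so the tesseract executable must exist too.
    if shutil.which(binary) is None:
        missing.append(binary + " (executable)")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found.")
```

If anything is reported as missing, install it with the matching command from the list above and run the check again.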
Let’s code!

At first glance this looks like an easy task, but the square patterns in the background make it harder than it seems.
So let’s first try to read the original image with pytesseract (an OCR library) alone.
First of all, we need to set up a virtual environment for our project using virtualenv and activate it:
ahmmkh@ahmmkh:~$ virtualenv ocr
ahmmkh@ahmmkh:~$ source ocr/bin/activate
(ocr) ahmmkh@ahmmkh:~$
Your terminal should look something like the above. We will start by running OCR on the original image. The code would look something like this:
decrypt.py
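The original listing is not reproduced here, so what follows is a minimal sketch of what a first attempt could look like rather than the project's exact code. The function name `read_captcha`, the file name `captcha.png`, and the optional `ocr` argument are all assumptions made for illustration; the only library calls relied on are `Image.open` from Pillow and `pytesseract.image_to_string`.

```python
from PIL import Image


def read_captcha(path, ocr=None):
    """Run OCR on the image at `path` and return the text without whitespace."""
    if ocr is None:
        # Imported here so Tesseract is only needed when OCR actually runs.
        import pytesseract
        ocr = pytesseract.image_to_string
    with Image.open(path) as image:
        # image_to_string returns whatever text Tesseract finds, possibly empty.
        return ocr(image).strip()


# Example usage (assumes the CAPTCHA was saved as captcha.png):
#     print(read_captcha("captcha.png"))
```

The `ocr` parameter is only there so the function can be tried out with a stand-in when Tesseract is not available; in normal use you would call it with just the file path.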

(ocr) ahmmkh@ahmmkh:~$ python decrypt.py
(ocr) ahmmkh@ahmmkh:~$
Weird! The task looked easy, yet the library couldn’t recognize the number in the image. The background patterns are the source of the noise that confuses it. So how do we get rid of them?
A basic first step is to apply a threshold filter, which turns every pixel either black or white depending on its brightness, so the picture looks like this:

decrypt.py
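As a rough sketch of the thresholding step (again, not necessarily the project's own code), Pillow's `convert("L")` and `point` can do the job. The helper name `to_black_and_white` and the cutoff of 140 are assumptions; the right cutoff depends on the image and usually takes a bit of trial and error.

```python
from PIL import Image

THRESHOLD = 140  # assumed cutoff; tune it for your own images


def to_black_and_white(image, threshold=THRESHOLD):
    """Return a copy of `image` in which every pixel is pure black or pure white."""
    # Convert to 8-bit grayscale so each pixel is a single brightness value.
    gray = image.convert("L")
    # Pixels brighter than the threshold become white (255), the rest black (0).
    return gray.point(lambda p: 255 if p > threshold else 0)


# Example usage (assumes the CAPTCHA was saved as captcha.png):
#     to_black_and_white(Image.open("captcha.png")).save("first_threshold.png")
```

Saving the intermediate image, as in the usage comment, makes it easy to see what the filter kept and what it threw away before handing the result to the OCR library.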

After running this code, the library still didn’t recognize the number.
(ocr) ahmmkh@ahmmkh:~$ python decrypt.py
(ocr) ahmmkh@ahmmkh:~$
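So we need an extra layer of filtering to make the number stand out clearly. Our goal is to remove the noisy background.
A trick for doing that is to blur the whole image slightly and then apply a second threshold. Blurring tends to fade thin background lines toward light gray while the thicker strokes of the digits stay dark, so the second threshold can discard the pattern and keep the number.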


The code will look something like this.
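Here is one way the threshold, blur, threshold sequence could be sketched. It is an illustration under assumptions, not the project's exact listing: it uses Pillow's built-in `ImageFilter.GaussianBlur` to keep the example short (SciPy's `scipy.ndimage.gaussian_filter` would be an alternative), and the function name `clean_captcha`, both cutoffs of 140, and the blur radius of 1.5 are guesses that you would tune for your own images.

```python
from PIL import Image, ImageFilter


def clean_captcha(image, first_threshold=140, blur_radius=1.5, second_threshold=140):
    """Threshold, blur, then threshold again to suppress thin background noise."""
    # Step 1: grayscale, then force every pixel to pure black or pure white.
    gray = image.convert("L")
    binary = gray.point(lambda p: 255 if p > first_threshold else 0)
    # Step 2: a slight blur smears thin dark lines into the white around them,
    # while the middle of thick dark strokes stays dark.
    blurred = binary.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    # Step 3: a second threshold drops the now-faded lines and keeps the strokes.
    return blurred.point(lambda p: 255 if p > second_threshold else 0)


# Example usage (assumes the CAPTCHA was saved as captcha.png):
#     cleaned = clean_captcha(Image.open("captcha.png"))
#     cleaned.save("final.png")
```

The cleaned image can then be passed to `pytesseract.image_to_string` in place of the original. If the digits come out too thin or the pattern survives, adjusting the blur radius and the second cutoff is usually the first thing to try.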

After running the code:
(ocr) ahmmkh@ahmmkh:~$ python decrypt.py
12026
(ocr) ahmmkh@ahmmkh:~$