Gerapy is a full-stack solution for Scrapy deployment. If you look at the commit history, it’s received some dependency bumps but hasn’t really been updated since 2022. Getting Gerapy started can be a difficult process, often filled with trial and error.
This guide exists to make Gerapy easier. By the end of this guide, you’ll be able to answer the following questions.
- Why doesn’t Gerapy work with my standard Python installation?
- How can I configure Python and pip for Gerapy?
- How do I create an admin account?
- How do I write my first scraper?
- How do I troubleshoot my scraper?
- How do I test and deploy my scraper?
Introduction to Gerapy
Let’s get a better understanding of what Gerapy actually is and what makes it unique.
What is Gerapy?
Gerapy provides us with a Django management dashboard and the Scrapyd API. These services give you a simple yet powerful interface to manage your stack. At this point, it’s a legacy program but it still improves workflow and speeds up deployment. Gerapy makes web scraping more accessible to DevOps and management-oriented teams.
- GUI dashboard for creating and monitoring scrapers.
- Deploy a scraper with the click of a button.
- Get real-time visibility into logs and errors as they occur.
What Makes Gerapy Unique?
Gerapy gives you a one-stop shop for scraper management. Getting up and running with Gerapy is a tedious process due to its legacy code and dependencies. However, once you’ve got it working, you unlock a full toolset tailored for handling scrapers at scale.
- Build your scrapers from inside the browser.
- Deploy them to Scrapyd without touching the command line.
- Centralized management for all of your crawlers and scrapers.
- Frontend built on Django for spider management.
- Backend powered by Scrapyd for easy building and deployment.
- Built-in scheduler for task automation.
How To Scrape the Web With Gerapy
Gerapy’s setup process is laborious. You need to address technical debt and perform software maintenance. After much trial and error, we learned that Gerapy isn’t even compatible with more modern versions of Python. We started with a modern installation of Python 3.13. It was too modern for Gerapy’s dependencies. We tried 3.12 — still no luck — just more dependency issues.
As it turned out, we needed Python 3.10. On top of that, we needed to alter some of Gerapy’s actual code to fix a deprecated class — and then we needed to manually downgrade almost every dependency in Gerapy. Python has undergone significant changes in the last three years and Gerapy’s development hasn’t kept pace. We need to recreate Gerapy’s ideal conditions from three years ago.
Project Setup
Python 3.10
To start, we need to install Python 3.10. This version isn’t extinct, but it’s no longer widely available. On native Ubuntu and Windows WSL with Ubuntu, it can be installed with apt.
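On Ubuntu (and WSL with Ubuntu), the simplest route is the `deadsnakes` PPA covered later in this guide. The commands below assume that repository is available to your system:

```bash
# Add the deadsnakes PPA, then install Python 3.10 and its venv module
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-venv
```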
You can then check to make sure it’s installed with the `--version` flag.
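Assuming the interpreter was installed as `python3.10`:

```bash
python3.10 --version
```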
If all goes well, you should see output similar to the output below.
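The patch number will vary with your distribution, but it should report a 3.10 release:

```
Python 3.10.12
```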
Creating a Project Folder
First, make a new folder.
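The folder name below is just an example; call yours whatever you like:

```bash
mkdir gerapy-tutorial
```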
Next, we need to `cd` into our new project folder and set up a virtual environment.
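Assuming the example folder name above and Python 3.10’s built-in `venv` module:

```bash
cd gerapy-tutorial
python3.10 -m venv venv
```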
Activate the environment.
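On Linux (including WSL), activation looks like this:

```bash
source venv/bin/activate
```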
Once your environment is active, you can check the active version of Python.
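With the environment active, `python` should now point at the 3.10 interpreter:

```bash
python --version
# Expected output: Python 3.10.x
```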
As you can see, `python` now defaults to our 3.10 installation from within the virtual environment.
Installing Dependencies
The command below installs Gerapy and its required dependency versions. As you can see, we need to manually pin many legacy packages using pip’s `==` version syntax.
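The exact pins were worked out by trial and error, so treat the versions below as placeholders rather than the definitive working set:

```bash
# Install Gerapy itself
pip install gerapy

# Pin any legacy dependency that breaks, using pip's == syntax.
# The packages and versions here are illustrative placeholders only.
pip install "django==2.2.28" "scrapyd==1.4.3"
```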
We’ll now create an actual Gerapy project with the `init` command.
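This creates a `gerapy` workspace folder inside the current directory:

```bash
gerapy init
```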
Next, we’ll `cd` into our `gerapy` folder and run `migrate` to create our database.
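Both commands are part of Gerapy’s CLI:

```bash
cd gerapy
gerapy migrate
```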
Now, it’s time to create an admin account. This command gives you administrator privileges by default.
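Gerapy ships an `initadmin` command that creates a default superuser:

```bash
gerapy initadmin
```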
Finally, we start the Gerapy server.
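By default, this serves the dashboard on 127.0.0.1:8000:

```bash
gerapy runserver
```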
You should see an output like this.
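The exact lines vary by version, but the tail end should look roughly like the standard Django development server banner:

```
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
```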
Using the Dashboard
If you visit http://127.0.0.1:8000/, you’ll be prompted to log in. Your default account name is `admin`, and so is your password. After logging in, you’ll be taken to Gerapy’s dashboard.
Click on the “Projects” tab and create a new project. We’ll call this one `quotes`.
Getting the Target Site
Now, we’ll create a new spider. From within your new project, click the “add spider” button. In the “Start Urls” section, add https://quotes.toscrape.com. Under “Domains”, enter quotes.toscrape.com.
Extraction Logic
Next, we’ll add our extraction logic. The `parse()` function below uses CSS selectors to extract quotes from the page. You can learn more about selectors here.
Scroll down to the “Inner Code” section and add your parsing function.
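Here is a minimal version of the parsing logic, assuming the standard markup of quotes.toscrape.com; adjust the selectors if you want different fields:

```python
def parse(self, response):
    # Each quote sits inside a div with the "quote" class
    for quote in response.css('div.quote'):
        yield {
            'text': quote.css('span.text::text').get(),
            'author': quote.css('small.author::text').get(),
            'tags': quote.css('div.tags a.tag::text').getall(),
        }
    # Follow the pagination link if one exists
    next_page = response.css('li.next a::attr(href)').get()
    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)
```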
Now, click the “Save” button located in the bottom right-hand corner of the screen. If you run the spider now, you’ll hit a critical error: Gerapy is trying to import `BaseItem` from Scrapy, but `BaseItem` was removed from Scrapy several years ago.
Fixing the BaseItem Error
To solve this error, we actually need to edit Gerapy’s internal code inside our virtual environment. You can do this from the command line, but it’s much easier from a GUI text editor with search features. `cd` into the source files for your virtual environment.
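The path depends on where you created the virtual environment and what you named it; with the layout used in this guide, it looks something like this:

```bash
cd ~/gerapy-tutorial/venv/lib/python3.10/site-packages
```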
To open the folder in VSCode, you can use the command below.
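Assuming VS Code’s `code` command is on your PATH:

```bash
code .
```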
Open up `parser.py`, and you’ll find our culprit.
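The offending line is an import of the long-removed class; it looks something like this:

```python
from scrapy.item import BaseItem
```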
We need to replace this line with the following.
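Scrapy’s `Item` class is still available and works as a drop-in replacement here:

```python
from scrapy.item import Item
```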
Now that we’ve removed the `BaseItem` import, we need to replace all remaining instances of `BaseItem` with `Item`. Our only instance is in the `run_callback()` function. Once you’ve saved your changes, close the editor.
If you run your spider, you’ll now receive a new error.
Fixing REQUEST_FINGERPRINTER_IMPLEMENTATION Deprecation
It’s not apparent, but Gerapy actually injects our settings directly into our spider. `cd` out of our current folder and then into the `projects` folder.
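With the folder names used in this guide, the project created from the dashboard lives under the workspace’s `projects` directory:

```bash
cd ~/gerapy-tutorial/gerapy/projects/quotes
```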
Once again, open up your text editor.
Now open up your spider. It should be titled `quotes.py`, and it’s located inside the `spiders` folder. You should see your `parse()` function inside the spider class. At the bottom of the file, you should see a dictionary called `custom_settings`. Our settings have literally been injected into the spider by Gerapy.
We need to add one new setting: `REQUEST_FINGERPRINTER_IMPLEMENTATION`. You need to use `2.7`; `2.6` will continue to throw the error. We discovered this after numerous rounds of trial and error.
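Inside `custom_settings`, add the fingerprinter setting alongside whatever Gerapy has already injected:

```python
custom_settings = {
    # ...settings injected by Gerapy stay as they are...
    'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
}
```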
Now, when you run the spider using Gerapy’s play button, all errors are resolved. As you can see below, instead of an error message, we just see a “Follow Request”.
Putting Everything Together
Building the Scraper
If you go back to your “Projects” tab in Gerapy, you’ll see an “X” in the “Built” column for the project. This means that our scraper hasn’t been built into an executable file for deployment.
Click the “deploy” button. Now, click “build”.
Using The Scheduler
To schedule your scraper to run at a specific time or interval, click “Tasks” and then create a new task. Then, select your desired settings for the schedule.
Once finished, click the “create” button.
Limitations When Scraping With Gerapy
Dependencies
Its legacy code introduces many limitations that we’ve addressed head-on in this article. Just to get Gerapy running, we needed to go in and edit its internal source code. If you’re not comfortable touching a system’s internals, Gerapy is not for you. Remember the `BaseItem` error?
As Gerapy’s dependencies continue to evolve, Gerapy remains frozen in time. To continue using it, you’ll need to maintain your installation personally. This adds technical debt in the form of maintenance and a very real process of trial and error.
Recall the dependency installation command from earlier. Each of those version numbers was discovered through a meticulous process of trial and error. When dependencies break, you need to keep trying different version numbers until you find one that works. In this tutorial alone, we had to use trial and error to find working versions of 10 dependencies. As time goes on, this will only get worse.
Operating System Limitations
When we attempted this tutorial initially, we tried using native Windows. This was how we discovered the first limitation: Python versions. At the time of writing, the stable Python releases with readily available installers were limited to 3.9, 3.11, and 3.13. Managing multiple versions of Python is difficult regardless of OS. However, Ubuntu gives us the `deadsnakes` PPA repository.
Without `deadsnakes`, it is possible to find a compatible version of Python, but even then, you need to handle PATH issues and differentiate between `python` (your default installation) and `python3.10`. It’s likely possible to handle this natively on Windows and macOS, but you will need to find a different workaround. With Ubuntu and other apt-based Linux distros, you at least get a reproducible environment with quick access to older versions of Python installed directly into your PATH.
Proxy Integration With Gerapy
As with vanilla Scrapy itself, proxy integration is easily done. In the true spirit of Gerapy’s settings injection, we can inject a proxy directly into the spider. In the example below, we add the `HTTPPROXY_ENABLED` and `HTTPPROXY_PROXY` settings to connect using Web Unlocker.
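Here is a sketch of the two settings inside `custom_settings`. The endpoint shown is a typical Web Unlocker proxy URL, and the bracketed values are placeholders:

```python
custom_settings = {
    # ...settings injected by Gerapy stay as they are...
    'HTTPPROXY_ENABLED': True,
    # Swap in your own Web Unlocker username, zone, and password
    'HTTPPROXY_PROXY': 'http://brd-customer-<username>-zone-<zone_name>:<password>@brd.superproxy.io:33335',
}
```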
Here’s the full spider after proxy integration. Remember to swap the username, zone and password with your own.
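Below is a sketch of how the finished spider might look using the names from this guide; the settings Gerapy injects on your machine will differ:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        # Extract each quote block with CSS selectors
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }
        # Follow pagination if a "next" link exists
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

    # Gerapy appends its injected settings at the bottom of the file;
    # the proxy values are placeholders to swap with your own
    custom_settings = {
        'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
        'HTTPPROXY_ENABLED': True,
        'HTTPPROXY_PROXY': 'http://brd-customer-<username>-zone-<zone_name>:<password>@brd.superproxy.io:33335',
    }
```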
Viable Alternatives to Gerapy
- Scrapyd: This is the actual backbone behind Gerapy and just about any other Scrapy stack. With Scrapyd, you can manage everything through plain old HTTP requests and build a dashboard if you so choose.
- Scraping Functions: Our scraping functions allow you to deploy your scrapers directly to the cloud and edit them from an online IDE — with a dashboard like Gerapy but more flexible and modern.
Conclusion
Gerapy is a legacy product in our rapidly changing world. It requires real maintenance and you’ll need to get your hands dirty. Tools like Gerapy allow you to centralize your scraping environment and monitor everything from a single dashboard. In DevOps circles, Gerapy provides real utility and value.
If Scrapy isn’t your thing, we offer many viable alternatives to meet your need for data collection. The products below are just a few.
- Custom Scraper: Create scrapers with no code required and deploy them to our cloud infrastructure.
- Datasets: Access historical datasets updated daily from all over the web. A library of internet history right at your fingertips.
- Residential Proxies: Whether you prefer to write code yourself or scrape with AI, our proxies give you access to the internet with geotargeting on a real residential internet connection.
Sign up for a free trial today and take your data collection to the next level!