How to Use FlareSolverr to Scrape Cloudflare Websites

Ever wanted to scrape a website for valuable data, only to realize that it is protected by Cloudflare? There is no need to be discouraged: there is a practical solution to this problem, and it is called FlareSolverr.

With FlareSolverr, you can bypass Cloudflare's anti-bot protection and gain scraping access to websites that were once out of reach.

This guide will teach you how to install and use FlareSolverr for data scraping. You will also learn how to make POST requests and manage sessions for a more effective workflow.

What Is FlareSolverr

FlareSolverr is a proxy server that allows you to scrape Cloudflare-protected websites by getting past their anti-bot challenges. Here is how it works.

As soon as it receives a request, FlareSolverr launches a Chrome-based browser and opens the target URL with your request parameters. It then waits until the Cloudflare challenge is solved and finally returns the page content and cookies, so that you can reuse them to bypass Cloudflare with an HTTP client such as Python Requests.

How to Install FlareSolverr

FlareSolverr supports Windows, Linux, and macOS, and you can install it from its Docker image, precompiled binaries, or source code. In this tutorial, you will learn how to install it as a Docker container. The main benefit of this approach is that the Docker image ships with the browser that FlareSolverr needs already installed.

1. Install Software Dependencies

Before installing FlareSolverr, take care of the following software dependencies.

  • Install Docker using its official guide for your operating system.
  • Update libseccomp2 to version 2.5 or higher (Debian users only).

If you use Ubuntu with root privileges, the fastest and easiest way to install Docker is with the following three commands.

apt-get install docker.io
systemctl enable docker
systemctl start docker

2. Install FlareSolverr Using Its Docker Image

Use the following command to pull the FlareSolverr Docker image on Linux, Windows, or macOS. Remember to add “sudo” if you use Linux without root privileges.

docker pull ghcr.io/flaresolverr/flaresolverr:latest

For more information, you can find the FlareSolverr Docker image via the links below.

GitHub: github.com/orgs/FlareSolverr/packages/container/package/flaresolverr
DockerHub: hub.docker.com/r/flaresolverr/flaresolverr

3. Run FlareSolverr

You can now run FlareSolverr using the following command on Linux, Windows, or macOS. Remember to add “sudo” if you use Linux without root privileges.

docker run -d \
--name=flaresolverr \
-p 8191:8191 \
-e LOG_LEVEL=info \
--restart unless-stopped \
ghcr.io/flaresolverr/flaresolverr:latest
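In this command, -p 8191:8191 exposes the FlareSolverr API on port 8191 of your machine, LOG_LEVEL=info sets the logging verbosity, and --restart unless-stopped tells Docker to restart the container automatically unless you stop it yourself.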

4. Verify That FlareSolverr Is Working

Finally, test if FlareSolverr works by opening the following URL in your browser.

URL: http://localhost:8191

If you see a response such as “FlareSolverr is ready!” you can rest assured that you installed it successfully.
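You can also run the same check from Python. The minimal sketch below simply queries the endpoint and prints whatever FlareSolverr returns:

import requests

# Query the FlareSolverr endpoint (adjust host and port if you changed them)
response = requests.get("http://localhost:8191")
print(response.status_code)  # should be 200
print(response.json())       # should include the "FlareSolverr is ready!" message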

That is all. Now it is time to learn how to use this wonderful tool to get all the data you need from Cloudflare-protected websites.

How to Use FlareSolverr

There are two primary ways of using FlareSolverr to scrape website data, and you will see both later in this section. Before diving into them, take a look at two simple example requests: the first relies on bash and curl, and the second uses Python Requests.

Running a Curl Request

If you want to use FlareSolverr directly from bash, you can do this with the curl command. Take a look at the code snippet below for reference.

curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "request.get",
"url":"http://www.website.com",
"maxTimeout": 60000
}'

Running a Python Request

Alternatively, you can use Python Requests to do the same. Take a look at the code snippet below for reference.

import requests
url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "request.get",
    "url": "http://www.website.com",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.text)

Output Example

If everything works as expected, your output should look similar to the one below.

{
  "status": "ok",
  "message": "Challenge solved!",
  "solution": {
    "url": "https://website.com",
    "status": 200,
    "cookies": [
      {
        "domain": "website.com",
        "httpOnly": false,
        "name": "2F_TT",
        "path": "/",
        "secure": true,
        "value": "0"
      }
    ],
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "headers": {},
    "response": "<html><head>...</head><body>...</body></html>"
  }
}

Scraping Using Python Requests

You can configure FlareSolverr to retrieve valid Cloudflare cookies and use them with Python Requests. This is the most resource-efficient way to scrape Cloudflare websites. Take a look at the code snippet below for reference.

import requests

post_body = {
    "cmd": "request.get",
    "url": "https://website.com",
    "maxTimeout": 60000
}

# Ask FlareSolverr to solve the Cloudflare challenge for the target URL
response = requests.post(
    'http://localhost:8191/v1',
    headers={'Content-Type': 'application/json'},
    json=post_body
)

if response.status_code == 200:
    json_response = response.json()
    if json_response.get('status') == 'ok':
        # Extract the Cloudflare cookies returned by FlareSolverr
        cookies = json_response['solution']['cookies']
        clean_cookies_dict = {cookie['name']: cookie['value'] for cookie in cookies}

        # Extract the user agent FlareSolverr used to solve the challenge
        user_agent = json_response['solution']['userAgent']

        # Reuse the cookies and user agent in a plain Python Requests call
        headers = {"User-Agent": user_agent}
        response = requests.get("https://website.com", headers=headers, cookies=clean_cookies_dict)
        if response.status_code == 200:
            print('Success')
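Keep in mind that the Cloudflare cookies are usually tied to the user agent (and often to the IP address) that solved the challenge, so send them together exactly as returned. When the plain Requests calls start failing again, repeat the FlareSolverr request to refresh the cookies.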

Scraping Using a List of URLs

Alternatively, you can use FlareSolverr with a list of page URLs to simplify things. In that case, you must rely on its integrated HTTP client, which is more resource-intensive. Take a look at the code snippet below for reference.

import requests

url_list = [
    'https://website1.com',
    'https://website2.com',
    'https://website3.com',
]
for url in url_list:
    post_body = {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": 60000
    }
    response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)
    if response.status_code == 200:
        json_response = response.json()
        if json_response.get('status') == 'ok':
            # The rendered page HTML is returned in solution.response
            html = json_response['solution']['response']
            print('Success')

How to Manage Sessions

If you need to use Cloudflare cookies for a while, you can set up FlareSolverr sessions. By doing this, you will no longer need to repeatedly solve Cloudflare challenges or send cookies every time you make a request.

You can use FlareSolverr to create, list, and remove sessions with the commands listed below.

  • sessions.create
  • sessions.list
  • sessions.destroy

Read on to learn how to use each one of them.

Creating a Session

To create a session, set the “cmd” setting to “sessions.create.” Take a look at the code snippet below for reference.

import requests
url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "sessions.create",
    "url": "https://website.com",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.content)

If everything goes well, you should see an output containing “Session created successfully.”
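Once a session exists, you can reuse it by passing its ID in the “session” field of your requests. Below is a minimal Python sketch; it assumes the creation response returns the new ID in a “session” field, so check the exact output on your installation.

import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

# Create a session; the response is expected to include the new session ID
create = requests.post(url, headers=headers, json={"cmd": "sessions.create"})
session_id = create.json().get("session")

# Reuse the same browser session (and its cookies) for a later request
data = {
    "cmd": "request.get",
    "url": "https://website.com",
    "session": session_id,
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.json().get("status"))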

Listing Sessions

If you want to check your active sessions, you can list their IDs. To do that, set the “cmd” setting to “sessions.list.” Take a look at the code snippet below for reference.

curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "sessions.list",
"url":"http://website.com",
"maxTimeout": 60000
}'
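The same call is just as easy from Python. The sketch below assumes that the response lists the active session IDs under a “sessions” key:

import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

response = requests.post(url, headers=headers, json={"cmd": "sessions.list"})
# The active session IDs are expected under the "sessions" key
print(response.json().get("sessions"))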

Removing a Session

Now that you know your session ID, you can use it to remove the session. To do that, set the “cmd” setting to “sessions.destroy” and set the “session” setting to the proper ID. Take a look at the code snippet below for reference.

curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "sessions.destroy",
"session": "session_ID",
"url":"http://website.com",
"maxTimeout": 60000
}'

Now you know everything there is to know about managing FlareSolverr sessions.

How to Make POST Requests

If you need to retrieve Cloudflare cookies from POST endpoints, you can use FlareSolverr to make POST requests. To do this, you must configure the cmd setting by replacing “request.get” with “request.post”. Take a look at the code snippet below for reference.

import requests

# postData must be a URL-encoded string, for example "a=b&c=d"
POST_DATA = "a=b&c=d"

post_body = {
    "cmd": "request.post",
    "url": "https://www.website.com/POST",
    "postData": POST_DATA,
    "maxTimeout": 60000
}
response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)
print(response.json())

Remember to use a string with application/x-www-form-urlencoded (such as a=b&c=d) when setting “POST_DATA.”
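If your form data starts out as a Python dictionary, you can build that string with the standard library. The field names below are only placeholders:

from urllib.parse import urlencode

form_fields = {"a": "b", "c": "d"}  # placeholder form fields
POST_DATA = urlencode(form_fields)  # produces "a=b&c=d"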

Final Words

FlareSolverr is a very effective tool that you can use to get past Cloudflare's protection and scrape data that you previously found inaccessible.

With this step-by-step guide, you have learned how to install and run FlareSolverr. You also know how to use it in various ways and manage your workflow more efficiently with sessions.

Scraping Cloudflare-protected websites should now be a breeze. So, test it out yourself and start reaping the benefits of this fantastic tool.

