Introduction to Python

When using Python to scrape the web and encountering blocks, you’ll turn to proxies to bypass them. Proxies hide your original location and IP, keeping you anonymous. They also let you rotate your IP address to avoid blocks and geolocation issues, making them essential for web scraping.

Getting Started

First, install Python 3. Then, purchase proxies from the Rampage dashboard. All the residential providers and ISP/DC proxies we offer are compatible with Python.

Next, install the requests module, a concise Python library for performing HTTP requests. In your terminal, type:

pip install requests

If you’re unsure whether you already have the necessary packages, check through the terminal. Open your terminal and type:

pip freeze

This command lists all installed packages and their version numbers.
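
If you'd rather check from inside Python, the standard library can report a package's installed version. This is a minimal sketch; the helper name installed_version is our own and not part of any library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if requests is installed, otherwise None
print(installed_version("requests"))
```

If this prints None, run pip install requests before continuing.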

Reformatting Your Proxies

All the proxies generated from the dashboard are in the format of IP:PORT:USER:PASSWORD. For example:

pr.rampageproxies.com:8888:lifUXmUP-cc-de-pool-oproxy-sessionid-2882719010-sesstime-30:GlIWysEI

IP: pr.rampageproxies.com
Port: 8888
User: lifUXmUP-cc-de-pool-oproxy-sessionid-2882719010-sesstime-30
Password: GlIWysEI

The “IP” here is a hostname rather than a numerical address, so it may look different from proxies you’ve used previously.
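
To make the format concrete, here is a small sketch that splits the example string above into its four parts and rebuilds it as a proxy URL. The Proxy class and parse_proxy function are illustrative names we've chosen for this example, not something the dashboard provides:

```python
from typing import NamedTuple


class Proxy(NamedTuple):
    host: str
    port: int
    user: str
    password: str

    def as_url(self, scheme="http"):
        """Return the proxy in scheme://user:password@host:port form."""
        return f"{scheme}://{self.user}:{self.password}@{self.host}:{self.port}"


def parse_proxy(line):
    """Split a HOST:PORT:USER:PASSWORD string into its components."""
    # maxsplit=3 keeps any extra colons as part of the password
    host, port, user, password = line.strip().split(":", 3)
    return Proxy(host, int(port), user, password)


example = "pr.rampageproxies.com:8888:lifUXmUP-cc-de-pool-oproxy-sessionid-2882719010-sesstime-30:GlIWysEI"
proxy = parse_proxy(example)
print(proxy.host, proxy.port)
print(proxy.as_url())
```

If your credentials ever contain characters such as @ or /, they would need percent-encoding before being placed in a URL; this sketch assumes they don't.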

The following script automates the reformatting, so you don’t have to do it manually. It reads proxies from a text file named “proxies.txt”, which must be in the same directory as the script.

The code opens “proxies.txt” and reads every line into a list called “proxylist” (the name is arbitrary). It then splits each line on the colons into the proxy’s four components and appends them to a second list, proxies, in the order requests expects, prefixed with the proxy protocol “http”:

proxies = []

with open("proxies.txt", "r") as f:
    proxylist = f.read().splitlines()
    for proxy in proxylist:
        ip, port, user, pw = proxy.split(":")
        proxies.append(f"http://{user}:{pw}@{ip}:{port}")

Building The Script

Start by importing the two required packages, requests and random. We'll use random to randomise the proxy being selected from the list:

import requests
import random

Next, using the list built from the .txt file of proxies generated from the Rampage dashboard, we'll define the proxy dictionary that requests expects. In our example, we’re using static German Oxylabs proxies, but all our providers are suitable.

# Pick one proxy and use it for both HTTP and HTTPS traffic
proxy = random.choice(proxies)
proxies = {
    'http': proxy,
    'https': proxy,
}

Here we're instructing the script to select a random proxy from the list in our text file. We set both the HTTP and HTTPS keys so the script can use the proxy for either type of connection. Randomising the IP address of each connection helps reduce the chance of being blocked or detected, a key "golden rule" of web scraping.
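
If you plan to make several requests, one way to rotate is to pick a fresh proxy each time. The sketch below assumes the proxy URLs are already in the http://user:pass@host:port form; the helper name pick_proxy_config and the example.com URLs are placeholders for illustration only:

```python
import random


def pick_proxy_config(proxy_urls, rng=random):
    """Choose one proxy URL and use it for both HTTP and HTTPS traffic."""
    chosen = rng.choice(proxy_urls)
    return {"http": chosen, "https": chosen}


# Placeholder URLs for illustration only
proxy_urls = [
    "http://user1:pass1@proxy.example.com:8000",
    "http://user2:pass2@proxy.example.com:8001",
]

# Each iteration may select a different proxy from the list
for _ in range(3):
    config = pick_proxy_config(proxy_urls)
    print(config["http"])
```

The returned dictionary can be passed straight to the proxies argument of requests.get.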

Once defined, input a target URL. This script targets ipinfo as an impartial target:

url = 'https://ipinfo.io/'
r = requests.get(url, proxies=proxies)
print(f"Status Code: {r.status_code}, Proxy Location: {r.json()['country']}, IP: {r.json()['ip']}")

The script sends a GET request to the website defined in url through the proxies you defined in the dictionary, then prints the response so you can verify it. The final line prints the relevant information, including:

The response code
The location of the proxy by country
The IP address of the proxy

The status code is stored on the response object and should be 200, which indicates that the request was received, understood, and answered successfully. Use this as a quick method to test whether a proxy is working.

You can also print the whole response body parsed as JSON, which returns much more information:

print(r.json())
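
Once the body is parsed, it behaves like an ordinary Python dictionary. The sketch below uses a hard-coded sample body instead of a live request so it can run offline; the values are made up for illustration, and the summarise helper is our own name:

```python
import json

# Illustrative payload only; real values depend on the proxy in use
sample_body = '{"ip": "203.0.113.10", "city": "Berlin", "country": "DE", "org": "Example ISP"}'

data = json.loads(sample_body)


def summarise(data):
    """Build a one-line summary, tolerating missing fields."""
    ip = data.get("ip", "unknown")
    country = data.get("country", "unknown")
    city = data.get("city", "unknown")
    return f"IP: {ip}, Country: {country}, City: {city}"


print(summarise(data))
```

Using .get with a default avoids a KeyError if a field is missing from the response.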

Error Handling

At the end of the script, add a short try/except block for error handling. If any exception occurs inside the try block (such as the proxy not working or the proxies.txt file being malformed), the script will catch it and print “Request Failed” with the error details.

There are a few status codes you might also see, such as 407 or 404. 407 is common, especially when testing proxies, and indicates an authentication error. As all the proxies at Rampage are username:password authenticated, check that you've copied the details correctly and that the proxy username and password are right. 404 indicates the resource wasn't found; check the URL you are requesting and make sure it hasn't been copied incorrectly.

try:
    r = requests.get(url, proxies=proxies)  # any code that may fail goes here
except Exception as e:
    print(f"Request Failed - {e}")

You can test this yourself by altering the credentials in the proxy string to force a 407.
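
To make status codes easier to read in your output, you could translate them into a short description. This sketch uses the standard library's HTTPStatus; the hint text and the describe_status helper are our own wording, based on the explanations of 407 and 404 in this section:

```python
from http import HTTPStatus

# Hints are this sketch's own wording, not official descriptions
HINTS = {
    HTTPStatus.OK: "proxy is working",
    HTTPStatus.NOT_FOUND: "check the target URL",
    HTTPStatus.PROXY_AUTHENTICATION_REQUIRED: "check the proxy username and password",
}


def describe_status(code):
    """Return the code, its standard phrase, and a hint where we have one."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code} (unrecognised status)"
    hint = HINTS.get(status)
    return f"{code} {status.phrase}" + (f": {hint}" if hint else "")


for code in (200, 404, 407):
    print(describe_status(code))
```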

Proxy Tester

With a few small modifications, we can turn the script above into a basic proxy tester. It reuses the code from above with slight alterations to keep track of what we've tested and to stay in line with best practices, such as adding a random time delay before each request:

import random
import requests
import time

proxies = []

with open("proxies.txt", "r") as f:
    proxylist = f.read().splitlines()
    for proxy in proxylist:
        ip, port, user, pw = proxy.split(":")
        proxies.append(f"http://{user}:{pw}@{ip}:{port}")

total_tested = 0
successful = 0
failed = 0

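for proxy in proxies:
    total_tested += 1  # count every attempt, including ones that raise
    try:
        time.sleep(random.uniform(1, 5))

        proxy_config = {
            "http": proxy,
            "https": proxy
        }

        url = "https://ipinfo.io/json"
        # The timeout stops a dead proxy from hanging the whole test
        r = requests.get(url, proxies=proxy_config, timeout=10)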

        if r.status_code == 200:
            successful += 1
            print(f"Success: {r.status_code}\nProxy: {proxy}\nIP: {r.json()['ip']}\nLocation: {r.json()['country']}\n")
        else:
            failed += 1
            print(f"Failed: {r.status_code}\nProxy: {proxy}")

    except Exception as e:
        failed += 1
        print(f"Request Failed: {e}\nProxy: {proxy}")

print(f"\nTotal Proxies Tested: {total_tested}")
print(f"Success: {successful}")
print(f"Failed Requests: {failed}")

This is the basic script for a proxy tester. Using the proxies in the same txt file as before, it requests the same website through each proxy in turn, taking the key information from the response and printing it out. If a proxy fails, you'll see the status code or the failure reason in the output. At the end, you'll be presented with the statistics of the test: the number of proxies tested, the number of successful tests, and the number of failures. These cover the current run only and reset each time the test is run. You could also choose to export them to a file to keep track.
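
If you do want to keep results between runs, one option is to write them to a CSV file. This is a sketch under our own assumptions: the file name proxy_results.csv, the column names, and the save_results helper are all hypothetical, and the rows are made-up examples:

```python
import csv


def save_results(results, path="proxy_results.csv"):
    """Write one row per tested proxy to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["proxy", "status", "detail"])
        writer.writeheader()
        writer.writerows(results)


# Illustrative rows; in the tester these would be collected inside the loop
results = [
    {"proxy": "proxy.example.com:8000", "status": "success", "detail": "200"},
    {"proxy": "proxy.example.com:8001", "status": "failed", "detail": "407"},
]
save_results(results)
```

The example rows store only the host and port. Full proxy URLs contain your credentials, so consider leaving those out of any file you might share.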

Calling time.sleep(random.uniform(1, 5)) pauses for a random time between 1 and 5 seconds before each request, and the range can be configured as needed. This also lowers resource usage by slowing the rate of requests and makes our activity appear more "human".
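
If you want to reuse the delay elsewhere, it can be wrapped in a small helper. This is a minimal sketch; polite_pause and its parameters are names we've chosen, and the sleep argument exists only so the pause can be swapped out when testing:

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# A short range here so the example finishes quickly
waited = polite_pause(0.1, 0.3)
print(f"Waited {waited:.2f} seconds")
```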

Conclusion

Armed with the knowledge and the proxies, you’re ready to take on the world of web scraping. Remember, web scraping is a tool, and there are best practices and rules to abide by. If you’re ready to take the next step into browser automation, why not check out our guide on Selenium?

Frequently asked questions

Rampage allows purchase from 10 of the largest residential providers on one dashboard, starting at just 1GB. There's no need to commit to any large bandwidth packages. Through our dashboard, you're also given options such as static or rotating proxies and various targeting options, all for a single price per provider.

All purchases are made through the Rampage dashboard.

Rampage also offers high-quality, lightning-fast ISP and DC proxies available in the US, UK, and DE regions.

If you're unsure which provider would best suit your use case, please contact our support; we'll gladly assist.

How to Use Python with Proxies

Owen Crisp
