Introduction to Puppeteer

Puppeteer is a popular JavaScript library that provides a high-level API for automating and controlling Chrome and Firefox. By default, Puppeteer runs “headless” with no visible UI but can be set to run “headful” with a visible UI.

While Puppeteer is powerful, some advanced websites implement anti-bot measures that can detect headless browsers. To bypass these, you’ll need to use more sophisticated techniques like stealth plugins or rotating proxies. In this guide, we’ll focus on Puppeteer and Chrome, the go-to browser for automation. By combining Puppeteer with other scraping tools like Cheerio or external APIs, you’ll have a robust setup for web scraping complex sites.

Like Selenium, Puppeteer can be used for many browser automation tasks, such as navigating web pages. You can use Puppeteer to open, refresh, or close web pages and navigate through the pages or tabs. You can also simulate user interactions by clicking buttons, typing into fields, hovering, scrolling, submitting forms, and uploading files.

Alongside the features above, there’s also a host of other features available through Puppeteer, such as:

The ability to generate PDFs of web pages, including adding headers and footers and setting page formats like orientation, margins, and scaling.
Handling pop-ups on websites and interacting with JavaScript alert and confirmation dialogs.
Taking screenshots of sites or specific areas with advanced features such as setting custom backgrounds or device frames.
Advanced content handling and management, such as waiting for elements to load or appear dynamically, which is handy for JavaScript-heavy websites.
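
As a rough illustration of some of the features listed above, the sketch below registers a dialog handler, waits for an element, and then saves a screenshot and a PDF. The function name capturePage, the 'body' selector, and the output file names are placeholders chosen for this example, not anything Puppeteer prescribes. PDF generation has historically been limited to headless mode, so treat that step as an assumption to check against your Puppeteer version.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page created with browser.newPage().
async function capturePage(page, baseName) {
    // Accept any alert/confirm dialog so it cannot block the script.
    page.on('dialog', async (dialog) => {
        await dialog.accept();
    });

    // Wait (up to 10 seconds) for the element to appear in the DOM.
    await page.waitForSelector('body', { timeout: 10000 });

    const screenshotPath = `${baseName}.png`;
    const pdfPath = `${baseName}.pdf`;

    // Capture the full scrollable page, then print it as an A4 portrait PDF.
    await page.screenshot({ path: screenshotPath, fullPage: true });
    await page.pdf({ path: pdfPath, format: 'A4', landscape: false, printBackground: true });

    return { screenshotPath, pdfPath };
}
```

You would call this after navigating, for example with await capturePage(page, 'whatismyip'). To catch dialogs that appear while a page is loading, register the handler before calling page.goto.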

Puppeteer is well documented and maintained. Here’s a link to the documentation.

Install Puppeteer

First, you’ll need to ensure that Node.js is installed on your system. The easiest way to install it is to use the installer available here.

To check whether you have this installed already, you can run the following command in your terminal; this will show the current version installed:

node -v

npm is the default package manager for Node.js, similar to pip for Python. You can check which version is installed by using this command in the terminal:

npm -v

Once we’ve established that these are installed, we can move on to installing Puppeteer. We’ll use the npm package manager to do this, although there are two other installation methods. If you want to explore these alternatives, view the documentation here. We’ll install Puppeteer by typing the following command:

npm i puppeteer

Puppeteer can be installed on macOS, Windows, or Linux-based systems. The install also downloads a recent version of Chrome for Testing that is set to work with Puppeteer. Recent versions install both the base and core packages. The base package (puppeteer) is the product for browser automation, whereas the core package (puppeteer-core) is a library for driving anything that supports the DevTools protocol and does not download a browser of its own. As stated in their documentation, core should be used if you plan to connect to a remote browser or manage the browsers yourself. You can then quickly check if it’s installed by doing:

npm list puppeteer
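
To make the remote-browser case for the core package a little more concrete, here is a minimal sketch that connects to a browser that is already running, rather than launching the bundled one. It assumes puppeteer-core has been installed separately and that you already have a DevTools WebSocket endpoint. The helper names isWebSocketEndpoint and connectToRemoteBrowser are invented for this example.

```javascript
// Sketch only: assumes `puppeteer-core` is installed (npm i puppeteer-core)
// and that a browser is already running and exposing a DevTools WebSocket endpoint.

// Returns true when `endpoint` parses as a ws:// or wss:// URL.
function isWebSocketEndpoint(endpoint) {
    try {
        const { protocol } = new URL(endpoint);
        return protocol === 'ws:' || protocol === 'wss:';
    } catch {
        return false; // Not a parseable URL at all.
    }
}

// Connects to the already-running browser instead of launching a bundled one.
async function connectToRemoteBrowser(endpoint) {
    if (!isWebSocketEndpoint(endpoint)) {
        throw new Error(`Expected a ws:// or wss:// endpoint, got: ${endpoint}`);
    }
    // Required lazily so the check above works even without the package installed.
    const puppeteer = require('puppeteer-core');
    return puppeteer.connect({ browserWSEndpoint: endpoint });
}
```

The returned browser object can then be used like the one from puppeteer.launch, for example to call browser.newPage().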

The Basics

We can start by writing a small script that opens a website, sits on the page, and then closes when you give it the command, in this case by pressing the “enter” key on your keyboard. We’ll use the site “whatismyip” as a reference because we’ll later go through how to integrate proxies directly into the Puppeteer script, so we’ll need to see what our IP is now.

const puppeteer = require('puppeteer');
const readline = require('readline');

(async () => {
    const browser = await puppeteer.launch({
        headless: false, 
        defaultViewport: null, 
        args: ['--start-maximized'], 
    });

    const page = await browser.newPage();
    await page.goto('https://www.whatismyip.com/');

    const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout
    });

    rl.question('Press Enter to close the browser...', async () => {
        rl.close(); // Stop listening for terminal input
        await browser.close(); // Wait for the browser to shut down cleanly
    });
})();

We need to cover a few key features here. The first is that Puppeteer launches headless by default, so we set headless to false to have a visible browser window open and navigate to the page. Since we’re running headed, we also set defaultViewport to null, which disables the default 800x600 viewport so the page fills the window, and we pass --start-maximized so the window opens maximized. Both of these are optional.

        headless: false, 
        defaultViewport: null, 
        args: ['--start-maximized'],

Together, these options will make sure the page renders in full resolution in a maximized window, simulating real-world browser behavior. Finally, we’ll give the script a method of termination. In this case, it’s pressing the enter key.

If you’re unfamiliar with running JavaScript in Node.js, start the script by typing the following in your terminal, replacing {filename} with the name of your file:

node {filename}.js
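
If you would rather log your current IP than read it off the page, one possible approach is to load a plain JSON endpoint and parse the response. The sketch below assumes the third-party service api.ipify.org, which is not used elsewhere in this guide, and the helper names parseIpResponse and readCurrentIp are made up for illustration.

```javascript
// Sketch only: api.ipify.org is a third-party service assumed here for illustration;
// it responds with JSON such as {"ip":"203.0.113.7"}.

// Extracts the IP string from the JSON body, or returns null if it is missing.
function parseIpResponse(bodyText) {
    try {
        const data = JSON.parse(bodyText);
        return typeof data.ip === 'string' ? data.ip : null;
    } catch {
        return null; // Body was not valid JSON (for example, an HTML block page).
    }
}

// `page` is assumed to be a Puppeteer Page created with browser.newPage().
async function readCurrentIp(page) {
    const response = await page.goto('https://api.ipify.org?format=json');
    return parseIpResponse(await response.text());
}
```

Calling console.log(await readCurrentIp(page)) before and after adding a proxy gives you two values to compare directly in the terminal.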

Using Proxies

Now that we know how to open and sit on a page, we’ll go over how to use proxies thoroughly. The good news is that all the proxies sold at Rampage are compatible with Puppeteer. If you’re yet to buy, you can check out the various providers we have here. There are 10 different residential providers for you to choose from.

The first addition we’ll make to the script above is to add an “argument”. This argument specifies the proxy server that Puppeteer should use to route all browser traffic. In this case, it contains both the IP/domain of the proxy server and the port:

        args: [
            '--proxy-server=pr.rampageproxies.com:8888' 
        ],

We’ll then use page.authenticate to pass the remaining proxy credentials, the username and password, to the script. In our case, we’ve used a rotating proxy. This proxy rotates per request, which gives a new IP every time.

    await page.authenticate({
        username: 'bXTeUdho-cc-us-pool-oproxy', 
        password: 'CaSwJqZV', 
    });
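
Proxy details are often handed over as a single string, so it can be convenient to split that string into the launch argument and the credentials shown above. The sketch below assumes a "host:port:username:password" layout, which is only one common convention and not something Puppeteer defines. The name parseProxy and the sample credentials are hypothetical.

```javascript
// Sketch only: the "host:port:username:password" layout is an assumed format,
// not something Puppeteer itself defines. It will not work if the password contains ":".
function parseProxy(proxyString) {
    const [host, port, username, password] = proxyString.split(':');
    if (!host || !port) {
        throw new Error(`Invalid proxy string: ${proxyString}`);
    }
    return {
        // Goes into the `args` array passed to puppeteer.launch().
        launchArg: `--proxy-server=${host}:${port}`,
        // Goes to page.authenticate(); null when no credentials were supplied.
        credentials: username && password ? { username, password } : null,
    };
}
```

With this in place, the launch call can use args: [proxy.launchArg], and page.authenticate(proxy.credentials) only needs to run when credentials is not null.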

This script is similar to the one we covered in the previous section: it opens a page (another IP checker), waits, and then prompts the user to close. If you run this script repeatedly, the IP you see changes as the proxy rotates with each request.

const puppeteer = require('puppeteer');
const readline = require('readline');

(async () => {
    try {
        const browser = await puppeteer.launch({
            headless: false,
            defaultViewport: null,
            args: [
                '--proxy-server=pr.rampageproxies.com:8888'
            ],
        });

        const page = await browser.newPage();

        // Proxy authentication
        await page.authenticate({
            username: 'bXTeUdho-cc-us-pool-oproxy',
            password: 'CaSwJqZV',
        });

        // Navigate to the page and capture the response object
        const response = await page.goto('https://www.showmyip.com/', {
            waitUntil: 'domcontentloaded',
            timeout: 120000 // 120 seconds
        });

        // Log the HTTP response status code
        console.log(`Response status code: ${response.status()}`);

        // Set up readline interface to wait for user input before closing the browser
        const rl = readline.createInterface({
            input: process.stdin,
            output: process.stdout
        });

        rl.question('Press Enter to close the browser...', async () => {
            rl.close(); // Close the readline interface
            await browser.close(); // Wait for the browser to shut down after Enter is pressed
        });

    } catch (error) {
        // Print out the error message for debugging
        console.error('Error occurred while loading the page:', error);
    }
})();

There are a few key differences between this script and the first one we made. The first is the addition of a few “stalling features” to keep the page open. By default, Puppeteer has a timeout of 30 seconds for page navigation. If a page takes longer to load, page.goto throws a timeout error, which our catch block logs. To prevent this, we specified a longer timeout; in our case, it’s now 120 seconds. Although pages usually load through a proxy much faster than this, the longer timeout gives breathing room for the page and its elements to load before the navigation is abandoned. In addition, we’ve set the “waitUntil” option to 'domcontentloaded', so page.goto resolves as soon as that event fires. By that point, the browser has finished parsing the HTML and constructing the DOM (Document Object Model) tree, although images and other subresources may still be loading.
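
A longer timeout helps with slow pages, but a navigation can still fail outright, for instance when a rotating proxy hands out a bad IP. One way to handle that is to retry the navigation a few times. The sketch below is an illustration under that assumption: gotoWithRetry is a hypothetical helper, and the default of three attempts is an arbitrary choice.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page; the retry count and
// timeout defaults are illustrative values.
async function gotoWithRetry(page, url, { attempts = 3, timeout = 120000 } = {}) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            // Resolves once the HTML is parsed; rejects if `timeout` ms elapse first.
            return await page.goto(url, { waitUntil: 'domcontentloaded', timeout });
        } catch (error) {
            lastError = error;
            console.warn(`Attempt ${attempt} of ${attempts} failed: ${error.message}`);
        }
    }
    // Every attempt failed, so surface the most recent error to the caller.
    throw lastError;
}
```

In the script above, the page.goto call could be swapped for const response = await gotoWithRetry(page, 'https://www.showmyip.com/'), leaving the rest unchanged.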

Finally, as part of this script, we’ve included some extra functionality for error-catching and handling. The script logs the status code returned by the site to the terminal.

A status code 200 indicates the request was made successfully, whereas 407 (Proxy Authentication Required) indicates that the proxy rejected the credentials. In our case, the 407 was caused by an incorrect proxy password. Printing the status code helps us work out whether a failure is a proxy issue or not.
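
To make the logged status easier to act on, you could translate the number into a short hint. The sketch below is one possible mapping. The function name describeStatus and the wording of the messages are assumptions for this example, while the status codes themselves are standard HTTP.

```javascript
// Sketch only: the messages are illustrative; the status codes are standard HTTP.
function describeStatus(status) {
    if (status >= 200 && status < 300) {
        return 'OK: the request succeeded';
    }
    if (status === 407) {
        return 'Proxy authentication required: check the proxy username and password';
    }
    if (status === 403 || status === 429) {
        return 'Blocked or rate limited by the target site';
    }
    if (status >= 500) {
        return 'Server or gateway error';
    }
    return `Unexpected status ${status}`;
}
```

For example, console.log(describeStatus(response.status())) could replace the plain status log in the script.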

Although not crucial, having easier ways to interact with the script is a desirable quality-of-life feature. In this addition to the script, we're adding a way to control the Puppeteer browser session directly from the terminal. Using the process.stdin stream, we listen for specific keyboard inputs: pressing "r" triggers a page refresh, and pressing "q" closes the browser. The script calls process.stdin.setRawMode(true) to capture single key presses without needing to press Enter. This is useful for refreshing a page to see updated results or terminating the session manually when we're done, adding a layer of control to Puppeteer's automation.

The script will prompt you in the terminal to press either key; this is a good way to either refresh the page to see the IP change (again, we’re using a rotating proxy) or to end the session entirely:

    await logStatusCode('https://www.showmyip.com/');
    console.log('Page loaded. Press "r" to refresh or "q" to quit.');

    process.stdin.setRawMode(true);
    process.stdin.resume(); 
    process.stdin.on('data', async (key) => {
        if (key.toString().trim() === 'r') {
            console.log('Refreshing...');
            await logStatusCode('https://www.showmyip.com/');
            console.log('Refreshed.');
        } else if (key.toString().trim() === 'q') {
            console.log('Quitting...');
            await browser.close();
            process.exit();
        }
    });

After we’ve made all these modifications, here’s the finished script. It will:
Open up the specific website using the proxy given.
Present you with the status code.
Let you either refresh the page by pressing “r” or close it with “q”.

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: null,
        args: ['--proxy-server=pr.rampageproxies.com:8888']
    });

    const page = await browser.newPage();
    await page.authenticate({ username: 'bXTeUdho-cc-us-pool-oproxy', password: 'CaSwJqZV' });

    const logStatusCode = async (url) => {
        try {
            const response = await page.goto(url, {
                waitUntil: 'domcontentloaded',
                timeout: 120000 // 120 seconds
            });
            console.log(`Status code: ${response.status()}`);
        } catch (error) {
            console.error('Error:', error);
        }
    };

    await logStatusCode('https://www.showmyip.com/');
    console.log('Page loaded. Press "r" to refresh or "q" to quit.');

    process.stdin.setRawMode(true);
    process.stdin.resume(); 
    process.stdin.on('data', async (key) => {
        if (key.toString().trim() === 'r') {
            console.log('Refreshing...');
            await logStatusCode('https://www.showmyip.com/');
            console.log('Refreshed.');
        } else if (key.toString().trim() === 'q') {
            console.log('Quitting...');
            await browser.close();
            process.exit();
        }
    });
})();
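
One thing to be aware of with raw mode is that Ctrl+C no longer raises the usual interrupt signal. It arrives as an ordinary key press instead, and setRawMode is only available when stdin is an interactive terminal. The sketch below shows one way to account for both. The helper names keyToAction and enableRawInput are hypothetical and not part of the script above.

```javascript
// Sketch only: maps one raw key press to an action name for the caller to act on.
function keyToAction(key) {
    const pressed = key.toString();
    if (pressed === '\u0003') {
        return 'quit'; // Ctrl+C arrives as a plain character in raw mode
    }
    switch (pressed.trim().toLowerCase()) {
        case 'r':
            return 'refresh';
        case 'q':
            return 'quit';
        default:
            return 'ignore';
    }
}

// Enables raw mode only when stdin is an interactive terminal;
// setRawMode does not exist on piped or redirected input.
function enableRawInput(stdin = process.stdin) {
    if (!stdin.isTTY) {
        return false;
    }
    stdin.setRawMode(true);
    stdin.resume();
    return true;
}
```

Inside the process.stdin.on('data', ...) handler, the script could then branch on keyToAction(key) instead of comparing the key against "r" and "q" directly.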

Conclusion

In this guide, we've covered the essentials of Puppeteer, from basic browser automation to advanced features like handling proxies. We walked through setting up Puppeteer, integrating proxies for anonymity and IP rotation, and adding interactive controls for real-time script management. With these tips and techniques, you’re all set to begin using Puppeteer effectively for web automation and scraping tasks.

Frequently Asked Questions

Rampage allows purchase from 10 of the largest residential providers on one dashboard, starting at just 1GB. There's no need to commit to any large bandwidth packages. Through our dashboard, you're also given options such as static or rotating proxies and various targeting options, all for a single price per provider.

All purchases are made through the Rampage dashboard.

Rampage also offers high-quality, lightning-fast ISP and DC proxies available in the US, UK, and DE regions.

If you're unsure which provider would suit your use case best, please contact our support; we'll gladly assist.

How to use Proxies with Puppeteer

Owen Crisp
