Scrape Webpage API

Example Usage

An HTTP API for scraping websites. It is built on our pre-rendering service, so it's smart enough to wait for JavaScript apps to finish rendering.

See the cURL and Node.js examples below. Because it's a simple GET request to a standard HTTP API, you can use any language that can make HTTP requests.
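To illustrate the "any language" point, here is a minimal sketch of the same GET request in plain Node.js without the client library. The endpoint and the X-Prerender-Token header are taken from the cURL examples on this page. The names buildScrapeRequest and scrape are hypothetical helpers, and the global fetch assumes Node.js 18 or newer.

```javascript
// Endpoint as it appears in the cURL examples on this page.
const SCRAPE_ENDPOINT = "https://service.headless-render-api.com/scrape/";

// Hypothetical helper: builds the request URL and headers without sending anything.
function buildScrapeRequest(targetUrl, token) {
  const headers = {};
  // Only attach the auth header when a token is provided.
  if (token) headers["X-Prerender-Token"] = token;
  // The target URL is appended to the endpoint as-is, as in the cURL examples.
  return { url: SCRAPE_ENDPOINT + targetUrl, headers };
}

// Hypothetical helper: performs the GET using the global fetch (Node.js 18+).
// Reading PRERENDER_TOKEN here is this sketch's choice, mirroring the env var name used above.
async function scrape(targetUrl, token = process.env.PRERENDER_TOKEN) {
  const { url, headers } = buildScrapeRequest(targetUrl, token);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`scrape failed with HTTP ${res.status}`);
  return res.text(); // the rendered HTML
}
```

Calling scrape("https://example.com") from an async function would return the rendered HTML as a string, assuming the request succeeds.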

cURL Auth

curl --header "X-Prerender-Token: secret-token" \
  https://service.headless-render-api.com/scrape/https://example.com

Node.js Auth

Either use an env var (best practice):

PRERENDER_TOKEN=secret-token node app.js

or hard-code it (storing secrets in code is typically not a best practice):

const prerendercloud = require("prerendercloud");
prerendercloud.set("prerenderToken", "mySecretToken");

cURL Examples

Note: --compressed is a best practice for cURL. It tells cURL to request a compressed response (such as gzip) and decompress it automatically. The API supports this, as do virtually all HTTP servers.
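Outside cURL, whether gzip is negotiated depends on the HTTP library. Node.js's built-in https module, for instance, does not decompress responses on its own, so a client using it would send Accept-Encoding: gzip and gunzip the body itself. A minimal sketch of that decoding step follows; decodeBody is a hypothetical helper and the sample HTML is made up for a local round trip.

```javascript
const zlib = require("zlib");

// Hypothetical helper: returns the response body as text, gunzipping it
// when the server reports Content-Encoding: gzip.
function decodeBody(buffer, contentEncoding) {
  const raw = contentEncoding === "gzip" ? zlib.gunzipSync(buffer) : buffer;
  return raw.toString("utf8");
}

// Local round trip, no network involved.
const compressed = zlib.gzipSync("<html>hello</html>");
console.log(decodeBody(compressed, "gzip")); // prints "<html>hello</html>"
```

Clients that decompress automatically (cURL with --compressed, or fetch in Node.js 18+) do not need this step.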

cURL Scrape Website

curl --compressed \
  https://service.headless-render-api.com/scrape/https://example.com > out.html

Setting options in cURL via HTTP header

Set scrape options via HTTP headers. With cURL, this looks like:

curl --compressed \
  --header 'Prerender-Emulated-Media: print' \
  https://service.headless-render-api.com/scrape/https://example.com

Note: HTTP header names are case-insensitive.

All scrape options as HTTP headers:

  • prerender-wait-extra-long
  • prerender-dont-wait-for-web-sockets
  • prerender-block-cookies
  • prerender-follow-redirects
  • prerender-device-width
  • prerender-device-height
  • prerender-device-is-mobile
  • prerender-with-screenshot
  • prerender-with-metadata
  • prerender-emulated-media
  • Go here for full docs on these options
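The header names above line up with the camelCase option names used by the Node.js client (for example, deviceWidth and prerender-device-width). Assuming that pattern holds for every option, a hypothetical helper can derive the headers from an options object. Sending values in their plain string form is also an assumption; only the emulated-media value "print" appears on this page.

```javascript
// Hypothetical helper: converts camelCase option names into
// "prerender-..." header names, following the pattern in the list above.
function optionsToHeaders(options) {
  const headers = {};
  for (const [name, value] of Object.entries(options)) {
    // deviceWidth -> device-width
    const kebab = name.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
    // Assumption: values are sent as their plain string form (800 -> "800").
    headers["prerender-" + kebab] = String(value);
  }
  return headers;
}

console.log(optionsToHeaders({ deviceWidth: 800, emulatedMedia: "print" }));
// prints { 'prerender-device-width': '800', 'prerender-emulated-media': 'print' }
```

The lowercase output relies on HTTP header names being case-insensitive, as noted above.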

Node.js Scrape Website

// https://github.com/sanfrancesco/prerendercloud-nodejs#scrape
const fs = require("fs");
const prerendercloud = require("prerendercloud");

// the `await` calls below must run inside an async function

// set your API key via env var PRERENDER_TOKEN or the following line:
// prerendercloud.set('prerenderToken', 'mySecretToken')

// Headless-Render-API will scrape the HTML, parse
// various meta tags, links, and take a screenshot all
// on the server. The client only parses the results
const {
  body,
  meta: {
    title,
    h1,
    description,
    ogImage,
    ogTitle,
    ogDescription,
    ogType,
    twitterCard,
  },
  links, // array
  screenshot, // buffer
  statusCode, // number
  headers, // object of headers, e.g. { 'content-type': 'text/html' }
} = await prerendercloud.scrape("https://example.com", {
  withMetadata: true,
  withScreenshot: true,
  // followRedirects: false, // Default: false
  // deviceIsMobile: false, // Default: false, whether the meta viewport tag is taken into account
});

fs.writeFileSync("body.html", body);
fs.writeFileSync("screenshot.png", screenshot);

console.log({
  meta: {
    title,
    h1,
    description,
    ogImage,
    ogTitle,
    ogDescription,
    ogType,
    twitterCard,
  },
  links,
});

// scrape example.com with default options
const html = await prerendercloud.scrape("http://example.com");
fs.writeFileSync("out.html", html);

// scrape headless-render-api.com with all options shown for reference
const htmlWithOptions = await prerendercloud.scrape("https://headless-render-api.com", {
  // withScreenshot: false, // Default: false
  // withMetadata: false, // Default: false
  // waitExtraLong: false, // Default: false
  // dontWaitForWebSockets: false, // Default: false
  // blockCookies: false, // Default: false
  // followRedirects: false, // Default: false
  // deviceWidth: 1366, // Default: 1366
  // deviceHeight: 768, // Default: 768
  // deviceIsMobile: false, // Default: false
  // emulatedMedia values: screen, print, braille, embossed, handheld, projection, speech, tty, tv
  // emulatedMedia: "screen", // Default: "screen"
});

fs.writeFileSync("out.html", htmlWithOptions);