Scrape API

This document describes the Scrape endpoint, its optional HTTP request headers, and its possible error responses.

Endpoint: service.headless-render-api.com

Scrape Path: /scrape/$URL

Example URL: 

https://service.headless-render-api.com/scrape/https://example.com/

The target $URL is appended directly to the path. Do not re-encode or re-escape it beyond what you would type into a browser's address bar. Both raw UTF-8 and percent-encoded URLs are accepted.
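Here is a minimal Node.js sketch of building a scrape URL. It assumes only the base URL and path format described above. The helper name buildScrapeUrl and its localhost check are illustrative additions, not part of the API. The check mirrors the "localhost URL/IP" cause listed under HTTP 400, so that case fails locally instead of costing a request. The key point is that the target string is appended verbatim and is never passed through encodeURIComponent.

```javascript
// Sketch: build a Scrape API URL. SCRAPE_BASE comes from this page; the
// localhost check is a hypothetical client-side precheck, not part of the API.
const SCRAPE_BASE = "https://service.headless-render-api.com/scrape/";

function buildScrapeUrl(targetUrl) {
  // Parse only to validate; new URL() throws a TypeError on malformed input.
  const { hostname } = new URL(targetUrl);
  if (hostname === "localhost" || /^127\./.test(hostname) || hostname === "[::1]") {
    throw new Error(`localhost targets are rejected by the API: ${targetUrl}`);
  }
  // Append the ORIGINAL string unchanged. Do not wrap it in
  // encodeURIComponent(): the API expects the URL as typed in an address bar.
  return SCRAPE_BASE + targetUrl;
}

console.log(buildScrapeUrl("https://example.com/"));
// → https://service.headless-render-api.com/scrape/https://example.com/
```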


Auth

Send your secret API token (issued when you create an account) in the following header on every request to avoid rate limiting.
X-Prerender-Token
  • curl --header "X-Prerender-Token: secret-token"
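The same header can be sent from Node.js. This is a minimal sketch that assumes Node.js 18+ (global fetch). PRERENDER_TOKEN is a hypothetical environment variable name, and buildAuthHeaders and scrapeWithToken are illustrative helpers, not a published client.

```javascript
// Sketch: send the API token on every request. Assumes Node.js 18+ (global
// fetch); PRERENDER_TOKEN is a hypothetical environment variable name.
function buildAuthHeaders(token) {
  // Without a token the request is still sent, but it may be rate limited (429).
  return token ? { "X-Prerender-Token": token } : {};
}

async function scrapeWithToken(targetUrl, token = process.env.PRERENDER_TOKEN) {
  const res = await fetch(`https://service.headless-render-api.com/scrape/${targetUrl}`, {
    headers: buildAuthHeaders(token),
  });
  return { status: res.status, html: await res.text() };
}

// Usage (inside an async function):
// const { status, html } = await scrapeWithToken("https://example.com/");
```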

Webpage Scrape API

Renders a page in a headless browser and returns the resulting HTML. Try it in your browser: service.headless-render-api.com/scrape/https://example.com

cURL

curl https://service.headless-render-api.com/scrape/https://example.com/ > out.html

Node.js client

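// npm install prerendercloud
const prerendercloud = require("prerendercloud");

(async () => {
  // withMetadata/withScreenshot return meta and screenshot alongside the HTML body
  const { body, meta, screenshot, statusCode, headers } = await prerendercloud.scrape("https://example.com/", {
    withMetadata: true,
    withScreenshot: true,
  });
})();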

Optional HTTP Headers for Headless-Render-API Server Behavior

prerender-wait-extra-long
  • Waits longer than normal for a page to finish rendering, similar to Puppeteer's networkidle waitUntil options ('networkidle0' / 'networkidle2'). Useful for pages that depend on late-firing AJAX/XHR requests, or for IPFS-hosted pages.
  • curl --header "Prerender-Wait-Extra-Long: true"
prerender-dont-wait-for-web-sockets
  • By default, headless-render-api.com waits for all WebSocket activity to finish. That wait never ends on pages whose WebSocket connections never go idle, such as real-time prices on a stock-ticker site. Use this header to skip it.
  • curl --header "prerender-dont-wait-for-web-sockets: true"
prerender-block-cookies
  • By default, headless-render-api.com sends cookies back to your origin server. Use this header to block them.
  • curl --header "prerender-block-cookies: true"
prerender-follow-redirects
  • By default, if your origin server returns a 301/302, headless-render-api.com returns that redirect as-is. This suits the common use case of proxying traffic through headless-render-api.com. If you are using the API for crawling, scraping, or batch jobs, you may want it to follow redirects instead.
  • curl --header "Prerender-Follow-Redirects: true"
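The server-behavior headers above can be assembled from a small options object. This is a sketch: the option names (waitExtraLong, followRedirects, and so on) and the serverBehaviorHeaders helper are hypothetical. Only the header names come from this page. HTTP header names are case-insensitive, so the capitalization used here is cosmetic.

```javascript
// Sketch: translate hypothetical boolean options into the optional
// server-behavior headers documented above. Only enabled options are sent.
function serverBehaviorHeaders({
  waitExtraLong = false,
  dontWaitForWebSockets = false,
  blockCookies = false,
  followRedirects = false,
} = {}) {
  const headers = {};
  if (waitExtraLong) headers["Prerender-Wait-Extra-Long"] = "true";
  if (dontWaitForWebSockets) headers["Prerender-Dont-Wait-For-Web-Sockets"] = "true";
  if (blockCookies) headers["Prerender-Block-Cookies"] = "true";
  if (followRedirects) headers["Prerender-Follow-Redirects"] = "true";
  return headers;
}

// A crawler that follows redirects and ignores never-ending WebSocket traffic:
const crawlerHeaders = serverBehaviorHeaders({ followRedirects: true, dontWaitForWebSockets: true });
// Pass as: fetch(scrapeUrl, { headers: crawlerHeaders })
```

Sending only the enabled headers keeps requests minimal and leaves every other behavior at the server default.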

Optional HTTP Headers for Device Metrics

prerender-device-width
  • Overrides device screen width (default: 1366)
  • curl --header "prerender-device-width: 800"
prerender-device-height
  • Overrides device screen height (default: 768)
  • curl --header "prerender-device-height: 600"
prerender-device-is-mobile
  • Whether to emulate a mobile device (default: false).

    This includes the viewport meta tag, overlay scrollbars, text autosizing, and more. In other words, it controls whether the meta viewport tag is taken into account.

    An example of a viewport meta tag: <meta name="viewport" content="width=device-width, initial-scale=1">
  • curl --header "prerender-device-is-mobile: true"
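A minimal sketch of building the device-metric headers with basic validation, so a typo fails locally instead of producing an unexpected render. The option names and the deviceHeaders helper are hypothetical. The header names and defaults (1366 x 768, not mobile) are from the list above.

```javascript
// Sketch: build device-metric headers. Option names (width, height, isMobile)
// are hypothetical; header names and defaults (1366 x 768, not mobile) come
// from the list above. Omitted options fall back to the server defaults.
function deviceHeaders({ width, height, isMobile = false } = {}) {
  const headers = {};
  const dims = [
    ["prerender-device-width", width],
    ["prerender-device-height", height],
  ];
  for (const [name, value] of dims) {
    if (value === undefined) continue;
    if (!Number.isInteger(value) || value <= 0) {
      throw new RangeError(`${name} must be a positive integer, got ${value}`);
    }
    headers[name] = String(value);
  }
  // Only send the mobile header when enabling it; false is already the default.
  if (isMobile) headers["prerender-device-is-mobile"] = "true";
  return headers;
}

// A phone-sized mobile viewport, e.g. to check how a responsive page renders:
const mobileHeaders = deviceHeaders({ width: 375, height: 667, isMobile: true });
```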

Optional HTTP Headers for Browser Emulated Media

prerender-emulated-media
  • Emulates the given media type or media feature for CSS media queries

    default: screen

    Possible values are: screen, print, braille, embossed, handheld, projection, speech, tty, tv
  • curl --header "Prerender-Emulated-Media: screen"
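Because only a fixed set of media types is listed, a client can reject unsupported values before sending a request. This is a sketch: the emulatedMediaHeader helper is hypothetical, and the allowed values are copied from the list above.

```javascript
// Sketch: validate the Prerender-Emulated-Media value against the media
// types listed above before sending it (the server default is "screen").
const EMULATED_MEDIA_TYPES = new Set([
  "screen", "print", "braille", "embossed", "handheld",
  "projection", "speech", "tty", "tv",
]);

function emulatedMediaHeader(mediaType = "screen") {
  if (!EMULATED_MEDIA_TYPES.has(mediaType)) {
    throw new RangeError(`Unsupported emulated media type: ${mediaType}`);
  }
  return { "Prerender-Emulated-Media": mediaType };
}

// Render the page as it would print, e.g. to check @media print rules:
const printHeaders = emulatedMediaHeader("print");
```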

Optional HTTP Headers to Configure the Server Response

prerender-with-screenshot
  • Changes the response from text/html to application/json and returns an object with 2 fields: {body, screenshot}
    The values are base64 encoded:
    • body is the normal pre-rendered response
    • screenshot is a base64 encoded PNG

    This is useful for saving screenshots or injecting them as Open Graph images. Often used with Prerender-With-Metadata.

  • curl --header "Prerender-With-Screenshot: true"
prerender-with-metadata
  • Changes the response from text/html to application/json and returns an object with 3 fields: {body, meta, links}
    The values are base64 encoded:
    • body is the normal pre-rendered response
    • meta is an object that includes { title, metaDescription, h1, ogImage, ogTitle, twitterCard }
    • links is an array of URLs/paths on the page

    This is useful for capturing SEO-relevant metadata during the pre-render.

  • curl --header "Prerender-With-Metadata: true"
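The two JSON response shapes above can be decoded with a single helper. This is a minimal Node.js sketch under stated assumptions: body and screenshot are base64 strings as described, while meta and links are assumed to arrive as a plain JSON object and array. If your meta or links values are base64-encoded too, decode them the same way as body. The decodeScrapeJson name and its pageUrl parameter are hypothetical.

```javascript
// Sketch: decode a JSON response requested with Prerender-With-Screenshot
// and/or Prerender-With-Metadata. Assumes Node.js (Buffer), base64 body and
// screenshot, and (an assumption) plain-JSON meta and links.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function decodeScrapeJson(jsonText, pageUrl) {
  const { body, screenshot, meta = {}, links = [] } = JSON.parse(jsonText);
  const html = Buffer.from(body, "base64").toString("utf8");

  let png = null;
  if (screenshot !== undefined) {
    png = Buffer.from(screenshot, "base64");
    // Every PNG file starts with the same 8-byte signature.
    if (!png.subarray(0, 8).equals(PNG_SIGNATURE)) {
      throw new Error("screenshot did not decode to a PNG");
    }
  }

  // links may mix absolute URLs and relative paths: resolve them against the
  // scraped page URL, skip unparsable entries, and drop duplicates.
  const absoluteLinks = [...new Set(links.flatMap((link) => {
    try {
      return [new URL(link, pageUrl).href];
    } catch {
      return [];
    }
  }))];

  return { html, png, title: meta.title ?? null, ogImage: meta.ogImage ?? null, links: absoluteLinks };
}

// Usage (inside an async function, Node.js):
// const data = decodeScrapeJson(await res.text(), "https://example.com/");
// if (data.png) require("fs").writeFileSync("screenshot.png", data.png);
```

Resolving and de-duplicating links lets a crawler queue the result directly. Skip that step if you only need the raw paths.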

HTTP Error Codes

400
  • Invalid Request
  • The response HTML contains an error message; fix your request and retry
  • Example causes:
    • malformed URL
    • a localhost URL/IP
    • or the page responds with Content-Type: application/octet-stream
429
  • Rate limited
  • Requests made without an API token (or from accounts with expired or missing billing information) receive this status; see pricing
500
  • General Error
  • Example causes:
    • a 10s timeout while waiting for the page to finish rendering (we wait until all in-flight requests finish and the load and DOMContentLoaded events fire, among other things)
    • or an HTTPS page making HTTP (insecure) requests
    • ...or a bug on our end
  • The error will show up in the headless-render-api.com web UI (after you sign in)
502
  • Bad Gateway (your origin returned an error)
  • It means we received a 5xx when trying to visit your page
  • Or it means we received a 403 (Forbidden). Typically this means your page is behind a login wall, or a firewall (such as Cloudflare) is blocking the headless-render-api user agent. Make an exception to allow user agents matching /prerendercloud/
  • Probably not retryable; it depends on the site you're requesting. If it's your site, make sure it's up and running correctly
503
  • Over capacity (Rare)
  • Retry the request with some backoff
  • We'll see the error and our autoscaler will increase capacity within 5 minutes, but you should email us anyway: support@headless-render-api.com
504
  • Gateway Timeout (Rare)
  • Retry the request with some backoff
  • This is unexpected and should not happen. We'll see it, but you should email us anyway: support@headless-render-api.com
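The "retry with some backoff" advice for 503 and 504 can be sketched as a small retry loop. This sketch assumes Node.js 18+ (global fetch). The scrapeWithRetry, isRetryable, and backoffDelayMs helpers are hypothetical. Exponential backoff with full jitter is chosen here for illustration; the page does not prescribe a schedule. Every other status, including 500, is returned to the caller unchanged. Per the table, those call for a fixed request, an API token, a working origin, or a look at the error in the web UI, not a blind retry.

```javascript
// Sketch: retry only 503 (over capacity) and 504 (gateway timeout), which the
// table above says to retry with backoff. All other statuses are returned as-is.
function isRetryable(status) {
  return status === 503 || status === 504;
}

// Exponential backoff with "full jitter": a random delay in
// [0, min(capMs, baseMs * 2^attempt)) milliseconds. Schedule is illustrative.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.floor(Math.random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// fetchFn defaults to Node 18+'s global fetch; it is injectable for testing.
async function scrapeWithRetry(scrapeUrl, { headers = {}, maxAttempts = 4, baseMs = 1000, fetchFn = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(scrapeUrl, { headers });
    if (!isRetryable(res.status) || attempt + 1 >= maxAttempts) {
      return res; // success, non-retryable error, or out of attempts
    }
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseMs)));
  }
}

// Usage (inside an async function):
// const res = await scrapeWithRetry("https://service.headless-render-api.com/scrape/https://example.com/",
//   { headers: { "X-Prerender-Token": "secret-token" } });
```

If 503s or 504s persist after the last attempt, email support@headless-render-api.com as described above.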