An HTTP API for scraping websites, built on our pre-rendering service, so it waits for JavaScript apps to finish rendering before returning the page.
The cURL and Node.js examples below show typical usage, but because a scrape is a simple GET request to a standard HTTP API, any language that can make HTTP requests will work.
curl --header "X-Prerender-Token: secret-token"
Either use an env var (best practice):
PRERENDER_TOKEN=secret-token node app.js
or hard-code it (storing secrets in code is generally discouraged):
const prerendercloud = require("prerendercloud");
prerendercloud.set("prerenderToken", "mySecretToken");
Note: --compressed is a best practice for cURL. It tells cURL to request a compressed (gzip) response and decompress it automatically, which the API supports (as do virtually all HTTP servers).
cURL Scrape Website
curl --compressed \
https://service.headless-render-api.com/scrape/https://example.com > out.html
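Because the scrape call is a plain GET, the same request can be sketched without any client library. The following is a minimal sketch, assuming Node.js 18+ (for the global fetch) and reusing the endpoint and the X-Prerender-Token header from the cURL examples; buildScrapeRequest and scrapeHtml are hypothetical helper names, not part of any official client.

```javascript
// Sketch only: plain GET against the scrape endpoint using Node 18+ global fetch.
const SCRAPE_ENDPOINT = "https://service.headless-render-api.com/scrape/";

// Hypothetical helper: builds the URL and headers for a scrape request.
function buildScrapeRequest(targetUrl, token = process.env.PRERENDER_TOKEN) {
  const headers = {};
  // Send the token header only when a token is available.
  if (token) headers["X-Prerender-Token"] = token;
  // The target URL is appended to the endpoint as-is, as in the cURL example.
  return { url: SCRAPE_ENDPOINT + targetUrl, headers };
}

// Hypothetical helper: performs the request and resolves with the HTML body.
async function scrapeHtml(targetUrl, token) {
  const { url, headers } = buildScrapeRequest(targetUrl, token);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`Scrape failed with HTTP ${res.status}`);
  return res.text();
}

// Usage (performs a network request):
// scrapeHtml("https://example.com").then((html) => console.log(html.length));
```

Keeping the request-building step separate from the network call makes it easy to inspect the URL and headers before anything is sent.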
Setting options in cURL via HTTP header
Scrape options are set via HTTP headers. With cURL, this looks like:
curl --compressed \
--header 'Prerender-Emulated-Media: print' \
https://service.headless-render-api.com/scrape/https://example.com
Note: HTTP header names are case-insensitive.
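The case-insensitivity of header names can be illustrated from JavaScript as well. This is a minimal sketch, assuming Node.js 18+ (for the global Headers class); buildOptionHeaders is a hypothetical helper, and Prerender-Emulated-Media is the only option header used because it is the one shown in the cURL example.

```javascript
// Sketch only: scrape options travel as ordinary HTTP request headers.
// The global Headers class (Node 18+) matches header names case-insensitively.
function buildOptionHeaders(emulatedMedia) {
  const headers = new Headers();
  // Prerender-Emulated-Media is the header from the cURL example above.
  headers.set("Prerender-Emulated-Media", emulatedMedia);
  return headers;
}

const headers = buildOptionHeaders("print");
// Look the header up with different casing than it was set with.
console.log(headers.get("prerender-emulated-media")); // prints "print"
console.log(headers.has("PRERENDER-EMULATED-MEDIA")); // prints true

// Usage (performs a network request):
// fetch("https://service.headless-render-api.com/scrape/https://example.com", { headers })
//   .then((res) => res.text())
//   .then((html) => console.log(html.length));
```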
All scrape options as HTTP headers:
Node.js Scrape Website
// https://github.com/sanfrancesco/prerendercloud-nodejs#scrape
const fs = require("fs"); // used below to write the results to disk
const prerendercloud = require("prerendercloud");
// set your API key via env var PRERENDER_TOKEN or the following line:
// prerendercloud.set('prerenderToken', 'mySecretToken')
// Headless-Render-API scrapes the HTML, parses various meta tags
// and links, and takes a screenshot, all on the server.
// The client only parses the results.
// Note: the snippets below use await, so run them inside an async function.
const {
  body,
  meta: {
    title,
    h1,
    description,
    ogImage,
    ogTitle,
    ogDescription,
    ogType,
    twitterCard,
  },
  links, // array
  screenshot, // buffer
  statusCode, // number
  headers, // object of headers, e.g. { 'content-type': 'text/html' }
} = await prerendercloud.scrape("https://example.com", {
  withMetadata: true,
  withScreenshot: true,
  // followRedirects: false, // Default: false
  // deviceIsMobile: false, // Default: false, whether the meta viewport tag is taken into account
});
fs.writeFileSync("body.html", body);
fs.writeFileSync("screenshot.png", screenshot);
console.log({
  meta: {
    title,
    h1,
    description,
    ogImage,
    ogTitle,
    ogDescription,
    ogType,
    twitterCard,
  },
  links,
});
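As a follow-up to the example above, the links array can be post-processed on the client. This is a minimal sketch, assuming each entry of links is a string that may be relative; the source only states that links is an array, so treat the element type as an assumption. absoluteLinks is a hypothetical helper built on the standard URL class.

```javascript
// Sketch only: assumes each entry of `links` is a string (possibly a relative URL).
// Resolves every link against the scraped page's URL and removes duplicates.
function absoluteLinks(links, baseUrl) {
  const seen = new Set();
  for (const link of links) {
    try {
      seen.add(new URL(link, baseUrl).href);
    } catch {
      // Skip entries that cannot be parsed as URLs.
    }
  }
  return [...seen];
}

// Usage with the destructured result from the example above:
// console.log(absoluteLinks(links, "https://example.com/"));
```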
// scrape example.com with default options
const html = await prerendercloud.scrape("http://example.com");
fs.writeFileSync("out.html", html);
// scrape example.com with all options shown for reference
const html = await prerendercloud.scrape("https://headless-render-api.com", {
  // withScreenshot: false, // Default: false
  // withMetadata: false, // Default: false
  // waitExtraLong: false, // Default: false
  // dontWaitForWebSockets: false, // Default: false
  // blockCookies: false, // Default: false
  // followRedirects: false, // Default: false
  // deviceWidth: 1366, // Default: 1366
  // deviceHeight: 768, // Default: 768
  // deviceIsMobile: false, // Default: false
  // emulatedMedia accepts one of:
  // screen, print, braille, embossed, handheld, projection, speech, tty, tv
  // emulatedMedia: "screen", // Default: "screen"
});
fs.writeFileSync("out.html", html);