{"id":2183,"date":"2021-01-10T06:00:16","date_gmt":"2021-01-10T06:00:16","guid":{"rendered":"https:\/\/thenextweb.com\/?p=1333462"},"modified":"2021-01-10T06:00:16","modified_gmt":"2021-01-10T06:00:16","slug":"how-to-turn-web-pages-into-pdfs-with-puppeteer-and-nodejs","status":"publish","type":"post","link":"https:\/\/www.londonchiropracter.com\/?p=2183","title":{"rendered":"How to turn web pages into PDFs with Puppeteer and NodeJS"},"content":{"rendered":"\n<p>As a web developer, you may have wanted to generate a PDF file of a web page to share with your clients, use in presentations, or add as a new feature in your web app. No matter your reason, Puppeteer, Google\u2019s Node API for headless Chrome and Chromium, makes the task quite simple for you.<\/p>\n<p>In this tutorial, we will see how to convert web pages into PDF with Puppeteer and Node.js. Let\u2019s start with a quick introduction to what Puppeteer is.<\/p>\n<h2 id=\"what-is-puppeteer-and-why-is-it-awesome\">What is Puppeteer, and why is it awesome?<\/h2>\n<p>In Google\u2019s own words,<span>&nbsp;<\/span><a href=\"https:\/\/github.com\/puppeteer\/puppeteer\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Puppeteer<\/a><span>&nbsp;<\/span>is, \u201cA Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.\u201d<\/p>\n<h3 id=\"what-is-a-headless-browser\">What is a headless browser?<\/h3>\n<p>If you are unfamiliar with the term, a headless browser is simply a browser without a GUI. 
In that sense, a headless browser is just another browser that understands how to render HTML web pages and process JavaScript. Because it has no GUI, you interact with a headless browser through a command line or programmatically.<\/p>\n<p>Even though Puppeteer runs the browser in headless mode by default, you can configure it to run full (non-headless) Chrome or Chromium.<\/p>\n<h3 id=\"what-can-you-do-with-puppeteer\">What can you do with Puppeteer?<\/h3>\n<p>Puppeteer\u2019s browser capabilities make it a strong candidate for web app testing and web scraping.<\/p>\n<p>Here are a few use cases where Puppeteer is a good fit for web developers:<\/p>\n<ul>\n<li>Generate PDFs and screenshots of web pages<\/li>\n<li>Automate form submission<\/li>\n<li>Scrape web pages<\/li>\n<li>Perform automated UI tests while keeping the test environment up-to-date<\/li>\n<li>Generate pre-rendered content for Single Page Applications (SPAs)<\/li>\n<\/ul>\n<h2 id=\"set-up-the-project-environment\">Set up the project environment<\/h2>\n<p>You can use Puppeteer on the backend and frontend to generate PDFs. 
In this tutorial, we are using a Node backend for the task.<\/p>\n<p>Initialize NPM and set up the usual Express server to get started with the tutorial.<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333471 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.03.png\" alt width=\"934\" height=\"404\" sizes=\"(max-width: 934px) 100vw, 934px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.03.png 934w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.03-280x121.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.03-540x234.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.03-270x117.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.03-796x344.png 796w\"><\/figure>\n<p><\/span><\/p>\n<p>Make sure to install the Puppeteer NPM package with the following command before you start.<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333472 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.05.png\" alt width=\"924\" height=\"126\" sizes=\"(max-width: 924px) 100vw, 924px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.05.png 924w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.05-280x38.png 280w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.05-540x74.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.05-270x37.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.05-796x109.png 796w\"><\/figure>\n<p><\/span><\/p>\n<h2 id=\"convert-web-pages-to-pdf\">Convert web pages to PDF<\/h2>\n<p>Now we get to the exciting part of the tutorial. With Puppeteer, we only need a few lines of code to convert web pages into PDF.<\/p>\n<p>First, create a browser instance using Puppeteer\u2019s<span>&nbsp;<\/span><code>launch<\/code><span>&nbsp;<\/span>function.<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333473 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55.png\" alt width=\"922\" height=\"124\" sizes=\"(max-width: 922px) 100vw, 922px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55.png 922w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55-280x38.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55-540x73.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55-270x36.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55-796x107.png 796w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.55-912x124.png 912w\"><\/figure>\n<p><\/span><\/p>\n<p>Then, we create a new page instance and visit the given page URL using Puppeteer.<\/p>\n<p><span><\/p>\n<figure 
class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333474 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.58.png\" alt width=\"936\" height=\"400\" sizes=\"(max-width: 936px) 100vw, 936px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.58.png 936w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.58-280x120.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.58-540x231.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.58-270x115.png 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.52.58-796x340.png 796w\"><\/figure>\n<p><\/span><\/p>\n<p>We have set the<span>&nbsp;<\/span><code>waitUntil<\/code><span>&nbsp;<\/span>option to<span>&nbsp;<\/span><code>networkidle0<\/code>. With the<span>&nbsp;<\/span><code>networkidle0<\/code><span>&nbsp;<\/span>option, Puppeteer considers navigation finished once there have been no open network connections for at least 500 ms. It is a way to determine whether the site has finished loading. 
It\u2019s not exact, and Puppeteer offers other options, but it is one of the most reliable for most cases.<\/p>\n<p>Finally, we create the PDF from the crawled page content and save it to our device.<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333475 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.53.36.png\" alt width=\"922\" height=\"666\" sizes=\"(max-width: 922px) 100vw, 922px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.53.36.png 922w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.53.36-280x202.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.53.36-374x270.png 374w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.53.36-187x135.png 187w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.53.36-796x575.png 796w\"><\/figure>\n<p><\/span><\/p>\n<p>The print to<span>&nbsp;<\/span><a href=\"https:\/\/github.com\/puppeteer\/puppeteer\/blob\/v5.5.0\/docs\/api.md#pagepdfoptions\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">PDF function<\/a><span>&nbsp;<\/span>is quite complicated and allows for a lot of customization, which is fantastic. Here are some of the options we used:<\/p>\n<ul>\n<li>\n<strong>printBackground<\/strong>: When this option is set to true, Puppeteer prints any background colors or images you have used on the web page to the PDF.<\/li>\n<li>\n<strong>path<\/strong>: Path specifies where to save the generated PDF file. 
If you omit it, <code>page.pdf()<\/code> returns the PDF data in memory instead of writing it to disk.<\/li>\n<li>\n<strong>format<\/strong>: You can set the paper format to one of the supported options, such as Letter, A4, A3, or A2.<\/li>\n<li>\n<strong>margin<\/strong>: You can specify a margin for the generated PDF with this option.<\/li>\n<\/ul>\n<p>When the PDF has been created, close the browser with<span>&nbsp;<\/span><code>browser.close()<\/code>.<\/p>\n<h2 id=\"build-an-api-to-generate-and-respond-pdfs-from-urls\">Build an API to generate and serve PDFs from URLs<\/h2>\n<p>With what we have learned so far, we can now create a new endpoint that receives a URL as a query string parameter and streams the generated PDF back to the client.<\/p>\n<p>Here is the code:<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333476 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.11.png\" alt width=\"710\" height=\"1322\" sizes=\"(max-width: 710px) 100vw, 710px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.11.png 710w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.11-113x210.png 113w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.11-145x270.png 145w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.11-73x135.png 73w\"><\/figure>\n<p><\/span><\/p>\n<p>Start the server and visit the<span>&nbsp;<\/span><code>\/pdf<\/code><span>&nbsp;<\/span>route with a<span>&nbsp;<\/span><code>target<\/code><span>&nbsp;<\/span>query param containing the URL you want to convert. 
The server will serve the generated PDF directly without ever storing it on disk.<\/p>\n<p>URL example:<span>&nbsp;<\/span><code>http:\/\/localhost:3000\/pdf?target=https:\/\/google.com<\/code><\/p>\n<p>This request generates the PDF shown in the following image:<\/p>\n<figure class data-src=\"\/post\/convert-web-pages-into-pdfs-with-puppeteer-and-nodejs\/resulting-pdf_hu9a429796f97b32aa78a0e7af898769fa_221035_700x0_resize_box_2.png\">\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"lazy loaded lazy\" src=\"https:\/\/livecodestream.dev\/post\/convert-web-pages-into-pdfs-with-puppeteer-and-nodejs\/resulting-pdf_hu9a429796f97b32aa78a0e7af898769fa_221035_700x0_resize_box_2.png\" alt width=\"700\" height=\"601\" data-lazy=\"true\"><figcaption>Sample PDF capture<\/figcaption><\/figure>\n<\/p>\n<\/figure>\n<p>That\u2019s it! You have completed the conversion of a web page to PDF. 
Wasn\u2019t that easy?<\/p>\n<p>As mentioned, Puppeteer offers many customization options, so make sure you experiment with them to get different results.<\/p>\n<p>Next, we can change the viewport size to capture websites at different resolutions.<\/p>\n<h2 id=\"capture-websites-with-different-viewports\">Capture websites with different viewports<\/h2>\n<p>In the previously created PDF, we didn\u2019t specify the viewport size for the web page Puppeteer is visiting, so Puppeteer used its default viewport size of 800\u00d7600 px.<\/p>\n<p>However, we can set the page\u2019s viewport size explicitly before loading the page.<\/p>\n<p><span><\/p>\n<figure class=\"post-image post-mediaBleed alignnone\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1333477 lazy\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.56.png\" alt width=\"706\" height=\"276\" sizes=\"(max-width: 706px) 100vw, 706px\" data-lazy=\"true\" data-srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.56.png 706w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.56-280x109.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.56-540x211.png 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/01\/Screenshot-2021-01-08-at-10.54.56-270x106.png 270w\"><\/figure>\n<p><\/span><\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>In today\u2019s tutorial, we used Puppeteer, a Node API for headless Chrome, to generate a PDF of a given web page. 
Since you are now familiar with the basics of Puppeteer, you can use this knowledge in the future to create PDFs or even for other purposes like web scraping and UI testing.<\/p>\n<p><i><span>This <\/span><\/i><a href=\"https:\/\/livecodestream.dev\/post\/convert-web-pages-into-pdfs-with-puppeteer-and-nodejs\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>article<\/span><\/i><\/a><i><span> was originally published on <\/span><\/i><a href=\"https:\/\/livecodestream.dev\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>Live Code Stream<\/span><\/i><\/a><i><span> by <\/span><\/i><a href=\"https:\/\/www.linkedin.com\/in\/bajcmartinez\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>Juan Cruz Martinez<\/span><\/i><\/a><i><span> (twitter: <\/span><\/i><a href=\"https:\/\/twitter.com\/bajcmartinez\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>@bajcmartinez<\/span><\/i><\/a><i><span>), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.<\/span><\/i><\/p>\n<p><a href=\"https:\/\/livecodestream.dev\/subscribe\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><i><span>Live Code Stream<\/span><\/i><\/a><i><span> is also available as a free weekly newsletter. 
Sign up for updates on everything related to programming, AI, and computer science in general.<\/span><\/i><\/p>\n<p> <a href=\"https:\/\/thenextweb.com\/syndication\/2021\/01\/10\/how-to-turn-web-pages-into-pdfs-with-puppeteer-and-nodejs\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As a web developer, you may have wanted to generate a PDF file of a web page to share with your clients, use it in presentations, or add it as a new&#8230;<\/p>\n","protected":false},"author":1,"featured_media":2184,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/2183"}],"collection":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2183"}],"version-history":[{"count":0,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/posts\/2183\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=\/wp\/v2\/media\/2184"}],"wp:attachment":[{"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2183"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.londonchiropracter.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}