Website to PDF using AWS Lambda Function URLs

AWS recently released a nifty feature for Lambda: Function URLs.

As a serverless compute service, AWS Lambda lets you write functions in any language (thanks to Docker image support) and execute them without provisioning resources manually.

These functions respond to events such as an HTTP request or an update to a DynamoDB table.

A very popular way to trigger a Lambda function over HTTP is to integrate it with API Gateway or a load balancer. They provide advanced features like request validation, throttling, custom authorizers, and caching. However, the cost of API Gateway may end up higher than the cost of executing the Lambda function itself.

Lambda now lets you create a function URL easily, giving you an HTTPS endpoint at no additional cost. You can set up IAM authentication, or disable it and roll your own authentication mechanism.

  • Lambda function URLs have a 15-minute maximum timeout, compared to API Gateway’s 30 seconds
  • You cannot create a custom domain. AWS will generate a URL similar to https://<unique-id>.lambda-url.<region>.on.aws
  • Provides CORS support

Let’s work through a quick example by building a Lambda function that converts a web page into a PDF file and triggering it through its function URL!

We will use AWS CDK to define and deploy our infrastructure.

The Lambda function itself will run on Node.js.

With a few lines of JavaScript, we can develop and deploy a function that converts a web page to a PDF file.

chrome-aws-lambda is a very useful library that provides a Chromium binary for AWS Lambda and Google Cloud Functions. With it, we can run Puppeteer in a Lambda function.

Puppeteer is a Node.js API for controlling headless Chrome. It lets you do most things that you can do in a desktop browser, such as crawling web pages, UI testing, taking screenshots, and saving pages as PDF files.

Let’s install these dependencies first.

npm install chrome-aws-lambda puppeteer

Then import the library into our function.

const chromium = require('chrome-aws-lambda');

The Lambda function URL request and response formats follow the same schema as the API Gateway payload format version 2.0, and are documented at https://docs.aws.amazon.com/lambda/latest/dg/urls-invocation.html

The query parameters can be read from the queryStringParameters of the event object.

const url = event.queryStringParameters.url;
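
The line above throws if the request has no query string, because queryStringParameters can be missing from the event in that case. Here is a minimal sketch of a more defensive read. The helper name parseTargetUrl and the choice to accept only http and https pages are my own assumptions, not part of the original function.

```javascript
// Hypothetical helper: returns a parsed URL when the request carries a usable
// `url` query parameter, otherwise null.
function parseTargetUrl(event) {
  // queryStringParameters may be absent when the request has no query string.
  const raw = event.queryStringParameters && event.queryStringParameters.url;
  if (!raw) {
    return null;
  }
  try {
    const parsed = new URL(raw);
    // Assumption: only web pages are allowed, so file:, data:, etc. are rejected.
    return ['http:', 'https:'].includes(parsed.protocol) ? parsed : null;
  } catch (err) {
    // `new URL` throws on anything that is not an absolute URL.
    return null;
  }
}

// Example with a trimmed-down event object.
const target = parseTargetUrl({ queryStringParameters: { url: 'https://jobinbasani.com/' } });
console.log(target ? target.href : 'invalid or missing url');
```

When the helper returns null, the handler can answer with a 400 status instead of launching the browser.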

Let’s create an instance of the Chromium browser.

const browser = await chromium.puppeteer.launch({
  args: chromium.args,
  headless: true,
  ignoreHTTPSErrors: true,
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath,
});

And navigate to the URL that we need to convert to PDF.

const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle0' });

And finally, generate the PDF.

const buffer = await page.pdf({
    scale: 1,
    displayHeaderFooter: false,
});

The last thing is to send the response. We need to convert the binary data to base64 and set the isBase64Encoded property to true.

return {
  statusCode: 200,
  headers: {
    'Content-Type': 'application/pdf',
  },
  body: buffer.toString('base64'),
  isBase64Encoded: true,
};
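
One thing to keep in mind is that base64 turns every 3 bytes into 4 characters, so the response body is about a third larger than the PDF itself, and synchronous Lambda responses are capped at roughly 6 MB. The sketch below wraps the response in a helper that checks the encoded size first. The name toPdfResponse, the exact byte threshold, and the 500 status are assumptions for illustration.

```javascript
// Approximate cap for a synchronous Lambda response (assumed here as 6 MB).
const MAX_RESPONSE_BYTES = 6 * 1024 * 1024;

// Hypothetical helper: builds the function URL response for a generated PDF.
function toPdfResponse(pdfBytes) {
  // Buffer.from accepts both a Buffer and a Uint8Array.
  const body = Buffer.from(pdfBytes).toString('base64');

  if (Buffer.byteLength(body) > MAX_RESPONSE_BYTES) {
    return {
      statusCode: 500,
      headers: { 'Content-Type': 'text/plain' },
      body: 'The generated PDF is too large to return directly.',
    };
  }

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/pdf' },
    body,
    isBase64Encoded: true,
  };
}

// A 3-byte input becomes a 4-character base64 body.
console.log(toPdfResponse(Buffer.from('abc')).body.length);
```

For larger documents, a common alternative is to upload the PDF to S3 and return a link, which is outside the scope of this example.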

And we have our function ready!
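
For reference, here is one way the pieces could be assembled into a single handler. This is a sketch rather than the exact code from the repository: createHandler and launchBrowser are hypothetical names, the browser launcher is passed in so the flow can be exercised without Chromium, and the 400 and 502 responses are my own additions. It also closes the browser in a finally block so a failed page load does not leave Chromium running.

```javascript
// Hypothetical factory: `launchBrowser` is any async function that resolves to
// a Puppeteer-compatible browser object.
function createHandler(launchBrowser) {
  return async function handler(event) {
    const url = event.queryStringParameters && event.queryStringParameters.url;
    if (!url) {
      return {
        statusCode: 400,
        headers: { 'Content-Type': 'text/plain' },
        body: 'Missing "url" query parameter',
      };
    }

    let browser;
    try {
      browser = await launchBrowser();
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'networkidle0' });
      const buffer = await page.pdf({ scale: 1, displayHeaderFooter: false });
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/pdf' },
        body: Buffer.from(buffer).toString('base64'),
        isBase64Encoded: true,
      };
    } catch (err) {
      console.error('Rendering failed:', err);
      return {
        statusCode: 502,
        headers: { 'Content-Type': 'text/plain' },
        body: 'Could not render the requested page',
      };
    } finally {
      // Always release Chromium, even when navigation or rendering fails.
      if (browser) {
        await browser.close();
      }
    }
  };
}

// In the Lambda itself, the launcher would wrap the launch call shown earlier:
// const chromium = require('chrome-aws-lambda');
// exports.handler = createHandler(async () => chromium.puppeteer.launch({
//   args: chromium.args,
//   headless: true,
//   ignoreHTTPSErrors: true,
//   defaultViewport: chromium.defaultViewport,
//   executablePath: await chromium.executablePath,
// }));
```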

Now, let’s deploy this using CDK.

Install the CDK CLI if it’s not already installed.

npm install -g aws-cdk

Initialize a new CDK project.

cdk init app --language javascript

And let’s build our stack.

We will make use of the container image support in Lambda. Chromium and Puppeteer are pretty heavy and will exceed the 50 MB size limit for zipped Lambda deployment packages. Container images can be up to 10 GB, and building one is straightforward.

Let’s define the Dockerfile.

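# AWS-provided base image for Node.js 14 Lambda functions
FROM public.ecr.aws/lambda/nodejs:14

# Copy the handler and dependency manifests into the task root.
# Docker expects a trailing slash on the destination when copying multiple files.
COPY website-to-pdf-function.js package.json package-lock.json ${LAMBDA_TASK_ROOT}/

# Install the dependencies inside the image
RUN npm install

# Handler to invoke, in the form <file name>.<exported function>
CMD [ "website-to-pdf-function.handler" ]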

We will also need to bump up the memory size to make sure that Puppeteer is able to load the webpage and generate PDFs correctly. To be safe, let’s use 512 MB. The execution timeout is set to 4 minutes, but any reasonable time can be set instead.

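// At the top of the stack file (assuming CDK v2, i.e. aws-cdk-lib)
const cdk = require('aws-cdk-lib');
const lambda = require('aws-cdk-lib/aws-lambda');
const path = require('path');

// Inside the stack's constructor
const websiteToPDFFunction = new lambda.DockerImageFunction(this, 'websiteToPDFFunction', {
  functionName: 'websiteToPDFFunction',
  timeout: cdk.Duration.minutes(4),
  memorySize: 512,
  code: lambda.DockerImageCode.fromImageAsset(path.join(__dirname, '../functions')),
});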

This will build the Dockerfile in the ../functions directory, create an ECR repository, push the image there, and create a Lambda function for you!

The next step is to create a function URL for the lambda, which is pretty easy too.

const websiteToPDFFunctionURL = websiteToPDFFunction.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.NONE,
});
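
With the auth type set to NONE, anyone who knows the URL can invoke the function. If you want to roll your own authentication, a very simple option is a shared secret sent in a request header. The sketch below is only an illustration of that idea: the x-api-key header name, the isAuthorized helper, and the API_KEY environment variable are all assumptions, not something the function URL feature provides.

```javascript
const crypto = require('crypto');

// Hypothetical check: compares the x-api-key request header to an expected secret.
function isAuthorized(event, expectedKey) {
  const headers = event.headers || {};
  // Look the header up case-insensitively to be safe.
  const name = Object.keys(headers).find((h) => h.toLowerCase() === 'x-api-key');
  if (!name || !expectedKey) {
    return false;
  }
  // Hash both values so the buffers have equal length, then compare them
  // in constant time to avoid leaking information through timing.
  const digest = (value) => crypto.createHash('sha256').update(String(value)).digest();
  return crypto.timingSafeEqual(digest(headers[name]), digest(expectedKey));
}

// Inside the handler, before doing any work (API_KEY is an assumed variable):
// if (!isAuthorized(event, process.env.API_KEY)) {
//   return { statusCode: 401, body: 'Unauthorized' };
// }
console.log(isAuthorized({ headers: { 'x-api-key': 'secret' } }, 'secret'));
```

For anything beyond a quick experiment, switching the auth type to IAM is likely the safer choice.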

And let’s output the created URL.

new cdk.CfnOutput(this, 'websiteToPDFFunctionURL', {
  value: websiteToPDFFunctionURL.url,
  description: 'Website to PDF Function URL',
});

Let’s synthesize the CloudFormation templates and deploy them.

cdk synth
cdk deploy

After the deployment is completed, you will see the function URL in the output.

[100%] success: Published d1f6fc9b1385b546ac0b10a0af983b0b4a336d97f4231808d03af3a5b489aecc:current_account-current_region
WebsiteToPDFStack: creating CloudFormation changeset...

 ✅  WebsiteToPDFStack

✨  Deployment time: 290.84s

Outputs:
WebsiteToPDFStack.websiteToPDFFunctionURL = https://unique-id.lambda-url.ca-central-1.on.aws/
Stack ARN:
arn:aws:cloudformation:ca-central-1:1234567890:stack/WebsiteToPDFStack/777d7f40-ca15-11ec-8cd9-0aaad63835aa

✨  Total time: 292.45s

It’s testing time! Append a url query parameter to the function URL, set to the website that you want to convert to PDF.

https://unique-id.lambda-url.ca-central-1.on.aws/?url=https://jobinbasani.com/
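
If the page you want to convert has its own query string, it needs to be percent-encoded, otherwise its parameters get mixed up with the ones meant for the function. Here is a small sketch that builds the request URL with Node's built-in URL class. The helper name buildRequestUrl and the example.com address are made up for illustration.

```javascript
// Hypothetical helper: returns the function URL with the target page attached
// as a properly encoded `url` query parameter.
function buildRequestUrl(functionUrl, targetUrl) {
  const requestUrl = new URL(functionUrl);
  // searchParams.set percent-encodes characters such as ?, & and = in the value.
  requestUrl.searchParams.set('url', targetUrl);
  return requestUrl.toString();
}

const requestUrl = buildRequestUrl(
  'https://unique-id.lambda-url.ca-central-1.on.aws/',
  'https://example.com/search?q=lambda&page=2' // made-up target page
);
console.log(requestUrl);

// Decoding the parameter gives back the original address.
console.log(new URL(requestUrl).searchParams.get('url'));
```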

Depending on how big the site is, this may take a moment, after which you should be able to access a PDF version of the website that you requested.

In Chrome on my desktop, it opens a PDF version of the website nicely 🙂

jobinbasani.com rendered as PDF

After testing, destroy the stack.

cdk destroy

The full source code is available at https://github.com/jobinbasani/aws-lambda-website-to-pdf
