There are quite a few cases in which we’d like to be able to output a dynamic PDF (invoices, statements, receipts, etc.). However, our experience has been that working with PDF templates and editors is fairly painful. We would rather work with the tools we’re familiar with: HTML and CSS. As such, we needed a mechanism that takes HTML as input and returns a PDF as output.

Creating the HTML to PDF Lambda

Creating a custom Lambda is fairly straightforward. You’ll need to log into your AWS account, head over to Services > Compute > Lambda, and click “Create a Lambda function”. You’ll be asked to select a blueprint, and you’ll choose blank. Next it will ask you to configure a trigger, but let’s skip that for now. The next section allows you to configure your function, and has a ton of options. We really only care about a handful of these.

lambda function configuration

Name, description, and runtime are obvious. The handler value index.handler tells Lambda to look for an exported handler function in an index.js at the root of the project. The only value we need to change here is the IAM role: we’ve created lambda_s3_exec_role, which will let our lambda read from and write to S3, which we’ll need for our transformation.
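For reference, a minimal policy for that role might look like the following sketch (the bucket name is a placeholder, and in practice the role will also need the basic Lambda logging permissions):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```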

Unfortunately, since we have external dependencies, we aren’t able to use the inline code entry type; we’ll have to package up our code and upload it.

Create the Project

mkdir html-to-pdf
cd html-to-pdf
touch index.js
vim index.js

Let’s take a run at a first implementation. To start, we’re going to read the HTML as base64 from the event, so that we can test it from the configuration screen directly (extended from the example code here).

var wkhtmltopdf = require('wkhtmltopdf');
var MemoryStream = require('memorystream');
var AWS = require('aws-sdk');

// Let the wkhtmltopdf wrapper find the binary we bundle at the root of the zip
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'];

exports.handler = function(event, context) {
  var memStream = new MemoryStream();
  var html_utf8 = Buffer.from(event.html_base64, 'base64').toString('utf8');

  wkhtmltopdf(html_utf8, event.options, function(code, signal) {
    var pdf = memStream.read();
    var s3 = new AWS.S3();

    var params = {
      Bucket : "my-bucket",
      Key : "test.pdf",
      Body : pdf
    };

    s3.putObject(params, function(err, data) {
      if (err) {
        context.fail(err);
      } else {
        context.done(null, { pdf_base64: pdf.toString('base64') });
      }
    });
  }).pipe(memStream);
};

We need to add our dependencies to the project; we’ll do that with yarn add wkhtmltopdf memorystream aws-sdk. Note that the wkhtmltopdf package is just a Node wrapper that shells out to the wkhtmltopdf binary, which isn’t available on Lambda, so we also need to include a static Linux build of the binary at the root of our package. That’s why we appended LAMBDA_TASK_ROOT to the PATH above.

Then we can zip the project and we’ll be ready to upload: zip --exclude \*.git\* -r release.zip .

Test the Project

We’ll start by configuring a test event.

configure test event

The base64 string PGJvZHk+SGVsbG8gd29ybGQ8L2JvZHk+ corresponds to <body>Hello world</body>, so we’ll add it to our test event.

test-event
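If you want to generate the payload for your own HTML, a quick Node one-off does it (this just builds the test event shown above):

```javascript
// Build a test event containing our HTML, base64-encoded
var html = '<body>Hello world</body>';
var testEvent = { html_base64: Buffer.from(html, 'utf8').toString('base64') };
console.log(JSON.stringify(testEvent));
// {"html_base64":"PGJvZHk+SGVsbG8gd29ybGQ8L2JvZHk+"}
```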

Go ahead and click Save and Test and you should see “Execution result: succeeded” with the following output:

{
  "pdf_base64": "JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7..."
}

If you check your S3 bucket, you should also see a test.pdf that looks like this:

hello world pdf

Add a Trigger

Adding a trigger is really straightforward. Let’s go to the Triggers tab and click “Add trigger”. We want it to fire when an object is created, and only for .html files.

lambda s3 trigger
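Behind the scenes, the console writes a notification configuration on the bucket; expressed as JSON it looks roughly like this sketch (the function ARN is a placeholder for your own):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:html-to-pdf",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [{ "Name": "suffix", "Value": ".html" }]
        }
      }
    }
  ]
}
```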

Finished HTML to PDF converter

Let’s fast forward a bit. I’ve added the ability to read the HTML file from a bucket and write the corresponding PDF back to the same bucket. The file and bucket names can be read from the trigger event. We can now upload an HTML file to the bucket and a corresponding PDF will be created.

var wkhtmltopdf = require('wkhtmltopdf');
var MemoryStream = require('memorystream');
var AWS = require('aws-sdk');

// Let the wkhtmltopdf wrapper find the binary we bundle at the root of the zip
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'];

// Render an HTML string to a PDF and hand the resulting buffer to the callback
var convertToPdf = function(htmlUtf8, event, callback) {
  var memStream = new MemoryStream();
  wkhtmltopdf(htmlUtf8, event.options, function(code, signal) {
    callback(memStream.read());
  }).pipe(memStream);
};

exports.handler = function(event, context) {
  if(event.Records) {
    var bucketName = event.Records[0].s3.bucket.name;
    var fileName = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

    var params = {
      Bucket: bucketName,
      Key: fileName
    };

    var s3 = new AWS.S3();
    s3.getObject(params, function(err, data) {
      if (err) { return context.fail(err); }

      var htmlUtf8 = data.Body.toString('utf8');
      convertToPdf(htmlUtf8, event, function(pdf) {
        params.Body = pdf;
        params.Key = params.Key + ".pdf";
        s3.putObject(params, function(err, data) {
          if (err) { return context.fail(err); }
          context.done(null, { pdf_base64: pdf.toString('base64') });
        });
      });
    });
  }
};
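For reference, the event the S3 trigger delivers is shaped roughly like the sketch below (abridged to the fields the handler reads). Note the key-decoding dance: S3 URL-encodes object keys and represents spaces as plus signs.

```javascript
// Abridged sketch of an S3 put event, with only the fields our handler reads
var sampleEvent = {
  Records: [{
    s3: {
      bucket: { name: "my-bucket" },
      object: { key: "my+report.html" } // spaces arrive as '+'
    }
  }]
};

// The same decoding the handler performs
var fileName = decodeURIComponent(sampleEvent.Records[0].s3.object.key.replace(/\+/g, " "));
console.log(fileName); // my report.html
```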

Creating the URL to HTML Lambda

The first lambda works great, and is actually all we’re using for our original use case. However, uploading HTML files to S3 is a little clunky. It would be nice if we could provide a URL and have the HTML automatically fetched and uploaded to our bucket.

Creating the project and the Lambda is almost exactly the same as for the previous lambda (same handler and role), but this time we won’t add a trigger. We’ll start out testing from within the lambda configuration area, and later set up an API Gateway to call it.

Here’s what our code is going to look like:

var http = require('http');
var AWS = require('aws-sdk');

exports.handler = function(event, context) {
  var options = {
    host: event.host,
    path: event.path
  };

  var req = http.request(options, function (res) {
    var data = '';
  
    res.on('data', function (chunk) {
      data += chunk;
    });

    res.on('end', function () {
      var params = {
        Bucket: event.bucket,
        Key: event.fileName,
        Body: data
      };

      var s3 = new AWS.S3();
      s3.putObject(params, function(err, data) {
        if (err) {
          context.fail(err)
        } else {
          context.succeed(params);
        }
      })
    });
  });

  req.on('error', function (e) {
    context.fail(e.message);
  });

  req.end();
};

Let’s test it with:

{
  "host": "www.google.com",
  "path": "/",
  "fileName": "test-file.html",
  "bucket": "my-bucket"
}

You should see the following output:

{
  "Bucket": "my-bucket",
  "Key": "test-file.html",
  "Body": "..."
}

And the HTML and PDF files should have been created:

test files

Putting it all together behind an API Gateway

Select Services > Amazon API Gateway and click Create API. Select the New API radio button and give your API a name.

create api

Next we want to add a named resource (from the Actions dropdown).

create resource

And a POST method, pointed at your lambda’s region and function.

post method

Deploy your API.

deploy api

If you click on the POST method under your develop stage, you’ll see the URL you can use to test the API.

develop stage post method

We can throw this in Postman and see what happens.

postman

Success! Let’s check our S3 bucket to make sure it made it there as well.

We can also change the path, like so:

s3 bucket

Limitations

Since we’re using the http library to request the page from an AWS machine somewhere, there are a few limitations. For this project, the pages we’re interested in are being served on an internal network, which is not accessible publicly. However, even if it were public, it is still behind a login and we would need a way to create the request with a valid session token (essentially an intentional CSRF attack).

For now, we’re rendering a partial to a string and sending that over to our S3 bucket, which is set up to trigger the html-to-pdf Lambda. We then read from the bucket (with a retry mechanism, since the Lambda won’t have finished instantly) and stream the PDF bytes to the client.
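The retry read can be sketched with a small generic helper (names here are illustrative; in our app this wraps the equivalent of s3.getObject on the generated .pdf key):

```javascript
// Call fn until it succeeds or we run out of attempts, waiting between tries
function withRetries(fn, attempts, delayMs, callback) {
  fn(function(err, result) {
    if (!err) { return callback(null, result); }
    if (attempts <= 1) { return callback(err); }
    setTimeout(function() {
      withRetries(fn, attempts - 1, delayMs, callback);
    }, delayMs);
  });
}

// Stubbed example: the read fails with "NoSuchKey" twice, then the pdf shows up
var calls = 0;
withRetries(function(done) {
  calls += 1;
  done(calls < 3 ? new Error('NoSuchKey') : null, 'pdf bytes');
}, 5, 10, function(err, pdf) {
  console.log(pdf); // pdf bytes
});
```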