Convert PDFs to images on AWS Lambda using Python without pdf.la

Craig Chen
Oct 7, 2019

In this post, I'm going to share how I convert a PDF file into an image in an AWS Lambda function and how I deal with the pdf.la: file not found error.

In the previous AWS Lambda execution environment, it was very easy to convert a PDF into images using the Python Wand library. Wand is a Python binding to the ImageMagick API, so it relies on ImageMagick to manipulate files.
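For context, a Wand-based conversion usually looks something like the sketch below. This is not my original code, only a minimal illustration of the approach; the function name and its arguments are made up for the example.

```python
def pdf_to_jpeg_with_wand(pdf_path, jpeg_path, resolution=150):
    """Rasterize a PDF with Wand (ImageMagick) and save it as a JPEG."""
    # Imported lazily: importing Wand fails if the ImageMagick
    # shared library is not installed on the machine.
    from wand.image import Image

    # ImageMagick's PDF coder does the actual work here, which is
    # why a missing pdf.la breaks this call.
    with Image(filename=pdf_path, resolution=resolution) as img:
        img.format = "jpeg"
        img.save(filename=jpeg_path)
```

Calling `pdf_to_jpeg_with_wand("input.pdf", "output.jpg")` (hypothetical file names) is all it takes when ImageMagick can read PDFs.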

However, this is not a happy ending.

The story has just begun

Things changed when AWS updated the Lambda environment to the 2018.03 release of Amazon Linux. This upgrade affected some core system libraries, including pdf.la, a file that ImageMagick relies on to process PDF files. As a result, ImageMagick fails to convert any PDF file to an image. The Python Wand package depends heavily on ImageMagick, so PDF-to-image conversion with Wand stopped working as well. The error looked like this:

/usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/pdf.la': file not found

I have tried multiple ways to rebuild ImageMagick on AWS Lambda, but I still have not found one that works. If anyone has a solution for running Wand on AWS Lambda, feel free to share.

Workaround: pdf2image

pdf2image is a package that uses pdftoppm and pdftocairo (both part of poppler-utils) to convert a PDF into PIL Image objects. Unfortunately, pdftoppm and pdftocairo are not part of the default software on AWS Lambda, so we need to install them manually.

If you want to know how to install third-party command line software on AWS Lambda, please visit my post: How to install third-party command line software on AWS Lambda using Python Chalice. In the following steps, I will assume that both pdftoppm and pdftocairo are ready.
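Before calling pdf2image, it can help to confirm that the two binaries are actually reachable. The sketch below uses only the standard library. It assumes the binaries were shipped in a `bin` folder next to the handler, which is a hypothetical layout; `prepend_to_path` and `missing_poppler_tools` are helper names I made up for this example.

```python
import os
import shutil

POPPLER_TOOLS = ("pdftoppm", "pdftocairo")


def prepend_to_path(bin_dir, environ=os.environ):
    """Put bin_dir at the front of PATH so bundled binaries are found first."""
    current = environ.get("PATH", "")
    # Drop empty entries and any earlier copy of bin_dir to avoid duplicates.
    parts = [p for p in current.split(os.pathsep) if p and p != bin_dir]
    environ["PATH"] = os.pathsep.join([bin_dir] + parts)
    return environ["PATH"]


def missing_poppler_tools(search_path=None):
    """Return the poppler tools not found on search_path (default: PATH)."""
    return [
        tool for tool in POPPLER_TOOLS
        if shutil.which(tool, path=search_path) is None
    ]


if __name__ == "__main__":
    # LAMBDA_TASK_ROOT is set by Lambda to the deployed code directory;
    # the "bin" subfolder is an assumption about how the package was built.
    task_root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    prepend_to_path(os.path.join(task_root, "bin"))
    print("Missing tools:", missing_poppler_tools())
```

An empty list means both tools were found and pdf2image should be able to call them.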

My code

My code downloads a sample PDF file from W3C, reads the PDF file, passes its bytes to the convert_from_bytes function, stores the result as a .jpg file, and returns that .jpg file.
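The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact code from my project: the sample URL, the function names, the choice to render only the first page, and the API Gateway style response are all illustrative. The `convert` argument is a hook added for testing, and pdf2image (with Pillow) must be included in the deployment package.

```python
import base64
import io
import urllib.request

# Assumed sample file; the post only says the PDF came from W3C.
SAMPLE_PDF_URL = (
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
)


def pdf_bytes_to_jpeg(pdf_bytes, dpi=200, convert=None):
    """Render the first page of a PDF (given as bytes) to JPEG bytes."""
    if convert is None:
        # Imported lazily; needs pdftoppm/pdftocairo available at runtime.
        from pdf2image import convert_from_bytes as convert

    pages = convert(pdf_bytes, dpi=dpi)  # list of PIL Image objects
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so normalize the mode before saving.
    pages[0].convert("RGB").save(buffer, format="JPEG")
    return buffer.getvalue()


def handler(event, context):
    """Hypothetical Lambda entry point returning the image as base64."""
    with urllib.request.urlopen(SAMPLE_PDF_URL) as response:
        pdf_bytes = response.read()

    jpeg_bytes = pdf_bytes_to_jpeg(pdf_bytes)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "image/jpeg"},
        "isBase64Encoded": True,
        "body": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
```

Keeping the conversion in its own function makes it easy to try locally with any PDF bytes before deploying.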

Result

It’s working 🎉! We’ve successfully fetched a .jpg image file.

Thanks for reading!

Craig Chen

Quantitative Analyst / Data Scientist
