Convert PDFs to images on AWS Lambda using Python without pdf.la
In this post, I'll share how I convert a PDF file into an image in an AWS Lambda function and how I deal with the
pdf.la: file not found error.
In the “previous” AWS Lambda environment, it was very easy to convert a PDF into images using the Python
Wand library.
Wand is a Python binding for the ImageMagick API, so the actual file manipulation is done by ImageMagick.
However, this is not a happy ending.
The story just began
Things changed when AWS updated the Lambda environment to the 2018.03 version of Amazon Linux. This upgrade affected some core system libraries, including
pdf.la, a module that ImageMagick relies on to process PDF files. As a result, ImageMagick fails to convert any PDF file to an image. The Python
wand package depends heavily on ImageMagick, so PDF-to-image conversion with Wand stopped working as well. The error looked like this:
/usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/pdf.la': file not found
I tried multiple ways to rebuild ImageMagick for AWS Lambda, but I still have not found a way to make it work. If anyone has a solution for running
wand on AWS Lambda, feel free to share.
Workaround: pdf2image
pdf2image is a package that uses pdftoppm and pdftocairo (both part of poppler-utils) to convert a PDF into PIL Image objects. Unfortunately, pdftoppm and pdftocairo are not part of the default software on AWS Lambda, so we need to install them manually.
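Before wiring up the conversion, it can help to confirm that the function can actually see the two binaries. Here is a minimal sketch using only the standard library; the helper name `missing_poppler_tools` and its parameters are made up for illustration.

```python
import shutil

# The two poppler-utils binaries that pdf2image shells out to.
POPPLER_TOOLS = ("pdftoppm", "pdftocairo")


def missing_poppler_tools(tools=POPPLER_TOOLS, search_path=None):
    """Return the names in `tools` that cannot be found.

    `search_path` is an optional PATH-style string; when it is None,
    shutil.which falls back to the PATH environment variable.
    """
    return [name for name in tools if shutil.which(name, path=search_path) is None]
```

Calling `missing_poppler_tools()` at the top of a handler and logging the result is a cheap way to tell "binary not deployed" apart from an actual conversion failure.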
If you want to know how to install third-party command-line software on AWS Lambda, please visit my post: How to install third-party command line software on AWS Lambda using Python Chalice. In the following steps, I will assume that both pdftoppm and pdftocairo are ready.
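With both binaries in place, the conversion step itself is short. What follows is a minimal sketch under a few assumptions: the PDF has already been downloaded to local disk, output goes to `/tmp` (the writable directory on Lambda), and the helper names `jpeg_path` and `pdf_to_jpegs` are hypothetical. `convert_from_path` and its `dpi` and `poppler_path` parameters come from pdf2image; pass `poppler_path` only if the binaries live in a directory that is not on `PATH`.

```python
import os


def jpeg_path(pdf_path, page_number, out_dir="/tmp"):
    """Build an output path such as /tmp/report-1.jpg for page 1 of report.pdf."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(out_dir, f"{stem}-{page_number}.jpg")


def pdf_to_jpegs(pdf_path, out_dir="/tmp", dpi=200, poppler_path=None):
    """Convert every page of a PDF to a JPEG file and return the file paths."""
    # Imported here so the path helper above works even without pdf2image installed.
    from pdf2image import convert_from_path

    # convert_from_path returns a list with one PIL Image per page.
    pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_path)

    saved = []
    for number, page in enumerate(pages, start=1):
        target = jpeg_path(pdf_path, number, out_dir)
        # JPEG has no alpha channel, so normalise the mode before saving.
        page.convert("RGB").save(target, "JPEG")
        saved.append(target)
    return saved
```

A Lambda handler would typically download the PDF to `/tmp`, call `pdf_to_jpegs("/tmp/input.pdf")`, and then upload the returned files wherever they need to go.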
It’s working 🎉! We’ve successfully fetched a .jpg image file.
Thanks for reading!