Convert PDFs to images on AWS Lambda using Python without pdf.la
--
In this post, I'll share how I convert a PDF file into an image in an AWS Lambda function, and how I dealt with the pdf.la: file not found error.
In the “previous” AWS Lambda execution environment, it was very easy to convert PDFs into images using the Python Wand library. Wand is a Python binding to the ImageMagick API for manipulating images.
However, this is not a happy ending.
The story just began
Things changed when AWS updated the Lambda environment to the 2018.03 release of Amazon Linux. The upgrade affected some system files, including pdf.la, a module that ImageMagick relies on to process PDF files. As a result, ImageMagick could no longer convert any PDF file to an image. The Python wand package depends heavily on ImageMagick, so its PDF-to-image conversion stopped working as well. The error looked like this:
/usr/lib64/ImageMagick-6.7.8/modules-Q16/coders/pdf.la': file not found
I tried multiple ways to rebuild ImageMagick on AWS Lambda, but I still could not find a way to make it work. If anyone has a solution for running wand on AWS Lambda, feel free to share.
Workaround: pdf2image
pdf2image is a package that uses pdftoppm and pdftocairo (parts of poppler-utils) to convert PDFs to PIL Image objects. Unfortunately, pdftoppm and pdftocairo are not part of the default software on AWS Lambda, so we need to install them manually.
If you want to know how to install third-party command line software on AWS Lambda, please visit my post: How to install third-party command line software on AWS Lambda using Python Chalice. In the following steps, I will assume that both pdftoppm and pdftocairo are ready.
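Before calling pdf2image, it can help to confirm that the two poppler tools are actually reachable. The snippet below is a small sketch of such a check. The helper name missing_tools is my own and not part of pdf2image, and it assumes the binaries are expected on the PATH of the Lambda process.

```python
import shutil

# The two poppler-utils binaries that pdf2image shells out to.
REQUIRED_TOOLS = ("pdftoppm", "pdftocairo")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration only.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("pdftoppm and pdftocairo are available")
```

If the binaries are bundled somewhere that is not on PATH, pdf2image's conversion functions also accept a poppler_path argument pointing at the directory that contains them.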
My code
In my code, I download a sample PDF file from W3C, read it, pass it to the convert_from_bytes function, store the result as a .jpg file, and return that .jpg file.
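A minimal sketch of those steps might look like the following. It is not the exact code from my project. The sample URL, the function names pdf_to_jpeg_bytes and handler, and the API Gateway style response are all assumptions made for illustration, and it only renders the first page. It also assumes pdf2image, Pillow, pdftoppm and pdftocairo are available in the Lambda package.

```python
import base64
import io
import urllib.request

# Assumed sample file; the post only says the PDF came from W3C.
SAMPLE_PDF_URL = (
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
)


def pdf_to_jpeg_bytes(pdf_bytes, convert=None):
    """Render the first page of a PDF and return it as JPEG bytes.

    `convert` defaults to pdf2image.convert_from_bytes; it is a parameter
    only so the function can be exercised without poppler installed.
    """
    if convert is None:
        # Imported lazily so this module still loads where pdf2image is absent.
        from pdf2image import convert_from_bytes as convert
    pages = convert(pdf_bytes, first_page=1, last_page=1)
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so normalise the page to RGB before saving.
    pages[0].convert("RGB").save(buffer, format="JPEG")
    return buffer.getvalue()


def handler(event, context):
    """Hypothetical Lambda entry point that returns the image as a response."""
    with urllib.request.urlopen(SAMPLE_PDF_URL) as response:
        pdf_bytes = response.read()
    jpeg = pdf_to_jpeg_bytes(pdf_bytes)
    # Binary bodies are base64-encoded in the Lambda proxy response format.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "image/jpeg"},
        "body": base64.b64encode(jpeg).decode("ascii"),
        "isBase64Encoded": True,
    }
```

Keeping the conversion in its own function separates the pdf2image call from the download and the response handling, which makes it easier to swap in a different trigger such as an S3 event.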
Result
It’s working 🎉! We’ve successfully fetched a .jpg image file.
Thanks for reading!