This is part one of a two-part post on how to transfer data from Redshift to an SFTP server, using S3 as transient storage and Lambda as the processing function. Part two will deal with the S3/Lambda configuration and permissions (IAM). This first part deals with how to package and deploy a Lambda function, written in Python3, with some complex platform-specific dependencies (crypto).
AWS Lambda is Amazon's 'serverless', or Function-as-a-Service, product. They were first out of the gate, but Google has Cloud Functions and Microsoft has Azure Functions, so everyone's at it - they are officially a thing. The concept is pretty simple - write a small single-purpose function (in the language of your choice - they all support a broad range of environments), deploy it to the relevant platform, and configure it to fire on a specified event.
We chose AWS because it suited our purpose best - we had a requirement to push a data extract from Amazon Redshift (our data warehouse) to a remote SFTP server, and the recommended method for extracting Redshift data is to use the UNLOAD command, which will publish an extract file to S3. The bit we had to write was how to get the file from S3 to the SFTP server, and to do this on receipt of the extract. It's the perfect Lambda scenario. All we had to do was write the function and deploy it.
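For reference, a typical UNLOAD looks something like this - the table, bucket, and role names here are placeholders, not our actual setup:

```sql
-- export the query results as gzipped CSV parts under the given S3 prefix
UNLOAD ('SELECT * FROM my_schema.my_extract')
TO 's3://my-bucket/extracts/my_extract_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-unload-role'
DELIMITER ',' GZIP;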
Packaging Python3 Functions
1. Write a python function
At its simplest, a Lambda function is literally just that - a function. We chose Python as our runtime environment, so all we had to do was write a function with the correct signature:
def my_lambda_function(event, context):
If the function were this simple, you could write it directly into the Lambda console; however, most functions will rely on some external dependencies, and if that's the case, you'll need to create a function package.
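As a concrete (if simplified) sketch - the nested field names below follow the documented S3 event structure, but the handler body is illustrative, not our production code:

```python
def my_lambda_function(event, context):
    """Pull the bucket and key out of an S3 'ObjectCreated' event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # a real handler would now fetch s3://bucket/key and push it on to SFTP
    return "received s3://{}/{}".format(bucket, key)
```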
2. Add dependencies
To package up a function that has external dependencies, you need to create a zip archive that contains the dependencies and your function module. The Amazon docs are pretty good on how to do this (http://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html) - follow them through and you'll have a package.zip to upload in no time. The following Makefile will do this for you:
package:
	# install dependencies into package directory
	pip install -r requirements.txt -t package
	# copy in your lambda .py source file(s)
	cp my_script.py package/
	# zip up the entire directory into package.zip
	cd package; zip -r ../package.zip .
	# always clean up after yourself
	rm -rf package/
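The requirements.txt referenced above lists your function's dependencies, one per line. For an S3-to-SFTP function like ours it might look something like this - the specific libraries are illustrative, not a prescription, though paramiko is exactly the kind of crypto-heavy dependency that causes the packaging problems described below:

```
# illustrative only - list whatever your function actually imports
boto3       # AWS SDK, for reading the extract from S3
paramiko    # SSH/SFTP client (pulls in C crypto dependencies)
```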
This works very well so long as your dependencies have no C extensions or platform-specific installs - if they do (e.g. image processing, crypto, etc.), then you'll find the function will run fine locally, but fail when uploaded to AWS, because AWS isn't running macOS (/Ubuntu/Windows/...) - it's running amazonlinux.
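A quick way to tell whether you're affected is to look for compiled extension modules (.so files) in the package directory the pip install created - any hits mean the build is platform-specific. This little helper is our own sketch, not part of any AWS tooling:

```python
import pathlib


def native_extensions(package_dir):
    """List compiled extension modules (.so files) under package_dir.

    Any results mean the package contains platform-specific builds,
    so it must be built inside an amazonlinux environment rather
    than on your local machine.
    """
    return sorted(str(p) for p in pathlib.Path(package_dir).rglob("*.so"))
```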
3. Add platform-specific dependencies
In order to create a package that contains the right builds for amazonlinux, you need to run the pip install from within an amazonlinux environment. Fortunately Amazon makes available a range of Docker images to make this pretty easy. Unfortunately, the default image does not come in a Python3 variant, so you'll need to install that yourself (as well as zip, so you can create the archive).
4. Add Python3 support
Although the docs will tell you that there isn't any formal support for Python3 yet, it turns out that there is a distribution available, and installing it is pretty simple. This Dockerfile will install python3 and zip (so we can zip up the package):
FROM amazonlinux:latest

# Install python3.6 and zip
RUN yum install -y python36 zip

# This is the mount location for the Lambda function directory
WORKDIR /lambda

# Default entrypoint / command is to package the function
CMD ["make"]
Using this Dockerfile you can build an image that can be used to package up your application for deployment to Lambda. You need to run the make command from within the container, thus:
# build the new image
docker build -t lambda-packager .
# run a single-use container off the image to create the package
docker run --volume $(pwd):/lambda lambda-packager
This should produce a single package.zip archive that contains your script together with all the dependencies required, built for the amazonlinux environment. Upload this to AWS, and test it out.
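If you'd rather not click through the console, the upload can be scripted with the AWS CLI - the function name and role ARN below are placeholders for whatever you called yours:

```shell
# create the function the first time round
aws lambda create-function \
    --function-name my-s3-to-sftp \
    --runtime python3.6 \
    --handler my_script.my_lambda_function \
    --role arn:aws:iam::123456789012:role/my-lambda-role \
    --zip-file fileb://package.zip

# ...and push subsequent builds
aws lambda update-function-code \
    --function-name my-s3-to-sftp \
    --zip-file fileb://package.zip
```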