Do More With AWS in Timescale: An AWS Lambda Tutorial Using SAM CLI
AWS Lambda Tutorial: Serverless Time-Series Applications With SAM CLI
In my last blog post about AWS Lambda, I briefly touched on how serverless and time-series applications go hand in hand. In this installment, I’d like to go more in-depth, explaining how to actually build a serverless time-series application on AWS Lambda and Timescale using the SAM CLI in the most developer-friendly way.
I created the code featured in this blog post for a video on building a serverless time-series application.
Why Lambda Is Great
I first discovered AWS Lambda functions in the summer of 2020 when I (like everybody else) was forced to stay home. The abundance of time at my disposal caused me to experiment with Lambda functions like there was no tomorrow, and I quickly learned the ins and outs of Lambda.
It didn’t take me long to discover that scaling was one of Lambda’s strongest features. I didn’t have to provision multiple expensive instances, didn’t have to configure scaling policies, or pay for expensive load balancers.
I could write most apps without giving concurrency a thought, and they would still scale relatively effortlessly. The only time I had to explicitly think about the distributed nature of my application was when ACID transactions or idempotency came into play, but most of that could be solved by using locks or mutexes.
An unexpected side effect of Lambda's seamless scaling was that you could also scale down very efficiently. Because my toy projects were getting, at most, a handful of requests per day, there was rarely a need for more than one concurrent Lambda function.
In fact, a majority of the time, none of my Lambda functions were running at all. This is called scaling to zero and is surprisingly tricky to achieve using traditional deployment methods. This is because the time it takes to start an EC2 instance is measured in seconds or even minutes, but the time it takes to kick off a Lambda function is measured in milliseconds.
So when the API Gateway receives a request, it can forward that request to a Lambda function that isn’t yet running. Granted, that initial request will take an additional 300 milliseconds, but I think that’s a worthy trade-off for the cost savings of not keeping a function idling. This is an especially welcome behavior for projects that lie dormant and unvisited for most of their life, as they cost me nothing to operate.
If you’ve ever used AWS, you know how well services integrate with each other. AWS Lambda is a prime example of this. Whether you’re using CloudFront, S3, or Kinesis, there are a plethora of ways you can use Lambda to enhance your experience. An easy way to see the sheer scale of Lambda’s integrations is by creating a function and clicking on Add Trigger. The enormous list you are presented with will undoubtedly give any build-happy engineer tons of incredible ideas for serverless time-series applications.
All these benefits and features caused Lambda to solidify its place in my engineering toolbox. But that doesn’t mean Lambda is free of shortcomings, and those mostly come in suboptimal user experiences.
Thankfully, there are a handful of tools to mitigate this pain point.
The AWS SAM CLI
The AWS Serverless Application Model (SAM) CLI is one of them. Allow me to explain.
One of the most difficult aspects of a serverless application is that, unlike a traditional API, each route has its own standalone function, codebase, and container or executable. This makes it so that deploying Lambda functions can be tedious and time-consuming if not done properly.
In my early Lambda days, I wrote complicated Terraform IaC files to compile Golang functions, compress the binary and upload it to AWS. However, this soon turned out to be less than ideal as it was brittle, slow, and didn’t allow local testing.
Because it was impossible to test your function without compiling, compressing, and uploading, the software development cycle took much longer than it should. The problem only worsened when my functions required other infrastructure, such as SQS queues, S3 buckets, and API Gateways.
In essence, the user experience was horrible until I learned about the SAM CLI. SAM is an open-source CLI tool that helps you write, test, and deploy your serverless applications in a much cleaner and more user-friendly way.
One of its core features allows you to describe your application infrastructure in a CloudFormation-esque YAML file. This includes all the required functions and their configuration, such as runtime type, memory capacity, and environment variables, as well as the events that should trigger each function. Centralizing your application's infrastructure as code in a single file creates a clear overview of what resources are being provisioned.
Another great feature of the SAM CLI is how it makes using containers super easy. Before I knew about the SAM CLI, I would manually create and manage Elastic Container Registry (ECR) repositories to build, tag, and upload my containers to the appropriate repository.
As you can imagine, this is quite tedious. Luckily, the SAM CLI does this for you based on the function’s configuration. Switching from zip files to containers comes with the added benefit of being able to test your Lambda functions locally with ease through the sam local start-api and sam local invoke commands, significantly improving the user experience and speed of development.
But more about this later.
Writing the Serverless Application
Let’s set the scene: we have a bunch of temperature sensors in every room of our house, and we’d like these sensors to automatically push their temperature to our Timescale database to visualize them in Grafana.
Unfortunately, our hypothetical sensors only speak HTTP, so inserting our readings directly into Timescale from our sensors is not an option. That’s why we’ll use an AWS HTTP API Gateway to receive our sensors’ messages. This API Gateway will have two routes, each with its own Lambda function attached.
You can find the source code to this project here.
PostSensorData
This is a POST route that allows our sensors to insert their readings into our Timescale instance. It parses some JSON from the request body and uses that data in a SQL insert statement.
GetSensorData
This is a GET route that returns the five latest readings in our database. Obviously, this is just an example of what a function like this can do. Alternatively, you could adapt this function to have some kind of query capabilities by adding request parameters or a JSON body.
Or, you could omit this function and query the database using SQL—this approach would work better in a dashboarding setting where visualizing HTTP responses could prove difficult.
Application Architecture
Both functions are written in Python using the psycopg2 library and are deployed using Docker containers. For the sake of time and simplicity, we have omitted authentication and input validation from both functions. Obviously, that makes this application unfit for production use, so deploy it with caution.
Now that we know what our application looks like from an architectural standpoint, let’s look at how to create, configure, and code the project. If you don’t want to follow along and would like to skip to a working prototype, you can find the complete project files in our GitHub repository.
We can create a boilerplate app by running sam init and selecting the following options:
- Hello World Example Template
- python3.7 runtime
- Image package type
You can get rid of the tests directory (for now) and all the __init__.py files. Then we want to rename the hello_world directory to get_sensor_data and make a copy named post_sensor_data. This should leave you with a directory structure that looks something like this:
├── README.md
├── events
│   └── event.json
├── get_sensor_data
│   ├── Dockerfile
│   ├── app.py
│   └── requirements.txt
├── post_sensor_data
│   ├── Dockerfile
│   ├── app.py
│   └── requirements.txt
└── template.yaml
You can see two separate functions, each containing its own code, Dockerfile, and pip requirements. The infrastructure as code and the configuration of our app and functions are described in the template.yaml file.
A very important part of the template.yaml file is the Globals record. Here, we've declared an environment variable called CONN_STRING, which holds our Timescale connection string. This environment variable will be used in both functions to create a connection to our instance.
Doing it this way isn’t very secure, as anyone with access to this file, the repository, or even the AWS Lambda dashboard can see the string and access your database. For a production deployment, it’s recommended to use a service like AWS Secrets Manager to store and retrieve credentials and connection strings.
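As a rough illustration of that recommendation, the init code could fetch the connection string from AWS Secrets Manager instead of reading it from the template. This is only a sketch under a few assumptions: the helper name `get_conn_string`, the secret layout (either a raw string or a JSON object with a `CONN_STRING` key), and the injectable `client` argument are all hypothetical and not part of the original project. It also assumes boto3 is installed in the container image and that the function's execution role is allowed to call `secretsmanager:GetSecretValue`.

```python
import json


def get_conn_string(secret_id, client=None):
    """Return the connection string stored under `secret_id` (hypothetical helper).

    `client` is injectable so the function can be exercised without AWS access;
    when omitted, a boto3 Secrets Manager client is created.
    """
    if client is None:
        import boto3  # assumed to be installed in the function's image
        client = boto3.client("secretsmanager")

    secret = client.get_secret_value(SecretId=secret_id)["SecretString"]
    try:
        # Assumption: the secret may be stored as JSON key/value pairs.
        return json.loads(secret)["CONN_STRING"]
    except (ValueError, KeyError, TypeError):
        # Otherwise treat the secret value as the raw connection string.
        return secret
```

Calling this once in the init section (rather than inside the handler) would keep the lookup out of the hot path, in the same way the database connection is only created on a cold start.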
In Globals, we also find the VpcConfig record. Here we can configure whether or not our Lambda function should be in a VPC and which security groups it should have. This is great if we want our Lambda function to be in a private subnet peered to our Timescale VPC.
This way, you run no risk of exposing your valuable time-series data to the public internet. Check out this blog post if you want to know how to set up VPC Peering with Timescale. If you don't want your Lambda functions to be in a VPC, you can remove these five lines.
Globals:
  Function:
    Timeout: 3
    MemorySize: 512
    Environment:
      Variables:
        CONN_STRING: postgres://tsdbadmin:password@hostname:5432/tsdb
    VpcConfig:
      SecurityGroupIds:
        - sg-0123456789
      SubnetIds:
        - subnet-0123456789
Further down the file, we can see the function definitions. These have some information about the package type, architecture, and events. An important part here is DockerContext under Metadata. This tells the SAM CLI where the code to our Lambda function is written, which correlates to the two folders in our project called get_sensor_data and post_sensor_data.
Another important one to mention is the API route declaration under Events. This tells the SAM CLI how to set up our API Gateway. Both our functions live in the /sensor path; the only difference is their respective request method.
So, depending on whether we send a GET or POST request, we'll use a different function. If you are following along, it’s important to duplicate the GetSensorDataFunction record and transform it to the PostSensorDataFunction, which thankfully only requires you to change the function name, event name, request method, and directory.
Resources:
  GetSensorDataFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      Architectures:
        - x86_64
      Events:
        GetSensorData:
          Type: HttpApi
          Properties:
            Path: /sensor
            Method: GET
            ApiId: !Ref ApiResource
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./get_sensor_data
      DockerTag: python3.9-v1
Then we configure the API Gateway resource. We set the type to AWS::Serverless::HttpApi, as the default API Gateway type is a REST API. Do keep in mind that this record is still part of the Resources record.
Lastly, we configure the output, which simply prints the HTTP URL of our API Gateway at the end of our deployment process. This will make it easier to test our application using a tool like curl or Postman without having to go digging in AWS dashboards.
  ApiResource:
    Type: AWS::Serverless::HttpApi
    Properties:
      StageName: dev
Outputs:
  SimpleLambdaApi:
    Description: "API Gateway endpoint URL for dev stage for sensor_data functions"
    Value: !Sub "https://${ApiResource}.execute-api.${AWS::Region}.amazonaws.com/dev/sensor"
Code
Now that we understand the infrastructure as code and configuration of our application, let's look at our application code. We’ll start with the PostSensorData function in post_sensor_data/app.py as it’s the most straightforward. Our function consists of two distinct stages: the init and the handler stages.
The init code gets executed when our Lambda function is called for the first time. It imports the appropriate libraries and creates a connection and cursor to our Timescale instance. When that connection is established, we create a hypertable called sensor_data.
If you're worried about the performance hit from trying to create a table that probably already exists, this is a good time to talk about how AWS Lambda executes code. Because unlike what you might think, this statement does not run every time we call our Lambda function.
When the Lambda function is called for the very first time, it will execute the Python script as usual, executing this query. This is called a cold start. When the Lambda function is called again, only the lambda_handler function is called.
At this stage, our function is warm and will stay warm as long as it's being used. After about 30 minutes of inactivity, our function goes cold again, and the cycle restarts. So, in reality, this table creation query will only execute once for as long as the function stays warm.
import json
import psycopg2
import os
conn = psycopg2.connect(os.environ["CONN_STRING"])
cursor = conn.cursor()
create_table_statement = """CREATE TABLE IF NOT EXISTS sensor_data (
time TIMESTAMPTZ,
location TEXT,
temperature DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time', if_not_exists => TRUE);
"""
cursor.execute(create_table_statement)
conn.commit()
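To make the cold/warm distinction a bit more tangible, here is a minimal sketch that is separate from the project code. The names `INIT_TIME` and `invocation_count` are made up for illustration. Everything at module scope stands in for the init stage, while the handler body stands in for the work done on every call.

```python
import time

# Module scope: runs once per cold start, like the connection setup above.
INIT_TIME = time.time()
invocation_count = 0


def lambda_handler(event, context):
    """Handler scope: runs on every invocation of a warm function."""
    global invocation_count
    invocation_count += 1
    return {
        "statusCode": 200,
        "initialized_at": INIT_TIME,      # unchanged while the function stays warm
        "invocation": invocation_count,   # increases with every call
    }
```

Calling the handler repeatedly in the same process mimics a warm function: the invocation number keeps climbing, while the initialization timestamp stays the same until the module is loaded again.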
As mentioned above, the code in the lambda_handler function gets executed every time the Lambda function is called. This function takes in an event, which is the API Gateway event, and contains details like the protocol, requested path, source IP, and the request body. An object called context is also passed in; it exposes information about the current invocation and execution environment, such as the request ID and the remaining execution time.
First, we parse the request body (a JSON string) into a Python dictionary. We extract the temperature and location from this dictionary, store them in their own variables, and print them out for good measure. This can be a tremendous help when debugging your application.
Afterward, we execute and commit an insert statement on our sensor_data table. To indicate success, we return a 200 status code. As you can see, we are not doing proper error handling or propagation for simplicity's sake. But if something were to go catastrophically wrong, the AWS Lambda runtime is smart enough to return a 500 status code, which indicates an internal server error.
def lambda_handler(event, context):
    body = json.loads(event["body"])
    temperature = body["temperature"]
    location = body["location"]
    print(temperature, location)
    cursor.execute("""INSERT INTO sensor_data
        (time, location, temperature) VALUES (NOW(), %s, %s);
        """, (location, temperature))
    conn.commit()
    return {
        "statusCode": 200
    }
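Since input validation is deliberately left out of the handler above, here is one possible way it could be added. This is a sketch, not the project's code: `parse_reading` and the `insert` argument are hypothetical names, with `insert` standing in for the INSERT and commit so the example runs without a database.

```python
import json


def parse_reading(event):
    """Extract (location, temperature) from the event or raise ValueError."""
    try:
        body = json.loads(event.get("body") or "")
    except ValueError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(body, dict):
        raise ValueError("body must be a JSON object")

    location = body.get("location")
    temperature = body.get("temperature")
    if not isinstance(location, str) or not location:
        raise ValueError("location must be a non-empty string")
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        raise ValueError("temperature must be a number")
    return location, float(temperature)


def lambda_handler(event, context, insert=None):
    """`insert` is a hypothetical callable standing in for the INSERT + commit."""
    try:
        location, temperature = parse_reading(event)
    except ValueError as err:
        # Reject malformed readings with a 400 instead of failing with a 500.
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
    if insert is not None:
        insert(location, temperature)
    return {"statusCode": 200}
```

Returning a 400 for bad input keeps genuine server-side failures distinguishable from sensors that send malformed readings.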
That’s all there is to it for the post_sensor_data function!
If you take a look at the get_sensor_data function’s code in the GitHub repository, you’ll see that the init part is exactly the same, so for now, we only have to worry about the handler.
We start by querying for the five most recent rows in the sensor_data table and creating a list variable to store the queried rows. Then, we iterate over the result of cursor.fetchall(), which returns the rows from your query as a list of tuples. In this loop, we create a dictionary with the time, location, and temperature and add it to our response list.
Last but not least, we convert that list to JSON as the body of the response.
def lambda_handler(event, context):
    cursor.execute("SELECT * FROM sensor_data ORDER BY time DESC LIMIT 5;")
    response = []
    rows = cursor.fetchall()
    for row in rows:
        response.append({
            # row[0] is a timezone-aware datetime; convert it to Unix epoch seconds
            "time": int(row[0].timestamp()),
            "location": row[1],
            "temperature": row[2]
        })
    return {
        "statusCode": 200,
        "body": json.dumps(response)
    }
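The idea mentioned earlier of adding query capabilities through request parameters could look roughly like this. It is a sketch under assumptions: the `limit` parameter name, the `parse_limit` helper, and the default and maximum values are all invented for illustration. It relies on the API Gateway event carrying a `queryStringParameters` entry.

```python
def parse_limit(event, default=5, maximum=100):
    """Read an optional `limit` query parameter, falling back to `default`."""
    # The key may be missing or None when no query string was sent.
    params = event.get("queryStringParameters") or {}
    try:
        limit = int(params.get("limit", default))
    except (TypeError, ValueError):
        return default
    # Clamp the value so a client cannot request an unbounded number of rows.
    return max(1, min(limit, maximum))
```

Inside the handler, the result could then be passed as a query parameter, for example `cursor.execute("SELECT * FROM sensor_data ORDER BY time DESC LIMIT %s;", (parse_limit(event),))`, rather than being formatted into the SQL string.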
That’s all the coding we need to do today! Next up is testing and deploying our application!
Building, Testing, and Deploying With SAM CLI
Before we can test our two functions, we need to build them. Go to the root of your project (the directory with the template.yaml file) and execute the following command:
sam build
This can take a while the first time around as the SAM CLI needs to build both containers. Once it’s done building, we can test our functions by using the sam local invoke command. To do this, we need to create an event that resembles the event passed along by the API Gateway to our lambda_handler function.
Create a file called post.json with the following content in the events directory.
{
  "body": "{\"temperature\":65,\"location\":\"bedroom\"}",
  "resource": "/sensor",
  "path": "/sensor",
  "httpMethod": "POST"
}
Do the same for events/get.json:
{
  "resource": "/sensor",
  "path": "/sensor",
  "httpMethod": "GET"
}
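Because the body field is a JSON string embedded in a JSON document, escaping it by hand is easy to get wrong. If you end up needing more test events, a small helper along these lines could generate them. The function names `make_event` and `write_event` are hypothetical, and the events contain only the same minimal fields as the files above.

```python
import json


def make_event(method, path="/sensor", payload=None):
    """Build a minimal API Gateway-style test event like the files above."""
    event = {"resource": path, "path": path, "httpMethod": method}
    if payload is not None:
        # The body is itself a JSON string, so the payload is encoded separately.
        event["body"] = json.dumps(payload)
    return event


def write_event(filename, event):
    """Write the event to disk so it can be passed to `sam local invoke -e`."""
    with open(filename, "w") as handle:
        json.dump(event, handle, indent=2)
```

For example, `write_event("events/post.json", make_event("POST", payload={"temperature": 65, "location": "bedroom"}))` would produce a file equivalent to the post.json shown above.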
With these events, we can test our functions locally. Keep in mind that if you configured VPC peering with a Timescale VPC, your Timescale instance would not be reachable from your home or office network. A solution is to spin up an EC2 instance in the peered VPC and test there.
sam local invoke "PostSensorDataFunction" -e events/post.json
sam local invoke "GetSensorDataFunction" -e events/get.json
If all went well, you should see that the GetSensorData function returned the sensor reading that the PostSensorData function had inserted.
Alternatively, you can use the local API command to “host” your API and manually make GET and POST requests.
sam local start-api
When you’re done testing your functions, you can provision the infrastructure and deploy the functions using the deploy command. If you are deploying for the first time, use the --guided flag to configure things like the stack name, AWS region, and whether to save these settings to a configuration file (samconfig.toml). During subsequent deployments, you can omit this flag altogether.
sam deploy --guided
Due to the SAM CLI having to push the two containers to their respective ECR repositories, this deployment process can easily take 15 minutes to complete.
When it is done deploying, you should see the URL of your API gateway printed at the bottom of your terminal. We can use a tool like curl to make a POST and GET request to test out the functionality of our serverless application.
curl --request POST 'https://your-apigateway-url.amazonaws.com/dev/sensor' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "location": "bedroom",
    "temperature": 66
  }'
curl --request GET 'https://your-apigateway-url.amazonaws.com/dev/sensor'
# [{"time": 1678925592, "location": "bedroom", "temperature": 66.0}]
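The same POST request can also be made from Python, which is closer to what a sensor (or a script simulating one) might do. This sketch uses only the standard library. The function names are made up for illustration, and the URL you pass in would be the one printed at the end of your own deployment.

```python
import json
import urllib.request


def build_reading_request(url, location, temperature):
    """Build the POST request a sensor would send (nothing is sent yet)."""
    payload = json.dumps({"location": location, "temperature": temperature})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_reading(url, location, temperature, timeout=5):
    """Send one reading and return the HTTP status code of the response."""
    request = build_reading_request(url, location, temperature)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending makes it possible to inspect or test the request without touching the network.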
AWS Lambda Tutorial: The End
Congratulations! You’ve successfully written your first serverless time-series application. You’re now ready to conquer the world with AWS Lambda and Timescale.
If you haven’t tried Timescale yet, sign up for a 30-day free trial (no credit card required), and start deploying serverless time-series applications with AWS Lambda and SAM CLI.