Infrastructure as Code — Introducing the Serverless Framework for Code Deployment

Maikel Penz
Trade Me Blog
Published in
6 min read · May 28, 2019


[Pipeline photo by tian kuan on Unsplash]

One of the benefits of running in the cloud is that everything can be translated into code. Adopting infrastructure as code early in your project is a smart call: although using the cloud provider's GUI to explore services can be helpful, relying on such manual interactions to continuously roll out production changes is fragile and error-prone, and likely to cause headaches along the way.

What automation gives us:

  • Better understanding of the environment’s current state;
  • Reduced risk of human error;
  • Faster delivery;
  • Ability to easily replicate the entire environment.

In this blog post, we explore the Serverless Framework: what it is, a use case to help you understand when to use it, how it compares with other technologies, and its benefits and drawbacks.

I am not going into much detail on how to use the technology because the primary focus here is to help you understand in what context it might be the right tool for you. If you are looking for an example please check my project on GitHub where I deploy an API using the Serverless Framework.

What is the Serverless Framework?

According to the Serverless Framework website:

The Serverless Framework is a CLI tool that allows users to build & deploy auto-scaling, pay-per-execution, event-driven functions.

Think of it as a simple way to automate the deployment of serverless functions. These functions are deployed as part of a service that can either contain all of your code or be broken down into smaller, more targeted services.

Services are described in a file called serverless.yml. This file defines the components to deploy to the target cloud provider, e.g. functions, triggers, etc.
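As a minimal sketch, a serverless.yml for a single function with an HTTP trigger might look like the following (the service name, handler path and region are illustrative):

```yaml
# serverless.yml — minimal illustrative service definition
service: my-service

provider:
  name: aws                 # target cloud provider
  runtime: python3.9        # runtime for all functions in this service
  region: ap-southeast-2

functions:
  hello:
    handler: handler.hello  # file handler.py, function hello
    events:
      - http:               # API Gateway trigger
          path: hello
          method: get
```

Running `serverless deploy` from the service directory packages the code and provisions everything declared in the file.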

The Framework is compatible with most cloud providers. On AWS, for example, you can use it to deploy Lambda Functions. In this case, CloudFormation templates are generated and pushed to AWS as part of the deployment process.

What is the problem to be solved?

Trade Me is the biggest e-commerce platform in New Zealand. Multiple user interactions happen every second on the website: people searching, buying, bidding, adding items to their watchlist, etc. These actions give valuable insights to the business, so they are tracked and sent to a Data Lake hosted on AWS for further analysis and exploration.

The AWS infrastructure used to support the Data Lake is fully automated and can be easily replicated. But which automation technologies are involved? Where does the Serverless Framework fit in?

In a high-level picture, the architecture looks like the following:

The architecture above has three important parts, decoupled from each other but all translated into infrastructure as code on AWS: Security (VPC), Data Processing (EMR) and the Data Pipeline.

Security

A Virtual Private Cloud (VPC) is in place to delimit access to the EMR cluster and make sure data is only available to authorized parties inside the company's network.

A VPC holds no custom logic and involves no code. It only sets access boundaries around AWS services, making it purely part of the infrastructure. Therefore, its automation is done through CloudFormation.
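To illustrate why this belongs in CloudFormation rather than the Serverless Framework, here is a sketch of a CloudFormation template declaring a VPC: there is no handler code anywhere, only resource properties (the logical name and CIDR range are hypothetical):

```yaml
# Illustrative CloudFormation template — pure infrastructure, no code
AWSTemplateFormatVersion: "2010-09-09"
Description: VPC delimiting access to the EMR cluster

Resources:
  DataLakeVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16   # hypothetical address range
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: data-lake-vpc
```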

Data Processing

The EMR cluster runs Spark jobs to process aggregations and also to clean, partition and compress the data. It is a managed service, part of the infrastructure, and again automated through CloudFormation.

Data Pipeline

Files arrive in the cloud from application web servers containing events in JSON format. These files need to be properly identified and queued to be processed by the EMR cluster. To create this data flow, AWS Services are tied together composing the Data Pipeline.

Unlike the VPC and the EMR cluster, the Data Pipeline relies on logic written in Lambda Functions and Step Functions, making it the perfect use case for the Serverless Framework.

The Data Pipeline takes advantage of multiple serverless services. It starts with files landing on S3 buckets, goes through sanity checks inside Lambda Functions, uses SQS to queue messages and handle retries, and uses SNS to enable parallel processing. Once a message reaches the Step Function, a new workflow kicks off, orchestrating the communication between the Data Pipeline and the EMR cluster.
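This kind of event wiring is exactly what the Serverless Framework expresses concisely. A sketch of how the S3 and SQS triggers above could be declared (bucket, queue and handler names are all hypothetical):

```yaml
# Illustrative event wiring for the Data Pipeline functions
functions:
  sanityCheck:
    handler: pipeline/sanity_check.handler
    events:
      - s3:                              # fires when a file lands on the bucket
          bucket: incoming-events        # hypothetical bucket name
          event: s3:ObjectCreated:*

  queueWorker:
    handler: pipeline/queue_worker.handler
    events:
      - sqs:                             # polls the queue defined below
          arn:
            Fn::GetAtt: [EventsQueue, Arn]

resources:
  Resources:
    EventsQueue:                         # the queue itself, declared alongside the code
      Type: AWS::SQS::Queue
```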

From the description above, it makes sense to split the pipeline architecture into two Serverless Framework services: Pipeline Events and Pipeline Processing.

The Pipeline Events service takes care of everything up to the Step Function execution. This service deploys the logic to identify the events, queue files to be processed, match the desired schema and avoid duplicate files.

The Pipeline Processing service deploys the Step Function workflow and all Lambda Functions orchestrated as part of it.

Without going into the implementation details of each service, the following code block shows how functions and Step Functions can be defined in the serverless.yml file.
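A sketch along those lines, assuming the community serverless-step-functions plugin for Step Functions support (function names and handler paths are illustrative, not the actual Trade Me implementation):

```yaml
# Illustrative Pipeline Processing service: Lambda Functions plus a
# Step Functions workflow, via the serverless-step-functions plugin
plugins:
  - serverless-step-functions

functions:
  startCluster:
    handler: processing/start_cluster.handler
  submitJob:
    handler: processing/submit_job.handler

stepFunctions:
  stateMachines:
    pipelineProcessing:
      definition:
        StartAt: StartCluster
        States:
          StartCluster:
            Type: Task
            Resource:
              # the plugin normalizes function names into logical IDs
              Fn::GetAtt: [StartClusterLambdaFunction, Arn]
            Next: SubmitJob
          SubmitJob:
            Type: Task
            Resource:
              Fn::GetAtt: [SubmitJobLambdaFunction, Arn]
            End: true
```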

How does the Serverless Framework compare with other technologies?

The following image introduces four commonly used infrastructure deployment tools.

[Terraform, CloudFormation, SAM and the Serverless Framework]

At first sight, you could argue that all four tools can do everything, from infrastructure to code deployment. And you would not be wrong; they can. However, you might spend time writing unnecessary code with a tool that is not fit for purpose.

Terraform and CloudFormation are designed to help you deploy infrastructure. Taking AWS as our cloud provider again, think of VPCs, Load Balancers and EC2 instances.

AWS SAM and the Serverless Framework deploy your code and the other resources required to support its execution, e.g. triggers or objects your functions interact with, such as queues, database tables, etc.

Personally, I choose the tool by separating what is pure infrastructure from custom logic written in serverless functions. My rule of thumb: reach for the Serverless Framework when code deployment is involved.

Benefits and Drawbacks

If you have read this far, I believe the benefits of the Serverless Framework are quite clear in your mind. It simplifies code deployment and lets you focus on what really matters: the logic inside your functions.

However, as with any tool, there are also drawbacks. The Serverless Framework translates your configuration into the cloud provider's native infrastructure deployment tool; on AWS, for example, it generates CloudFormation templates as part of deployment. The translation process runs seamlessly and doesn't cause problems, but during development you might have to research how to do something in CloudFormation and then reproduce it in your serverless.yml file. CloudFormation is better documented and offers a wider range of examples, so the Serverless Framework, being a layer on top of CloudFormation, can sometimes slow you down.

My second point is about relying on plugins. This is not a huge drawback, as the Framework is widely adopted and the community is always writing new plugins, but for me at least, not having an "official", native way to solve something can sometimes be frustrating.

Conclusion

The point of this post was to share my experience with the Serverless Framework, and hopefully it helps you pick the tool that best fits what you are trying to achieve.

More important than anything else, automating your infrastructure is vital to succeeding in the cloud and delivering at speed. Don't worry about the time you spend writing your infrastructure as code; it will save you headaches in the long run.



Data Engineering Team Lead @ Plexure | AWS Certified Solution Architect | LinkedIn: https://www.linkedin.com/in/maikel-alexsander-penz/