How this GitHub project can take your local Prefect workflows to the next level by provisioning AWS and Prefect Cloud infrastructure, and the important role GitHub Actions plays in it.
TL;DR: This project sets up a dataflow management system powered by Prefect and AWS. Its deployment is fully automated through GitHub Actions, which also exposes a reusable interface to register workflows with Prefect Cloud.
This is the second and final part of my series about designing and building an event generation engine. In Part 1 I explained why we selected AWS Kinesis Analytics for SQL for the first iteration of our project; I will now share how it fits into our existing architecture and the tweaks we made to ensure it worked for us.
Prior to this project, data was captured from the aircraft, sent to a data pipeline for enrichment, and stored in the data lake for analytics.
At Spidertracks we work to make aviation safer by providing real-time aircraft tracking and transforming flight data into valuable insights. As part of the company’s journey to become a data-driven organisation, I am currently involved in a project that empowers customers to define what a typical flight looks like and gives our platform the responsibility of triggering events - also referred to as safety events - if the unexpected were to occur.
This project comes with interesting problems to solve, and this two-article series focuses on the technical aspects of the event generation engine…
A workflow consists of steps executed in a predefined order to accomplish a specific business objective. Workflows vary from something as simple as an IT request process in a small company to complex data transformations aimed at delivering key business insights.
Leaving complexity aside, some characteristics are common across all of them:
This is where the concept of Workflow Automation comes in. It can be thought of as a framework that seeks to standardize and…
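The ordered-steps idea can be sketched in a few lines of plain Python. The step names and logic below are purely illustrative, not taken from any real project:

```python
# A tiny illustration of a workflow: ordered steps, each feeding the next.
# In a real orchestrator (Prefect, Airflow, etc.) the framework would
# handle ordering, retries, and observability for you.

def extract():
    """Step 1: collect raw input."""
    return [1, 2, 3]

def transform(rows):
    """Step 2: apply a business rule to each row."""
    return [r * 10 for r in rows]

def load(rows):
    """Step 3: deliver the result (here, just aggregate and return it)."""
    return sum(rows)

def run_workflow():
    """Execute the steps in their predefined order."""
    return load(transform(extract()))

print(run_workflow())  # 60
```

A workflow automation tool takes this same shape and adds the scheduling, failure handling, and standardization that hand-rolled scripts lack.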
All steps described in this post can be reproduced by deploying my GitHub project called kafka-aws-deployment. The project uses Terraform to deploy the architecture to AWS, and all of the deployment steps are also detailed in its README file.
The first step is to either clone the repository with Git or download it to your machine.
This is part 2 of my blog series about building a Kafka playground on AWS. If you missed part 1, please check it out first.
Now that we know the Kafka components and the AWS services of choice, let’s look at a graphical representation of this architecture and walk through how it works.
A VPC is a virtual private network into which AWS resources can be securely placed, restricting access to only the parts that are allowed. …
As part of my recent studies I decided to explore Apache Kafka. Kafka is an open-source streaming platform used to collect and analyse high volumes of data, and its ecosystem is composed of multiple components. These components are deployed separately from the core of Kafka, and while this decoupled architecture has clear benefits - like scalability - it also introduces challenges, especially when planning what a platform deployment should look like.
There are solutions out there that make it easier to get the full architecture up and running, like Confluent’s platform. However, at this point I am interested in exploring AWS’s managed Kafka…
Trade Me, like many companies around the globe, is leveraging the capabilities of the public cloud. The countless services and the cost-saving opportunities are only a few of the advantages the cloud brings to businesses.
This was a collaborative project between Business Intelligence and Data Engineering and this blog post - written by Maikel Penz (Data Engineer) and Lashwin Naidoo (BI Developer) - shares our experience building a data warehouse in the cloud with Snowflake.
Snowflake is a relational SQL data warehouse provided as a service. It runs on top of AWS, Azure, or Google Cloud. There is no infrastructure management…
One of the benefits of running in the cloud is that everything can be translated into code. Going for infrastructure as code early on in your project is a smart call: although using the cloud provider’s GUI to explore services can be helpful, relying on such manual interactions to continuously roll out production changes is fragile and error-prone, and likely to bring you headaches along the way.
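As a toy illustration of the idea (not the actual Terraform code used in our projects), infrastructure can be described declaratively in code and rendered into a template that a tool then applies. The resource names and properties below are made up for demonstration:

```python
import json

# A minimal, hypothetical CloudFormation-style template built in code.
# Defining infrastructure this way means it can be reviewed, versioned,
# and rolled out repeatably instead of clicked together in a GUI.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "DataLakeBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "my-example-data-lake"},
        }
    },
}

print(json.dumps(template, indent=2))
```

Terraform works on the same principle with its own HCL syntax: the desired state lives in version control, and the tool reconciles the cloud account against it.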
What automation gives us:
In this blog post, we explore…
Hi! I am Maikel.
The year is 2019, technology has taken over the world, and writing a blog post is something straightforward that anyone can do. Part of this is true, because here I am writing my first blog post, but technology hasn’t completely taken over the world. If that were the case, robots would be writing this post for me and I would be playing on Xbox.
The tools and methods used to solve problems today are likely to be less relevant next year (month, week, actually wait! that’s not worth looking into anymore…). While the…