How this GitHub project can take your local Prefect workflows to the next level by provisioning AWS and Prefect Cloud infrastructure, and the important role GitHub Actions plays in it.

Photo by Safar Safarov on Unsplash

TL;DR: This project sets up a dataflow management system powered by Prefect and AWS. Its deployment has been fully automated through GitHub Actions, which additionally exposes a reusable interface to register workflows with Prefect Cloud.

The problem

Prefect is an open-source tool that empowers teams to orchestrate workflows with Python. Its cloud solution - Prefect Cloud - adds a management layer on top of the framework. …
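To make this concrete, here is a minimal sketch of what a Prefect flow looks like, assuming the classic task/Flow API from Prefect 1.x; the task names and project name are illustrative only and not taken from the project described above.

from prefect import task, Flow

@task
def extract():
    # pretend this pulls raw records from a source system
    return [1, 2, 3]

@task
def transform(records):
    # a trivial transformation step
    return [r * 2 for r in records]

with Flow("example-etl") as flow:
    transform(extract())

# flow.run() executes the flow locally;
# flow.register(project_name="example-project") is what pushes its metadata
# to Prefect Cloud so it can be managed and triggered from there.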


How Kinesis Analytics for SQL fits into our Data Pipeline and what challenges we faced with it

Photo by Avel Chuklanov on Unsplash

This is the second and final part of my series about designing and building an event generation engine. In Part 1 I explained why we selected AWS Kinesis Analytics for SQL for the first iteration of our project, and I will now share how it fits into our existing architecture and the tweaks we made to ensure it worked for us.

Kinesis Analytics: The moving parts

The Data Source

Prior to this project, data was being captured from the aircraft, sent to a data pipeline for enrichment and stored in the data lake for analytics.

Our intention was to modify this data pipeline by adding a step in parallel…
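For illustration only, feeding records into a Kinesis stream with boto3 looks roughly like this; the stream name and record fields below are invented for the example and are not our actual schema or pipeline code.

import json
import boto3

kinesis = boto3.client("kinesis")

# hypothetical flight record; the real enriched payload is not shown here
record = {"aircraft_id": "ZK-ABC", "altitude_ft": 3500, "speed_kt": 120}

kinesis.put_record(
    StreamName="flight-events",          # hypothetical stream name
    Data=json.dumps(record).encode(),    # Kinesis expects a bytes payload
    PartitionKey=record["aircraft_id"],  # keeps one aircraft's records ordered
)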


Our challenges introducing real-time analytics over flight data to deliver valuable insights

Photo by Joseph Bradshaw on Unsplash

At Spidertracks we work to help make aviation safer by providing real-time aircraft tracking as well as transforming flight data into valuable insights. As part of the company’s journey to become a data-driven organisation, I am currently involved in a project that empowers customers to define what a typical flight looks like and gives our platform the responsibility of triggering events - also referred to as safety events - if the unexpected were to occur.

This project comes with interesting problems to solve, and I will focus this two-article series on the technical aspects of the event generation engine.


Photo by Adi Goldstein on Unsplash

A workflow consists of steps configured to follow a predefined order and accomplish a specific business objective. Workflows vary from something as simple as an IT request process in a small company to complex data transformations aimed at delivering key business insights.

Leaving complexity aside, some characteristics are common across all of them:

  • Workflows need to run somewhere;
  • Repeatedly;
  • Triggered on a schedule or ad-hoc;
  • And sadly they will not always perform as expected.

Addressing the points above is where the concept of Workflow Automation comes in. It can be thought of as a framework that seeks to standardize and…
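As a small, hedged example of those characteristics, here is a sketch of a scheduled workflow with retries using Prefect 1.x-style APIs; the flow and task names are invented for illustration.

from datetime import timedelta
from prefect import task, Flow
from prefect.schedules import IntervalSchedule

@task(max_retries=3, retry_delay=timedelta(minutes=1))
def fetch_report():
    # "they will not always perform as expected": retried up to 3 times on failure
    ...

# "repeatedly, triggered on a schedule": run once every hour
schedule = IntervalSchedule(interval=timedelta(hours=1))

with Flow("hourly-report", schedule=schedule) as flow:
    fetch_report()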


This is Part 3 of this blog series, and we are now going to make use of the architecture described in Parts 1 and 2 to stream database events to Kafka and consume them through KSQL.

All steps described in this post can be reproduced by deploying my GitHub project called kafka-aws-deployment. This project uses Terraform to deploy the architecture to AWS, and all of the deployment steps are also detailed in its README file.

The first step is to either clone it with Git or download it to your machine.

git clone https://github.com/maikelpenz/kafka-aws-deployment

Important: the AWS infrastructure part…


This is Part 2 of my blog series about building a Kafka playground on AWS. If you missed Part 1, please check it out first.

Now that we know the Kafka components and the AWS services of choice, let’s look at a graphical representation of this architecture and explain how it works.

AWS Architecture for Kafka Deployment

VPC

A VPC is a virtual private network that AWS resources can be securely placed into, restricting access to only what is explicitly allowed. …
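The project itself provisions the network with Terraform, but purely to illustrate the concept, creating a VPC and a subnet with boto3 looks something like this; the CIDR ranges are arbitrary examples.

import boto3

ec2 = boto3.client("ec2")

# the VPC: an isolated network with its own private address range
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

# a subnet inside the VPC where resources (e.g. Kafka brokers) can be placed
ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24")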


As part of my recent studies I decided to explore Apache Kafka. Kafka is an open-source streaming platform used to collect and analyse high volumes of data, and its ecosystem is composed of multiple components. These components are deployed separately from the core of Kafka, and while this decoupled architecture has clear benefits — like scalability — it also introduces challenges, especially when planning what a platform deployment would look like.

There are solutions out there that facilitate getting the full architecture up and running, such as Confluent’s platform. However, at this point I am interested in exploring AWS’s managed Kafka…


Photo by Porapak Apichodilok from Pexels

Trade Me, like many companies around the globe, is leveraging the capabilities of the public cloud. The breadth of services and the cost-saving opportunities are just two of the advantages the cloud brings to businesses.

This was a collaborative project between Business Intelligence and Data Engineering, and this blog post - written by Maikel Penz (Data Engineer) and Lashwin Naidoo (BI Developer) - shares our experience building a data warehouse in the cloud with Snowflake.

Overview

Snowflake is a relational SQL data warehouse provided as a service. It runs on top of either AWS, Azure or Google Cloud. There is no infrastructure management…
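As a small, hedged illustration of the “provided as a service” point: from a client’s perspective you simply connect and query, with no servers to manage. The account identifier, credentials and warehouse name below are placeholders.

import snowflake.connector

# connect straight to the service; there are no clusters to provision yourself
conn = snowflake.connector.connect(
    account="xy12345.ap-southeast-2",  # placeholder account identifier
    user="ANALYTICS_USER",             # placeholder credentials
    password="********",
    warehouse="ANALYTICS_WH",          # compute comes from a named virtual warehouse
)

cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")
print(cur.fetchone())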


Pipeline Photo by tian kuan on Unsplash

One of the benefits of running in the cloud is that everything can be translated into code. Going for infrastructure as code early on in your project is a smart call because, although using the cloud provider’s GUI to explore services can be helpful, relying on such manual interactions to continuously roll out production changes is a fragile and error-prone approach, likely to bring you headaches along the way.

What automation gives us:

  • Better understanding of the environment’s current state;
  • Reduced risk of human error;
  • Faster delivery;
  • Ability to easily replicate the entire environment.

In this blog post, we explore…


Photo by Simon Abrams on Unsplash

Hi! I am Maikel.

The year is 2019: technology has taken over the world, and writing a blog post is something straightforward that anyone can do. While part of this is true because here I am, writing my first blog post, technology hasn’t completely taken over the world. If that were the case, robots would be writing this blog post for me and I would be playing on Xbox.

The tools and methods used to solve problems today are likely to be less relevant next year (month, week, actually wait! that’s not worth looking into anymore...). While the…

Maikel Penz

Data Engineering Team Lead @ Plexure | AWS Certified Solution Architect | LinkedIn: https://www.linkedin.com/in/maikel-alexsander-penz/
