Making Sense of Big Data

In this article, I’ll share a comprehensive example of how to integrate Spark Structured Streaming with Kafka to create a streaming data visualization.

Photo by Markus Spiske on Unsplash

Introduction

Apache Kafka is widely adopted in modern architectures, providing a more reliable and scalable way to capture and integrate real-time data between systems. …

A brief guide on how to set up a development environment with Spark, Airflow and Jupyter Notebook

Photo by Christopher Gower on Unsplash

Brief context

As Data Engineers, we commonly use Apache Spark and Apache Airflow in our daily routine (if you do not use them yet, you should try them) to overcome typical Data Engineering challenges, such as building pipelines that get data from somewhere, apply many transformations and deliver…

This article series aims to show how to identify hard bounce e-mails using machine learning techniques. Part 1 covers Feature Engineering and Exploratory Analysis.

Photo by Tiffany Tertipes on Unsplash

What is e-mail hard bounce?

This term is widely used in Marketing and refers to bounced e-mail messages, which occur when an e-mail message is rejected by…

Thiago Cordon

Data practitioner, enabling business with data. Editor at https://medium.com/data-arena
