Category: Data Engineering

Concurrency through Futures in Scala


When we imagine a simple programming algorithm, it is logical to think about a succession of instructions that are executed sequentially, where the next instruction will not be executed until the one immediately preceding it has been completed. However, depending…
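The following is a minimal sketch of the idea in this excerpt, not the post's own code. It contrasts running three slow calls one after another with wrapping each call in a `scala.concurrent.Future` on the global `ExecutionContext`. The `slowSquare` helper and its 200 ms delay are hypothetical and exist only to simulate a slow, independent task.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FuturesSketch {
  // Hypothetical slow task: sleeps to simulate work, then returns n * n.
  // `blocking` hints to the global pool that this thread will be blocked.
  def slowSquare(n: Int): Int = {
    blocking { Thread.sleep(200) }
    n * n
  }

  // Sequential: each call starts only after the previous one has completed.
  def sequentialSquares(xs: List[Int]): List[Int] =
    xs.map(slowSquare)

  // Concurrent: each call is wrapped in a Future and may run in parallel.
  // Future.sequence turns List[Future[Int]] into Future[List[Int]],
  // preserving the input order; Await.result blocks until it completes.
  def concurrentSquares(xs: List[Int]): List[Int] = {
    val futures: List[Future[Int]] = xs.map(n => Future(slowSquare(n)))
    Await.result(Future.sequence(futures), 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    println(sequentialSquares(List(1, 2, 3))) // List(1, 4, 9)
    println(concurrentSquares(List(1, 2, 3))) // List(1, 4, 9)
  }
}
```

Both versions return the same result in the same order. The difference is in timing: the sequential version waits for roughly three delays, while the Future-based version can finish in about one delay when the pool has enough threads, although actual timings depend on the machine.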


How to leverage Streaming technologies like Apache Kafka and Apache Spark for Batch processing

ETL process: central piece of the Big Data project. Collecting, ingesting, integrating, processing, storing and analyzing large volumes of information are the fundamental activities of a…

Introduction to Apache YARN


Basic Single Node Configuration. Note: the code in this post has been tested using Apache Hadoop 2.10.1. Please check out our previous post, Introduction to Apache Hadoop, to configure this version of Hadoop in case you have not done it…