Category: Data Engineering

Custom Data Source in Spark 3

In 2020, Apache Spark released version 3.0.0, which introduced changes to the API for defining custom data sources. These were previously implemented through DataSourceV2, which generated confusion and an unintuitive…

Concurrency through Futures in Scala

When we imagine a simple algorithm, it is natural to think of a sequence of instructions executed one after another, where each instruction starts only once the one immediately before it has completed. However, depending…