Tag: Apache Spark

Custom Data Source in Spark 3

In 2020, Apache Spark released version 3.0.0, which introduced changes to the API for defining custom data sources, known in the Spark ecosystem as Custom Data Sources. These were previously defined through DataSourceV2, which generated confusion and an unintuitive…

Kafka + Spark for Batch processing

How to leverage streaming technologies like Apache Kafka and Apache Spark for batch processing

ETL process: the central piece of the Big Data project. Collecting, ingesting, integrating, processing, storing and analyzing large volumes of information are the fundamental activities of a…