A review of some Apache Spark library functions, with practical examples that avoid UDFs
Cross-DAG task and sensor dependencies with Airflow. How to solve problems related to data engineering complexity.
First steps with Apache YARN customization. Introduction to more advanced configuration settings that improve YARN’s performance
How to set up a Clean Code architecture with Alpakka Kafka
How to configure Apache YARN to execute parallel jobs
How to deploy the Apache Airflow process orchestrator on Kubernetes
Introduction to Apache Hadoop. How to configure and run one of the most common open source tools used in big data contexts.
How to configure a PySpark development environment in PyCharm, using one of the most complete setup options
Schedule, orchestrate and monitor your Kettle tasks from Airflow using this Pentaho plugin.