Testing in Apache Airflow
Today we are going to talk about two ways of testing in Apache Airflow. Historically, testing in Airflow has been a headache for users of the famous framework. The coupling of the code with the…
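The excerpt is cut off above, so the two approaches themselves are not shown here. As a hedged illustration of one widely used approach, a DAG "integrity test" run with pytest might look like the sketch below; the dags/ folder path and the tag convention are assumptions, not details from the original post:

```python
# A minimal sketch of a pytest-based DAG integrity test: load every DAG
# file and fail on import errors. The dag_folder path is an assumption;
# adjust it to your project layout.
import pytest
from airflow.models import DagBag


@pytest.fixture(scope="session")
def dag_bag():
    # include_examples=False keeps Airflow's bundled example DAGs out
    return DagBag(dag_folder="dags/", include_examples=False)


def test_no_import_errors(dag_bag):
    # Any DAG file that raises during import shows up in import_errors
    assert dag_bag.import_errors == {}


def test_dags_have_tags(dag_bag):
    # Hypothetical project convention: every DAG declares at least one tag
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"
```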
Apache Airflow is an open source tool designed for workflow orchestration, especially useful in the field of data engineering. DAGs are defined in Python files and set the relationships and dependencies between the tasks to be executed. You can take…
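Since the excerpt describes how a DAG file declares dependencies between tasks, a minimal sketch may help; it assumes Airflow 2.x, and the DAG id, task ids, and commands are illustrative:

```python
# A minimal DAG file: two tasks whose order is declared with the >> operator.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_dependencies",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # run only when triggered manually
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    # "load" runs only after "extract" succeeds
    extract >> load
```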
Usually, when we start working on a new integration that needs to connect to AWS services, it is easier and faster in the early stages of development to work only locally. For this, we can make use of…
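The excerpt truncates before naming the tool, so the following is purely a hypothetical illustration of local-only AWS development using the moto library (assuming moto 4.x, where the per-service decorator is mock_s3; moto 5 replaces it with mock_aws). The bucket and key names are made up:

```python
# Hypothetical sketch: moto mocks AWS services in memory, so these boto3
# calls never leave the machine and need no real credentials.
import boto3
from moto import mock_s3


@mock_s3
def upload_report():
    # boto3 talks to moto's in-memory S3 instead of real AWS
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="local-dev-bucket")
    s3.put_object(Bucket="local-dev-bucket", Key="report.csv", Body=b"a,b\n1,2\n")
    return s3.list_objects_v2(Bucket="local-dev-bucket")["KeyCount"]


assert upload_report() == 1
```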
In the world of data engineering, the efficient organization and structuring of process flows play a crucial role. Here, Apache Airflow has positioned itself as one of the most efficient tools for this task. However, to maximize…
Apache Airflow is an open source tool for workflow orchestration that is widely used in the field of data engineering. You can take a look at our other blog post, Basics on Apache Airflow, for an introduction. In this…
Since the release of our Pentaho PDI plugin for Apache Airflow, we have seen an industry shift towards Apache Hop for data processing. What is Apache Hop? Apache Hop started in late 2019 as a fork of Kettle (PDI) and is…
One of the most notable new features of Airflow 2.3.0 is Dynamic Task Mapping. This feature adds the possibility of creating tasks dynamically at runtime. Thanks to this, we can change the number of such tasks in our DAG…
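The excerpt is truncated, but a minimal sketch of Dynamic Task Mapping (Airflow 2.3+) may clarify the idea: the number of mapped task instances is decided at runtime from the output of an upstream task. All names here are illustrative:

```python
# Dynamic Task Mapping: .expand() creates one "process" task instance per
# element of the list returned by "list_files", at runtime.
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="example_dynamic_mapping",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
):
    @task
    def list_files():
        # In a real DAG this could list objects in a bucket, rows in a
        # table, etc.; here it just returns a static list
        return ["a.csv", "b.csv", "c.csv"]

    @task
    def process(filename: str):
        print(f"processing {filename}")

    # One mapped task instance per filename
    process.expand(filename=list_files())
```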
Apache Airflow is free workflow orchestration software in which workflows are created through Python scripts and can be monitored using its user interface. Some examples of workflows for which this tool could be used are the scheduling of ETL (Extract, Transform,…
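As a sketch of the scheduled ETL workflows the excerpt mentions, here is a minimal daily pipeline written with the Airflow 2.x TaskFlow API; the cron expression, function names, and data are illustrative assumptions:

```python
# A toy ETL pipeline scheduled to run every day at 06:00.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval="0 6 * * *", start_date=datetime(2023, 1, 1), catchup=False)
def daily_etl():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(rows):
        return [r * 10 for r in rows]

    @task
    def load(rows):
        print(f"loading {rows}")

    # TaskFlow infers the extract >> transform >> load dependencies
    load(transform(extract()))


daily_etl()
```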
It is well known that the use of UDFs (User Defined Functions) in Apache Spark, especially when using the Python API, can compromise our application's performance. For this reason, at Damavis we try to avoid their use as much…
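As a sketch of the performance point, the same transformation can be written as a Python UDF or with a built-in function; the built-in runs entirely inside the JVM and avoids serializing every value out to a Python worker. The column names, data, and app name are made up:

```python
# Comparing a Python UDF with the equivalent built-in Spark function.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-vs-builtin").getOrCreate()
df = spark.createDataFrame([("ana",), ("luis",)], ["name"])

# Python UDF: every value crosses the JVM/Python boundary
to_upper = F.udf(lambda s: s.upper(), StringType())
df.withColumn("upper_udf", to_upper("name")).show()

# Built-in function: stays in the JVM and can be optimized by Catalyst
df.withColumn("upper_builtin", F.upper("name")).show()
```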