Damavis Summary of week 21, 2021

Avoiding UDFs in Apache Spark, working at Damavis, and a guide to discovering which open-source tool is the right one for your project
It is well known that using UDFs (User Defined Functions) in Apache Spark, especially through the Python API, can compromise application performance. For this reason, at Damavis we try to avoid their use as much…
Advanced Airflow, creation of machine learning pipelines and artificial intelligence in supermarkets
In this article we are going to tell you some ways to solve problems related to the complexity of data engineering itself. An Airflow DAG can become very complex if we start including all dependencies in it, and furthermore, this…
Basic Single Node Configuration. Note: the code in this post has been tested using Apache Hadoop 2.10.1. Please check out our previous post, Introduction to Apache Hadoop, to configure this version of Hadoop, in case you have not done it…
What is Apache Airflow and how does it work? One of the work processes of a data engineer is called ETL (Extract, Transform, Load), which allows organisations to have the capacity to load data from different sources, apply an appropriate…
Schedule, orchestrate and monitor your Kettle tasks with Airflow with this Pentaho plugin. At Damavis we know the importance of data processing. Extracting, cleaning, transforming, aggregating, loading or cross-referencing multiple data sources allows our clients to have Insights or Predictive…