Tutorial DataHub 3 – Main concepts
In this post, we will talk about the main concepts of DataHub at a functional level and study its fundamental elements by taking a tour of the application. To follow along, you can use the DataHub…
In Tutorial DataHub I we analysed the architecture of this platform. In this post, we present a guide on how to deploy DataHub and start working with the tool. DataHub can be deployed in two ways:…
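As a taste of where the deployment guide leads (an illustrative sketch under our own assumptions, not code from the post): once a local quickstart instance is running, you can check that it is reachable from Python with the acryl-datahub SDK. The http://localhost:8080 address assumes the quickstart defaults for the GMS service.

```python
# Minimal sketch: verify a local DataHub deployment from Python.
# Assumes the quickstart defaults (GMS at http://localhost:8080)
# and the acryl-datahub package installed (pip install acryl-datahub).
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.test_connection()  # raises an exception if the server is unreachable
print("DataHub GMS is reachable")
```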
We begin a collection of tutorials on the use and operation of DataHub, a Data Governance platform that we already mentioned in the post "What is a Data Catalog and what does it consist of". In this series, we explore…
What a Data Catalogue is seems very intuitive, something that anyone minimally initiated in this world would understand. But putting it into practice and implementing it is somewhat more complicated. For those who are not familiar with this concept,…
In the world of application development, container-based deployment environments are becoming more and more common, and Kubernetes has established itself as the standard for orchestrating them. However, for many developers, setting up and managing a complete Kubernetes cluster can be…
In the field of Data Engineering, efficient database design is essential to handle large volumes of data and enable effective analysis. Throughout my experience as a Data Engineer, I have worked with the main relational database systems and have observed…
Today we are going to talk about two ways of testing in Apache Airflow. Historically, testing in Airflow has been a headache for users of this popular framework. The coupling of the code with the…
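As a flavour of the simplest kind of test the post refers to (our own sketch, not code from the post): a classic smoke test loads every DAG through Airflow's DagBag and asserts that none of them fails to import. The dags/ folder path is an assumption.

```python
# Minimal sketch of a DAG integrity test, runnable with pytest.
# Assumes DAG files live under a local "dags/" folder.
from airflow.models import DagBag

def test_dags_import_without_errors():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    # import_errors maps each DAG file path to the exception raised while parsing it
    assert dag_bag.import_errors == {}, dag_bag.import_errors
```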
In this post we are going to talk about how DBT integrates with Spark and how this integration can be useful for us. DBT is a framework that facilitates the design of data models throughout the different stages of the modeling cycle.…
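To make the integration concrete (a sketch under stated assumptions, not the post's own code): with dbt-core 1.5+ and the dbt-spark adapter installed, and a profiles.yml that targets Spark, a dbt run can be triggered programmatically from Python. The model name my_model is hypothetical.

```python
# Minimal sketch: invoke dbt programmatically (requires dbt-core >= 1.5),
# assuming a project whose profiles.yml targets Spark via the dbt-spark adapter.
from dbt.cli.main import dbtRunner

result = dbtRunner().invoke(["run", "--select", "my_model"])  # "my_model" is a hypothetical model
print("success:", result.success)
```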
Today I would like to deal with a topic that, from my point of view, is very important and is probably the holy grail of data engineering projects. However, we rarely reach the necessary level of maturity to be able…
A definitive guide to configuring the PySpark development environment in PyCharm, one of the most complete options. Spark has become the Big Data tool par excellence, helping us to process large volumes of data in a simplified, clustered and fault-tolerant way.…
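Once such an environment is configured, a minimal job is enough to smoke-test it (an illustrative sketch, not part of the original excerpt):

```python
# Minimal sketch: smoke-test a local PySpark setup, e.g. from PyCharm.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()  # prints both rows if the environment is wired up correctly
spark.stop()
```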