Tutorial DataHub 3 – Main concepts

In this post, we will discuss the main concepts of DataHub at a functional level and study its fundamental elements by taking a tour of the application. To follow along, you can use the DataHub…
In Tutorial DataHub I we analysed the architecture of this platform. This post is a guide on how to deploy DataHub and start working with the tool. DataHub can be deployed in two ways:…
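The excerpt cuts off before listing the two deployment options, but as a hedged illustration, here is a minimal sketch of how one might verify that a freshly deployed local instance is accepting metadata through DataHub's Python emitter. The GMS address, platform and dataset name are assumptions based on a default local quickstart, not details taken from the post.

```python
# Minimal smoke test against a local DataHub deployment (assumed default: GMS on localhost:8080).
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Point the emitter at the metadata service; adjust the URL if your deployment differs.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Build a URN for a hypothetical dataset and attach a simple properties aspect.
dataset_urn = make_dataset_urn(platform="hive", name="sales.orders", env="PROD")
properties = DatasetPropertiesClass(description="Smoke-test dataset emitted after deployment")

emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=properties))
print(f"Emitted metadata for {dataset_urn}")
```

If the call succeeds, the dataset should appear in the DataHub UI shortly afterwards, which is a quick way to confirm the stack came up correctly.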
We begin a collection of tutorials on the use and operation of DataHub, a Data Governance platform that we already mentioned in the post What is a Data Catalog and what does it consist of. In this series, we explore…
The idea of a Data Catalogue seems very intuitive, something anyone with a minimal grounding in this field would understand. Putting it into practice and implementing it, however, is somewhat more complicated. For those who are not familiar with this concept,…
In a previous article, we gave a theoretical introduction to Spark Structured Streaming, analysing in depth the high-level API that Spark provides for processing massive real-time data streams. There, we looked at the essential theoretical concepts…
In the world of application development, container-based deployment environments are becoming more and more common, and Kubernetes has established itself as the standard for container orchestration. However, for many developers, setting up and managing a complete Kubernetes cluster can be…
Data processing and consumption are essential elements of the contemporary business world. In this context, there are certain pieces of software, commonly abbreviated as APIs, whose role in this flow of information is fundamental. APIs (Application Programming Interfaces) are mechanisms for integration…
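As a hedged illustration of what "mechanisms for integration" looks like in practice, the sketch below consumes a REST API from Python; the endpoint, parameters and response shape are invented for the example and do not come from the post.

```python
# Hypothetical example of consuming a REST API with Python's requests library.
import requests

# The endpoint and query parameters below are invented for illustration.
response = requests.get(
    "https://api.example.com/v1/orders",
    params={"status": "open"},
    timeout=10,
)
response.raise_for_status()           # Fail fast on HTTP errors
for order in response.json():         # Assumes the API returns a JSON list of orders
    print(order["id"], order["status"])
```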
In recent years, low-latency data processing, practically in real time, has become an increasingly common requirement in companies' big data processes. It is in this context that the concept of stream processing is introduced, which refers…
Apache Spark’s Structured Streaming API is a powerful tool for processing real-time data streams. In this context, there are certain use cases where ensuring the accuracy of the processed data is not trivial due to the time dimension that inherently…
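The excerpt stops before the details, but the time dimension it alludes to typically shows up as late-arriving events. As a hedged sketch, not taken from the article, the snippet below shows how Structured Streaming's withWatermark bounds how late an event may arrive and still be counted in its event-time window; the rate source and the 10-minute/5-minute intervals are arbitrary choices for the example.

```python
# Sketch of event-time aggregation with a watermark in PySpark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()

# The built-in "rate" source is used only to make the example runnable;
# any source exposing an event-time column behaves the same way.
events = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .withColumnRenamed("timestamp", "event_time")
)

# The watermark tells the engine how long to keep waiting for late events
# (10 minutes here) before finalizing each 5-minute window.
counts = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"))
    .count()
)

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```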
In the field of Data Engineering, efficient database design is essential to handle large volumes of data and provide effective analysis. Throughout my experience as a Data Engineer, I have worked with the main relational database systems and have observed…