


Aggregation Pipelines on MongoDB
Why use MongoDB? MongoDB is a document-oriented, open-source NoSQL database, which means that data does not necessarily have to follow a fixed schema. This makes MongoDB an ideal candidate for big data workloads, as it…

Damavis Summary of Week 11, 2021

Clean Code with Alpakka Kafka
At Damavis, we are well aware of how important it is for our clients to have access to their data in real time. For this reason, one of our strengths is the development of tools and technologies that can move, transform and…

Damavis Summary of week 9, 2021

Introduction to Apache YARN
Basic Single Node Configuration. Note: the code in this post has been tested using Apache Hadoop 2.10.1. Please check out our previous post, Introduction to Apache Hadoop, to configure this version of Hadoop, in case you have not done it…

Damavis Summary of week 7, 2021

Cross Compiling In Java
A situation that occurs frequently is having to write code for a project that uses an old version of Java. At Damavis, we always like to make use of the latest tools added to the language, so in these…

Damavis Summary of week 7, 2021

Deploying Airflow: CeleryExecutor on Kubernetes
How to deploy the Apache Airflow process orchestrator on Kubernetes. What is Apache Airflow and how does it work? One of a data engineer's core work processes is ETL (Extract, Transform, Load), which allows organisations to have the…