Damavis Summary of week 21, 2021

Avoiding UDFs in Apache Spark, working at Damavis, and a guide to discovering which open-source tool is right for your project

Damavis Blog

New article on our blog by our Head Data Engineer Cristòfol Torrens.

In Avoiding UDFs in Apache Spark, we review some Apache Spark library functions and present practical examples that avoid UDFs. It is well known that using UDFs (User Defined Functions) in Apache Spark, especially through the Python API, can compromise application performance. For this reason, at Damavis we avoid them as much as possible in favour of native functions or SQL.

You can read the full article below:

Avoiding UDFs in Apache Spark

Work at Damavis

We are still looking for new candidates to grow our team of Data Engineers.

Damavis is a team of Data Engineers and Data Scientists with extensive experience in Big Data and Artificial Intelligence projects, focused on providing value-added solutions to companies. Our most important values are the quality of our team and of the service we offer, so we attach great importance to the people who are part of the team.

We are looking for someone who wants to join the data engineering team, which is in charge of developing the infrastructure our customers need to manage Big Data effectively and get the most out of their data. This person will also work with the data science team to assist in the production deployment of machine learning models.

You can read more about our job offer at www.damavis.com/job

Seen on networks

During the week we share the most interesting news from the world of big data and artificial intelligence on our social networks: Twitter, Facebook, Instagram and LinkedIn.

DeepMind Wants to Reimagine One of the Most Important Algorithms in Machine Learning

In one of the most important papers this year, DeepMind proposed a multi-agent structure to redefine PCA. Redefining PCA sounds ludicrous, and yet DeepMind’s thesis makes perfect sense once you dive into it. You can read it at the following link: EigenGame: PCA as a Nash Equilibrium

Which Open-source Data Integration Tool Is Right for Your Project?

An interesting review for those interested in open-source data integration solutions: Data Integration Tooling.

Great New Resource for Natural Language Processing Research and Applications

The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and in locating the code and papers needed to understand and implement the latest research, you should check this out:

Great New Resource for Natural Language Processing Research and Applications

That is all for the summary of week 21 of 2021. We invite you to share this article with your contacts. See you on social networks!

Sincerely, Damavis

Corina Schuster
