Damavis Summary of week 21, 2021
Avoiding UDFs in Apache Spark, working in Damavis, and a guide to discovering which open-source tool is the right for your project
It is well known that the use of UDFs (User Defined Functions) in Apache Spark, and especially in using the Python API, can compromise our application performace. For this reason, at Damavis we try to avoid their use as much…
Definitive guide to configure the Pyspark development environment in Pycharm; one of the most complete options. Spark has become the Big Data tool par excellence, helping us to process large volumes of data in a simplified, clustered and fault-tolerant way.…