Avoiding UDFs in Apache Spark

In the world of Data Engineering, it is well known that UDFs (User Defined Functions) in Apache Spark, especially with the Python API, can compromise application performance. For this reason, at Damavis we try to avoid them as much as possible in favour of native functions or SQL.