Tag pyspark

The use of Window in Apache Spark

When processing data, we often want to calculate variables over a certain subset of observations. For example, we might be interested in the average value per group or the maximum value for each group. groupBy and…

First steps with Pyspark and Pycharm

First steps to program in Pyspark and Pycharm

A definitive guide to configuring the PySpark development environment in PyCharm, one of the most complete options. Spark has become the Big Data tool par excellence, helping us process large volumes of data in a simplified, clustered, and fault-tolerant way.…