Himanshu Arora & Nityanand Yadav – 10 things I wish I’d known before using Spark in production

Have you recently started working with Spark, and are your jobs taking forever to finish? This talk is for you! We have compiled many Spark best practices, optimisations and tweaks that we have applied over the years in production to make our jobs faster and less resource-hungry. In this talk we will cover advanced Spark tuning, data serialisation formats, storage formats, hardware tuning, better data locality, control over parallelism and GC tuning. We will also look at the appropriate usage of RDD, DataFrame and Dataset in order to take full advantage of Spark's internal optimisations.

