Databricks

Databricks is a managed platform for running Apache Spark - that means that you do not have to learn complex cluster management concepts nor perform tedious maintenance tasks to take advantage of Spark.

Databricks is a company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark.

Databricks develops a web-based platform for working with Spark, that provides automated cluster management and

IPython -style notebooks

Databricks was founded by the creators of the Apache Spark project and accounts for over 75% of the code base for the Spark project. Aside from being a huge force behind the Spark project with respect to development, Databricks also offers various certifications in Spark for developers, administrators, trainers, and analysts alike. However, Databricks is not the only main contributor to the code base; companies such as IBM, Cloudera, and Microsoft also actively participate in Apache Spark development.

As a side note, Databricks also organizes the Spark Summit (in both Europe and the US), which is the premier Spark conference and a great place to learn about the latest developments in the project and how others are using Spark within their ecosystem.

Throughout this book, we will give recommended links that we read daily that offer great insights and also important changes with respect to the new versions of Spark. One of the best resources here is the Databricks blog, which is constantly being updated with great content. Be sure to regularly check this out at https://databricks.com/blog.

Also, here is a link to see the past Spark Summit talks, which you may find helpful: http://slideshare.net/databricks.

.

Last updated