

Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java. This should include JVMs on x86_64 and ARM64. It’s easy to run locally on one machine: all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation. Python 3.6 support is deprecated as of Spark 3.2.0.
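The Java requirement can be checked before trying to launch Spark. The following is a minimal sketch in Python, not part of Spark itself: the helper name `find_java` is hypothetical, and checking JAVA_HOME before PATH is an assumption of this sketch.

```python
import os
import shutil
from pathlib import Path


def find_java(env=None):
    """Return the path to a java executable, or None if none is found.

    Hypothetical helper: looks in JAVA_HOME/bin first, then on PATH.
    """
    env = os.environ if env is None else env

    # Assumption of this sketch: JAVA_HOME takes priority when it is set.
    java_home = env.get("JAVA_HOME")
    if java_home:
        exe_name = "java.exe" if os.name == "nt" else "java"
        candidate = Path(java_home) / "bin" / exe_name
        if candidate.is_file():
            return str(candidate)

    # Otherwise search the directories listed in PATH.
    return shutil.which("java", path=env.get("PATH"))


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No Java found: install Java or set JAVA_HOME.")
    else:
        print(f"Java found at {java}")
```

Passing a dictionary as `env` makes it possible to try the lookup against a made-up environment without changing the real one.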
Spark uses Hadoop’s client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a “Hadoop free” binary and run Spark with any Hadoop version. Scala and Java users can include Spark in their projects using its Maven coordinates, and Python users can install Spark from PyPI.
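For Python users, the PyPI package is named `pyspark`, so it is typically installed with `pip install pyspark`. The snippet below is a small sketch for confirming which version, if any, is installed in the current environment. The helper name `installed_version` is hypothetical, and the code uses only the standard library's `importlib.metadata` (Python 3.8 or later).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pyspark")
    if found is None:
        print("pyspark is not installed in this environment.")
    else:
        print(f"pyspark {found} is installed.")
```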

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing. Get Spark from the downloads page of the project website. This documentation is for Spark version 3.2.1.
