Spark core-library development workflow

Dependencies: git, a JDK (Java 17 or later for the Spark 4.x line), and Ammonite for the test script further below. Maven itself is bootstrapped by the bundled ./build/mvn wrapper.

Use these commands to develop features in the core Spark libraries. The example targets the Spark 4.0.0 release cycle. If the Maven build fails with unexplained errors, check the system logs (dmesg) for out-of-memory kills.
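If the kernel's OOM killer terminated the build, dmesg will usually say so, and giving Maven a larger heap before rebuilding can help. A minimal sketch (the heap and stack sizes below are assumptions, tune them for your machine):

# Check whether the kernel killed a build process
sudo dmesg | grep -i -E 'out of memory|killed process'

# Give Maven more memory before retrying (sizes are assumptions)
export MAVEN_OPTS="-Xss64m -Xmx4g -XX:ReservedCodeCacheSize=1g"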

# Clone the project, make your changes, build it
git clone https://github.com/apache/spark/
cd spark
# Make your changes in the project...

# Build the entire project & install JARs to the local repository...
./build/mvn -DskipTests clean install

# ...or, after at least one full build, reinstall only the modules you changed
./build/mvn -pl :spark-sql_2.13      -DskipTests install
./build/mvn -pl :spark-catalyst_2.13 -DskipTests install

# The Spark JARs are installed into the local Maven repository,
# i.e. the .m2/repository subdirectory of your home directory.
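To sanity-check the install, list the snapshot artifacts in the local repository (the path assumes the default ~/.m2 location and the versions above):

# Confirm the snapshot was installed where the script below expects it
ls ~/.m2/repository/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/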

Now, to run some code against the development JARs, adapt the following Ammonite script:

// dev-test.sc
import coursierapi.MavenRepository

interp.repositories.update(
  // TODO: set the path to your local maven repository
  interp.repositories() ::: List(MavenRepository.of("file:///home/adrian/.m2/repository"))
)
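// The `@` below ends the first compilation block, so the repository
// change above takes effect before the `$ivy` imports are resolved.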
@
// TODO: set the versions to match the spark release cycle
import $ivy.`org.apache.spark::spark-core:4.0.0-SNAPSHOT`
import $ivy.`org.apache.spark::spark-sql:4.0.0-SNAPSHOT`

import org.apache.spark.sql.SparkSession

@main
def main(): Unit = {

  val spark = SparkSession
              .builder()
              .appName("Spark SQL basic example")
              .master("local[*]")
              .getOrCreate()

  spark.sql("""
    CREATE TABLE IF NOT EXISTS hello
    USING csv
    OPTIONS (header=true)
    LOCATION 'store/'
    AS (select 2 as col)
  """)
}

Then run the script with the Ammonite interpreter:

amm dev-test.sc
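If everything resolved against the snapshot JARs, the CTAS statement creates a CSV-backed table under store/ in the working directory; a quick way to inspect it (Spark generates the part-file names, so yours will differ):

# Inspect the CSV data the script wrote
ls store/
cat store/part-*.csv    # expect a 'col' header and one row containing 2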

To run individual test suites:

./build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.test.DataFrameReaderWriterSuite test
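The command above runs against every module in the reactor. To iterate faster, you can scope it to the module that owns the suite, or use the sbt build; the module coordinate and sbt project name below are assumptions based on this suite living in sql/core:

# Scope the Maven run to the sql module
./build/mvn -pl :spark-sql_2.13 -Dtest=none -DwildcardSuites=org.apache.spark.sql.test.DataFrameReaderWriterSuite test

# Or run the suite through sbt
./build/sbt "sql/testOnly org.apache.spark.sql.test.DataFrameReaderWriterSuite"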