Spark core-library development workflow

Dependencies: git, a JDK (Java 17 or later for the Spark 4.x line), and Ammonite for the test script further below. Maven itself is bootstrapped by the bundled ./build/mvn wrapper.

Use these commands to develop features in the core Spark libraries. The example targets the Spark 4.0.0 release cycle. If the Maven build fails with unexplained errors, check the system logs (dmesg) for out-of-memory kills.
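If the kernel's OOM killer terminated the build, dmesg will usually say so, and giving Maven a larger heap before rebuilding can help. A minimal sketch (the heap and stack sizes below are assumptions, tune them for your machine):

# Check whether the kernel killed a build process
sudo dmesg | grep -i -E 'out of memory|killed process'

# Give Maven more memory before retrying (sizes are assumptions)
export MAVEN_OPTS="-Xss64m -Xmx4g -XX:ReservedCodeCacheSize=1g"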

# Clone the project, make your changes, build it
git clone https://github.com/apache/spark/
cd spark
# Make your changes in the project...

# Build the entire project & install JARs to the local repository...
./build/mvn -DskipTests clean install

# ...or, after at least one full build, reinstall only the modules you changed
./build/mvn -pl :spark-sql_2.13      -DskipTests install
./build/mvn -pl :spark-catalyst_2.13 -DskipTests install

# The Spark JARs are installed into the local Maven repository,
# i.e. the .m2/repository subdirectory of your home directory.
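To sanity-check the install, list the snapshot artifacts in the local repository (the path assumes the default ~/.m2 location and the versions above):

# Confirm the snapshot was installed where the script below expects it
ls ~/.m2/repository/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/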

Now, to run some code against the development JARs, adapt the following Ammonite script:

// dev-test.sc
import coursierapi.MavenRepository

interp.repositories.update(
  // TODO: set the path to your local maven repository
  interp.repositories() ::: List(MavenRepository.of("file:///home/adrian/.m2/repository"))
)
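// The `@` below ends the first compilation block, so the repository
// change above takes effect before the `$ivy` imports are resolved.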
@
// TODO: set the versions to match the spark release cycle
import $ivy.`org.apache.spark::spark-core:4.0.0-SNAPSHOT`
import $ivy.`org.apache.spark::spark-sql:4.0.0-SNAPSHOT`

import org.apache.spark.sql.SparkSession

@main
def main(): Unit = {

  val spark = SparkSession
              .builder()
              .appName("Spark SQL basic example")
              .master("local[*]")
              .getOrCreate()

  spark.sql("""
    CREATE TABLE IF NOT EXISTS hello
    USING csv
    OPTIONS (header=true)
    LOCATION 'store/'
    AS (select 2 as col)
  """)
}

Then run the script with the Ammonite interpreter:

amm dev-test.sc
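If everything resolved against the snapshot JARs, the CTAS statement creates a CSV-backed table under store/ in the working directory; a quick way to inspect it (Spark generates the part-file names, so yours will differ):

# Inspect the CSV data the script wrote
ls store/
cat store/part-*.csv    # expect a 'col' header and one row containing 2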

To run individual test suites:

./build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.test.DataFrameReaderWriterSuite test
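The command above runs against every module in the reactor. To iterate faster, you can scope it to the module that owns the suite, or use the sbt build; the module coordinate and sbt project name below are assumptions based on this suite living in sql/core:

# Scope the Maven run to the sql module
./build/mvn -pl :spark-sql_2.13 -Dtest=none -DwildcardSuites=org.apache.spark.sql.test.DataFrameReaderWriterSuite test

# Or run the suite through sbt
./build/sbt "sql/testOnly org.apache.spark.sql.test.DataFrameReaderWriterSuite"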