Dependencies:
Use the following commands if you want to develop features against the core Spark libraries. The examples target the Spark 4.0.0 release cycle.
If the Maven build fails with unexplained errors, check the system logs (dmesg) for Out-Of-Memory kills.
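For example, on a Linux host (dmesg may require root on your kernel):
# Look for the kernel OOM killer having terminated the build JVM
sudo dmesg | grep -i -E 'out of memory|killed process'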
# Clone the project, make your changes, build it
git clone https://github.com/apache/spark/
cd spark
# Make your changes in the project...
# Build the entire project & install JARs to the local repository...
./build/mvn -DskipTests clean install
# ...or build specific components
./build/mvn -pl :spark-sql_2.13 -DskipTests install
./build/mvn -pl :spark-catalyst_2.13 -DskipTests install
# The Spark JARs are installed under ~/.m2/repository in your home directory.
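To sanity-check the install step (the path below assumes Maven's default local repository location; adjust if yours differs):
# The snapshot artifacts should now be visible, e.g.:
ls ~/.m2/repository/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/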
Now, to run code against the freshly installed development JARs, adapt the following Ammonite script:
// dev-test.sc
import coursierapi.MavenRepository

interp.repositories.update(
  // TODO: set the path to your local Maven repository
  interp.repositories() ::: List(MavenRepository.of("file:///home/adrian/.m2/repository"))
)
@
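// The lone `@` above is Ammonite's compilation-unit separator: the block before
// it runs first, so the repository list is already updated by the time the
// $ivy imports below are resolved.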
// TODO: set the versions to match the Spark release cycle
import $ivy.`org.apache.spark::spark-core:4.0.0-SNAPSHOT`
import $ivy.`org.apache.spark::spark-sql:4.0.0-SNAPSHOT`

import org.apache.spark.sql.SparkSession

@main
def main(): Unit = {
  val spark = SparkSession
    .builder()
    .appName("Spark SQL basic example")
    .master("local[*]")
    .getOrCreate()

  // Create a CSV-backed table under ./store/ to exercise the development build
  spark.sql("""
    CREATE TABLE IF NOT EXISTS hello
    USING csv
    OPTIONS (header=true)
    LOCATION 'store/'
    AS (SELECT 2 AS col)
  """)

  // Shut down Spark so the script can exit cleanly
  spark.stop()
}
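To confirm the script really runs against the locally built JARs rather than a published release, you can extend main(); a minimal check:
// Add inside main(), before the spark.stop() call:
println(spark.version)                  // a local build should report "4.0.0-SNAPSHOT"
spark.sql("SELECT * FROM hello").show() // read back the table created above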
Then run the script with the Ammonite interpreter:
amm dev-test.sc
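If you do not have an amm binary yet, one option (an assumption, not the only route) is to launch it through Coursier; note that the interpreter must run on Scala 2.13 so the `::` in the $ivy imports resolves the `_2.13` Spark artifacts:
# Launch a Scala 2.13 Ammonite and pass it the script (assumes Coursier is installed)
cs launch ammonite --scala 2.13.14 -- dev-test.sc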
To run an individual Scala test suite (-Dtest=none disables the Java tests so only the named suite runs):
./build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.test.DataFrameReaderWriterSuite test
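Running this from the repository root walks through every module; scoping the command to the module that owns the suite with the same -pl selector used above is usually much faster (this suite lives in spark-sql):
./build/mvn -pl :spark-sql_2.13 -Dtest=none -DwildcardSuites=org.apache.spark.sql.test.DataFrameReaderWriterSuite test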