JVM Benchmark Based on Apache Spark MapReduce Engine

This setup is designed to replicate the Spark benchmark used in the paper [Benchmarking Weak Memory Models](https://kar.kent.ac.uk/51638/).

The benchmark runs Apache Spark GraphX PageRank on 20 million lines of the [SNAP LiveJournal dataset](http://snap.stanford.edu/data/soc-LiveJournal1.html). It was inspired by a similar benchmark described by Lokesh Gidra et al. in their paper on [NumaGiC](http://dl.acm.org/citation.cfm?id=2694361).

Requirements

Java 7+ (e.g. OpenJDK 1.7 or 1.8)

On Ubuntu, running sudo apt-get install openjdk-7-jdk is generally sufficient.
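To check an installed JDK against the Java 7+ requirement, a small helper along these lines may be useful. This is only a sketch: the function name java_major is hypothetical, and it assumes the version string follows either the legacy 1.x scheme (1.7.0_79) or the newer scheme (11.0.2).

```shell
# Hypothetical helper: reduce a Java version string to its major version.
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;   # legacy scheme: 1.7.0_79 -> 7
    *)   echo "${1%%[.+-]*}" ;;          # newer scheme: 11.0.2 -> 11
  esac
}

# Example (not run here): extract the quoted version from `java -version`,
# which prints to stderr, and compare it with the minimum of 7.
# ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
# [ "$(java_major "$ver")" -ge 7 ] && echo "Java version OK"
```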

Setup

Update BENCH_ROOT and JAVA_HOME in env.sh.
Download the sources (around 500 MiB of data): run ./download.sh
Unpack the sources: run ./unpack.sh

Usage

source env.sh
./run.sh <number-of-runs>

Tuning

This benchmark is tuned for systems with around 8 cores and 16 GiB of RAM.

Memory usage can be adjusted by changing SPARK_WORKER_MEMORY in conf/spark-env.sh, and spark.driver.memory and spark.executor.memory in conf/spark-defaults.conf. To make use of more memory, the input dataset (in the data directory) should also be enlarged, i.e. use more lines from src/soc-LiveJournal1.txt.gz.
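Enlarging the input dataset as described above could be sketched with a helper like the one below. The function name take_lines, the line count of 40 million and the output name data/input.txt are assumptions for illustration only; check run.sh for the file name the benchmark actually reads. The sketch also assumes the input is taken from the start of the archive.

```shell
# Hypothetical helper: write the first N lines of a gzip-compressed file
# to a plain-text output file.
take_lines() {
  src_gz=$1      # compressed source, e.g. src/soc-LiveJournal1.txt.gz
  n_lines=$2     # number of lines to keep
  out_file=$3    # destination file (name is an assumption, see run.sh)
  gzip -dc "$src_gz" | head -n "$n_lines" > "$out_file"
}

# Example (not run here): use 40 million lines instead of 20 million.
# take_lines src/soc-LiveJournal1.txt.gz 40000000 data/input.txt
```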

Concurrency can be adjusted by changing spark.akka.threads in conf/spark-defaults.conf. For increased concurrency you should also increase the input dataset size and potentially adjust the number of partitions in run.sh (default 16).
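The properties named in this section can be edited by hand. For scripted tuning runs, a helper along the following lines might be convenient. It is a sketch, not part of this repository: the name set_spark_conf is hypothetical, the values in the example are illustrative rather than recommended, and it relies on spark-defaults.conf holding one whitespace-separated key and value per line.

```shell
# Hypothetical helper: set a property in a spark-defaults.conf style file,
# replacing any existing definition of the same key.
set_spark_conf() {
  conf_file=$1   # e.g. conf/spark-defaults.conf
  key=$2         # e.g. spark.akka.threads
  value=$3       # new value for the key
  tmp_file=$(mktemp)
  # Keep every line whose first field is not the key (comments included).
  awk -v k="$key" '$1 != k' "$conf_file" > "$tmp_file"
  # Append the new definition.
  printf '%s %s\n' "$key" "$value" >> "$tmp_file"
  # Write back in place so the original file permissions are kept.
  cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}

# Example (not run here; the values are illustrative only):
# set_spark_conf conf/spark-defaults.conf spark.akka.threads 8
# set_spark_conf conf/spark-defaults.conf spark.executor.memory 8g
```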
