sparkbq: Google BigQuery Support for sparklyr

CRAN_Status_Badge Rdoc

sparkbq is a sparklyr extension package providing an integration with Google BigQuery. It builds on top of spark-bigquery, which provides a Google BigQuery data source to Apache Spark.

Version Information

You can install the released version of sparkbq from CRAN via


or the latest development version through

devtools::install_github("miraisolutions/sparkbq", ref = "develop")

The following table provides an overview over supported versions of Apache Spark, Scala, and Google Dataproc:

sparkbq spark-bigquery Apache Spark Scala Google Dataproc
0.1.x 0.1.0 2.2.x and 2.3.x 2.11 1.2.x and 1.3.x

sparkbq is based on the Spark package spark-bigquery which is available in a separate GitHub repository.

Example Usage


config <- spark_config()

sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"

# Reading the public shakespeare data table
hamlet <- 
    name = "hamlet",
    projectId = "bigquery-public-data",
    datasetId = "samples",
    tableId = "shakespeare") %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!
# Retrieve results into a local tibble
hamlet %>% collect()

# Write result into "mysamples" dataset in our BigQuery (billing) project
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite")


When running outside of Google Cloud it is necessary to specify a service account JSON key file. The service account key file can be passed as parameter serviceAccountKeyFile to bigquery_defaults or directly to spark_read_bigquery and spark_write_bigquery.

Alternatively, an environment variable export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json can be set (see for more information). Make sure the variable is set before starting the R session.

When running on Google Cloud, e.g. Google Cloud Dataproc, application default credentials (ADC) may be used in which case it is not necessary to specify a service account key file.

Further Information