Hello, R aficionados and data wranglers! If you’ve been itching to combine the statistical prowess of R programming with the heavyweight data-crunching power of BigQuery, you’ve come to the right place.
As we all know, data is the new gold, and being able to mine it effectively is crucial to uncovering valuable insights. Google BigQuery acts as an expansive mine, loaded with data waiting to be extracted, and R is the tool you need to refine that raw material into actionable intel.
In this friendly guide, we’ll walk through the simple yet impactful process of connecting R to Google BigQuery.
Install bigrquery R package
We will use ‘bigrquery’ package to access the BigQuery database with R. The table fetched will be in the dplyr’s tibble format.
You can install the package using install.packages('bigrquery')
command.
Connect to BigQuery database
Let’s connect to a BigQuery database using the dbConnect()
function.
library(bigrquery)
library(dplyr)
con <- dbConnect(
bigrquery::bigquery(),
project = project-ID,
dataset = dataset-name,
billing = project-billing-ID
)
Find project-ID
The BigQuery projects are listed in the ‘Explorer’ panel in BigQuery Studio.
For example, in the screenshot above, ‘midyear-destiny-379317’ is the project-ID. I created this dummy project for learning purposes.
Find dataset-name
Expand ‘midyear-destiny-379317’ project to reveal the datasets in this project. These datasets contain stats of several YouTube channels.
‘channeldetails’ and ‘videodetails’ are two datasets in this project. We will use the former dataset in this article.
Find project-billing-ID
Go to the Account Management page and there you will find ‘Billing account ID’. Copy it and use it as the project-billing-ID.
Import data from BigQuery table
Now let’s fetch some data from a table in the ‘channeldetails’ dataset.
library(bigrquery)
library(dplyr)
con <- dbConnect(
bigrquery::bigquery(),
project = "midyear-destiny-379317",
dataset = "channeldetails",
billing = "******-******-******"
)
To list the table names in the BigQuery dataset, use the R code below.
dbListTables(con)
Output: [1] “channelstats” “gamingchannelstats”
As you can see there are two tables. We will query “gamingchannelstats” table and save the first 10 rows as a tibble.
# make SQL query
sql = "SELECT date, channel_id, subscribers
FROM `midyear-destiny-379317.channeldetails.gamingchannelstats`
WHERE EXTRACT(YEAR FROM PARSE_DATE('%F', date)) = 2023 AND EXTRACT(MONTH FROM PARSE_DATE('%F', date)) = 12"
# Run the query and store the data in a tibble
tb <- bq_project_query("midyear-destiny-379317", sql)
# save table as a tibble
bg_data = bq_table_download(tb, n_max = 10)
As you can see, we have extracted this data of 3 columns and 10 rows from a BigQuery table using R programming. Similarly, we can easily import more data with additional columns and more rows. All we need to do is modify the SQL query.