Create SQL DB in R

tidyverse
code
sql
database
Author

Ravi C

Published

February 28, 2023

To produce a SQL database, we will utilize the RSQLite and DBI packages and the “iris” data set available in R.

We verify that the data set is available for loading.

iris |> head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Then, using the RSQLite and DBI packages, we create a new in-memory database.

con <- DBI::dbConnect(
  drv = RSQLite::SQLite(),
  dbname = ":memory:"
  )

Next, we utilize the dbWriteTable function from the DBI package to store the “iris” data set into a SQL table in the in-memory database created in the previous step. As shown in the output, a table named “iris_sql_tbl” is created.

In the query below, replace “conn” with your connection name from the previous step, “name” with your preferred name for the SQL table (in this case, it is “iris_sql_tbl”), and “value” with the data set being written to the SQL database.

DBI::dbWriteTable(
  conn = con,
  name = "iris_sql_tbl",
  value = iris
)

DBI::dbListTables(con)
[1] "iris_sql_tbl"

Using the tbl function from the tidyverse package, we can execute a query on the newly created SQL database table.

tbl(con, "iris_sql_tbl")
# Source:   table<iris_sql_tbl> [?? x 5]
# Database: sqlite 3.40.0 [:memory:]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ... with more rows

We can also run querries using the SQL connection and not have to pull the data in R.

tbl(con, "iris_sql_tbl") |>
  summarize(
    max_length = max(Sepal.Length, na.rm = TRUE),
    min_length = min(Sepal.Length, na.rm = TRUE)
  )
# Source:   SQL [1 x 2]
# Database: sqlite 3.40.0 [:memory:]
  max_length min_length
       <dbl>      <dbl>
1        7.9        4.3

Alternatively, we can use the dbGetQuery function from DBI to execute actual SQL code.

Warning

I ran into an issue where the querry was not able to find the column name without double quotes, will update here if I come across an explanation to it. For now, I am adding a "" and using “" to escape it.

DBI::dbGetQuery(
  con,
  "SELECT
  MAX(\"Sepal.Length\") AS max_length,
  MIN(\"Sepal.Length\") AS min_length
  FROM iris_sql_tbl"
)
  max_length min_length
1        7.9        4.3

This approach will assist in generating blog posts with SQL connections without having to connect to a live SQL Server.