Interact with SQL Server

tidyverse
code
sql
Author

Ravi C

Published

February 13, 2023

We will go through how to connect to a sql server and fetch tables.

To connect to SQL Server, we will use the DBI and ODBC package. Please note you need to create a ODBC connection in Windows using the built in “ODBC Data Source” application.

con <- DBI::dbConnect(odbc::odbc(), "db_name")

Once the connection is create using the code above you can now use dplyr tbl function to query the tables

tbl(con, "table_name")

The table from tbl is a lazy table and to assign it to tibble or data frame we need the collect function.

tbl_tibble <-
  tbl(con, "table_name") |>
  collect()

The above code works fine if your table is under the “dbo” schema, but what if it is on a different schema?

We can use the in_schema function from the dbplyr package.

tbl(con, dbplyr::in_schema("schema", "table_name"))
Warning

There is an issue with nanodbc where NVARCHAR fields are out of bounds for the connection and it wont return a result, I was able to find a solution on stackoverflow here.

The code below will basically arrrange the columns in order and the NVARCHAR fields that have CHARACTER_MAXIMUM_LENGTH = -1 will be replaced with 100000 so that they are at the end of select query.

col_names <-
  data_type |>
  mutate(
    col_nm = case_when(
      is.na(CHARACTER_MAXIMUM_LENGTH) ~ 10,
      CHARACTER_MAXIMUM_LENGTH == -1 ~ 100000,
      TRUE ~ as.double(CHARACTER_MAXIMUM_LENGTH)
    )
  ) |>
  arrange(col_nm) |>
  pull(COLUMN_NAME) |>
  paste(collapse = ", ")

query <- paste("SELECT", col_names, "FROM schema.table_name")

tbl(con, sql(query))