Updated July 6, 2023

Introduction to Relational Database

A Relational Database is an assortment of relational data that are arranged in the form of tables, columns, and rows, along with their relationship properties. Each table will have a uniquely identified key-value, called as Primary key, which can be connected to other related tables with the help of a Foreign key from those tables. The data from the Relational database can be accessed, modified or managed by querying with the SQL (structured query language) programming language.

Accessing SQL Data with R

R objects are managed within a single memory. With data analysis in R, data needs to be available in the form of data frames. The data frame is a two-dimensional array of heterogeneous data.

R can also help import data if it exists in the form of .csv. We can use functions like read.csv () and to export, we use to write.xxx (). It has the ability to save the data frames to the disk-like saveRDS (), save (), etc.

Why Use a Relational Database?

Relational databases in R can handle large amounts of data, maintains data consistency. By default, R will read all the data into memory before performing any analysis. Database servers will determine an efficient method to obtain results through queries.

Data Types

As we seen above, we use data frame as a table, observation as row (tuple), variable as column (attribute), various ([] subset, order, sort) as select statements.

Connectivity

R can connect to relational databases like MYSQL, Oracle, and sql server. We use the data frames to fetch records. Once, we fetch the records it will be a dataset in R.

We can analyze and manipulate data according to our needs using powerful packages and functions.

Importing Packages

R has an inbuilt package named “RMySQL” to provide connectivity to the MySQL database.

Installing package

Install.packages(“RMySQL”)

After the package is installed, we can create a connection and connect to the database.

#creating connection
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'table1',
host = 'localhost')
# listing the tables in DB
dbListTables(mysqlconnection)

With dbListTables, it displays all the tables in that particular database.

Querying tables

querying and getting the data

result = dbSendQuery(mysqlconnection, "select * from names")

Store the result in an R data frame object. n = 3 is used to fetch the first 3 rows.

data.frame = fetch(result, n = 3)
print(data.fame)

Output:

Id	First _name	Last _name
1	John	Campbell
2	Carl	Hooper
3	Jenna	Taylor

The output displays the first 3 results from the table ‘names’.

Filter and querying results

We can filter the results by any column name and display results.

Let’s suppose, we want to display all the rows with last name being ‘Taylor’

Query:

result = dbSendQuery(mysqlconnection, "select * from actor where Last_name = 'Taylor'")

Fetch all the records and store them as a data frame.

data.frame = fetch(result, n = -1)
print(data)

Output:

S.no	First_name	Last_name
3	Jenna	Taylor
57	Dan	Taylor
96	Keth	Taylor

Updating the Rows

It is quite easy to modify data in RMySQL.

Let’s look at the query below to understand the same.

Query:

dbSendQuery(mysqlconnection, "update names set last_name = Cooper where First_name = Jenna")

Dropping Tables in RMySQL

We can also drop tables using the drop query.

Query:

dbSendQuery(mysqlconnection, 'drop table if exists names')

We can do such several operations with the RMySQL package and the only requirement is to install the package and connect to the particular database. It is then very similar to SQL queries.

Using the Dpylr Package

This dplyr package is particularly useful when there is already data in the database or in-memory data.

If the data fits in the memory, then if that data is stored in the database. It becomes slower in accessing and retrieving data compared to in-memory data.

If we have to do data analysis in R, most of the data is present in the database. We can import the data into the local database through ‘dpylr’ tools.

Installing Dpylr

In order to use the dpylr package. We need to install the package first.

In the below example, Let’s see how to install dpylr package.

Example:

install.packages(“dbplyr”)

This package allows us to connect to backend databases according to our needs.

RMySQL	My SQL and Maria
RPostgreSQL	Postgres and Redshift
RSQLite	SQLite database
odbc	commercial database
bigrquery	Google’s bigquery

Connecting to the database

After installing dpylr, we need to connect to the specific db first.

To connect to a DB,we use dbconnect().

library(dplyr)
conn <- DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")

Writing queries

Dplyr makes it easier to write queries using verbs. Most of them can write SQL queries using the dplyr package. Dplyr also automatically generates SQL queries.

Example:

names_db %>% select(Id,First_name,Last_name)

Output:

Id	First _name	Last _name
1	John	Campbell
2	Carl	Hooper
3	Jenna	Taylor

DBI

DBI is used to separate connectivity between the frontend and backend through DBMS. Backend communicates with specific DBMS like SQLite, MySQL, etc. Also, it provides drivers to be invoked using methods.

Quiz Result
Total Questions	Correct Answers	Wrong Answers	Percentage

Introduction to Relational Database

Accessing SQL Data with R

Why Use a Relational Database?

Data Types

Connectivity

Importing Packages

Installing package

Querying tables

Filter and querying results

Updating the Rows

Dropping Tables in RMySQL

Using the Dpylr Package

Installing Dpylr

Connecting to the database

Writing queries

DBI

Recommended Articles

Follow us!

APPS

Blog

Courses

Email