Here we will go over a basic introduction to R programming.

What is OOP and what is an “Object”?

In computer science, an object can be a variable, data structure, function, method, or any value that can be referred to by an identifier, so pretty much everything in R is an object. Objects come in several different types, depending on the kind of data they hold.

Types of objects

Atomic Vectors

An atomic vector is the simplest data structure in R: a one-dimensional array whose elements all share a single data type. There are 6 types of atomic vectors.

# Let's see the different types of atomic vectors in R
# Atomic vector of type character.
print("abcdef")

# Atomic vector of type double.
print(89.214)

# Atomic vector of type integer.
print(93L)

# Atomic vector of type logical.
print(FALSE)

# Atomic vector of type complex.
print(9+36i)

# Atomic vector of type raw.
print(charToRaw('hi!'))
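To confirm which of the six types a value has, you can ask R directly with typeof(). The short sketch below reuses the values from above and is only meant as an illustration:

```r
# typeof() reports the storage type of each value
typeof("abcdef")           # "character"
typeof(89.214)             # "double"
typeof(93L)                # "integer"
typeof(FALSE)              # "logical"
typeof(9+36i)              # "complex"
typeof(charToRaw("hi!"))   # "raw"

# Even a single value is a vector of length 1
length(89.214)             # 1

# charToRaw() returns one raw byte per character
length(charToRaw("hi!"))   # 3
```

Note that a number typed without the L suffix is stored as a double, even if it looks like a whole number.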

Attributes

You can attach pieces of information to an atomic vector. These attributes describe the vector itself and are helpful when classifying vector data. Think of them as “metadata”.

# Let's make a vector
mario <- c("mario", "luigi", "peaches", "donkeykong", "browser", "toad", "giuseppe", "koopa", "spike", "penguinking", "kamek", "goomba")

# Check vector
mario

# Check if mario has any attributes assigned to it
attributes(mario)

Names

The most common attributes assigned to vectors are names, dimensions, and classes. You can look up each of these using a helper function that looks for that specific attribute.

# Check if mario has any name attributes assigned to it
names(mario)

# Let's assign names to the mario vector
# Remember that the metadata won't change if you change the actual data.
names(mario) <- c("good", "good", "good", "good", "villain", "good", "good", "villain", "villain", "good", "good", "villain")

# Check the name attributes of mario again
names(mario)

# R also displays metadata when you look at the object
mario
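The point that names stay in place when the data change can be checked with a small, self-contained sketch. The vector scores and its names are made up for illustration and are not part of the mario example:

```r
# A small illustrative vector with names attached
scores <- c(10, 20, 30)
names(scores) <- c("a", "b", "c")

# Changing a value leaves the names attribute untouched
scores[2] <- 99
kept_names <- names(scores)
kept_names                 # "a" "b" "c"

# Names can also be used to look up elements
scores["b"]                # the element named "b", now 99

# Assigning NULL removes the names again
names(scores) <- NULL
attributes(scores)         # NULL
```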

Dims

You can transform an atomic vector into an n-dimensional array by setting its dim attribute. The product of the dimensions must equal the length of the vector, here 3 × 4 = 12. This can be especially useful when organising and classifying datasets.

# Let's check dim
dim(mario)

# Increase data dimensionality
dim(mario) <- c(3, 4)

# Check data dimensions now
dim(mario)
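It helps to know how R lays the values out once dim is set: it fills the first dimension (rows) first, then moves to the next column. A minimal sketch with an illustrative integer vector v:

```r
# Twelve values reshaped into 3 rows and 4 columns
v <- 1:12
dim(v) <- c(3, 4)
v

# Values are filled column by column, so column 3 holds 7, 8, 9
v[, 3]

# Row 2, column 3
v[2, 3]                    # 8
```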

Matrices

Matrices store data in a 2-dimensional array. Much like setting dim, the matrix function lets you specify the number of rows with its nrow argument.

# Let's arrange the data in 2 rows
mat <- matrix(mario, nrow = 2)
mat
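By default matrix() fills column by column, just like dim. If you would rather fill row by row, the byrow argument does that. A short sketch with illustrative names m_col and m_row:

```r
# Default: fill column by column
m_col <- matrix(1:6, nrow = 2)
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: fill row by row instead
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```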

Array

An array organises a dataset into n dimensions. Think of storing data in a hypercube with several dimensions. Pass the data as a vector in the first argument and the dimensions as dim in the second argument.

# Let's create a 4-dimensional array (a hypercube)
# The 16 values are recycled to fill all 2 * 2 * 3 * 3 = 36 cells
ar <- array(c(1:4, 5:8, 9:12, 13:16), dim = c(2, 2, 3, 3))
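Indexing an array works like indexing a matrix, with one index per dimension. The sketch below uses a smaller, illustrative 3-dimensional array called ar3 so that the result is easy to follow:

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
ar3 <- array(1:24, dim = c(2, 3, 4))
dim(ar3)

# Leaving an index empty selects everything along that dimension;
# this returns the first 2 x 3 layer
ar3[, , 1]

# Row 2, column 1, layer 3
ar3[2, 1, 3]               # 14
```

Each layer holds 6 values, so layer 3 starts at 13, and the second row of its first column is 14.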

Class

The class describes the format of data in R. You can check the class of data using the class() function.

class(mario)
class(ar)

If an object doesn't have a class attribute assigned to it, the class() function falls back to an implicit class based on the object's type and dimensions, for example "character" for strings and "numeric" for doubles.

class("hello there!")
class(2019)
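The difference between the storage type and the class can be seen side by side. This is a small sketch; the class name "year" is made up purely for illustration:

```r
x <- 2019
typeof(x)                  # storage type: "double"
class(x)                   # implicit class: "numeric"
is.null(attr(x, "class"))  # TRUE, no class attribute is set

# Setting a class attribute changes what class() reports,
# but not how the value is stored
class(x) <- "year"         # "year" is a hypothetical class name
class(x)                   # "year"
typeof(x)                  # "double"
```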

Date & Time

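R stores a date-time as the number of seconds that have passed since 12:00 a.m. UTC on Jan. 1, 1970, and gives it the class POSIXct. We can see this by checking the type and class of the current time: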

time_now <- Sys.time()
time_now

typeof(time_now)

class(time_now)
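To see the underlying number of seconds, you can strip the class away. The sketch below uses a fixed, illustrative timestamp t0 (one day after the epoch) instead of Sys.time() so that the result is predictable:

```r
# Exactly one day after the epoch, in UTC
t0 <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")

typeof(t0)                 # "double"
class(t0)                  # "POSIXct" "POSIXt"

# The stored value is the number of seconds since the epoch
as.numeric(t0)             # 86400, i.e. 24 * 60 * 60
```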

Factors

Factors are R’s way of storing categorical information, like ethnicity or eye color. Think of a factor as something like a sex; it can only have certain values (male or female), and these values may have their own idiosyncratic order. This arrangement makes factors very useful for recording the treatment levels of a study and other categorical variables.

To make a factor, pass an atomic vector into the factor function. R will recode the data in the vector as integers and store the results in an integer vector. R will also add a levels attribute to the integer vector, which contains a set of labels for displaying the factor values, and a class attribute, which contains the class factor.

# Create a character vector and pass it to factor() so that it is stored as a factor
gender <- factor(c("male", "female", "female", "male"))

typeof(gender)
## "integer"

attributes(gender)
## $levels
## [1] "female" "male"  
## 
## $class
## [1] "factor"
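By default the levels are sorted alphabetically, which is why "female" comes before "male" above. If the categories have their own order, you can set it with the levels argument. A small sketch with an illustrative sizes factor:

```r
# Specify the level order explicitly instead of alphabetically
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)              # "small" "medium" "large"

# The integer codes refer to positions in levels
as.integer(sizes)          # 1 3 2 1

# Count how often each level occurs
table(sizes)
```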

Coercion

R likes organization. If you combine multiple types of data in one vector, even without realising it, R will coerce them to a single type. For example, if you combine an integer vector with a character vector, R will convert all values to character. R also coerces values in arithmetic: if you ask for the sum of logical values, it converts TRUE to 1 and FALSE to 0 and then performs the calculation.

sum(c(TRUE, TRUE, FALSE, FALSE)) # is the same as
sum(c(1, 1, 0, 0))

You can override R and force it to convert data to a certain type:

as.character(1)
as.logical(1)
as.numeric(FALSE)
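The coercion rules described above can be observed directly with typeof(). This is a short sketch; the variable name mixed is illustrative:

```r
# Mixing integer, character and logical values gives a character vector
mixed <- c(1L, "a", TRUE)
mixed                      # "1" "a" "TRUE"
typeof(mixed)              # "character"

# Logical combined with integer becomes integer
typeof(c(TRUE, 2L))        # "integer"

# Integer combined with double becomes double
typeof(c(1L, 2.5))         # "double"

# Logicals are coerced in arithmetic: the mean is the share of TRUE values
mean(c(TRUE, TRUE, FALSE, FALSE))   # 0.5

# A forced conversion that cannot succeed gives NA (with a warning)
suppressWarnings(as.numeric("abc")) # NA
```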

Lists

Lists are one-dimensional vectors, but unlike atomic vectors their elements can be of different types and can themselves be vectors or other lists. A list is simply an organizational structure. The function list creates a list, similar to how c creates a vector.

list_test <- list(100:150, c("R", "C++", "C"), list("TRUE", "TRUE", "FALSE", 3))
list_test
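Once a list exists, you usually want to pull individual elements back out. The sketch below shows the difference between single and double brackets; the list stack and its element names are made up for illustration:

```r
# A list whose elements have different types and lengths
stack <- list(numbers = 1:3, langs = c("R", "C++", "C"), flag = TRUE)

length(stack)              # 3 elements

# Double brackets (or $) return the element itself
stack[["langs"]]           # "R" "C++" "C"
stack$numbers              # 1 2 3

# Single brackets return a smaller list
class(stack["flag"])       # "list"
```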

Dataframes

A data frame is the 2-dimensional version of a list: lists have 1 dimension, but data frames have 2. It is among the most useful forms of data storage in R. Data frames group vectors of equal length together into a two-dimensional table. Each vector becomes a column in the table. As a result, each column of a data frame can contain a different type of data, but within a column every cell must be the same type of data.

Let's load some test data. Click on the link and download the test dataset deck.csv, which contains all 52 playing cards in a classic deck (without jokers). Once you have downloaded the data, let's proceed to load it into R. Notice that the file format is .csv, a comma-separated values file. We can load such files into R directly using the following commands:

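# Load file into R (replace FILEPATH with the folder that contains deck.csv)
# row.names = 1 uses the first column of the file as row names
deck_cards <- read.csv(r"(FILEPATH\deck.csv)", row.names = 1)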

# View DF
head(deck_cards)

# Let's check the type of the object (a data frame is stored as a list)
typeof(deck_cards)
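If you do not have a file at hand, a data frame can also be built directly from vectors. The sketch below is a made-up three-card example; the column names face and suit are assumptions, and only value is used later in the text:

```r
# Each vector becomes one column; all columns must have the same length
cards <- data.frame(
  face  = c("king", "queen", "jack"),
  suit  = c("spades", "spades", "spades"),
  value = c(13, 12, 11),
  stringsAsFactors = FALSE   # keep text columns as character
)

dim(cards)                 # 3 rows, 3 columns
class(cards)               # "data.frame"
typeof(cards)              # "list"

# Columns can hold different types
sapply(cards, class)
```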

Now that we have loaded a data frame, we can view the data and analyze it for any purpose.

# Let's add the -log10 of the card values in a new column
deck_cards$neglog10card <- -log10(deck_cards$value)

# Finally, let's save our file
write.csv(deck_cards, file=r"(FILEPATH\deck_modified.csv)")
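To try the full write-and-read cycle without touching your own files, you can use a temporary file. This is only a sketch: the data frame df, its columns, and the temporary path are all made up for illustration:

```r
# A small illustrative data frame with a derived column
df <- data.frame(id = 1:3, value = c(10, 100, 1000))
df$neglog10 <- -log10(df$value)

# Write to a temporary file; row.names = FALSE skips the extra
# row-name column that write.csv() adds by default
tmp <- tempfile(fileext = ".csv")
write.csv(df, file = tmp, row.names = FALSE)

# Read the file back and inspect it
df_back <- read.csv(tmp)
df_back

# Clean up the temporary file
unlink(tmp)
```

If you keep the default row names when writing, the file gains an unnamed first column, which is the situation where row.names = 1 is handy when reading it back.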

There are many more fun things we can do with R, but in the interest of time we will focus on using R to run Seurat’s built-in functions.