Vectors

I think of a vector as an ordered column of data. It can consist of numbers or text, but its basic construction is always the same: an ordered column of data. The following sequences are all examples of vectors.

1:15 #generates a sequence from 1 to 15 by 1;
> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

seq(from = 1, to = 25, by = 1) #generates a sequence from 1 to 25 by 1;
> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

seq(1, 25, 1) #generates a sequence from 1 to 25 by 1;
> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
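
The seq() calls above all step by 1, but the same function accepts other step sizes. The following is a minimal sketch; the object names (by.25, five.values, countdown) are made up for illustration.

```r
# Count from 0 to 100 in steps of 25
by.25 <- seq(from = 0, to = 100, by = 25)
by.25          # 0 25 50 75 100

# Ask for exactly 5 evenly spaced values between 0 and 1
five.values <- seq(from = 0, to = 1, length.out = 5)
five.values    # 0.00 0.25 0.50 0.75 1.00

# The colon operator can also count down
countdown <- 10:1
countdown
```

Using by fixes the step size, while length.out fixes the number of values and lets R work out the step.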

Again, we can assign names to these types of objects. Below, the name X is assigned to a vector of integers from 1 to 15 and Y is assigned to a vector with three elements: 1, 2, and 3.

X <- 1:15
Y <- 1:3
X
> [1] 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Y
> [1] 1 2 3

Importantly, vectors need not be sequences. Let’s assign some values to Z using the concatenate function c(), which takes n elements and combines them into a single vector: c(element 1, element 2, … , element n).

Z <- c(143, 5640, 2601, 902, 506) 
Z
> [1] 143 5640 2601 902 506
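
The c() function is not limited to single values: it can also join existing vectors end to end. A small sketch, reusing Y and Z from above (the name YZ is an assumption for illustration):

```r
Y <- 1:3
Z <- c(143, 5640, 2601, 902, 506)

# c() flattens all of its arguments into one vector, in the order given
YZ <- c(Y, Z, 7)
YZ
length(YZ)   # 3 + 5 + 1 = 9 elements
```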

Now the vector can be operated upon using typical mathematical functions. These operations are conducted element-wise, meaning the function is applied to each element of the vector in order. Below we define a vector Z and then add ten to each element.

Z <- c(143, 5640, 2601, 902, 506) 
Z
> [1] 143 5640 2601 902 506

Z + 10
> [1] 153 5650 2611 912 516

Alternatively, we could square each element:

Z <- c(143, 5640, 2601, 902, 506) 
Z^2
> [1]  20449 31809600  6765201   813604   256036
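
Element-wise arithmetic also works between two vectors of the same length, pairing the first element with the first, the second with the second, and so on. In this sketch, W is a hypothetical second vector introduced only for illustration.

```r
Z <- c(143, 5640, 2601, 902, 506)
W <- c(1, 2, 3, 4, 5)   # hypothetical vector with the same length as Z

Z + W      # 144 5642 2604 906 511
Z * W      # 143 11280 7803 3608 2530
sqrt(Z)    # functions such as sqrt() are applied to each element as well
sum(Z)     # a summary function collapses the vector to one number: 9792
```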

However, vectors of numbers are not the only kind of vectors we can make. For example, we can make a vector of text by putting each element inside quotation marks, but we would not be able to perform any mathematical operations on that vector.

People <- c("Professor X.", "Professor Y.", "Professor Z.")
People
> [1] "Professor X." "Professor Y." "Professor Z."

People + 6
> ## Error in People + 6: non-numeric argument to binary operator
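
Although arithmetic fails on a text vector, R can still report what kind of vector it is, and text functions are applied element-wise just as arithmetic is for numbers. A brief sketch using only base R functions:

```r
People <- c("Professor X.", "Professor Y.", "Professor Z.")

class(People)        # "character"
is.numeric(People)   # FALSE, which is why People + 6 fails
length(People)       # 3
nchar(People)        # number of characters in each element: 12 12 12
toupper(People)      # converts every element to upper case
```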

Confidence Interval Plots Using Tidy

# Install every package loaded below (only needed once per machine)
install.packages(c("ggplot2", "tidyr", "dplyr"), dependencies = TRUE)
 
 
library(ggplot2)
library(tidyr)
library(dplyr)
theme_set(theme_bw())
 
Group <- sample(LETTERS[1:3], 1000, replace = TRUE)
Group2 <- sample(LETTERS[24:26], 1000, replace = TRUE)
catDV <- sample(1:5, 1000, replace = TRUE)
catDV2 <- sample(1:5, 1000, replace = TRUE)
 
data <- data.frame(Group, Group2, catDV, catDV2)
 
Tidy <- data %>%
  group_by(Group, Group2) %>%
  summarise(Mean = mean(catDV, na.rm=TRUE) ,
            stdDev = sd(catDV, na.rm=TRUE),
            NN = length(catDV))
 
Tidy

# A tibble: 9 x 5
# Groups:   Group [?]
   Group Group2     Mean   stdDev    NN
  <fctr> <fctr>    <dbl>    <dbl> <int>
1      A      X 3.117021 1.382349    94
2      A      Y 3.112150 1.268836   107
3      A      Z 3.099099 1.458237   111
4      B      X 3.185185 1.399281   135
5      B      Y 3.174312 1.489743   109
6      B      Z 2.845528 1.498811   123
7      C      X 2.934066 1.533349    91
8      C      Y 2.761905 1.431270   105
9      C      Z 2.936000 1.401290   125

Tidy$SE <- Tidy$stdDev/sqrt(Tidy$NN)
Tidy$Low <- Tidy$Mean  - 1.96 * Tidy$SE
Tidy$High <- Tidy$Mean + 1.96 * Tidy$SE
 
 
qplot(data = Tidy, x = Group, y = Mean, ymin = Low, ymax = High,
      geom= "pointrange", colour = Group2 , facets = ~Group2)
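
The interval arithmetic used for Tidy$Low and Tidy$High can be checked by hand on a single small vector. The ratings below are made up for this illustration and are not part of the simulated data above.

```r
# Hypothetical ratings on a 1-5 scale
ratings <- c(2, 4, 4, 5, 3, 1, 5, 4)

n    <- length(ratings)
m    <- mean(ratings)             # 3.5
se   <- sd(ratings) / sqrt(n)     # standard error of the mean
low  <- m - 1.96 * se             # lower bound of the approximate 95% interval
high <- m + 1.96 * se             # upper bound

c(Mean = m, SE = se, Low = low, High = high)
```

The multiplier 1.96 comes from the normal distribution and is kept here to match the code above. For a sample this small, a t-based multiplier such as qt(0.975, df = n - 1) would give a somewhat wider interval.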

Objects and Data Structures

Objects and Assigning Values


What makes R a more flexible programming language than, say, Stata, is that it allows us to call objects by ‘name.’ What the heck does that mean? It means that as an object-oriented programming language, R allows us to store objects in the workspace and give them a name. Then, instead of retyping or re-entering whatever data we have stored, we can simply refer to the object we have created and named. We do this using an assignment arrow, which is a ‘less than’ symbol followed by a dash, with the name of the object on the left-hand side and whatever is being assigned to it on the right-hand side. For example, X <- 2 produces an object ‘X’ (note that R is case sensitive) and assigns to it the numeric value of two. Below, X is a type of object called a ‘scalar’ (in R, a vector with a single element), and its class (as indicated using the class command) is numeric.

X <- 2
X
> [1] 2

sqrt(X)
> [1] 1.414214

X^2
> [1] 4

x # here I typed a lower case x, which returns the error below. 
> ## Error in eval(expr, envir, enclos): object 'x' not found

class(X)
> [1] "numeric"
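
A few base R functions make the ideas of naming, class, and case sensitivity concrete. This is a small sketch; the object name Exam.Score is hypothetical.

```r
Exam.Score <- 2          # assign the value 2 to the name Exam.Score

class(Exam.Score)        # "numeric"
length(Exam.Score)       # 1: a scalar is a vector with a single element

# Names are case sensitive: 'Exam.Score' exists, 'exam.score' does not
exists("Exam.Score")
exists("exam.score")

# Assigning to an existing name replaces the old value
Exam.Score <- Exam.Score + 1
Exam.Score               # 3
```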

This might not seem that important to you now, but having the ability to name objects and call them by name is essential when one wants to write a program to solve a particular problem. Because the size of R’s workspace is limited only by the memory of the computer you are using, there is little cost to creating and naming additional objects when doing so makes your code clearer. The best code is that which is clear not only to the original programmer, but would also be clear to anyone who begins from the top of the script and executes the code line-by-line. Comments can help with clarity, as can clear object names.

Introduction to R

Installing R and R Studio

R Studio is an excellent IDE (integrated development environment) for the R language which provides a variety of tools and quality of life features.

To get set up using R with R Studio, you should first install native R. Navigate here: https://cran.r-project.org/ and follow the links to the version of R that is compatible with your operating system. Once you’ve installed R you should then install R Studio, again paying attention to your particular operating system, by navigating here: https://www.rstudio.com/products/rstudio/download2/. For use on a single machine, choose the Desktop edition.

Working in R Studio: Always Send Commands from a Script

Once you open R Studio, you’ll want to create a new R script from the File menu (“File” -> “New File” -> “R Script”) and type a few lines of code into it.

To run or execute the code, simply click on a line (or highlight the portion of code you’d want to send) and press either Ctrl + Enter (Windows; Ctrl + R in the basic R GUI) or Cmd + Enter (Mac OS). Once executed, the code will produce results in the console, as shown below:

> my.dice.simulator(10)
[1] 9.5
> my.dice.simulator(100)
[1] 10.53
> my.dice.simulator(1000)
[1] 10.473
> my.dice.simulator(5000)
[1] 10.4862
> 
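
The definition of my.dice.simulator() is not reproduced in this text. As a purely hypothetical stand-in, the sketch below assumes a function that rolls three six-sided dice n times and returns the average total, which would produce values near 10.5 like those in the console output. The name dice.sketch and the three-dice assumption are mine, not the original code.

```r
# Hypothetical stand-in, NOT the original my.dice.simulator():
# roll three six-sided dice n times and return the average total.
dice.sketch <- function(n) {
  totals <- replicate(n, sum(sample(1:6, size = 3, replace = TRUE)))
  mean(totals)
}

set.seed(123)                    # makes the random draws reproducible
small.run <- dice.sketch(10)     # noisy with only 10 rolls
large.run <- dice.sketch(5000)   # settles near the expected value of 10.5
small.run
large.run
```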

Setting a Working Directory

A working directory is the place where R looks for any files you’d like to load and saves any output or graphics. I’d advise using designated subfolders for each project.

Notice the direction of the slashes in the paths below: R expects forward slashes, even on a PC, so a path copied from Windows Explorer needs its backslashes reversed. The getwd() command will print the name of the folder you’ve specified so you can confirm you’ve done things correctly. Having a specified folder to save things to is especially nice when saving graphics, plots, etc.

A folder can be set as the working directory by using the setwd() command:

getwd() #shows you your current working directory

> [1] "C:/Users/cdesante/Dropbox/Stats Book"

setwd( "C:/Users/cdesante/Dropbox/Indiana/Fall 2016/Y575 - Grad Stats I/" )
# Notice which way the / go; if you copy from Windows Explorer, you'll have to reverse them.
getwd() #Hey, look, we changed it!

> [1] "C:/Users/cdesante/Dropbox/Indiana/Fall 2016/Y575 - Grad Stats I"

Alternatively, when working in R Studio, you could click on “Session” -> “Set Working Directory” -> “Choose Directory…” and navigate to the folder you would like to set as the working directory.
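
The getwd() and setwd() pattern can be tried safely on any machine by switching to R’s temporary folder and back again. This is a minimal sketch; the folder and file names passed to file.path() are hypothetical.

```r
old.dir <- getwd()       # remember where we started

# file.path() joins folder names with forward slashes on every platform
example.path <- file.path("Stats Book", "Data", "scores.csv")
example.path             # "Stats Book/Data/scores.csv"

setwd(tempdir())         # tempdir() always exists, so this is safe to run
getwd()                  # now reports the temporary folder

setwd(old.dir)           # put things back the way they were
getwd()
```

Storing the original directory before changing it makes it easy to return to it once the work in the other folder is done.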

Assigning Object Names

Objects are things that reside in R’s workspace. There are three main rules for naming them:
1. EvErYThInG in R is CaSe SeNsITiVe
2. Object names cannot begin with a number
3. Object names cannot contain spaces.
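
These rules can be checked with the base R function make.names(), which rewrites any text that is not a syntactically valid name. In this sketch the candidate names are made up for illustration; a candidate that comes back unchanged was already valid.

```r
candidates <- c("test.scores", "2016scores", "test scores", "Test.Scores")

# make.names() prefixes an X to names starting with a number
# and replaces spaces with dots
fixed <- make.names(candidates)
fixed                  # "test.scores" "X2016scores" "test.scores" "Test.Scores"

candidates == fixed    # TRUE FALSE FALSE TRUE
```

Note that "test.scores" and "Test.Scores" are both valid, but because of rule 1 they refer to two different objects.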

Basic Coding Tips

As you first begin to code, everything is going to seem daunting, but there are a few things that you can do to make things easier for yourself:
1. Annotate your code so that someone else who reads it understands what you’re doing
2. Object names should be somewhat intuitive; if you wanted to name an object that contained a set of test scores you might name it “test.scores” as opposed to “ts2016” or “obj1,” etc.
3. Again, object names cannot begin with a number, and they cannot contain spaces.
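
As a short illustration of these tips, the sketch below uses annotated code and descriptive object names. The scores are hypothetical.

```r
# Hypothetical exam results for four students
test.scores <- c(88, 92, 79, 95)

# Average score across the class
mean.score <- mean(test.scores)
mean.score                  # 88.5

# How far each student is from the class average
test.scores - mean.score    # -0.5 3.5 -9.5 6.5
```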

R as a Calculator

Now you know that code is written in the script window, processed by R, and then the results are shown in the Console window. From here on out this document will use embedded R code with the console output shown following the commands. For example, the next section shows the same code from above but with each block of code immediately followed by the R output it would generate. The lines of output begin with >. The [1] that begins each output line is the index of the first element printed on that line, so output containing a single value always starts with [1]. Chunks of code that all appear together can be thought of as being ‘sent’ to R in one command. You should also note that R is case sensitive, meaning that UPPER CASE and lower case letters will be interpreted differently.

5+4 # Addition
6-3 # Subtraction
34/6 # Division
5*3 # Multiplication
5^4 # Exponents
25^(1/2) # More exponents
sqrt(25) # take the square root of 25
# Pre-stored constants:
pi # And a few others
log(10) #logs in base e
> [1] 9
> [1] 3
> [1] 5.666667
> [1] 15
> [1] 625
> [1] 5
> [1] 5
> [1] 3.141593
> [1] 2.302585
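
The bracketed number at the start of each output line is easier to interpret with a vector long enough to wrap onto several lines. In this sketch the console width is temporarily narrowed to force wrapping; the object names are made up for illustration.

```r
long.vector <- 101:140                   # 40 values

previous <- options(width = 40)          # narrow the console to force wrapping
print(long.vector)                       # each line starts with the index of its first value
options(previous)                        # restore the previous width

long.vector[12]                          # the 12th element, counting from [1]: 112
```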

You may notice that this code contains text that begins with #; these are comments left by the coder for anyone who may read the code at a later date. When R processes a line, it ignores everything from the # to the end of that line.

NA # Missing value
NULL # Nothing
0/0 # NaN means "Not a number"
1/0 # Inf means infinity
# R also handles order of operations:
# Please Excuse My Dear Aunt Sally
2*(3-4)+2
2*(3-4)+2*(4+3)^(1/3)
exp(2) # e to the 2
> [1] NA
> NULL
> [1] NaN
> [1] Inf
> [1] 0
> [1] 1.825862
> [1] 7.389056
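
Each of the special values above has a matching test function in base R, and missing values matter in practice because they propagate through calculations. A brief sketch, with a hypothetical scores vector:

```r
is.na(NA)          # TRUE
is.nan(0/0)        # TRUE
is.finite(1/0)     # FALSE
is.null(NULL)      # TRUE

scores <- c(4, NA, 2)            # hypothetical data with one missing value
mean(scores)                     # NA: the missing value propagates
mean(scores, na.rm = TRUE)       # 3: the NA is dropped before averaging
length(c(1, NULL, 3))            # 2: NULL contributes nothing to a vector
```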