Application - R

Installing and Basic Use of R via RStudio

Summary

R is an open-source statistical package that runs on Windows, Mac, or Linux.  RStudio combines the essential R windows into a single-window tool for ease of use.  R is used through a command-line interface; it is relatively easy to use at a basic level and very powerful for advanced statistical analysis.  For more information, see The R Project for Statistical Computing.




1. Download R and RStudio

Download RStudio from: http://www.rstudio.com/products/rstudio/download/

Download R from: http://cran.rstudio.com


Watch Up and Running with R at lynda.bethel.edu

 

2. Install R and RStudio

a. Run the installer for R (Mac: click on the r.pkg installer)

b. Run the installer for RStudio (Mac: in the RStudio folder window, drag RStudio.app to the Applications folder)

 

3. Open the Programs or Applications folder and open RStudio

Watch R Statistics Essential Training at lynda.bethel.edu

 

 

4. To Import and Enter Data

To open a data file (a CSV file saved from Excel, for example):

Click on Import Dataset in the Environment tab and select From Text File.
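The Import Dataset button runs an import command for you. Below is a minimal sketch of a command-line equivalent, assuming a comma-separated file with a header row. The file contents and the column names var1 and var5 are made up for illustration; sample.data is the data frame name used in the rest of this guide.

```r
# Write a small made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("var1,var5",
             "1,2.1",
             "2,3.9",
             "3,6.2"), csv_path)

# read.csv() reads a comma-separated text file into a data frame.
sample.data <- read.csv(csv_path, header = TRUE)

# Inspect the result: column types and the first rows.
str(sample.data)
head(sample.data)
```

With a real file, replace csv_path with the path to your own CSV file.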

 

 

 

To enter data one variable (vector) at a time:

Use the "scan" command: type the command and press Return, then at the 1: prompt type your data with a space between each value. Press Return on a blank line to finish. (Replace "var6" with the name of your variable.)

> var6 = scan()

Use the "c" command

> var7 = c(5, 10, 15, 20, 25, 30)

View your data by typing your variable name and pressing Return, or look in the Values section of the Environment tab.

> var7
[1]  5 10 15 20 25 30
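The two ways of entering a vector can also be written in a script. This is a rough sketch: the values given to var6 are made up, and the text argument of scan is used here only so the example runs without keyboard input. It uses <-, the conventional R assignment operator; the = used elsewhere in this guide works the same way at the prompt.

```r
# c() combines values into a numeric vector.
var7 <- c(5, 10, 15, 20, 25, 30)

# scan() normally reads from the keyboard; its text argument reads the same
# space-separated values from a string instead.
var6 <- scan(text = "2 4 6 8 10 12")

length(var7)   # number of values in the vector
var7[2]        # the second value
```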

 

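5. To Edit Data (edit)

var6 = edit(var6)

sample.data$var1 = edit(sample.data$var1)

The above commands open a data edit window.  Change or correct your data and click Save.  The edit command returns the edited copy, so assign the result back to the variable to keep your changes.

To remove data:  rm(data_1, data_2, data_3)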
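The edit window is interactive, so it cannot be used inside a script. As an alternative, here is a minimal sketch of correcting one value by its position and then removing objects with rm. The values and the names data_1 and data_2 are placeholders.

```r
# Made-up vector with one wrong value in position 3.
var6 <- c(2, 4, 60, 8)

# Correct a single value by position instead of opening the edit window.
var6[3] <- 6

# rm() deletes objects from the workspace; exists() checks whether a name is still defined.
data_1 <- 1:3
data_2 <- 4:6
rm(data_1, data_2)
exists("data_1")
```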

6. Basic Statistical Analyses

To get summary statistics of a data set (sample.data)

summary(sample.data)

 

To get mean of one variable

mean(sample.data$var5)

 

To view a histogram of one variable

hist(sample.data$var5)

 

To view a correlation matrix of var1 through var5 in sample.data (every column must be numeric)

cor(sample.data)
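The commands in this section can be tried on a small made-up data set. The sketch below assumes a data frame named sample.data with numeric columns; the values are illustrative only and the column var2 is hypothetical.

```r
# Made-up data frame standing in for sample.data (values are illustrative only).
sample.data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var2 = c(6, 5, 4, 3, 2, 1),
  var5 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

summary(sample.data)           # min, quartiles, median, mean, and max per column
m <- mean(sample.data$var5)    # mean of one variable
r <- cor(sample.data)          # correlation matrix of all columns

# plot = FALSE returns the bin counts without drawing; drop it to see the chart.
h <- hist(sample.data$var5, plot = FALSE)

# If a variable has missing values (NA), mean() returns NA unless told to skip them.
m_no_na <- mean(c(1, 2, NA), na.rm = TRUE)
```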

 

7. Scatterplot of Data

The basic function is plot(x, y), where x and y are numeric vectors denoting the (x,y) points to plot.

 

plot(sample.data$var1, sample.data$var5, main="Scatterplot of var1 by var5")

plot(var1, var6)

Add a regression line to your scatterplot: abline(lm(y ~ x), col="color")

abline(lm(sample.data$var5 ~ sample.data$var1), col="green")
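A scatterplot with a fitted line can be put together as in the following sketch. The data values are made up, and writing the plot to a temporary PDF is an assumption made only so the script also runs outside RStudio.

```r
# Made-up data standing in for sample.data$var1 and sample.data$var5.
sample.data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var5 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Send the plot to a temporary PDF file.
# In RStudio, skip pdf() and dev.off() and the plot appears in the Plots pane.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Scatterplot with var1 on the x axis and var5 on the y axis.
plot(sample.data$var1, sample.data$var5,
     main = "Scatterplot of var1 by var5",
     xlab = "var1", ylab = "var5")

# lm(y ~ x) fits the line; abline() draws it over the existing plot.
abline(lm(var5 ~ var1, data = sample.data), col = "green")

dev.off()
```

Note the order of the variables: plot takes x first and then y, while the lm formula puts y before the ~ and x after it.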

 

8. Least Squares Regression

> plot(sample.data$var1, sample.data$var5)     (Step 1: Make a scatterplot to see if the data might fit a linear model.)

> cor(sample.data$var1, sample.data$var5)     (This command returns the correlation (r) between our two variables, which measures the strength of their linear relationship.)

> fit = lm(sample.data$var5 ~ sample.data$var1)     (This command fits a linear model (lm) with var5 as the dependent y variable and var1 as the independent x variable, and stores the result in fit.)

> fit     (Typing fit prints the "call" that defined our linear model above and returns the intercept and slope coefficients.)

> summary(fit)     (The summary command returns the coefficient estimates with their standard errors and t tests, R-squared, and an F test of the fit of your regression model.)
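The pieces of a regression result can also be pulled out by name. This is a minimal sketch, assuming the model is stored in a variable named fit and using made-up data; it writes the formula with the data argument, which is equivalent to spelling out sample.data$ for each variable.

```r
# Made-up data standing in for sample.data (values are illustrative only).
sample.data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var5 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Fit var5 (dependent, y) on var1 (independent, x) by least squares.
fit <- lm(var5 ~ var1, data = sample.data)

# coef() returns a named vector: "(Intercept)" and the slope for "var1".
coefs <- coef(fit)
coefs

# summary() holds R-squared and the F statistic among its components.
fit_summary <- summary(fit)
fit_summary$r.squared
fit_summary$fstatistic   # F value followed by its two degrees of freedom
```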

For more on Linear Least Squares Regression in R

 

 

Resources and Links:

Using RStudio: https://support.rstudio.com/hc/en-us/sections/200107586-Using-RStudio

Quick R: http://www.statmethods.net

R Tutorial: http://www.cyclismo.org/tutorial/R/index.html

 

Cheat Sheet for R and RStudio (pdf):  http://www.ocf.berkeley.edu/~janastas/docs/RStudioCheatSheet.pdf

A (very) short introduction to R (pdf): http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf

Introduction to Probability and Statistics Using R (pdf): http://cran.rstudio.com/web/packages/IPSUR/vignettes/IPSUR.pdf