Application - R
Installing and Basic Use of R via RStudio
Summary
R is an open-source statistical package that runs on Windows, Mac, or Linux. RStudio combines the essential R windows into a single-window tool for ease of use. R is used through a command-line interface; it is relatively easy to use at a basic level and very powerful for advanced statistical analysis. For more information, see The R Project for Statistical Computing.
1. Download R and RStudio
Download RStudio from: http://www.rstudio.com/products/rstudio/download/
Download R from: http://cran.rstudio.com
Watch Up and Running with R at lynda.bethel.edu
2. Install R and RStudio
a. Run installer for R (Mac - click on the R .pkg installer)
b. Run installer for RStudio (Mac - in RStudio folder window drag RStudio.app to Applications folder)
3. Open Programs or Applications folder and open RStudio
Watch R Statistics Essential Training at lynda.bethel.edu
4. To import and enter data
To open or enter data:
Open a data file (for example, a CSV file exported from Excel)
Click on Import Dataset in the Environment Tab and select from text file
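The Import Dataset button also has a command-line equivalent. The following is a minimal sketch, assuming a comma-separated file with a header row. The file is a temporary one written by the example itself, and the name sample.data and its columns are placeholders, not part of any real data set.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# In practice, set csv_path to the location of your own file.
# (<- assigns a value, just like = in the commands shown in this guide.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("var1,var5",
             "1,2.1",
             "2,3.9",
             "3,6.2",
             "4,7.8"), csv_path)

# read.csv() returns a data frame; header = TRUE takes column names
# from the first row of the file.
sample.data <- read.csv(csv_path, header = TRUE)

str(sample.data)   # column names and types
head(sample.data)  # first rows of the data
```

After this runs, sample.data appears in the Environment tab just as it would after using Import Dataset.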
To Enter data - one variable (vector) at a time.
Use the "scan" command: type it and press Return, then after the 1: prompt type in your data with a space between each value. Press Return on an empty line to finish. (Replace "var6" with the name of your variable.)
>var6=scan()
Use the "c" command
>var7= c(5,10,15,20,25,30)
View your data by typing your variable name and return or look in the Environment tab Values window.
> var7
[1] 5 10 15 20 25 30
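Both ways of entering a vector can also be run from a script. This is a small sketch; the values given to scan are made up for illustration, and the text argument is used so that scan reads from a string instead of waiting for keyboard input.

```r
# scan() normally reads from the keyboard; with text = it reads the
# values from a string, which is convenient inside a script.
# (<- assigns a value, just like = in the commands above.)
var6 <- scan(text = "12 15 9 20 18")

# c() combines the listed values into one vector.
var7 <- c(5, 10, 15, 20, 25, 30)

length(var6)  # how many values were read
var7          # typing the name prints the vector
```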
5. To Edit Data (edit)
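var6 = edit(var6)
sample.data$var1 = edit(sample.data$var1)
The above commands open a data edit window. Change or correct your data and click Save. The edit command returns the corrected copy rather than changing the original, so assign the result back to the variable as shown.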
Remove data: rm(data_1,data_2,data_3)
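Corrections and removals can also be done without the edit window. The sketch below uses made-up values and placeholder names (data_1, data_2): it replaces a single value by its position and then removes two objects from the workspace.

```r
# Suppose the fourth value was mistyped as 200 instead of 20.
var6 <- c(12, 15, 9, 200, 18)
var6[4] <- 20        # replace the value at position 4
var6

# Create two throwaway objects, then remove them.
data_1 <- 1:3
data_2 <- 4:6
rm(data_1, data_2)

exists("data_1")     # checks whether the object is still in the workspace
```

Removed objects disappear from the Environment tab and cannot be recovered, so it is worth checking the names before running rm.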
6. Basic Statistical Analyses
To get summary statistics of a data set (sample.data)
summary(sample.data)
To get mean of one variable
mean(sample.data$var5)
To view a histogram of one variable
hist(sample.data$var5)
To view a correlation matrix of var1 through var5 in sample.data (all columns must be numeric)
cor(sample.data)
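The commands above can be tried on a small data set built by hand. This is a sketch with invented numbers; sample.data, var1, and var5 stand in for your own data. It also shows sd and the na.rm argument, which are not covered above but are often needed alongside mean.

```r
# Hypothetical data set; replace with your own imported data.
sample.data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var5 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

summary(sample.data)     # min, quartiles, median, mean, and max per column
mean(sample.data$var5)   # mean of one column
sd(sample.data$var5)     # standard deviation of the same column
cor(sample.data)         # correlation matrix of all (numeric) columns

# With missing values (NA), mean() returns NA unless na.rm = TRUE is given.
with_gap <- c(sample.data$var5, NA)
mean(with_gap)
mean(with_gap, na.rm = TRUE)
```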
7. Scatterplot of data:
The basic function is plot(x, y), where x and y are numeric vectors denoting the (x,y) points to plot.
plot(sample.data$var1, sample.data$var5, main="Scatterplot of var1 by var5")
plot(var1 , var6)
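Add a regression line to your scatterplot: abline(lm(y ~ x), col="color"). Note that lm takes a formula with the y variable on the left of the ~ and the x variable on the right.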
abline(lm(sample.data$var5~sample.data$var1), col="green")
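Putting the scatterplot and the regression line together might look like the following sketch. The data are invented, and the name line_fit is a placeholder for the fitted model. The first line is only a safeguard for running the code as a script outside RStudio, where no plot window is available.

```r
# When run as a script (not interactively), draw to a null device so that
# no plot file is written. In RStudio the plot appears in the Plots tab.
if (!interactive()) pdf(NULL)

# Hypothetical data set; replace with your own.
sample.data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var5 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Scatterplot with a title and axis labels.
plot(sample.data$var1, sample.data$var5,
     main = "Scatterplot of var1 by var5",
     xlab = "var1", ylab = "var5")

# Fit y ~ x (var5 explained by var1) and draw the fitted line on the plot.
line_fit <- lm(var5 ~ var1, data = sample.data)
abline(line_fit, col = "green")
```

Using the data argument of lm keeps the formula short and makes the coefficient names easier to read than the sample.data$var1 form.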
8. Least Squares Regression
>plot(sample.data$var1,sample.data$var5) (Step 1: Make a scatterplot to see if the data might fit a linear model.)
>cor(sample.data$var1,sample.data$var5) (This command returns the correlation (r) between our two variables. To test whether the correlation is significant, use cor.test with the same two arguments.)
>fit = lm(sample.data$var5 ~ sample.data$var1) (This command fits a linear model (lm) with var5 as our dependent y variable and var1 as our independent x variable, and stores the result in fit.)
>fit (Typing fit prints the "call" that defined our linear model above and returns the estimated coefficients: the intercept and the slope.)
>summary(fit) (The summary command returns the residuals, the coefficient estimates with their t tests, R-squared, and an F test of the fit of your regression model.)
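The regression steps can be run end to end on a small example. This is a sketch with invented data; sample.data, var1, and var5 are placeholders, and the formula is written with the data argument rather than the $ form used above, which gives the same fit.

```r
# Hypothetical data set; replace with your own.
sample.data <- data.frame(
  var1 = c(1, 2, 3, 4, 5, 6),
  var5 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Correlation between the two variables, and a test of whether it
# differs from zero.
r <- cor(sample.data$var1, sample.data$var5)
cor.test(sample.data$var1, sample.data$var5)

# Fit the linear model: var5 is the dependent (y) variable,
# var1 the independent (x) variable.
fit <- lm(var5 ~ var1, data = sample.data)
fit          # prints the call and the coefficients
coef(fit)    # intercept and slope as a named vector

# Full summary: residuals, coefficient t tests, R-squared, F test.
s <- summary(fit)
s
s$r.squared    # proportion of variance in var5 explained by var1
s$fstatistic   # F value with its degrees of freedom

# With one predictor, R-squared equals the squared correlation.
all.equal(s$r.squared, r^2)
```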
For more on linear least squares regression in R, see the resources listed below.
Resources and Links:
Using RStudio: https://support.rstudio.com/hc/en-us/sections/200107586-Using-RStudio
Quick R: http://www.statmethods.net
R Tutorial: http://www.cyclismo.org/tutorial/R/index.html
Cheat Sheet for R and RStudio (pdf): http://www.ocf.berkeley.edu/~janastas/docs/RStudioCheatSheet.pdf
A (very) short introduction to R (pdf): http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
Introduction to Probability and Statistics Using R: (pdf) http://cran.rstudio.com/web/packages/IPSUR/vignettes/IPSUR.pdf