I've uploaded a couple of files you can use for time series regression. To load the first dataframe, use:

`docsUK<-read.csv("http://www.courseserve.info/files/MaternalMortality-DocsUK.csv")`

The dependent variable, Y, is maternal mortality (deaths per 100,000 live births) and the independent variable, X, is number of physicians (per 1,000 population).

To load the second, use:

`suicidewages<-read.csv("http://www.courseserve.info/files/Suicide-Wages.csv")`

Please take a few minutes to complete a qualitative course feedback report. Thank you!

#SOCY7113arima.r

library(TSA)

library(forecast)

library(expsmooth)

# Let's look at two time series. In this data frame,

# Y is the monthly unemployment rate for women, 20+ yrs,

# and X is the same rate for men, 20+.

USunemp<-read.csv("http://www.courseserve.info/files/us-unempl-gender.csv")

plot(USunemp$Month,USunemp$Y)

plot(USunemp$Month,USunemp$X)

# We can ask whether or not the unemployment rate for men predicts

# the unemployment rate for women. We might use regular linear

# regression:

summary(lm(USunemp$Y~USunemp$X))
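Ordinary linear regression assumes independent errors, which time series data often violate. As a minimal sketch, using simulated data rather than the course file (the names `x`, `y`, `e` and the coefficients are illustrative assumptions), one can check the regression residuals for autocorrelation and then fit a regression with AR(1) errors through the `xreg` argument of `stats::arima()`:

```r
# Simulated example; x, y, and the AR coefficients are assumptions for illustration.
set.seed(1)
n <- 120
x <- as.numeric(arima.sim(list(ar = 0.7), n = n))
e <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # autocorrelated errors
y <- 1 + 0.8 * x + e

# Ordinary regression, then inspect the residuals for autocorrelation
ols <- lm(y ~ x)
stats::acf(residuals(ols), main = "ACF of OLS residuals")

# Regression of y on x with AR(1) errors
fit <- stats::arima(y, order = c(1, 0, 0), xreg = cbind(x = x))
fit
```

The `stats::` prefix is used because TSA provides its own versions of `acf()` and `arima()`; either should work here, but the prefix makes clear which one is being called.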

#SOCY7113arma.r

library(TSA)

# Let's look at a time series

data(airmiles, package="datasets")

plot(airmiles)

# It appears that there is a trend. We can try to fit a model

# of the effect of time.

dv1<-1:length(airmiles)

summary(lm(airmiles~dv1))

# Now, let's model a quadratic function for the trend.

dv2<-dv1^2

summary(lm(airmiles~dv1+dv2))

# We'll look at autocorrelation.

acf(airmiles,main="ACF airmiles")

acf(airmiles,type="partial",main="PACF airmiles")

plot(y=airmiles, x=zlag(airmiles), type="p")

# We'll look at moving average.

# In R, you can create your own functions
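As one possible illustration of a user-defined function, here is a sketch of a trailing moving average. The name `moving_average` and its argument `k` are hypothetical and not taken from the course script.

```r
# Hypothetical helper: trailing moving average over the last k observations.
moving_average <- function(x, k = 3) {
  # With sides = 1, stats::filter averages the current value and the k - 1 before it
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

moving_average(c(1, 2, 3, 4, 5), k = 3)
# returns NA NA 2 3 4 (the first k - 1 values have no full window)
```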

We'll first discuss the different types of data (cross-sectional, longitudinal, panel, time series).

With time series data, we need to become familiar with some basic terms and concepts before we can discuss how to analyze it.

* stochastic process

* "random walk"

* moving average

* stationarity

* autocorrelation

* differencing
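Several of these terms can be seen together in a short simulation. The sketch below (simulated data; the variable names are illustrative) builds a random walk from white noise, shows its slowly decaying autocorrelation, and then differences it to recover a stationary series.

```r
set.seed(42)
e  <- rnorm(200)   # white noise: a stationary stochastic process
rw <- cumsum(e)    # random walk: each value is the previous value plus noise

# Autocorrelation decays slowly for the random walk (non-stationary)
stats::acf(rw, main = "ACF of random walk")

# First differencing recovers the stationary white noise
d <- diff(rw)
stats::acf(d, main = "ACF of differenced series")
```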

The process of modeling time series data has three parts:

a) specification

b) fitting

c) diagnostics.
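A minimal sketch of the three parts, assuming a simulated AR(1) series rather than any of the course data sets:

```r
set.seed(7)
y <- arima.sim(list(ar = 0.6), n = 200)  # simulated AR(1) series (an assumption)

# a) Specification: the ACF and PACF suggest a candidate order
stats::acf(y)
stats::pacf(y)

# b) Fitting: estimate the chosen model
fit <- stats::arima(y, order = c(1, 0, 0))

# c) Diagnostics: the residuals should look like white noise
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
```

A large p-value from the Ljung-Box test is consistent with uncorrelated residuals. A small one suggests returning to the specification step.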

R script:

# http://www.courseserve.info/files/SOCY7113trends.r

# SOCY7113trends.r

# Load libraries.

install.packages("TSA")  # only needed once

library(TSA)

#SOCY7113lmer.r

# Load libraries.

library(lme4)

library(multcomp)

# Load the data file.

GiniObesity<-read.csv("http://www.courseserve.info/files/GiniObesityLong.csv")

# We can look at the variables with the summary() function.

# In this file, time is measured with 't' and 'statenum' refers

# to the cases.

summary(GiniObesity)

# We'll first test the random intercept model.

GO1<-lmer(obesity~hgini+(1|statenum),data=GiniObesity)

summary(GO1)

# Now we'll test the random intercept and slope model (random slope for time, t).

GO2<-lmer(obesity~hgini+(t|statenum),data=GiniObesity)

summary(GO2)

* data type

* censored cases

* "survival" function

* "hazard" function

* log-rank test for differences in survival functions

* Cox's regression

R code:

# http://www.courseserve.info/files/SOCY7113survival.r

# SOCY7113survival.r

# Load libraries.

library(survival)

# Open the data file.

data("stanford2", package="survival")

# We'll create groups by dividing the cases into "older than median" and "younger than median"

stanford2$group <- ifelse(stanford2$age > median(stanford2$age), 1, 0)
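The remaining concepts in the outline (survival function, log-rank test, Cox regression) could be sketched as follows. This is not the rest of the course script. It is one possible continuation, and the object names `hearts`, `km`, `lr`, and `cox` are illustrative.

```r
library(survival)

# Work on a copy of the packaged data (lazy-loaded with the package)
hearts <- survival::stanford2
hearts$group <- ifelse(hearts$age > median(hearts$age), 1, 0)

# Survival function: Kaplan-Meier estimate for each age group
km <- survfit(Surv(time, status) ~ group, data = hearts)
plot(km, lty = 1:2, xlab = "Time", ylab = "Proportion surviving")

# Log-rank test for a difference between the two survival functions
lr <- survdiff(Surv(time, status) ~ group, data = hearts)
lr

# Cox regression with age as a continuous predictor
cox <- coxph(Surv(time, status) ~ age, data = hearts)
summary(cox)
```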

# We're going to compare three techniques for investigating latent structure:

# PCA, MDS, and k-means clustering.

library(psych)

# V085064D feminists

# V085064E federal government

# V085064F Jews

# V085064G liberals

# V085064H middle class people

# V085064J labor unions

# V085064K poor people

# V085064M military

# V085064N big business

# V085064P people on welfare

# V085064Q conservatives

# V085064R working class people

# V085064S environmentalists

# V085064T Supreme Court

# V085064U gays and lesbians

# V085064V Asian-Americans

# V085064W Congress

# V085064Y Blacks

This is the third method for investigating latent structure.

# We're going to look at k-means clustering using a subset of variables from the ANES,

# the feeling thermometers for various social groups, to look for latent structure

# that might underlie grouping of these social objects into clusters.

# To facilitate our interpretation of the clusters, I've listed the variables here:

# V085064D feminists

# V085064E federal government

# V085064F Jews

# V085064G liberals

# V085064H middle class people

# V085064J labor unions

# V085064K poor people

# V085064M military
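As a rough sketch of the idea, the example below clusters variables rather than respondents. It uses simulated "thermometer" scores driven by two hypothetical latent attitudes, not the ANES file, and all names (`therm`, `v1` to `v6`, `f1`, `f2`) are illustrative.

```r
set.seed(123)
n  <- 200
f1 <- rnorm(n)                      # first hypothetical latent attitude
f2 <- rnorm(n)                      # second hypothetical latent attitude
noise <- function() rnorm(n, sd = 0.3)
therm <- cbind(v1 = f1 + noise(), v2 = f1 + noise(), v3 = f1 + noise(),
               v4 = f2 + noise(), v5 = f2 + noise(), v6 = f2 + noise())

# Cluster the variables (columns), so transpose the standardized data
km <- kmeans(t(scale(therm)), centers = 2, nstart = 20)
km$cluster
```

Variables built from the same latent attitude should land in the same cluster. With real survey data the structure is rarely this clean, so it helps to compare several values of `centers`.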

Our last statistical tool is principal component analysis (PCA), a data reduction strategy closely related to exploratory factor analysis. We use patterns of covariance among the observed variables to identify a smaller number of underlying principal components and try to interpret these as indicating latent factors. We can then use the principal components in place of the original variables in further analysis, such as linear regression.
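A minimal sketch of that workflow, using base R's `prcomp()` and the built-in `USArrests` data as stand-ins (the course script loads `psych`, whose functions are not shown here, and the outcome variable below is simulated purely for illustration):

```r
# USArrests ships with R; it stands in for the course data here.
pca <- prcomp(USArrests, scale. = TRUE)
summary(pca)   # proportion of variance explained by each component

# Use the first component's scores as a predictor in a linear regression.
# The outcome is simulated, so the fitted coefficients carry no substantive meaning.
set.seed(1)
scores <- as.data.frame(pca$x)
scores$outcome <- 2 * scores$PC1 + rnorm(nrow(scores))
summary(lm(outcome ~ PC1, data = scores))
```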

library(psych)