Time Series Data

I've uploaded a couple of files you can use for time series regression. To load the first dataframe, use:
docsUK<-read.csv("http://www.courseserve.info/files/MaternalMortality-DocsUK.csv")

The dependent variable, Y, is maternal mortality (deaths per 100,000 live births) and the independent variable, X, is number of physicians (per 1,000 population).

To load the second, use:
suicidewages<-read.csv("http://www.courseserve.info/files/Suicide-Wages.csv")

Feedback Requested

Please take a few minutes to complete a qualitative course feedback report. Thank you!

Time Series III: time series regression models

#SOCY7113arima.r
library(TSA)
library(forecast)
library(expsmooth)

# Let's look at two time series. In this data frame,
# Y is the monthly unemployment rate for women, 20+ yrs,
# and X is the same rate for men, 20+.
USunemp<-read.csv("http://www.courseserve.info/files/us-unempl-gender.csv")
plot(USunemp$Month,USunemp$Y)
plot(USunemp$Month,USunemp$X)

# We can ask whether or not the unemployment rate for men predicts
# the unemployment rate for women. We might use regular linear
# regression:
summary(lm(USunemp$Y~USunemp$X))
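
Ordinary regression assumes independent errors, which two monthly series rarely satisfy. The following is a minimal sketch, using simulated series in place of the downloaded file (the names x, y, and e and the coefficients are assumptions for illustration). It checks the lm() residuals for autocorrelation and then refits the same model with AR(1) errors using arima() from base R.

```r
# Simulated stand-in for two related monthly series (not the course data).
set.seed(7113)
n <- 200
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))  # predictor series
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # autocorrelated errors
y <- 2 + 0.5 * x + e

# Ordinary least squares ignores the dependence in the errors.
ols <- lm(y ~ x)

# Ljung-Box test on the residuals: a small p-value signals autocorrelation.
lb <- Box.test(residuals(ols), lag = 10, type = "Ljung-Box")
print(lb)

# Regression with AR(1) errors: same mean model, but the error term
# is allowed to follow an autoregressive process.
fit <- arima(y, order = c(1, 0, 0), xreg = x)
print(fit)
```

If the Ljung-Box test rejects, the standard errors reported by summary(lm(...)) are not trustworthy, and the arima() fit is one reasonable alternative.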

Time Series II: Moving Average, Autoregressive

#SOCY7113arma.r
library(TSA)

# Let's look at a time series
data(airmiles, package="datasets")
plot(airmiles)

# It appears that there is a trend. We can try to fit a model
# of the effect of time.
dv1<-1:length(airmiles)
summary(lm(airmiles~dv1))

# Now, let's model a quadratic function for the trend.
dv2<-dv1^2
summary(lm(airmiles~dv1+dv2))

# We'll look at autocorrelation.
acf(airmiles,main="ACF airmiles")
# type="p" requests the partial autocorrelation function (PACF).
acf(airmiles,type="p",main="PACF airmiles")
plot(y=airmiles, x=zlag(airmiles), type="p")

# We'll look at moving average.
# In R, you can create your own functions

Time Series I: Trends

We'll first discuss how different kinds of data differ (cross-sectional, longitudinal, panel, time series).

With time series data, we need to become familiar with some basic terms and concepts before we can discuss how to analyze it.
* stochastic process
* "random walk"
* moving average
* stationarity
* autocorrelation
* differencing
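
Several of these terms can be seen together in a short simulation. This is a sketch with simulated data (the seed and series length are arbitrary choices): a random walk is built by accumulating white-noise shocks, its autocorrelation is inspected, and first differencing is used to get back to a stationary series.

```r
set.seed(42)
e <- rnorm(200)          # white noise: independent shocks
rw <- cumsum(e)          # random walk: previous value plus a new shock

# The random walk is nonstationary: its autocorrelation decays very slowly.
acf(rw, main = "ACF of a random walk")

# First differencing recovers the shocks, which are stationary.
d <- diff(rw)
acf(d, main = "ACF of the differenced series")

# Lag-1 autocorrelation before and after differencing.
r_rw <- acf(rw, plot = FALSE)$acf[2]
r_d <- acf(d, plot = FALSE)$acf[2]
c(random_walk = r_rw, differenced = r_d)
```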

The process of modeling time series data has three parts:
a) specification
b) fitting
c) diagnostics
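
The three parts can be sketched with base R alone. This example assumes a simulated AR(1) series in place of real data, and the choice of an AR(1) model is for illustration only.

```r
set.seed(1)
# Simulated AR(1) series standing in for real data.
y <- arima.sim(model = list(ar = 0.6), n = 300)

# a) Specification: inspect the ACF and PACF to choose candidate orders.
acf(y, main = "ACF")
pacf(y, main = "PACF")   # a single large spike at lag 1 suggests AR(1)

# b) Fitting: estimate the chosen model.
fit <- arima(y, order = c(1, 0, 0))
print(fit)

# c) Diagnostics: the residuals should look like white noise.
tsdiag(fit)
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
```

If the diagnostics show leftover autocorrelation, the usual move is to return to the specification step and try a different order.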

R script:
# http://www.courseserve.info/files/SOCY7113trends.r
# SOCY7113trends.r

# Load libraries.
install.packages("TSA")

Longitudinal Analysis (Mixed Effects Models)

#SOCY7113lmer.r

# Load libraries.
library(lme4)
library(multcomp)

# Load the data file.
GiniObesity<-read.csv("http://www.courseserve.info/files/GiniObesityLong.csv")

# We can look at the variables with the summary() function.
# In this file, time is measured with 't' and 'statenum' refers
# to the cases.
summary(GiniObesity)

# We'll first test the random intercept model.
GO1<-lmer(obesity~hgini+(1|statenum),data=GiniObesity)
summary(GO1)

# Now we'll test the random intercept and slope model.
GO2<-lmer(obesity~hgini+(t|statenum),data=GiniObesity)
summary(GO2)
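
One way to decide between a random intercept model and a random intercept and slope model is a likelihood-ratio test. The sketch below uses the sleepstudy data that ships with lme4 as a stand-in for the course file, so the variable names (Reaction, Days, Subject) are not those of GiniObesity.

```r
library(lme4)

# sleepstudy ships with lme4: reaction time by days of sleep deprivation,
# with repeated measures on each Subject. Stand-in for the course data.
data(sleepstudy, package = "lme4")

# Random intercept only.
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
# Random intercept and random slope for Days.
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Likelihood-ratio test; anova() refits both models with ML before comparing.
anova(m1, m2)
```

A small p-value in the anova() table favors keeping the random slope.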

Event History Analysis (Survival Analysis)

* data type
* censored cases
* "survival" function
* "hazard" function
* log-rank test for differences in survival functions
* Cox's regression

R code:
# http://www.courseserve.info/files/SOCY7113survival.r
# SOCY7113survival.r

# Load libraries.
library(survival)

# Open the data file.
data("stanford2", package="survival")

# We'll create groups by dividing the cases into "older than median" and "younger than median"
stanford2$group <- ifelse(stanford2$age > median(stanford2$age), 1, 0)
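
The topics listed above can be sketched on the same data with functions from the survival package. This is an illustration under assumptions: it works on a copy named heart with a grouping column named older (both names are hypothetical), and the course script may proceed differently.

```r
library(survival)

# Work on a copy of the Stanford heart transplant data shipped with survival.
heart <- stanford2
# 1 = older than the median age, 0 = at or below the median.
heart$older <- as.numeric(heart$age > median(heart$age))

# Kaplan-Meier estimates of the survival function, one curve per group.
km <- survfit(Surv(time, status) ~ older, data = heart)
plot(km, lty = 1:2, xlab = "Time", ylab = "Proportion surviving")
legend("topright", lty = 1:2,
       legend = c("At or below median age", "Above median age"))

# Log-rank test for a difference between the two survival functions.
survdiff(Surv(time, status) ~ older, data = heart)

# Cox regression with age as a continuous predictor of the hazard.
cox <- coxph(Surv(time, status) ~ age, data = heart)
summary(cox)
```

Censored cases are those with status equal to 0; Surv() carries that information into each of the three analyses.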

Latent Structure

# We're going to compare three techniques for investigating latent structure:
# PCA, MDS, and k-means clustering.

library(psych)

# V085064D feminists
# V085064E federal government
# V085064F Jews
# V085064G liberals
# V085064H middle class people
# V085064J labor unions
# V085064K poor people
# V085064M military
# V085064N big business
# V085064P people on welfare
# V085064Q conservatives
# V085064R working class people
# V085064S environmentalists
# V085064T Supreme Court
# V085064U gays and lesbians
# V085064V Asian-Americans
# V085064W Congress
# V085064Y Blacks
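
Of the three techniques, MDS is the one that works from distances between objects. As a rough sketch, and assuming the built-in swiss data as a stand-in for the ANES thermometers, correlations between variables can be converted to distances and passed to cmdscale() from base R.

```r
# Built-in data standing in for the ANES thermometers: six socio-economic
# indicators measured on 47 Swiss provinces.
data(swiss, package = "datasets")

# Turn correlations between variables into distances: highly correlated
# variables end up close together, negatively correlated ones far apart.
r <- cor(swiss)
d <- as.dist(sqrt(2 * (1 - r)))

# Classical (metric) MDS into two dimensions.
coords <- cmdscale(d, k = 2)

# Plot the variables as labeled points in the two-dimensional solution.
plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(coords, labels = rownames(coords))
```

Variables that sit near each other in the plot are candidates for sharing a latent dimension.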

Clustering

This is the third method for investigating latent structure.

# We're going to look at k-means clustering using a subset of variables from the ANES,
# the feeling thermometers for various social groups, to look for latent structure
# that might underlie grouping of these social objects into clusters.
# To facilitate our interpretation of the clusters, I've listed the variables here:

# V085064D feminists
# V085064E federal government
# V085064F Jews
# V085064G liberals
# V085064H middle class people
# V085064J labor unions
# V085064K poor people
# V085064M military
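
Because kmeans() clusters the rows of a matrix, grouping the rated objects rather than the respondents means transposing the data first. The sketch below uses simulated ratings, not ANES data: the six item names, the two latent attitudes, and the noise level are all made up for illustration.

```r
set.seed(2008)
n <- 300  # hypothetical respondents

# Two simulated latent attitudes drive six made-up thermometer items.
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.5)
ratings <- cbind(a1 = f1 + noise(), a2 = f1 + noise(), a3 = f1 + noise(),
                 b1 = f2 + noise(), b2 = f2 + noise(), b3 = f2 + noise())

# Standardize each item, then transpose so that each ROW is an item:
# kmeans() clusters rows, and here we want to cluster the rated objects.
items <- t(scale(ratings))

# nstart = 25 reruns the algorithm from several random starts and keeps the best.
km <- kmeans(items, centers = 2, nstart = 25)
km$cluster
```

With real data the number of clusters is not known in advance, so it is common to try several values of centers and compare the within-cluster sums of squares in km$tot.withinss.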

Principal Component Analysis

Our last statistical tool is principal component analysis, a kind of exploratory factor analysis. It is a data reduction strategy. We use patterns of covariance to identify a number of underlying "factors" -- in this case, principal components -- and try to interpret these as indicating latent factors. We can then use these principal components in place of the original variables to do further analysis, such as linear regression.

library(psych)
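
The idea of replacing the original variables with components can be sketched with prcomp() from base R, used here instead of the psych package so that the example has no extra dependencies. The built-in USJudgeRatings data is an assumption standing in for the course data.

```r
# Built-in data: lawyers' ratings of 43 judges on 12 scales.
data(USJudgeRatings, package = "datasets")
outcome <- USJudgeRatings$RTEN                       # "worthy of retention"
ratings <- USJudgeRatings[, names(USJudgeRatings) != "RTEN"]

# PCA on standardized variables.
pca <- prcomp(ratings, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each component.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

# Use the first component's scores in place of the 11 original ratings.
pc1 <- pca$x[, 1]
summary(lm(outcome ~ pc1))
```

The loadings in pca$rotation show which original variables contribute most to each component, which is where the interpretation of a latent factor begins.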
