September 30th

Analyzing PUMS data using the individual records (Stata + Excel)

—Stata for PUMS Analysis—

• Request extract from site
-2008 ACS, including: state, sex, age, race, place of birth, divorced in last year
*Note: the person weight (perwt) can be very important, since each record stands in for many people
• Download the data file, codebook, and Stata .do file
• Modify the .do file to change the memory allocation (the data set is often too big for the allocation set in the .do file)
• Manipulate individual-level data in Stata (recoding, subsetting observations, etc.)
• Change the unit of observation (individuals -> age groups, etc.)
• Write a .csv file from Stata
• Produce interactive plots of our answers

—Dummy Variables—

• A dummy variable has only two possible values (0, 1)
• The mean of a dummy variable is the proportion of the sample with the “1” value
• PUMS data doesn’t come in dummy variable form
• We have to recode the variables ourselves as (0, 1)
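
A minimal sketch of the "mean = proportion" point, using made-up toy data rather than a real PUMS extract. The variable name school and the code 2 = attending are assumptions borrowed from the example in these notes; check the codebook for the real coding.

```stata
* Toy data, not PUMS: six people with a hypothetical school code (2 = attending)
clear
input school
1
2
2
1
2
1
end

* The logical expression evaluates to 1 when true and 0 when false
gen attending = (school == 2)

* The mean of the dummy is the share of the sample with attending == 1
* (here 3 of the 6 people, so the mean is .5)
summarize attending
display r(mean)
```

One thing to watch: if school is missing for a record, (school == 2) is false, so the dummy comes out 0 rather than missing. Adding "if !missing(school)" to the gen command avoids that.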

—Stata Stuff—

• gen newvar = (oldvar == somevalue)
-eg. gen attending = (school == 2)
• Combine dummies into a single code:
gen code = 1*cat1dummy + 2*cat2dummy + 3*cat3dummy
-eg. gen racecode = 1*white + 2*black + 3*othrace
• Drop observations you don’t want
-eg. drop if sex == 2
• Collapse from individual to group level data
-eg. collapse (mean) birth [fweight=perwt], by(statefip agecat)
• insheet (Excel -> Stata) and outsheet (Stata -> Excel), both via .csv files
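
The commands above can be strung together roughly as follows. This is only a sketch on made-up toy data: the variable names follow the examples in these notes, but the values, the weights, and the output file name collapsed.csv are assumptions for illustration.

```stata
* Toy individual-level data; names follow the notes, values are made up
clear
input statefip agecat sex birth perwt
1 1 1 0 100
1 1 1 1 300
1 2 1 1 50
1 2 2 0 75
2 1 1 0 200
2 1 1 0 200
end

* Drop the observations we don't want
drop if sex == 2

* Weighted mean of the birth dummy within each state-by-age-group cell;
* afterwards the data set has one row per (statefip, agecat)
collapse (mean) birth [fweight=perwt], by(statefip agecat)

list

* Write the group-level data to a comma-separated file that Excel can open
outsheet using collapsed.csv, comma replace
```

After the collapse, birth holds the weighted proportion for each group, e.g. (0*100 + 1*300)/400 = .75 for statefip 1, agecat 1. Note that fweight requires whole-number weights. In newer versions of Stata, export delimited and import delimited are the replacements for outsheet and insheet.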

—Working with the Data—

• First look over the codebook and do file and make sure everything is where it needs to be
• Focus on the variables we want to work with and make a note of how they are coded
• We then decide how we want to recode the data from our variables
• While sorting through the data, we have to decide which values to omit and which to group into an “other” category
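
A small sketch of grouping values into an "other" category and then building a single code from the dummies. The race codes used here (1 and 2 kept, everything else lumped together) are hypothetical; the real codes come from the codebook.

```stata
* Toy data; these race codes are hypothetical, check the codebook for real ones
clear
input race
1
2
3
4
1
end

* One dummy per category we keep, plus a catch-all "other" dummy
gen white   = (race == 1)
gen black   = (race == 2)
gen othrace = (race != 1 & race != 2)

* Exactly one dummy equals 1 for each person, so the sum picks out that code
gen racecode = 1*white + 2*black + 3*othrace

tabulate racecode
```

As written, a record with missing race would also land in the "other" group, because a missing value is not equal to 1 or 2. If those records should be omitted instead, drop them first (drop if missing(race)).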

Log File for Stata commands for Table 3 of Handout

CSV file from outsheet command