use teaching_Understsociety.dta
**12680 observations
**This is NOT a longitudinal dataset: it contains one wave only, so we can use it to practice

summarize

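**Detailed summary of age; percentiles are stored in r(), e.g. r(p90)
sum age, d
**Constant holding the 90th percentile of age
gen pctile90=r(p90)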
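**A possible use of pctile90 (a sketch; top_decile is an illustrative name, not part of the original file):
**flag respondents at or above the 90th percentile of age, leaving missing ages as missing
gen byte top_decile=(age>=pctile90) if !missing(age)
ta top_decile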

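**Mean of age by sex
tab sex, sum(age)
**Cross-tabulations of sex by hiqual with column and row percentages
ta sex hiqual, col
ta sex hiqual, row
**Two-way table of sempderived by hiqual, separately for each sex
bysort sex: ta sempderived hiqual, col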

**Negative values are missing-value codes: set them to missing
replace scghq2=. if scghq2<0

gen age_band=1 if age<=20
replace age_band=2 if age>20 & age<=30
replace age_band=3 if age>30 & age<=40
replace age_band=4 if age>40 & age<=50
replace age_band=5 if age>50 & age<=60
replace age_band=6 if age>60 & age<=70
replace age_band=7 if age>70 & age<=80
**Missing values count as larger than any number, so exclude them explicitly
replace age_band=8 if age>80 & !missing(age)

**Labels match the cut-offs above, assuming age is recorded in whole years
label define age_bands 1 "20 or less" 2 "21-30" 3 "31-40" 4 "41-50" 5 "51-60" 6 "61-70" 7 "71-80" 8 "over 80"
label values age_band age_bands
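**A possible one-line alternative to the gen/replace steps above, using irecode()
**(a sketch; age_band_alt is an illustrative name, not part of the original file).
**irecode(age,20,...,80) returns 0 if age<=20, 1 if 20<age<=30, ..., 7 if age>80, and missing if age is missing
gen age_band_alt=irecode(age,20,30,40,50,60,70,80)+1
label values age_band_alt age_bands
**Check that both versions agree
assert age_band_alt==age_band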

**Same idea with recode: ages outside 18-39 (including missing) are set to missing
recode age (18 19 = 1 "18 to 19") ///
(20/29 = 2 "20 to 29") ///
(30/39 = 3 "30 to 39") (else=.), generate(agegroups) label(agegroups)

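gen agesq=age^2
**Dummies: 1 if the condition is true, 0 otherwise, missing if an input is missing
g female=(sex==2) if !missing(sex)
gen fem_less50=(sex==2 & age<50) if !missing(sex, age)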

bysort sex: egen mean_age=mean(age)
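**A possible follow-up (a sketch; age_dev is an illustrative name, not part of the original file):
**each respondent's age as a deviation from the mean age of their sex
gen age_dev=age-mean_age
sum age_dev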


ta couple
replace couple=0 if couple==2

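**_n is the observation number; _N is the total number of observations
gen id=_n
gen ID=_N
**With bysort, _N is the number of observations within each sex
bysort sex: gen number=_N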
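**_n also works within groups (a sketch; order_in_sex is an illustrative name, not part of the original file):
**position of each respondent within their sex, ordered by age
bysort sex (age): gen order_in_sex=_n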


histogram age, frequency
histogram age, frequency normal

**Horizontal bars
graph hbar , over(age_band)
**Vertical bars
graph bar, over(scghq2) by(sex)


**Create one dummy variable (educ1, educ2, ...) per category of hiqual_dv
tab hiqual_dv, gen(educ)

**Add variables from indiv_health, matching one-to-one on the person identifier pidp
sort pidp
merge 1:1 pidp using indiv_health
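**A possible check after merging (a sketch): merge creates _merge, which records where each observation came from
**1 = only in the data in memory, 2 = only in indiv_health, 3 = in both
ta _merge
**Keeping only matched observations is one option; whether to do so depends on the analysis
keep if _merge==3
drop _merge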