-------------------------------------------------------------------------------------------------------------------------------------------
      name:
       log:  /Users/bernardofanfani/Desktop/teaching/research_topics_labor/lab_5/fifth_lab/lesson_5.log
  log type:  text
 opened on:   4 Nov 2024, 11:25:36

.
. * this command is used to install the ivreg2 program.
. * caution, you need an internet connection for this command to work
. * once ivreg2 is installed, it is unnecessary to repeat the installation
. ssc install ivreg2, replace
checking ivreg2 consistency and verifying not already installed...

the following files will be replaced:
    /Users/bernardofanfani/Library/Application Support/Stata/ado/plus/i/ivreg2.ado
    /Users/bernardofanfani/Library/Application Support/Stata/ado/plus/i/ivreg2_p.ado

installing into /Users/bernardofanfani/Library/Application Support/Stata/ado/plus/...
installation complete.

.
.
. //***// ANGRIST & KRUEGER 1991 //***//
. //***// DOES COMPULSORY SCHOOL ATTENDANCE AFFECT SCHOOLING AND EARNINGS? //***//
.
. use "NEW7080_class.dta", clear

.
. des

Contains data from NEW7080_class.dta
 Observations:       329,509
    Variables:             5                  2 Oct 2020 15:00
-------------------------------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------------------------------
AGE             byte    %8.0g
EDUC            byte    %8.0g                 Years of Education
LWKLYWGE        float   %9.0g                 Log Weekly Earnings
QOB             byte    %8.0g
AGEQSQ          float   %9.0g
-------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:

.
. * quarter of birth (this is our instrument)
. tab QOB

        QOB |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     81,671       24.79       24.79
          2 |     80,138       24.32       49.11
          3 |     86,856       26.36       75.47
          4 |     80,844       24.53      100.00
------------+-----------------------------------
      Total |    329,509      100.00

.
. * structural equation (the equation of interest we want to estimate)
. reg LWKLYWGE EDUC AGE AGEQSQ

      Source |       SS           df       MS      Number of obs   =   329,509
-------------+----------------------------------   F(3, 329505)    =  14654.85
       Model |   17874.236         3  5958.07866   Prob > F        =    0.0000
    Residual |  133963.635   329,505  .406560249   R-squared       =    0.1177
-------------+----------------------------------   Adj R-squared   =    0.1177
       Total |  151837.871   329,508  .460801773   Root MSE        =    .63762

------------------------------------------------------------------------------
    LWKLYWGE | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        EDUC |   .0710828    .000339   209.67   0.000     .0704183    .0717473
         AGE |  -.0040677   .0038383    -1.06   0.289    -.0115907    .0034552
      AGEQSQ |   .0000985   .0000428     2.30   0.021     .0000146    .0001824
       _cons |   4.973327   .0852297    58.35   0.000     4.806279    5.140375
------------------------------------------------------------------------------

.
. * since EDUC is endogenous, I use the quarter of birth as the instrument (Z)
.
. * First-Stage
. reg EDUC i.QOB AGE AGEQSQ, robust

Linear regression                               Number of obs     =    329,509
                                                F(5, 329503)      =     204.35
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0031
                                                Root MSE          =     3.2762

------------------------------------------------------------------------------
             |               Robust
        EDUC | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         QOB |
          2  |   .0421986    .053966     0.78   0.434    -.0635732    .1479705
          3  |   .0831214   .0379061     2.19   0.028     .0088265    .1574163
          4  |    .103829   .0237661     4.37   0.000      .057248    .1504099
             |
         AGE |   .0008214   .0685509     0.01   0.990    -.1335364    .1351791
      AGEQSQ |  -.0006705   .0007609    -0.88   0.378    -.0021618    .0008207
       _cons |   14.04041   1.538946     9.12   0.000     11.02412     17.0567
------------------------------------------------------------------------------

.
. * Are the instruments relevant? (do they explain the variation in X?)
. * we can test the hypothesis with an F-test.
. * (H0: the coefficients associated with each quarter of birth are all equal to zero)
. test 1.QOB = 2.QOB = 3.QOB = 4.QOB = 0

 ( 1)  1b.QOB - 2.QOB = 0
 ( 2)  1b.QOB - 3.QOB = 0
 ( 3)  1b.QOB - 4.QOB = 0
 ( 4)  1b.QOB = 0
       Constraint 4 dropped

       F(  3,329503) =   14.61
            Prob > F =    0.0000

.
. * now we compute the predicted values of the first-stage regression
. cap drop educ_hat1

. predict educ_hat1, xb

.
. * one way to obtain the estimated IV coefficient is as follows.
. * with this procedure the standard error is incorrect
. reg LWKLYWGE educ_hat1 AGE AGEQSQ, robust

Linear regression                               Number of obs     =    329,509
                                                F(3, 329505)      =       9.33
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0001
                                                Root MSE          =      .6788

------------------------------------------------------------------------------
             |               Robust
    LWKLYWGE | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   educ_hat1 |   .1534995   .0313977     4.89   0.000     .0919609    .2150381
         AGE |  -.0008237   .0042954    -0.19   0.848    -.0092426    .0075952
      AGEQSQ |   .0001183   .0000465     2.55   0.011     .0000272    .0002094
       _cons |   3.735769   .4805662     7.77   0.000     2.793873    4.677665
------------------------------------------------------------------------------

.
.
. * to correctly estimate the IV coefficient and standard error
.
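. * (Sketch, not run in this session.) The manual second stage above computes its
. * residuals with educ_hat1 in place of EDUC, so its standard errors are not the
. * 2SLS standard errors. Besides ivreg2, and assuming Stata's built-in ivregress is
. * acceptable for this exercise, the same model could be estimated with:
. *     ivregress 2sls LWKLYWGE AGE AGEQSQ (EDUC = i.QOB), vce(robust)
. *     estat firststage
. * The point estimate on EDUC should match the manual second stage; only the
. * standard errors are expected to change.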
. ivreg2 LWKLYWGE AGE AGEQSQ (EDUC = i.QOB), robust

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =   329509
                                                      F(  3,329505) =     8.95
                                                      Prob > F      =   0.0000
Total (centered) SS     =  151837.8708                Centered R2   =  -0.0405
Total (uncentered) SS   =  11621827.82                Uncentered R2 =   0.9864
Residual SS             =  157989.7196                Root MSE      =    .6924

------------------------------------------------------------------------------
             |               Robust
    LWKLYWGE | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        EDUC |   .1534996   .0320643     4.79   0.000     .0906547    .2163445
         AGE |  -.0008237   .0043861    -0.19   0.851    -.0094202    .0077729
      AGEQSQ |   .0001183   .0000475     2.49   0.013     .0000252    .0002113
       _cons |   3.735766   .4906762     7.61   0.000     2.774058    4.697473
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             43.816
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               14.571
                         (Kleibergen-Paap rk Wald F statistic):         14.607
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         2.854
                                                   Chi-sq(2) P-val =    0.2401
------------------------------------------------------------------------------
Instrumented:         EDUC
Included instruments: AGE AGEQSQ
Excluded instruments: 2.QOB 3.QOB 4.QOB
------------------------------------------------------------------------------

.
. ************
. * MONOTONICITY ASSUMPTION
. * this analysis is not part of the original paper results
.
. * The monotonicity assumption may not hold: being born in January-March could be
. * negatively correlated with education among drop-outs, while having a positive effect
. * on school performance (and thus schooling) among those who stay in school longer and
. * obtain a higher education. (This would also be a violation of the exclusion
. * restriction, i.e. instrument validity.)
.
. * first check if QOB has the same effect for different education levels
. * NB: THIS IS NOT A FORMAL TEST OF THE MONOTONICITY ASSUMPTION (WHICH IS UNTESTABLE)
. * IT IS MORE OF A ROBUSTNESS CHECK (HEURISTIC TEST)
. reg EDUC i.QOB AGE AGEQSQ if inrange(EDUC,0,11), robust

Linear regression                               Number of obs     =     75,412
                                                F(5, 75406)       =      96.14
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0062
                                                Root MSE          =     2.0939

------------------------------------------------------------------------------
             |               Robust
        EDUC | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         QOB |
          2  |  -.1901107   .0719963    -2.64   0.008    -.3312232   -.0489982
          3  |  -.0861126   .0505902    -1.70   0.089    -.1852691    .0130439
          4  |  -.0038165   .0317314    -0.12   0.904      -.06601    .0583769
             |
         AGE |  -.2721012   .0914209    -2.98   0.003    -.4512857   -.0929168
      AGEQSQ |   .0024034   .0010125     2.37   0.018     .0004188     .004388
       _cons |   15.96273   2.056815     7.76   0.000     11.93139    19.99408
------------------------------------------------------------------------------

.
. reg EDUC i.QOB AGE AGEQSQ if inrange(EDUC,12,20), robust

Linear regression                               Number of obs     =    254,097
                                                F(5, 254091)      =       3.40
                                                Prob > F          =     0.0045
                                                R-squared         =     0.0001
                                                Root MSE          =     2.4529

------------------------------------------------------------------------------
             |               Robust
        EDUC | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         QOB |
          2  |   .1082066   .0460029     2.35   0.019     .0180421    .1983711
          3  |   .0583297   .0322925     1.81   0.071    -.0049628    .1216223
          4  |   .0422909   .0202255     2.09   0.037     .0026494    .0819324
             |
         AGE |    .073471   .0585255     1.26   0.209    -.0412374    .1881795
      AGEQSQ |  -.0008301    .000649    -1.28   0.201    -.0021021     .000442
       _cons |    12.3594    1.31504     9.40   0.000     9.781955    14.93684
------------------------------------------------------------------------------

. * indeed it seems that quarter of birth has an opposite effect on schooling among likely drop-outs and among higher-educated people
.
.
. * let's estimate the IV separately for the two groups
. ivreg2 LWKLYWGE AGE AGEQSQ (EDUC = i.QOB) if inrange(EDUC,0,11), robust

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =    75412
                                                      F(  3, 75408) =    11.85
                                                      Prob > F      =   0.0000
Total (centered) SS     =  38864.20818                Centered R2   =  -0.0193
Total (uncentered) SS   =   2399705.59                Uncentered R2 =   0.9835
Residual SS             =  39613.53297                Root MSE      =    .7248

------------------------------------------------------------------------------
             |               Robust
    LWKLYWGE | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        EDUC |    .151147   .0806989     1.87   0.061    -.0070199    .3093139
         AGE |   .0099169   .0094461     1.05   0.294     -.008597    .0284308
      AGEQSQ |   .0000397   .0001021     0.39   0.698    -.0001605    .0002398
       _cons |   3.764732   .8830386     4.26   0.000     2.034008    5.495456
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             18.740
                                                   Chi-sq(3) P-val =    0.0003
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                6.154
                         (Kleibergen-Paap rk Wald F statistic):          6.248
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         1.638
                                                   Chi-sq(2) P-val =    0.4409
------------------------------------------------------------------------------
Instrumented:         EDUC
Included instruments: AGE AGEQSQ
Excluded instruments: 2.QOB 3.QOB 4.QOB
------------------------------------------------------------------------------

.
.
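. * (Sketch, not run in this session.) In the EDUC 0-11 subsample the weak-identification
. * F statistics (6.154 and 6.248) fall below every Stock-Yogo critical value listed except
. * the 30% relative-bias one, so weak-instrument-robust inference may be more informative
. * than the z-test on EDUC. One option, assuming the installed ivreg2 supports it, is the
. * ffirst option, which adds first-stage diagnostics including the Anderson-Rubin test:
. *     ivreg2 LWKLYWGE AGE AGEQSQ (EDUC = i.QOB) if inrange(EDUC,0,11), robust ffirst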
. ivreg2 LWKLYWGE AGE AGEQSQ (EDUC = i.QOB) if inrange(EDUC,12,20), robust

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =   254097
                                                      F(  3,254093) =    24.77
                                                      Prob > F      =   0.0000
Total (centered) SS     =  103889.8602                Centered R2   =  -0.0069
Total (uncentered) SS   =  9222122.229                Uncentered R2 =   0.9887
Residual SS             =  104607.4227                Root MSE      =    .6416

------------------------------------------------------------------------------
             |               Robust
    LWKLYWGE | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        EDUC |   -.003464   .0985793    -0.04   0.972     -.196676    .1897479
         AGE |  -.0109623   .0067194    -1.63   0.103    -.0241321    .0022074
      AGEQSQ |   .0001617   .0000736     2.20   0.028     .0000174     .000306
       _cons |   6.199586   1.497905     4.14   0.000     3.263746    9.135426
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              7.070
                                                   Chi-sq(3) P-val =    0.0697
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                2.347
                         (Kleibergen-Paap rk Wald F statistic):          2.357
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         6.804
                                                   Chi-sq(2) P-val =    0.0333
------------------------------------------------------------------------------
Instrumented:         EDUC
Included instruments: AGE AGEQSQ
Excluded instruments: 2.QOB 3.QOB 4.QOB
------------------------------------------------------------------------------

.
. * THE INSTRUMENT IS WEAK IN BOTH SPECIFICATIONS (WEAKER WITH 12 OR MORE YEARS OF EDUCATION)
. * BELOW 12 YEARS OF EDUCATION THE MAIN RESULT IS SIMILAR TO THE FULL-SAMPLE RESULT
. * IT SEEMS THAT THE IV RESULT IS DRIVEN BY THE LOW-EDUCATED (DROP-OUTS)
.
.
.
.
.
. cap log close