--------------------------------------------------------------------------------
      name:
       log:  /Users/bernardofanfani/Desktop/teaching/research_topics_labor/lab_3/third_lab/lecture3.log
  log type:  text
 opened on:  25 Oct 2024, 19:29:59

.
. * 2. EXPLORATORY ANALYSIS OF THE panel_rl.dta DATABASE AND POLYNOMIAL SPECIFICATION
. use panel_rl.dta, clear
.
. describe

Contains data from panel_rl.dta
 Observations:       515,414
    Variables:            10                  13 Oct 2022 11:56
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
id_soggetto     float   %9.0g                 Codice identificativo lavoratore
id_azienda      float   %9.0g                 Codice identificativo azienda
anno            int     %9.0g
retrib03        float   %9.0g                 retribuzione annuale riportata ad euro del 2003
uomo            byte    %8.0g
tempo_d         byte    %9.0g                 contratto a tempo determinato
occ_manuale     byte    %8.0g                 occupazione manuale
n_dipendenti    float   %9.0g                 Numero dipendenti azienda
settore         float   %13.0g     sect       Settore di attività azienda
anno_nascita    float   %9.0g
--------------------------------------------------------------------------------
Sorted by: id_soggetto

. summarize

    Variable |        Obs        Mean    Std. dev.
Min Max -------------+--------------------------------------------------------- id_soggetto | 515,414 507190.3 299598.6 7 1078769 id_azienda | 515,414 42654.73 25273.63 1 87323 anno | 515,414 2000.01 .81552 1999 2001 retrib03 | 515,414 36980.14 24171.15 539 399924 uomo | 515,414 .6665942 .4714306 0 1 -------------+--------------------------------------------------------- tempo_d | 515,414 .095789 .294302 0 1 occ_manuale | 515,414 .6766425 .4677583 0 1 n_dipendenti | 515,414 381.0298 1037.647 1 9354 settore | 515,414 1.489403 .603487 1 3 anno_nascita | 515,414 1964.37 9.474107 1939 1982 . . * I generate the variable age and its powers . ge eta=anno-anno_nascita . . ge eta2=eta^2 . . ge eta3=eta^3 . . . * multivariate regression: . * I can insert "fixed effects" with the prefix i. . * the prefix i.variable inserts a dummy variable into the regression for each value of "variable" . . reg retrib03 eta n_dipendenti tempo_d occ_manuale uomo i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(9, 515404) = 30749.89 Model | 1.0520e+14 9 1.1689e+13 Prob > F = 0.0000 Residual | 1.9592e+14 515,404 380137771 R-squared = 0.3494 -------------+---------------------------------- Adj R-squared = 0.3494 Total | 3.0113e+14 515,413 584244651 Root MSE = 19497 -------------------------------------------------------------------------------- retrib03 | Coefficient Std. err. t P>|t| [95% conf. 
interval] ---------------+---------------------------------------------------------------- eta | 562.7524 2.950287 190.74 0.000 556.97 568.5349 n_dipendenti | 3.655495 .0269801 135.49 0.000 3.602615 3.708375 tempo_d | -13667.28 94.3575 -144.85 0.000 -13852.22 -13482.35 occ_manuale | -21026.54 61.51605 -341.81 0.000 -21147.11 -20905.97 uomo | 12947.99 59.18251 218.78 0.000 12831.99 13063.98 | settore | Servizi | -5496.835 59.10781 -93.00 0.000 -5612.685 -5380.986 Altri settori | -5476.379 119.5129 -45.82 0.000 -5710.621 -5242.138 | anno | 2000 | -308.8896 66.75735 -4.63 0.000 -439.7319 -178.0473 2001 | -466.7969 66.68498 -7.00 0.000 -597.4973 -336.0964 | _cons | 25071.9 127.1806 197.14 0.000 24822.63 25321.17 -------------------------------------------------------------------------------- . . * specify income as a cubic function of age . . reg retrib03 eta eta2 eta3 n_dipendenti tempo_d occ_manuale uomo i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(11, 515402) = 25374.36 Model | 1.0579e+14 11 9.6170e+12 Prob > F = 0.0000 Residual | 1.9534e+14 515,402 379005318 R-squared = 0.3513 -------------+---------------------------------- Adj R-squared = 0.3513 Total | 3.0113e+14 515,413 584244651 Root MSE = 19468 -------------------------------------------------------------------------------- retrib03 | Coefficient Std. err. t P>|t| [95% conf. 
interval] ---------------+---------------------------------------------------------------- eta | -1428.29 115.2209 -12.40 0.000 -1654.12 -1202.461 eta2 | 63.96913 3.101201 20.63 0.000 57.89087 70.04738 eta3 | -.6341895 .0268471 -23.62 0.000 -.6868091 -.5815699 n_dipendenti | 3.655694 .0269519 135.64 0.000 3.602869 3.708519 tempo_d | -13470.54 94.88028 -141.97 0.000 -13656.5 -13284.57 occ_manuale | -21003.45 61.56471 -341.16 0.000 -21124.11 -20882.78 uomo | 12878.27 59.1266 217.81 0.000 12762.39 12994.16 | settore | Servizi | -5437.054 59.04858 -92.08 0.000 -5552.788 -5321.321 Altri settori | -5387.755 119.359 -45.14 0.000 -5621.695 -5153.815 | anno | 2000 | -329.3663 66.66388 -4.94 0.000 -460.0254 -198.7072 2001 | -500.4556 66.60292 -7.51 0.000 -630.9953 -369.916 | _cons | 44085.8 1379.985 31.95 0.000 41381.07 46790.53 -------------------------------------------------------------------------------- . . *check the null hypothesis of linearity against the alternative hypothesis that the population regression is quadratic or cubic . . test eta2 eta3 ( 1) eta2 = 0 ( 2) eta3 = 0 F( 2,515402) = 771.00 Prob > F = 0.0000 . . * Effect of going from 39 to 40: first we predict fitted values . . predict y_hat (option xb assumed; fitted values) . . * the marginal effect is given by the variable "dif" in the output below. . * warning: this procedure is not correct for making statistical inference! . * the standard error of y_hat is calculated without taking into account that y_hat is a parameter derived from a regression model! . . ttest y_hat if inrange(eta, 39,40), by(eta) Two-sample t test with equal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. 
interval] ---------+-------------------------------------------------------------------- 39 | 14,443 40823.07 103.9485 12492.43 40619.31 41026.82 40 | 13,685 41524.54 107.2645 12548.11 41314.29 41734.79 ---------+-------------------------------------------------------------------- Combined | 28,128 41164.35 74.67613 12524.23 41017.98 41310.72 ---------+-------------------------------------------------------------------- diff | -701.4744 149.3506 -994.2088 -408.74 ------------------------------------------------------------------------------ diff = mean(39) - mean(40) t = -4.6968 H0: diff = 0 Degrees of freedom = 28126 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000 . . . * to make statistical inference on marginal effects we replicate the regression from before. . * we use c.eta##c.eta##c.eta to specify a triple interaction of age (equivalent to entering eta eta2 and eta3, but this time we do not ha > ve to create the variables eta2 and eta3) . . reg retrib03 c.eta##c.eta##c.eta n_dipendenti tempo_d occ_manuale uomo i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(11, 515402) = 25374.36 Model | 1.0579e+14 11 9.6170e+12 Prob > F = 0.0000 Residual | 1.9534e+14 515,402 379005318 R-squared = 0.3513 -------------+---------------------------------- Adj R-squared = 0.3513 Total | 3.0113e+14 515,413 584244651 Root MSE = 19468 ----------------------------------------------------------------------------------- retrib03 | Coefficient Std. err. t P>|t| [95% conf. 
interval]
------------------+----------------------------------------------------------------
              eta |   -1428.29   115.2209   -12.40   0.000     -1654.12   -1202.461
                  |
      c.eta#c.eta |   63.96913   3.101201    20.63   0.000     57.89087    70.04738
                  |
c.eta#c.eta#c.eta |  -.6341895   .0268471   -23.62   0.000    -.6868091   -.5815699
                  |
     n_dipendenti |   3.655694   .0269519   135.64   0.000     3.602869    3.708519
          tempo_d |  -13470.54   94.88028  -141.97   0.000     -13656.5   -13284.57
      occ_manuale |  -21003.45   61.56471  -341.16   0.000    -21124.11   -20882.78
             uomo |   12878.27    59.1266   217.81   0.000     12762.39    12994.16
                  |
          settore |
          Servizi |  -5437.054   59.04858   -92.08   0.000    -5552.788   -5321.321
    Altri settori |  -5387.755    119.359   -45.14   0.000    -5621.695   -5153.815
                  |
             anno |
             2000 |  -329.3663   66.66388    -4.94   0.000    -460.0254   -198.7072
             2001 |  -500.4556   66.60292    -7.51   0.000    -630.9953    -369.916
                  |
            _cons |    44085.8   1379.985    31.95   0.000     41381.07    46790.53
-----------------------------------------------------------------------------------

.
. * the command below calculates the predicted values for each age level with the correct standard errors
.
. margins, over(eta) post

Predictive margins                                     Number of obs = 515,414
Model VCE: OLS

Expression: Linear prediction, predict()
Over:       eta

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf.
interval] -------------+---------------------------------------------------------------- eta | 19 | 19230.04 133.2874 144.27 0.000 18968.8 19491.28 20 | 21474.25 110.0205 195.18 0.000 21258.62 21689.89 21 | 22793.53 90.3051 252.41 0.000 22616.53 22970.52 22 | 24188.3 74.00049 326.87 0.000 24043.26 24333.34 23 | 25499.78 61.17986 416.80 0.000 25379.87 25619.7 24 | 26625.19 51.73812 514.61 0.000 26523.78 26726.59 25 | 27759.93 45.52405 609.79 0.000 27670.7 27849.15 26 | 29026.03 42.07624 689.84 0.000 28943.57 29108.5 27 | 30131.31 40.59441 742.25 0.000 30051.75 30210.88 28 | 31211.02 40.23718 775.68 0.000 31132.15 31289.88 29 | 32410.52 40.35027 803.23 0.000 32331.43 32489.61 30 | 33494.65 40.46723 827.70 0.000 33415.34 33573.97 31 | 34400.8 40.38986 851.72 0.000 34321.64 34479.97 32 | 35226.95 40.06394 879.27 0.000 35148.42 35305.47 33 | 36024.07 39.54731 910.91 0.000 35946.56 36101.58 34 | 37137.38 38.95379 953.37 0.000 37061.03 37213.73 35 | 37880.88 38.42756 985.77 0.000 37805.56 37956.2 36 | 38831.76 38.17608 1017.18 0.000 38756.93 38906.58 37 | 39514.26 38.3303 1030.89 0.000 39439.14 39589.39 38 | 40284.39 38.95597 1034.10 0.000 40208.03 40360.74 39 | 40823.07 40.06522 1018.92 0.000 40744.54 40901.59 40 | 41524.54 41.63038 997.46 0.000 41442.95 41606.14 41 | 42165.15 43.5217 968.83 0.000 42079.85 42250.45 42 | 43038.68 45.58422 944.16 0.000 42949.33 43128.02 43 | 43619.92 47.68154 914.82 0.000 43526.47 43713.38 44 | 44266.88 49.6446 891.68 0.000 44169.58 44364.18 45 | 44947.72 51.38156 874.78 0.000 44847.02 45048.43 46 | 45460.12 52.79558 861.06 0.000 45356.64 45563.59 47 | 45926.56 53.87067 852.53 0.000 45820.97 46032.14 48 | 46685.34 54.74497 852.78 0.000 46578.05 46792.64 49 | 47117.91 55.60013 847.44 0.000 47008.94 47226.89 50 | 47428.02 56.82762 834.59 0.000 47316.64 47539.4 51 | 47573.36 58.89974 807.70 0.000 47457.92 47688.8 52 | 48017.91 62.57108 767.41 0.000 47895.27 48140.54 53 | 48204.87 68.5105 703.61 0.000 48070.59 48339.15 54 | 48515.1 77.54412 
625.65   0.000     48363.12    48667.08
          55 |    47943.4   89.67202   534.65   0.000     47767.64    48119.15
          56 |   47020.14   105.3309   446.40   0.000      46813.7    47226.59
          57 |   46512.09   124.6647   373.10   0.000     46267.75    46756.43
          58 |   45325.22   147.7136   306.85   0.000     45035.71    45614.74
          59 |   44656.17   174.3824   256.08   0.000     44314.38    44997.95
          60 |   43943.29   204.8838   214.48   0.000     43541.73    44344.86
------------------------------------------------------------------------------

.
. * now we calculate the difference between the predicted values at ages 40 and 39
.
. di _b[40.eta]-_b[39.eta]
701.47442

.
. * we test whether this difference is equal to 0
.
. test _b[40.eta]==_b[39.eta]

 ( 1)  - 39.eta + 40.eta = 0

       F(  1,515402) =13491.72
            Prob > F =    0.0000

.
. * the conclusions of the test are identical to those obtained with the incorrect procedure ("ttest y_hat").
. * however, for a less clear-cut hypothesis (e.g., that the predicted value of y at age 40 is 41400) the two procedures can lead to different conclusions!
.
. test _b[40.eta]==41400

 ( 1)  40.eta = 41400

       F(  1,515402) =    8.95
            Prob > F =    0.0028

. ttest y_hat==41400 if eta==40

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
   y_hat |  13,685    41524.54    107.2645    12548.11    41314.29    41734.79
------------------------------------------------------------------------------
    mean = mean(y_hat)                                            t =   1.1611
H0: mean = 41400                                 Degrees of freedom =    13684

   Ha: mean < 41400           Ha: mean != 41400           Ha: mean > 41400
 Pr(T < t) = 0.8772         Pr(|T| > |t|) = 0.2456      Pr(T > t) = 0.1228

.
.
. * 2. LOGARITHMIC SPECIFICATIONS
.
. *******************
. *******************
. * model with firm size as a linear function:
. * _b gives us the increase of Y for a unit increase of X (increase of one employee)
.
.
reg retrib03 n_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno

      Source |       SS           df       MS      Number of obs   =   515,414
-------------+----------------------------------   F(11, 515402)   =  25374.36
       Model |  1.0579e+14        11  9.6170e+12   Prob > F        =    0.0000
    Residual |  1.9534e+14   515,402   379005318   R-squared       =    0.3513
-------------+----------------------------------   Adj R-squared   =    0.3513
       Total |  3.0113e+14   515,413   584244651   Root MSE        =     19468

-----------------------------------------------------------------------------------
         retrib03 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
     n_dipendenti |   3.655694   .0269519   135.64   0.000     3.602869    3.708519
              eta |   -1428.29   115.2209   -12.40   0.000     -1654.12   -1202.461
                  |
      c.eta#c.eta |   63.96913   3.101201    20.63   0.000     57.89087    70.04738
                  |
c.eta#c.eta#c.eta |  -.6341895   .0268471   -23.62   0.000    -.6868091   -.5815699
                  |
          tempo_d |  -13470.54   94.88028  -141.97   0.000     -13656.5   -13284.57
      occ_manuale |  -21003.45   61.56471  -341.16   0.000    -21124.11   -20882.78
             uomo |   12878.27    59.1266   217.81   0.000     12762.39    12994.16
                  |
          settore |
          Servizi |  -5437.054   59.04858   -92.08   0.000    -5552.788   -5321.321
    Altri settori |  -5387.755    119.359   -45.14   0.000    -5621.695   -5153.815
                  |
             anno |
             2000 |  -329.3663   66.66388    -4.94   0.000    -460.0254   -198.7072
             2001 |  -500.4556   66.60292    -7.51   0.000    -630.9953    -369.916
                  |
            _cons |    44085.8   1379.985    31.95   0.000     41381.07    46790.53
-----------------------------------------------------------------------------------

.
. * we plot the relationship between income and firm size (ignoring the constant and the other variables)
. ge lineare=_b[n_dipendenti]*n_dipendenti
. sort n_dipendenti
. twoway (line lineare n_dipendenti), ytitle("Predicted income")
.
.
. *******************
. *******************
. * Lin-log model: _b/100 = approximate increase in Y for a 1% increase in X
.
. * we transform the number of employees into logs
.
. gen ln_dipendenti=ln(n_dipendenti)
.
.
reg retrib03 ln_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno

      Source |       SS           df       MS      Number of obs   =   515,414
-------------+----------------------------------   F(11, 515402)   =  26340.79
       Model |  1.0837e+14        11  9.8515e+12   Prob > F        =    0.0000
    Residual |  1.9276e+14   515,402   374001147   R-squared       =    0.3599
-------------+----------------------------------   Adj R-squared   =    0.3599
       Total |  3.0113e+14   515,413   584244651   Root MSE        =     19339

-----------------------------------------------------------------------------------
         retrib03 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
    ln_dipendenti |   2222.358   13.90607   159.81   0.000     2195.102    2249.613
              eta |  -1035.216   114.5259    -9.04   0.000    -1259.683   -810.7486
                  |
      c.eta#c.eta |   51.60939   3.083037    16.74   0.000     45.56674    57.65205
                  |
c.eta#c.eta#c.eta |  -.5181018   .0266919   -19.41   0.000    -.5704171   -.4657864
                  |
          tempo_d |  -13798.83   94.27199  -146.37   0.000     -13983.6   -13614.06
      occ_manuale |  -20765.25    61.1386  -339.64   0.000    -20885.08   -20645.42
             uomo |   12720.38   58.73119   216.59   0.000     12605.27    12835.49
                  |
          settore |
          Servizi |  -4110.436   58.63029   -70.11   0.000    -4225.349   -3995.522
    Altri settori |  -5018.586   118.6005   -42.32   0.000    -5251.039   -4786.133
                  |
             anno |
             2000 |  -321.7241   66.22227    -4.86   0.000    -451.5177   -191.9305
             2001 |  -307.3366   66.13546    -4.65   0.000    -436.9601   -177.7132
                  |
            _cons |   32529.89   1374.518    23.67   0.000     29835.88     35223.9
-----------------------------------------------------------------------------------

.
. * we plot the relationship between income and ln firm size (ignoring the constant and the other variables)
.
. ge lin_log=_b[ln_dipendenti]*ln_dipendenti
. sort n_dipendenti
. twoway (line lin_log n_dipendenti) (line lineare n_dipendenti), ytitle("Predicted income")
.
.
. *******************
. *******************
. * Log-lin model: _b*100 = approximate % increase in Y for a unit increase in X
.
. * we transform the income variable into logs
.
.
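As a rough check on the lin-log rule of thumb, the sketch below takes the coefficient on ln_dipendenti from the lin-log regression above (2222.358) and compares the shortcut _b/100 with the exact change the model implies, _b*ln(1.01). Python is used here only as a calculator; the variable names are illustrative and are not part of the original do-file.

```python
import math

# Coefficient on ln_dipendenti, copied from the lin-log regression above.
b_lin_log = 2222.358

# Rule of thumb from the log: a 1% increase in X changes Y by about _b/100.
approx_1pct = b_lin_log / 100

# Exact change implied by the model: _b * [ln(1.01 * X) - ln(X)] = _b * ln(1.01).
exact_1pct = b_lin_log * math.log(1.01)

# For large changes the shortcut is no longer accurate:
# doubling firm size changes Y by _b * ln(2).
exact_double = b_lin_log * math.log(2)

print(f"1% larger firm, shortcut: {approx_1pct:.2f} euros")
print(f"1% larger firm, exact:    {exact_1pct:.2f} euros")
print(f"firm twice as large:      {exact_double:.2f} euros")
```

The shortcut and the exact figure differ by about ten cents for a 1% change, so _b/100 is a safe reading for small changes only.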
gen ln_income=ln(retrib03) . . reg ln_income n_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(11, 515402) = 23330.40 Model | 75109.5102 11 6828.13729 Prob > F = 0.0000 Residual | 150843.322 515,402 .292671201 R-squared = 0.3324 -------------+---------------------------------- Adj R-squared = 0.3324 Total | 225952.832 515,413 .438391799 Root MSE = .54099 ----------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------+---------------------------------------------------------------- n_dipendenti | .0000837 7.49e-07 111.73 0.000 .0000822 .0000851 eta | .0552059 .0032018 17.24 0.000 .0489304 .0614814 | c.eta#c.eta | -.0005496 .0000862 -6.38 0.000 -.0007185 -.0003807 | c.eta#c.eta#c.eta | -1.53e-07 7.46e-07 -0.20 0.838 -1.61e-06 1.31e-06 | tempo_d | -.5617764 .0026366 -213.07 0.000 -.5669441 -.5566088 occ_manuale | -.4811108 .0017108 -281.22 0.000 -.4844639 -.4777577 uomo | .3322136 .001643 202.19 0.000 .3289933 .3354339 | settore | Servizi | -.2086282 .0016409 -127.14 0.000 -.2118443 -.2054121 Altri settori | -.2283771 .0033168 -68.85 0.000 -.2348779 -.2218762 | anno | 2000 | -.011653 .0018525 -6.29 0.000 -.0152839 -.0080222 2001 | -.0103882 .0018508 -5.61 0.000 -.0140157 -.0067607 | _cons | 9.342333 .0383479 243.62 0.000 9.267172 9.417494 ----------------------------------------------------------------------------------- . . * we draw the relationship (this time we insert the constant to visualize better) . . ge log_lin=exp(_b[_cons]+_b[n_dipendenti]*n_dipendenti) . . sort n_dipendenti . twoway (line log_lin n_dipendenti) (line lineare n_dipendenti), ytitle("Predicted income") . . . ******************* . ******************* . * Log-log model: _b = % increase in Y for a 1% increase in X . . . 
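The log-lin shortcut (_b*100 read as a percentage change) works well for small coefficients and poorly for large ones. The sketch below illustrates this with two coefficients copied from the log-lin regression above; the helper `exact_pct_change` is a hypothetical name introduced for this illustration, and Python is used only as a calculator.

```python
import math

# Coefficients copied from the log-lin regression above (dependent variable: ln_income).
b_n_dipendenti = 0.0000837
b_tempo_d = -0.5617764


def exact_pct_change(b, delta_x=1.0):
    """Exact % change in Y when X rises by delta_x in a log-lin model."""
    return 100 * (math.exp(b * delta_x) - 1)


# Small coefficient: the shortcut _b*100 is essentially exact.
small_shortcut = b_n_dipendenti * 100
small_exact = exact_pct_change(b_n_dipendenti)

# Large coefficient on a dummy: the shortcut overstates the size of the gap.
large_shortcut = b_tempo_d * 100
large_exact = exact_pct_change(b_tempo_d)

print(f"one more employee:   {small_shortcut:.5f}% vs {small_exact:.5f}%")
print(f"fixed-term contract: {large_shortcut:.1f}% vs {large_exact:.1f}%")
```

For tempo_d the shortcut reads as a 56% penalty, while the exact transformation exp(_b)-1 gives roughly 43%.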
reg ln_income ln_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(11, 515402) = 23814.22 Model | 76142.2162 11 6922.01966 Prob > F = 0.0000 Residual | 149810.616 515,402 .29066751 R-squared = 0.3370 -------------+---------------------------------- Adj R-squared = 0.3370 Total | 225952.832 515,413 .438391799 Root MSE = .53914 ----------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------+---------------------------------------------------------------- ln_dipendenti | .0492245 .0003877 126.97 0.000 .0484647 .0499844 eta | .0636319 .0031928 19.93 0.000 .0573742 .0698896 | c.eta#c.eta | -.0008155 .0000859 -9.49 0.000 -.0009839 -.000647 | c.eta#c.eta#c.eta | 2.35e-06 7.44e-07 3.16 0.002 8.96e-07 3.81e-06 | tempo_d | -.56906 .0026281 -216.53 0.000 -.574211 -.5639089 occ_manuale | -.4769417 .0017044 -279.83 0.000 -.4802824 -.4736011 uomo | .3286403 .0016373 200.72 0.000 .3254313 .3318494 | settore | Servizi | -.1787718 .0016345 -109.37 0.000 -.1819754 -.1755683 Altri settori | -.2202609 .0033063 -66.62 0.000 -.2267413 -.2137806 | anno | 2000 | -.0114759 .0018461 -6.22 0.000 -.0150943 -.0078575 2001 | -.0059171 .0018437 -3.21 0.001 -.0095308 -.0023035 | _cons | 9.090765 .0383188 237.24 0.000 9.015661 9.165868 ----------------------------------------------------------------------------------- . . * we draw the relationship (again we insert the constant to visualize better) . . ge log_log=exp(_b[_cons]+_b[ln_dipendenti]*ln_dipendenti) . . sort n_dipendenti . twoway (line log_log n_dipendenti) (line lineare n_dipendenti), ytitle("Predicted income") . . . * 3. INTERACTIONS . . ******************* . ******************* . * interaction between two dummy variables . gen int_uomo_manuale=uomo * occ_manuale . . 
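In the log-log regression above the coefficient on ln_dipendenti (.0492245) is an elasticity. A minimal sketch of what that means for a small and for a large change in firm size follows; `pct_change_in_y` is a hypothetical helper written for this illustration, not something taken from the do-file.

```python
# Elasticity of income with respect to firm size, copied from the log-log regression above.
b_log_log = 0.0492245


def pct_change_in_y(b, pct_change_in_x):
    """Exact % change in Y for a given % change in X in a log-log model."""
    return 100 * ((1 + pct_change_in_x / 100) ** b - 1)


# A 1% larger firm: the exact figure is almost identical to the elasticity itself.
one_pct = pct_change_in_y(b_log_log, 1)

# A firm twice as large (a 100% increase in X).
doubling = pct_change_in_y(b_log_log, 100)

print(f"1% larger firm:      {one_pct:.4f}% higher income")
print(f"firm twice as large: {doubling:.2f}% higher income")
```

Note that the elasticity is about 0.05, so a 1% increase in firm size is associated with an income increase of about 0.05%, not 5%.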
* the coefficient associated with int_uomo_manuale tells us what the additional effect of being in a manual occupation is for men compare > d to women . . reg ln_income occ_manuale uomo int_uomo_manuale n_dipendenti c.eta##c.eta##c.eta tempo_d i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(12, 515401) = 21422.30 Model | 75194.192 12 6266.18266 Prob > F = 0.0000 Residual | 150758.64 515,401 .292507466 R-squared = 0.3328 -------------+---------------------------------- Adj R-squared = 0.3328 Total | 225952.832 515,413 .438391799 Root MSE = .54084 ----------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------+---------------------------------------------------------------- occ_manuale | -.4453301 .0027106 -164.29 0.000 -.4506428 -.4400174 uomo | .3688581 .0027086 136.18 0.000 .3635494 .3741669 int_uomo_manuale | -.0577615 .0033948 -17.01 0.000 -.0644152 -.0511078 n_dipendenti | .0000829 7.50e-07 110.53 0.000 .0000814 .0000844 eta | .0551339 .0032009 17.22 0.000 .0488602 .0614076 | c.eta#c.eta | -.0005504 .0000862 -6.39 0.000 -.0007192 -.0003815 | c.eta#c.eta#c.eta | -1.48e-07 7.46e-07 -0.20 0.843 -1.61e-06 1.31e-06 | tempo_d | -.5614093 .0026359 -212.98 0.000 -.5665757 -.556243 | settore | Servizi | -.2056324 .0016498 -124.64 0.000 -.2088661 -.2023988 Altri settori | -.2287694 .003316 -68.99 0.000 -.2352686 -.2222702 | anno | 2000 | -.0114922 .001852 -6.21 0.000 -.0151221 -.0078623 2001 | -.0101612 .0018503 -5.49 0.000 -.0137878 -.0065346 | _cons | 9.324282 .0383519 243.12 0.000 9.249114 9.399451 ----------------------------------------------------------------------------------- . . * you can also achieve the same result with the following procedure, without having to create a new interaction variable . . 
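For the dummy-by-dummy interaction estimated just above, the three coefficients can be combined into group-specific gaps. The sketch below does that arithmetic with the reported estimates; all values are in log points, and the variable names are illustrative.

```python
# Coefficients copied from the regression with int_uomo_manuale above.
b_manual = -0.4453301         # occ_manuale
b_male = 0.3688581            # uomo
b_male_x_manual = -0.0577615  # int_uomo_manuale

# Manual-occupation gap, separately by gender.
manual_gap_women = b_manual
manual_gap_men = b_manual + b_male_x_manual

# Gender gap, separately by occupation type.
gender_gap_nonmanual = b_male
gender_gap_manual = b_male + b_male_x_manual

# Predicted difference in ln_income relative to the baseline group
# (women in non-manual occupations), other covariates held fixed.
relative_to_baseline = {
    ("woman", "non-manual"): 0.0,
    ("woman", "manual"): b_manual,
    ("man", "non-manual"): b_male,
    ("man", "manual"): b_manual + b_male + b_male_x_manual,
}

print(f"manual gap: women {manual_gap_women:.4f}, men {manual_gap_men:.4f}")
print(f"gender gap: non-manual {gender_gap_nonmanual:.4f}, manual {gender_gap_manual:.4f}")
```

The interaction coefficient is the difference between the two manual gaps and, equivalently, the difference between the two gender gaps.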
reg ln_income i.occ_manuale##i.uomo n_dipendenti c.eta##c.eta##c.eta tempo_d i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(12, 515401) = 21422.30 Model | 75194.192 12 6266.18266 Prob > F = 0.0000 Residual | 150758.64 515,401 .292507466 R-squared = 0.3328 -------------+---------------------------------- Adj R-squared = 0.3328 Total | 225952.832 515,413 .438391799 Root MSE = .54084 ----------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------+---------------------------------------------------------------- 1.occ_manuale | -.4453301 .0027106 -164.29 0.000 -.4506428 -.4400174 1.uomo | .3688581 .0027086 136.18 0.000 .3635494 .3741669 | occ_manuale#uomo | 1 1 | -.0577615 .0033948 -17.01 0.000 -.0644152 -.0511078 | n_dipendenti | .0000829 7.50e-07 110.53 0.000 .0000814 .0000844 eta | .0551339 .0032009 17.22 0.000 .0488602 .0614076 | c.eta#c.eta | -.0005504 .0000862 -6.39 0.000 -.0007192 -.0003815 | c.eta#c.eta#c.eta | -1.48e-07 7.46e-07 -0.20 0.843 -1.61e-06 1.31e-06 | tempo_d | -.5614093 .0026359 -212.98 0.000 -.5665757 -.556243 | settore | Servizi | -.2056324 .0016498 -124.64 0.000 -.2088661 -.2023988 Altri settori | -.2287694 .003316 -68.99 0.000 -.2352686 -.2222702 | anno | 2000 | -.0114922 .001852 -6.21 0.000 -.0151221 -.0078623 2001 | -.0101612 .0018503 -5.49 0.000 -.0137878 -.0065346 | _cons | 9.324282 .0383519 243.12 0.000 9.249114 9.399451 ----------------------------------------------------------------------------------- . . ******************* . ******************* . * Interaction between dummy variable and continuous variable . gen int_uomo_dipendenti=uomo * ln_dipendenti . . * the coefficient associated with int_uomo_dipendenti tells us what the additional effect of a unit increase in ln_dipendenti is for men > compared with women . . 
reg ln_income ln_dipendenti uomo int_uomo_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(12, 515401) = 21831.06 Model | 76145.4533 12 6345.45444 Prob > F = 0.0000 Residual | 149807.379 515,401 .290661794 R-squared = 0.3370 -------------+---------------------------------- Adj R-squared = 0.3370 Total | 225952.832 515,413 .438391799 Root MSE = .53913 ------------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. interval] --------------------+---------------------------------------------------------------- ln_dipendenti | .0475542 .0006331 75.11 0.000 .0463133 .048795 uomo | .3181581 .0035421 89.82 0.000 .3112157 .3251006 int_uomo_dipendenti | .0026467 .0007931 3.34 0.001 .0010923 .0042012 eta | .0636794 .0031928 19.94 0.000 .0574217 .0699371 | c.eta#c.eta | -.0008168 .0000859 -9.50 0.000 -.0009853 -.0006484 | c.eta#c.eta#c.eta | 2.37e-06 7.44e-07 3.18 0.001 9.07e-07 3.82e-06 | tempo_d | -.568994 .0026282 -216.50 0.000 -.5741451 -.5638429 occ_manuale | -.4763324 .0017142 -277.88 0.000 -.4796921 -.4729727 | settore | Servizi | -.1785183 .0016362 -109.10 0.000 -.1817253 -.1753113 Altri settori | -.2204488 .0033068 -66.67 0.000 -.22693 -.2139676 | anno | 2000 | -.0114542 .0018461 -6.20 0.000 -.0150726 -.0078359 2001 | -.0058732 .0018438 -3.19 0.001 -.0094869 -.0022595 | _cons | 9.096424 .038356 237.16 0.000 9.021247 9.1716 ------------------------------------------------------------------------------------- . . * again, the same result can be obtained with the following procedure . . 
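With a dummy interacted with a continuous variable, as in the regression above, the interaction shifts the slope for men and makes the gender gap depend on firm size. The sketch below works this out from the reported coefficients; `gender_gap` is a hypothetical helper, and the firm sizes of 1 and 1000 employees are arbitrary illustration values.

```python
import math

# Coefficients copied from the regression with int_uomo_dipendenti above.
b_ln_dip = 0.0475542         # ln_dipendenti
b_male = 0.3181581           # uomo
b_male_x_ln_dip = 0.0026467  # int_uomo_dipendenti

# Firm-size elasticity of income, by gender.
elasticity_women = b_ln_dip
elasticity_men = b_ln_dip + b_male_x_ln_dip


def gender_gap(n_employees):
    """Male-female gap in ln_income at a given firm size, other covariates fixed."""
    return b_male + b_male_x_ln_dip * math.log(n_employees)


print(f"elasticity: women {elasticity_women:.4f}, men {elasticity_men:.4f}")
print(f"gender gap at 1 employee:     {gender_gap(1):.4f}")
print(f"gender gap at 1000 employees: {gender_gap(1000):.4f}")
```

Because ln(1) = 0, the coefficient on uomo is the gender gap in a one-employee firm, not the average gap in the sample.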
reg ln_income c.ln_dipendenti##i.uomo c.eta##c.eta##c.eta tempo_d occ_manuale i.settore i.anno Source | SS df MS Number of obs = 515,414 -------------+---------------------------------- F(12, 515401) = 21831.06 Model | 76145.4533 12 6345.45444 Prob > F = 0.0000 Residual | 149807.379 515,401 .290661794 R-squared = 0.3370 -------------+---------------------------------- Adj R-squared = 0.3370 Total | 225952.832 515,413 .438391799 Root MSE = .53913 -------------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. interval] ---------------------+---------------------------------------------------------------- ln_dipendenti | .0475542 .0006331 75.11 0.000 .0463133 .048795 1.uomo | .3181581 .0035421 89.82 0.000 .3112157 .3251006 | uomo#c.ln_dipendenti | 1 | .0026467 .0007931 3.34 0.001 .0010923 .0042012 | eta | .0636794 .0031928 19.94 0.000 .0574217 .0699371 | c.eta#c.eta | -.0008168 .0000859 -9.50 0.000 -.0009853 -.0006484 | c.eta#c.eta#c.eta | 2.37e-06 7.44e-07 3.18 0.001 9.07e-07 3.82e-06 | tempo_d | -.568994 .0026282 -216.50 0.000 -.5741451 -.5638429 occ_manuale | -.4763324 .0017142 -277.88 0.000 -.4796921 -.4729727 | settore | Servizi | -.1785183 .0016362 -109.10 0.000 -.1817253 -.1753113 Altri settori | -.2204488 .0033068 -66.67 0.000 -.22693 -.2139676 | anno | 2000 | -.0114542 .0018461 -6.20 0.000 -.0150726 -.0078359 2001 | -.0058732 .0018438 -3.19 0.001 -.0094869 -.0022595 | _cons | 9.096424 .038356 237.16 0.000 9.021247 9.1716 -------------------------------------------------------------------------------------- . . . * 4. FIXED-EFFECTS REGRESSION . . * This database has a panel structure (the same workers are observed several times over time) . . xtset id_soggetto anno Panel variable: id_soggetto (unbalanced) Time variable: anno, 1999 to 2001, but with gaps Delta: 1 unit . . . 
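This section estimates the fixed-effects model in two ways: with one dummy per worker (areg) and with deviations from each worker's mean (xtreg, fe). The sketch below checks on a small hypothetical panel that the two approaches give the same slope. The data are invented for illustration and are not drawn from panel_rl.dta; the helpers `solve` and `ols` are minimal pure-Python stand-ins, not Stata's algorithms.

```python
def solve(a, b):
    """Solve a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical unbalanced toy panel: (worker id, x, y).
panel = [(1, 1.0, 2.1), (1, 2.0, 2.9), (1, 3.0, 4.2),
         (2, 2.0, 5.0), (2, 4.0, 6.1),
         (3, 1.0, 0.5), (3, 3.0, 1.4), (3, 5.0, 2.6)]
ids = sorted({i for i, _, _ in panel})
y = [v for _, _, v in panel]

# Dummy-variable approach: regress y on x and one indicator per worker.
X_dummies = [[x] + [1.0 if i == g else 0.0 for g in ids] for i, x, _ in panel]
b_dummies = ols(X_dummies, y)[0]


def group_mean(g, col):
    """Mean of column col (1 = x, 2 = y) over the rows of worker g."""
    vals = [row[col] for row in panel if row[0] == g]
    return sum(vals) / len(vals)


# Within approach: subtract each worker's own mean from x and y, then regress.
x_dm = [x - group_mean(i, 1) for i, x, _ in panel]
y_dm = [v - group_mean(i, 2) for i, _, v in panel]
b_within = sum(p * q for p, q in zip(x_dm, y_dm)) / sum(p * p for p in x_dm)

print(b_dummies, b_within)
```

The two slopes agree up to floating-point error, which is why the areg and xtreg outputs in this section report identical coefficients.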
* In the "fixed effects" regression we basically enter a dummy variable for each individual (we do not show the coefficients, there would be too many!)
. * this regression is used to condition income on unobservable characteristics of the individual, as long as these characteristics are constant over time
. * adding fixed effects can make our identifying assumptions more credible
. /*
> given a model
>         Y = X b + E
> we need Cov(X, E)=0 to interpret b causally
> Restricting E by adding more information in X is a way
> of making this assumption more credible
> */
.
. * there are two equivalent ways of estimating a fixed-effects regression:
.
. * areg: includes dummies for each individual (without showing the coefficients associated with these dummies)
.
. * xtreg: transforms the regression model without fixed effects into a model in which Y and X are expressed
. * as deviations from the individual mean across all periods.
. * This command exploits an equivalence between the models described above that can be derived analytically
.
. areg ln_income ln_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno, absorb(id_soggetto)
note: uomo omitted because of collinearity.
note: 2001.anno omitted because of collinearity.

Linear regression, absorbing indicators            Number of obs     = 515,414
Absorbed variable: id_soggetto                     No. of categories = 215,444
                                                   F(9, 299961)      = 1061.12
                                                   Prob > F          =  0.0000
                                                   R-squared         =  0.8697
                                                   Adj R-squared     =  0.7761
                                                   Root MSE          =  0.3133

-----------------------------------------------------------------------------------
        ln_income | Coefficient  Std. err.      t    P>|t|     [95% conf.
interval] ------------------+---------------------------------------------------------------- ln_dipendenti | .0295582 .0009916 29.81 0.000 .0276147 .0315018 eta | .2341365 .0093305 25.09 0.000 .215849 .2524241 | c.eta#c.eta | -.0046856 .0002542 -18.43 0.000 -.0051839 -.0041873 | c.eta#c.eta#c.eta | .0000297 2.21e-06 13.46 0.000 .0000254 .000034 | tempo_d | -.1812044 .0031272 -57.94 0.000 -.1873337 -.1750752 occ_manuale | -.1255315 .0059201 -21.20 0.000 -.1371348 -.1139282 uomo | 0 (omitted) | settore | Servizi | -.1077109 .0044205 -24.37 0.000 -.116375 -.0990467 Altri settori | -.0632246 .0084458 -7.49 0.000 -.0797781 -.0466711 | anno | 2000 | -.0160424 .0009685 -16.57 0.000 -.0179405 -.0141443 2001 | 0 (omitted) | _cons | 6.750483 .1101107 61.31 0.000 6.534669 6.966297 ----------------------------------------------------------------------------------- F test of absorbed indicators: F(215443, 299961) = 6.245 Prob > F = 0.000 . . xtreg ln_income ln_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno, fe note: uomo omitted because of collinearity. note: 2001.anno omitted because of collinearity. Fixed-effects (within) regression Number of obs = 515,414 Group variable: id_soggetto Number of groups = 215,444 R-squared: Obs per group: Within = 0.0309 min = 1 Between = 0.2254 avg = 2.4 Overall = 0.2031 max = 3 F(9,299961) = 1061.12 corr(u_i, Xb) = 0.1398 Prob > F = 0.0000 ----------------------------------------------------------------------------------- ln_income | Coefficient Std. err. t P>|t| [95% conf. 
interval] ------------------+---------------------------------------------------------------- ln_dipendenti | .0295582 .0009916 29.81 0.000 .0276147 .0315018 eta | .2341365 .0093305 25.09 0.000 .215849 .2524241 | c.eta#c.eta | -.0046856 .0002542 -18.43 0.000 -.0051839 -.0041873 | c.eta#c.eta#c.eta | .0000297 2.21e-06 13.46 0.000 .0000254 .000034 | tempo_d | -.1812044 .0031272 -57.94 0.000 -.1873337 -.1750752 occ_manuale | -.1255315 .0059201 -21.20 0.000 -.1371348 -.1139282 uomo | 0 (omitted) | settore | Servizi | -.1077109 .0044205 -24.37 0.000 -.116375 -.0990467 Altri settori | -.0632246 .0084458 -7.49 0.000 -.0797781 -.0466711 | anno | 2000 | -.0160424 .0009685 -16.57 0.000 -.0179405 -.0141443 2001 | 0 (omitted) | _cons | 6.750483 .1101107 61.31 0.000 6.534669 6.966297 ------------------+---------------------------------------------------------------- sigma_u | .62375506 sigma_e | .31332145 rho | .7985178 (fraction of variance due to u_i) ----------------------------------------------------------------------------------- F test that all u_i=0: F(215443, 299961) = 6.24 Prob > F = 0.0000 . . * variables that do not change for an individual over time (e.g., the variable "man") cannot be included in the regression because they a > re collinear with individual fixed effects . . * the 2001 variable is collinear because it is linearly dependent on age and individual fixed effects: . * the reason is a bit more subtle: . /* > including individual fixed effects is like estimating a model of this type > > EY-Y = B_0 + B_1 (EX-X) + error > > where EY and EX are the average of each individual's Y and X across all years. 
> In the case of X = age, assuming everyone is in the sample for 3 years, we will have that for all individuals
>
> Eage - age = 1 in the first year
> Eage - age = 0 in the second year
> Eage - age = -1 in the third year
>
> In the second year i.2000 captures the conditional difference in Y between 1999 and 2000 (when age changes by 1 unit)
> In the third year i.2001 captures the conditional difference in Y between 1999 and 2001 (when age changes by 2 units)
> The conditional difference between 2001 and 2000 is always mechanically equal to i.2001-i.2000
>
> As can be seen, there are not enough comparisons to estimate b_eta, b_i.2000 and b_i.2001 all at once
>
> Omitting b_i.2000 we will have that
> b_eta: conditional difference between 1999 and 2000
> b_i.2001: conditional difference between 2001 and 1999-2000
>
> */
.
.
. * 4. AKM REGRESSION MODEL AND ITS VARIANCE DECOMPOSITION
.
. * this is a time-consuming estimation method. For this reason, we reduce the sample size by keeping a random 10% of workers
. gen step=runiform()
. bys id_soggetto: keep if step[1]<.1
(463,764 observations deleted)
. drop step
.
. /*
> Card et al (2014) analyse the evolution of German wage inequality using an AKM regression model.
>
> This is a fixed effects regression with individual and firm fixed effects, originally proposed by Abowd, Kramarz and Margolis (1999), hence the name AKM
>
> A firm fixed effect is a potentially causal estimate of the wage return to working at a given firm, conditional on the quality composition of the firm's workforce
>
> */
.
. * reghdfe is a user-written command to estimate high-dimensional fixed effects models
. * it has to be installed the first time you use it with the following command (you need an internet connection!)
.
. *ssc install reghdfe, replace
.
. * let's estimate the AKM regression model, saving the estimated individual and firm fixed effects into two new variables called fe_ind and fe_firm
. reghdfe ln_income ln_dipendenti c.eta##c.eta##c.eta tempo_d occ_manuale uomo i.settore i.anno, absorb(fe_ind=id_soggetto fe_firm=id_azienda) resid
(dropped 8046 singleton observations)
(MWFE estimator converged in 310 iterations)
note: uomo is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2bn.settore is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 3bn.settore is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2001.anno omitted because of collinearity

HDFE Linear regression                            Number of obs   =     43,604
Absorbing 2 HDFE groups                           F(   7,  26250) =      90.01
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8587
                                                  Adj R-squared   =     0.7653
                                                  Within R-sq.    =     0.0234
                                                  Root MSE        =     0.2792

-----------------------------------------------------------------------------------
        ln_income | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
    ln_dipendenti |   .1090076   .0099405    10.97   0.000     .0895236    .1284916
              eta |   .1461114   .0295136     4.95   0.000     .0882631    .2039598
                  |
      c.eta#c.eta |  -.0026077    .000792    -3.29   0.001    -.0041602   -.0010552
                  |
c.eta#c.eta#c.eta |    .000013   6.79e-06     1.91   0.056    -3.22e-07    .0000263
                  |
          tempo_d |  -.1679257   .0121522   -13.82   0.000    -.1917447   -.1441067
      occ_manuale |  -.0457004   .0289056    -1.58   0.114    -.1023569    .0109562
             uomo |          0  (omitted)
                  |
          settore |
         Servizi  |          0  (omitted)
   Altri settori  |          0  (omitted)
                  |
             anno |
            2000  |  -.0174663   .0028666    -6.09   0.000    -.0230849   -.0118477
            2001  |          0  (omitted)
                  |
            _cons |   7.658825    .355674    21.53   0.000     6.961685    8.355965
-----------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
 id_soggetto |     16061           0       16061     |
  id_azienda |      9545        8259        1286     |
-----------------------------------------------------+

.
. * reghdfe does not include singletons in the estimation sample (observations that are the only one within a fixed-effect group, e.g. a worker observed in a single year)
. * the reason is that they inflate the number of observations without contributing to the estimation of the parameters of interest in the regression; including them could therefore lead to an under-estimation of the standard errors
.
. * we keep in the analysis only observations that were included in the estimation sample by the command reghdfe
. keep if e(sample)
(8,046 observations deleted)
.
. * reghdfe has created new variables equal to the estimated firm and worker fixed effects and the estimated residual
. de

Contains data from panel_rl.dta
 Observations:        43,604
    Variables:            25                  13 Oct 2022 11:56
-------------------------------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------------------------------
id_soggetto     float   %9.0g                 Codice identificativo lavoratore
id_azienda      float   %9.0g                 Codice identificativo azienda
anno            int     %9.0g
retrib03        float   %9.0g                 retribuzione annuale riportata ad euro del 2003
uomo            byte    %8.0g
tempo_d         byte    %9.0g                 contratto a tempo determinato
occ_manuale     byte    %8.0g                 occupazione manuale
n_dipendenti    float   %9.0g                 Numero dipendenti azienda
settore         float   %13.0g     sect       Settore di attività azienda
anno_nascita    float   %9.0g
eta             float   %9.0g
eta2            float   %9.0g
eta3            float   %9.0g
y_hat           float   %9.0g                 Fitted values
lineare         float   %9.0g
ln_dipendenti   float   %9.0g
lin_log         float   %9.0g
ln_income       float   %9.0g
log_lin         float   %9.0g
log_log         float   %9.0g
int_uomo_manu~e float   %9.0g
int_uomo_dipe~i float   %9.0g
_reghdfe_resid  double  %10.0g                Residuals
fe_ind          double  %10.0g                [FE] 1.id_soggetto
fe_firm         double  %10.0g                [FE] 1.id_azienda
-------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: id_soggetto  anno
     Note: Dataset has changed since last saved.

. rename _reg residuals
.
. * we predict the wage using the estimated coefficients associated to time-varying controls
. predict xb
(option xb assumed; fitted values)
.
. * double check that the sum of all wage components is equal to actual wages
. gen y=xb+fe_ind+fe_firm+residuals
.
. su y ln_income

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
           y |     43,604    10.43929    .5764528   7.164721   12.79185
   ln_income |     43,604    10.43929    .5764532   7.164721   12.79185

.
. * we can estimate the AKM variance decomposition
. /*
> given the following regression model
>
> Y = XB + FE_ind + FE_firm + RES
>
> assuming E(RES|XB,FE_ind,FE_firm) = E(RES)
>
> we have that
>
> Var(Y) = Var(XB) + Var(FE_ind) + Var(FE_firm) + 2*Cov(XB,FE_ind) + 2*Cov(XB,FE_firm) + 2*Cov(FE_ind,FE_firm) + Var(RES)
> */
.
. corr xb fe_firm, cov
(obs=43,604)

             |       xb  fe_firm
-------------+------------------
          xb |  .069684
     fe_firm | -.000278  .170261

.
. gen var_xb=`r(Var_1)'
. gen var_firm=`r(Var_2)'
. gen cov_x_firm=2*`r(cov_12)'
.
. corr fe_ind fe_firm, cov
(obs=43,604)

             |   fe_ind  fe_firm
-------------+------------------
      fe_ind |  .365772
     fe_firm | -.143951  .170261

.
. gen var_ind=`r(Var_1)'
. gen cov_ind_firm=2*`r(cov_12)'
.
. corr fe_ind xb, cov
(obs=43,604)

             |   fe_ind       xb
-------------+------------------
      fe_ind |  .365772
          xb | -.015953  .069684

. gen cov_x_ind=2*`r(cov_12)'
.
. corr resid , cov
(obs=43,604)

             | residu~s
-------------+---------
   residuals |  .046945

.
. gen var_res=`r(Var_1)'
.
. gen total_var=var_xb + var_firm + var_ind + cov_x_firm +cov_x_ind + cov_ind_firm + var_res
.
. * AKM DECOMPOSITION OF THE TOTAL INCOME VARIANCE
. su total_var var_xb var_firm var_ind cov_x_firm cov_x_ind cov_ind_firm var_res

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   total_var |     43,604    .3322979           0   .3322979   .3322979
      var_xb |     43,604    .0696838           0   .0696838   .0696838
    var_firm |     43,604    .1702614           0   .1702614   .1702614
     var_ind |     43,604    .3657721           0   .3657721   .3657721
  cov_x_firm |     43,604   -.0005563           0  -.0005563  -.0005563
-------------+---------------------------------------------------------
   cov_x_ind |     43,604   -.0319058           0  -.0319058  -.0319058
cov_ind_firm |     43,604   -.2879024           0  -.2879024  -.2879024
     var_res |     43,604    .0469451           0   .0469451   .0469451

.
. * is the total variance actually equal to the variance of log income? (double check)
. corr ln_income, cov
(obs=43,604)

             | ln_inc~e
-------------+---------
   ln_income |  .332298

.
.
. log close
      name:
       log:  /Users/bernardofanfani/Desktop/teaching/research_topics_labor/lab_3/third_lab/lecture3.log
  log type:  text
 closed on:  25 Oct 2024, 19:30:54
-------------------------------------------------------------------------------------------------------------------------------------------
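The log above stores each variance and covariance as a constant variable, built from the macros that `correlate` leaves behind. The sketch below shows one possible alternative, which is not what the lab does: it reads the whole covariance matrix from `r(C)` in a single call and keeps the results in scalars. It runs on simulated data, so every variable name and number in it is hypothetical and unrelated to panel_rl.dta. Because the simulated residual is not the output of a regression, its covariances with the other components are not exactly zero. The check therefore sums every element of the matrix, whereas the decomposition in the log can leave those terms out.

```stata
* Sketch on simulated data; every variable name below is hypothetical
clear
set seed 12345
set obs 1000

* four simulated wage components, deliberately correlated with each other
gen double xb      = rnormal(0, 0.25)
gen double fe_ind  = rnormal(0, 0.60) - 0.10*xb
gen double fe_firm = rnormal(0, 0.40) - 0.30*fe_ind
gen double res     = rnormal(0, 0.20)
gen double y       = xb + fe_ind + fe_firm + res

* covariance matrix of all components in a single call
quietly correlate xb fe_ind fe_firm res, covariance
matrix C = r(C)

* summing every element of C gives the variances plus twice each covariance
matrix ones = J(4, 1, 1)
matrix S = ones' * C * ones
scalar total_var = S[1,1]

* check the identity against the sample variance of y
quietly summarize y
assert reldif(total_var, r(Var)) < 1e-6

* report some components as shares of the total variance
display "share of Var(xb):            " %6.3f C[1,1]/total_var
display "share of Var(fe_ind):        " %6.3f C[2,2]/total_var
display "share of Var(fe_firm):       " %6.3f C[3,3]/total_var
display "share of Var(res):           " %6.3f C[4,4]/total_var
display "share of 2*Cov(ind, firm):   " %6.3f 2*C[3,2]/total_var
```

Scalars avoid creating one constant column per component in a dataset with tens of thousands of rows. Dividing by the total also makes it easier to read a negative worker-firm covariance, like the one found in the log, as a share of the overall variance.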