----------------------------------------------------------------------------------------------
       log:  D:\WERS 2004 Information & Advice Service\Guide to Analysis\Complex samples estimation in Stata - sample output.log
  log type:  text
 opened on:  26 Jul 2006, 12:22:32

.
. * COMPLEX SAMPLES ESTIMATION IN STATA 9
. * Sample output
.
. * Produced by WERS 2004 Information and Advice Service (www.wers2004.info)
. * 24/7/06
.
. version 9.2
.
. * Read in Cross-Section MQ data and generate additional sample design variables (using syntax from FAQ 5.5)
.
. use "d:\WERS 2004 Information & Advice Service\Deposited Stata data\xs04_mq.dta", clear

. do "D:\WERS 2004 Information & Advice Service\Guide to Analysis\WERS 2004 Cross-Section - wpstr04.do"

. * First read in WERS 2004 Cross-Section MQ data (XS04_MQ.DTA)
.
.
. **********************
.
. * Create WPSTR04
. **********************
.
.
. * Identifies the stratum of the sampling matrix from which the workplace was sampled.
.
. * Values can be mapped to Table 2.1 in the WERS 2004 Technical Report (numbers ascend across the size categories ow
. * before moving to the next industry sector, as follows:
.
. * wpstr04==1 ---> IDBR employment=5-9 emps & IDBR industry=SIC(2003) Section D
. * wpstr04==2 ---> IDBR employment=10-24 emps & IDBR industry=SIC(2003) Section D
.
. * Strata in which there is only one observation are collapsed with an adjacent stratum, as Stata does
. * not allow one PSU per strata. This is done in such a way that the strata variable can be
. * used with the full sample for each of the WERS Cross-Section datasets (MQ, ERQ, SEQ, FPQ)
.
.
. gen wpstr04=cell_no if cell_no<=17
(1945 missing values generated)

. replace wpstr04=(cell_no+1) if cell_no>=18
(1945 real changes made)

. recode wpstr04 (11/12=13) (17=16) (19/21=22) (25/26=27) (36=35) (37=38) (40=41) (44=43) (59=60) (64=65) (82=83) (107=108)
(wpstr04: 154 changes made)

.
.
.
end of do-file

.
. * Specify svyset command
.
. svyset [pweight=estwtnr], strata(wpstr04) vce(linearized)

      pweight: estwtnr
          VCE: linearized
     Strata 1: wpstr04
         SU 1:
        FPC 1:

.
. * Run a svy-based logistic regression of eanyemp, using workplace size and % female as covariates
.
. recode eanyemp 2=0
(eanyemp: 969 changes made)

. egen nempsiz1=cut(zallemps), at(5,10,25,50,100,200,500,99999) icodes

. tabulate nempsiz1, gen(size)

   nempsiz1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        233       10.15       10.15
          1 |        414       18.04       28.19
          2 |        334       14.55       42.75
          3 |        308       13.42       56.17
          4 |        287       12.51       68.67
          5 |        303       13.20       81.87
          6 |        416       18.13      100.00
------------+-----------------------------------
      Total |      2,295      100.00

. gen nfemprop=((zfemfull+zfemprt)/zallemps) if zfemfull>=0 & zfemprt>=0
(10 missing values generated)

.
. svy: logit eanyemp size2-size7 nfemprop
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        89                 Number of obs      =       2285
Number of PSUs     =      2285                 Population size    =  99.975637
                                               Design df          =       2196
                                               F(   7,   2190)    =      44.00
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
     eanyemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       size2 |   .2931972   .2055803     1.43   0.154    -.1099549    .6963494
       size3 |   1.266746   .2112842     6.00   0.000     .8524082    1.681084
       size4 |   1.321859   .2157311     6.13   0.000     .8988003    1.744917
       size5 |   2.319219   .2470596     9.39   0.000     1.834724    2.803714
       size6 |   2.851428   .2588701    11.01   0.000     2.343772    3.359084
       size7 |   2.862374   .2652304    10.79   0.000     2.342245    3.382502
    nfemprop |   1.078711   .2704255     3.99   0.000     .5483949    1.609028
       _cons |  -2.005686   .2641975    -7.59   0.000    -2.523789   -1.487583
------------------------------------------------------------------------------

.
. *** Sensitivity analysis ***
.
. * Show effect of not specifying strata
. * Impact on standard errors seen at 2nd decimal place when compared with full svyset command
.
. svyset [pweight=estwtnr], vce(linearized)

      pweight: estwtnr
          VCE: linearized
     Strata 1:
         SU 1:
        FPC 1:

. svy: logit eanyemp size2-size7 nfemprop
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =         1                 Number of obs      =       2285
Number of PSUs     =      2285                 Population size    =  99.975637
                                               Design df          =       2284
                                               F(   7,   2278)    =      38.21
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
     eanyemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       size2 |   .2931972   .2100931     1.40   0.163     -.118796    .7051905
       size3 |   1.266746   .2191324     5.78   0.000     .8370265    1.696465
       size4 |   1.321859   .2247774     5.88   0.000     .8810693    1.762648
       size5 |   2.319219   .2530374     9.17   0.000     1.823011    2.815426
       size6 |   2.851428   .2655604    10.74   0.000     2.330663    3.372193
       size7 |   2.862374   .2723734    10.51   0.000     2.328249    3.396499
    nfemprop |   1.078711   .2745854     3.93   0.000     .5402484    1.617174
       _cons |  -2.005686   .2664506    -7.53   0.000    -2.528196   -1.483175
------------------------------------------------------------------------------

.
. * Show additional effect of specifying iweights rather than using svy commands
. * Standard errors hugely inflated when compared with full svyset command
. * Primarily due to scaling of estwtnr so that sum of weights=100
.
. logit eanyemp size2-size7 nfemprop [iweight=estwtnr]

Iteration 0:   log likelihood = -60.614249
Iteration 1:   log likelihood = -55.349299
Iteration 2:   log likelihood = -55.270235
Iteration 3:   log likelihood =  -55.27014

Logistic regression                               Number of obs   =        100
                                                  LR chi2(7)      =      10.69
                                                  Prob > chi2     =     0.1528
Log likelihood =  -55.27014                       Pseudo R2       =     0.0882

------------------------------------------------------------------------------
     eanyemp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       size2 |   .2931972   .5591048     0.52   0.600    -.8026281    1.389023
       size3 |   1.266746   .6858989     1.85   0.065    -.0775912    2.611083
       size4 |   1.321859   .9020533     1.47   0.143    -.4461335    3.089851
       size5 |   2.319219    1.33863     1.73   0.083     -.304448    4.942885
       size6 |   2.851428    1.99039     1.43   0.152    -1.049665    6.752521
       size7 |   2.862374   3.424856     0.84   0.403     -3.85022    9.574968
    nfemprop |   1.078711   .7514037     1.44   0.151    -.3940128    2.551435
       _cons |  -2.005686   .6062874    -3.31   0.001    -3.193987   -.8173843
------------------------------------------------------------------------------

.
. * Show alternative where weight scaled to sum to 2,295 (total number of observations)
. * Mixture of Type I and Type II errors when compared with full svyset command
.
. gen newwt=(estwtnr*2295/100)

. logit eanyemp size2-size7 nfemprop [iweight=newwt]

Iteration 0:   log likelihood =  -1391.097
Iteration 1:   log likelihood = -1270.2664
Iteration 2:   log likelihood = -1268.4519
Iteration 3:   log likelihood = -1268.4497

Logistic regression                               Number of obs   =       2294
                                                  LR chi2(7)      =     245.29
                                                  Prob > chi2     =     0.0000
Log likelihood = -1268.4497                       Pseudo R2       =     0.0882

------------------------------------------------------------------------------
     eanyemp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       size2 |   .2931973   .1167083     2.51   0.012     .0644531    .5219414
       size3 |   1.266746   .1431755     8.85   0.000     .9861271    1.547365
       size4 |   1.321859   .1882959     7.02   0.000     .9528054    1.690912
       size5 |   2.319219   .2794275     8.30   0.000     1.771551    2.866887
       size6 |   2.851428   .4154769     6.86   0.000     2.037108    3.665748
       size7 |   2.862374   .7149093     4.00   0.000     1.461177     4.26357
    nfemprop |   1.078711   .1568491     6.88   0.000     .7712928     1.38613
       _cons |  -2.005686   .1265573   -15.85   0.000    -2.253734   -1.757638
------------------------------------------------------------------------------

.
. * Show unweighted model
. * Coefficients now biased
.
. logit eanyemp size2-size7 nfemprop

Iteration 0:   log likelihood = -1557.0844
Iteration 1:   log likelihood = -1272.8913
Iteration 2:   log likelihood = -1265.9982
Iteration 3:   log likelihood = -1265.9261
Iteration 4:   log likelihood = -1265.9261

Logistic regression                               Number of obs   =       2285
                                                  LR chi2(7)      =     582.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -1265.9261                       Pseudo R2       =     0.1870

------------------------------------------------------------------------------
     eanyemp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       size2 |   .5384288   .1945385     2.77   0.006     .1571403    .9197173
       size3 |   1.383319    .196885     7.03   0.000     .9974313    1.769206
       size4 |   1.524964   .2003352     7.61   0.000     1.132315    1.917614
       size5 |   2.444518   .2126712    11.49   0.000      2.02769    2.861345
       size6 |   3.198244   .2302925    13.89   0.000     2.746879    3.649609
       size7 |   3.332105   .2207426    15.09   0.000     2.899457    3.764752
    nfemprop |   1.014011   .1680824     6.03   0.000     .6845752    1.343446
       _cons |  -1.942909   .1946019    -9.98   0.000    -2.324322   -1.561496
------------------------------------------------------------------------------

.
. log close
       log:  D:\WERS 2004 Information & Advice Service\Guide to Analysis\Complex samples estimation in Stata - sample output.log
  log type:  text
 closed on:  26 Jul 2006, 12:22:36
--------------------------------------------------------------------------------------------
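
The fragment below is a hypothetical follow-up, not part of the original log. It is a minimal sketch that re-derives three of the figures reported above using only numbers copied from the output. It assumes that the design degrees of freedom equal the number of PSUs minus the number of strata, that the denominator df of the survey F test equals the design df minus the number of covariates plus one, and that multiplying an iweight by a constant c divides the logit standard errors by sqrt(c).

```stata
* Hypothetical check, not part of the original session.
* Every input below is copied from the log output above.

* Assumed rule: design df = number of PSUs - number of strata
display 2285 - 89
display 2285 - 1

* Assumed rule: denominator df of F = design df - k + 1, with k = 7 covariates
display 2196 - 7 + 1
display 2284 - 7 + 1

* Assumed rule: scaling iweights by c = 2295/100 divides the SEs by sqrt(c)
display sqrt(2295/100)

* Ratio of the two reported size2 standard errors (sum of weights 100 vs 2,295)
display .5591048/.1167083
```

If these assumptions hold, the first four lines reproduce the 2196, 2284, 2190 and 2278 reported in the two svy: logit headers, and the last two values both come to roughly 4.79. That is consistent with the log's comment that the inflated standard errors in the first iweight model are primarily due to the scaling of estwtnr.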