Sample Size and Statistical Significance of Hazard Regression Parameters
In this paper, we use microsimulation to explore the relation between the sample size of female respondents aged 18 to 44 and the statistical significance of parameter estimates in four piecewise constant proportional hazard regression models. The underlying models for first marriage, first birth, second birth, and first divorce are estimated from Hungarian Generations and Gender Survey (GGS) data and are treated as typical event-history models for the analysis of GGS data in general. The models are estimated from the full biographies as well as from three- and six-year inter-panel biographies of the simulated samples. The simulation results indicate that the number of parameters reaching statistical significance is highly sensitive to sample size precisely in the range of sample sizes used in the GGS. This means that any reduction or increase in the sample size will notably affect the statistical analysis of the data. Marginal gains in the number of significant parameters are especially high up to 3,000 respondents when rather modest thresholds of significance are applied. For more demanding thresholds, marginal gains remain steep for sample sizes up to 5,000 respondents. When inter-panel histories are analysed, especially for a single three-year interval, the likelihood that parameter estimates are significant is low. Six-year inter-panel histories yield better results, provided the sample comprises at least 3,000 respondents. Below 3,000 respondents, the number of significant results for inter-panel histories deteriorates rapidly.
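The simulation logic summarised above can be illustrated with a minimal sketch. It is not the model or the code used in the paper: the interval cut points, baseline hazards, the single binary covariate, its effect size, the 15-year observation window, and the 5% Wald test are all hypothetical values chosen for illustration. The sketch draws event times from a piecewise constant proportional hazard model, re-estimates the model on each simulated sample (via the equivalent Poisson likelihood on events and exposure), and reports the share of replications in which the covariate effect is significant. It covers full biographies only and does not reproduce the inter-panel windows.

```python
import numpy as np

# All values below are hypothetical, chosen only for illustration.
CUTS = np.array([0.0, 5.0, 10.0])    # start of each duration interval (years)
BASE = np.array([0.02, 0.08, 0.05])  # baseline hazard within each interval
BETA = 0.3                           # log hazard ratio of a binary covariate
CENSOR = 15.0                        # end of observation (right censoring)
WIDTHS = np.diff(np.append(CUTS, CENSOR))

def simulate(n, rng):
    """Draw covariate, duration and event indicator for n respondents."""
    x = rng.integers(0, 2, size=n)
    # Inverse-transform sampling: the event occurs when the cumulative
    # baseline hazard reaches an Exp(1) draw divided by exp(BETA * x).
    target = rng.exponential(size=n) / np.exp(BETA * x)
    cum = np.concatenate([[0.0], np.cumsum(BASE * WIDTHS)])
    event = target < cum[-1]
    k = np.minimum(np.searchsorted(cum, target, side="right") - 1,
                   len(BASE) - 1)
    t = np.where(event, CUTS[k] + (target - cum[k]) / BASE[k], CENSOR)
    return x, t, event

def fit(x, t, event):
    """Maximum likelihood fit; returns (beta_hat, standard_error)."""
    rows, d, e = [], [], []
    for g in (0, 1):
        m = x == g
        for j in range(len(BASE)):
            upper = CUTS[j] + WIDTHS[j]
            # Exposure time and event count of group g in interval j.
            e.append(np.clip(t[m] - CUTS[j], 0.0, WIDTHS[j]).sum())
            d.append(np.sum(event[m] & (t[m] > CUTS[j]) & (t[m] <= upper)))
            rows.append(np.append(np.eye(len(BASE))[j], g))
    X, d, e = np.array(rows), np.array(d, float), np.array(e)
    # Start from the pooled log rate and no covariate effect.
    b = np.append(np.full(len(BASE), np.log(d.sum() / e.sum())), 0.0)
    for _ in range(25):  # Newton-Raphson on the Poisson log-likelihood
        mu = e * np.exp(X @ b)
        info = X.T @ (mu[:, None] * X)
        b = b + np.linalg.solve(info, X.T @ (d - mu))
    mu = e * np.exp(X @ b)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))
    return b[-1], np.sqrt(cov[-1, -1])

def share_significant(n, reps=200, seed=0):
    """Share of replications with |z| > 1.96 for the covariate effect."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        beta_hat, se = fit(*simulate(n, rng))
        hits += int(abs(beta_hat / se) > 1.96)
    return hits / reps

if __name__ == "__main__":
    for n in (300, 1000, 3000, 5000):
        print(n, share_significant(n))
```

Running the script prints, for each sample size, the fraction of simulated samples in which the hypothetical effect passes the 5% threshold. With several covariates of differing effect sizes, the same loop would give the expected number of significant parameters as a function of sample size.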