By William Greene, Department of Economics, Stern School of Business, New York University, USA, wgreene@stern.nyu.edu
This study presents several extensions of the most familiar models for count data, the Poisson and negative binomial models. We develop an encompassing model for two well-known variants of the negative binomial model (the NB1 and NB2 forms). We then analyze some alternative approaches to the standard log gamma model for introducing heterogeneity into the loglinear conditional means for these models. The lognormal model provides a versatile alternative specification that is more flexible (and more natural) than the log gamma form, and provides a platform for several "two part" extensions, including zero inflation, hurdle, and sample selection models. (We briefly present some alternative approaches to modeling heterogeneity.) We also resolve some features of the widely used panel data treatments of Hausman, Hall and Griliches (1984, Economic models for count data with an application to the patents–R&D relationship, Econometrica 52, 909–938) for the Poisson and negative binomial models that appear to conflict with more familiar models of fixed and random effects. Finally, we consider a bivariate Poisson model that is also based on the lognormal heterogeneity model. Two recent applications have used this model. We suggest that the correlation estimated in their model frameworks is an ambiguous measure of the correlation of the variables of interest, and may substantially overstate it. We conclude with a detailed application of the proposed methods using the data employed in one of the two aforementioned bivariate Poisson studies.
Functional Form and Heterogeneity in Models for Count Data surveys practical extensions of the Poisson and negative binomial (NB) models that practitioners can employ to refine their specifications or to extend them to new situations. The author also resolves some inconsistencies between the panel data models for counts and more familiar results for the linear regression model.
Functional Form and Heterogeneity in Models for Count Data focuses on two large issues: first, the accommodation of overdispersion and heterogeneity in the basic count framework; and second, the functional form of the conditional mean and the extension of models of heterogeneity to models for panel data and sources of correlation across outcomes. The first is more straightforward since, in principle, overdispersion and heterogeneity are elements of the conditional variance of the distribution of counts that can be analyzed apart from the conditional mean. Robust inference methods for basic models can be relied upon to preserve the validity of estimation and inference procedures. The second issue motivates the development of more intricate models such as the two part, panel and bivariate models presented in the text.
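
The distinction between the conditional mean and the conditional variance can be made concrete with the standard variance functions for these models. The following is a sketch in conventional notation rather than a statement of the monograph's own derivation: the symbols $\lambda_i$, $\alpha$ and $P$ are assumed here for illustration, and the last line shows one common way to nest NB1 and NB2 in a single form.

```latex
% Assumed notation: lambda_i is the loglinear conditional mean,
% alpha >= 0 is the dispersion parameter, P is the nesting exponent.
\begin{align*}
E[y_i \mid \mathbf{x}_i] &= \lambda_i = \exp(\mathbf{x}_i'\boldsymbol{\beta})
  && \text{(same conditional mean in every case)} \\
\mathrm{Var}[y_i \mid \mathbf{x}_i] &= \lambda_i
  && \text{(Poisson)} \\
\mathrm{Var}[y_i \mid \mathbf{x}_i] &= \lambda_i\,(1 + \alpha)
  && \text{(NB1)} \\
\mathrm{Var}[y_i \mid \mathbf{x}_i] &= \lambda_i\,(1 + \alpha\lambda_i)
  && \text{(NB2)} \\
\mathrm{Var}[y_i \mid \mathbf{x}_i] &= \lambda_i\,(1 + \alpha\lambda_i^{P-1})
  && \text{(nesting form: } P=1 \text{ gives NB1, } P=2 \text{ gives NB2)}
\end{align*}
```

In each case the conditional mean is unchanged and only the variance function differs, which is why overdispersion can, in principle, be treated separately from the specification of the mean.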