Bahadur-type representation
, 217
Balanced repeated replication (BRR)
, 96
Bandwidths
, 194–196
selection
, 186–188
Bank of Canada
, 36, 47, 98, 104
Bernoulli distributions
, 99
Bernoulli log likelihood
, 128
Bernoulli sampling. See Variable probability sampling (VP sampling)
“Biased sampling” problems
, 147
Black market liquor in Colombia
, 301–302
adulterated sales
, 302
average marginal effects for contraband and adulterated liquor–levels
, 307
average marginal effects for MICE
, 309
coefficients for specifications in salary imputation model
, 308
contraband goods
, 288–289
data
, 290–292
multiple imputation
, 297–301, 310–313
percentage of adulterated purchases
, 304
percentage of contraband purchases
, 303
percentage of illegal purchases
, 289
residual analysis
, 313–314
results
, 292–296
Bootstrap
, 100
conventional
, 211
sample
, 211
variance estimation via
, 96–98
wild bootstrap
, 61–84, 211, 219
Business
heterogeneity
, 240
surveys
, 239–240
Calibration
equations
, 92
procedure
, 88
Canadian Internet Usage Survey (CIUS)
, 98
Card acceptance
, 36, 39, 49
estimates
, 42–43
and usage
, 41
Card payments, regression models of
, 43–46
Card-acceptance model
, 45
Cash on hand variable
, 89, 100–101
Checkable deposits (CHKD)
, 241
Choice-based sampling
, 162n6
Classical nearest neighbor imputation
, 213
Classical raking estimator
, 90–92
Cluster-means regression (CMR)
, 74
Cluster-robust inference
, 63
WCB
, 65–67
Cluster-robust standard error
, 67
Cluster-robust test statistics
, 62
Cluster-robust variance estimator (CRVE)
, 64
Colombian National Administrative Department of Statistics (DANE)
, 290–291
Combined inference approach
, 176–177
Commercial banks (CMB)
, 241
Complementary log–log (cloglog)
, 294
Complex
sampling plan
, 174
survey methods
, 122, 176
Conditional mean
, 127
functions
, 130
Conditional moment restriction models
, 138
conditional moment equalities
, 139–140
endogenous and exogenous stratification
, 143–146
identification
, 142–143
simulation study
, 156–161
VP sampling
, 140–142
Contactless (tap-and-go) credit card usage binary variable
, 89
Conventional bootstrap
, 211
Conventional jackknife variance estimation
, 221
Conventional replication method
, 211
Correlated random effects approach
, 128
County
, 39
data
, 56
effect
, 43
Covariate information
, 215
Criminal Procedure Law
, 263
Cross-validation function
, 187
Cumulative standard normal (CDF)
, 73
Current Population Survey (CPS)
, 6, 110
socioeconomic variables in
, 11–14
Data-driven approach
, 177, 187
Data-generating processes (DGPs)
, 177, 190
in Monte Carlo simulations
, 188
Degree of “desirability”
, 8–9
Demographic variables
, 11–12
Dependent variable
, 51, 55
Depository and financial institutions
, 240
Descriptive inference. See Design-based inference
Design-based approach
, 180
Design-based estimators
, 174, 176
Design-based inference
, 88, 176
Design-consistent estimator
, 176
Difference in objective functions
, 119
Difference-in-differences (DiD)
, 62
Discounts Asked indicator
, 292, 297
Discounts Offered indicator
, 292
Drugs and weapon crimes
, 264
Dynamic regression models
, 110–111
Economic theory
, 174, 260
Efficiency bounds
, 148–152
E–M algorithm-based imputation method
, 244
Empirical application of 2013 MOP
, 98–103
Empirical distribution
, 67
Empirical likelihood approach
, 148, 155
Empirical loglikelihood (EL)
, 169
Empirical researchers
, 210
Endogenous stratification
, 143–146, 150–151, 157, 262
median bandwidths for WLC and LC under
, 194
median MSE for WLC, LC, WLS and OLS under
, 190
Estrato (strata) variable
, 292, 299
Exogenous stratification
, 111, 126–127, 143–146, 150–151, 157
comparing asymptotic variance of LS and GMM estimators under
, 166–169
median bandwidths for WLC and LC under
, 194
median MSE for WLC, LC, WLS and OLS under
, 191
Explanatory variables
, 41
Hausman specification tests
, 273
Health and Retirement Study (HRS)
, 5–7
socioeconomic variables in
, 11–14
survey outcomes in UAS and
, 14–26
Heckman two-step estimator
, 312
Heteroskedasticity-robust standard errors
, 63–64, 74
Higher order orthogonal polynomials
, 45
Horvitz–Thompson estimator
, 218
Horvitz–Thompson weight (H–T weight)
, 90
Hungarian payments system
, 38
Illegal
alcohol
, 290
cigarettes
, 290
goods
, 288
liquor
, 288
Imputation models
, 238, 239, 297, 299, 310–311
Indicator
function
, 179
variables
, 90, 94
Industry
, 39
effect
, 43
industry-based stratas
, 44
Inference
, 138, 146
efficiency bounds
, 148–152
efficient estimation
, 152–155
modes
, 176
related literature and contribution
, 146–148
testing
, 155–156
see also Conditional moment restriction models
Informative sampling
, 175
Internet match high-quality traditional surveys
health insurance coverage
, 30
home ownership
, 30
HRS
, 6–7
individual earnings
, 31
methods and outline
, 5–6
predicted mean health status by age and survey mode
, 32
predicted probability
, 33
satisfied with life
, 32
self-reported health
, 30–31
socioeconomic variables in HRS, UAS and CPS
, 11–14
survey outcomes in HRS and UAS
, 14–26
UAS
, 7–11
whether retired
, 31
Interview
face-to-face
, 4, 18, 22
mode
, 4–5
online
, 14
telephone
, 4
Inverse probability weighted estimator (IPW estimator)
, 162n12
Item response rate
, 244–245
Iterative E–M algorithm approach
, 241
Iterative estimators of regression coefficients
, 146
Iterative proportional fitting (IPF)
, 88
Labor market duration (LMD)
, 178, 196
Lagrange multipliers
, 154
Least absolute deviations (LAD)
, 120
Least squares cross-validation (LSCV)
, 177, 189, 204–208
Leave-one-out kernel estimator
, 204
Length biased sampling problem
, 147
Linear regression model
, 63, 139, 144–146
Linearization
of estimator
, 223
methods
, 89
variance estimation via
, 95–96
Local constant estimation (LC estimation)
, 188
median bandwidths
, 194–196
median MSE for
, 190–194
Local constant estimator
, 175–176, 178–180
Local empirical loglikelihood
, 155
Log-likelihood functions
, 110, 270
negative of
, 113–114
Logistic regression analysis
, 37
Logistic regression models
, 39–40, 51
card acceptance
, 51–55
card usage
, 55–57
Logit
, 311
predictive model
, 299
Lognormal distributions
, 129
Lyapunov DoubleArray Central Limit Theorem
, 202–203
Mahalanobis distance
, 213
Matching
discrepancy
, 215
estimators
, 210
scalar variable
, 221
variable
, 215
Maximum likelihood raking estimator
, 92–93
Median absolute deviations (MAD)
, 190
Misclassification
, 260–261
Missing at random (MAR)
, 213, 297
Missing completely at random (MCAR)
, 297
Missing data process
, 213
Missing not at random (MNAR)
, 297
data
, 312
see also Planned missing data design
Missing-by-design approaches
, 239
Model-assisted estimator
, 176
local constant estimator
, 180
Model-assisted nonparametric regression estimator
, 178
asymptotic properties
, 180–186
local constant estimator
, 178–180
model-assisted local constant estimator
, 180
Model-based inference
, 176, 180
Model-selection statistic
, 122
Model-selection tests
for complex survey samples
, 110
examples
, 128–129
exogenous stratification
, 126–127
nonnested competing models and null hypothesis
, 111–114
with panel data
, 124–126
rejection frequencies
, 131–132
small simulation study
, 129–133
statistics under multistage sampling
, 122–124
under stratified sampling
, 114–122
Modern imputation methods
, 239
Modified kernel density estimator
, 204
Modified plug-in method
, 186
Money market deposits (MMDA)
, 241
Monte Carlo simulations
, 188, 212
bandwidths
, 194–196
DGPs in
, 188, 190
median bandwidths for WLC and LC
, 194–196
median MSE for WLC, LC, WLS and OLS
, 190–193
sample mean squared error
, 189–194
strata borders
, 189
Multiple imputation
, 242, 297–301
See also Nearest neighbor imputation
Multiple Imputation by Chained Equations (MICE)
, 310
average marginal effects for
, 309
binary variables
, 311
imputation models
, 310–311
MNAR data
, 312
ordered variables
, 311–312
Rubin’s rules
, 312–313
Multiple-matrix sampling
, 239
Multiplicative nonresponse models
, 99
Multistage sampling, tests statistics under
, 122–124
Multivariate extensions
, 139
Multivariate normal distribution
, 310
National statistical institutes (NSIs)
, 89
National Survey of Families and Households (NSFH)
, 110, 122
Nearest neighbor imputation
, 210
basic setup
, 212–214
Nearest neighbor imputation estimator
, 210–213, 217–218, 222, 230
proofs of theorems
, 228–234
replication variance estimation
, 218–220
results
, 214–217
simulation study
, 221–224
Nearest neighbor ratio imputation
, 225
New York City Police Department (NYPD)
, 277n1
Non-negligible sampling fraction
, 224–225
Nonlinear least squares estimators
, 129
Nonlinear regression models
, 139
Nonnested competing models
, 111–114
Nonparametric kernel regression
, 174
application
, 196–197
bandwidth selection
, 186–188
model-assisted nonparametric regression estimator
, 178–186
Monte Carlo simulations
, 188–196
proofs
, 199–209
Nonparametric maximum likelihood estimators (NPMLE)
, 147
Nonparametric methods for estimating conditional mean functions
, 174
Nonprobability Internet panels
, 8
Nonrandom weight for stratum-cluster pair
, 122
Nonresponse
, 100
bias
, 240
linearization method
, 101–102
Normalized asymptotic distribution
, 95
Novel bootstrap procedure
, 211
Null hypothesis
, 111–114, 117, 126
Pairs cluster bootstrap
, 65
Panel data, model-selection tests with
, 124–126
Panel Survey of Income Dynamics
, 110
Payment card acceptance and usage
data description
, 38–39
estimates of card acceptance
, 42–43
estimating card acceptance
, 41
Hungarian payments landscape
, 38
key variables
, 39–40
logistic regression models of card acceptance and usage
, 51–57
methodology
, 41
models of card acceptance and usage
, 41
regression models of card payments
, 43–46
review of literature
, 49–51
“Period treated” dummy
, 64
Planned missing data design
, 239, 242, 244
full-coverage and partial-coverage items
, 248
original 2016 sample counts
, 251
problem of dividing survey form
, 248–249
sampling scheme
, 247
survey for subsample
, 244–245
survey form
, 245–246, 250
survey stratification, counts and rates
, 246
Plug-in method for bandwidth
, 186–187
Poisson distribution
, 132
Polychotomous choice
, 262
Pooled estimation methods
, 125
Pooled nonlinear least squares
, 125
Pooled objective functions
, 126
Population minimization problem
, 112
Post-stratification
factors
, 15
weights
, 10
Post-stratification weights
, 9
Power series estimators
, 219, 231
Primary sampling units (PSUs)
, 88, 290
Probability
density function
, 124
distributions
, 141
models of illegal sales
, 294
probability-based Internet surveys
, 6
probability-based panels
, 8
Product kernel
for continuous variables
, 179
function for vector of discrete variables
, 179
Propensity score matching
, 223
Pseudo-empirical likelihood objective function
, 93
Pure design-based inference
, 180
Racial discrimination
, 260, 261, 262, 272, 273, 275
Rademacher distribution
, 76
Raking ratio estimator
, 88–89
classical raking estimator
, 90–92
GREG
, 93–94
maximum likelihood raking estimator
, 92–93
RAND American Life Panel
, 8
RAND-HRS indicator of retirement
, 18
Randomization inference (RI)
, 62, 67
problem of interval p values
, 69–71
Randomized planned missing items
, 239
Re-randomizations
, 69
re-randomized test statistics
, 73
“Realized” population
, 138
Recency effects
, 21–22, 24, 27
Regression analysis
, 288–289
Regression models of card payments
, 43
acceptance
, 43–45
usage
, 45–46
Replication variance
estimation
, 211–212, 218–220
estimator
, 218
Residual analysis
, 313–314
Respondents
, 22
feedback
, 242
Response linearization method
, 101–102
Response quality improvement
business surveys
, 239–240
challenge
, 242–244
mean number of item responses
, 256
planned missing data design
, 239, 244–251
response counts
, 252
response rates by stratum
, 254
standard approach
, 240–242
survey design
, 251
2016 survey stratification, counts and rates
, 255
unit response rate
, 253
unit responses
, 238
Rule-of-thumb method
, 162–163n14, 177
Salary
, 298
imputation model
, 308
ratio of missing values
, 298
Sample mean squared error
, 190–194
Sample selection
correction
, 262
procedure
, 241
Savings institutions (SAV)
, 241
Scalar matching variable
, 211, 228
Scalar population parameter
, 96
Secondary sampling units (SSUs)
, 290
Selection correction
, 271
model
, 273
Selective crime reporting
characterization of equilibrium
, 269
coefficient estimation
, 283
empirical strategy
, 269–272
estimates of hit rates test on summons
, 284
existence of equilibrium
, 268
hit rates test
, 262
misclassification
, 260–261
Pedestrians
, 267–268
police officers
, 268
reason for Stop and Pedestrian characteristics
, 285
results
, 272–275
robustness checks for sample correction estimates
, 286
Stop-and-Frisk program
, 263–267
theory
, 267
Self-reported health with CPS limited to HH respondents
, 30
Self-reported mammography
, 277
Self-reported Transaction Price (STP)
, 292, 294–295, 301–302
Sequential importance sampling (SIS)
, 8
Shrinkage empirical-likelihood-based estimator
, 94
Sieves estimation
, 231–233
Simple random sampling (SRS)
, 100–101, 174, 210
median bandwidths for WLC and LC under
, 194
median MSE for WLC, LC, WLS and OLS under
, 191
Simulation study
, 156, 160–161, 221–222
aggregate shares for
, 157
design
, 156–157
inconsistency of LS estimator
, 158, 159
Size
, 292
key variables
, 39
size-based stratas
, 44
Small simulation study
, 129–133
Smoothed empirical likelihood estimator (SEL estimator)
, 152, 153, 154, 159, 162n14, 169
Smoothing parameters
, 186
Social desirability effects
, 22, 24–25
Socioeconomic variables in HRS, UAS and CPS
, 11–14
Split questionnaire survey design
, 239
Standard likelihood ratio testing principle
, 110
Standard ratio estimators
, 241
Standard stratified sampling (SS sampling)
, 111, 114–121
Stata
, 128
implementation under
, 103
Statistical inference
, 138
Stop-and-Frisk program
, 260–262, 263–267
Strategic money counterfeiting model
, 290
Stratification
, 44–45, 122, 241
approach
, 42–43
in terms of individual income
, 121
Stratified random samples
, 41
Stratified sampling
, 114, 153
SS sampling
, 114–121
testing under
, 114–122
VP sampling
, 121–122
Superpopulation
model
, 215
nonparametric regression model
, 175
parameters
, 181
Survey
of banks
, 240
business
, 239–240
cross-validation
, 187
face-to-face
, 4–5
form
, 245
Internet-based
, 5
large-scale
, 174
modes
, 6
nonmandatory
, 239
online
, 4–5
statistics
, 37–38
telephone surveys
, 4–5
web
, 5
See also 2013 methods-of-payment survey (2013 MOP survey); Internet match high-quality traditional surveys
Survey of Labour and Income Dynamics (SLID)
, 196
Survey outcomes in HRS and UAS
, 14
health economics literature
, 20
health insurance coverage
, 17
home ownership
, 16
individual earnings
, 18
mode effects
, 21–26
post-stratification factors
, 15
self-reported health
, 19
whether retired
, 17
“Synthetic controls” method
, 63
Unconditional empirical probabilities
, 170
Unconditional summary statistics
, 261
Underrepresents high-income households
, 14
Understanding America Study (UAS)
, 5–6, 7
sampling and weighting procedures
, 8–11
socioeconomic variables in
, 11–14
survey outcomes in HRS and
, 14–26
Unit response rate
, 244–245, 254
Unordered multinomial logit model
, 270
Unplanned missing items
, 239
Unweighted HRS race/ethnicity distribution
, 13
Unweighted life-satisfaction distribution
, 21
Unweighted objective function
, 119
Unweighted statistic
, 131
Urban Area to ZIP Code Tabulation Area Relationship File of US Census Bureau
, 9