Attachment H: Sample Size and Sample Plan
Sample Strategy
A two-stage sampling approach will be used in the National Survey of U.S. Long-Haul Truck Driver Safety and Health. The first stage will be selection of truck stops where survey administration will occur, and the second stage will be selection of truck drivers at each truck stop to whom the survey will be administered. This is the most efficient and effective way of obtaining a representative sample of drivers for administration. In order to ensure that long-haul truck drivers stopping at truck stops along both heavily-traveled routes and lesser-traveled ones will be represented in the survey sample, a sample strategy has been developed in which a randomly selected sample of drivers stopping at each type of truck stop will be included. Survey administration including interview administration and anthropometric measurements will be done during personal interviews of selected drivers.
An optimal sample design balances survey cost factors with estimation procedures in order to obtain estimators with minimum variance at lowest cost. It is generally most cost efficient to plan to have the interviewers spend a fixed amount of time at each site. This allows for scheduling site visits in a straightforward way. Our sample design includes a fixed number of truck drivers m to be interviewed at each of n sampled truck stops. The n sampled truck stops will be sampled with probability proportional to estimated size (pps), where estimated size is the number of paved parking spaces for trucks at the truck stop.
Sample Size
In the National Survey of U.S. Long-Haul Truck Driver Injury and Health, long-haul truck drivers stopping at selected truck stops will be interviewed. Interviews will be administered at each truck stop for 3 days. Since the number of interviews to be completed at each truck stop during the data collection period is not expected to be the same and will depend on the traffic flow through that truck stop, truck stops are defined as ‘high-flow’ and ‘low-flow’ truck stops. High- or low-flow truck stops are determined by the truck stop’s location on traffic corridors with truck traffic flows of 10,000 or more trucks per day, based on the Freight Analysis Framework Version 2 [Federal Highway Administration 2007]. The Freight Analysis Framework (FAF) integrates data from a variety of sources to estimate commodity flows and related freight transportation activity among states, regions, and major international gateways between 2002 and 2035. Because of the lower truck traffic flow in the low-flow truck stops, we assume that only half as many interviews will be completed in each low-flow truck stop as are completed in a high-flow truck stop. Under these assumptions, the best design is to have roughly 80% of the truck stops as high-flow and 20% low-flow, resulting in 90% of all interviews being done in high-flow truck-stops and 10% of all interviews being done at low-flow truck-stops. Travel costs to low-flow sites are assumed to be twice those for high-flow sites.
The
number of truck drivers m to be interviewed at each of the n
sampled truck stops was determined by minimizing the variance of
estimator
of
characteristic Y under the sample design. Considering that the truck
stops are stratified into high/low-flow strata, and accounting for
both between-site variability and within-site variability for each
type of truck stop, the total variance for an estimator in this two
stage sample would be [Levy and Lemeshow 1980]:
σ2(Y) = σ2h(b) + σ2h(w) + σ2l(b) + σ2l(w) , (1)
σ2(Y)
=
+
(2),
where
σ2(Y) = Variance for characteristic Y
σ2h(b) = Between site variance for high-flow truck stops
σ2l(b) = Between site variance for low-flow truck stops
σ2h(w) = Within site variance for high-flow truck stops
σ2l(w) = Within site variance for low-flow truck stops
M = Total number of truck drivers across all truck stops
Mh = Number of truck drivers at high-flow truck stops
Ml = Number of truck drivers at low-flow truck stops
nh = Number of high-flow truck stops
nl = Number of low-flow truck stops
mh = Number of truck drivers interviewed at high-flow truck stops
ml = Number of truck drivers interviewed at low-flow truck stops
Individual truck stops may be considered to be clusters of truck drivers to which the interview is administered. Formula (2) above does not take into account clustering effects. Such effects tend to decrease variability within clusters, thus increasing sample size as compared to a simple random sample. Clustered samples tend to have less precision than simple random samples of the same size, because units within the same cluster usually are more homogeneous than units from different clusters. Based upon a previous data collection effort at truck stops [Belman 2005], we expect a within-truck stop correlation of responses (ρ) to be less than one percent (0.01). We will make the additional assumptions for sample design optimization purposes that between-site and within-site variances are the same for high-flow and low-flow sites, and that between-site variances are 1/99 times within-site variances (corresponding to a within-site correlation ρ =0 .01). This is summarized as follows:
(3)
(4)
The expected variance of the proportion P of truck drivers with given characteristics of interest was calculated according to equation 2 above, accounting for correlation effects in (3) and (4) and costs per survey. Since this study will estimate prevalence of several conditions in the truck driver population, P may vary over a wide range. Consequently, P = 0.5 was used in sample size calculations since that prevalence requires the greatest sample size.
Those
values of M, Mh, Ml, m, mh, ml,
nh, and nl corresponding to a standard error
(
)equal
to 1.24% were determined. A standard error of 1.24% corresponds to a
95 percent confidence interval for P=0.5 of plus or minus 0.025
(indicating, in this study, a prevalence of a given health condition
equal to 50% with 95% confidence interval plus or minus 2.5%). Given
the assumptions above (i.e., that the flow of truck drivers in the
low-flow sites is about half that of high-flow sites,, that travel
costs to low-flow sites are twice those for high-flow sites, and that
correlation within truck stops will be 0.01), a total n=2,457
interviews will be needed. Interview administration will take place
at 41 high-flow truck stops and 9 low-flow truck stops. Fifty-four
interviews are expected to be administered in each high-flow truck
stop and 27 administrations at each low-flow truck stop, resulting in
a total 2,214 interview administrations at high-flow truck stops and
243 administrations at low-flow truck stops.
Taking into account an expected 20 percent refusal rate for participation in interview administrations [Federal Highway Administration 2002] and estimated 12% driver ineligibility rate [Belman et al 2005], a total of 3,500 truck drivers should take the eligibility screening interview for study participation in order to obtain the 2,457 long-haul truck driver participants needed.
Factors affecting Precision of Estimator
Within-site correlation
In evaluating equation (2) for variance of the estimator, a critical parameter is the within-truck stop correlation coefficient ρ. If the within-site correlation is 0, then estimators from all designs with the same total number of interview administrations will have the same variance. If on the other hand the within-site correlation is high, then designs with the same total number of interviews but taken in a smaller number of sites will show considerably greater variance. The effect of changes in within-site correlation for the design used in this study is shown in Table H1 below. The increase in standard error of the estimator with increasing correlation coefficient ρ is illustrated in Table H1.
Differing Ratios of Drivers Presenting at high- and low-flow truck stops
An
important parameter is the ratio of the total number of drivers
arriving at high-flow truck stops during the survey period (
)
to the total number of drivers arriving at low-flow truck stops
(
).
Table H2 presents standard errors using equation (2) for differing
ratios of Mh and Ml. The design does well for
ratios of
to
of 4 to 1 or greater, and the standard error is not much higher for
somewhat smaller ratios (down to 2 to 1). But if
is significantly larger than anticipated (making the ratio
significantly smaller and less than 2 to 1), then the variance begins
to increase (i.e., the sample size of 9 becomes too small for
low-flow sites if the true traffic in these sites is greater than
anticipated). A small extra set of low-flow sites will be selected in
the National Survey of Truck Driver Injury and Health if it appears
that
is
larger than anticipated.
Additional factors affecting standard error calculations above have to do with
extra
sources of variability: nonresponse adjustments, variability in the
of truck driver counts
and
with time t of the survey period, and
and
adjustments needed for differences in field time and staff across
sites. Table H2 presents standard errors that include an inflation
factor of 50% to account for these extra sources of variability.
Table H1. Expected Standard Error of Estimator for
Different Values of Within-Site Correlation ρ
Number of High-Flow Truck Stops |
Number of Low-Flow Truck Stops |
Number of Interviews at each high-Flow Truck Stop |
Number of Interviews at each Low-Flow Truck Stop |
Within-Site Correlation Coefficient ρ |
Standard Error of Estimator |
Relative Increase in Standard Error |
41 |
9 |
54 |
27 |
.01 |
.0124 |
|
41 |
9 |
54 |
27 |
.03 |
.0162 |
30.5% |
41 |
9 |
54 |
27 |
.05 |
.0194 |
56.1% |
Percent of All Drivers Stopping at High-Flow Truck Stops Mh/(Mh + Ml)
|
Ratio Mh/Ml
|
Standard error with in-site correlation ρ=0.011
|
98% |
49:1 |
1.59% |
95% |
19:1 |
1.55% |
90% |
9:1 |
1.52% |
85% |
5.7:1 |
1.53% |
80% |
4:1 |
1.57% |
75% |
3:1 |
1.64% |
70% |
2.3:1 |
1.75% |
65% |
1.9:1 |
1.87% |
60% |
1.5:1 |
2.02% |
55% |
1.2:1 |
2.18% |
50% |
1:1 |
2.36% |
45% |
.9:1 |
2.54% |
1 Includes 50% inflation factor
Sample Selection
A two stage sample selection procedure will be used. The first stage will involve selection of truck stops at which survey administration will occur. The second stage will involve selection of individual truck drivers to whom interviews will be administered.
Truck Stop Selection
The National Survey of U.S. Long-Haul Truck Driver Injury and Health will be conducted at selected truck stops located throughout the 48 contiguous states. These truck stops will be stratified by volume of traffic (high-flow or low-flow) and size of truck stop (number of paved parking spaces for trucks).
High-flow truck stops will be those stops along routes in the National Highway System (NHS) which had (in 2002) a traffic volume of 10,000 trucks or more per day for more than half of their length estimated from the Freight Analysis Framework. The National Highway System includes the Interstate Highway System as well as other roads important to the nation's economy, defense, and mobility, such as:
Principal arterials which provide access between an arterial and a major port, airport, public transportation facility, or other intermodal transportation facility.
The Strategic Highway Network of highways which provide defense access, continuity and emergency capabilities for defense purposes.
Major Strategic Highway Network Connectors
Intermodal Connectors
National Highway System routes considered for the National Survey of Truck Driver Injury and Health are listed in Table H3 and shown in Figure H1. The high-flow truck stop sampling frame will be restricted to truck stops along these designated major arteries, which will then be divided into state-artery sections. These state-artery sections will be sampled with probability proportional to the length of the state-artery section. The sampling of state-artery sections will be stratified by region (West, Central, Great Lakes, South, Northeast). Table H4 lists geographic regions included in this study.
Low-flow truck stops will include all truck stops which are not located on the high-flow routes listed in Table H3. They will be selected from truck stops in individual states. Individual states will be sampled with probability proportional to their population. Selecting whole states will avoid needing to designate a set of arteries with truck volume less than 10,000 trucks per day. After excluding the high-flow arteries, it is assumed that other truck traffic in the state will be roughly proportional to population. Low-flow truck stops will be chosen from truck stops in selected states which are not on the high-flow routes listed in Table H3.
Table H3. High Flow National Routes
CA State 99 |
Sacramento, CA to I-5 Jct south of Bakersfield, CA
N |
NJ Turnpike |
All |
I-10 |
Santa Monica, CA to I-20 Jct Texas |
I-10 |
Jacksonville, FL to San Antonio, TX |
I-12 |
All |
I-16 |
All |
I-20 |
I-10 Jct TX to Shreveport, LA |
I-20 |
Meridian, MS to Birmingham, AL |
I-24 |
All |
I-25 |
Fort Collins, CO to Pueblo, CT |
I-26 |
All |
I-30 |
All |
I-35 |
Wichita, KS to San Antonio, TX |
I-40 |
I-15 jct to I-95 jct |
I-44 |
All |
I-45 |
Dallas, TX to Houston, TX |
I-5 |
All |
I-55 |
Memphis, TN to Chicago, IL |
I-57 |
All |
I-65 |
Nashville, TN to Chicago, IL |
I-65 |
Decator, AL to Mobile, AL |
I-69 |
Indianapolis, IN to Flint, MI |
I-70 |
Baltimore, MD to Grand Junction, CO |
I-71 |
All |
I-74 |
Indianapolis, IN to Cincinnati, OH |
I-75 |
Bay City, MI to Naples, FL |
I-76 |
All |
I-77 |
Charleston, WV to Charlotte, NC |
I-80 |
New York City, NY to Salt Lake City, UT |
I-80 |
Sacramento, CA to Oakland, CA |
I-81 |
Scranton, PA to I-40 junction |
I-84 |
All |
I-85 |
Atlanta, GA to Durham, NC |
I-87 |
Albany, NY to New York, NY |
I-90 |
Albany, NY to Rochester, MN |
I-91 |
New Haven, CT to Hartford, CT |
I-94 |
Minneapolis, MN to Detroit, MI |
I-95 |
Miami, FL to Bangor, ME |
I-205 |
San Lorenzo, CA to I-5 Jct |
Table H4. States Included by Geographic Region
Region |
States Included |
|
|
Northeast |
Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, New York, Pennsylvania, New Jersey, Delaware, Maryland |
|
|
|
|
Great Lakes |
Ohio, Indiana, Michigan, Illinois, Wisconsin, Minnesota |
South |
Virginia, West Virginia, North Carolina, South Carolina, Georgia, Florida, Alabama, Tennessee, Kentucky, Louisiana, Mississippi, Arkansas |
|
|
|
|
Central |
North Dakota, South Dakota, Iowa, Nebraska, Missouri |
|
Oklahoma, Kansas, Texas |
West |
Washington, Idaho, Montana, Wyoming, Oregon, California, Nevada, Utah, Colorado, New Mexico, Arizona |
|
The sample frame for truck stops will be based on listings included in the publication Trucker’s Friend: 2008 National Truck Stop Directory [Brice 2008]. Any given truck stop will either be high-flow or low-flow depending on whether it is located on a high-flow national route or not. No truck stop will have two chances of selection.
Truck stops are defined in the Truckers Friend by size as follows:
‘S’: 5 to 24 truck parking spaces;
‘M’: 25 to 84 truck parking spaces;
‘L’: 85 to 149 truck parking spaces;
‘XL’: 150 or more truck parking spaces.
One truck stop will be selected with probability proportional to size for each selected state-artery or state. If the same state-artery or state is selected multiple times, as many truck stops as the number of selections of the state-artery or state will be selected; in this case, the selection of truck stops will be without replacement in order to ensure that the same truck stops are not sampled twice. The measures of size defined for each size category will be as follows:
‘S’: measure of size 15;
‘M’: measure of size 55;
‘L’: measure of size 127;
‘XL’: measure of size 250.
If the management of a particular truck stop refuses participation, that truck stop will be replaced with a replacement truck stop within the same artery-state (for the high-flow stratum) or within the same state for the low-flow stratum. The replacement truck stop will have the same measure of size as the refusing truck stop and will be as close as possible to the refusing truck stop (on the same road if possible for the low-flow stratum).
Table H5 illustrates expected numbers of truck stops to be surveyed and numbers of interviews given, assuming 54 interviews at each high-flow truck stop and 27 interviews at each low-flow truck stop.
Table H5. Expected Sample Allocation
Type of Truck Stop |
All U.S. Truck Stops1 |
Sample |
Number of Interviews |
High Flow |
1802 |
41 |
2214 |
Low Flow |
2412 |
9 |
243 |
Truck Stop Size |
|
|
|
Small |
1784 |
21 |
1026 |
Medium |
1356 |
16 |
783 |
Large |
541 |
6 |
297 |
Extra-Large |
533 |
7 |
351 |
Total |
4214 |
50 |
2457 |
1 listings from Trucker’s Friend: 2008 National Truck Stop Directory [Brice 2008].
Figure H1. High-Flow National Routes by Geographic Region
Random Selection of Truck Drivers Within Truck Stops
A sample of truck drivers entering the truck stop during periods of data collection will be selected for personal interview. To ensure random selection of drivers, as an interviewer is about to become available the very next driver entering the truck stop will be invited to participate in the study by first taking a screening interview to determine study eligibility. Each truck driver who enters during a period when an interviewer is available for administration will then have an equal chance of selection in this process (i.e., the process of selecting the next driver will be objective and not subject to the discretion of the recruiter). It is also assumed that time entries of drivers into the truck stops are sufficiently inherently random so that there will be no systematic bias generated from not being able to interview drivers who enter when another interview is still being done. There will be no recruitment for interviews during periods when an interviewer is not available. It is anticipated that eligible drivers will have less than a one hour wait before being interviewed. A tally will be kept of all truck drivers entering the truck stop during multiple randomly sampled data collection periods. Recruitment and interview administration periods will include morning and lunch periods (8-10 AM and 12-2 PM) as well as the late afternoon/dinner period.
Data Collection Teams and Training
Survey administration and driver recruitment at each truck stop will be done by a team consisting of three individuals: one member to recruit participants and two members to administer personal interviews and collect anthropometric measurements. Interviews are to be conducted during a 3-day period at each truck stop. During the three day period, truck drivers will be selected randomly to be interviewed during different time periods each day. As the interviewers finish their interviews, the next set of truck drivers will be selected for interviewing.
Three teams of three data collectors will be trained, along with two back-up people for a total of 11 trainees. All surveyors will attend training covering the technical and administrative protocols required to successfully complete the data collection activities, and which will provide them general information about health issues and long haul trucking operations. The training will help data collectors understand the context and purpose of various questions. The following topics will be covered in the training sessions and included in the surveyors’ training manual:
Background on the Survey of Truck Driver Injury and Health;
Detailed overview of data collection fundamentals;
Refusal avoidance and conversion techniques;
Instructions for recording and transmitting data;
Discussion of Privacy and Human Subjects Rights;
Overview of health concerns;
Long haul trucking operations;
Research focus of the National Institute for Occupational Safety and Health (NIOSH);
Trucking oversight responsibilities of the Federal Motor Carrier Safety
Administrative procedures.
Estimation and Weighting
High-Flow Truck Stops
The overall
probability of selection
for each high-flow state-artery will be
where
is the mileage length of the state-artery section hi and the
summation is over all high-flow state-artery sections.
One
truck stop will be selected with probability proportionate to size
for each selected state-artery segment. The probability of selection
of
the sampled truck stop is
where
is the measure of size of the truck stop, and
is the number of truck stops listed for state-artery unit hi.
Low-Flow Truck Stops
The overall
probability of selection
at the state level for the low-flow sample will be
where
is the 2008 population of the state, and the summation is over all
states except Alaska and Hawaii.
Within selected states, one truck stop will be sampled with probability proportional to size (number of parking places). The probability of selection for each low-flow truck stop within each state will be
where
is the measure of size of the truck stop (15, 55, 127, 250), and
is the number of low-flow truck stops listed for state s.
Probability of selection of truck drivers
The
probability of selection of each truck driver will then be
for high-flow truck stops, where
is the number of interviewed truck drivers in the truck stop hij
and
is the estimated number of truck drivers who entered the truck stop
hij during the interview period. Similarly for low-flow truck
stops the probability of selection of each truck driver is
,
where
is the number of interviewed truck drivers in the sampled truck stop
lsj and
is the estimated number of truck drivers who entered the truck stop
lsj during the interview period
Weights
The
final weight
assigned to each interview will be a product of the following
factors:
A base weight factor equal to the inverse of the probability of selection of the sampled truck stop;
A base weight factor equal to the inverse of the probability of selection of the truck driver within the sampled truck stop;
An interviewer-day adjustment factor T (if necessary);
A nonresponse adjustment NR applied for nonresponding truck drivers;
The nonresponse adjustment will be based on responses to the nonresponse questions and will be the reciprocal of the weighted response rate to selected questions included in the non-respondent interview (Attachment F2).
A total of 50 truck stops (41 high-flow and 9 low-flow) will be selected. The mean number of completed interviews expected for high-flow sites will be 54 and the mean number of completed interviews for the low-flow sites is expected to be 27. The actual value for any given truck stop will vary, however. The only value that is fixed under this sample design is the number of days interviews are to be administered at each type of truck stop. If for some reason the number of interviews within the time period varies for a given site, then an adjustment in the weighting procedure will be made for that site for estimation purposes. For example, suppose the expected number of interviews at a site could not be completed because of the illness of the interviewer during the interview period. If only half of the expected number of interviews were completed, a weighting factor adjustment of 2 would be applied. Likewise, if a given site is visited for two days rather than three for some reason, then a weighting factor adjustment of 1.5 would be attached.
Estimation
If
,
are the respondent sample sizes for sites hij
(lij),
the estimators of a particular characteristic y
may be given as
The
quantity
)
is the nonresponse adjustment for driver hijk
(lijk).
An overall nationally representative estimator for prevalence of a
characteristic y
among long-haul truck drivers across both high- and low-flow truck
stops (denoted by subscripts ht or lt) may be given as:
,
where the quantities
(
)
are estimates based on short-period counts of the number of truck
drivers who pass through truck stop hij (lij) during
the data collection period, and Thij
(Tlij) is an adjustment to a common
length period for all truck stops (if the length of time varies from
the norm of three days).
We will estimate variance by
using replication, given the complexity of an exact variance formula.
Equation (2) can be seen as an approximation if we leave out extra
components of variability contributed by
(
)
and Tht (Tlt),
as well as the nonresponse adjustments
).
File Type | application/msword |
File Title | Sample Strategy |
Author | wks1 |
Last Modified By | sxw2 |
File Modified | 2010-05-24 |
File Created | 2010-05-24 |