[1205-0453: The National Agricultural Workers Survey, Part B]
1. Description of Universe and Sample
a) Universe
Entity | Universe | Sample
Agricultural Region | 12 | 12
Farm Labor Areas | 497 | 90
Farms w/ hired or contract labor | 338,373 | 564
Hired Crop Workers (estimated) | 1,400,000 | 1,500
The universe for the study is the population of field workers active in crop agriculture in the continental United States (U.S.). The National Agricultural Workers Survey (NAWS) will use multi-stage sampling relying on probabilities proportional to size to interview approximately 1,500 randomly selected crop workers in fiscal year (FY) 2012. To achieve this number of interviews, cyclical targets include an oversample of 200 interviews to account for possible non-response.
b) Response Rate
The sampling design, described below, involves obtaining a random selection of employers. In FY 2009, 66 percent of the randomly selected employers (or their surrogates) who employed workers the day they were contacted by interviewers agreed to cooperate in the survey, and interviews were conducted at 59 percent of the eligible establishments. As there are no universe lists of workers, the sampling frame of workers is constructed after contact with the employer.
Once interviewers have a worker frame, a random sample of workers is chosen. The interviewers, who generally work in pairs, approach workers directly to set up interview appointments in their homes or other agreed-upon locations. In 2008-2009, 92 percent of the approached workers agreed to be interviewed.
2. Statistical Methodology
Overview
The goal of the NAWS sampling methodology is to select a nationally representative, random sample of farm workers. The NAWS uses stratified multi-stage sampling to account for seasonal and regional fluctuations in the level of farm employment. The stratification includes three interviewing cycles per year and 12 geographic regions, resulting in 36 time-by-space strata. For each interviewing cycle, NAWS staff draws a random sample of locations within all 12 regions from the universe of 497 Farm Labor Areas (FLAs). FLAs are single or multi-county sampling units which form the primary sampling units (PSUs). Employers within PSUs are the secondary level and workers within employers are the tertiary level of sampling units. The number of interviews allocated to each region is based on regional farm worker employment data (number of agricultural hired and contract workers) from USDA’s Farm Labor Survey. Similarly, the number of interviews allocated to each FLA is proportional to the number of farm workers employed at that time of the year. The FLA size measure is obtained by multiplying a seasonality estimate, derived primarily from the BLS QCEW, by local farm labor expenditure data, from USDA’s Census of Agriculture (CoA). Interview allocation is thus proportional to stratum size.
In each FLA, a simple random sample of agricultural employers is drawn from a universe list compiled mainly from public agency records. NAWS interviewers then contact the sampled employers or farm labor contractors, arrange access to the work site, and draw a random sample of workers at the work site. Thus, the sample includes only farm workers actively employed in crop agriculture at the time of the interview.
Stratification
Interviewing cycles
To account for the seasonality of the agricultural industry, interviews are conducted three times a year in cycles lasting ten to twelve weeks. The cycles start in February, June and October. The number of interviews conducted in each cycle is proportional to the number of agricultural field workers hired at that time of the year. The U.S. Department of Agriculture’s (USDA) National Agricultural Statistics Service (NASS) provides the Employment and Training Administration (ETA) with the agricultural employment figures, which come from USDA’s Farm Labor Survey (FLS).
Regions
Regional stratification entails defining 12 distinct agricultural regions that are based on USDA’s 17 agricultural regions. At the start of the survey in 1988, the 17 regions were collapsed into 12 by combining those regions that were most similar (e.g., Mountain I and Mountain II), based on statistical analysis of cropping patterns. In each cycle, all 12 agricultural regions are included in the sample. The number of interviews per region is proportional to the size of the seasonal farm labor force in that region at that time of the year, as determined by the NASS using information obtained from the FLS.
Sampling within Strata
Farm Labor Areas
Each region is comprised of several multi-county sampling units called Farm Labor Areas (FLAs). Originally, the NAWS used USDA Crop Reporting Districts, but experience showed that these units were not homogeneous with respect to farm labor. As a result, using Census of Agriculture data and ETA mappings of seasonal farm labor concentrations, the NAWS staff identified aggregates of counties that had similar farm labor usage patterns and were roughly similar in size. The resulting FLAs also account for varying county size across the United States. For example, in the East, a Farm Labor Area may include several counties; in the West, a Farm Labor Area may be composed of a single agriculture-intensive county. FLA size is more homogeneous within region than it is across regions. The approximately 3,000 U.S. counties are reduced to 497 farm labor areas.
Each cycle’s FLAs will be drawn from the universe of 497 FLAs. For each cycle, within each region, a sample of FLAs is drawn using probabilities proportional to size. The size measure used is an estimate of the amount of farm labor in the FLA during a particular cycle. In this case, the measure is based on the hired and contract labor expenses from the most recent Census of Agriculture, available at the time of drawing the sample. The CoA labor expenses are adjusted using seasonality estimates which identify the percentage of labor expenses that fall into each of the NAWS cycles: fall, spring and summer.
The seasonality estimates are constructed from Quarterly Census of Employment and Wages (QCEW) data. The estimates are made by aggregating the reported monthly employment for each month included in the corresponding NAWS cycle (e.g., June, July, August, and September for the summer cycle). The percentage of annual employment corresponding to each cycle becomes that FLA’s seasonality estimate for the cycle. In cases where there is insufficient information, state seasonality averages are used.
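As an illustration only, the size-measure calculation described above might be sketched in Python as follows. The function names and the sample figures are hypothetical. Only the summer months (June through September) are stated in the text; the month groupings for the other two cycles are assumptions made for the sketch.

```python
# Month-to-cycle mapping. Only "summer" is given in the text; the
# "spring" and "fall" groupings are ASSUMPTIONS for illustration.
CYCLE_MONTHS = {
    "spring": (2, 3, 4, 5),
    "summer": (6, 7, 8, 9),
    "fall": (10, 11, 12, 1),
}


def seasonality_shares(monthly_employment):
    """Return each cycle's share of annual reported employment.

    monthly_employment maps month number (1-12) to QCEW-style
    reported employment for one FLA.
    """
    total = sum(monthly_employment.values())
    if total <= 0:
        # The text says state averages are used when data are insufficient.
        raise ValueError("insufficient data; use a state seasonality average")
    return {
        cycle: sum(monthly_employment[m] for m in months) / total
        for cycle, months in CYCLE_MONTHS.items()
    }


def fla_size_measures(annual_labor_expense, monthly_employment):
    """Seasonally adjusted size: CoA labor expense times the cycle share."""
    shares = seasonality_shares(monthly_employment)
    return {cycle: annual_labor_expense * s for cycle, s in shares.items()}


# Hypothetical FLA with a pronounced summer peak.
employment = {m: 100 for m in range(1, 13)}
employment.update({6: 300, 7: 400, 8: 400, 9: 300})
sizes = fla_size_measures(1_000_000, employment)
```

Because the three shares sum to one, the three cycle size measures for an FLA add back up to its annual labor expense.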
The total number of FLA visits for the NAWS annual sample is 90. To ensure that an adequate number of FLAs are visited in each region, a minimum of two FLAs are assigned per cycle. Thus 12 regions X 2 FLAs X 3 cycles = 72 FLAs. The remaining 18 FLAs are assigned proportionately using the seasonal FLS data. Sampling FLAs within each region for each cycle is done using PPS sampling, using SAS’ PROC SURVEYSELECT. The number of FLAs selected for each region has typically ranged from two to five for each cycle.
Counties
In selecting counties, an iterative sampling procedure is used to ensure an adequate number of counties is selected for each region. In most cases, interviews are completed in the first county and no additional counties are needed. However, because there is tremendous uncertainty about the number of workers in a county, additional counties are occasionally needed to complete the FLA allocation. Counties are selected one at a time without replacement using probabilities proportional to the size of the farm labor expenditures in that county at that time.
The process of selecting a county within a FLA begins with a randomly sorted list of counties within the FLA. A cumulative sum of the seasonal hired and contract labor expenditures is constructed for this list, similar to the process used for FLA selection. When selecting a county, the selection number is the product of a random number drawn from the uniform distribution on 0 to 1 and the cumulative total of the seasonal hired and contract payroll. The county whose section of the cumulative sum contains that number is selected.
Example showing selection algorithm for counties within FLAs
County | Seasonal labor expenditures | Cumulative Sum | Selected
A | 100,000 | 100,000 |
B | 300,000 | 400,000 |
C | 800,000 | 1,200,000 |
D | 450,000 | 1,650,000 | <=
E | 600,000 | 2,250,000 |
Selection
Random number | 0.657
Random number * cumulative sum | 1,478,250
Selected County | D
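The cumulative-sum selection in this example can be reproduced with a short Python sketch. This is a minimal illustration, not the survey's production procedure; the function names are hypothetical, and the sketch assumes every county has a positive size measure.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Counties in their (already randomized) list order, from the example table.
COUNTIES = [("A", 100_000), ("B", 300_000), ("C", 800_000),
            ("D", 450_000), ("E", 600_000)]


def select_county(counties, u):
    """Pick one county with probability proportional to size.

    u is a uniform random number in [0, 1). The selection number is
    u times the cumulative total; the county whose section of the
    cumulative sum contains that number is returned.
    """
    cumulative = list(accumulate(size for _, size in counties))
    selection_number = u * cumulative[-1]
    index = bisect_right(cumulative, selection_number)
    return counties[index][0], selection_number


def county_order(counties, rng):
    """Draw counties one at a time without replacement (visiting order)."""
    remaining = list(counties)
    order = []
    while remaining:
        name, _ = select_county(remaining, rng.random())
        order.append(name)
        remaining = [c for c in remaining if c[0] != name]
    return order


name, number = select_county(COUNTIES, 0.657)   # selects county D
roster = county_order(COUNTIES, random.Random(2012))
```

With the random number 0.657 from the table, the selection number is about 1,478,250, which falls in county D's section of the cumulative sum (1,200,000 to 1,650,000).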
As mentioned above, often only one county is needed. Should additional counties be needed in a multi-county FLA, additional counties are pulled one at a time without replacement using the method described above. To be prepared, this is done in advance of interviewing but only implemented if needed.
Each county is marked with and ordered by its selection number (e.g., 1, 2, 3…). Interviews begin in the first selected county and, as a county's work force is depleted, interviewing moves to the second randomly selected county on the list and so forth, until all the allocated interviews in that FLA have been completed. In FLAs where farm work is sparse, interviewers may need to travel to several counties to encounter sufficient workers to complete the FLA’s allocation.
Zip Code Clusters
The next level of sampling is randomly selecting Zip Code Clusters within each county. Zip Code Clusters divide the county into smaller areas that are based on geographic proximity and the number of employers in the area. Zip Code Clusters are designed to be roughly equal in size. Selection of Zip Code Clusters is done by randomly sorting the lists. Field staff begin by contacting employers in the first cluster and then move down the list following the random order until the FLA allocation is filled or the county’s workforce is exhausted. In some large counties, the Zip Code Clusters are also large. In these cases, in order to increase the diversity of the employer sample, the NAWS limits the number of interviews that can be done in one cluster to 10.
Employers
Within each selected county, employers are selected at random from a list of agricultural employers. The list is compiled from administrative lists of employers in crop agriculture. An important component of the list is employer names in selected North American Industry Classification System (NAICS) codes that the Bureau of Labor Statistics (BLS) provides directly to the contractor per the terms of an interagency agreement between the ETA and the BLS. Because of uncertainty about the conditions of seasonal farm labor in each location, the number of employers to be interviewed is not known in advance.
Simple random sampling is done at the employer level for several reasons. First, there is no reliable size information before the cycle, so using probabilities proportional to size (PPS) is difficult. Second, simple random sampling results in a greater variety of farm sizes whereas PPS favors larger farms. This increases the probability of including more information on events such as accidents and safety concerns that are considered more likely to occur at small farms.
For each cycle, a simple random sample of employers is selected for the first county on the county roster in each FLA. Once a county is selected, the corresponding employer list is sorted according to geographic order (e.g., zip code) and a random starting point is selected. Lists are geographically sorted to minimize the distance between farms and to increase interviewer efficiency. To ensure that interviewers cover the entire county, the list is sorted randomly and 50 employers are selected. If the interviewer has not filled the interview allocation after visiting those employers, then another pull of 50 employers is sampled, sorted and given to the interviewers. If, during the interviewing process, it becomes necessary to go to a subsequent county on the county roster for a particular FLA, then a random sample of employers for that additional county will be generated.
Workers
The interview allocation at each employer is roughly proportional to the FLA allocation. If the allocation were based on employer size, it would be possible to collect all interviews from a single employer if the FLA allocation were small and the first employer to participate had a large workforce. To ensure that interviews come from two or more employers per FLA, the following algorithm is used.
If the total number of interviews allowed for the county or clusters is:
Less than 25 interviews, the maximum interviews allowed per employer is 5
26-40 interviews, the maximum interviews allowed per employer is 8
41-75 interviews, the maximum interviews allowed per employer is 10
More than 75 interviews, the maximum interviews allowed per employer is 12
If the number of workers at an employer is smaller than its allocation, all workers at that employer are to be interviewed.
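The per-employer cap above can be expressed as a small lookup. The Python sketch below is illustrative and the function names are hypothetical. The published tiers do not say how an allocation of exactly 25 is treated; the sketch assumes it falls in the lowest tier.

```python
def max_interviews_per_employer(allocation):
    """Cap on interviews at one employer, given the county/cluster allocation.

    ASSUMPTION: an allocation of exactly 25 is placed in the lowest tier,
    because the source tiers ("less than 25", "26-40") leave it unassigned.
    """
    if allocation <= 25:
        return 5
    if allocation <= 40:
        return 8
    if allocation <= 75:
        return 10
    return 12


def interviews_at_employer(allocation, workers_present):
    """Interview everyone when the workforce is smaller than the cap."""
    return min(max_interviews_per_employer(allocation), workers_present)
```

For example, with a county allocation of 30 interviews, an employer with 50 workers on site would contribute at most 8 interviews, while an employer with 3 workers would contribute all 3.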
On large farms, workers are usually organized into crews consisting of several workers and a supervisor. While crew size can range from a handful to more than 100, crews of 30 or fewer are most typical. When the number of crews is large, randomly selecting workers from each crew would be infeasible for interviewers as well as an imposition on the farm employer. For this reason, on farms with more than one crew, the number of crews to be randomly selected will be determined using probabilities proportional to the square root of size. A chart similar to the one below will be provided to interviewers. Interviewers keep track of the number of crews selected and the size of each crew. Interviewers pro-rate the number of interviews across the selected crews based on the size of the crews. Their instructions for this are included in Appendix B.
Number of crews | Number to select randomly
1 to 2 | 1
3 to 6 | 2
7 or more | 3
Within each crew, the interviewers follow specific sampling instructions that were designed by a sampling statistician to ensure selection of a random sample of workers at each selected employer. Specifically, if n is the number of interviews for that crew and N is the total number of workers within the crew, then interviewers place n marked tags and N-n unmarked tags in a pouch and shuffle them. Workers then draw a tag and those with marked tags are included in the sample. If multiple crews are used, the procedure is completed with each selected crew. These instructions are included in Appendix B.
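The crew chart and the tag draw can be simulated with the following Python sketch. It is an illustration under stated assumptions, not the Appendix B instructions: the function names are hypothetical, and a shuffled list of tags stands in for the physical pouch.

```python
import random


def crews_to_select(num_crews):
    """Number of crews to select at random, per the chart (num_crews >= 1)."""
    if num_crews <= 2:
        return 1
    if num_crews <= 6:
        return 2
    return 3


def draw_tags(workers, n, rng=random):
    """Simulate the pouch: n marked tags and N - n unmarked tags.

    Each worker draws one tag; workers holding a marked tag are sampled.
    If the crew has fewer than n workers, every worker is sampled.
    """
    crew_size = len(workers)
    n = min(n, crew_size)
    tags = [True] * n + [False] * (crew_size - n)
    rng.shuffle(tags)
    return [w for w, marked in zip(workers, tags) if marked]


crew = [f"worker_{i}" for i in range(20)]       # hypothetical crew of 20
sampled = draw_tags(crew, 5, random.Random(0))  # 5 of 20 are selected
```

Because the tags are shuffled before they are handed out, every subset of n workers is equally likely, which is a simple random sample without replacement within the crew.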
Weighting
The NAWS uses a variety of weighting factors to construct weights for calculating unbiased population estimates:
Sampling weights are used to calculate unbiased population estimates by assigning each sample member a weight corresponding to the inverse of its probability of selection.
Non-response factors are used to correct sampling weights for deviations from the sampling plan, such as discrepancies in the number of interviews planned and collected in specific locations.
Post-sampling adjustment factors are used to adjust the weights given to each interview in order to compute unbiased population estimates from the sample data.
The data used for these weights come from several sources. The number of workers at the farm on the day of sampling is collected from the employer by the interviewer and is part of the sampling documentation. The number of qualified employers in the county comes from the Employer List universe. The county size information is Census of Agriculture farm labor expenditure data, which has been seasonally adjusted using the BLS QCEW. FLS data are used for the regions and cycles.
As explained below, non-response weights are calculated simultaneously with regional post-sampling adjustment weights.
Sampling weights
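The sampling weight of each interview is the inverse of its overall probability of selection, $W = 1/P$, with $P$ equal to the product of the selection probabilities at each stage of sampling,
including FLAprob (the probability of selecting the FLA within its region) and counprob (the probability of selecting the county within its FLA).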
Calculating counprob, the county-within-FLA probability, is more complicated. For example, if one of the sampled counties is larger than another, then its probability of selection should be higher than that of the other. If several counties are selected from a particular region, then the selection probability for a particular county is (1) its probability of selection on the first draw, plus (2) the probability of its selection on the second draw, plus (3) the probability of its selection on the third draw, etc.
For the standard method of sampling several items with probabilities proportional to size, without replacement, closed-form formulas for the exact inclusion probabilities do not exist. However, these probabilities can be calculated exactly using multiple summations. This procedure can be implemented in SAS within PROC IML.
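Suppose that the population at a particular sampling stage consists of N objects with sizes $s_1, s_2, \dots, s_N$, having total size $S = \sum_{j=1}^{N} s_j$. Let $p_i(j)$ be the probability that the jth item is selected on the ith draw. Then for $j = 1, \dots, N$,
\[ p_1(j) = \frac{s_j}{S}, \]
\[ p_2(j) = \sum_{k \ne j} \frac{s_k}{S} \cdot \frac{s_j}{S - s_k}, \]
\[ p_3(j) = \sum_{k \ne j} \; \sum_{l \ne j,\, l \ne k} \frac{s_k}{S} \cdot \frac{s_l}{S - s_k} \cdot \frac{s_j}{S - s_k - s_l}, \] and so forth.
These ith-draw probabilities each have the property that $\sum_{j=1}^{N} p_i(j) = 1$. Finally, the probability that the jth item is included in a sample of size n is $\pi_j = \sum_{i=1}^{n} p_i(j)$. These inclusion probabilities have the property that $\sum_{j=1}^{N} \pi_j = n$.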
Both the FLA and county selection probabilities can be calculated exactly using these formulas.
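The text notes that this calculation can be implemented in SAS PROC IML. Purely as an illustration, the same multiple summations can be written as a short recursive enumeration in Python. The function names are hypothetical, and the sketch assumes positive sizes and a sample size n no larger than N; it enumerates every ordered sequence of draws, so it is practical only for small N and n.

```python
def draw_probabilities(sizes, n):
    """Return p, where p[i][j] is the probability that item j is selected
    on draw i + 1 under successive PPS sampling without replacement."""
    num_items = len(sizes)
    if not 0 < n <= num_items:
        raise ValueError("n must be between 1 and the number of items")
    p = [[0.0] * num_items for _ in range(n)]

    def recurse(draw, remaining_total, chosen, path_prob):
        if draw == n:
            return
        for j in range(num_items):
            if j in chosen:
                continue
            # Probability of this whole sequence of draws ending in item j.
            prob = path_prob * sizes[j] / remaining_total
            p[draw][j] += prob
            recurse(draw + 1, remaining_total - sizes[j], chosen | {j}, prob)

    recurse(0, float(sum(sizes)), frozenset(), 1.0)
    return p


def inclusion_probabilities(sizes, n):
    """Inclusion probability of each item: the sum of its draw probabilities."""
    p = draw_probabilities(sizes, n)
    return [sum(p[i][j] for i in range(n)) for j in range(len(sizes))]


# County sizes from the selection example, with two counties drawn.
sizes = [100_000, 300_000, 800_000, 450_000, 600_000]
pi = inclusion_probabilities(sizes, 2)
```

The two properties stated above serve as checks: each row of draw probabilities sums to 1, and the inclusion probabilities sum to n.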
The formula for FLAprob, which is the probability that the FLA was selected within the region, derives from the systematic probability-proportional-to-size selection process used.
Consider:
N is the number of FLAs in the region.
$s_1$ through $s_N$ are the sizes of the FLAs.
S is the sum of the FLA sizes, so $S = \sum_{j=1}^{N} s_j$.
n is the number of the FLAs to be selected with probabilities proportional to size.
In selecting the FLAs, they were listed in a random order. A column of cumulative FLA sizes was constructed. That is, the cumulative size at the jth FLA will be $C_j = \sum_{i=1}^{j} s_i$.
A random starting point, $k_0$, was chosen between 1 and $S/n$. The integers $k_0,\ k_0 + S/n,\ k_0 + 2S/n,\ \dots,\ k_0 + (n-1)S/n$ can be listed. The jth FLA will be selected if one of these integers falls between $C_{j-1} + 1$ and $C_j$ (where $C_0$ is interpreted to be 0).
Without loss of generality, consider the first FLA on the randomized list. It will be selected if $k_0$ lies between 1 and $s_1$. Thus, its probability of selection is $s_1 / (S/n) = n s_1 / S$.
In general, the probability that the jth FLA is selected is $n s_j / S$.
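A minimal Python sketch of systematic selection with probabilities proportional to size follows. It is illustrative only: the function names and FLA sizes are hypothetical, the sketch assumes no FLA is larger than the sampling interval S/n, and when no starting point is supplied it draws a real-valued start rather than an integer.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(sizes, n, k0=None, rng=random):
    """Return the list positions selected by systematic PPS sampling.

    sizes are listed in their (already randomized) order. Selection
    points are k0, k0 + S/n, ..., k0 + (n - 1) * S/n.
    """
    total = sum(sizes)
    interval = total / n
    if max(sizes) > interval:
        raise ValueError("sketch assumes every size is at most S/n")
    if k0 is None:
        k0 = (1.0 - rng.random()) * interval  # uniform on (0, S/n]
    cumulative = list(accumulate(sizes))
    selected = []
    for m in range(n):
        point = k0 + m * interval
        # First unit whose cumulative size reaches the point; the min()
        # guards against floating-point overshoot at the very end.
        j = min(bisect_left(cumulative, point), len(sizes) - 1)
        selected.append(j)
    return selected


def selection_probabilities(sizes, n):
    """Selection probability of each unit: n times its share of the total."""
    total = sum(sizes)
    return [n * s / total for s in sizes]


FLA_SIZES = [10, 30, 20, 40]                     # hypothetical sizes, S = 100
picked = systematic_pps(FLA_SIZES, 2, k0=25)     # points 25 and 75
probs = selection_probabilities(FLA_SIZES, 2)
```

With these sizes the interval is 50, so a start of 25 gives selection points 25 and 75, which land in the second and fourth FLAs on the list. Counting selections over every integer start from 1 to 50 reproduces the probabilities 0.2, 0.6, 0.4, and 0.8.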
Non-Response Weighting
Non-response corrections adjust for deviations from the sampling design. If, for example, ten interviews should have been collected at a farm but only two interviews were collected, those two interviews could be given five times the weight they would have otherwise received. Thus, each interview’s weight needs to be adjusted to represent a certain value in terms of size. Instead of making this adjustment at the farm level, it could be made at any higher level in the sampling plan. For the NAWS this means at the employer list-within-zip code cluster, cluster‑within‑county, county, county list‑within‑FLA, FLA, FLA list‑within‑region, region, or national level.
By raising the level at which adjustments are made, overall size information is, generally, more reliable. This is due to the statistical effect of averaging, greater year‑to‑year stability over larger geographic areas, and the lower likelihood of the absence or suppression of data due to confidentiality considerations. On the other hand, lower‑level adjustments are more sensitive, if the information used for making the adjustments is reasonably accurate.
For two reasons, the NAWS non-response adjustments are made at the region level. First, the region is the lowest level with enough interview coverage to calculate weights for the size adjustment. All of the 12 NAWS regions are visited in every cycle. If, for some reason, there are too few interviews in a region, the region can be combined with adjacent regions for weighting purposes.
Second, the NAWS uses measures of size provided by the USDA Farm Labor Survey, which are reported by quarter and region. The USDA is the only source of quarterly statistics on levels of farm worker employment. The Census of Agriculture, for instance, collects annual data rather than quarterly and the statistics are published every five years. Thus, by using USDA Farm Labor Survey figures to make the size adjustment, the NAWS can adjust the weights by season and region and construct unbiased population estimates. Non-response adjustments for size, therefore, are made at the region-within-cycle level to create corrected region weights.
Post-sampling weights
Post-sampling weights are used in the NAWS to adjust the relative value of each interview in order for national estimates to be obtained from the sample. There are five post-sampling weights. Two of the weights adjust for unequal probabilities of selection that can only be determined after the interviews are conducted. These include the unequal probabilities of finding part-time versus full-time workers (day weight) and the unequal probabilities of finding seasonal versus year-round workers (seasonal weight). The next three weights (region, cycle, and year) adjust for the relative importance of a region’s data, a sampling cycle, and a sampling year. The measures of size used are obtained from the USDA. The region weight, as discussed below, is calculated simultaneously with non-response weighting. The cycle weight and year weight serve slightly different roles in estimation. They allow different cycles and sampling years to be combined for statistical analysis. These weights are also based on USDA measures of size.
It should be noted that the NAWS sampling plan is based on USDA NASS data collected in the year before the interviews. For example, fiscal year 2011 data is used to plan the NAWS 2012 sample. The weights, however, use NASS data from the year during which the interviews were conducted. This corrects for any discrepancies in allocations due to projecting farm worker distributions based on past year data.
The day weight adjusts for the probability of finding part-time versus full-time farm workers. A part-time worker, who works only two or three days per week, has a lower likelihood of being encountered by the interviewing staff than a worker employed six days per week. Therefore, respondents are weighted inversely proportional to the length of their workweek.
A conservative adjustment for the number of days worked is appropriate to avoid excessively large sampling weights. Field reports indicate that relatively few workers are contacted on Sundays, and a review of the interviews indicated that virtually no workers reported Sunday hours without Saturday hours. Accordingly, workers reporting at least six workdays per week nearly always have a full chance of selection. Thus, any workers reporting at least six days of work per week are treated as having a full chance of selection; adjustments are made only for those workers with fewer than six days of work per week. The day weight (DWTS) is computed as: DWTS = 6 / (length of the workweek), where “length of the workweek” is the number of days per week the respondent reports working at the time of the interview for the current farm task (if two tasks are reported, the one with more days per week is used). Seven-day workweeks are truncated to six, as explained previously. For the few workers not reporting the number of days, DWTS is assigned a default value of 1.
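The day weight rule can be written as a one-line calculation. The Python sketch below is illustrative; the function name is hypothetical, and treating a reported value of zero or less as "not reported" is an assumption of the sketch.

```python
def day_weight(days_per_week=None):
    """DWTS = 6 / (length of the workweek), with the workweek capped at 6.

    Returns the default of 1.0 when the number of days is not reported.
    ASSUMPTION: zero or negative values are treated as not reported.
    """
    if days_per_week is None or days_per_week <= 0:
        return 1.0
    return 6 / min(days_per_week, 6)
```

For example, a respondent working three days per week receives a day weight of 2.0, while respondents working six or seven days per week both receive 1.0.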
The calculation of worker‑based weights is complicated by the fact that workers could, in general, be sampled several times a year. Furthermore, neither the USDA CoA nor the FLS provides figures that can be used for the annual number of farm workers. The USDA CoA reports the number of directly-hired workers employed on each farm, but does not adjust for the fact that some workers are employed on more than one farm in the census year. In addition, CoA farm worker counts exclude labor-contracted farm workers. Similarly, the FLS is administered quarterly and reports the number of farm workers employed each quarter, so the same worker could be reported in multiple quarters. Because of this repetition of workers across quarters, it would be invalid to derive the total number of persons working in agriculture during the year by summing quarterly figures from the FLS.
As employment information is not available for every worker for each quarter of the year, the only way to avoid double‑counting of farm workers is to use the 12-month retrospective work history collected in the NAWS. Specifically, predicting future-period employment is achieved by imposing the assumption that workers who report having worked in a previous season would work in the next corresponding season. For example, a worker sampled in spring 2008 who reported working the previous summer (2007) is assumed to work in the following summer (2008). For some purposes, including the calculation of year-to-year work history changes, this assumption cannot be used. For purposes such as obtaining demographic descriptions of the worker population, however, this assumption provides satisfactory estimates.
Furthermore, it is assumed that a worker has an equal likelihood of being sampled in each season worked. This assumption is dependent on a balance between the amount of farm work done by the worker in each season and the number of interviews obtained in that region for the season. Recall that the NAWS interview allocation is proportional to FLS seasonal agricultural payroll. Thus, the probability of sampling is related to the amount of work performed by individual workers. With these simplifying assumptions, it is possible to calculate a seasonal weight that is simply the inverse of the number of seasons the interviewee did farm work during the previous year.
For the purposes of the NAWS, there are only three seasons per year. An interviewee always performed farm work during the trimester in which he or she was sampled. From the NAWS interview, it can be determined during which of the two previous trimesters the respondent also did farm work. If the interviewee only worked during the current trimester, the seasonal weight is 1/1 or 1.00. If the interviewee worked during the current trimester and only one of the two prior trimesters, the seasonal weight is 1/2 or 0.50. Finally, if the interviewee worked during the current and both of the prior trimesters, the seasonal weight is 1/3 or 0.33.
This season weight is similar to the day weight in the sense that respondents who spend more time (seasons) working in agriculture have a greater chance of being sampled. Therefore, the weighting has to be inversely proportional to the number of seasons worked in order to account for the unequal sampling probability.
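As a brief illustration, the seasonal weight can be computed from two yes/no work-history indicators. The Python function and parameter names below are hypothetical.

```python
def seasonal_weight(worked_prior_trimester_1, worked_prior_trimester_2):
    """Inverse of the number of trimesters worked in the past year.

    The current trimester always counts, because the respondent was
    sampled while doing farm work.
    """
    seasons_worked = 1 + int(bool(worked_prior_trimester_1)) \
                       + int(bool(worked_prior_trimester_2))
    return 1 / seasons_worked
```

A respondent who worked in the current trimester and one prior trimester receives a weight of 0.5; one who worked in all three receives one third.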
The region weight adjusts the relative weight of a region’s data in relation to the number of interviews collected in that region. If the number of interviews collected was smaller than the regional allocation in the sampling plan, an adjustment weight greater than one is assigned to each interview in the region, and vice versa. These adjustments ensure that the population estimates are unbiased.
The region weight is based on USDA FLS measures of regional farm employment activity. This is the best source of information available about farm workers. The USDA figures are reported by region and quarter, which allows the weight to be sensitive to seasonal fluctuations.
Correspondence between USDA data and the NAWS sampling cycles
The calculation of the region weight relies on two pieces of information: the USDA regional measures of size and the number of interviews completed in each region. The first step in the process of calculating the region weight is to apportion the USDA quarterly size figures among the three NAWS sampling cycles.
The USDA figures are reported quarterly. NAWS sampling years, however, cover non-overlapping 12-month periods (from September to August), which are divided into three cycles. Accordingly, it is necessary to adjust the USDA figures to fit the NAWS sampling frame by apportioning the four quarters into three cycles.
For example, the number of farm workers in the fall cycle for a region is assumed to be the total number of workers for that region in USDA quarter 1 of the current fiscal year (FYc) plus one‑third the number of workers for that region in USDA quarter 2 of the next fiscal year (FYp). The formula for the winter, spring and summer cycles is constructed similarly.
Determining the NAWS region grouping according to interview coverage
The region weight (within cycle) is calculated as follows for each region j ($j = 1, \dots, 12$) in cycle i:
[formula missing in source],
where USDAij is the USDA Farm Labor Survey estimate of the number of workers for region j in cycle i, Xij is the sum of the sampling weights for region j in cycle i, and DWTSij is the sum of farm worker day weights for region j in cycle i. Also, $DWTS_{ij} = \sum_{k} DWTS_{ijk}$ (where k refers to a farm worker), so that $DWTS_{ij} = n_{ij}$ if all farm workers in region j in cycle i are working full time and $DWTS_{ij} = 6\,n_{ij}$ if all farm workers are working 1 day only a week in region j in cycle i, where $n_{ij}$ is the number of interviews in region j in cycle i.
The NAWS combines data from the different sampling cycles (seasons) within the same sampling year in order to generate more observations for statistical analysis. In order to combine cycles it is necessary to adjust for the number of workers represented in each cycle in relation to the number of interviews collected in the cycle. For instance, suppose the NAWS did not do proportional sampling as explained above but rather interviewed the same number of people in all three cycles in the 2007 fiscal year. If the USDA reported more workers for the fall and spring/summer cycles, as compared to the winter cycle, then the interviews in the fall and spring/summer would be worth relatively more in terms of size than the interviews conducted in the winter cycle. Accordingly, the interviews in the winter would have to be down-weighted in relation to the interviews in the other seasons (cycles) before the cycles could be combined.
The cycle weight is calculated similarly to the region weight, but at the cycle level rather than the region level. The sum of the USDA size for a cycle is divided by the number of interviews in that cycle. The cycle weight (or region weight within year) is calculated as follows for each region j, cycle i in year Y:
where
and
(k refers to a farm worker) and if the farm worker worked only one cycle during the year, so that if all farm workers for region j in cycle i worked full time and only one cycle in the corresponding year and .
The year weight allows different sampling years to be combined for statistical analysis. It follows the same rationale as the cycle weight, but at the sampling-year level. If the same number of interviews are collected in each sampling year, those interviews taking place in years with more farm work activity are weighted more heavily in the combined sample.
Sampling years cannot be combined if the interviews are not comparable in terms of agricultural representation. In an extreme case, suppose that the NAWS budget tripled one of the sampling years, consequently tripling the number of interviews. If the two sampling years were joined without adjustment, the larger sampling year would have an unduly large effect on the results.
To avoid this, the year weight is calculated as a ratio of the total number of farm workers in a sampling year to the number of interviews in that sampling year. The year weight (or region weight related to all years of interviews) is calculated as follows for each region j (1..nij), cycle i (the sum over i,j means all farm workers, all cycles, all years):
with the same notation as for the preceding weights.
Once the individual weight components are calculated, final composite weights are calculated as the product of the day weight, the season weight, the region weight, and the sampling weight. The cycle and year weights are also factored into the composite weights when multiple cycles or sampling years are used. The composite weights are adjusted so the sum of the weights is equal to the total number of interviews at the next higher level of stratification. These adjusted composite weights based on farm workers are then used for calculating the estimated proportion of workers with various attributes.
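The product-and-rescale step can be sketched in Python as follows. This is a simplified illustration, not the NAWS production code: the field names are hypothetical, the records are assumed to belong to a single group at the relevant level of stratification, and the rescaling simply makes the weights sum to the number of interviews in that group.

```python
from math import prod

# Hypothetical field names for the four component weights.
COMPONENTS = ("sampling", "day", "season", "region")


def composite_weights(interviews, extra=()):
    """Multiply the component weights, then rescale within the group.

    interviews is a list of dicts holding one weight per component.
    extra may name further factors (e.g. "cycle", "year") to include
    when cycles or sampling years are combined.
    """
    names = COMPONENTS + tuple(extra)
    raw = [prod(record[name] for name in names) for record in interviews]
    scale = len(raw) / sum(raw)   # weights then sum to the interview count
    return [w * scale for w in raw]


group = [
    {"sampling": 120.0, "day": 1.0, "season": 0.5, "region": 1.1},
    {"sampling": 80.0, "day": 2.0, "season": 1.0, "region": 1.1},
    {"sampling": 100.0, "day": 1.2, "season": 1 / 3, "region": 1.1},
]
weights = composite_weights(group)
```

After rescaling, the relative sizes of the weights are unchanged, so estimated proportions are unaffected, while the weighted count matches the number of interviews.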
The individual observation weights are obtained at the farm worker level:
This is the weight within cycle; it includes an adjustment for the length of the workweek but no seasonal adjustment.
This is the weight within a year; it includes both the length of the workweek and seasonal adjustment. This weight may be used for the analysis of one particular year of interview.
The composite weight (PWTYCRD) is used for almost all NAWS analysis. This weight allows merging several years of analysis together. It is included in the public access dataset.
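As a rough illustration of how a composite weight is assembled, the following Python sketch multiplies per-worker weight components and rescales the products so that they sum to the number of interviews. The function name and the component values are hypothetical; the sketch does not reproduce the exact NAWS weight equations.

```python
import math


def composite_weights(components, n_interviews):
    """Multiply each worker's weight components, then rescale the products
    so that the final weights sum to n_interviews."""
    raw = [math.prod(parts) for parts in components]
    scale = n_interviews / sum(raw)
    return [w * scale for w in raw]


# Hypothetical (day, season, region, sampling) weights for three workers.
components = [
    (1.0, 1.0, 2.0, 1.5),
    (1.2, 3.0, 2.0, 1.0),
    (2.0, 1.5, 1.0, 1.0),
]
weights = composite_weights(components, n_interviews=len(components))
```

The rescaling step keeps the relative size of the weights unchanged, so estimated proportions are unaffected while the weighted sample size matches the number of interviews.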
3. Statistical Reliability
a) Response
Employer response
To maximize employer response, the contractor sends an advance letter to employers and provides them a brochure explaining the survey. The letter is signed by the survey director and includes the names of the interviewers and their contact information. For further information or questions, the letter and brochure direct employers to contact either the survey contractors (JBS International) at a toll free number or the Department of Labor’s (DOL) Contracting Officer’s Technical Representative (COTR). Employer calls are returned quickly. In addition, and before the start of every interview cycle, JBS provides the COTR a list of scheduled interview trips. The list includes the counties and states where interviews will be conducted, the names of the interviewers who will be visiting the selected counties, and the dates the interviewers will be in the selected counties. The COTR refers to the list whenever he receives an employer call to confirm the interviewers’ association with the survey.
Both DOL and the contractor make presentations on the survey and provide survey information, e.g., questionnaires, to officials and organizations that work with agricultural employers. The NAWS has received the endorsement of several employer organizations. This improves the response rate since agricultural employers sometimes call their organization when considering survey participation.
Intensive and frequent interviewer training is also conducted as a means to increase employer response rates. Interviewers are trained in pitching the survey in various situations and, being well versed in the history, purposes, and use of the survey, are able to easily answer any questions or address any concerns an employer might have. In addition, when explaining the purpose of the survey to employers, interviewers clearly distinguish the survey from enforcement efforts by the Department of Homeland Security, DOL and other Federal agencies, and assure employers that their information is confidential.
Worker response
The survey’s methodology has been adapted to maximize response from this hard-to-survey population. Interviewers pitch workers in English or Spanish, as necessary. All interviewers are bilingual and bicultural. In addition, interviewers make sure that potential respondents know that they are not associated with any enforcement agency, e.g., Immigration and Customs Enforcement. Interviewers explain the survey to workers and obtain their informed consent.
b) Non-response
The $20 honorarium to farm workers enables the survey to achieve an estimated worker response rate of 92 percent. This high level of response greatly aids in protecting the survey estimates from non-response bias. To reduce employer non-response, interviewers are instructed to make several contact attempts at different times of the day and on different days of the week. Interviewer contact attempts are logged and the logs are monitored for compliance. When necessary, interviewers are instructed to accommodate an employer’s preference for scheduling surveys and, if needed, the interviewer can request an extension of the field period.
To measure the effect of employer non-response on the survey’s findings, the survey’s statistician, project manager, and COTR are exploring the possibility of using the minimal information known about and/or collected from the non-cooperating employers, e.g., primary crop, county, number of workers employed, and quarterly hired farm payroll, to generate proxy employer types. If it is possible to construct such proxies, then the demographic characteristics of workers employed on farms of cooperating employers of a particular type will be analyzed to determine if there are significant differences in the key demographic and employment characteristics of workers from participating vs. proxy non-participating employers.
c) Reliability
A probability sampling methodology will be used and estimates of the sampling errors will be calculated from the survey data.
Estimation procedure
At the highest level of the sampling design, the region/cycle level, stratified sampling is used. Sampling is then carried out at the lower levels, independently within each stratum.
The following description is excerpted from Obenauf¹:
The stratified sampling technique divides the entire population into relatively homogenous groups that are mutually exclusive and exhaustive. Samples are then drawn from each of these groups (strata) by simple random sampling or an alternate method. The entire sample is a compilation of these independent samples from each of the strata.
In stratified sampling, an estimate of the population mean can be made for each of the strata.
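Estimate of population mean:
$$\bar{y}_{st} = \frac{1}{N} \sum_{k=1}^{L} N_k \bar{y}_k,$$
where $N_k$ is the population size of stratum $k$ and $L$ is the number of strata into which the population is divided; $N = \sum_{k=1}^{L} N_k$ is the total population size and $\bar{y}_k$ is the sample mean in stratum $k$.
If a simple random sample is taken within each stratum (recall that other schemes can be used to draw a sample from each of the strata), the following represents an unbiased estimate of the variance of $\bar{y}_{st}$:
$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2} \sum_{k=1}^{L} N_k^2 \left(1 - \frac{n_k}{N_k}\right) \frac{s_k^2}{n_k},$$
where $n_k$ is the sample size and $s_k^2$ is the sample variance in stratum $k$.
The standard error of the estimator is the square root of this estimated variance, or
$$SE(\bar{y}_{st}) = \sqrt{\hat{V}(\bar{y}_{st})}.$$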
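The stratified estimator and its standard error can be sketched in Python as follows. This is a minimal sketch that assumes simple random sampling within each stratum and uses the textbook formulas with a finite population correction; the function name and the data are hypothetical and are not NAWS values.

```python
import math
from statistics import mean, variance


def stratified_mean_and_se(samples, stratum_sizes):
    """samples: dict mapping stratum -> list of sampled values.
    stratum_sizes: dict mapping stratum -> population size N_k.
    Returns (estimated population mean, standard error)."""
    total_size = sum(stratum_sizes.values())
    estimate = 0.0
    var = 0.0
    for stratum, values in samples.items():
        size_k = stratum_sizes[stratum]
        n_k = len(values)
        share = size_k / total_size
        estimate += share * mean(values)
        fpc = 1 - n_k / size_k  # finite population correction
        var += share ** 2 * fpc * variance(values) / n_k
    return estimate, math.sqrt(var)


# Hypothetical two-stratum example.
samples = {"A": [10.0, 12.0, 14.0], "B": [20.0, 22.0]}
stratum_sizes = {"A": 30, "B": 20}
estimate, se = stratified_mean_and_se(samples, stratum_sizes)
```

Each stratum contributes to the variance in proportion to the square of its population share, so a large, heterogeneous stratum dominates the standard error.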
At the second stage of the sampling design, within each stratum, counties (or groups of counties) are treated as clusters.
The following description is another excerpt from Obenauf¹.
The population is again divided into exhaustive, mutually exclusive subgroups and samples are taken according to this grouping. Once the population has been appropriately divided into clusters, one or more clusters are selected … to comprise the sample. There are several methods of estimating the population mean for a cluster sample. The method most pertinent to this study is that involving cluster sampling proportional to size (PPS).
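With PPS sampling, the probability ($z_j$) that a cluster $j$ is chosen on a specific draw is given by $z_j = M_j / M$, where $M_j$ is the size of the $j$th cluster and $M$ is the population size. An unbiased estimate of the population total is given by
$$\hat{Y}_{PPS} = \frac{1}{n} \sum_{j=1}^{n} \frac{y_j}{z_j} = \frac{M}{n} \sum_{j=1}^{n} \bar{y}_j = M \bar{\bar{y}},$$
where $y_j$ is the sample total for $y$ in the $j$th cluster, $\bar{y}_j = y_j / M_j$ is the corresponding cluster mean, $n$ is the number of clusters in the sample and $\bar{\bar{y}}$ represents the average of the cluster means.
To estimate the population mean, this estimate must be divided by $M$, the population size.
The variance of the estimator of the population total is given by
$$V(\hat{Y}_{PPS}) = \frac{1}{n} \sum_{j} z_j \left(\frac{y_j}{z_j} - Y\right)^2,$$
where the sum runs over all clusters in the population and $Y$ is the population total.
This is estimated by $\hat{V}(\hat{Y}_{PPS}) = M^2 s^2 / n$, where $s^2$ is the sample variance of the $\bar{y}_j$ values.
For an estimate of the population mean,
$$\hat{\bar{Y}}_{PPS} = \bar{\bar{y}} \quad \text{and} \quad \hat{V}(\hat{\bar{Y}}_{PPS}) = \frac{s^2}{n}.$$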
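The PPS estimators can be sketched in Python under the assumption that clusters are drawn with replacement with probability proportional to size, in which case the estimates depend only on the sampled cluster means. The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance


def pps_estimates(cluster_means, population_size):
    """cluster_means: mean of y in each sampled cluster.
    population_size: total number of elements M in the population.
    Returns (estimated total, estimated mean, standard error of the mean)."""
    n = len(cluster_means)
    if n < 2:
        raise ValueError("at least two sampled clusters are needed")
    mean_of_means = mean(cluster_means)
    total = population_size * mean_of_means
    s2 = variance(cluster_means)  # sample variance of the cluster means
    return total, mean_of_means, math.sqrt(s2 / n)


# Hypothetical example: three sampled clusters, population of 1,000 workers.
total, pop_mean, se_mean = pps_estimates([8.0, 9.0, 10.0], 1000)
```

Because selection probabilities are proportional to cluster size, the cluster means enter the estimate unweighted; the size information is already accounted for by the selection step.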
In two-stage cluster sampling, the estimated variance of the estimator is then given by an iterative formula:
.
This iterative formula is then generalized to compute the variance of the estimators in multi-stage sampling schemes with three or more levels. Exact formulas become intractable at this point, and the various statistical software packages rely upon either re-sampling methodology or linear approximations in order to estimate the variances and standard errors of the estimators.
The following is an excerpt from the SAS documentation for PROC SURVEYMEANS².
The SURVEYMEANS procedure produces estimates of survey population means and totals from sample survey data. The procedure also produces variance estimates, confidence limits, and other descriptive statistics. When computing these estimates, the procedure takes into account the sample design used to select the survey sample. The sample design can be a complex survey sample design with stratification, clustering, and unequal weighting.
PROC SURVEYMEANS uses the Taylor expansion method to estimate sampling errors of estimators based on complex sample designs. This method obtains a linear approximation for the estimator and then uses the variance estimate for this approximation to estimate the variance of the estimate itself (Woodruff 1971, Fuller 1975)³,⁴.
SAS (e.g., Proc Surveymeans) allows the user to specify the details of the first two stages of a complex sampling plan. In the present case, the stratification and clustering at the first two levels are specified in Proc Surveymeans (strata region; cluster FLA). At the lower levels of the sampling scheme, the design attempts to mimic, as closely as is practical, simple random sampling. The software is not able to calculate exact standard errors, since it presumes true simple random sampling beyond the first two levels. The sampling weights will remedy any differences in selection probabilities, so that the estimators will be unbiased. The standard errors, however, are only approximate; the within-cluster variances at stages beyond the first two are assumed to be negligible.
In the “Surveymeans” procedure, the STRATA, CLUSTER, and WEIGHT statements are used to specify the variables containing the stratum identifiers, the cluster identifiers, and the variable containing the individual weights.
For the NAWS, the STRATA are defined as the cycle/region combinations used for the first level of sampling. The CLUSTER statement contains the primary sampling unit, which is the FLA. The variable for FLA is county_cluster.
The WEIGHT statement references a variable that is, for each observation i, the product of the sampling weight Wti and the non-response weight PWTYCRDi. This variable is called pwtycrd for historical reasons.
The Surveymeans procedure also allows for a finite population correction. This correction is requested with the TOTAL= option on the PROC statement, which supplies the total number of PSUs in each stratum. SAS then determines the number of PSUs selected per stratum from the data and calculates the sampling rate. In cases such as the NAWS, where the sampling rate differs across strata, the TOTAL= option references a data set that contains information on all the strata and a variable _TOTAL_ that contains the total number of PSUs in each stratum.
We include here sample code for Proc Surveymeans to calculate the standard errors for our key estimator WAGET1.
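proc surveymeans data=naws.crtdvars total=naws.regioninfo;  /* total= data set holds _TOTAL_ PSU counts per stratum */
strata dmaregn cycle;    /* region-by-cycle strata */
cluster county_cluster;  /* FLA, the primary sampling unit */
var waget1;              /* average hourly wage */
weight pwtycrd;          /* composite observation weight */
run;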
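For readers without SAS, the linearization idea can be sketched in Python. This is a minimal sketch of the standard with-replacement Taylor linearization for a weighted mean under a stratified, clustered design; it omits the finite population correction, is not the SAS implementation, and uses a hypothetical function name and hypothetical data.

```python
import math
from collections import defaultdict


def linearized_mean_se(rows):
    """rows: iterable of (stratum, cluster, weight, y) tuples.
    Returns (weighted mean, Taylor-linearized standard error)."""
    rows = list(rows)
    w_total = sum(w for _, _, w, _ in rows)
    estimate = sum(w * y for _, _, w, y in rows) / w_total

    # Sum the linearized values within each PSU (cluster), grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        psu_totals[stratum][cluster] += w * (y - estimate) / w_total

    var = 0.0
    for clusters in psu_totals.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # a single-PSU stratum adds no variance in this sketch
        mean_h = sum(clusters.values()) / n_h
        var += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in clusters.values())
    return estimate, math.sqrt(var)


# Hypothetical data: two strata, two clusters each, unit weights.
rows = [
    ("s1", "a", 1.0, 8.0),
    ("s1", "b", 1.0, 10.0),
    ("s2", "c", 1.0, 9.0),
    ("s2", "d", 1.0, 9.0),
]
estimate, se = linearized_mean_se(rows)
```

Only the variation between PSU totals within each stratum enters the variance, which mirrors the point made above that variability at stages below the first two is not modeled separately.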
Precision of key estimators
Two of the many variables of interest are FWRDAYS, which is the number of days worked per year by a respondent, and WAGET1, which is the average hourly wage of a respondent.
Based on data collected in 2005, and applying the weights that were revised as part of the 2009 independent evaluation (see below under Statistical Consultation), the 2-standard-error confidence interval for the first variable, FWRDAYS, was 183 ± 8.5. That is, with approximately 95% confidence, the average number of days annually worked, per person, lies between 174.5 and 191.5. This constitutes a margin of error of ±4.6% of the estimated value.
For the second variable, the average wage (WAGET1), the interval is 7.95 ± 0.21. With approximately 95% confidence, the average wage lies between $7.74 and $8.16. This yields a margin of error of ±2.6% of the estimated value.
There are numerous other variables of interest, whose standard errors vary greatly. These two are offered as examples that show some of the range of possible precisions obtained.
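The arithmetic behind the two intervals can be checked with a short Python sketch. The function name is hypothetical; the inputs are the estimates and two-standard-error half-widths reported above.

```python
def two_se_interval(estimate, half_width):
    """Return (lower bound, upper bound, margin of error as a percent of the estimate)."""
    lower = estimate - half_width
    upper = estimate + half_width
    relative_margin_pct = 100 * half_width / estimate
    return lower, upper, relative_margin_pct


days = two_se_interval(183, 8.5)    # FWRDAYS
wage = two_se_interval(7.95, 0.21)  # WAGET1
```

Rounded to one decimal place, the relative margins are 4.6 percent for FWRDAYS and 2.6 percent for WAGET1, matching the figures in the text.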
4. Tests
The questionnaire to be used in the survey was developed by the DOL with input from various Federal agencies. Apart from adding the Environmental Protection Agency-sponsored questions on the amount of time per day a worker is engaged in a particular crop and task, and on clothes laundering and personal hygiene practices, the questionnaire will be unchanged from the version that OMB approved in the last submission. The majority of the questions have been used for over twenty years, are well understood by the sampled respondents, and the data they provide are of high quality.
5. Statistical Consultation
The following individuals have been consulted on statistical aspects of the survey design:
Stephen Reder and Robert Fountain, Professors, Portland State University, (503) 725-3999 and 503-725-5204; Phillip Martin, Professor, University of California at Davis (916) 752-1530; Jeff Perloff, Professor, University of California at Berkeley (510) 642-9574; and John Eltinge, the Bureau of Labor Statistics (BLS) (202) 691-7404.
In 2009, Mathematica Policy Research, Inc. researchers Daniel Kasprzyk, Ph.D., Frank Potter, Ph.D., and Steve Williams, (609) 799-3535, evaluated the equations for the survey’s sampling weights and the impact of the weights on key national-level findings.
The data will be collected under contract to the ETA by JBS International, Aguirre Division (650) 373-4900. Analysis of the data will be conducted by Daniel Carroll, ETA (202) 693-2795, and by JBS International, Aguirre Division.
Appendix B: Contacting and Selecting Farm Workers
A FARM WORKER QUALIFIES TO PARTICIPATE IN THE NAWS (ELIGIBLE), IF HE/SHE …
WORKS IN any type of crop agriculture in the United States. This includes “crops” produced in nurseries.
WORKS IN the production of plants or flowers (including work done in nurseries like planting, cultivating, fertilizing, grafting and seeding).
has worked in the last 15 days, at least 4 hours per day, for the contacted employer, and meets any of the criteria mentioned above.
A WORKER CANNOT PARTICIPATE IN THE NAWS (INELIGIBLE) IF HE or SHE:
Was interviewed by NAWS within the last 12 months in the same location.
Is an “H-2A worker.” H-2A is a program similar to the “braceros”. An H-2A worker is a foreigner who is in the United States on a temporary work visa to work for a specific agricultural employer or association of agricultural employers for a specific period of time (less than a year). At the end of the period, the worker returns to his/her respective country.
Works exclusively with livestock (animals such as bees, horses, fish, pigs, cows, etc.).
Hasn’t worked for the contacted employer at least one day for 4 hours or more in the last 15 days.
Does “non-farm work” for the employer (mechanic, sales, office, etc).
Is a family member of the employer and doesn’t draw a salary like other farm workers.
Is the employer or contractor.
Is a sharecropper that makes all operational decisions such as when, where and how to plant, harvest, etc.
Works for a packing house or cannery (packing or canning agricultural products) outside of the ranch. Note: Workers who are packers or canners can be eligible for the NAWS study if they satisfy the following two requisites:
the canning or packing plant is adjacent or located on the farm, AND
at least 50 percent of the produce being packed or canned originated from the ranch of the contacted employer.
Works for a landscaping company that just sells, installs, maintains or preserves trees or plants; this includes the planting of ornamental plants and placement of sod.
Whenever a worker doesn’t qualify to participate, be gracious and thank him/her for their time and proceed to the next worker.
NUMBER OF INTERVIEWS PER EMPLOYER
The Employer List indicates the total number of interviews allocated for your assigned county. The total county allocation can NEVER be completed by interviewing workers from one single employer. If this appears likely to happen, call the office for instructions.
Refer to the table below, and find the number of interviews per employer based on the number of workers at the employer on the day visited.
Number of workers | Number of interviews
1-2 | 1
3-6 | 2
7-12 | 3
13-20 | 4
21-30 | 5
31-42 | 6
43-56 | 7
57-72 | 8
73-90 | 9
91-110 | 10
111-132 | 11
133 or more | 12
Note: Sample the allocated number of workers at the employer (interviewing those that agree to participate) and if the county allocation is not complete, continue onto the next employer. At the last employer complete the number of interviews allocated to that employer on the chart – EVEN IF YOU EXCEED THE COUNTY ALLOCATION.
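The interviews-per-employer table can be expressed as a simple lookup. The following Python sketch encodes the table's upper bounds; the function and constant names are hypothetical.

```python
# (largest number of workers in the band, number of interviews), from the table.
INTERVIEW_BANDS = [
    (2, 1), (6, 2), (12, 3), (20, 4), (30, 5), (42, 6),
    (56, 7), (72, 8), (90, 9), (110, 10), (132, 11),
]
MAX_INTERVIEWS = 12  # applies to 133 or more workers


def interviews_for_employer(n_workers: int) -> int:
    """Number of interviews allocated to an employer with n_workers on the day visited."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    for upper_bound, interviews in INTERVIEW_BANDS:
        if n_workers <= upper_bound:
            return interviews
    return MAX_INTERVIEWS
```

For example, an employer with 25 workers on the day visited falls in the 21-30 band and is allocated 5 interviews.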
LOCATING THE WORKERS
Once you get permission from the employer (and you have documented the number of employed workers) ask the employer where you can find the workers. If they are in different locations ask the employer: “how many workers are in each location?” Also ask the employer (or supervisor assigned by employer) for the best time and location to meet with them.
WORKERS’ LOCATIONS
The best time to contact workers
Unless the employer gives you permission to speak with his/her employees during working hours, do not make any contacts or appointments or try to interview the workers during their work hours.
Changing work locations
Once the employer gives you permission to contact the workers, try to complete your contacts and interviews on the same day the employer gave you permission. You should be aware that from day to day it is common to find that workers in the field change location; and new workers can be in the same field on a different day.
The location of the field is not in the assigned county
If the location of the field or operation of the farm is located outside of the designated county, you cannot interview those workers. The farm workers must be physically working in the NAWS assigned county for the particular cycle. Keep in mind that the same employer may have farm land and workers in two different counties.
HOW TO CHOOSE ELIGIBLE WORKERS FOR THE STUDY
Selecting workers located in different areas
If the employer informs you that his employees are distributed over more than one field/crew (in the same county), do the following. Use the table below to identify the number of crews and then randomly select the crews.
Number of crews | Number to select randomly
1 to 2 | 1
3 to 6 | 2
7 or more | 3
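The crew table and the random draw can be sketched in Python as follows. The function names and crew labels are hypothetical; in the field the random draw is carried out by the interviewers, and the code only illustrates that every crew has the same chance of selection.

```python
import random


def crews_to_select(n_crews: int) -> int:
    """Number of crews to draw at random, following the table above."""
    if n_crews < 1:
        raise ValueError("n_crews must be at least 1")
    if n_crews <= 2:
        return 1
    if n_crews <= 6:
        return 2
    return 3


def select_crews(crew_names, rng=None):
    """Draw the required number of crews without replacement."""
    rng = rng or random.Random()
    crew_names = list(crew_names)
    return rng.sample(crew_names, crews_to_select(len(crew_names)))


# Hypothetical employer with four crews: two are drawn.
picked = select_crews(["Crew 1", "Crew 2", "Crew 3", "Crew 4"], random.Random(0))
```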
Once you have selected the crews, use the proportional formula, below, to calculate how many from each field/crew you need to interview. The same proportional formula should be used if you locate workers in different residences. For example, if the workers live in two different labor camps or housing units, find out how many live in each dwelling and calculate proportionately how many you should interview from each dwelling.
Proportional selection of workers
When you find that workers are divided into different areas, randomly sampling from each group will be necessary to maintain equal likelihood of selection for everyone. The following formula serves as a guide to calculate the number of workers that should be selected when you find that workers are divided into different areas. In this example, there are 3 sampled fields and you are allowed to conduct 12 interviews for this employer.
a | b | c
Number of workers per location | Number of workers per location ÷ Total of workers | Percentage × total number of interviews (12)
Field A = 20 | 20 ÷ 30 = 66.6% | .666 x 12 = 08 interviews
Field B = 05 | 05 ÷ 30 = 16.6% | .166 x 12 = 02 interviews
Field C = 05 | 05 ÷ 30 = 16.6% | .166 x 12 = 02 interviews
Workers total = 30 | | Total = 12 interviews
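The proportional calculation in the table can be sketched in Python. The manual only shows a case where the rounded shares add up exactly; the largest-remainder rounding used here to guarantee that the allocations always sum to the employer total is an assumption, and the function name is hypothetical.

```python
def proportional_allocation(workers_by_location, total_interviews):
    """Split total_interviews across locations in proportion to worker counts.
    Uses integer arithmetic and largest-remainder rounding (an assumption)."""
    total_workers = sum(workers_by_location.values())
    if total_workers <= 0:
        raise ValueError("there must be at least one worker")
    allocation = {}
    remainders = {}
    for location, workers in workers_by_location.items():
        allocation[location], remainders[location] = divmod(
            workers * total_interviews, total_workers
        )
    # Hand out any interviews lost to rounding, largest remainder first.
    leftover = total_interviews - sum(allocation.values())
    for location in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[location] += 1
    return allocation


# The example from the table: 30 workers in three fields, 12 interviews.
fields = {"Field A": 20, "Field B": 5, "Field C": 5}
allocation = proportional_allocation(fields, 12)
```

With the table's inputs the result is 8 interviews in Field A and 2 each in Fields B and C, as in column c.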
Random Selection
As a sample of workers from an employer is needed, the workers are to be chosen at random. All eligible workers of the employer must have an equal chance of being chosen. Everyone has a chance when selecting crews. Then everyone in the selected crews must have an equal chance of selection. The following are the instructions provided to interviewers:
Random Sampling Instructions for NAWS sampled worksites
Before you go to the site, make sure you have:
A set of tags with colored stickers on them (at least 12 for each site you expect to visit)
A set of tags with no stickers (at least 50 for each site you expect to visit)
A bag (or some other dark container to use to hand out the tags, so that workers can pull the tags without seeing what they’re getting)
Sufficient supplies to carry out surveys with the workers that are selected
A Sampling Tracking Sheet for each site you expect to visit
Once you have gotten permission from the employer to interview, identify the number of workers on site for that day. Record that number in Line 1 on the Sampling Tracking Sheet.
NOTE-If the number of workers on the site is less than or equal to the cluster, skip the sampling process and ask all workers to complete the interview. Record the number of workers asked to interview on Line 6 of the Sampling Tracking Sheet and the number completing interviews on Line 7. Leave lines 2-5 blank.
NOTE-for any of these approaches, if any sampled workers refuse the interview- DO NOT REPLACE THEM- move on to the next employer if additional interviews are needed to complete the cluster allocation.
Use the chart above to determine the correct number of interviews to be done; this will be the same number of stickered tags to put into the bag. Record the number of stickered tags you put in the bag on Line 2 on the Sampling Tracking Sheet.
Next, put enough tags without stickers into the bag so that the total number of tags in the bag equals the number of workers at the site. (For example, if there are 20 workers at the site, and you put 5 stickered tags in the bag, then add another 15 tags.) Record the number of unstickered tags you put in the bag on Line 3 on the Sampling Tracking Sheet.
One interviewer will go around to each worker and have them pull a tag from the bag, while the other speaks to the group.
At the end of the introduction, the speaker will ask everyone to look at their tags, and ask those who have stickers to come up. Record the number of workers who come up to you with stickered tags, who you ask for an interview on Line 6 on the Sampling Tracking Sheet.
Carry out the interviews and record the completed number of interviews on Line 7 of the Sampling Tracking Sheet.
Continue, using the same bag, until you’ve talked to all workers in the group.
When you have time, count the number of tags left in the bag (if any) and record this number on Line 4 in the Sampling Tracking Sheet. Count the number of stickered tags left in the bag (if any) and record this number in Line 5 in the Sampling Tracking Sheet.
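The tag procedure amounts to a simple random sample without replacement. The following Python sketch simulates it under the assumption that every worker on site pulls exactly one tag; the function name is hypothetical, and workers are identified only by their position in the group.

```python
import random


def draw_tags(n_workers, n_stickered, rng=None):
    """Simulate the bag of tags: return the positions of the workers who
    pull a stickered tag. If there are no more workers than stickered tags,
    everyone is asked to interview and no sampling takes place."""
    if n_stickered >= n_workers:
        return list(range(n_workers))
    rng = rng or random.Random()
    # One tag per worker: True marks a stickered tag.
    tags = [True] * n_stickered + [False] * (n_workers - n_stickered)
    rng.shuffle(tags)
    return [worker for worker, stickered in enumerate(tags) if stickered]


# The example from the instructions: 20 workers, 5 stickered tags, 15 plain tags.
selected = draw_tags(20, 5, random.Random(1))
```

Because the bag holds exactly one tag per worker, each worker has the same probability (here 5 in 20) of being selected.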
Sampling Tracking Sheet
County Name_______________________ Date Visited_______
Employer/Farm name_________________________ Employer ID______
Allocation (circle 1) 5 8 10 12
Line | Number of: |
1 | Workers (from employer) |
2 | Stickered tags put in bag(s) |
3 | Unstickered tags put in bag(s) (lines 2+3 should equal line 1) |
4 | Tags left in bag(s) at end (after all groups/after all workers have been offered a tag) |
5 | Stickered tags left |
6 | Workers asked for interview (“contacted” in current system) |
7 | Workers completing interview |
Was there more than one crew: ___ YES ___NO
If yes:
How many crews: _____ How many in each crew (list):
_________________________________________
From how many crews did you “randomly” select workers (list):
____
REFERENCES
Fuller, W.A. (1975), “Regression Analysis for Sample Survey,” Sankhyā, 37, Series C, Pt. 3, 117-132.
Obenauf, W. (2003), “An Application of Sampling Theory to a Large Federal Survey,” Portland State University, Department of Mathematics and Statistics.
SAS Institute Inc., SAS/STAT® User’s Guide, Version 8, Cary, NC: SAS Institute Inc., 1999, 61, 3.
Woodruff, R. S. (1971), “A Simple Method for Approximating the Variance of a Complicated Estimate,” Journal of the American Statistical Association, 66, 411–414.
1 Obenauf, W. (2003), “An Application of Sampling Theory to a Large Federal Survey,” Portland State University Department of Mathematics and Statistics.
2 SAS Institute Inc., SAS/STAT® User’s Guide, Version 8, Cary, NC: SAS Institute Inc., 1999, 61, 3.
3 Woodruff, R. S. (1971), “A Simple Method for Approximating the Variance of a Complicated Estimate,” Journal of the American Statistical Association, 66, 411–414.
4 Fuller, W. A. (1975), “Regression Analysis for Sample Survey,” Sankhyā, 37, Series C, Pt. 3, 117–132.
Author | carroll.daniel.j |
File Created | 2021-01-28 |