How can I access the questionnaires used for specific country surveys? Can I access the raw data?
Registered users can download the data and the corresponding survey questionnaire for each country from the Enterprise Surveys Data Portal. Both registration and data downloads are free.
Which aggregation level do you use for the indicators on your website?
Indicators are computed from weighted data (weight=w_median). For each country and survey question, the indicator is the weighted average across firms (or, for yes/no questions, the weighted percentage of firms that answered 'Yes'). Some website indicators combine two survey questions. The website also lets users view indicators by strata variables (firm size, geographic location within a country) and by a few ex-post variables (exporter status, foreign ownership, gender of top manager). Some indicators are based on questions asked only of manufacturing firms. The Methodology page lists all website indicators. Note that when indicators are presented for a broad geographic region or income group (e.g. the Africa region or "Upper Middle Income"), the website reports a simple average of the relevant country-level indicators that are available.
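The two-stage aggregation described above (a weighted figure per country, then an unweighted average across countries for a region or income group) can be sketched in Python as follows. This is an illustration under assumptions, not the Enterprise Surveys' production code: the record layout and the keys 'country', 'weight' and 'answer' are hypothetical, and missing answers (None) are simply excluded.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Weighted average that skips missing (None) answers."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_w = sum(w for _, w in pairs)
    if total_w <= 0:
        raise ValueError("no non-missing answers with positive weight")
    return sum(v * w for v, w in pairs) / total_w


def weighted_pct_yes(answers, weights):
    """Weighted percentage of firms answering 'Yes' (missing answers excluded)."""
    flags = [None if a is None else float(a == "Yes") for a in answers]
    return 100.0 * weighted_mean(flags, weights)


def country_indicators(firms):
    """firms: dicts with hypothetical keys 'country', 'weight' and 'answer'."""
    grouped = defaultdict(lambda: ([], []))
    for firm in firms:
        answers, weights = grouped[firm["country"]]
        answers.append(firm["answer"])
        weights.append(firm["weight"])
    return {c: weighted_pct_yes(a, w) for c, (a, w) in grouped.items()}


def group_average(indicators, countries):
    """Simple (unweighted) average over the countries that have an indicator."""
    available = [indicators[c] for c in countries if c in indicators]
    if not available:
        raise ValueError("no country-level indicators available")
    return sum(available) / len(available)


firms = [
    {"country": "A", "weight": 3.0, "answer": "Yes"},
    {"country": "A", "weight": 1.0, "answer": "No"},
    {"country": "B", "weight": 2.0, "answer": "No"},
    {"country": "B", "weight": 2.0, "answer": "Yes"},
]
indicators = country_indicators(firms)
print(indicators)                                  # {'A': 75.0, 'B': 50.0}
print(group_average(indicators, ["A", "B", "C"]))  # 62.5 (no data for "C")
```

Because the group figure is a simple average, a small country counts as much as a large one, and countries without the indicator are just left out.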
What types of questions are asked in the surveys?
The Methodology page has the most recent versions of the global questionnaires. The Enterprise Survey covers a wide range of business environment topics, including general business characteristics, infrastructure and services, sales and supplies, access to finance, degree of competition, land, crime, business-government relations, investment climate constraints, labor, and productivity. There are manufacturing-specific questions as well as a few retail-specific questions. Every Enterprise Survey is also customized, in collaboration with economists in the World Bank's regional departments, to include country-specific (or region-specific) questions. Most questions are objective, aimed at measuring the quality of the business environment and the experience of firms. Fewer than 10% of the questions are subjective, that is, they ask for the respondent's opinion. Most answers take one of the following forms: yes/no, a percentage or monetary amount, the number of days required to obtain a service, the number of times a particular event has occurred, or a rating on a 5-point Likert scale.
What is the difference between the country data and the standardized data?
Country data include all questions asked in a survey but may lack comparability across countries and years. Standardized data are country data that have been matched to a standard set of questions. This format allows cross-country comparisons and analysis, but it sacrifices the country-specific questions that cannot be matched. The standardization process also requires certain compromises in order to match some of the variables. We therefore encourage users to pay close attention to the actual wording of the survey questions and to use the raw country datasets for their analysis.
What monetary values and units are used in the surveys?
For most countries, monetary values are reported in local currency units. Reported costs and sales figures are not adjusted for inflation. When downloading raw data from the portal, users are encouraged to also download the accompanying survey questionnaires and documentation, which give the exact wording of each survey question.
I am interested in occupational safety and health-related questions. Why are they not included in your surveys?
Unfortunately, some questions are not included because the questionnaire is already quite long. An interview with the current set of questions takes about an hour, and adding new questions may increase both item and unit non-response. If you have suggestions for improving existing questions or adding new ones, please email them to us and we will consider them for potential inclusion in future surveys.
The indicator "% of Women in Senior Positions" was in the old indicator list but is not in the most recent one. Has this indicator been replaced?
The indicator "% of Women in Senior Positions" has been replaced by the indicator "% of Female Permanent Full-Time Non-Production Workers". We have also added a new indicator: "% of Firms with Female Top Manager" (question B.7a).
What is the correct way to use the weights in the full data?
The weights in the more recent Enterprise Surveys data are probability weights. Using them allows inference about the population of non-agricultural private firms in a country (those that meet the Enterprise Surveys eligibility criteria). In Stata, declare the survey design before performing any analysis, using this command:
svyset idstd [pweight=wt], strata(strata) singleunit(scaled)
Then use the survey (svy) commands to calculate any statistic meant to describe the population of non-agricultural private firms. For statistics about specific types of firms, use Stata's subpopulation option, subpop().
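To show what the subpopulation option changes, here is a Python sketch (not Stata's implementation) of a weighted mean for a subset of firms with a stratified, linearized (Taylor-series) standard error, treating each firm as its own sampling unit, as `svyset idstd` does. Assumptions: no finite-population correction, and every stratum has at least two firms, so Stata's `singleunit(scaled)` adjustment for one-firm strata is not reproduced. `subpop_mean` is a hypothetical name. Firms outside the subpopulation stay in the variance calculation with a score of zero; dropping them first leaves the point estimate unchanged but can change the standard error.

```python
import math
from collections import defaultdict


def subpop_mean(y, w, strata, in_subpop):
    """Weighted mean of y within a subpopulation, with a linearized standard error.

    Hypothetical helper. Every firm stays in the variance calculation; firms
    outside the subpopulation contribute a score of zero.
    """
    d = [1.0 if flag else 0.0 for flag in in_subpop]
    n_hat = sum(wi * di for wi, di in zip(w, d))  # estimated subpopulation size
    if n_hat <= 0:
        raise ValueError("empty subpopulation")
    est = sum(wi * yi for wi, yi, di in zip(w, y, d) if di) / n_hat

    # Linearized scores for the ratio estimator sum(w*d*y) / sum(w*d),
    # grouped by stratum; each firm is its own sampling unit.
    scores = defaultdict(list)
    for yi, wi, di, h in zip(y, w, d, strata):
        scores[h].append(wi * (yi - est) / n_hat if di else 0.0)

    var = 0.0
    for h, z in scores.items():
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single firm")
        z_bar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return est, math.sqrt(var)


est, se = subpop_mean([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0, 2.0], [1, 1, 2, 2],
                      [True, False, True, False])
print(round(est, 4), round(se, 4))  # 2.3333 0.6285
```

In this toy example, deleting the non-member firms before estimation would leave one firm per stratum, and the sketch would raise an error instead of returning a standard error.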
Do you provide a correspondence table between the question numbers in some countries' questionnaires (e.g. A0, A1, B1, B2…) and the variables in Stata data files (e.g. idstd, a0, a1, a2…)?
We do not provide such correspondence tables, but you can build one by hand by matching the questionnaires available on the Portal to the variables in the data files. However, the Data Portal does offer standardized datasets spanning 2002-2005 and 2005-present that contain a core set of matched variables.
How is a firm's business activity classified? Are standard industry/sector codes used?
Most surveys conducted after 2006 include the ISIC Rev. 3.1 industry code, a 4-digit code describing the firm's business activity (question D.1a2). Question D.1a1 contains a text description of the firm's main product. See the raw data on the Enterprise Surveys Data Portal for details.
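ISIC Rev. 3.1 is hierarchical: the first two digits of a 4-digit class code give the division and the first three give the group (divisions are in turn grouped under lettered sections, which this sketch ignores). The hypothetical Python helper below derives the coarser levels from a D.1a2-style code, for example to tabulate firms by division. It assumes the code may arrive as an integer, a float such as 1711.0, or a string, and that numeric storage may have dropped a leading zero; negative codes (e.g. non-response codes) are rejected.

```python
def isic_levels(code):
    """Split a 4-digit ISIC Rev. 3.1 class code into (division, group, class).

    Accepts ints, floats such as 1711.0 (as numeric columns are often read),
    or strings. Numeric storage may drop a leading zero, so the code is padded.
    """
    if isinstance(code, float) and code.is_integer():
        code = int(code)
    text = str(code).strip()
    if not text.isdigit() or len(text) > 4:
        raise ValueError(f"not a 4-digit ISIC class code: {code!r}")
    text = text.zfill(4)
    return text[:2], text[:3], text


print(isic_levels(1711))    # ('17', '171', '1711')
print(isic_levels(111))     # ('01', '011', '0111')
print(isic_levels("5211"))  # ('52', '521', '5211')
```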
Where can I find the labels for region variables?
When the sampling methodology calls for stratification by geographic location within a country, the region labels are either stored in the dataset itself or listed, together with their coding scheme, in the questionnaire. The implementation report describes how many interviews were conducted in each geographic location.
How is "principal ownership" defined in the Enterprise Surveys?
The term "principal owner" is used only in older surveys, in questions about female participation in ownership. To avoid ambiguity, newer surveys use different wording: "Amongst the owners of the firm, are there any females?"
I am having trouble opening the data even though I have Stata. Why might this be?
You may be using an older version of Stata (we currently use Stata/SE v.13). Some of the large datasets also require Stata to have sufficient memory available. Note that Stata cannot open a zipped file; extract it to another location first. If you still have problems, try downloading the data again in case the file was corrupted during the download.
What is the format of the data and can it be converted into other formats?
The data are in Stata 13 format and can be converted into other formats, such as SPSS, SAS, or Access, using a translation program (e.g., Stat/Transfer or DBMS). However, some data attributes (e.g., labels) may be lost in some non-native formats.
What is the unit of measurement for the indicator "Average total time of power outages per month"? What is its relationship with "Average duration of power outages (hours)"?
The indicator "average total time of power outages per month" is the average total number of hours per month that firms are without power. At the firm level, the number of outages is multiplied by the average duration of an outage; these firm-level totals are then averaged at the country level. By contrast, "average duration of power outages (hours)" is the average length of a single power outage.
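The order of operations matters: averaging firm-level products is not the same as multiplying country-level averages. The Python sketch below illustrates this with two hypothetical firms. The input lists (outages per month, average hours per outage, and survey weights, one entry per firm) are assumptions for illustration; which firms enter the website calculation is not specified here.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def outage_indicators(n_outages, avg_hours, weights):
    """Return (average total hours without power per month, average outage duration).

    n_outages: outages per month for each firm; avg_hours: that firm's average
    outage duration in hours. Hypothetical inputs, one entry per firm.
    """
    total_hours = [n * h for n, h in zip(n_outages, avg_hours)]  # firm-level product
    return weighted_mean(total_hours, weights), weighted_mean(avg_hours, weights)


# Two equally weighted firms: 10 outages of 1 hour, and 1 outage of 10 hours.
total, duration = outage_indicators([10, 1], [1.0, 10.0], [1.0, 1.0])
print(total, duration)  # 10.0 5.5
# Multiplying the country averages instead would give 5.5 * 5.5 = 30.25 hours.
```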
Why are some variables not available in the standardized data sets?
Because questionnaire design and survey methodology have evolved over time, some variables are not available for certain countries. Datasets that pool several countries (especially across different regions) will likely lack the variables that come from country-specific questions.
Why are the summary statistics results I calculated different from what you presented on your website?
The indicators shown on our website are computed using sampling weights, which are available in most datasets. Some datasets have more than one weight, each calculated under different assumptions about the data; in those cases we use the median weights. To use the sampling weights, apply Stata's survey commands, as demonstrated below:
svyset idstd [pweight=wmedian], strata(strata) singleunit(scaled)
svy: tab k8 if a1==79
Note that the weight variable may be called 'wt', 'weight', 'w_median', etc., depending on the country and dataset. Another way to obtain the same weighted percentages is shown below. Many Stata commands do not support pweights, but you can often work around this by specifying iweights instead; the point estimates match, but standard errors and test statistics from such commands do not account for the survey design:
tab k8 if a1==79 [iw=wmedian]
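Both Stata approaches above report the same weighted shares: each answer's percentage is its total weight divided by the total weight of all non-missing answers in the selected rows. A minimal Python sketch of that computation follows; the function name and the toy values for k8, wmedian and a1 are hypothetical, and it covers only the point estimates, not the design-based standard errors that the svy version adds.

```python
from collections import Counter


def weighted_tab(answers, weights, keep):
    """Weighted percentage of each answer among kept, non-missing observations."""
    totals = Counter()
    for answer, weight, kept in zip(answers, weights, keep):
        if kept and answer is not None:
            totals[answer] += weight
    grand = sum(totals.values())
    if grand <= 0:
        raise ValueError("no kept observations with positive weight")
    return {cat: 100.0 * t / grand for cat, t in sorted(totals.items())}


k8 = [1, 2, 1, None, 2]              # hypothetical answers
wmedian = [1.0, 5.0, 2.0, 4.0, 2.0]  # hypothetical weights
a1 = [79, 79, 79, 79, 12]            # hypothetical country codes
print(weighted_tab(k8, wmedian, [c == 79 for c in a1]))  # {1: 37.5, 2: 62.5}
```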
Note that for cross-country comparability, we remove outliers, defined as values more than 3 standard deviations from the mean, for some of the continuous, unbounded variables. As a result, website summary statistics may differ from what you compute on your own. For more information, see the Methodology page.
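If you want to approximate this step in your own calculations, a 3-standard-deviation rule can be sketched in Python as below. The details are assumptions, not the documented procedure: the mean and standard deviation here are unweighted, computed over the values passed in (for example, one variable within one country), and flagged values are set to missing rather than dropped. `trim_3sd` is a hypothetical name; check the Methodology page for the exact rule.

```python
import statistics


def trim_3sd(values, k=3.0):
    """Set values more than k standard deviations from the mean to None.

    Mean and SD are unweighted and computed over the non-missing values passed
    in (e.g. one variable for one country); missing values stay missing.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return list(values)
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [None if v is not None and abs(v - mu) > k * sd else v for v in values]


sales = [10.0] * 20 + [1000.0]  # hypothetical firm-level values
print(trim_3sd(sales)[-3:])     # [10.0, 10.0, None]
```

One property of any rule based on the sample standard deviation: no value can lie more than (n - 1)/sqrt(n) standard deviations from the mean, so with 10 or fewer observations nothing is ever flagged at k = 3.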