Data and Methods
We study the contiguous United States because it is a major economy with diverse economic activities and well-documented geographies and spatial patterns. The primary data source is the County Business Patterns (CBP) datasets, produced by the Bureau of Labor Statistics (BLS). These datasets provide detailed information on variables such as:
- Average annual employment
- Number of establishments
- Total annual wages
These variables are disaggregated across more than 3,200 counties and 300 NAICS 4-digit industries.
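Because each variable is reported per county and per industry, it can be arranged as a county × industry matrix. The Python sketch below shows one way to do this. It assumes the data have already been read into long-format (county FIPS code, 4-digit NAICS code, value) records. The sample records and the `build_matrix` helper are hypothetical and serve only as illustration. They do not describe the actual file schema.

```python
from collections import defaultdict

# Hypothetical long-format records: (county FIPS, 4-digit NAICS, employment).
# This field layout is an assumption for illustration, not the actual file schema.
records = [
    ("01001", "3361", 1200),
    ("01001", "4451", 300),
    ("01003", "4451", 850),
    ("01003", "7211", 400),
]


def build_matrix(rows):
    """Pivot (county, industry, value) rows into a dense county x industry matrix.

    Rows are sorted counties, columns are sorted industries, and any
    county-industry cell with no record is filled with zero.
    """
    counties = sorted({county for county, _, _ in rows})
    industries = sorted({industry for _, industry, _ in rows})
    cells = defaultdict(float)
    for county, industry, value in rows:
        cells[(county, industry)] += value  # sum duplicates, if any
    matrix = [[cells[(c, i)] for i in industries] for c in counties]
    return counties, industries, matrix


counties, industries, matrix = build_matrix(records)
```

NAICS and FIPS codes are kept as strings because they are identifiers. Treating them as integers would drop the leading zeros in county FIPS codes.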
We exclude activities whose presence and size depend on administrative decisions and vary with state conventions. Most of them are public or non-market services rather than productive sectors. Examples include:
- NAICS 2213: Water, sewage, and other systems
- NAICS 4854: School and employee bus transportation
- NAICS 4911: Postal service
- NAICS 6111: Elementary and secondary schools
- NAICS 6113: Colleges and universities
- NAICS 6241: Individual and family services
- NAICS 7132: Gambling industries
- NAICS 8131: Religious organizations
- NAICS 8141: Private households
- NAICS 9211: Executive, legislative, and general government
- NAICS 9221: Justice, public order, and safety activities
- NAICS 9231: Administration of human resource programs
- NAICS 9241: Administration of environmental programs
- NAICS 9261: Administration of economic programs
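The exclusion step is a filter on the industry code. The Python sketch below shows it, again assuming (county FIPS, NAICS, value) records. `EXCLUDED_NAICS` contains only the example codes listed above, so the study's full exclusion set may be larger. `drop_excluded` and the sample rows are hypothetical.

```python
# Example 4-digit NAICS codes excluded from the analysis (from the list above;
# the complete exclusion set used in the study may contain more codes).
EXCLUDED_NAICS = {
    "2213", "4854", "4911", "6111", "6113", "6241", "7132",
    "8131", "8141", "9211", "9221", "9231", "9241", "9261",
}


def drop_excluded(rows, excluded=EXCLUDED_NAICS):
    """Keep only (county, naics, value) rows whose industry is not excluded."""
    return [row for row in rows if row[1] not in excluded]


# Hypothetical sample rows: (county FIPS, 4-digit NAICS, employment).
sample = [
    ("01001", "3361", 1200),
    ("01001", "6111", 950),
    ("01001", "9211", 410),
    ("01003", "4451", 850),
]
kept = drop_excluded(sample)
```

A set gives constant-time membership checks. It also keeps the exclusion list in one place, so it can be extended without touching the filtering logic.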
Review of the Formal Framework
How activity classifications are designed and interpreted, and how researchers transform the data, can substantially affect study outcomes. These choices open a gap between the phenomena under study and the data that enter analyses such as regressions. Studies that focus on answering specific questions often overlook these methodological steps, even though they affect the comparability of results across studies.
In the following sections, we review the formalisms that let us treat various methods as variants of a single similarity approach (see Table: Review). We also examine the data-processing choices and particularities that make datasets from different studies inherently distinct.