
HCUP Sample Design: National Databases - Accessible Version




 

Welcome

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on the sample design of the national databases. This tutorial was created for researchers who use HCUP national databases and who have some background in basic research methods.

In order for you to create accurate and unbiased estimates in your research, it is essential for you to understand the sampling methods of HCUP national databases.

In this tutorial, you'll learn about three national HCUP databases created from the state-level HCUP databases. The three national databases are the National Inpatient Sample (NIS), the Nationwide Emergency Department Sample (NEDS), and the Kids' Inpatient Database (KID). The Nationwide Readmissions Database (NRD) is another national database, but it won't be covered in this tutorial. To learn more about the NRD, please visit the NRD tutorial on the HCUP Online Tutorial Series page.

Because the HCUP national databases each serve a different purpose, each one is designed slightly differently. Understanding these differences and how they impact your research is critical to ensure your data estimates are accurate and unbiased, and that you draw sound conclusions. This course will take approximately 30 minutes to complete.


 

About

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our general Overview Course.


 

Learning Objectives

The underlying goal of this module is to ensure that you select the best HCUP databases for your research. One important factor in doing so is understanding the sample designs of the national databases. By the end of this module, you will:

  • Understand the sample designs for three of the HCUP national databases.
  • Understand how the sample designs can influence which database is best for your research.
  • Avoid common errors that result from misunderstanding of sample design.


 

Key Terms

Because this module is focused on sample design, there are a few key terms that are helpful to review. We will tie these terms to the HCUP database design later in the module.

  • Target Universe: The Target Universe is all people or entities such as hospitals or emergency departments that we wish to understand.
  • Sample Frame: The Sample Frame is a subset of the target universe that we will study to make inferences about the target universe.
  • Sample Strata: The Sample Strata are relatively homogeneous groups from the sample frame. A sample is selected from each stratum.
  • Sample Unit: The Sample Unit is the level at which we sample within each stratum, such as the discharge level or the hospital level.


 

HCUP Databases

HCUP has state-level and national-level databases that reflect inpatient, emergency department, and ambulatory surgery care. The national databases are derived from the state-level databases. In other words, the state-level databases serve as the sample frame for the national databases: the NIS, the NEDS, and the KID.

The State Inpatient Databases (SID) are a set of databases that include all inpatient hospital discharges from community hospitals in participating states.

The State Emergency Department Databases (SEDD) are a set of databases that contain all treat-and-release hospital emergency department visits from community hospitals in participating states.

The State Ambulatory Surgery and Services Databases (SASD) are a set of databases that include encounter-level data for ambulatory surgery and other outpatient services from hospital-owned facilities in participating states. In addition, some states provide ambulatory surgery and outpatient services data from nonhospital-owned facilities.


 

Summary

In summary, HCUP has seven types of databases that cover inpatient, emergency department, and ambulatory surgery data at the state, regional, and/or national levels.

The next three sections will focus on the design of three of HCUP's national-level databases: the NIS, the NEDS, and the KID. To learn more about the NRD, refer to the NRD tutorial on the HCUP Online Tutorials page.


 

National Inpatient Sample (NIS)

The National Inpatient Sample (NIS) is a unique and powerful database of hospital inpatient stays. Researchers and policymakers use the NIS to identify, track, and analyze national and regional trends in hospital utilization, access, charges, and quality.

The NIS contains annual data from 1988 forward.

The NIS is a sample of discharges from the State Inpatient Databases (SID).


 

NIS Sample

  • NIS Target Universe: The target universe for the NIS is all US community hospital discharges. We define the target universe from the American Hospital Association (AHA) Annual Survey of Hospitals.
  • NIS Sample Frame: The sampling frame for the NIS is the State Inpatient Databases, which include over 95% of the target universe.
  • NIS Sample Strata: The strata used in creating the NIS are US census division, urban or rural location, teaching status, ownership, and bed size. These strata will be described in more detail later in this tutorial. Please note that because the NIS sample is not designed with "state" as a stratification variable, state-level analyses cannot be conducted. If you are interested in analyses by state, you should use the state-specific SID.
  • NIS Sample Unit: The NIS sample unit is the discharge. A systematic random sample of discharges, stratified by hospital characteristics, is drawn from all HCUP-participating hospitals. This sample includes approximately 20% of discharges from US community hospitals. This type of sampling design is referred to as a stratified systematic random sample. Within each stratum, discharges are sorted on discharge characteristics such as DRG and admission month, and the sample is drawn systematically from the sorted list. This ensures a more representative sample of discharges than a simple random sample would yield.
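
The stratified systematic sampling described above can be sketched in Python. This is a minimal illustration, not the procedure AHRQ uses: the record fields (`stratum`, `drg`, `admission_month`), the function name, and the fixed interval of 5 (chosen to approximate a 20% sample) are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_systematic_sample(discharges, interval=5, seed=0):
    """Take every `interval`-th discharge within each stratum.

    `discharges` is a list of dicts with the hypothetical keys
    'stratum', 'drg', and 'admission_month'.
    """
    rng = random.Random(seed)

    # Group the discharges by stratum.
    by_stratum = defaultdict(list)
    for record in discharges:
        by_stratum[record["stratum"]].append(record)

    sample = []
    for stratum in sorted(by_stratum):
        # Sort so the sample is spread across DRGs and admission months.
        ordered = sorted(
            by_stratum[stratum],
            key=lambda r: (r["drg"], r["admission_month"]),
        )
        # A random start followed by a fixed step gives a systematic sample.
        start = rng.randrange(interval)
        sample.extend(ordered[start::interval])
    return sample


# Toy frame: two strata with 100 discharges each.
frame = [
    {"stratum": s, "drg": d % 10, "admission_month": d % 12 + 1}
    for s in ("A", "B")
    for d in range(100)
]
nis_like_sample = stratified_systematic_sample(frame)
print(len(nis_like_sample))
```

In this toy frame each stratum of 100 discharges contributes 20 records, whatever the random start. Sorting before stepping through the list is what spreads the sample across the sort variables.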


 

NIS prior to 2012

Starting with the 2012 data year, the NIS was redesigned to improve national estimates.

To highlight the design changes, AHRQ renamed the "Nationwide Inpatient Sample (NIS)" to the "National Inpatient Sample (NIS)".

Previously, the NIS sample design was defined as follows:

  • Target Universe: All US community hospitals.
  • Sample Frame: The SID, as in the current design, but covering only 90% of the target universe.
  • Sample Strata: US region rather than census division. The urban or rural location, teaching status, ownership, and bed size strata were the same as in the current design.
  • Sample Unit: A 20% stratified sample of hospitals, with 100% of the discharges from each sampled hospital included in the NIS. This was a single-stage cluster sample.


 

NIS Strata

In creating the National Inpatient Sample, the first step was to stratify the SID hospitals according to five strata: census division, location, teaching status, ownership, and bed size.

  • Census Division: The nine census divisions are New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, and Pacific. Practice patterns have been shown to vary substantially by census division. Divisions are defined by the US Census Bureau.
  • Location: Urban or rural. Government payment policies often differ according to this designation. Also, rural hospitals are generally smaller and offer fewer services than urban hospitals. The classification of urban or rural hospital location is based on Core Based Statistical Area (CBSA) codes. Hospitals with a CBSA type of Metropolitan or Division are classified as urban, while hospitals with a CBSA type of Micropolitan or Rural are classified as rural.
  • Teaching Status: Teaching or non-teaching. The missions of teaching hospitals differ from those of non-teaching hospitals. In addition, financial considerations differ between these two hospital groups. A hospital is considered a teaching hospital if it meets any one of the following three criteria:
    • Residency training approval by the Accreditation Council for Graduate Medical Education (ACGME)
    • Membership in the Council of Teaching Hospitals (COTH)
    • A ratio of full-time equivalent interns and residents to beds of 0.25 or higher
  • Ownership: Government non-federal (public), private not-for-profit (voluntary), or private investor-owned (proprietary). Depending on their control, hospitals tend to have different missions and different responses to government regulations and policies. When there are enough hospitals of each type to allow it, the NIS stratifies hospitals as public, voluntary, and proprietary. When necessary, ownership strata are collapsed to ensure a minimum of two hospitals in each stratum.
  • Bed Size: Small, medium, or large. Bed size categories are based on the number of hospital beds and are specific to the hospital's region, location, and teaching status. About one-third of the hospitals in a given region, location, and teaching status combination fall within each bed size category (small, medium, or large). The NIS uses different cutoff points for rural, urban non-teaching, and urban teaching hospitals because hospitals in those categories tend to be small, medium, and large, respectively.
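
The idea that about one-third of the hospitals in each region, location, and teaching status combination fall in each bed size category can be sketched as a tertile split. This is only an illustration: the actual NIS cutoff points are published in the HCUP documentation and are not derived here, and the function name and toy bed counts are assumptions.

```python
def assign_bed_size(bed_counts):
    """Label each hospital small, medium, or large by rank within its group.

    `bed_counts` holds the bed counts for hospitals that share one
    region, location, and teaching status combination.
    """
    n = len(bed_counts)
    # Rank hospitals from fewest to most beds.
    order = sorted(range(n), key=lambda i: bed_counts[i])
    labels = [None] * n
    for rank, idx in enumerate(order):
        if rank < n / 3:
            labels[idx] = "small"
        elif rank < 2 * n / 3:
            labels[idx] = "medium"
        else:
            labels[idx] = "large"
    return labels


# Hypothetical bed counts for six rural hospitals in one region.
rural_beds = [25, 40, 18, 120, 75, 60]
print(assign_bed_size(rural_beds))
```

Because the split is made within each group, a hospital with 120 beds can be "large" among rural hospitals even if the same bed count would be "small" among urban teaching hospitals.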


 

NIS Weight Variable

To produce national or regional estimates, the HCUP databases provide a "weight" variable that you can apply to your data. If you're interested in learning more about the weighting on the National Inpatient Sample (NIS), please access the NIS Trend Weights Files for more details.


 

Sample Design Changes Over Time

Now that you understand the NIS sample design, you should know that revisions have been made to the NIS sample design that could affect estimates calculated from the NIS.

You should always check the NIS online documentation on the HCUP User Support Web site before starting your research project.

Over time there have been changes to the NIS. States have been added to the sampling frame. In 1988, the NIS was based on 8 states. The more recent years of the NIS have 40+ states. There were important sample design changes in 1998. The NIS excluded short-term rehabilitation hospitals from the frame, changed the definition of discharges from total discharges to hospital discharges, discontinued the preference for NIS hospitals that were in the sample in prior years, and redefined the hospital stratification variables for sampling.

There were also important design changes in 2012. The 2012 NIS excluded long-term acute care hospitals from the sampling frame, improved the estimates of discharges in the universe, used State hospital identifiers rather than AHA hospital identifiers, and drew a sample of discharges from all hospitals in the sampling frame, rather than drawing all discharges from a sample of hospitals.

The sample designs are refined over time in other databases as well. There is useful documentation on the HCUP User Support Web site that details how you can account for these sample design changes.


 

NIS Summary

In this section you have learned the following information about the NIS database:

  • The NIS is constructed from the State Inpatient Databases (SID)
  • The NIS is a stratified systematic sample of discharges
  • The NIS cannot be used to conduct state-level analyses


 

Nationwide Emergency Department Sample (NEDS)

The Nationwide Emergency Department Sample (NEDS) is a unique and powerful database of emergency department visits. Researchers and policymakers use the NEDS to identify, track, and analyze national and regional hospital emergency department care, utilization, access, charges, and quality. The NEDS contains annual data from 2006 forward.


 

NEDS Sample

The NEDS is a stratified sample of hospitals from the State Emergency Department Databases (SEDD) and the State Inpatient Databases (SID).

  • NEDS Target Universe: The target universe for the NEDS is all US community hospital-owned emergency departments. We define the target universe from the American Hospital Association (AHA) Annual Survey of Hospitals.
  • NEDS Sample Frame: Both the State Emergency Department Databases (SEDD) and the State Inpatient Databases (SID) are used to construct the NEDS. In other words, together they are the frame for the NEDS. The SEDD provide data on treat-and-release emergency department visits, which account for more than 80% of all emergency department visits. The SID provide data on the emergency department visits that resulted in an inpatient admission. The NEDS includes data on care that began in the emergency department regardless of whether the patient was treated and released or admitted to the hospital.
  • NEDS Sample Strata: The strata used in creating the NEDS are US region, urban or rural location, teaching status, ownership, and trauma level. These strata will be described in more detail later in this tutorial. As in the NIS sample design, "state" is not included as a stratum; therefore, state-level analyses cannot be conducted. If you are interested in analyses by state, you should use the state-specific SID or SEDD.
  • NEDS Sample Unit: Once the hospital-owned emergency departments have been stratified, a sample that approximates a 20% stratified sample of US hospital-owned emergency departments (the target universe) is constructed. All emergency department visits from the selected hospitals are included in the NEDS. This type of sampling design is referred to as a stratified, single-stage cluster sample. A stratified random sample of hospitals, or clusters, is drawn, and then all ED visits are included from each selected hospital.
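
A stratified, single-stage cluster sample like the one described above can be sketched as follows. The data structures, the function name, and the rule of selecting at least one hospital per stratum are assumptions for illustration; this is not the actual NEDS selection procedure.

```python
import random
from collections import defaultdict


def stratified_cluster_sample(hospitals, visits, fraction=0.20, seed=0):
    """Sample hospitals within strata, then keep all of their ED visits.

    `hospitals` maps a hospital id to its stratum label; `visits` is a
    list of (hospital_id, visit_id) pairs. Both structures are hypothetical.
    """
    rng = random.Random(seed)

    by_stratum = defaultdict(list)
    for hospital_id, stratum in hospitals.items():
        by_stratum[stratum].append(hospital_id)

    selected = set()
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        # Assumption: select at least one hospital per stratum.
        k = max(1, round(fraction * len(ids)))
        selected.update(rng.sample(ids, k))

    # Single-stage cluster: every visit from a selected hospital is kept.
    kept = [v for v in visits if v[0] in selected]
    return selected, kept


# Toy frame: 10 urban and 10 rural hospitals with 50 visits each.
hospitals = {h: ("urban" if h < 10 else "rural") for h in range(20)}
visits = [(h, v) for h in hospitals for v in range(50)]
selected, kept = stratified_cluster_sample(hospitals, visits)
print(len(selected), len(kept))
```

Contrast this with the current NIS design, where the randomness is applied to discharges rather than to hospitals. Here, visits from the same hospital enter or leave the sample together, which is why the hospital must be treated as a cluster in variance calculations.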


 

NEDS Strata

The NEDS is a stratified, single-stage cluster sample. The NEDS is constructed by categorizing hospitals according to five strata: geographic region, location, teaching status, ownership, and trauma-level designation.

  • Geographic Region: Northeast, Midwest, West, or South, as defined by the US Census Bureau. Practice patterns have been shown to vary substantially by region.
  • Location: Urban or rural. Government payment policies often differ according to this designation. Also, rural hospitals are generally smaller and offer fewer services than urban hospitals. The classification of urban or rural hospital location is based on the county in which the hospital is located and the categorization determined by the Urban Influence Codes (UIC).
  • Teaching Status: Teaching or non-teaching. The missions of teaching hospitals differ from those of non-teaching hospitals. In addition, financial considerations differ between these two hospital groups. A hospital is considered a teaching hospital if it meets any one of the following three criteria:
    • Residency training approval by the American Medical Association (AMA)
    • Membership in the Council of Teaching Hospitals (COTH)
    • A ratio of full-time equivalent interns and residents to beds of 0.25 or higher
  • Ownership: Government non-federal (public), private not-for-profit (voluntary), or private investor-owned (proprietary). Depending on their control, hospitals tend to have different missions and different responses to government regulations and policies. When there are enough hospitals of each type to allow it, the NEDS stratifies hospitals as public, voluntary, and proprietary. For smaller strata, the NEDS uses a collapsed stratification of public versus private, with the voluntary and proprietary hospitals combined to form a single "private" category. For all other combinations of region, location, and teaching status, no stratification based on control is advisable, given the number of hospitals in these cells.
  • Trauma Level: Trauma-level designation is a modified version of the Trauma Information Exchange Program (TIEP) trauma-level designation. A trauma center is a hospital equipped to provide comprehensive emergency medical services to patients suffering traumatic injuries 24 hours a day, 365 days per year. The NEDS distinguishes between Trauma Levels I, II, and III. Trauma designation is made by a state or local authority or verified by the American College of Surgeons:
    • Level I: Full range of specialists and equipment 24 hours a day, has a surgical residency program, has a program of research, and is a referral resource for communities in nearby regions
    • Level II: Comprehensive trauma care in collaboration with a Level I center, essential specialties and equipment available 24 hours a day, not required to have teaching and research
    • Level III: Resources for resuscitation, surgery, and intensive care but not full availability of specialists, transfer agreements with Level I and II centers
    • Level IV/V: Resources for advanced trauma life support in remote areas
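
The three teaching status criteria listed above are combined with a logical "or": meeting any one of them is enough. A minimal sketch, in which the function and parameter names are hypothetical:

```python
def is_teaching_hospital(residency_approved, coth_member, residents_fte, beds):
    """Return True if any one of the three teaching criteria is met."""
    # Ratio of full-time equivalent interns and residents to beds.
    ratio = residents_fte / beds if beds > 0 else 0.0
    return residency_approved or coth_member or ratio >= 0.25


# 30 FTE interns and residents for 100 beds gives a ratio of 0.30.
print(is_teaching_hospital(False, False, residents_fte=30, beds=100))
# 10 FTE interns and residents for 100 beds gives a ratio of 0.10.
print(is_teaching_hospital(False, False, residents_fte=10, beds=100))
```

The first hospital qualifies on the ratio alone; the second meets none of the criteria and would be classified as non-teaching.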


 

NEDS Weight Variable

To produce national or regional estimates, the HCUP databases provide a "weight" variable that you can apply to your data. If you're interested in learning more about weighting the national databases, please access the HCUP tutorial on weighting entitled Producing HCUP National Estimates.


 

NEDS Summary

In this section you have learned the following information about the NEDS database:

  • The NEDS is constructed from the State Emergency Department Databases (SEDD) and the State Inpatient Databases (SID)
  • The NEDS is a stratified sample of hospital-owned emergency departments
  • The NEDS cannot be used to conduct state-level analyses


 

Kids' Inpatient Database (KID)

The third national-level database we will review, the Kids' Inpatient Database (KID), is specifically designed for pediatric research, particularly for the study of rare pediatric conditions.

The KID is produced every three years starting with 1997 data.

The way the KID is created is quite different from the way the NIS and the NEDS are created.

  • Children are not hospitalized that often, and that's a good thing. In fact, the most common reason for a child to be hospitalized is for their own birth!
  • About two thirds of all pediatric hospital stays are for newborns.
  • And, the vast majority of these newborn stays are uncomplicated, routine births.

This is great from a public health perspective, but all these healthy, uncomplicated births overwhelm the data, making it difficult to identify rare pediatric hospitalizations.

The KID is designed to accommodate research on rare pediatric conditions that require hospitalization, such as congenital anomalies, as well as rare pediatric medical procedures, such as heart surgery and organ transplantation.

While the NIS does include pediatric discharges, the NIS is not optimized for research on rare pediatric inpatient hospitalizations. It's best to use the KID for this kind of research.

Note that the NEDS is well-suited for research on pediatric emergency care.


 

KID Sample

The KID is a stratified sample of discharges from the State Inpatient Databases (SID).

  • KID Target Universe: The target universe for the KID is US community, non-rehabilitation hospitals with pediatric discharges, again, based on the American Hospital Association or AHA Annual Survey of Hospitals.

  • KID Sample Frame: While the NIS is a sample of discharges and the NEDS is a sample of hospitals and hospital emergency departments, the KID is a sample of individual discharges of pediatric patients. The definition of pediatric is 20 years and under. The sampling frame for the KID is the same as the sampling frame for the NIS: the State Inpatient Databases, or SID. Unlike the NIS, the KID includes a sample of pediatric discharges from all hospitals with pediatric stays in the sampling frame.
  • KID Sample Strata: For sampling, pediatric discharges are stratified into three categories: uncomplicated in-hospital births, complicated in-hospital births, and all other pediatric hospital stays.
  • KID Sample Unit: Systematic random sampling is used to select 10% of uncomplicated in-hospital births and 80% of complicated in-hospital births and other pediatric cases from each frame hospital. This over-sampling of complicated births and pediatric non-births ensures that we get a good representation of rare pediatric hospitalizations. We do not need to sample many of the uncomplicated births because there is little difference in the characteristics of one uncomplicated birth compared with another. A small representation of uncomplicated births is therefore sufficient.
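
The different sampling rates by stratum can be sketched in Python. The 10% and 80% rates come from the description above; everything else (the stratum labels, the function name, the toy record counts, and the particular way of stepping through the list) is an assumption for illustration. The sketch treats the records of a single hypothetical hospital.

```python
import random

# Sampling rates stated above for each KID stratum.
KID_RATES = {
    "uncomplicated_birth": 0.10,
    "complicated_birth": 0.80,
    "other_pediatric": 0.80,
}


def systematic_sample(records, rate, rng):
    """Select about `rate` of `records` at evenly spaced positions.

    A record is kept each time the running total of `rate` crosses a
    whole number; the random offset `u` randomizes the starting point.
    """
    u = rng.random()
    return [
        rec
        for i, rec in enumerate(records)
        if int((i + 1) * rate + u) > int(i * rate + u)
    ]


rng = random.Random(0)
# Hypothetical discharge lists for one hospital, one list per stratum.
frame = {
    "uncomplicated_birth": list(range(1000)),
    "complicated_birth": list(range(100)),
    "other_pediatric": list(range(400)),
}
kid_like_sample = {
    stratum: systematic_sample(recs, KID_RATES[stratum], rng)
    for stratum, recs in frame.items()
}
for stratum, recs in kid_like_sample.items():
    print(stratum, len(recs))
```

Although uncomplicated births make up two thirds of this toy frame, they make up only one fifth of the sample (100 of 500 records), leaving far more room for the rarer cases.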


 

KID Strata

The KID is stratified by uncomplicated in-hospital births, complicated in-hospital births, and all other pediatric hospital stays. Unlike the NIS or NEDS, the KID records are post-stratified in order to enable users to create national and regional estimates. The discharges are post-stratified in proportion to the number of AHA newborns and the total number of non-newborn AHA admissions.


 

KID Weight Variable

In order to produce national or regional estimates of pediatric hospitalizations using the KID, discharge weights are developed using the American Hospital Association or AHA target universe as the standard.

To do so, KID records are post-stratified by US region, urban or rural location, teaching status, ownership, and bed size, with the addition of a stratum for freestanding children's hospitals.

The KID is stratified by freestanding children's or other hospitals. Children's hospitals restrict admissions to children, while other hospitals admit both adults and children. There may be significant differences in practice patterns, severity of illness, and available services between children's hospitals and other hospitals. Children's units in general hospitals are not stratified as children's hospitals.
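
The exact weighting formula is not given in this tutorial. A common way to build post-stratified weights, shown here purely as an assumption, is to divide the number of discharges in the target universe for a stratum by the number of sampled discharges in that stratum. The stratum labels and counts below are hypothetical.

```python
def post_stratified_weights(universe_counts, sample_counts):
    """Weight per stratum = universe discharges / sampled discharges."""
    weights = {}
    for stratum, n_sample in sample_counts.items():
        if n_sample == 0:
            raise ValueError(f"no sampled discharges in stratum {stratum!r}")
        weights[stratum] = universe_counts[stratum] / n_sample
    return weights


# Hypothetical counts for two weighting strata.
universe = {"children's hospital": 5000, "other hospital": 12000}
sample = {"children's hospital": 4000, "other hospital": 3000}
weights = post_stratified_weights(universe, sample)
print(weights)

# Applying the weights to the sample recovers the universe total.
weighted_total = sum(sample[s] * weights[s] for s in sample)
print(weighted_total)
```

With weights built this way, the weighted sample counts in each stratum add up to the universe counts, which is what allows the sample to stand in for the target universe.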

If you're interested in learning more about weighting the national databases, please access the weighting tutorial Producing HCUP National Estimates.


 

KID Summary

In this section, you have learned the following information about the KID database:

  • The KID is constructed from the State Inpatient Databases (SID)
  • The KID is a stratified sample of pediatric discharges: complicated births and non-births are over-sampled
  • The KID cannot be used to conduct state-level analyses


 

Common Errors

There are some mistakes that are easy to make when working with the HCUP national databases. Understanding the sample design of each database will help you avoid these errors.

One of the most common errors is not weighting the NIS, NEDS, and KID data when attempting to produce national and/or regional estimates. Remember that these national databases are based on samples, so they must be weighted to derive national and/or regional estimates. If you do not weight the data, what you have are sample record counts, not national and/or regional estimates.
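
The difference between a sample record count and a weighted estimate can be shown with a few toy records. The field names (`diagnosis`, `discharge_weight`) and the values are hypothetical; check the documentation for the actual weight variable in your database.

```python
# Three hypothetical sampled discharges with their weights.
records = [
    {"diagnosis": "asthma", "discharge_weight": 5.0},
    {"diagnosis": "asthma", "discharge_weight": 4.5},
    {"diagnosis": "sepsis", "discharge_weight": 5.5},
]

asthma = [r for r in records if r["diagnosis"] == "asthma"]

# Unweighted: the number of sampled records only.
sample_count = len(asthma)

# Weighted: each record stands for `discharge_weight` discharges.
national_estimate = sum(r["discharge_weight"] for r in asthma)

print(sample_count, national_estimate)
```

Reporting the first number as if it were the second would understate the national total by roughly the size of the weights.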

A serious violation occurs if users report cell sizes less than or equal to 10 in their publications. Remember that you signed an HCUP Data Use Agreement (DUA) that prohibits you from reporting any cell sizes less than or equal to 10. This is required as a privacy precaution. From a sample design perspective, any estimate based on such a low count is unlikely to be reliable anyway. If you'd like a refresher on the HCUP DUA, please consider reviewing the HCUP DUA module, which is only 15 minutes in length.
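
A simple guard against reporting small cells is to blank out any count at or below the threshold before a table leaves your analysis environment. This sketch is one possible approach, not an HCUP-provided tool; the function name, the use of `None` as the suppression marker, and the toy counts are assumptions.

```python
def suppress_small_cells(table, threshold=10):
    """Replace any count at or below `threshold` with None."""
    return {
        cell: (count if count > threshold else None)
        for cell, count in table.items()
    }


# Hypothetical cell counts by age group.
counts = {"age 0-4": 128, "age 5-9": 10, "age 10-14": 7, "age 15-19": 11}
print(suppress_small_cells(counts))
```

Note that a count of exactly 10 is suppressed, because the rule covers cell sizes less than or equal to 10.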

Another error is that new users sometimes attempt to produce state-level estimates from the national databases. Remember that none of the HCUP national databases has a sample design that includes "state" as a stratification variable. Only national and regional estimates should be produced from the national databases. Trying to produce state-level estimates from the NIS, NEDS, or KID could produce biased results. In fact, AHRQ removed the data element identifying states beginning with the 2012 NIS.

New users sometimes use the inappropriate database for a particular study. For example, remember to use the KID, rather than the NIS, for your research on rare pediatric conditions as the sample design of the KID is specifically created to accommodate rare pediatric research. Also, take caution when using any of the HCUP national databases for race-related research as race data are not uniformly available across the HCUP state databases, or, put another way, across the "sampling frame."

Sometimes users try to work with the HCUP national databases in software packages that are not designed to account for complex survey design, such as Microsoft Excel. You must use statistical software, such as SAS, SUDAAN, or Stata, that can handle data derived from complex sampling designs. This is important because analyses that fail to account for the sample design could yield biased estimates and may directly affect your variance calculations.
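
To see why the design matters for variance, the sketch below computes a weighted total and a design-based variance from strata and clusters. It uses a standard between-cluster estimator that treats clusters as sampled with replacement and ignores any finite population correction. It is meant only to show the idea; it is not a substitute for the survey procedures in SAS, SUDAAN, or Stata, and the record keys and toy data are hypothetical.

```python
from collections import defaultdict


def total_and_variance(records):
    """Estimate a weighted total and its design-based variance.

    Each record is a dict with the hypothetical keys 'stratum',
    'cluster' (for example a hospital), 'weight', and 'value'.
    """
    # Weighted total of `value` for every cluster within every stratum.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        cluster_totals[r["stratum"]][r["cluster"]] += r["weight"] * r["value"]

    total = 0.0
    variance = 0.0
    for clusters in cluster_totals.values():
        z = list(clusters.values())
        n = len(z)
        if n < 2:
            raise ValueError("each stratum needs at least two clusters")
        mean_z = sum(z) / n
        total += sum(z)
        # Variation between clusters within the stratum drives the variance.
        variance += n / (n - 1) * sum((zi - mean_z) ** 2 for zi in z)
    return total, variance


# Toy data: (stratum, cluster, number of records), weight 5.0 each.
layout = [("A", 1, 2), ("A", 2, 4), ("B", 3, 3), ("B", 4, 3)]
data = [
    {"stratum": s, "cluster": c, "weight": 5.0, "value": 1}
    for s, c, n in layout
    for _ in range(n)
]
total, variance = total_and_variance(data)
print(total, variance, variance ** 0.5)
```

In this toy example all of the variance comes from stratum A, where the two clusters differ; stratum B contributes nothing because its clusters have identical weighted totals. A spreadsheet that treats every record as an independent observation has no way to capture this structure.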

Users sometimes neglect to check their estimates against other data sources. At a minimum, it is recommended that you check your estimates against HCUPnet, which is a free online query system with access to HCUP data.


 

Key Differences in Sample Design

When looking at key differences in sample design amongst the NIS, NEDS, and KID, remember that each database has a unique purpose and that the target universe, frame, strata, and unit of each database differ. The table below highlights those differences.

| | NIS | NEDS | KID |
|---|---|---|---|
| Target Universe | All community, non-rehabilitation hospitals in the United States excluding long-term acute-care hospitals | All ED visits from hospital-owned ED units in community, non-rehabilitation hospitals in the United States | Pediatric discharges from community, non-rehabilitation hospitals in the United States |
| Sample Frame | All discharges from community, non-rehabilitation hospitals, excluding long-term acute care hospitals, in the participating HCUP Partner States | All ED visits from hospital-owned ED units in community, non-rehabilitation hospitals in the participating HCUP Partner States | Pediatric discharges from community, non-rehabilitation hospitals in the participating HCUP Partner States |
| Sample Strata | US Census Division, urban or rural location, teaching status, ownership, bed size | US Region, urban or rural location, teaching status, ownership, trauma-level | Uncomplicated births, complicated births, all other pediatric hospital stays |
| Sample Unit | Discharge-level | Hospital-owned ED-level | Pediatric discharges |

 

Key Points

As you begin your work with the HCUP national databases, you will want to keep in mind the following key points:

  • It's important to select the appropriate database given your research question. The sample design of the HCUP national databases can influence which HCUP database is best suited for your research.
  • All three national databases are derived from the HCUP State-level Databases.
  • You should not use the national databases for state-specific questions. You should get state-level data from state-specific databases only.
  • The NEDS is developed based on a stratified systematic sample of hospital-owned EDs. The NIS is a stratified systematic sample of discharges. The KID is based on a stratified systematic sample of pediatric discharges.


 

Resources and Other Training

If you are looking for more information on the subject matter covered here, many resources are available on the HCUP User Support Web site.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to answer technical questions you may have.

Thank you for accessing this module. There are several other HCUP online tutorials that can be accessed. Take a look to see if there are other topics that could be helpful to you.

If you have any feedback regarding this module, please email us at hcup@ahrq.gov.



Internet Citation: HCUP Sample Design: National Databases - Accessible Version. Healthcare Cost and Utilization Project (HCUP). November 2018. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/tech_assist/sampledesign/508_compliance/index508_2018.jsp.
Last modified 11/5/18