
Using Multiple Years of HCUP Data Tutorial


Welcome to the Using Multiple Years of HCUP Data Tutorial.

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on multi-year analysis. This course presents solutions that may be necessary when conducting analyses that span multiple years of HCUP data.

One of the strengths of HCUP is that multiple years of data are available. This makes it possible to study trends over time on topics such as utilization, access, charges, quality, and outcomes. It also allows researchers to study rare conditions by combining multiple years of data to gain sufficient sample size.

However, if precautions are not taken, errors in study results may occur when two or more years of data are combined. This course will describe problems that may arise when using multiple years of HCUP data and provide you with solutions for addressing those problems.

This tutorial will take 45 minutes to complete.


About HCUP

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). AHRQ is part of the U.S. Department of Health and Human Services. HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics, including cost and quality of healthcare services, access to healthcare, and treatment outcomes. If you are unfamiliar with HCUP or would like a refresher, please consider taking our general HCUP Overview Course located on the HCUP-US website.


HCUP Databases

HCUP has eight types of databases—three State-specific databases and five Nationwide databases. The State-specific databases include the State Inpatient Databases (SID), the State Ambulatory Surgery and Services Databases (SASD), and the State Emergency Department Databases (SEDD).

The Nationwide databases include the National Inpatient Sample (NIS), the Nationwide Emergency Department Sample (NEDS), the Kids' Inpatient Database (KID), the Nationwide Readmissions Database (NRD), and the Nationwide Ambulatory Surgery Sample (NASS).

The State-specific databases include:
  • State Inpatient Databases (SID)
    The SID contain inpatient hospital discharge data, including admissions that started in the emergency department, from participating HCUP States. The SID can be used for such research topics as use and cost of hospital services, quality of care, and impact of health policy changes.
  • State Ambulatory Surgery and Services Databases (SASD)
    The SASD contain ambulatory surgery and other services data from hospital-owned and sometimes nonhospital-owned facilities from participating HCUP States. The SASD can be used for analyses of ambulatory surgeries such as examining trends in utilization, access, and outcomes.
  • State Emergency Department Databases (SEDD)
    The SEDD contain treat-and-release emergency department data from participating HCUP States. The SEDD support emergency department research that examines such topics as injury surveillance, emerging infections, and access to emergency department services.
The Nationwide databases include:
  • National Inpatient Sample (NIS)
    The NIS is the largest publicly available all-payer inpatient care database in the United States, containing data on more than seven million hospital stays. It can be used to generate national and regional estimates of quality of care, patient safety, and more.
  • Kids' Inpatient Database (KID)
    The KID is the largest publicly available all-payer pediatric inpatient care database in the United States, containing data from two to three million hospital stays. It can be used to generate national and regional estimates of such topics as pediatric inpatient utilization, healthcare access, and the quality of care.
  • Nationwide Emergency Department Sample (NEDS)
    The NEDS is the largest all-payer emergency department database in the United States. It yields national estimates of hospital-owned emergency department visits and can be used to generate national and regional estimates of emergency department utilization, access, and quality.
  • Nationwide Readmissions Database (NRD)
    The NRD is a unique and powerful database designed to support various types of analyses of national readmission rates. It can be used to generate national estimates of all-cause and condition-specific readmissions.
  • Nationwide Ambulatory Surgery Sample (NASS)
    The NASS is the largest all-payer ambulatory surgery database in the United States, yielding national estimates of major ambulatory surgery encounters performed in hospital-owned facilities. The NASS can be used to analyze selected ambulatory surgery utilization patterns and to support public health professionals, administrators, policymakers, and clinicians in their decision making regarding this critical source of care.


Learning Objectives

This tutorial has three key learning objectives:

Objective 1

Become familiar with the ways that HCUP data have changed over time.

Objective 2

Learn how these changes may affect your analyses when you use multiple years of HCUP data.

Objective 3

Learn about available HCUP resources and how you can use them to address problems that arise from combining multiple years of data. 


Overview of Changes that Affect Multi-Year Analysis of HCUP Data

It is important to understand the following changes in the HCUP data and account for them in your multi-year analysis:
  • Changes to nationwide database design and weights
  • Changes to data elements

Changes to Nationwide Database Design and Weights

Let’s start by talking about the changes to Nationwide database design and weights. For example, major changes in the NIS sampling design occurred in 2012 to improve national estimates.
  • Beginning with 2012 data, the NIS was redesigned and now is created using a sample of discharges from all community hospitals participating in HCUP, approximating a 20 percent sample of discharges from U.S. community hospitals, excluding rehabilitation and long-term acute care hospitals.
  • Prior to 2012, the NIS was created from a sample of hospitals, approximating a 20 percent sample of U.S. community hospitals, excluding rehabilitation hospitals. All discharges from sampled hospitals were retained.
As a result of the 2012 NIS redesign, users should expect one-time disruptions to historical trends for counts, rates, and means estimated from the NIS, beginning with data year 2012. 

Some of the differences with the NIS beginning with 2012 data include: 
  • Decline of about 4.3 percent in overall trends in discharge counts
  • Decline of about 1.5 percent in overall trends in average length of stay
  • Decline of about 0.5 percent in overall trends in total charges
  • Decline of about 2.0 percent in overall trends in hospital mortality
For additional information on the redesign of the 2012 NIS, please refer to the 2012 NIS Redesign Report.

  • To adjust for changes in the 2012 NIS design, AHRQ developed new 1993-2011 NIS Trend Weights for analyses spanning 2012 and earlier NIS data.
  • The data element trend weight (TRENDWT) contains weights that can be applied to NIS data from 1993-2011 for consistency with those that are used in the 2012 and future years of the redesigned NIS. TRENDWT is designed to be used instead of the original NIS discharge weight (DISCWT), which is found on the NIS Core File. 
Obtain the NIS Trend Weights Files:

The NIS Trend Weights files are available for download as self-extracting PKZIP compressed ASCII files along with SAS®, SPSS®, and Stata® load programs. These files are available on the HCUP US website under Tools and Software Supplemental Files.

Using the NIS Trend Weights Files

Let's consider the following example when discussing how to use the NIS trend weights files: a trend analysis that is using NIS data for years 2007 through 2014 to examine a specific condition across these eight years of data.
  1. Obtain the NIS databases for 2007-2014.
  2. Obtain the NIS Trend Weights files for 2007-2011.
  3. Load these databases into SAS or another statistical software package.
  4. Merge the complete NIS Core file for each year with the corresponding NIS Trend Weights file for that year, for the years 2007-2011.
  5. Check the availability of data elements in the time frame. If you plan to use the data elements included in the NIS Hospital Weights File, Severity Measures File, or Diagnosis and Procedure Groups File, merge these files with the merged NIS Core–Trend Weights file.
  6. Concatenate all the databases: the 2007-2011 merged NIS Core–Trend Weights file and the 2012-2014 complete NIS Core file.
  7. Subset the database above, keeping the data elements necessary for your analysis.
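
Steps 4 through 7 can be sketched in Python with pandas. This is a minimal illustration under stated assumptions, not HCUP-provided code: the merge key name (`KEY`), the combined weight column (`ANALYSIS_WT`), and the helper names are hypothetical, and the toy data frames stand in for files already loaded in step 3. Check the Trend Weights documentation for the actual linkage element before adapting it.

```python
import pandas as pd

def attach_trend_weights(core, trend, key="KEY"):
    """Step 4: add TRENDWT to one year's Core file.

    The key name is an assumption; confirm it in the file documentation.
    """
    merged = core.merge(trend[[key, "TRENDWT"]], on=key, how="left",
                        validate="one_to_one")
    if merged["TRENDWT"].isna().any():
        raise ValueError("some Core records have no matching trend weight")
    return merged

def build_trend_file(core_by_year, trend_by_year, keep, key="KEY"):
    """Steps 4-7: merge, choose the weight per year, concatenate, subset."""
    frames = []
    for year in sorted(core_by_year):
        df = core_by_year[year]
        if year in trend_by_year:
            # Years before the redesign: use TRENDWT instead of DISCWT.
            df = attach_trend_weights(df, trend_by_year[year], key)
            df = df.assign(ANALYSIS_WT=df["TRENDWT"])
        else:
            # Redesigned years: the original discharge weight applies.
            df = df.assign(ANALYSIS_WT=df["DISCWT"])
        frames.append(df[list(keep) + ["ANALYSIS_WT"]])
    return pd.concat(frames, ignore_index=True)

# Toy data standing in for loaded Core and Trend Weights files.
core_2011 = pd.DataFrame({"KEY": [1, 2], "YEAR": 2011,
                          "DISCWT": [5.0, 5.0], "LOS": [3, 4]})
trend_2011 = pd.DataFrame({"KEY": [1, 2], "TRENDWT": [4.8, 4.9]})
core_2012 = pd.DataFrame({"YEAR": 2012, "DISCWT": [5.0, 5.0], "LOS": [2, 6]})

combined = build_trend_file({2011: core_2011, 2012: core_2012},
                            {2011: trend_2011}, keep=["YEAR", "LOS"])
print(combined)
```

Storing the chosen weight in a single column means later estimation code does not need to know which years fall before or after the redesign.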

Redesign of the KID in 1997

It is important to note that, like the NIS, the KID was redesigned. For analyses that span the 1997 KID and later years, the KID Trend Weights Files are available. Instructions similar to those just discussed can be used to apply the KID Trend Weights Files to the 1997 KID. For additional information, refer to the KID Trend Weights File.


Data Element Changes

Now let's discuss the second type of change that users should consider when conducting a multiyear analysis: changes in data elements over time.

When using multiple years of HCUP data, you must consider three general types of data element changes:
  • Availability of data elements across data years
  • Changes in the coding of data elements
  • Updates to coding and classification systems


Change #1: Availability of Data Elements Across Data Years

We will begin with the first type of data element change: "availability of data elements across data years."

Data elements can change because States add, modify, or discontinue data elements.

The HCUP-US website provides extensive database documentation including the availability and description of data elements over time. Additional information on the specific sections of HCUP-US will be discussed later in this section of the tutorial.

The HCUP Supplemental Variables for Revisit Analyses serve as an example of data elements that have both availability and coding challenges.

Caution: The HCUP Supplemental Variables for Revisit Analyses (i.e., the Revisit Variables) are available only on some State Databases, and the coding of these variables can change over time.

The two HCUP Revisit Variables are:
  • Synthetic person-level identifiers (VisitLink).
  • Timing variable that can be used to determine the days between hospital events (DaysToEvent).
Effect: In some States, the availability of the Revisit Variables can change over time. Additionally, the variable VisitLink is derived from encrypted person numbers provided by the State. States sometimes change the coding scheme used between data years for the encrypted person numbers, which in turn causes a discontinuity in the variable VisitLink.

Solution: Users are encouraged to review both the State Database and Revisit Variables documentation for additional information.

The State Database documentation on the HCUP-US website includes data element matrices for each year, which display the data elements available for a given year for each State that releases HCUP data. For the State databases, refer to the "Availability of Data Elements by Year" section to view these matrices.

The example provided is for the SID.

In addition, users are encouraged to refer to the User Guide for the HCUP Supplemental Variables for Revisit Analyses. This helpful resource tracks the availability of the Revisit Variables by State and HCUP State Databases and provides tables that document the consistency of encrypted person numbers over time.

The User Guide for the HCUP Supplemental Variables for Revisit Analyses can be found on the HCUP-US website on the Tools and Software page, under the HCUP Supplemental Files section, labeled "HCUP Supplemental Variables for Revisit Analyses".

It is important to note that data element availability information is also available for the Nationwide databases, specifically under "Description of Data Elements".

The example provided is for the NIS.
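
Once the element lists for each data year have been gathered from this documentation, the availability check can be automated. The following is a minimal Python sketch, not HCUP-provided code: the function name and the per-year element lists are hypothetical, and the year shown without the Revisit Variables is invented for illustration only.

```python
def missing_elements(columns_by_year, required):
    """Return {year: [required elements absent that year]} for years with gaps.

    Names are compared case-insensitively because capitalization of data
    element names can differ between files and documentation.
    """
    gaps = {}
    for year in sorted(columns_by_year):
        available = {name.upper() for name in columns_by_year[year]}
        absent = [name for name in required if name.upper() not in available]
        if absent:
            gaps[year] = absent
    return gaps

# Hypothetical element lists; real lists come from the documentation
# or from the loaded files themselves.
columns_by_year = {
    2013: ["KEY", "AGE", "VisitLink", "DaysToEvent"],
    2014: ["KEY", "AGE"],  # invented gap, for illustration only
}

gaps = missing_elements(columns_by_year, ["AGE", "VisitLink", "DaysToEvent"])
print(gaps)  # {2014: ['VisitLink', 'DaysToEvent']}
```

A check like this confirms only that an element is present. It does not detect coding changes within an element, such as a discontinuity in VisitLink, which still requires reviewing the documentation.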


Change #2: Changes in the Coding of Data Elements

The second type of data element change we will discuss is "changes in the coding of data elements".

For example, data elements can change as a result of modifications to the coding specified by the National Uniform Billing Committee (NUBC). Many HCUP data elements adhere to NUBC coding, such as:
  • Source of admission (PointOfOriginUB04 or ASOURCE)
  • Type of admission (ATYPE)
  • Disposition status (DISPUB04)
  • Type of bill (BILLTYPE)
Source of admission serves as an example of a data element that has changed over the years because of changes from the NUBC.

Caution: Beginning in the 2007 data year, the NUBC updated the specifications for coding source of admission. Detailed information can be found on the HCUP-US website.

Effect: The continuity of the source of admission data element was disrupted because of the transition from ASOURCEUB92 to PointOfOriginUB04.

Most States gradually shifted from using the old admission source codes in ASOURCEUB92 to using the new codes in PointOfOriginUB04. However, a few States continued using the old UB92 admission source codes to report source of admission through 2016, and one State still uses the old UB92 codes to report source of admission.

Solution: Be aware of changes across years if you are using the following admission source and point of origin data elements: 
  • Admission source, uniform coding (ASOURCE)
  • Admission source, as received from source (ASOURCE_X)
  • PointOfOriginUB04
  • Point of origin, as received from source (PointOfOrigin_X)
You might consider using the data element that indicates transfer into the hospital (TRAN_IN) in place of the data elements listed above. TRAN_IN is defined using either admission source or point of origin, so it remains usable across the coding change.


Change #3: Updates to Coding and Classification Systems

The last type of data element change that we would like to discuss is "updates to coding and classification systems". Each year data elements can change because of annual updates to coding systems. Example coding systems include:

  • The International Classification of Diseases, Ninth Revision, Clinical Modification or ICD-9-CM
  • International Classification of Diseases, Tenth Revision, Clinical Modification/Procedure Coding System or ICD-10-CM/PCS
  • Current Procedural Terminology (CPT)
These coding system changes may also affect classification systems that use these underlying codes, such as Diagnosis Related Groups (DRGs) and Major Diagnostic Categories (MDCs).

Caution: The transition from ICD-9-CM to ICD-10-CM/PCS on October 1, 2015 serves as an example of an update to a coding system. This coding system transition led to substantial changes in procedure and diagnosis codes. Specifically, the number of diagnosis codes increased from about 14,000 under ICD-9-CM to over 68,000 under ICD-10-CM and the number of procedure codes increased from about 4,000 under ICD-9-CM to over 72,000 under ICD-10-PCS.

Effect: The transition from ICD-9-CM to ICD-10-CM/PCS, starting in the 2015 data year, directly affected the reporting of medical services. Therefore, users will observe data element changes in the HCUP databases.

To alert users to this change, the names of the HCUP data elements for data years that use ICD-10-CM/PCS coding include the notation "I10". For example, the HCUP data element DXn, which previously stored ICD-9-CM diagnosis codes, is called I10_DXn beginning in quarter 4 of 2015.

Users may also observe discontinuity in trend analyses that span the October 1, 2015 transition date from ICD-9-CM to ICD-10-CM/PCS. The ICD-10-CM/PCS code structure is very different from the ICD-9-CM code structure, and those differences may affect trends.

Further, due to frequent changes in the ICD-10-CM/PCS coding system, the HCUP software tools are no longer provided on the HCUP databases beginning with quarter 4 2015 data. However, they are still available for download under the Tools & Software page on the HCUP US website.

Solution: Users will need to account for data element name changes when analyzing multiple years of HCUP data that span the ICD-10-CM/PCS transition.
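
One way to account for the name change when stacking files is to map the prefixed names back to common names while recording which coding system each record uses. The following is a minimal Python sketch with pandas, not HCUP-provided code: the function name and the `CODE_SYSTEM` column are hypothetical, the diagnosis values are illustrative, and the sketch assumes each input file uses a single coding system.

```python
import re
import pandas as pd

I10_PREFIX = re.compile(r"^I10_")

def harmonize_icd_columns(df):
    """Strip the I10_ prefix and record the coding system in CODE_SYSTEM.

    Assumes the frame holds one coding system only; a frame carrying both
    DXn and I10_DXn would produce duplicate names, so it is rejected.
    """
    i10_cols = [c for c in df.columns if I10_PREFIX.match(c)]
    clash = {I10_PREFIX.sub("", c) for c in i10_cols} & set(df.columns)
    if clash:
        raise ValueError(f"both coding systems present for: {sorted(clash)}")
    renamed = df.rename(columns=lambda c: I10_PREFIX.sub("", c))
    system = "ICD-10-CM/PCS" if i10_cols else "ICD-9-CM"
    return renamed.assign(CODE_SYSTEM=system)

# Toy records with illustrative diagnosis codes.
pre = pd.DataFrame({"YEAR": [2014], "DX1": ["41401"]})
post = pd.DataFrame({"YEAR": [2016], "I10_DX1": ["I2510"]})

stacked = pd.concat([harmonize_icd_columns(pre), harmonize_icd_columns(post)],
                    ignore_index=True)
print(stacked)
```

Keeping the coding system in its own column matters because a shared column name does not make the codes comparable: any case definition still needs separate ICD-9-CM and ICD-10-CM/PCS code lists.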

For trend analyses spanning the transition to ICD-10-CM/PCS on October 1, 2015, users should analyze the data by year and by discharge quarter to determine whether there is a discontinuity. If a discontinuity exists, report the trends in a way that acknowledges the change, such as reporting data from only one coding system or clearly demarcating the transition to ICD-10-CM/PCS in trend figures.
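
The by-year, by-quarter check can be sketched as follows in Python with pandas. This is an illustration under stated assumptions, not HCUP-provided code: it assumes the input has already been subset to the records of interest, that year and discharge quarter are stored in columns named `YEAR` and `DQTR`, and that the weight column is passed in by the user. The function and output column names are hypothetical.

```python
import pandas as pd

def quarterly_weighted_counts(df, weight="DISCWT"):
    """Sum weights by year and discharge quarter and label the coding system."""
    out = (df.groupby(["YEAR", "DQTR"], as_index=False)[weight].sum()
             .rename(columns={weight: "WEIGHTED_N"}))
    # ICD-10-CM/PCS applies from October 1, 2015, i.e. quarter 4 of 2015 onward.
    is_icd10 = (out["YEAR"] > 2015) | ((out["YEAR"] == 2015) & (out["DQTR"] == 4))
    out["CODE_SYSTEM"] = is_icd10.map({True: "ICD-10-CM/PCS", False: "ICD-9-CM"})
    return out

# Toy records for one condition around the transition.
records = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015],
    "DQTR": [3, 3, 3, 4],
    "DISCWT": [5.0, 5.0, 5.0, 5.0],
})

by_quarter = quarterly_weighted_counts(records)
print(by_quarter)
```

Plotting `WEIGHTED_N` by quarter with the coding system marked makes an abrupt shift at quarter 4 of 2015 easy to see. Whether a shift reflects the coding change or a real change in utilization remains a judgment for the analyst.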

For additional information on the transition to the ICD-10-CM/PCS coding system, refer to the ICD-10-CM/PCS Resources page on the HCUP-US website.

The ICD-10-CM/PCS Resources page can be found under the Data Innovations tab on the HCUP-US website.



In summary, this tutorial described the following three topics:
  • First, the specific changes that can occur with the HCUP databases that may affect multiyear analyses. Specifically, two changes were discussed: 1) changes to nationwide database design and weights and 2) changes to data elements.
  • Second, how to use the HCUP US website as a resource to become familiar with the HCUP databases you plan to use, and the changes made to the databases over time before you begin to analyze the data. We recommend that you thoroughly review the HCUP Database Documentation under the databases tab on the HCUP US website.
  • Third, the importance of making sure that the data elements you need for your analysis are available across each year of the database you will use. For the HCUP State databases, you will need to review the Availability of Data Elements by Year and for the nationwide databases, the Description of Data Elements.


Checklist for Using Multiple Years of HCUP Data

Below, you will find a helpful checklist for preparing a multiyear analysis of HCUP data.
  • Identify the specific HCUP databases and data years you will require for your analysis.
  • Review the respective databases' documentation to determine how changes to the database may affect your multi-year analysis. This would include the availability and description of data elements over time.
  • If applicable, also review supplemental documentation, such as the documentation for the NIS Trend Weights Files and the HCUP Revisit Variables, as well as the HCUP reports page, which may provide sample programming code for SAS and other statistical software.
  • Account for any changes within your multiyear analysis. This may require modifying your data element selection or the format used for any visual reporting, such as clearly demarcating the transition to ICD-10-CM/PCS within trend graphs.


If you are looking for more information on the subject matter covered here, many resources are available on the HCUP-US website.

Click on Resources to review web pages and reports that are useful when analyzing multiple years of HCUP data.

If you can't find what you need, feel free to email the technical assistance staff. AHRQ has resource personnel available to answer technical questions you may have. Thank you for accessing this module. There are several other HCUP online tutorials located on the HCUP-US website under the Technical Assistance tab, under HCUP online tutorials. If you have any feedback regarding this module, please email us.


Internet Citation: HCUP Using Multiple Years of Data - Accessible Version. Healthcare Cost and Utilization Project (HCUP). October 2019. Agency for Healthcare Research and Quality, Rockville, MD.
Last modified 10/25/19