HEALTHCARE COST & UTILIZATION PROJECT

Load and Check HCUP Data - Accessible Version


Welcome

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial.

My name is Yvette, and I'm going to show you how to get started with your HCUP research.

In this tutorial I'll walk you through how to properly load HCUP data onto your computer and how to check that the data have loaded correctly.

These are the first steps toward conducting successful analyses with HCUP data.

This module is for individuals who have completed the HCUP Data Use Agreement Training, signed the HCUP Data Use Agreement, obtained their copy of the HCUP data, and are ready to begin their research. This tutorial will take approximately 30 minutes to complete.

Navigation

To navigate this tutorial, use the buttons along the bottom of the screen. Use the next and back buttons to move forward or review previous pages.

If you would like to exit the tutorial, click the "X" button in the far-right corner.

If you would like to navigate to a specific part of the tutorial, use the menu items along the top of the screen.

For additional resources available to you, click the resources button.

To turn the audio on or off, click the "Speaker" button also found in the bottom right corner of the screen.

To pause the tutorial at any time, click the "Play/Pause" button in the lower left corner. Click the button again to resume playing the tutorial.

About HCUP

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.

Learning Objectives

There are two learning objectives for this tutorial.

The first objective is to save the HCUP data products to your computer, unzip (or decompress) HCUP data, and then load the HCUP data into a standard statistical software package.

The second objective is to learn how to verify that you have correctly loaded the data onto your computer. I'll show you how to run a few basic programs to generate summary output, which you can check against the summary statistics and other resources available on the HCUP-US website.

HCUP Database File Contents

The HCUP databases can be purchased from the online HCUP Central Distributor, which is the entity that accepts, processes, and fulfills applications for the purchase of HCUP databases.

To get started, let's review the two methods for delivery of the HCUP databases. HCUP databases that are purchased from the HCUP Central Distributor are delivered in one of two ways:

  • Nationwide Databases are downloaded directly from the online HCUP Central Distributor and bundled into a single delivery zip file.
  • State Databases are shipped to you by the HCUP Central Distributor on physical media (i.e., CDs or DVDs) using next-day or two-day delivery.

Regardless of the delivery format, your purchased HCUP database will arrive in a zip file: a compressed, encrypted format that requires a password to unzip.

Let's review what you received from the online HCUP Central Distributor, and exactly what you'll be saving to your computer. For your reference, the information I am about to present can also be found in the introductory document included with the delivery files.

Regardless of the HCUP database that is purchased, contents of the delivery files include:

  • HCUP database zip files (the number and types of files will vary depending on database and year) and
  • Documentation files

These contents are zipped into a single zip file that must be unzipped to access the full collection of related files making up your product set. The zip file and its nested database zip files all use the same password, which was emailed to you.

I am now going to review what files you would expect to receive for each HCUP database.

HCUP National (Nationwide) Inpatient Sample (NIS)

If you are using the National Inpatient Sample (NIS), your downloaded zip file will contain fixed-width ASCII formatted data files that are compressed and encrypted. The NIS is available annually and generally includes three discharge-level files and one hospital-level file.

The three discharge-level files include the:

  • Core File, a single file containing commonly used data elements (e.g., age, expected primary payer, diagnosis and procedure codes, and total charges). This file contains over 7 million discharge records.
  • Severity Measures File, a single file containing additional data elements to aid in identifying the severity of the condition for a specific discharge.
  • Diagnosis and Procedure Groups File, a single file including additional information on diagnosis and procedure codes, generally derived from the HCUP software tools.

The hospital-level file includes the Hospital File, a single file including information on hospital characteristics.

The specific contents of the files by year are available on the NIS HCUP-US File Specifications page.
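Because the NIS Core File is fixed-width ASCII, each data element occupies the same column positions on every line, and a load program simply cuts each line at those positions. The Python sketch below illustrates the idea only: the data element names are real NIS elements, but the column positions in LAYOUT are hypothetical, so take the actual positions from the File Specifications page for your data year.

```python
# A minimal sketch of slicing one fixed-width record into named fields.
# LAYOUT is HYPOTHETICAL: real NIS element names, invented column positions.
LAYOUT = [
    # (data element, start column, end column), 1-based and inclusive,
    # the way file specifications usually list positions
    ("KEY_NIS", 1, 10),
    ("HOSP_NIS", 11, 15),
    ("AGE", 16, 18),
    ("AWEEKEND", 19, 20),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped text values."""
    # Convert 1-based inclusive positions to a Python slice [start-1:end].
    return {name: line[start - 1:end].strip() for name, start, end in layout}
```

For example, parse_record("0000000001" + "00042" + " 67" + " 1") returns a dict whose AGE entry is "67". Values are left as text so that each one can be converted deliberately.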

To load the HCUP NIS data onto your computer and analyze them, you will need at least 15 gigabytes of available storage; the exact amount depends on which analysis software you plan to use. Detailed specifications are described in the NIS Overview located on the HCUP-US website.

HCUP Kids' Inpatient Database (KID)

If you are working with the Kids' Inpatient Database (KID), your downloaded zip file will contain compressed, encrypted ASCII files. The KID is available every three years and includes the same discharge- and hospital-level files as the NIS.

The three discharge-level files include the:

  • Core File, a single file containing commonly used data elements (e.g., age, expected primary payer, diagnosis and procedure codes, and total charges). This file contains 2 to 3 million pediatric discharges.
  • Severity Measures File, a single file containing additional data elements to aid in identifying the severity of the condition for a specific discharge.
  • Diagnosis and Procedure Groups File, a single file including additional information on diagnosis and procedure codes, generally derived from the HCUP software tools.

The hospital-level file includes the Hospital File, a single file including information on hospital characteristics.

The specific contents of the files by year are available on the KID HCUP-US File Specifications page.

To load the HCUP KID data onto your computer and analyze them, you will need at least 10 gigabytes of available storage. Detailed specifications are described in the KID Overview located on the HCUP-US website.

HCUP Nationwide Ambulatory Surgery Sample (NASS)

If you are working with the Nationwide Ambulatory Surgery Sample (NASS), your downloaded zip file will contain compressed, encrypted CSV files. The NASS is available annually beginning with data year 2016. It generally includes two discharge-level files and one hospital-level file; a third discharge-level file was added beginning with data year 2018.

The three discharge-level files include the:

  • Encounter File, a single file containing commonly used data elements for ambulatory surgery encounters containing in-scope major ambulatory surgeries (e.g., age, expected primary payer, diagnosis and procedure codes, and total charges).
  • Supplemental File, a single file containing information on procedures that were performed during encounters recorded in the Encounter File but considered out-of-scope major ambulatory surgeries.
  • Diagnosis and Procedure Groups File, a single file including additional information on diagnosis codes, generally derived from the HCUP software tools. This file is available beginning data year 2018.

The hospital-level file includes the Hospital File, a single file including information on hospital characteristics.

The specific contents of the files by year are available on the NASS HCUP-US File Specifications page.

To load the HCUP NASS data onto your computer and analyze them, you will need 50 to 100 gigabytes of available storage space. Detailed specifications are described in the NASS Overview located on the HCUP-US website.

HCUP Nationwide Emergency Department Sample (NEDS)

If you are working with the Nationwide Emergency Department Sample (NEDS), your downloaded zip file will contain compressed, encrypted CSV files. The NEDS is available annually and generally includes three discharge-level files and one hospital-level file; however, a new discharge-level file was added in data year 2018.

The four discharge-level files include the:

  • Core File, a single file containing commonly used data elements (e.g., age, expected primary payer, diagnosis codes, and total charges). This file contains over 30 million emergency department (ED) visits.
  • Supplemental ED File, a single file containing data elements specific to ED visits that do not result in admission to the same hospital.
  • Supplemental IP File, a single file containing data elements specific to ED visits that result in an admission to the same hospital.
  • Diagnosis and Procedure Groups File, a single file including additional information on diagnosis codes, generally derived from the HCUP software tools. This file is available beginning data year 2018.

The hospital-level file includes the Hospital Weights File, a single file including information on hospital characteristics.

The specific contents of the files by year are available on the NEDS HCUP-US File Specifications page.

Because the NEDS is such a large database, you should have 75-100 gigabytes of storage space available on your computer to be able to work comfortably with the NEDS. Detailed specifications are described in the NEDS Overview located on the HCUP-US website.

HCUP Nationwide Readmissions Database (NRD)

If you are working with the Nationwide Readmissions Database (NRD), your downloaded zip file will contain compressed, encrypted CSV files. The NRD is available annually and generally includes three discharge-level files and one hospital-level file.

The three discharge-level files include the:

  • Core File, a single file containing commonly used data elements critical to readmission analyses (e.g., age, expected primary payer, diagnosis and procedure codes, and total charges).
  • Severity Measures File, a single file containing additional data elements to aid in identifying the severity of the condition for a specific discharge.
  • Diagnosis and Procedure Groups File, a single file including additional information on diagnosis and procedure codes, generally derived from the HCUP software tools.

The hospital-level file includes the Hospital File, a single file including information on hospital characteristics.

The specific contents of the files by year are available on the NRD HCUP-US File Specifications page.

To load the HCUP NRD data onto your computer and analyze them, you will need at least 50 gigabytes of available storage space. Detailed specifications are described in the NRD Overview located on the HCUP-US website.

HCUP State Databases

If you are working with one of the State databases, which include the State Inpatient Database (SID), State Ambulatory Surgery and Services Database (SASD), and State Emergency Department Database (SEDD), the zip file will contain compressed, encrypted ASCII files. The HCUP State databases are available annually with the number of files you receive depending on the State, year, and database you are using.

You may receive up to four discharge-level files, which include the:

  • Core File, a single file containing commonly used data elements (e.g., age, expected primary payer, diagnosis and procedure codes, and total charges).
  • Charges File, a single file containing information on charges associated with the inpatient stay or outpatient encounter.
  • Severity Measures File, a single file containing additional data elements to aid in identifying the severity of the condition for a specific discharge.
  • Diagnosis and Procedure Groups File, a single file including additional information on diagnosis and procedure codes, generally derived from the HCUP software tools.

Unlike the HCUP Nationwide databases, the zip file you receive for the State databases does not include a hospital-level file.

The specific contents of the files by State, year and database are available on the HCUP-US File Specifications page accessible through the link on the screen. Note that the number of DVDs you receive will vary by State, year, and database.

American Hospital Association (AHA) Linkage Files

If you are interested in obtaining hospital characteristic information, you will need to separately download the American Hospital Association (AHA) Linkage File, which is available on the HCUP-US website. The HCUP AHA Linkage Files are designed to be used exclusively with the HCUP SID, SASD, and SEDD. These files are unique by State and year. The linkage files are available for only a subset of HCUP Partners because not all of them release AHA identifiers.

Database File Content Changes for ICD-10-CM/PCS

Changes for Data Year 2015

If you are specifically working with one or more HCUP databases for the 2015 data year, you should expect there to be changes with respect to the database file contents for that year. Due to the transition from the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) to the International Classification of Diseases, Tenth Revision, Clinical Modification/Procedure Coding System (ICD-10-CM/PCS) on October 1, 2015, the 2015 HCUP databases include a combination of codes:

  • Nine months of data with ICD-9-CM codes (January 1, 2015 to September 30, 2015).
  • Three months of data with ICD-10-CM/PCS codes (October 1, 2015 to December 31, 2015).

To alert users to this change in the data, the file structure for the 2015 HCUP databases differs from the annual files for other data years. Specifically, the first three quarters of data (with ICD-9-CM codes) are stored separately from the fourth quarter of data (with ICD-10-CM/PCS codes). Additionally, for the fourth quarter of 2015 data, some files are not available. Note that the HCUP KID and NASS are unavailable for data year 2015.
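When you work with both pieces of the 2015 data, one quick consistency check is to confirm that each record's discharge quarter matches the coding system of the file it came from. The Python sketch below assumes a discharge-quarter data element holding 1 through 4 (DQTR is used here only as an example name; confirm it in your file specifications).

```python
# A minimal sketch. It ASSUMES each record carries its discharge quarter
# (1-4) in a data element such as DQTR; confirm the name in your 2015
# File Specifications before relying on it.
def coding_system_2015(discharge_quarter):
    """Return the ICD revision expected for a 2015 discharge in this quarter."""
    if discharge_quarter in (1, 2, 3):
        return "ICD-9-CM"       # January 1 to September 30, 2015
    if discharge_quarter == 4:
        return "ICD-10-CM/PCS"  # October 1 to December 31, 2015
    raise ValueError(f"unexpected discharge quarter: {discharge_quarter!r}")


def quarter_file_is_consistent(quarters, expected_system):
    """True if every quarter in one 2015 file maps to that file's coding system."""
    return all(coding_system_2015(q) == expected_system for q in quarters)
```

For instance, the quarters from the first-three-quarters file should all pass quarter_file_is_consistent(..., "ICD-9-CM").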

Availability of the Diagnosis and Procedure Groups File

The Diagnosis and Procedure Groups File is unavailable for the 2016-2017 HCUP Nationwide databases and the 2016-2019 HCUP State databases because ICD-10-CM/PCS versions of the HCUP software tools had not yet been developed. Users who want to apply the HCUP software tools to ICD-10-CM/PCS-coded data, to produce data elements that are currently unavailable in the database files, can download the respective tool(s) from the Research Tools section of the HCUP-US website. Users may also wish to review the HCUP Software Tools Tutorial, which provides instructions on how to apply the HCUP software tools to HCUP or other administrative databases.

HCUP-US Documentation

No matter which database you are working with, all the documentation and tools you will need to use your HCUP data files are found on the HCUP-US website.

Let's go to the HCUP-US website right now, and I'll show you the documentation available to you. It's located in the "Databases" section of the website, under "Database Documentation".

Today, we'll be using the NIS Core File to demonstrate the load and check processes, so I'll take a look at the NIS database documentation.

The NIS database documentation includes a detailed introduction to the database; descriptions of the data elements, including their availability over time; file specifications and the programs needed to load the data; and analytic tools designed specifically for use with the HCUP data.

If you're just getting started using HCUP data, you may want to begin with the Introduction document, which is available for each database, located on the specific database documentation page for the NIS, KID, NEDS, NRD, NASS, SID, SASD, and SEDD.

These documents can all be found on the left-hand side of the appropriate database documentation page. These introductory documents contain much of the information I am reviewing with you today, such as the size and structure of the HCUP database files.

Decompressing Data Overview

To load HCUP data onto your computer and analyze them, you'll need roughly 10 to 100 gigabytes of available space, depending on the database and year.
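Before saving and unzipping the files, it can be worth confirming that the target drive has room. The Python sketch below uses the standard library's shutil.disk_usage together with the rough figures quoted earlier in this tutorial, taking the upper bound where a range was given. The size table is an assumption for illustration, not an official HCUP requirement.

```python
import shutil

# Rough storage figures quoted in this tutorial, in gigabytes. Where a range
# was given (NASS 50-100, NEDS 75-100), the upper bound is used. Guidance
# only: each database's Overview on HCUP-US has the current figures.
SUGGESTED_GB = {"KID": 10, "NIS": 15, "NRD": 50, "NASS": 100, "NEDS": 100}


def required_bytes(database):
    """Suggested free space for a database, in bytes (1 GB taken as 1024**3)."""
    return SUGGESTED_GB[database.upper()] * 1024 ** 3


def has_enough_space(database, path="."):
    """True if the drive holding `path` has at least the suggested free space."""
    return shutil.disk_usage(path).free >= required_bytes(database)
```

For example, has_enough_space("NIS", "C:/data") checks the drive holding a hypothetical C:/data folder.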

Because of their size, the HCUP database files are compressed; they are also encrypted. Both are done with SecureZIP® from PKWARE.

To begin, you must save the data to your hard drive before you unzip and decrypt the data.

The steps involved in saving the delivered files to your hard drive may differ slightly depending on whether you're working with Nationwide or State databases, but the process of decompressing the data is the same for each of the HCUP databases.

Decompressing the Data Step by Step

I'm going to walk through the steps involved in saving the zip files to my hard drive and then unzipping the data.

Please note that the HCUP data files are encrypted zip files that cannot be decrypted by the zip/unzip utilities built into Windows or macOS (Archive Utility). Compatible unzip programs are available from several reputable vendors, including:

  • ZIP Reader® (Windows) - PKWARE corporation
  • SecureZIP® (Mac) - PKWARE corporation
  • WinZip® (Windows) - WinZip corporation
  • StuffIt Expander® (Mac) - Smith Micro corporation

The process I'm about to walk through is for a Nationwide digital download and is just one of several ways to load the data files onto a computer from either a DVD or a digital download. If you're familiar with other ways of accomplishing the same steps, you can use them in place of what I'm about to show you. Just make sure to check your work, as I will demonstrate throughout this tutorial, to confirm that the data have decompressed, saved, and loaded correctly.

If you are in need of assistance in using software packages other than those used in this demonstration, please contact HCUP Technical Assistance at hcup@ahrq.gov.

So, let's get started.

Here in the hard drive directory, I've created a "data" folder to hold the HCUP database products.

Today I'm working with the 2019 NIS, so I'll name my directory "NIS 2019".

Next, I will download the NIS 2019 file from the online HCUP Central Distributor website. I have logged in to my online account and displayed the "Order History". I will click "View Downloads" for my "Order".

This order contains two product files: a zip file for the 2019 NIS and a separate zip file that contains the HCUP Cost-to-Charge Ratio (CCR) File for the 2019 NIS. For this demonstration, I will save the NIS 2019 file to the folder I already created. Clicking the "download" link opens the browser's download dialog, which will vary by browser.

Clicking the arrow next to "Save As", I will save the zip file to the NIS 2019 folder I created.

If your browser does not offer a "Save As" option, the file will download automatically to a default location on your computer, from which you can later unzip it.

Downloading the file will take some time depending on your internet connection speed.

Once the delivery zip file has completely downloaded, you will be able to see the NIS 2019 zip file in the destination folder.

I will choose "Open with WinZip®" to see the files contained in the delivery zip.

The zip dialog window should open. I will be prompted for the password before I can see the contents.

The WinZip® dialog opens showing the contents of the delivery zip file.

Use the Unzip or Extract function to extract the files from the delivery zip file to the location desired.

I will enter the decryption password when prompted. Purchasers received this password in an email from the HCUP Central Distributor.

The zip utility will extract all the files and place them in the folder I created. A progress window may display as the files unzip.

When the extraction function is completed, the files will display in my folder.

The zip files now in this folder contain the data files I will unzip and load into the analysis software.

The other way to save the data to my computer is to copy them from a DVD.

If my data products are on a DVD, I will insert the DVD into my disk drive and open up the DVD directory on my computer.

The DVD will display the files that are to be copied to my hard drive. Select all of the files and select "Copy".

I will copy the files to the folder I created on my hard drive. Once the files are copied to my folder, open up the folder.

From this point, the steps to extract the databases are the same regardless of which database you are working with. You will usually extract the "Core" database file first.

In this example, in the NIS 2019 folder, I will select the NIS_2019_Core.zip file.

I will be using WinZip® to open the file. Your Zip utility may have a different appearance and different options.

In the WinZip® dialog box, I will Unzip the Core file using the Unzip button. When I click Unzip, I am prompted for the password to decrypt the file.

I'll enter the same password I used for the delivery zip file earlier and click OK. The file begins extracting.

When the Core ASCII file is extracted, the newly extracted file appears in my folder. This is the Core data file I will load into my analysis software.

Remember that you are responsible for the security of your HCUP data, and the Data Use Agreement requires the data to be stored in a safe place. Loading the data onto a local area network (LAN) where other users have access is not allowed unless all of those users have signed an HCUP Data Use Agreement.

Load into Statistical Software

Now that I've saved the data onto my hard drive, I must load it into a statistical software package to work with it. The HCUP-US website offers load programs in SAS, SPSS, and Stata because these are some of the most frequently used packages, but other statistical software programs on the market can also be used.

Note that HCUP data files cannot be analyzed using desktop spreadsheet or database applications because of their size and complexity.
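The size constraint above also shows how any tool outside the published SAS, SPSS, and Stata load programs should read the data: one record at a time rather than all at once. For the databases this tutorial describes as CSV (NASS, NEDS, and NRD), a streaming reader in Python might look like the sketch below. It assumes the CSV files have no header row, so the column names must be supplied from the File Specifications; the keep argument loosely mirrors a SAS KEEP statement.

```python
import csv


def stream_records(lines, fieldnames, keep=None):
    """Yield one dict per CSV record without reading the whole file into memory.

    `lines` can be an open file or any iterable of text lines. `fieldnames`
    supplies the column names, ASSUMING the file has no header row. `keep`
    optionally limits each record to a few data elements.
    """
    for row in csv.DictReader(lines, fieldnames=fieldnames):
        if keep is not None:
            row = {name: row[name] for name in keep}
        yield row
```

In use, you would open the file with newline="" (as the csv module recommends) and pass the file object, for example with a hypothetical file name: with open("NEDS_2019_Core.csv", newline="") as f: for row in stream_records(f, names, keep=["AGE"]): ...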

I'm going to demonstrate loading the NIS 2019 Core File using SAS. Navigate to the NIS database documentation page on HCUP-US. Halfway down on the left side you will see "File Specifications and Load Programs". I will click on the "Nationwide SAS Load Programs".

I will select "2019 NIS" from the drop-down menu and then click the "SAS NIS 2019 Core File load program" link.

This opens the load program in a new window. I will open SAS and copy and paste the program text into it. Note that the version of SAS you see here may differ from the version you're using. While the layout and icons may be different, the code will be the same regardless of the version.

I will save the program to my local hard drive in the NIS 2019 folder.

Once the program is open, I need to assign a library, as I do every time I run a SAS program.

I will use the LIBNAME statement to point the library to the same location where I have stored the ASCII file, that is, the NIS 2019 folder on my hard drive.

When I read the file into SAS, I need to prefix the data set name with the assigned library name so that SAS knows where to store the data set.

I also need to indicate where on the computer SAS should look to find the ASCII file I want to load.

Now that I've made the two modifications the program needs to find my data, I'm going to scroll through the rest of the load program to see it in its entirety.

Once I've finished looking through the program, I'll select Run and then Submit.

The program takes a few minutes to run.

When the run is completed, SAS generates a log file of the program. I will check this log file to make sure that I do not see any error messages.

If there is an error message, I may need to double check my work to this point.

If there are no error messages, I will see notes indicating how long it took SAS to load the file as well as notes describing each step the program executed.
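Reading a long log by eye is error-prone, so a small helper can pull out the lines that need attention. This Python sketch assumes the log has been saved to a text file and relies on SAS's convention of starting problem messages with "ERROR" or "WARNING"; NOTE lines are ignored.

```python
def log_problems(log_lines):
    """Return the SAS log lines that start with ERROR or WARNING."""
    return [
        line.rstrip("\n")
        for line in log_lines
        # lstrip() tolerates leading spaces before the message keyword
        if line.lstrip().startswith(("ERROR", "WARNING"))
    ]
```

An empty result means the log contains only notes; with a saved log (the file name here is hypothetical) you might call log_problems(open("nis_2019_core_load.log")).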

Approach

I've loaded the data into my statistical software. I am now ready for the next step. What I need to do now is check that I've loaded the data correctly.

The files from each of the databases will look quite similar, although the data elements and number of records included in each will differ greatly.

Note that actual data are not shown in this tutorial. The screen is an illustration of what the loaded data in this example may look like in SAS.

Right now, I am scrolling across each of the data elements that were loaded. These include patient characteristics, diagnosis codes, procedure codes, and other information about the hospitalization.

I also want to scroll down through the records to make sure there are 7,083,805 records, the total number of unweighted records in the 2019 NIS.

Yes, there are.

You can find this information in the NIS Summary Statistics, NIS File Specifications, or NIS Introduction.
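You can also run the record-count check on the raw ASCII file itself, which helps separate a bad download or unzip from a bad load. A minimal Python sketch, assuming one record per line (as in a fixed-width file) and using the expected count quoted above:

```python
EXPECTED_NIS_2019_CORE = 7_083_805  # unweighted 2019 NIS Core records, per this tutorial


def count_records(lines, header_lines=0):
    """Count non-blank records, skipping `header_lines` leading lines."""
    count = 0
    for i, line in enumerate(lines):
        if i < header_lines:
            continue  # skip a header row, if the file has one
        if line.strip():
            count += 1
    return count
```

With the unzipped Core file open as f, count_records(f) == EXPECTED_NIS_2019_CORE should be True for the 2019 NIS.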

Note that now that the file has been loaded into the statistical software, you may want to delete the original ASCII version as well as the ZIP file to keep them from taking up too much space on your computer.

Next, to check the data, I'm going to go back to HCUP-US and pull some summary statistics files. I'll compare the summary statistics files from HCUP-US to some basic frequencies and distributions I'll run on the data I've just loaded.

Summary Statistics

Let's go back to the NIS Data Documentation section of HCUP-US and scroll a bit down the page.

Under the heading "Data Elements" there is a bullet with a link to "NIS Summary Statistics".

Let's click on this link to open the Summary Statistics page. Note that if you are working with a database other than the NIS, go to the appropriate Data Documentation screen and choose the summary statistics which correspond to the database you've loaded.

Right now, I'm only interested in the statistics for the "unweighted Core File".

I will open up the "unweighted Core File" and take a look.

As you can see, this file provides basic statistics on each of the data elements in the NIS: the "N" (the number of records that contain the data element), the "minimum" and "maximum" values of the data element, the "mean", and the "median".

For example, if I look at the data element AWEEKEND, which indicates whether the admission occurred on a weekend or a weekday, I see that of the 7,083,805 records in the NIS, about 21 percent (1,482,079) are coded as weekend admissions.

I can see that 58 records did not have anything coded for this data element.

These frequency distribution tables are available for each of the data elements in the NIS in the Summary Statistics PDF files we just downloaded from the HCUP-US website.
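A one-way frequency table is a tally of each value plus a count of missing records. As a worked check of the figures above, 1,482,079 / 7,083,805 ≈ 0.209, or about 21 percent. The Python sketch below builds the same kind of table. It treats empty strings and None as missing, an assumption to adapt to however your loaded data represent missing values, and it computes percentages over all records, which may differ from how a given Summary Statistics table reports them, so compare counts first.

```python
from collections import Counter


def frequency_table(values, is_missing=lambda v: v in ("", None)):
    """One-way frequency table, returned as (rows, n_missing).

    rows maps each non-missing value to (count, percent), where percent is
    taken over ALL records, missing ones included.
    """
    values = list(values)
    total = len(values)
    n_missing = sum(1 for v in values if is_missing(v))
    counts = Counter(v for v in values if not is_missing(v))
    rows = {v: (c, 100.0 * c / total) for v, c in sorted(counts.items())}
    return rows, n_missing
```

Run on the AWEEKEND column, the count for each value and n_missing can be set side by side with the published table.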

Running Check Programs

To check the data, I'll create tables of means and of frequency distributions from the data I've loaded on my computer and then compare those statistics to the summary statistics available on HCUP-US.

If the numbers match the HCUP summary statistics I downloaded from HCUP-US, I'll know the data have loaded correctly. If not, I'll know there's a problem, and I'll have to go back and figure out what went wrong.
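The comparison itself can be made mechanical. In the Python sketch below, which is an illustration rather than an HCUP tool, integer statistics such as N must match exactly, while means are compared only to the number of decimal places shown in the published table, since published values are rounded.

```python
def mismatches(computed, published, decimals=2):
    """Return the names of statistics that disagree with the published values.

    Integer statistics (such as N) must match exactly. Other values (such as
    means) are compared at the precision shown in the published table, given
    by `decimals`.
    """
    tolerance = 0.5 * 10 ** -decimals + 1e-12  # half a unit in the last shown digit
    bad = []
    for name, expected in published.items():
        actual = computed.get(name)
        if actual is None:
            bad.append(name)  # statistic missing from your output
        elif isinstance(expected, int):
            if actual != expected:
                bad.append(name)
        elif abs(actual - expected) > tolerance:
            bad.append(name)
    return bad
```

An empty list means everything you checked agrees with the published table.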

To create the tables, I'll go back to SAS. I'll start with a short program to check the means of the data elements.

Submit the SAS program.

This will generate a table which should match the first table we saw in the Summary Statistics file.
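For readers working outside SAS, the statistics in that first table (N, minimum, maximum, mean, and median) can be computed for one data element with Python's statistics module. This is a sketch that assumes missing values are stored as None and the remaining values are numeric.

```python
from statistics import mean, median


def summarize(values, is_missing=lambda v: v is None):
    """Return N, min, max, mean, and median over non-missing numeric values."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return {"N": 0, "min": None, "max": None, "mean": None, "median": None}
    return {
        "N": len(present),  # number of records with a value
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }
```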

Now, I'll compare the output of our SAS program to the information in the Summary Statistics file. So far, the values match.

Next, I'll generate some tables to check the frequency distributions of various data elements. Since I've looked at the AWEEKEND table in the Summary Statistics File, I will start with the AWEEKEND data element.

I will run a frequency on AWEEKEND.

Submit the SAS program.

My output should match exactly the data I saw in the Summary Statistics File. It looks like it checks out.

I recommend running frequency distributions on a few other data elements as well.

Linking HCUP Database Files

As I mentioned earlier in this tutorial, each HCUP database includes multiple files, which vary by database, year, and (for the HCUP State databases) State. These files also vary in their unit of analysis, that is, whether they are discharge-level or hospital-level files. In some cases, you may wish to link a database's files together to supplement the information found in the Core File. The data element used for the linkage will differ based on the file's unit of analysis.

To link a hospital-level file to a discharge-level file, such as the Core File with a Hospital File or AHA Linkage File, you would use the database's hospital identifier. For example, you would link the NIS Core File to the NIS Hospital File using the data element HOSP_NIS.

To link two or more discharge-level files, such as the Core File with a Diagnosis and Procedure Groups File, you would use the HCUP record identifier. For example, you would link the NIS Core File to the NIS Diagnosis and Procedure Groups File using the data element KEY_NIS.
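Both kinds of linkage are lookups on a shared key: many discharges map to one hospital record, while discharge-level files match one-to-one on the record identifier. A minimal Python sketch with records held as dicts, using the NIS key names from the text (swap in the keys for your database):

```python
def link_hospital(core_rows, hospital_rows, hosp_key="HOSP_NIS"):
    """Attach hospital-level fields to each discharge (many discharges per hospital)."""
    hospitals = {h[hosp_key]: h for h in hospital_rows}
    # Discharge fields are unpacked last, so they win on any name clash.
    return [{**hospitals.get(row[hosp_key], {}), **row} for row in core_rows]


def link_discharge(core_rows, other_rows, rec_key="KEY_NIS"):
    """Attach fields from another discharge-level file (one record per discharge)."""
    other = {r[rec_key]: r for r in other_rows}
    return [{**other.get(row[rec_key], {}), **row} for row in core_rows]
```

Records without a match keep only their own fields, which plays the role of a left join; for full-size files, the same logic is better done inside your statistical package.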

Let's consider an analysis that is focused on obtaining counts by patient age group for a specific HCUP Clinical Classifications Software Refined (CCSR) for ICD-10-CM diagnosis category (CIR019, Heart failure) in the 2019 NIS. To accomplish this, you will need to link the 2019 NIS Core File (which contains information on patient age at admission) with the 2019 NIS Diagnosis and Procedure Groups File (which contains information on CCSR categories). I will walk through how to do this next.

I will go back to SAS and create a program to obtain the total number of unweighted records with an any-listed diagnosis CCSR of CIR019, overall and for four age groups: 0-17, 18-44, 45-64, and 65+.

In this program, you will see that my KEEP statements list the data elements I am interested in for this analysis, AGE and DXCCSR_CIR019, along with the data elements critical for linking the NIS files, HOSP_NIS and KEY_NIS. In this example, I am only interested in unweighted counts for testing purposes; if you wish to obtain national estimates of all U.S. discharges, the KEEP statement also needs to include the discharge weight data element, DISCWT.
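To show what including DISCWT changes, here is a Python sketch of a weighted count: instead of adding 1 for each flagged record, it adds that record's discharge weight. It assumes the flag is coded 1 when the condition is present, and it produces a point estimate only; standard errors depend on the NIS sample design, which this sketch ignores.

```python
def weighted_estimate(rows, flag="DXCCSR_CIR019", weight="DISCWT"):
    """Sum the discharge weights of records whose flag equals 1.

    Point estimate only: no variance or standard error is computed.
    """
    return sum(row[weight] for row in rows if row.get(flag) == 1)
```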

Be mindful that the data elements in the KEEP statement will need to be modified to align with your analysis as well as your database. For example, if you are using the 2019 NRD, you would use HOSP_NRD and KEY_NRD.

Additionally, be mindful of the OBS macro in this SAS program. It is used in my example SAS code for testing purposes, but in a full and final run, it should be changed to the value MAX.
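In SAS, the OBS= option limits how many observations are read, and OBS=MAX removes the limit. The same test-then-full-run pattern can be expressed in Python with itertools.islice, where None plays the role of MAX; a short sketch:

```python
from itertools import islice


def read_records(records, obs=None):
    """Return an iterator over at most `obs` records.

    obs=None reads every record, playing the role of OBS=MAX in SAS.
    """
    return islice(records, obs)
```

During testing you might pass obs=1000; for the final run, leave obs at None so every record is processed.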

Now, I will submit the SAS program.

This will generate a table that will provide the total number of unweighted records in the 2019 NIS with any-listed diagnosis of CCSR CIR019 overall as well as by my four age groups.

I can see that it has. The counts appear in the column with the value "1" under the merged column header "Any-listed CCSR CIR019".
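The same counting logic can be sketched in Python for readers using other software. This assumes the Core File and Diagnosis and Procedure Groups File records have already been linked by KEY_NIS into one dict per record, that DXCCSR_CIR019 equals 1 for records with an any-listed heart failure diagnosis (matching the column labeled "1" above), and that a missing or negative AGE should be counted separately; all three are assumptions to check against your data.

```python
def age_group(age):
    """Map age in years to the four groups used in this example."""
    if age is None or age < 0:
        return "Missing"  # ASSUMPTION: None or negative means missing
    if age <= 17:
        return "0-17"
    if age <= 44:
        return "18-44"
    if age <= 64:
        return "45-64"
    return "65+"


def count_cir019_by_age(linked_rows):
    """Unweighted counts of records with DXCCSR_CIR019 == 1, overall and by age group."""
    counts = {"Total": 0, "0-17": 0, "18-44": 0, "45-64": 0, "65+": 0, "Missing": 0}
    for row in linked_rows:
        if row.get("DXCCSR_CIR019") == 1:
            counts["Total"] += 1
            counts[age_group(row.get("AGE"))] += 1
    return counts
```

The Total entry is the overall count that the validation resources discussed next can confirm; the age-group entries should add up to it.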

Thus far in this tutorial, we have discussed how to check the load of your HCUP data against the HCUP Summary Statistics. For this example, I will need to use a different resource for validation as the Summary Statistics do not provide record counts by CCSR categories. I will discuss additional HCUP resources for validation next.


Additional HCUP Resources for Validation

In some cases, the statistics you need for validation go beyond the single-data-element frequencies provided in the HCUP Summary Statistics. For example, you may be interested in the cross-tabulation of two data elements, such as patient age and a specific clinical condition.

HCUP offers several publicly available resources that can be used for validation purposes, which I will discuss next.

Examples of the additional HCUP resources that are available include:

  • HCUPnet, our free online query tool that provides select pre-calculated statistics derived from both the HCUP State and Nationwide databases. HCUPnet can be used to validate select national estimates obtained from the NIS, KID, NRD, or NEDS and county- or State-level statistics for participating HCUP Partners.
  • Diagnosis and Procedure Frequency Tables, which provide frequencies of ICD-10-CM/PCS codes (individually and grouped by clinical category) in the HCUP Nationwide databases. These are available under the "Data Elements" section of the respective Nationwide database documentation page on HCUP-US. You can use these tables either to ensure that you have correctly applied the HCUP Clinical Classifications Software Refined (CCSR) for ICD-10-CM diagnoses and ICD-10-PCS procedures to one of the HCUP Nationwide databases, or as a guide to whether the HCUP Nationwide database of interest has enough cases of a specific diagnosis or procedure to support your analysis.
  • HCUP Summary Trend Tables, downloadable tables providing State-specific monthly trends in hospital utilization. The information is derived from the HCUP SID and provides trends overall as well as by select priority conditions like COVID-19, encounter types like elective versus non-elective stays, and service lines like maternal and neonatal. The tables provide a great resource for validating estimates that use the HCUP SID.

As an example, I am going to show you how to use the Diagnosis and Procedure Frequency Tables to validate the overall number of unweighted records with any-listed diagnosis CCSR of CIR019, heart failure. If you recall, I was unable to obtain this count in the Summary Statistics. Note that the Diagnosis and Procedure Frequency Tables only provide overall counts and do not subset by patient characteristics, like age.

In my output, I see that 1,135,157 records have an any-listed diagnosis CCSR of CIR019.

Now, I'll compare this overall unweighted count from our SAS program to the information in the Diagnosis and Procedure Frequency Table for the NIS, which I obtained from the NIS Database Documentation page on HCUP-US, under "Data Elements."

The total number of unweighted records with an any-listed diagnosis CCSR of CIR019 should exactly match the count in the Diagnosis and Procedure Frequency Table for the NIS. It looks like it checks out.
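Because the expected result is an exact match, the comparison can be written down as a simple check. This Python sketch is illustrative: compare_counts is a hypothetical helper, the computed value is the 1,135,157 reported in the output above, and the published value would be typed in by hand from the NIS Diagnosis and Procedure Frequency Table (the tutorial reports that the two agree, so the same figure is used here).

```python
def compare_counts(label, computed, published):
    """Report whether a count from your own program exactly matches a published HCUP figure."""
    difference = computed - published
    status = "match" if difference == 0 else f"MISMATCH ({difference:+,})"
    return f"{label}: computed {computed:,} vs. published {published:,} -> {status}"


computed_count = 1_135_157   # unweighted count from the SAS output above
published_count = 1_135_157  # entered by hand from the NIS frequency table
print(compare_counts("Any-listed CCSR CIR019, 2019 NIS", computed_count, published_count))
# Any-listed CCSR CIR019, 2019 NIS: computed 1,135,157 vs. published 1,135,157 -> match
```

Printing the signed difference on a mismatch makes it easier to tell a small discrepancy (for example, a version difference) from a large one (for example, an incomplete load).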


Identifying Possible Problems

What happens if the statistics you generate from the data you load do not agree with those in the summary statistics or other HCUP resources? You will need to double check your work.

First, check to make sure you used the load program that corresponds to the database you are working with.

Then check to make sure you used the Summary Statistics that correspond to the database you are working with.

If you are using another HCUP resource, check its methodology to make sure there are no differences from your own approach. For example, if you are using one or more of the HCUP SID for a research study and comparing your results with HCUPnet, keep in mind that HCUPnet limits the hospitals in the SID to community hospitals, excluding rehabilitation and long-term acute care (LTAC) hospitals. Depending on the State, your estimates may be higher if you do not apply the same limitation. Another example is the Diagnosis and Procedure Frequency Tables that we just used to validate our heart failure counts in the 2019 NIS. If your statistics do not match, check that the version of the HCUP software tool (such as the CCSR) applied to your HCUP database is the same as the version used to produce the HCUP resource.
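To mirror the HCUPnet hospital restriction before comparing SID results, you would drop discharges from hospitals outside that universe. This Python sketch only shows the filtering pattern: every field name (hosp_id, is_community, is_rehab, is_ltac) is a hypothetical placeholder, the records are made up, and how hospitals are actually classified should follow HCUPnet's documented methodology.

```python
def restrict_like_hcupnet(discharges, hospitals):
    """Keep discharges from community hospitals that are not rehabilitation or LTAC hospitals.

    All field names (hosp_id, is_community, is_rehab, is_ltac) are hypothetical.
    """
    # Hospitals that fall inside the restricted universe.
    keep_ids = {
        h["hosp_id"]
        for h in hospitals
        if h["is_community"] and not h["is_rehab"] and not h["is_ltac"]
    }
    return [d for d in discharges if d["hosp_id"] in keep_ids]


# Made-up hospitals and discharges for illustration only.
hospitals = [
    {"hosp_id": "A", "is_community": True, "is_rehab": False, "is_ltac": False},
    {"hosp_id": "B", "is_community": True, "is_rehab": True, "is_ltac": False},
    {"hosp_id": "C", "is_community": True, "is_rehab": False, "is_ltac": True},
]
discharges = [{"hosp_id": "A"}, {"hosp_id": "A"}, {"hosp_id": "B"}, {"hosp_id": "C"}]
print(len(discharges), len(restrict_like_hcupnet(discharges, hospitals)))  # 4 2
```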

If you are using the HCUP Nationwide databases specifically and are obtaining national estimates, check that you have applied the database weights correctly. For additional guidance on applying weights to obtain national estimates, you may wish to review the Producing National HCUP Estimates tutorial.
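As a reminder of what applying the weights means for a simple count, a weighted national estimate is the sum of DISCWT over the records that meet your criteria. This Python sketch assumes the same 0/positive coding of DXCCSR_CIR019 as the tabulation earlier in this section, uses made-up weights, and produces a point estimate only; standard errors also require the survey design elements covered in the Producing National HCUP Estimates tutorial.

```python
def weighted_estimate(rows, flag="DXCCSR_CIR019", weight="DISCWT"):
    """Weighted national count: the sum of discharge weights over qualifying records."""
    # Assumption: the flag is 0 when the category is absent and positive when any-listed.
    return sum(r[weight] for r in rows if r[flag] is not None and r[flag] >= 1)


# Made-up records and weights for illustration only.
rows = [
    {"DXCCSR_CIR019": 1, "DISCWT": 4.99},
    {"DXCCSR_CIR019": 0, "DISCWT": 5.01},
    {"DXCCSR_CIR019": 1, "DISCWT": 5.02},
]
print(round(weighted_estimate(rows), 2))  # 10.01
```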

Finally, check the code used to generate the means and frequency distributions.
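One way to check that code is to recompute the same statistics independently and compare. This Python sketch computes a frequency distribution and N, N missing, mean, minimum, and maximum for one data element; the function names are hypothetical, None or negative values are treated as missing by assumption, and the records are made up.

```python
from collections import Counter
from statistics import mean


def frequency(rows, name):
    """Frequency distribution of one data element."""
    return Counter(r[name] for r in rows)


def describe(rows, name):
    """N, N missing, mean, min, and max of a numeric data element.

    Assumption: None or negative values are treated as missing.
    """
    values = [r[name] for r in rows if r[name] is not None and r[name] >= 0]
    return {
        "N": len(values),
        "NMISS": len(rows) - len(values),
        "MEAN": mean(values) if values else None,
        "MIN": min(values, default=None),
        "MAX": max(values, default=None),
    }


# Made-up records for illustration only.
rows = [{"AGE": 70, "FEMALE": 1}, {"AGE": 30, "FEMALE": 0}, {"AGE": None, "FEMALE": 1}]
print(frequency(rows, "FEMALE"))  # Counter({1: 2, 0: 1})
print(describe(rows, "AGE"))      # {'N': 2, 'NMISS': 1, 'MEAN': 50, 'MIN': 30, 'MAX': 70}
```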

If none of these resolve the problem, HCUP Technical Assistance is available at hcup@ahrq.gov.


Key Points

As you begin to work with HCUP data, the following key points will help you along the way:

The load programs and summary statistics you will need to properly load and check your data are available on the HCUP-US website along with any other database documentation or additional HCUP resources you may need.

Data should always be checked after they have been loaded to make sure they loaded correctly.

If you encounter problems loading or checking your data, review your work and the available HCUP resources; if you still have questions, contact HCUP Technical Assistance at hcup@ahrq.gov.


Resources and Other Training

If you are looking for more information on the subject matter covered here, many resources are available on the HCUP-US website.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has experienced research personnel available to respond to technical questions you may have. Inquiries are typically answered within three business days.

Thank you for accessing this tutorial. There are several other HCUP online tutorials. Access these tutorials to see if there are other topics that could be helpful to you.

If you have any feedback regarding this tutorial, please email us at hcup@ahrq.gov.



Internet Citation: Load and Check HCUP Data - Accessible Version. Healthcare Cost and Utilization Project (HCUP). January 2022. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/tech_assist/loadandcheck/508_course/508course_2019.jsp.
Last modified 1/10/22