Producing National HCUP Estimates
Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on producing national and regional estimates. This tutorial was created for researchers who are using HCUP nationwide databases, understand the design of the nationwide databases, and are ready to produce national and regional estimates.
In this tutorial you will learn how to produce national and regional estimates by weighting the unweighted HCUP data.
Before we get started, a quick word about HCUP:
HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics. If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.
This tutorial has three objectives. The first is to understand how three nationwide databases, the NIS (National Inpatient Sample), the NEDS (Nationwide Emergency Department Sample), and the KID (Kids' Inpatient Database), can be weighted to produce national and regional estimates. The second is to select and apply the appropriate discharge or hospital weight to generate national estimates at the discharge or hospital level from unweighted record counts. The third is to understand when it is appropriate to use the NIS and NEDS databases as unweighted samples.
Please note: The Nationwide Readmissions Database (NRD) has its own tutorial that explains how to apply weights to produce national and regional estimates when using that database. Please refer to the Nationwide Readmissions Database (NRD) tutorial on the HCUP-US website.
Why do we need to weight HCUP data?
Most researchers working with the HCUP nationwide databases are interested in using the data to create national and regional estimates. The HCUP nationwide databases are samples designed to represent a larger universe, so the data must be weighted in order to produce national and regional estimates. For an in-depth explanation of the sample designs, you can access the HCUP online course on Sample Design of Nationwide Databases. The next sections will cover how each nationwide database can be used to produce national and regional estimates.
The National Inpatient Sample (NIS) is a database of hospital inpatient discharges which can be used to create national and regional estimates of hospital utilization, access, costs and quality.
To perform such analyses on the NIS data contained in the Core File, you must weight the unweighted observations. Weighting the data will enable you to produce nationally representative estimates.
In 2012, the NIS was redesigned to improve national estimates.
The previous NIS was composed of all discharges from a sample of hospitals in HCUP. The redesigned NIS is a sample of discharges from all hospitals in HCUP. To highlight the design change, AHRQ renamed the Nationwide Inpatient Sample (NIS) to the National Inpatient Sample (NIS). For detailed information on the 2012 NIS Redesign, see the NIS Redesign Report.
There are several changes to the NIS beginning with 2012 data.
The new NIS, beginning with the 2012 data year, is called the "National" Inpatient Sample. The previous NIS, which included data years 1988 to 2011, was called the "Nationwide" Inpatient Sample. The previous NIS universe included long-term acute care hospitals, and annual discharge estimates and hospital identification were based on information from the American Hospital Association (AHA). The new NIS removes long-term acute care hospitals, and annual discharge estimates and hospital identification are based on information from the SID when available; otherwise, they are based on AHA information. The previous NIS used hospital census regions for stratification, whereas the new NIS uses hospital census divisions. The sample design for the previous NIS was to sample 1,000 hospitals, amounting to more than 8 million records, whereas the sample design for the new NIS is to sample 7 million discharge records from more than 4,000 hospitals. This new sampling strategy results in more precise estimates than the previous NIS design. The following table summarizes the 2012 NIS redesign.
Feature | New NIS (beginning in 2012 Data Year) | Previous NIS (1988-2011 Data Years) |
---|---|---|
Name | National Inpatient Sample (NIS) | Nationwide Inpatient Sample (NIS) |
Universe | Removed long-term acute care hospitals | Included long-term acute care hospitals |
Universe | Annual hospital discharge count estimates and hospital entities based on information from the SID when available, otherwise, based on AHA information | Annual hospital discharge count estimates and hospital identification based on information from AHA |
Strata | Used hospital census division (9) for stratification | Used hospital census region (4) for stratification |
Sample Design | 7 million hospital discharge records from more than 4,000 hospitals | 8 million records from more than 1,000 hospitals |
The weights you apply to the data depend on the type of estimates you want to produce.
The NIS includes weights to produce national or regional estimates. NIS data for years prior to 2012 include both hospital and discharge weights. The hospital weights can be used to produce hospital-level estimates, and the discharge weights can be used to produce discharge-level estimates. Beginning with the 2012 data, after the NIS redesign, hospital weights are not included because they are no longer needed. NIS data for 2012 onwards should be weighted only to produce discharge-level estimates. To accurately produce discharge-level estimates, such as the total number of discharges with a diagnosis of asthma in the United States or the total number of discharges in the United States for individuals age 65 and over, you must apply a discharge weight to each record in the NIS Core File. The discharge weights were calculated for NIS data by first stratifying the NIS hospitals on the same variables that were used for creating the sample. These variables were Census division, urban/rural location, teaching status, bed size, and ownership. A weight was then calculated for each stratum by dividing the number of universe discharges in that stratum, obtained from HCUP and the AHA data, by the number of NIS discharges in the stratum. Weights have been assigned to each discharge and are stored in each record in the data element discharge weight (DISCWT). When the discharge weights are applied to the unweighted NIS data, the result is an estimate of the number of discharges for the entire universe. In the case of the NIS, the universe is all inpatient discharges from community hospitals in the U.S., excluding rehabilitation hospitals beginning with 1998, and excluding long-term acute care hospitals beginning with 2012.
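The stratum weight calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the program used to build the NIS: the function name, the stratum labels, and the universe counts are all hypothetical.

```python
from collections import Counter


def stratum_discharge_weights(universe_counts, sample_strata):
    """Return {stratum: universe discharges / sampled discharges}."""
    sample_counts = Counter(sample_strata)
    return {s: universe_counts[s] / n for s, n in sample_counts.items()}


# Hypothetical universe discharge totals for two strata.
universe = {"A": 1000, "B": 450}
# Stratum label of each sampled discharge record (made-up sample).
sample = ["A"] * 200 + ["B"] * 100

weights = stratum_discharge_weights(universe, sample)
# Summing the weight carried by every sampled record recovers the universe total.
estimated_total = sum(weights[s] for s in sample)
print(weights, estimated_total)
```

Every record in a stratum carries the same weight, which is why a simple sum of the weights reproduces the universe count for that stratum.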
This tutorial will use SAS® to demonstrate how to weight HCUP data to produce national and regional estimates. In addition to SAS®, there are several other statistical software packages which can produce statistics from the stratified sampling design of the nationwide HCUP databases. STATA® and SPSS® are two commonly used examples.
For a more detailed explanation of how to use these software packages to work with the nationwide HCUP databases, please refer to the Methods Reports available on HCUP-US, including Calculating National Inpatient Sample (NIS) Variances for Data Years 2012 and Later. During all demonstrations, this tutorial will refer to CCS categories. The Clinical Classifications Software (CCS) uses a categorization scheme that collapses the universe of ICD-9-CM diagnosis codes into over 280 clinically meaningful diagnosis categories. It does the same for procedure codes. The CCS categorization scheme has been applied to the records within the HCUP databases, and the CCS codes are stored in each record.
As a means of demonstration in this tutorial, we will tabulate the unweighted number of records in the NIS for which asthma is indicated as a principal diagnosis. First determine which records do and do not have asthma listed as the principal diagnosis.
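As a rough Python sketch of this first step (the tutorial itself demonstrates it in SAS), the records can be flagged and counted as follows. It assumes the principal-diagnosis CCS category is stored in a data element named DXCCS1 and that asthma is CCS 128; the three records are made up.

```python
ASTHMA_CCS = 128  # CCS category used for asthma in this tutorial


def flag_principal_dx(records, ccs_code):
    """Return copies of the records with a 0/1 flag for the principal-diagnosis CCS."""
    return [dict(r, flag=int(r["DXCCS1"] == ccs_code)) for r in records]


# Made-up Core File records; only the fields needed here are shown.
core = [
    {"DXCCS1": 128, "DISCWT": 5.0},
    {"DXCCS1": 55, "DISCWT": 5.0},
    {"DXCCS1": 128, "DISCWT": 4.5},
]

flagged = flag_principal_dx(core, ASTHMA_CCS)
unweighted_count = sum(r["flag"] for r in flagged)
print(unweighted_count)
```

The count is a tally of sample records only; it is not a national estimate until weights are applied.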
Next, this tutorial will demonstrate how to weight discharge-level data to produce national estimates.
To estimate the number of hospital discharges nationwide with a principal diagnosis of asthma, weight the data by using the WEIGHT keyword in SAS, and the discharge weight data element in the PROC SURVEYMEANS step. For analysis including 2012 and earlier years, replace HOSP_NIS with HOSPID and use the NIS Trend Weight (TRENDWT), available in the resources section of this tutorial, in place of the original discharge weight (DISCWT) for years prior to 2012. See the Multi-Year Analysis Tutorial for more information.
In this example, the result is 339,890 - an estimate of the number of hospital discharges, nationwide, with a principal diagnosis of asthma in 2014.
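The point estimate behind a weighted total is simply the sum of the discharge weights over the matching records. The Python sketch below illustrates that idea with made-up records; it does not reproduce the 339,890 figure, and it does not compute standard errors, which require the stratum and cluster information that a survey procedure such as PROC SURVEYMEANS takes into account.

```python
def weighted_total(records, ccs_code, weight="DISCWT"):
    """Sum the weights of records whose principal-diagnosis CCS matches."""
    return sum(r[weight] for r in records if r["DXCCS1"] == ccs_code)


# Made-up records: DXCCS1 is the assumed principal-diagnosis CCS field.
core = [
    {"DXCCS1": 128, "DISCWT": 5.0},
    {"DXCCS1": 128, "DISCWT": 4.5},
    {"DXCCS1": 55, "DISCWT": 5.5},
    {"DXCCS1": 128, "DISCWT": 5.5},
]

national_estimate = weighted_total(core, 128)
print(national_estimate)
```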
One way to verify that you have weighted the data correctly is to compare your estimates with those generated by a simple query on HCUPnet, the online system which provides quick access to national and regional estimates using HCUP data.
The HCUPnet results and your results using SAS® should be the same - in this case 339,890 discharges with a principal diagnosis of asthma (CCS 128) using the original NIS discharge weight (DISCWT). The nationwide statistics in HCUPnet for years prior to 2012 were regenerated using new trend weights to permit longitudinal analysis. Note that since the NIS contains close to a 20 percent sample of all US hospital discharges, another simple check on the accuracy of your weighted estimate is to multiply the number of unweighted discharges by 5.
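The multiply-by-5 check can be written as a small helper. In this sketch the tolerance of 0.5 is an arbitrary assumption chosen for illustration, and the counts are invented rather than taken from the NIS.

```python
def expansion_ratio(weighted_estimate, unweighted_count):
    """Average number of universe discharges represented by each sample record."""
    return weighted_estimate / unweighted_count


def looks_plausible(weighted_estimate, unweighted_count, expected=5.0, tolerance=0.5):
    """True when the ratio is close to the roughly 1-in-5 NIS sampling rate."""
    ratio = expansion_ratio(weighted_estimate, unweighted_count)
    return abs(ratio - expected) <= tolerance


# Invented counts: a correctly weighted run, and one where the weight was forgotten.
ok = looks_plausible(50_500, 10_000)
forgot_weight = looks_plausible(10_000, 10_000)
print(ok, forgot_weight)
```

A ratio near 1 usually means the weight statement was left out; the check is a quick screen, not a substitute for comparing against HCUPnet.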
You might want to also produce regional estimates of hospital discharges with a diagnosis of asthma once you have weighted the data. If so, one method for producing these estimates is to use information contained in the HOSP_REGION data element which is available in the hospital file. Then, you can use the DOMAIN SAS keyword to indicate that you want to produce estimates of asthma discharges by region of hospital. The resulting output will contain separate discharge sum estimates for each Census Region (1: Northeast, 2: Midwest, 3: South, and 4: West).
Regional Estimates
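The idea of a domain estimate can be sketched in Python as a join followed by a grouped sum. The sketch assumes the Core File and hospital file share the hospital identifier HOSP_NIS and that DXCCS1 holds the principal-diagnosis CCS category; the records are made up, and only point estimates are produced.

```python
from collections import defaultdict

REGION_LABELS = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}


def regional_totals(core, hospitals, ccs_code):
    """Weighted discharge totals by Census Region for one CCS category."""
    # Look up each hospital's region from the hospital file.
    region_of = {h["HOSP_NIS"]: h["HOSP_REGION"] for h in hospitals}
    totals = defaultdict(float)
    for r in core:
        if r["DXCCS1"] == ccs_code:
            label = REGION_LABELS[region_of[r["HOSP_NIS"]]]
            totals[label] += r["DISCWT"]
    return dict(totals)


hospitals = [
    {"HOSP_NIS": 1, "HOSP_REGION": 1},
    {"HOSP_NIS": 2, "HOSP_REGION": 3},
]
core = [
    {"HOSP_NIS": 1, "DXCCS1": 128, "DISCWT": 5.0},
    {"HOSP_NIS": 1, "DXCCS1": 128, "DISCWT": 5.0},
    {"HOSP_NIS": 2, "DXCCS1": 128, "DISCWT": 4.5},
    {"HOSP_NIS": 2, "DXCCS1": 55, "DISCWT": 4.5},
]

by_region = regional_totals(core, hospitals, 128)
print(by_region)
```

The regional totals add up to the national total because every matching record falls in exactly one region.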
Check your results using HCUPnet. The first part of the query on HCUPnet will be the same as that which you performed for the national estimate.
The HCUPnet results and your results using SAS® should be the same.
NIS data are available annually going back to 1988. The NIS discharge weight data element has changed over time.
For trend analysis spanning 2012 and earlier years, use the NIS Trend Weight (TRENDWT) available at https://www.hcup-us.ahrq.gov/db/nation/nis/trendwghts.jsp, in place of the original discharge weight (DISCWT) included in the NIS core file for years prior to 2012. See the Multi-Year Analysis Tutorial for more information.
Years | Variable Name | Use |
---|---|---|
2001 and later | DISCWT | All national estimates |
2000 | DISCWT | National estimates except those including total charge |
2000 | DISCWTcharge | National estimates of total charge |
1998-1999 | DISCWT | All national estimates |
1988-1997 | DISCWT_U | All national estimates |
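The table can be expressed as a small lookup function. This Python sketch assumes that the DISCWT_U row covers data years before 1998 (1988-1997), and it covers only the original discharge weights, not the NIS Trend Weight (TRENDWT); the function name is hypothetical.

```python
def nis_discharge_weight_variable(year, total_charge=False):
    """Name of the original NIS discharge weight for a given data year."""
    if year < 1988:
        raise ValueError("NIS data begin with the 1988 data year")
    if year >= 2001:
        return "DISCWT"
    if year == 2000:
        # In 2000, estimates of total charge use a separate weight.
        return "DISCWTcharge" if total_charge else "DISCWT"
    if year >= 1998:
        return "DISCWT"
    return "DISCWT_U"  # assumed to apply to 1988-1997


print(nis_discharge_weight_variable(2014))
print(nis_discharge_weight_variable(2000, total_charge=True))
print(nis_discharge_weight_variable(1995))
```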
Prior to 2012, to produce hospital-level estimates you must apply hospital weights to the data.
For NIS years 2012 onwards, hospital weights are discontinued. Prior to 2012, hospital weights were calculated for each of the NIS strata. Within each stratum, each hospital's weight is equal to the number of universe hospitals it represents. Because twenty percent of the AHA universe hospitals are sampled in each stratum when possible, the hospital weights are usually near five (each sampled hospital represents about five hospitals). The NIS 1988-2011 hospital weights are available in the data element hospital weight (HOSPWT) and are stored in each hospital record in the Hospital File. When the hospital weights are applied to the unweighted NIS hospital observations, the result is the number of hospitals for the entire universe. In the case of the 1988-2011 NIS, the universe is all US community hospitals, excluding rehabilitation hospitals beginning with 1998. Long-Term Acute Care (LTAC) hospitals are excluded beginning with 2012.
Note that the variable for hospital weights depends on the data year. For the 1998-2011 NIS, HOSPWT should be used to create nationwide hospital-level estimates. For NIS databases prior to 1998, the variable HOSPWT_U should be used. Beginning with the 2012 NIS redesign, the hospital weights are no longer needed.
You may be interested in using the NIS as an unweighted sample.
In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming. An analysis of the association between hospital-level results from a survey on safe hospital practices and hospital-level inpatient risk-adjusted mortality scores is one example of research for which it would be appropriate to use the NIS as an unweighted sample.
Let's review what you need to do to create national or regional estimates using NIS data.
Weight your data with discharge weights (DISCWT) for discharge-level analyses. Prior to 2012, weight your data with hospital weights (HOSPWT) for hospital-level estimates. For trend analysis including 2012 and earlier years, use the NIS Trend Weight (TRENDWT) in place of the original discharge weight (DISCWT) for years prior to 2012. Check the weighted data totals by performing a quick query on HCUPnet. In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NIS as an unweighted sample, do not apply weights to your data. If you are interested in a more detailed statistical explanation of weighting, refer to the documentation available on the HCUP-US website, particularly the report entitled Calculating National Inpatient Sample (NIS) Variances for Data Years 2012 and Later. Remember that the nationwide databases cannot be used to conduct State-level analyses. If you are interested in performing State-level analyses, use the HCUP State-specific databases.
The Nationwide Emergency Department Sample (NEDS) is a database of emergency department visits - visits for which the patient was treated and released as well as visits that resulted in a hospital admission - going back annually to 2006.
The NEDS can be used to produce national and regional estimates of emergency department care, utilization, access, costs and quality. To produce national or regional estimates of the NEDS data contained in the Core File, you must weight the unweighted observations.
The weights you apply will depend on the type of analysis you are performing. The NEDS data can be weighted to produce emergency department (ED) visit-level estimates or hospital-level estimates.
To produce discharge-level estimates, such as the number of emergency department visits for influenza in the United States or the number of emergency department visits for hip fractures among older adults in the United States, you must apply a discharge weight to each record in the Core File. The discharge weights were calculated for NEDS data by first stratifying the NEDS hospitals on the same variables that were used for creating the sample. These variables were geographic region, trauma center designation, urban/rural location, teaching status, and ownership. A weight was then calculated for each stratum by dividing the number of universe ED visits in that stratum, obtained from AHA data, by the number of NEDS ED visits in the stratum. Weighted estimates can be calculated by applying the discharge weights to the sample of ED visits. Weights have been assigned to each record. In each record in the NEDS Core File, the weight is stored in the data element DISCWT. When the discharge weights are applied to the unweighted NEDS observations, the result is an estimate of the number of ED visits for the entire universe. In the case of the NEDS, the universe is emergency department visits nationwide.
Again, let's demonstrate weighting the data using SAS. As with the NIS data, begin by running a simple program to see how many records there are in the NEDS for which influenza is indicated as a first-listed diagnosis.
Unweighted Record Count
To estimate the number of nationwide emergency department visits with a first-listed diagnosis of influenza, weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.
National Estimate
The result is 782,665, an estimate of the number of nationwide emergency department visits, both those in which the patient was treated and released and those that resulted in a hospital admission, with a first-listed diagnosis of influenza in 2014.
To verify that you have weighted the data correctly, perform a query on HCUPnet.
The HCUPnet results and the SAS results should be the same.
Once you have weighted the data, you might also want to produce regional estimates of emergency department visits for influenza. If so, one method for producing these estimates is to use information contained in the HOSP_REGION data element, which is available in the hospital file. Then, use the SAS DOMAIN keyword to indicate that you want to produce estimates of influenza ED visits by region of hospital.
Regional Estimates
Check your results by running a quick query on HCUPnet.
The HCUPnet results and the SAS results should be the same.
To produce hospital-level estimates, such as the number of U.S. emergency departments with a trauma center designation, you will apply a hospital weight (HOSPWT) to the data.
HOSPWT was also calculated according to the NEDS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year. To demonstrate how to weight at the hospital level, let's consider an analysis that requires us to estimate the number of emergency departments in the United States with a trauma center.
First, tabulate the number of emergency department records in the NEDS from hospitals which are classified as having a trauma center.
Record Count
The result is 203. This is the number of emergency department records in the NEDS from hospitals with a trauma center.
To estimate the number of hospitals with a trauma center nationwide you will weight the data.
National Estimate
The result is 958. This is an estimate of the number of hospitals with a trauma center nationwide.
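The hospital-level calculation follows the same pattern as the discharge-level one, except that it runs over hospital records and sums HOSPWT. The Python sketch below uses made-up hospital records and a hypothetical field name, has_trauma_center, for the trauma designation; it does not reproduce the 203 or 958 figures.

```python
def estimate_hospitals(hospital_records, predicate, weight="HOSPWT"):
    """Return (unweighted count, weighted estimate) of hospitals matching predicate."""
    matching = [h for h in hospital_records if predicate(h)]
    return len(matching), sum(h[weight] for h in matching)


# Made-up hospital file; has_trauma_center is a hypothetical field name.
hospital_file = [
    {"HOSPWT": 5.0, "has_trauma_center": 1},
    {"HOSPWT": 4.0, "has_trauma_center": 0},
    {"HOSPWT": 6.0, "has_trauma_center": 1},
    {"HOSPWT": 5.0, "has_trauma_center": 0},
]

count, estimate = estimate_hospitals(hospital_file, lambda h: h["has_trauma_center"] == 1)
print(count, estimate)
```

Each hospital must appear once in the input; running this over visit-level records would count a hospital once per visit.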
Depending on the nature of your research, you may be interested in using the NEDS as an unweighted sample.
One example of an analysis in which it would be appropriate to use the NEDS as an unweighted sample is a hospital-level study in which NEDS emergency department-level data are linked to data on pre-hospital care, such as that provided by emergency medical services. In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.
Let's review what you need to do to create national or regional estimates using NEDS data.
Weight your data with discharge weights for ED visit-level analyses. Weight your data with hospital weights for hospital-level estimates. Check the weighted data totals by performing a quick query on HCUPnet. In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NEDS as an unweighted sample, do not apply weights to your data. Remember that the nationwide databases cannot be used to conduct State-level analyses. If you are interested in performing State-level analyses, use the HCUP State-specific databases.
A third nationwide database is the KID, a database designed specifically for the study of pediatric conditions that require hospitalization.
The KID is produced every 3 years starting with the 1997 data year. To produce national or regional estimates using KID data, you must weight the unweighted observations in the KID file.
KID data must be weighted to perform discharge-level analyses. Because of the sample design of the KID, it cannot be used as an unweighted database. For more information on the unique sample design of the KID, see the Sample Design tutorial.
Discharge weights were calculated for KID data by stratifying the hospitals on the same variables that were used for creating the sample and then creating weights by stratum. The stratifying variables were geographic region, urban/rural location, teaching status, bed size, ownership, and children's hospital. Remember that the KID was designed for the study of pediatric hospitalizations, and that the discharges in the KID are a combination of newborn discharges (both complicated and uncomplicated) and non-newborn pediatric discharges. Because of this, weights were created within each stratum for both newborn discharges and non-newborn pediatric discharges. The weights for newborn discharges (both complicated and uncomplicated) were created by dividing the number of universe newborns in the stratum by the number of KID newborns in the stratum. The weights for non-newborn discharges were created by dividing the number of universe non-newborn pediatric discharges in the stratum by the number of KID non-newborn discharges in the stratum. Weighted estimates can be calculated by applying the discharge weights to the sample discharges.
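The two-part weighting described above can be sketched in Python by keying the ratio on both the stratum and the newborn status of the discharge. The field names, stratum label, and counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not KID sampling rates.

```python
from collections import Counter


def kid_discharge_weights(universe_counts, sample_records):
    """Return {(stratum, is_newborn): universe discharges / KID discharges}."""
    sample_counts = Counter((r["stratum"], r["newborn"]) for r in sample_records)
    return {key: universe_counts[key] / n for key, n in sample_counts.items()}


# Hypothetical universe counts for one stratum, split by newborn status.
universe = {("S1", True): 1000, ("S1", False): 300}
# Made-up sample: 100 newborn and 240 non-newborn discharges in stratum S1.
sample = [{"stratum": "S1", "newborn": True}] * 100
sample += [{"stratum": "S1", "newborn": False}] * 240

kid_weights = kid_discharge_weights(universe, sample)
print(kid_weights)
```

Because newborn and non-newborn discharges carry different weights within the same stratum, an unweighted tabulation would mix records that represent very different numbers of universe discharges.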
Today, let's demonstrate how to weight the KID to produce national and regional estimates of pediatric hospital discharges with a principal diagnosis of cystic fibrosis, a discharge-level estimate. Remember that pediatric discharges are defined as those for which the patient was age 20 or younger at admission.
First, tabulate the number of records in the KID for which cystic fibrosis is indicated as a principal diagnosis, in other words, records in which DXCCS1 equals 56.
Unweighted Record Count
Here is the resulting output. Again, the data summary provides a quick summary of the number of strata, clusters, and total observations in the data.
The summary shown here confirms a database that contains 95 strata, 4,179 clusters, each cluster representing a single hospital, and 3,195,782 records, the number of records in the 2012 KID. The statistics section provides the results of the analysis: 4,871 records. This is the number of records in the KID for which cystic fibrosis is indicated as a principal diagnosis. This is not an estimate of pediatric discharges nationwide for cystic fibrosis.
Weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.
National Estimate
The result is 7,007. This is an estimate of the number of pediatric hospital discharges, nationwide, with a principal diagnosis of cystic fibrosis in 2012.
Note that HCUPnet contains a path for querying National Statistics on Children. However, this path provides statistics only on discharges in which the patient was age 17 or less, whereas the KID file from the Central Distributor provides data on discharges where patients are age 20 or less. As a result, the HCUPnet total will be smaller than the weighted total produced from the KID file. To match the HCUPnet totals, limit the KID records to ages 0-17 when creating your estimate. Otherwise, use HCUPnet to get a ballpark idea of what the estimate should be.
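The effect of the age restriction can be illustrated with a short Python sketch. It assumes the patient's age is stored in a field named AGE and the principal-diagnosis CCS category in DXCCS1 (56 for cystic fibrosis); the records and weights are made up.

```python
def weighted_total_up_to_age(records, ccs_code, max_age):
    """Weighted discharges for one CCS category among patients age <= max_age."""
    return sum(
        r["DISCWT"]
        for r in records
        if r["DXCCS1"] == ccs_code and r["AGE"] <= max_age
    )


# Made-up KID records; AGE is an assumed field name.
kid = [
    {"DXCCS1": 56, "AGE": 5, "DISCWT": 1.5},
    {"DXCCS1": 56, "AGE": 16, "DISCWT": 1.5},
    {"DXCCS1": 56, "AGE": 18, "DISCWT": 1.25},
    {"DXCCS1": 56, "AGE": 20, "DISCWT": 1.25},
    {"DXCCS1": 128, "AGE": 9, "DISCWT": 1.5},
]

all_kid_ages = weighted_total_up_to_age(kid, 56, 20)   # full KID age range
hcupnet_ages = weighted_total_up_to_age(kid, 56, 17)   # comparable to HCUPnet
print(all_kid_ages, hcupnet_ages)
```

The second total is the one to compare against the HCUPnet children's path; the first will always be at least as large.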
Once you have weighted the data, you might want to also produce regional estimates of the number of discharges for cystic fibrosis. For the KID, the data elements KID_STRATUM and HOSP_REGION are on the Core file. Use the SAS DOMAIN keyword to indicate that you want to produce estimates of cystic fibrosis discharges by region.
Regional Estimate
The KID discharge weight variable has changed over time.
Years | Variable Name | Use |
---|---|---|
2003 and later | DISCWT | All national estimates |
2000 | DISCWT | National estimates except those including total charge |
2000 | DISCWTcharge | National estimates of total charge |
1997 | DISCWT_U | All national estimates |
Unlike the NIS and NEDS data, KID data cannot be used to produce hospital-level estimates because the data for the KID was sampled at the discharge level rather than at the hospital level.
Note that the KID cannot be used as an unweighted sample because of the design of the database.
Let's review what you need to do to create national or regional estimates using KID data.
Weight your data with discharge weights (DISCWT) for discharge-level estimates. Check the weighted data totals by performing a quick query on HCUPnet. Remember, as previously mentioned, HCUPnet contains a path for discharges in which the patient was age 17 or less. However, the KID file from the Central Distributor provides data on discharges where patients are age 20 or less. Hospital-level analyses are not possible with KID data. The KID cannot be used as an unweighted sample. Weights should always be applied to the data. Remember that the nationwide databases cannot be used to conduct State-level analyses. If you are interested in performing State-level analyses, use the HCUP State-specific databases.
In summary, weighting is a key concept when working with the HCUP nationwide databases.
What to do: Remember that the NIS, NEDS, and KID are sample databases. Thus, to produce national or regional estimates from these databases, you must be sure to properly weight the data. It is important that you select the proper weight based on the database, the year of data, and the type of analysis you are conducting. Check your estimates against HCUPnet to ensure that you are using the weights appropriately and calculating estimates and variances accurately. And keep in mind that proper statistical techniques must also be used to calculate standard errors and confidence intervals when using each of the nationwide databases. For detailed instructions, refer to the special report Calculating National Inpatient Sample (NIS) Variances for Data Years 2012 and Later on the HCUP-US website.
What not to do: State-level analyses cannot be conducted with the nationwide HCUP databases because the sampling frames are not designed with State as a stratification variable. If you are interested in analyses by State, you should use the State-specific databases. HCUPnet cannot be used to check unweighted estimates, because weights have been applied to the HCUPnet data. Remember that the KID cannot be used as an unweighted database.
If you are looking for more information on the subject matter covered here, several resources are available on the HCUP User Support website (HCUP-US).
If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to answer technical questions you may have. Thank you for accessing this module. There are several other HCUP online tutorials. Access these tutorials to learn if there are other topics that could be helpful to you. If you have any feedback regarding this module, please email us at hcup@ahrq.gov. Detailed documentation of HCUP resources, including documentation for each of the HCUP national databases, is available on the HCUP-US website.
Internet Citation: Producing National HCUP Estimates - Accessible Version. Healthcare Cost and Utilization Project (HCUP). December 2018. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/tech_assist/nationalestimates/508_course/508course_2018.jsp.
Last modified 12/13/18