
Producing National HCUP Estimates - Accessible Version

Producing National HCUP Estimates


Welcome

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on producing national and regional estimates. This tutorial was created for researchers who are using HCUP national databases, understand the design of the national databases, and are ready to produce national and regional estimates.

In this tutorial you'll learn how to produce national and regional estimates by weighting the unweighted HCUP data.


Contents:
  1. Introduction
    1. About HCUP
    2. Learning Objectives
    3. Weighting HCUP Data
  2. NIS
    1. National (Nationwide) Inpatient Sample (NIS)
    2. NIS Redesign
    3. Prior to the 2012 NIS
    4. NIS Weights
    5. About the Demonstrations
    6. NIS Unweighted Discharge Record Count
    7. NIS National Discharge-Level Estimates
    8. NIS Regional Discharge-Level Estimates
    9. NIS Discharge Weights Over Time
    10. NIS Hospital Weights
    11. NIS Unweighted Hospital Record Count
    12. NIS National Hospital-Level Estimate
    13. Data Elements for NIS Hospital Weights Over Time
    14. NIS Unweighted Analysis
    15. NIS Summary
  3. NEDS
    1. Nationwide Emergency Department Sample (NEDS)
    2. NEDS Discharge Weights
    3. NEDS Unweighted Discharge Record Count
    4. NEDS National Discharge-Level Estimates
    5. NEDS Regional Discharge-Level Estimates
    6. NEDS Hospital Weights
    7. NEDS Unweighted Hospital ED Record Count
    8. NEDS National Hospital ED-Level Estimate
    9. NEDS Unweighted Analysis
    10. NEDS Summary
  4. KID
    1. Kids' Inpatient Database (KID)
    2. KID Discharge Weights
    3. KID Unweighted Discharge Record Count
    4. KID National Discharge-Level Estimates
    5. KID Regional Discharge-Level Estimates
    6. KID Weights Over Time
    7. KID Limitations
    8. KID Summary
  5. Wrap-Up
    1. Key Points
    2. Resources and Other Training

About HCUP

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.



Learning Objectives

There are three learning objectives in this tutorial:

The first objective is to understand how the three national databases (the NIS, or National (Nationwide) Inpatient Sample; the NEDS, or Nationwide Emergency Department Sample; and the KID, or Kids' Inpatient Database) can be weighted to produce national and regional estimates.

The second objective is to select and apply the appropriate discharge or hospital weight in order to generate national estimates at the discharge or hospital level from unweighted record counts.

The third objective is to understand when it is appropriate to use the NIS and NEDS databases as unweighted samples. This module introduces weighting each of the three national HCUP databases.



Weighting HCUP Data

Why do we need to weight HCUP data?

Most researchers working with the HCUP nationwide databases are interested in using the data to create national and regional estimates.

The HCUP national databases are samples designed to represent a larger universe, so the data must be weighted in order to produce national and regional estimates.

For an in-depth explanation of the sample designs, you can access the HCUP online course on Sample Design of National Databases. The next sections will cover how each national database can be used to produce national and regional estimates.



National (Nationwide) Inpatient Sample (NIS)

The NIS is a database of hospital inpatient discharges which can be used to create national and regional estimates of hospital utilization, access, costs and quality.

In order to perform such analyses on the NIS data contained in the Core File, you must weight the unweighted observations.

Weighting the data will enable you to produce nationally representative estimates.



NIS Redesign

In 2012 the NIS was redesigned to improve national estimates.

The previous NIS consisted of all discharges from a sample of hospitals in HCUP.

The redesigned NIS is a sample of discharges from all hospitals in HCUP.

To highlight the design change, AHRQ renamed the Nationwide Inpatient Sample (NIS) to the National Inpatient Sample.

For detailed information on the 2012 NIS Redesign, see the NIS Redesign Report.



Prior to the 2012 NIS

There are several differences between the previous and new NIS.

The new NIS, beginning with the 2012 data year, is called the National Inpatient Sample. The previous NIS, which includes data years 1988 to 2011, was called the Nationwide Inpatient Sample.

The previous NIS universe included long-term acute care hospitals, and annual discharge estimates and hospital entities were based on information from the AHA. In the new NIS, long-term acute care hospitals are removed, and annual discharge estimates and hospital entities are based on information from the SID when available; otherwise they are based on AHA information.

Whereas the strata for the previous NIS used hospital census region for stratification, the new NIS uses hospital census division for stratification.

Whereas the sample design for the previous NIS was to sample 1,000 hospitals, amounting to more than 8 million records, the sample design for the new NIS is to sample 7 million discharge records from more than 4,000 hospitals.

This new sampling strategy results in estimates with more precise statistical properties than the previous NIS design. Here is a summary of the 2012 NIS redesign:

  Name
    New NIS (beginning with the 2012 data year): National Inpatient Sample (NIS)
    Previous NIS (1988-2011 data years): Nationwide Inpatient Sample (NIS)

  Universe
    New NIS: Removed long-term acute care hospitals. Annual hospital discharge count estimates and hospital entities based on information from the SID when available; otherwise, based on AHA information.
    Previous NIS: Included long-term acute care hospitals. Annual hospital discharge count estimates and hospital identification based on information from AHA.

  Strata
    New NIS: Used hospital census division (9) for stratification
    Previous NIS: Used hospital census region (4) for stratification

  Sample Design
    New NIS: 7 million hospital discharge records from more than 4,000 hospitals
    Previous NIS: 8 million records from more than 1,000 hospitals



NIS Weights

The weights you apply to the data depend on the type of estimates you want to produce.

The NIS includes weights to produce national or regional estimates. NIS data for years prior to 2012 include both hospital and discharge weights. The hospital weights can be used to produce hospital-level estimates, and the discharge weights can be used to produce discharge-level estimates.

Beginning with data from 2012 after the NIS redesign, hospital weights are not included in the data because they are no longer needed. NIS data for years 2012 onwards should be weighted to produce discharge-level estimates only.

To accurately produce discharge-level estimates, such as estimates of the total number of discharges with a diagnosis of asthma in the US or estimates of the total number of discharges in the US for individuals age 65 and over, you must apply a discharge weight to each record in the Core File.

The discharge weights were calculated for NIS data by first stratifying the NIS hospitals on the same variables that were used for creating the sample. These variables were Census division, urban/rural location, teaching status, bed size, and ownership. A weight was then calculated for each stratum by dividing the number of universe discharges in that stratum - obtained from HCUP and American Hospital Association (AHA) data - by the number of NIS discharges in the stratum. Weighted estimates can be calculated by applying the discharge weights to the sample discharges.

Weights have been assigned to each discharge and are stored in each record in the data element DISCWT. When the discharge weights are applied to the unweighted NIS data, the result is an estimate of the number of discharges for the entire universe. In the case of the NIS, the universe is all inpatient discharges from community hospitals in the U.S., excluding rehabilitation hospitals beginning with 1998, and excluding long-term acute-care hospitals beginning with 2012.
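
The stratum-level calculation described above can be sketched in a short DATA step. This is only an illustration: the stratum codes and discharge counts are made-up values, not HCUP figures, and the names stratum_weights and discwt_demo are hypothetical. In practice the weight is already supplied in DISCWT and does not need to be recomputed.

```sas
/* Hypothetical strata: all counts are illustrative, not actual HCUP values */
data stratum_weights;
    input stratum universe_dischgs sample_dischgs;
    /* Weight = universe discharges in the stratum / sampled discharges in the stratum */
    discwt_demo = universe_dischgs / sample_dischgs;
    datalines;
1011 50000 10000
1012 24500  5000
2011 81000 16000
;
run;

/* Each sampled discharge in a stratum would carry that stratum's weight */
proc print data=stratum_weights noobs;
    format discwt_demo 8.4;
run;
```

Summing the weight over the sampled discharges in a stratum returns that stratum's universe count, which is why weighted totals estimate the universe.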



About the Demonstrations

This tutorial will use SAS® to demonstrate how to weight HCUP data to produce national and regional estimates. In addition to SAS®, several other statistical software packages can produce statistics from the stratified sampling design of the national HCUP databases. Stata® and SPSS® are two commonly used examples. For a more detailed explanation of how to use these software packages to work with the national HCUP databases, please refer to the documentation available on HCUP-US, including the Methods Report on Calculating National (Nationwide) Inpatient Sample Variances.

During all demonstrations, this tutorial will refer to CCS categories. The Clinical Classifications Software (CCS) uses a categorization scheme that collapses the universe of ICD-9-CM diagnosis codes into over 280 clinically meaningful diagnosis categories. It does the same for procedure codes. The CCS categorization scheme has been applied to the records within the HCUP databases, and the CCS codes are stored in each record.



NIS Unweighted Discharge Record Count

As a means of demonstration in this tutorial, we will tabulate the unweighted number of records in the NIS for which asthma is indicated as a principal diagnosis.

Unweighted Record Count

Title1 'Count records with CCS=128 (asthma) from 2007 NIS File';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ;
           
data asthma; 
      set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
      if dxccs1 eq 128 then asthma = 1; 
      else asthma = 0; 
run;
 
PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ; 
      VAR asthma; 
      CLUSTER hospid ; 
      STRATA NIS_stratum ; 
run;
          
  1. First determine which records do and do not have asthma listed as the principal diagnosis. The CCS category for asthma includes 13 ICD-9-CM codes, so any record with one of those 13 codes listed as the principal diagnosis is assigned the CCS code 128 for asthma. To find these records, look for those in which DXCCS1 equals 128.

    data asthma; 
          set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
          if dxccs1 eq 128 then asthma = 1; 
          else asthma = 0; 
    run;

  2. Use PROC SURVEYMEANS to generate statistics about the records which do and do not have the CCS code of 128 (asthma listed as a principal diagnosis). The SURVEYMEANS procedure accounts for the complex sample design of the NIS.

    PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ;
          VAR asthma; 
          CLUSTER hospid ; 
          STRATA NIS_stratum ; 
    run;

  3. The resulting output will contain a data summary of the number of strata, clusters, and total observations in the data. In this example, the summary confirms a database composed of 60 sample strata containing 1,044 clusters - each cluster representing a single hospital - and 8,043,415 records - the number of records in the 2007 NIS. Although the examples employ the 2007 NIS, the program code is suitable for all NIS years except that the data element HOSPID is replaced by HOSP_NIS for 2012 and later.

              Count records with CCS=128 (asthma) from 2007 NIS File

                              The SURVEYMEANS Procedure

                                Data Summary

                       Number of Strata                   60
                       Number of Clusters               1044
                       Number of Observations        8043415

                                  Statistics

                                 Std Error
    Variable          Mean         of Mean           Sum           Std Dev
    ------------------------------------------------------------------------
    asthma         0.010125        0.000315         81443        2810.775895
    ------------------------------------------------------------------------

  4. The statistics section provides the results of the analysis. The output in this example confirms that the number of records in the NIS with a principal diagnosis of asthma is 81,443. Remember, this is the number of records in the NIS for which asthma is indicated as a principal diagnosis. Without weighting, it is not an estimate of the number of hospital discharges nationwide for asthma.

                                 Std Error
    Variable          Mean         of Mean           Sum           Std Dev
    ------------------------------------------------------------------------
    asthma         0.010125        0.000315         81443        2810.775895
    ------------------------------------------------------------------------
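
As a sketch of the note above about HOSP_NIS, the same unweighted count for a 2012 or later NIS file might look like the following. The library path and the data set name nis_2012_core are assumptions made by analogy with the 2007 example and should be adjusted to match your own files; the only substantive change is the cluster variable.

```sas
/* Assumed names: the library path and nis_2012_core are placeholders */
libname nis2012 "C:\NIS 2012\";

data asthma;
    set nis2012.nis_2012_core (keep=HOSP_NIS DISCWT NIS_STRATUM DXCCS1);
    /* 1 if the principal diagnosis falls in CCS 128 (asthma), otherwise 0 */
    if dxccs1 eq 128 then asthma = 1;
    else asthma = 0;
run;

proc surveymeans data=asthma sum std mean stderr;
    var asthma;
    cluster hosp_nis;      /* HOSP_NIS replaces HOSPID for 2012 and later */
    strata nis_stratum;
run;
```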
              


NIS National Discharge-Level Estimates

Next this tutorial will demonstrate how to weight discharge-level data to produce national estimates.

To estimate the number of hospital discharges nationwide with a principal diagnosis of asthma, weight the data by using the WEIGHT keyword in SAS and the DISCWT data element in the PROC SURVEYMEANS step.

For trend analysis spanning 2012 and earlier years, use the NIS Trend Weight (TRENDWT), available at https://www.hcup-us.ahrq.gov/db/nation/nis/trendwghts.jsp, in place of the original discharge weight (DISCWT) for years prior to 2012. See the Multi-Year Analysis Tutorial for more information.


Title1 'Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)'; 
libname nis2007 "C:\NIS 2007\"; 
options obs = MAX PageSize=51 LineSize=146 ;
           
data asthma; 
      set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1); 
      if dxccs1 eq 128 then asthma = 1; 
      else asthma = 0;
run;

PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ; 
    VAR asthma; 
    WEIGHT discwt; 
    CLUSTER hospid ; 
    STRATA NIS_stratum ; 
run;
          
In this example, the result is 402,088 - an estimate of the number of hospital discharges, nationwide, with a principal diagnosis of asthma in 2007.


Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

                          The SURVEYMEANS Procedure

                                Data Summary

                       Number of Strata                   60
                       Number of Clusters               1044
                       Number of Observations        8043415
                       Sum of Weights               39541948

                                 Statistics

                             Std Error
Variable          Mean         of Mean           Sum           Std Dev
------------------------------------------------------------------------
asthma        0.010169        0.000321         402088            13985
------------------------------------------------------------------------
          
One way to verify that you have weighted the data correctly is to compare your estimates with those generated by a simple query on HCUPnet, the online system that provides quick access to national and regional estimates using HCUP data.

  1. Go to HCUPnet and select "National Statistics on All Stays."

  2. Describe yourself as "Researcher, medical professional."

  3. In this example, you are running a query on a particular diagnosis so you should select "Statistics on specific diagnoses or procedures."

  4. Select 2007 as the data year.

  5. You are using CCS codes to identify asthma patients, so select "Diagnoses grouped by CCS" and then "Principal diagnosis."

  6. Select "All discharges (no restrictions)"

  7. Highlight CCS code 128 for asthma and select "Next."

  8. Select "Number of discharges." and select "Next."

  9. Select "All patients in all hospitals." and select "Next."

  10. On the bottom of the Results page, you can select whether you want the original NIS weights or the new NIS Trend Weights.

    The HCUPnet results and your results using SAS® should be the same - in this case 402,088 discharges with a principal diagnosis of asthma (CCS 128) using the original NIS discharge weight (DISCWT) or 387,880 discharges using the new NIS Trend Weight (TRENDWT). The nationwide statistics in HCUPnet for years prior to 2012 were regenerated using new trend weights in order to permit longitudinal analysis. Note that since the NIS contains close to a 20 percent sample of all US hospital discharges, another simple check on the accuracy of your weighted estimate is to multiply the number of unweighted discharges by 5.
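
The multiply-by-5 check can be written out as a short DATA step using the two figures reported in the output above (81,443 unweighted records and a weighted estimate of 402,088). This is only an arithmetic sketch; the variable names are hypothetical. The rough estimate of 81,443 x 5 = 407,215 is close to, but not identical to, the weighted estimate because the actual weights vary by stratum.

```sas
data _null_;
    unweighted = 81443;               /* unweighted asthma records, 2007 NIS */
    weighted   = 402088;              /* weighted national estimate using DISCWT */
    rough      = unweighted * 5;      /* rule of thumb for a roughly 20 percent sample */
    ratio      = weighted / unweighted; /* average weight implied by the two figures */
    put "Rough estimate (unweighted x 5): " rough comma12.;
    put "Weighted estimate (DISCWT):      " weighted comma12.;
    put "Implied average weight:          " ratio 8.3;
run;
```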


NIS Regional Discharge-Level Estimates

You might want to also produce regional estimates of hospital discharges with a diagnosis of asthma once you have weighted the data. If so, one method for producing these estimates is to create a variable for region by using information contained in the NIS_STRATUM data element. Then, you can use the DOMAIN SAS keyword to indicate that you want to produce estimates of asthma discharges by region. The resulting output will contain a separate line item estimate for each region.

Note that beginning with 2012, the first digit of NIS_STRATUM is the Census Division (1-9) rather than the Census Region (1-4).
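
For 2012 and later data, one possible way to recover the four Census regions from the division digit is sketched below. The NIS_STRATUM values in the DATALINES block are hypothetical, and the mapping assumes the standard Census Bureau grouping of divisions into regions (divisions 1-2 Northeast, 3-4 Midwest, 5-7 South, 8-9 West); confirm the coding against the HCUP data element documentation before relying on it.

```sas
/* Hypothetical stratum values; only the first digit (division) matters here */
data demo_region;
    input nis_stratum;
    /* First digit of NIS_STRATUM is the Census division for 2012 and later */
    division = input(substr(left(put(nis_stratum, 8.)), 1, 1), 1.);
    /* Assumed mapping of the 9 divisions onto the 4 Census regions */
    select (division);
        when (1, 2)    region = 1;   /* Northeast */
        when (3, 4)    region = 2;   /* Midwest   */
        when (5, 6, 7) region = 3;   /* South     */
        when (8, 9)    region = 4;   /* West      */
        otherwise      region = .;
    end;
    datalines;
1011
3021
5032
9013
;
run;

proc print data=demo_region noobs;
run;
```

The derived region variable could then be used in the DOMAIN statement in the same way as in the 2007 program.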

Regional Estimates

Title1 'Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ; 

data asthma;
      set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
      retain dischgs 1;
      region = substr(left(put(nis_stratum,8.)),1,1);
      if dxccs1 eq 128 then asthma = 1;
      else asthma = 0;
run;

PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ;
     VAR dischgs;
     WEIGHT discwt ;
     CLUSTER hospid ;
     STRATA NIS_stratum ;
     DOMAIN region * asthma ; 
run;
          




Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

                          The SURVEYMEANS Procedure

                                Data Summary

                       Number of Strata                   60
                       Number of Clusters               1044
                       Number of Observations        8043415
                       Sum of Weights               39541948

                                 Statistics

                             Std Error
Variable          Mean         of Mean           Sum           Std Dev
------------------------------------------------------------------------
dischgs       1.000000               0       39541948           799355
------------------------------------------------------------------------

                       Domain Analysis: region*asthma

                                 Std Error
region    asthma    Variable        Mean     of Mean          Sum        Std Dev
--------------------------------------------------------------------------------
1         0         dischgs     1.000000           0      7660700         335678
          1         dischgs     1.000000           0        92596    8089.096894
2         0         dischgs     1.000000           0      9038455         322029
          1         dischgs     1.000000           0        91657    5868.575661
3         0         dischgs     1.000000           0     15112513         589289
          1         dischgs     1.000000           0       160784    9133.832891
4         0         dischgs     1.000000           0      7328192         256261
          1         dischgs     1.000000           0        57051    3505.469814
--------------------------------------------------------------------------------

Check your results using HCUPnet. The first part of the query on HCUPnet will be the same as that which you performed for the national estimate. In terms of patient and hospital characteristics, this time you want to see the discharges by region, so you should select "Region of the US."

  1. Go to HCUPnet and select "National Statistics on All Stays."

  2. Describe yourself as "Researcher, medical professional."

  3. In this example, you are running a query on a particular diagnosis so you should select "Statistics on specific diagnoses or procedures."

  4. Select 2007 as the data year.

  5. You are using CCS codes to identify asthma patients, so select "Diagnoses grouped by CCS" and then "Principal diagnosis."

  6. Select "All discharges (no restrictions)"

  7. Highlight CCS code 128 for asthma and select "Next."

  8. Select "Number of discharges." and select "Next."

  9. Select "Region of the US." and select "Next."

  10. On the bottom of the Results page, you can select whether you want the original NIS weights or the new NIS Trend Weights.

The HCUPnet results and your results using SAS® should be the same.



NIS Discharge Weights Over Time

NIS data are available annually going back to 1988. The NIS discharge weight data element has changed over time.

For trend analysis spanning 2012 and earlier years, use the NIS Trend Weight (TRENDWT) available at https://www.hcup-us.ahrq.gov/db/nation/nis/trendwghts.jsp, in place of the original discharge weight (DISCWT) included in the NIS core file for years prior to 2012. See the Multi-Year Analysis Tutorial for more information.

Years             Variable Name     Use
2001 and later    DISCWT            All national estimates
2000              DISCWT            National estimates except those including total charge
2000              DISCWTcharge      National estimates of total charge
1998-1999         DISCWT            All national estimates
1988-1997         DISCWT_U          All national estimates
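
When one program has to run across several data years, the choice of weight variable in the table above could be wrapped in a small macro. The macro below is a hypothetical helper, not part of HCUP; it assumes that DISCWT_U applies to data years before 1998 and that the charge weight is needed only for total-charge estimates in 2000.

```sas
/* Hypothetical helper (not supplied by HCUP): returns the discharge weight name */
%macro nis_discwt(year, charge=0);
    %if &year < 1998 %then DISCWT_U;
    %else %if &year = 2000 and &charge = 1 %then DISCWTcharge;
    %else DISCWT;
%mend nis_discwt;

/* Write the selected names to the log */
%put 2007 weight: %nis_discwt(2007);
%put 2000 total charge weight: %nis_discwt(2000, charge=1);
%put 1995 weight: %nis_discwt(1995);
```

Inside PROC SURVEYMEANS the macro call would stand in for the variable name, for example WEIGHT %nis_discwt(2007); in place of WEIGHT discwt;.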


NIS Hospital Weights

Prior to 2012, to produce hospital-level estimates you must apply hospital weights to the data.

For NIS years 2012 onwards, hospital weights are discontinued.

Prior to 2012, hospital weights are calculated for each of the NIS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year. Since twenty percent of the AHA universe hospitals in each stratum are sampled when possible, the hospital weights are usually near five.

The NIS 1988-2011 hospital weights are available in the data element HOSPWT and are stored in each hospital record in the Hospital File. When the hospital weights are applied to the unweighted NIS hospital observations, the result is the number of hospitals for the entire universe - in the case of the NIS, the universe is all US community hospitals, excluding rehabilitation hospitals beginning with 1998.



NIS Unweighted Hospital Record Count

For the NIS 1988-2011, to estimate the number of teaching hospitals nationwide, use the HOSP_TEACH variable to tabulate the number of hospital records in the NIS which are classified as teaching hospitals.

Unweighted Record Count

Title1 'Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ; 

data TEACH1;
       set NIS2007.nis_2007_hospital (keep=HOSPID DISCWT NIS_STRATUM HOSP_TEACH);
       if hosp_teach = 1 then teach = 1;
       else teach = 0;
run;

PROC SURVEYMEANS DATA=TEACH1 SUM STD MEAN STDERR ;
        VAR teach;
        CLUSTER hospid ;
        STRATA NIS_stratum ;
run;
          

The result is 191. This is the number of hospital records in the NIS which are classified as teaching hospitals. This is not a national estimate of teaching hospitals because you did not use a weighting variable.


Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File


                           The SURVEYMEANS Procedure

                                   Data Summary

                       Number of Strata                 60
                       Number of Clusters             1044
                       Number of Observations         1044


                            Statistics

                        Std Error
Variable       Mean       of Mean            Sum           Std Dev

--------------------------------------------------------------------
teach       0.182950    0.003682       191.000000         3.844516
--------------------------------------------------------------------
      
          


NIS National Hospital-Level Estimate

To estimate the number of teaching hospitals nationwide, weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.

National Estimate

Title1 'Produce national estimate of hospitals with HOSP_TEACH =1 from 2007 NIS HOSPITAL File (weighted)'; 
libname nis2007 "C:\NIS 2007\"; 
options obs = MAX PageSize=51 LineSize=146 ;

data TEACH1; 
      set NIS2007.nis_2007_hospital (keep=HOSPID HOSPWT NIS_STRATUM HOSP_TEACH); 
      if hosp_teach = 1 then teach = 1; 
      else teach = 0; 
run;

PROC SURVEYMEANS DATA=TEACH1 SUM STD MEAN STDERR ; 
    VAR teach; 
    WEIGHT hospwt; 
    CLUSTER hospid ; 
    STRATA NIS_stratum ; 
run;
          

The result is 927. This is the estimate of the number of teaching hospitals nationwide in 2007.


Produce national estimate of hospitals with HOSP_TEACH =1 from 2007 NIS HOSPITAL File (weighted) 
 
 
                              The SURVEYMEANS Procedure

                                     Data Summary

                            Number of Strata             60
                            Number of Clusters         1044
                            Number of Observations     1044
                            Sum of Weights             5099

                                       Statistics

                                    Std Error
Variable             Mean             of Mean               Sum           Std Dev
----------------------------------------------------------------------------------
teach             0.181774           0.003672        926.863668         18.722859
----------------------------------------------------------------------------------
          


Data Elements for NIS Hospital Weights Over Time

Note that the variable for hospital weights depends on the data year. For the 1998 NIS and later years, HOSPWT should be used to create nationwide estimates. For NIS databases prior to 1998, the variable HOSPWT_U should be used. Beginning with the 2012 NIS redesign, the hospital weights are no longer needed.



NIS Unweighted Analysis

Depending on the nature of your research, you may be interested in using the NIS as an unweighted sample.

An analysis of the association between hospital-level results from a survey on safe hospital practices and hospital-level inpatient risk-adjusted mortality scores is one example of research for which it would be appropriate to use the NIS as an unweighted sample.

In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.



NIS Summary

Let's review what you need to do to create national or regional estimates using NIS data.

Weight your data with discharge weights (DISCWT) for discharge-level estimates. For data years prior to 2012, weight your data with hospital weights (HOSPWT) for hospital-level estimates.

Check the weighted data totals by performing a quick query on HCUPnet.

In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NIS as an unweighted sample, do not apply weights to your data.

If you are interested in a more detailed statistical explanation of weighting, refer to the documentation available on the HCUP-US Web site, particularly the Methods Report on Calculating National (Nationwide) Inpatient Sample Variances.

Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.



Nationwide Emergency Department Sample (NEDS)

The NEDS is a database of emergency department visits - visits for which the patient was treated and released as well as visits that resulted in a hospital admission - going back annually to 2006.

The NEDS can be used to produce national and regional estimates of emergency department care, utilization, access, costs and quality.

In order to produce national or regional estimates from the NEDS data contained in the Core File, you must weight the unweighted observations.



NEDS Discharge Weights

The weights you apply will depend on the type of analysis you are performing. The NEDS data can be weighted to produce discharge-level estimates or hospital-level estimates.

To produce discharge-level estimates, such as estimating the number of emergency department visits for influenza in the US or estimating the number of emergency department visits for hip fractures among the elderly in the US, you must apply a discharge weight to each record in the Core File.

The discharge weights were calculated for NEDS data by first stratifying the NEDS hospitals on the same variables that were used for creating the sample.

These variables were geographic region, trauma center designation, urban/rural location, teaching status, and ownership.

A weight was then calculated for each stratum by dividing the number of universe discharges in that stratum - obtained from AHA data - by the number of NEDS discharges in the stratum.

Weighted estimates can be calculated by applying the discharge weights to the sample discharges.

Weights have been assigned to each record. In each record in the NEDS Core File, the weight is stored in the data element DISCWT. When the discharge weights are applied to the unweighted NEDS observations, the result is an estimate of the number of discharges for the entire universe. In the case of the NEDS, the universe is emergency department visits nationwide.



NEDS Unweighted Discharge Record Count

This tutorial will now demonstrate weighting the data using SAS®.

Unweighted Record Count


Title1 'Count records with CCS=123 (influenza) from 2006 NEDS File'; 
libname neds2006 "C:\NEDS 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;

data influenza1; 
     set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1); 
     if dxccs1 eq 123 then influenza = 1; 
     else influenza = 0; 
run;

PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR ; 
     VAR influenza; 
     CLUSTER hosp_ed ; 
     STRATA NEDS_stratum ; 
run;

  1. As with the NIS data, begin by running a simple program to see how many records there are in the NEDS for which influenza is indicated as a first-listed diagnosis.

  2. Note that influenza is CCS code 123, so look for records in which DXCCS1 equals 123.

    data influenza1; 
         set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1); 
         if dxccs1 eq 123 then influenza = 1; 
         else influenza = 0; 
    run;

  3. In the resulting output, the data summary provides the number of strata, clusters, and total observations in the data. The summary in this example confirms a database that contains 71 strata, 958 clusters - each cluster representing a single hospital emergency department - and 25,954,816 records - the number of records in the 2006 NEDS.

    Count records with CCS=123 (influenza) from 2006 NEDS File

                              The SURVEYMEANS Procedure

                                     Data Summary

                         Number of Strata                      71
                         Number of Clusters                   958
                         Number of Observations          25954816

                                      Statistics

                               Std Error
    Variable         Mean         of Mean            Sum              Std Dev
    ---------------------------------------------------------------------------
    influenza    0.001779      0.000065779          46185          1852.887261
    ---------------------------------------------------------------------------

  4. The statistics section provides the results of the particular analysis. The result is 46,185. This is the number of records in the NEDS for which influenza is indicated as a first-listed diagnosis. This is not an estimate of the number of emergency department visits nationwide for influenza.


NEDS National Discharge-Level Estimates

To estimate the number of nationwide emergency department visits with a first-listed diagnosis of influenza, weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.

The result is 211,740 - an estimate of the number of nationwide emergency department visits with a first-listed diagnosis of influenza in 2006, including both visits in which the patient was treated and released and visits that resulted in a hospital admission.

National Estimate


Title1 'Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)'; 
libname neds2006 "C:\NEDS 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;


data influenza1; 
        set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1); 
        if dxccs1 eq 123 then influenza = 1; 
        else influenza = 0; 
run;
        
PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR ; 
    VAR influenza; 
    WEIGHT discwt; 
    CLUSTER hosp_ed ; 
    STRATA NEDS_stratum ; 
run;





Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)


                                     The SURVEYMEANS Procedure
                                     
                                            Data Summary
                                            
                         Number of Strata                  71
                         Number of Clusters               958
                         Number of Observations      25954816
                         Sum of Weights             120033750
                         
                               Statistics
                                               
                                 Std Error
Variable          Mean             of Mean                 Sum               Std Dev
-------------------------------------------------------------------------------------
influenza      0.001764         0.000070788              211740           8898.259246
-------------------------------------------------------------------------------------

To verify that you have weighted the data correctly, perform a query on HCUPnet.

  1. Go to HCUPnet and select "National Statistics on All ED Visits."

  2. Select "All ED Visits."

  3. You are running a query on a particular first-listed diagnosis so select the first option, "Statistics on specific diagnoses."

  4. Select 2006 as the data year.

  5. Use CCS codes to identify influenza patients. Select "Diagnoses grouped by Clinical Classifications Software (CCS)" and then "First-listed diagnosis."

  6. Highlight CCS code 123 for influenza and select "Next."

  7. Select "Number of Visits."

  8. Select "All patients in all hospitals."

The HCUPnet results and the SAS® results should be the same.
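
To make the effect of the WEIGHT statement concrete, here is a minimal sketch on a made-up eight-record data set. The data set name toy and every value in it are hypothetical and are not HCUP data. Without WEIGHT, SUM counts the sample records that are flagged; with WEIGHT, SUM adds up the discharge weights of those records.

```sas
/* Toy data: two strata, two hospitals per stratum. All values are hypothetical. */
data toy;
    input stratum hosp discwt flu;
    datalines;
1 101 4 1
1 101 4 0
1 102 4 0
1 102 4 1
2 201 6 0
2 201 6 1
2 202 6 0
2 202 6 0
;
run;

/* Unweighted: SUM is the number of sample records with flu = 1 */
proc surveymeans data=toy sum;
    var flu;
    cluster hosp;
    strata stratum;
run;

/* Weighted: SUM is the total of DISCWT over the records with flu = 1 */
proc surveymeans data=toy sum;
    var flu;
    weight discwt;
    cluster hosp;
    strata stratum;
run;
```

With these made-up values, the unweighted SUM is 3 and the weighted SUM is 14 (4 + 4 + 6).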



NEDS Regional Discharge-Level Estimates

Once you have weighted the data, you might also want to produce regional estimates of emergency department visits for influenza. One method for producing these estimates is to create a variable for region from the information contained in the NEDS_STRATUM data element. Then, use the DOMAIN keyword to indicate that you want to produce estimates of influenza discharges by region.
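
The region derivation described above can be checked on a single value before running it on the full file. The sketch below applies the same expression that the regional program uses to one stratum value. The value 31201 is made up for illustration, on the assumption that the first digit of NEDS_STRATUM encodes region.

```sas
/* Check the region derivation on one hypothetical stratum value */
data _null_;
    neds_stratum = 31201;                                /* made-up value */
    /* PUT converts the number to text, LEFT removes the leading blanks, */
    /* and SUBSTR keeps the first character                              */
    region = substr(left(put(neds_stratum, 8.)), 1, 1);
    put neds_stratum= region=;                           /* writes both values to the log */
run;
```

For this value the log shows region=3. Note that region is a character variable here, which is acceptable for the DOMAIN statement.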

Regional Estimates


Title1 'Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)'; 
libname neds2006 "C:\NEDS 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;
    
    
data influenza1; 
    set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1); 
    retain edrecs 1; 
    region = substr(left(put(neds_stratum,8.)),1,1); 
    if dxccs1 eq 123 then influenza = 1; 
    else influenza = 0; 
run;

PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR ; 
    VAR edrecs; 
    WEIGHT discwt ; 
    CLUSTER hosp_ed ; 
    STRATA NEDS_stratum ; 
    DOMAIN region * influenza ; 
run;
     




Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)


                                 The SURVEYMEANS Procedure

                                        Data Summary

                         Number of Strata                   71
                         Number of Clusters                958
                         Number of Observations       25954816
                         Sum of Weights              120033750

                                         Statistics

                                   Std Error
Variable             Mean            of Mean                Sum              Std Dev
-------------------------------------------------------------------------------------
edrecs            1.000000                  0         120033750               2477212
-------------------------------------------------------------------------------------

                              Domain Analysis: region*influenza

                                                    Std Error
region    influenza      Variable           Mean      of Mean            Sum           Std Dev
----------------------------------------------------------------------------------------------
1             0           edrecs        1.000000            0       23520030           1120105
              1           edrecs        1.000000            0          27885        620.732277
2             0           edrecs        1.000000            0       27789334           1153925
              1           edrecs        1.000000            0          47477       3196.225129
3             0           edrecs        1.000000            0       46771539           1656723
              1           edrecs        1.000000            0         118487       7742.256091
4             0           edrecs        1.000000            0       21741106            889365
              1           edrecs        1.000000            0          17892       1467.104457
----------------------------------------------------------------------------------------------
     
Check your results by running a quick query on HCUPnet.

  1. The first part of the query on HCUPnet will be the same as the query performed previously for the national estimate of emergency department visits with a first-listed diagnosis of influenza.

  2. Select "Number of discharges."

  3. Select "All patients in all hospitals."

  4. To see the discharges by region this time, select "Region of the U.S." under patient and hospital characteristics.

The HCUPnet results and the SAS® results should be the same.



NEDS Hospital Weights

To produce hospital-level estimates, such as the number of US emergency departments with a trauma center designation, you will apply a hospital weight, HOSPWT, to the data.

HOSPWT was also calculated according to the NEDS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year.

To demonstrate how to weight at the hospital level, consider an analysis that estimates the number of emergency departments in the US with a trauma center.
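
As a rough illustration of what "the number of universe hospitals it represents" means, the sketch below computes a hospital weight for two made-up strata. The data set name toy_strata, the variable names, and all counts are hypothetical. It assumes the weight is the number of universe hospitals in the stratum divided by the number of sampled hospitals in the stratum.

```sas
/* Hypothetical strata: the universe and sample hospital counts are made up */
data toy_strata;
    input stratum universe_hosp sample_hosp;
    /* Each sampled hospital stands for this many universe hospitals */
    hospwt = universe_hosp / sample_hosp;
    /* Summing the weight over the sampled hospitals recovers the universe count */
    weighted_total = hospwt * sample_hosp;
    datalines;
1 50 10
2 30 5
;
run;

proc print data=toy_strata noobs;
run;
```

Here each sampled hospital in stratum 1 carries a weight of 5 and each sampled hospital in stratum 2 carries a weight of 6, so the weighted totals return 50 and 30.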



NEDS Unweighted Hospital ED Record Count

First, tabulate the number of emergency department records in the NEDS from hospitals which are classified as having a trauma center.

Record Count


Title1 'Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File'; 
libname neds2006 "C:\NEDS 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;

data TRAUMA1; 
    set NEDS2006.neds_2006_hospital (keep=HOSP_ED DISCWT NEDS_STRATUM HOSP_TRAUMA); 
    if hosp_trauma in (1,2,3,8,9) then trauma = 1; 
    else trauma = 0; 
run;

PROC SURVEYMEANS DATA=TRAUMA1 SUM STD MEAN STDERR ; 
    VAR trauma; 
    CLUSTER hosp_ed ; 
    STRATA NEDS_stratum ; 
run;
    
The result is 131. This is the number of emergency department records in the NEDS from hospitals with a trauma center.

Unweighted Record Count


Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File

                              The SURVEYMEANS Procedure

                                     Data Summary

                         Number of Strata                      71
                         Number of Clusters                   958
                         Number of Observations               958

                                      Statistics

                               Std Error
Variable           Mean          of Mean                   Sum          Std Dev
--------------------------------------------------------------------------------
trauma         0.136743                0            131.000000                0
--------------------------------------------------------------------------------
    


NEDS National Hospital ED-Level Estimate

To estimate the number of hospitals with a trauma center nationwide, weight the data.

National Estimate


Title1 'Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)'; 
libname neds2006 "C:\NEDS 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;

data TRAUMA1; 
    set NEDS2006.neds_2006_hospital (keep=HOSP_ED HOSPWT NEDS_STRATUM HOSP_TRAUMA); 
    if hosp_trauma in (1,2,3,8,9) then trauma = 1; 
    else trauma = 0; 
run;

PROC SURVEYMEANS DATA=TRAUMA1 SUM STD MEAN STDERR ; 
    VAR trauma; 
    WEIGHT hospwt; 
    CLUSTER hosp_ed ; 
    STRATA NEDS_stratum ; 
run;
    
The result is 697. This is an estimate of the number of hospitals with a trauma center nationwide.

Weighted Estimate


Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)

                              The SURVEYMEANS Procedure

                                     Data Summary

                    Number of Strata                 71
                    Number of Clusters              958
                    Number of Observations          958
                    Sum of Weights                 4845
                    
                    
                                       Statistics

                               Std Error
Variable           Mean          of Mean               Sum            Std Dev
------------------------------------------------------------------------------
trauma         0.143860                0        697.000000                  0
------------------------------------------------------------------------------
    


NEDS Unweighted Analysis

Depending on the nature of your research, you may be interested in using the NEDS as an unweighted sample.

An example of an analysis in which it would be appropriate to use the NEDS as an unweighted sample is a hospital-level study in which NEDS emergency department-level data are linked to data on pre-hospital care, such as that provided by emergency medical services.

In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.



NEDS Summary

Let's review what you need to do to create national or regional estimates using NEDS data.

Weight your data with discharge weights (DISCWT) for discharge-level estimates.

Weight your data with hospital weights (HOSPWT) for hospital-level estimates.

Check the weighted data totals by performing a quick query on HCUPnet.

In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NEDS as an unweighted sample, do not apply weights to your data.

Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.



Kids' Inpatient Database (KID)

The third national database is the KID - the database designed specifically for the study of pediatric conditions that require hospitalization.

The KID is produced every three years starting with the 1997 data year.

In order to produce national or regional estimates using KID data, you must weight the unweighted observations in the KID file.



KID Discharge Weights

KID data must be weighted to perform discharge-level analyses. Because of the sample design of the KID, it cannot be used as an unweighted database. For more information on the unique sample design of the KID, see the Sample Design tutorial.

Discharge weights were calculated for KID data by stratifying the hospitals on the same variables that were used for creating the sample and then creating weights by stratum.

The stratifying variables were geographic region, urban/rural location, teaching status, bed size, ownership, and children's hospital.

Remember that the KID was designed for the study of pediatric hospitalizations, and that the discharges in the KID are a combination of newborn discharges (including both complicated and uncomplicated) and non-newborn pediatric discharges. Because of this, for each stratum, weights were created for both newborn discharges and non-newborn pediatric discharges.

The weights were created for newborn discharges (both complicated and non-complicated) by dividing the number of universe newborns in the stratum by the number of KID newborns in the stratum.

And the weights were created for non-newborn discharges by dividing the number of universe non-newborn pediatric discharges in the stratum by the number of KID non-newborn discharges in the stratum.

Weighted estimates can be calculated by applying the discharge weights to the sample discharges.
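
The weight construction described above can be written compactly. The notation is introduced here for illustration only and does not come from the HCUP documentation: for stratum s, N denotes a universe count and n a KID sample count, with superscripts nb for newborn discharges and ped for non-newborn pediatric discharges; y_i is the value of interest for sample discharge i (for example, 1 if the discharge has the diagnosis of interest and 0 otherwise) and w_i is that discharge's weight.

```latex
w^{\mathrm{nb}}_{s} = \frac{N^{\mathrm{nb}}_{s}}{n^{\mathrm{nb}}_{s}},
\qquad
w^{\mathrm{ped}}_{s} = \frac{N^{\mathrm{ped}}_{s}}{n^{\mathrm{ped}}_{s}},
\qquad
\hat{T} = \sum_{i \in \text{sample}} w_i \, y_i
```

For example, if a stratum had 1,000 universe newborns and 100 KID newborns (made-up numbers), each sampled newborn discharge in that stratum would carry a weight of 10.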



KID Unweighted Discharge Record Count

Next, this tutorial will demonstrate how to weight the KID to produce national and regional estimates of pediatric hospital discharges with a principal diagnosis of cystic fibrosis - a discharge-level estimate. Remember that pediatric discharges are defined as those for which the patient was age 20 or less at admission.

Unweighted Record Count


Title1 'Count records with CCS=56 (cystic fibrosis) from 2006 KID File'; 
libname kid2006 "C:\KID 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;


data cf1; 
    set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_stratum); 
    if dxccs1 eq 56 then cysticf = 1; 
    else cysticf = 0; 
run;

PROC SURVEYMEANS DATA=cf1 SUM STD MEAN STDERR ; 
    VAR cysticf; 
    CLUSTER hospid ; 
    STRATA KID_stratum ; 
run;

  1. Tabulate the number of records in the KID for which cystic fibrosis is indicated as a principal diagnosis. In other words, count the records in which DXCCS1 equals 56.


  2. The resulting output provides a data summary of the number of strata, clusters, and total observations in the data.


  3. 
    Count records with CCS=56 (cystic fibrosis) from 2006 KID File
    
                                The SURVEYMEANS Procedure
    
                                      Data Summary
    
                    Number of Strata                    60
                    Number of Clusters                3739
                    Number of Observations         3131324
    
                                       Statistics
    
                                   Std Error
    Variable          Mean           of Mean              Sum                 Std Dev
    ----------------------------------------------------------------------------------
    cysticf       0.001298           0.000103     4063.000000              346.994786
    ----------------------------------------------------------------------------------
    
    

    The summary shown here confirms a database that contains 60 strata, 3,739 clusters - each cluster representing a single hospital - and 3,131,324 records, the number of records in the 2006 KID.

  4. The statistics section provides the results of the particular analysis. The result is 4,063. This is the number of records in the KID for which cystic fibrosis is indicated as a principal diagnosis. This is not an estimate of pediatric discharges nationwide for cystic fibrosis.


KID National Discharge-Level Estimates

Weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.

National Estimate


Title1 'Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)'; 
libname kid2006 "C:\KID 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;

data cf1; 
    set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_STRATUM); 
    if dxccs1 eq 56 then cysticf = 1; 
    else cysticf = 0; 
run;

PROC SURVEYMEANS DATA=cf1 SUM STD MEAN STDERR ; 
    VAR cysticf; 
    WEIGHT discwt; 
    CLUSTER hospid ; 
    STRATA KID_stratum ; 
run;


The result is 6,947. This is an estimate of the number of pediatric hospital discharges, nationwide, with a principal diagnosis of cystic fibrosis in 2006.


Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

                            The SURVEYMEANS Procedure

                                  Data Summary

                Number of Strata                            60
                Number of Clusters                        3739
                Number of Observations                 3131324
                Sum of Weights                      7558812.48

                                     Statistics

                               Std Error
Variable          Mean           of Mean              Sum                 Std Dev
-------------------------------------------------------------------------------------
cysticf       0.000919       0.000075764      6946.648756               601.770739
-------------------------------------------------------------------------------------

Note that HCUPnet contains a path for querying National Statistics on Children. However, this path provides statistics only on discharges in which the patient was age 17 or less. The KID file from the Central Distributor provides data on discharges where patients are age 20 or less. As a result, the weighted estimates produced will not exactly match those provided by HCUPnet unless you limit the data set to those age 17 or less when creating your estimate of discharges.

That said, use HCUPnet to get a ballpark idea of what the estimate should be.

  1. Select "National Statistics on Children."

  2. Select "Researcher, medical professional."

  3. Select "Statistics on specific diagnoses or procedures."

  4. Select 2006 as the data year.

  5. Use CCS codes to identify cystic fibrosis patients, so select "Diagnoses grouped by Clinical Classifications Software (CCS)" and then "Principal diagnosis."

  6. Highlight CCS code 56 for cystic fibrosis and select "Next."

  7. Select "Number of discharges."

  8. Select "All patients in all hospitals."

The HCUPnet total will be smaller than the total produced from the KID data using SAS®, but in the same overall range.
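
The age restriction mentioned above could be sketched as follows. This is one possible approach, not the method HCUPnet itself uses. It assumes the AGE data element is available on the 2006 KID Core File; the data set name cf_age and the flag age0_17 are hypothetical. The flag is passed to the DOMAIN statement rather than used to subset the file, so that all hospitals and strata remain available for variance estimation.

```sas
libname kid2006 "C:\KID 2006\";

data cf_age;
    set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_STRATUM AGE);
    if dxccs1 eq 56 then cysticf = 1;
    else cysticf = 0;
    /* Missing ages sort below zero in SAS, so require a value from 0 to 17 */
    if 0 le age le 17 then age0_17 = 1;
    else age0_17 = 0;
run;

proc surveymeans data=cf_age sum std;
    var cysticf;
    weight discwt;
    cluster hospid;
    strata KID_stratum;
    domain age0_17;   /* the row with age0_17 = 1 is the age 17 or less estimate */
run;
```

How records with a missing age are treated is a choice made in this sketch, so small differences from HCUPnet may remain.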



KID Regional Discharge-Level Estimates

Once you have weighted the data, you might also want to produce regional estimates of the number of discharges for cystic fibrosis. Because the design of the KID differs from that of the NIS and NEDS, you need a different method for creating regional estimates than the one used above. For KID data prior to 2012, you must merge the Hospital File with the Core File to pick up the hospital region data element. Then, use the DOMAIN keyword to indicate that you want to produce estimates of cystic fibrosis discharges by region. Note that the same method can also be used to produce regional estimates from the NIS and NEDS.

Regional Estimate


Title1 'Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)'; 
libname kid2006 "C:\KID 2006\"; 
options obs = MAX PageSize=51 LineSize=146 ;

data cf1; 
    set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1); 
    retain dischgs 1; 
    if dxccs1 eq 56 then cysticf = 1; 
    else cysticf = 0; 
run;

proc sort data=cf1; 
     by hospid; 
run;

proc sort data=KID2006.kid_2006_hospital (keep=HOSPID KID_STRATUM Hosp_region) out=hosp; 
     by hospid; 
run;

data cf2; 
    merge cf1 (in=a) 
         hosp (in=b); 
    by hospid; 
    if a and b; 
    region = Hosp_region ; 
run;

PROC SURVEYMEANS DATA=cf2 SUM STD MEAN STDERR ; 
    VAR dischgs; 
    WEIGHT discwt ; 
    CLUSTER hospid ; 
    STRATA KID_stratum ; 
    DOMAIN region * cysticf ; 
run;





Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

                            The SURVEYMEANS Procedure

                                  Data Summary

                Number of Strata                60
                Number of Clusters            3739
                Number of Observations     3131324
                Sum of Weights          7558812.48

                                    Statistics

                               Std Error
Variable          Mean           of Mean              Sum                 Std Dev
-----------------------------------------------------------------------------------
dischgs       1.000000                 0          7558812                  123453
-----------------------------------------------------------------------------------

                              Domain Analysis: cysticf*region

                                         Std Error
cysticf  region    Variable        Mean    of Mean      Sum              Std Dev
---------------------------------------------------------------------------------
0        1          dischgs     1.000000    0         1276711              56801
         2          dischgs     1.000000    0         1645709              59677
         3          dischgs     1.000000    0         2894738              92104
         4          dischgs     1.000000    0         1734707              67196
1        1          dischgs     1.000000    0     1302.943850         329.418971
         2          dischgs     1.000000    0     2002.856118         397.992297
         3          dischgs     1.000000    0     2222.916758         365.441907
         4          dischgs     1.000000    0     1417.932030         304.223111
---------------------------------------------------------------------------------
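
As noted above, the same merge-based method can be applied to the NEDS. The following is a sketch only: it assumes the 2006 NEDS Hospital File carries a HOSP_REGION data element alongside HOSP_ED, and the work data set names flu1, edhosp, and flu2 are hypothetical.

```sas
libname neds2006 "C:\NEDS 2006\";

data flu1;
    set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
    retain edrecs 1;
    if dxccs1 eq 123 then influenza = 1;
    else influenza = 0;
run;

proc sort data=flu1;
    by hosp_ed;
run;

/* Assumption: HOSP_REGION is present on the Hospital File */
proc sort data=NEDS2006.neds_2006_hospital (keep=HOSP_ED HOSP_REGION) out=edhosp;
    by hosp_ed;
run;

/* Keep only records that match in both files */
data flu2;
    merge flu1 (in=a)
          edhosp (in=b);
    by hosp_ed;
    if a and b;
run;

proc surveymeans data=flu2 sum std;
    var edrecs;
    weight discwt;
    cluster hosp_ed;
    strata NEDS_stratum;
    domain hosp_region * influenza;
run;
```

If the assumption holds, the regional sums should agree with those obtained from the NEDS_STRATUM method shown earlier.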



KID Weights Over Time

The KID discharge weight variable has changed over time.

Years            Variable Name    Use
2003 and later   DISCWT           All national estimates
2000             DISCWT           National estimates except those including total charge
2000             DISCWTcharge     National estimates of total charge
1997             DISCWT_U         All national estimates



KID Limitations

Unlike the 1988-2011 NIS and NEDS data, KID data cannot be used to produce hospital-level estimates because the data for the KID were sampled at the discharge level rather than at the hospital level.

Note that the KID cannot be used as an unweighted sample because of the design of the database.



KID Summary

Let's review what you need to do to create national or regional estimates using KID data.

Weight your data with discharge weights (DISCWT) for discharge-level estimates.

Check the weighted data totals by performing a quick query on HCUPnet.

Hospital-level analyses are not possible with KID data.

The KID cannot be used as an unweighted sample. Weights should always be applied to the data.

Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.



Key Points

In summary, weighting is a key concept when working with the HCUP national databases.

What to do:

Remember that the NIS, NEDS, and KID are sample databases. Thus, to produce national or regional estimates from these databases, you must be sure to properly weight the data.

It is important that you select the proper weight based on the database, the year of data, and the type of analysis you are conducting.

Check your estimates against HCUPnet to ensure that you are using the weights appropriately and calculating estimates and variances accurately.

And keep in mind that proper statistical techniques must also be used to calculate standard errors and confidence intervals when using each of the national databases. For detailed instructions, refer to the special report Calculating National (Nationwide) Inpatient Sample Variances on the HCUP-US Web site.
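
As a small illustration of the point about standard errors and confidence intervals, the sketch below asks PROC SURVEYMEANS for confidence limits alongside the weighted total. It runs on a made-up data set (the name toy_ci and all values are hypothetical, not HCUP data) and is not a substitute for the variance guidance in the HCUP report. STD requests the standard deviation (standard error) of the estimated total, and CLSUM requests confidence limits for that total.

```sas
/* Toy data: two strata, two hospitals per stratum. All values are hypothetical. */
data toy_ci;
    input stratum hosp discwt flu;
    datalines;
1 101 4 1
1 101 4 0
1 102 4 0
1 102 4 1
2 201 6 0
2 201 6 1
2 202 6 0
2 202 6 0
;
run;

/* STRATA and CLUSTER describe the sample design so the variance reflects it */
proc surveymeans data=toy_ci sum std clsum;
    var flu;
    weight discwt;
    cluster hosp;
    strata stratum;
run;
```

Leaving out the STRATA and CLUSTER statements would still give the same weighted total, but the standard error and confidence limits would no longer reflect the sample design.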

What not to do:

State-level analyses cannot be conducted with the national HCUP databases because the sampling frames are not designed with state as a stratification variable. If you are interested in analyses by state, you should use the state-specific databases.

HCUPnet cannot be used to check unweighted estimates, as weights have been applied to the HCUPnet data.

Remember that the KID cannot be used as an unweighted database.



Resources and Other Training

If you are looking for more information on the subject matter covered here, several resources are available on the HCUP User Support Web site.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to answer technical questions you may have.

Thank you for accessing this module. There are several other HCUP online tutorials. Access these tutorials to learn if there are other topics that could be helpful to you.

If you have any feedback regarding this module, please email us at hcup@ahrq.gov.

Detailed documentation of HCUP resources is available on the HCUP User Support Web site. To access documentation for each of the HCUP national databases, see:

NIS Documentation





Internet Citation: Producing National HCUP Estimates - Accessible Version. Healthcare Cost and Utilization Project (HCUP). October 2015. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/tech_assist/nationalestimates/508_course/508course.jsp.
Last modified 10/6/15