HCUP Calculating Standard Errors

Contents:

Introduction

Welcome
About HCUP
Learning Objectives

Standard Errors

Importance of Calculating Standard Errors
HCUP Nationwide Database Sample Design
Finite Population Correction
Statistical Software
National Estimate Example
Example Results
Verification of Results

Standard Errors for Subsets

Calculating Standard Errors for Subsets
Subsets: Recommended Method
Subsets: Recommended Method Results
Verification of Results
Subsets: Alternate Method
Subsets: Alternate Method Results
Verification of Results

Significance Testing

Using the Z-Test Calculator
Z-Test Calculator LOS
Z-Test Calculator Trend

Wrap-Up

Key Points
Resources and Other Training

Welcome

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on Calculating Standard Errors.

My name is Sarah, and I am going to show you how to calculate standard errors for national estimates calculated from the HCUP nationwide databases.

This tutorial is for researchers who have some background in basic research methods and who understand how to produce national estimates using the HCUP nationwide databases.

For a detailed description of how to produce regional and national estimates using these databases, please refer to the Producing HCUP National Estimates Tutorial.


About HCUP

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics. The nationwide HCUP databases are designed to facilitate the development of national and regional estimates.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our HCUP Overview Course.


Learning Objectives

The goal of this tutorial is to show you how to determine the precision of the estimates you calculate from HCUP nationwide databases so that you will be able to draw sound conclusions from your analyses.

By the end of this tutorial, you will:

Understand how to calculate standard errors for national estimates from the HCUP nationwide databases: the National (Nationwide) Inpatient Sample (NIS), the Nationwide Emergency Department Sample (NEDS), the Kids' Inpatient Database (KID), and the Nationwide Readmissions Database (NRD).

Understand how to calculate standard errors for estimates based on subsets of the nationwide databases.


Importance of Calculating Standard Errors

Standard error is a measure of the precision of a statistic. It reflects the amount that a sample statistic's value would fluctuate if a large number of samples were to be drawn using the same sampling design. Less precise estimates have larger standard errors while more precise estimates have smaller standard errors.

Standard errors can be used to determine whether differences between two sample statistics are statistically significant, or to construct confidence intervals for sample estimates of population statistics such as the mean.

The calculation of a standard error involves another statistical measure: standard deviation. Standard deviation measures the spread of individual data values around the mean.

For a simple random sample of size n from a large population, the standard error of the mean equals the standard deviation of the sample divided by the square root of the sample size.
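Written as a formula, with s the sample standard deviation and n the sample size (notation assumed here for illustration), the simple-random-sample case and the approximate 95 percent confidence interval built from it are:

```latex
\[
\operatorname{SE}(\bar{x}) = \frac{s}{\sqrt{n}},
\qquad
\bar{x} \pm 1.96 \times \operatorname{SE}(\bar{x})
\]
```

The multiplier 1.96 assumes a large sample and a normal approximation. This formula serves only as a reference point for the design-based calculations used in the rest of this tutorial.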


HCUP Nationwide Database Sample Design

The HCUP nationwide databases are not simple random samples. The NIS (beginning with data year 2012), KID, and NRD are stratified samples. The NIS was redesigned in 2012 to improve national estimates. Prior to its redesign, the NIS was a stratified two-stage cluster sample without replacement. The NEDS also is a stratified two-stage cluster sample without replacement.

Standard formulas for a stratified two-stage cluster sample without replacement may be used to calculate standard errors in most applications for all four samples. Although a sample of hospitals is not drawn for the NIS (beginning with data year 2012), KID, or NRD, for estimation purposes hospitals should be treated as though they were selected at the first stage of sampling from the entire universe of hospitals within each stratum.

Examples provided in this tutorial use 2013 NIS data, but the same standard error calculations apply to prior data years of the NIS as well as to the NEDS, KID, and NRD. To review the sample designs, refer to the HCUP Sample Design Tutorial.
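As a rough sketch of what the standard formulas for a stratified cluster design look like, the variance of an estimated total can be written as follows when the finite population correction is ignored. The notation is assumed for illustration and is not taken from the HCUP documentation: h indexes strata, i indexes hospitals (clusters) within a stratum, j indexes discharges within a hospital, n_h is the number of sample hospitals in stratum h, w_hij is the discharge weight, and y_hij is the analysis variable.

```latex
\[
\hat{Y} = \sum_{h} \sum_{i=1}^{n_h} y_{hi\cdot},
\qquad
y_{hi\cdot} = \sum_{j} w_{hij}\, y_{hij},
\qquad
\bar{y}_{h\cdot\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi\cdot}
\]
\[
\widehat{\operatorname{Var}}(\hat{Y})
  = \sum_{h} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h}
    \left( y_{hi\cdot} - \bar{y}_{h\cdot\cdot} \right)^{2},
\qquad
\operatorname{SE}(\hat{Y}) = \sqrt{\widehat{\operatorname{Var}}(\hat{Y})}
\]
```

Because the inner sum runs over every sample hospital in a stratum, a hospital whose weighted total is zero still contributes to the variance. The variance therefore depends on all hospitals in the sample, not only on those with records of interest.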

The National (Nationwide) Inpatient Sample

Prior to data year 2012, the NIS was a stratified two-stage cluster sample, similar to the NEDS. Beginning with the 2012 data year, the NIS is a stratified sample of hospital discharges. Discharges in the sampling frame are stratified by five key hospital characteristics. Then, a systematic random sample of discharges is chosen from each of the strata after the discharges are sorted by "control" variables ordered as follows: encrypted hospital ID, Diagnosis-Related Group (DRG), admission month, and a random number. Although the NIS is not a cluster sample (discharges are sampled from all frame hospitals), discharges are still clustered within hospitals. Consequently, each hospital is considered a cluster for the purpose of calculating standard errors.

The Nationwide Emergency Department Sample

The NEDS is a stratified two-stage cluster sample. Hospital-based emergency departments in the sampling frame are stratified by five key hospital characteristics. Then, a random sample of hospital-based emergency departments is chosen from each of the strata. In sampling terminology, each emergency department is considered a cluster. The NEDS includes all discharges from the selected clusters, or emergency departments.

The Kids' Inpatient Database

The KID consists of a sample of pediatric discharges from all hospitals in the sampling frame. Discharges are stratified by whether they are an uncomplicated in-hospital birth, a complicated in-hospital birth, or a pediatric non-birth. For the KID, a random sample of 10% of uncomplicated in-hospital births and 80% of all other pediatric discharges is selected.

The Nationwide Readmissions Database

The NRD is drawn from HCUP State Inpatient Databases (SID) that contain reliable, verified patient linkage numbers that can be used to track a person across hospitals within a State, while adhering to strict privacy guidelines. All of the discharges in the sampling frame were included, making the NRD a sample of convenience. Discharges are post-stratified for the purpose of weighting by hospital characteristics (census region, urban/rural location, hospital teaching status, size of the hospital defined by the number of beds, and hospital control) and patient characteristics (sex and five age groups [0, 1-17, 18-44, 45-64, and 65 and older]).


Finite Population Correction

The procedures described in this tutorial all assume inferences to a large population, so the finite population correction is not used. The correction is applied only when inferences are being made to the specific population of patients actually hospitalized during the year of the data. Usually analysts prefer not to use the finite population correction because they are interested in the long-run results for hospitals. For example, interest centers on the true, long-run mortality rate for a hospital over multiple years rather than on the mortality rate actually observed in a single year.
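For intuition only, here is how the finite population correction would enter the simple-random-sample formula, where n is the sample size, N is the population size, and s is the sample standard deviation (notation assumed for illustration). The formulas for the stratified cluster designs of the HCUP databases are more involved, but the correction plays the same role.

```latex
\[
\operatorname{SE}_{\text{fpc}}(\bar{x}) = \sqrt{1 - \frac{n}{N}} \;\frac{s}{\sqrt{n}}
\]
```

When n is small relative to N, the factor is close to 1 and has little effect. In SAS PROC SURVEYMEANS, the correction is applied only when the TOTAL= or RATE= option is specified; the example programs in this tutorial specify neither.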

For more information on instances in which it would be appropriate to use the finite population correction, please refer to the Calculating Nationwide Inpatient Sample (NIS) Variances for Data Years 2011 and Earlier and Calculating National Inpatient Sample (NIS) Variances for Data Years 2012 and Later reports.


Statistical Software

Several statistical programming packages can be used to calculate sample statistics and appropriate standard errors based on data from complex sampling designs. Some examples of these statistical programming packages are SAS®, SUDAAN®, Stata®, and SPSS®.

I will use SAS in today's demonstrations. In particular, I will use the SAS survey sampling and analysis procedures:

SURVEYFREQ
SURVEYLOGISTIC
SURVEYMEANS
SURVEYREG

These procedures incorporate the complex sample design of the HCUP nationwide databases into the analysis. They MUST be used when calculating national estimates, regional estimates and standard errors.

The HCUP reports Calculating Nationwide Inpatient Sample (NIS) Variances for Data Years 2011 and Earlier and Calculating National Inpatient Sample (NIS) Variances for Data Years 2012 and Later provide more information as well as example code for calculating standard errors using other statistical packages.


National Estimate Example

First I will show you how to produce standard errors for statistics based on the entire National Inpatient Sample. The SAS program code below produces national estimates of the sums, the means, and the standard errors for the number of discharges, the length of stay, the percentage of people who died during hospitalization, and the total hospital charges from the 2013 NIS.

LIBNAME NIS2013 "C:\";

DATA NIS_2013_CORE;
    SET NIS2013.NIS_2013_CORE;
    LENGTH DISCHGS 3;
    RETAIN DISCHGS 1;
RUN;

PROC SURVEYMEANS DATA=NIS_2013_CORE SUM STD MEAN STDERR MISSING;
    WEIGHT discwt;
    CLASS died;     
    FORMAT died FDIED.;
    CLUSTER hosp_nis;
    STRATA nis_stratum;
    VAR DISCHGS los died totchg;
RUN;

In all examples, the following conventions apply:

Lowercase words denote NIS variable names.

UPPERCASE WORDS denote keywords and options that are part of the programming language as well as user-defined variables.

The statements and options in the program are explained below.

SET: Keep all observations in the CORE file.

SET NIS2013.NIS_2013_CORE;

LENGTH: By default, numeric variables have a length of 8. To reduce the size of the file, a length of 3 is sufficient for binary variables.

LENGTH DISCHGS 3;

RETAIN: Create a dummy variable, set to 1 on every record, to ensure that every observation will be included in the discharge count.

RETAIN DISCHGS 1;

PROC SURVEYMEANS: The PROC SURVEYMEANS statement invokes the SAS procedure. Its options are described after the statement.

PROC SURVEYMEANS DATA=NIS_2013_CORE SUM STD MEAN STDERR MISSING;

DATA=: The DATA= option requests that the analysis be performed on the NIS 2013 Core file.

SUM: The SUM option requests the sum for variables listed in the VAR statement. For example, the variable DISCHGS is set to equal 1 for every record, so its sum estimates the total number of discharges.

STD: The STD option requests the standard deviation of the sum, that is, the standard error of the estimated total. It appears in the output under the heading "Std Dev".

MEAN and STDERR: The MEAN and STDERR options request that the mean and its standard error be printed.

MISSING: If you specify the MISSING option in the PROC SURVEYMEANS statement, the procedure treats missing values of a categorical variable as a valid category. Otherwise, observations with missing values of a categorical variable would be excluded from estimates.

WEIGHT: The WEIGHT statement weights each record by the value of the variable DISCWT.

WEIGHT discwt;

CLASS: The CLASS statement identifies DIED as a categorical variable, so the procedure estimates the proportion of discharges in each level of DIED (the weighted count in that level divided by the sum of DISCWT).

CLASS died;

FORMAT: The FORMAT statement is used to add value labels. In this example, it assigns value labels for the class variable, DIED. If the FORMAT statement is not used, the SAS output will only display values (i.e., 0 and 1). The value labels help clarify the results (e.g., 0 represents patients who did not die in the hospital and 1 represents patients who died in the hospital).

FORMAT died FDIED.;

CLUSTER: The CLUSTER statement specifies HOSP_NIS as the cluster identifier. The cluster is the hospital.

CLUSTER hosp_nis;

STRATA: The STRATA statement specifies NIS_STRATUM as the stratum identifier. In the case of the NIS, the strata are based on hospital characteristics.

STRATA nis_stratum;
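The FORMAT statement in the program refers to the FDIED. format, which is not defined in the code shown, so it has to exist before PROC SURVEYMEANS runs. A minimal sketch of one way to define it follows. The labels are copied from the class levels displayed in the example output; placing a PROC FORMAT step ahead of the procedure is an assumption about how the original program was set up.

```sas
/* Assumed definition of the FDIED. format used in the FORMAT statement. */
/* Labels mirror the class levels shown in the tutorial's output.        */
PROC FORMAT;
    VALUE FDIED
        .  = ".: Missing"
        .A = ".A: Invalid"
        0  = "0: Did not die in hospital"
        1  = "1: Died in hospital";
RUN;
```

Run this step once per SAS session, before the PROC SURVEYMEANS step.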


Example Results

Here are the results of the program.

                                                       The SURVEYMEANS Procedure

                                                             Data Summary

                                                 Number of Strata                 202
                                                 Number of Clusters              4363
                                                 Number of Observations       7119563
                                                 Sum of Weights              35597792


                                                        Class Level Information
 
   CLASS
   Variable    Label                            Levels    Values

   DIED        Died during hospitalization           4     .: Missing .A: Invalid  0: Did not die in hospital  1: Died in hospital  


                                                              Statistics
 
                                                                                       Std Error
Variable  Level                        Label                                 Mean        of Mean                Sum            Std Dev
--------------------------------------------------------------------------------------------------------------------------------------
DISCHGS                                                                      1.00           0.00         35,597,792            296,045
LOS                                    Length of stay (cleaned)              4.55           0.02        161,796,496          1,466,640
TOTCHG                                 Total charges (cleaned)          39,513.25         480.47  1,378,643,839,214     21,505,352,862
DIED       .: Missing                  Died during hospitalization           0.00           0.00              9,585              2,359
          .A: Invalid                  Died during hospitalization           0.00           0.00              3,575                855
           0: Did not die in hospital  Died during hospitalization           0.98           0.00         34,912,122            290,483
           1: Died in hospital         Died during hospitalization           0.02           0.00            672,510              7,974
--------------------------------------------------------------------------------------------------------------------------------------

As you can see, there are 202 sampling strata; 4,363 clusters, each of which is a hospital; and 7,119,563 unweighted sample records in the 2013 NIS.

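According to the results, it is estimated that nationwide there were a total of 35,597,792 inpatient discharges. The value of 296,045 that SAS labels "Std Dev" is the standard deviation of the sum, which is the standard error of this estimated total.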

The estimated average length of stay was 4.55 days with a standard error of .02 days.

The estimated average total charge was $39,513.25 with a standard error of $480.47.

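The mean of the flags indicating death during hospitalization was 0.02. In other words, about 2 percent of stays resulted in death during hospitalization (672,510 of 35,597,792 weighted discharges, or roughly 1.9 percent). The standard error is shown as 0.00 because it is smaller than 0.005 and the output is rounded to two decimal places.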
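Confidence intervals can be requested directly from the procedure. The sketch below is not part of the original program; it adds the CLM and CLSUM statistic keywords, which ask PROC SURVEYMEANS for confidence limits for the mean and for the sum. It assumes the NIS_2013_CORE data set created earlier is still available.

```sas
/* Sketch: same design statements as the tutorial example, with       */
/* confidence limits added. CLM = limits for the mean, CLSUM = limits */
/* for the sum. ALPHA=0.05 (the default) gives 95 percent limits.     */
PROC SURVEYMEANS DATA=NIS_2013_CORE MEAN STDERR CLM SUM STD CLSUM ALPHA=0.05;
    WEIGHT discwt;
    CLUSTER hosp_nis;
    STRATA nis_stratum;
    VAR DISCHGS los totchg;
RUN;
```

As a rough check by hand, the mean length of stay of 4.55 days with a standard error of 0.02 gives an approximate 95 percent interval of 4.55 ± 1.96 × 0.02, or about 4.51 to 4.59 days.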


Verification of Results

The results of the example analysis can be verified using HCUPnet.

Here are the results of an HCUPnet query corresponding to our SAS program.

When the results of the SAS program are compared to HCUPnet output, the estimates and standard errors agree for total discharges, length of stay, total charges, and in-hospital deaths.

In your own analyses, you may notice small discrepancies in some estimates. HCUPnet uses data that are stored as SAS files, whereas the NIS files purchased through the HCUP Central Distributor are sent as ASCII files. Weights (for making national estimates) in the ASCII files are truncated at the fourth decimal place, so some resulting estimates will differ slightly from those from HCUPnet; however, the differences should be very small.


Calculating Standard Errors for Subsets

What if your research focuses on only a subset of discharges from the NIS, such as hospital stays in which a coronary artery bypass graft, or CABG (pronounced "cabbage") was performed? Does calculating standard errors for a subset of discharges differ from calculating standard errors for estimates based on the entire sample?

Yes. When you produce statistics based on all the discharges in the sample, you include discharges from all of the hospitals in the sample, and thus take all of the hospitals, or clusters, in the sample into account.

If you select a subset of discharges, your subset may not include discharges from all of the hospitals in the sample.

However, to produce accurate standard errors, you must account for all of the hospitals in the sample.

The standard errors from a subset will be correct if every sample hospital has at least one observation in the subset.
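To see the effect on a small scale, the sketch below builds an invented data set (two strata, three hospitals per stratum) and estimates the same total two ways. All names and values here are hypothetical and are not NIS data. In the first run the file is subset before analysis, so hospitals 3 and 6 disappear; in the second run all records are kept and the subset is identified with a DOMAIN statement.

```sas
/* Toy data: STRATUM, HOSP (cluster), WT (weight), FLAG (1 = record of */
/* interest), Y (analysis variable). Hospitals 3 and 6 have no FLAG=1. */
DATA TOY;
    INPUT STRATUM HOSP WT FLAG Y;
    DATALINES;
1 1 5 1 4
1 1 5 0 2
1 2 5 1 6
1 2 5 0 3
1 3 5 0 1
1 3 5 0 2
2 4 5 1 9
2 4 5 1 7
2 5 5 1 8
2 5 5 0 3
2 6 5 0 2
2 6 5 0 4
;
RUN;

/* Run 1: subset first. Only 4 of the 6 hospitals reach the procedure. */
DATA TOY_SUBSET;
    SET TOY;
    IF FLAG=1;
RUN;

PROC SURVEYMEANS DATA=TOY_SUBSET SUM STD;
    WEIGHT WT;
    CLUSTER HOSP;
    STRATA STRATUM;
    VAR Y;
RUN;

/* Run 2: keep every record and analyze the subset as a domain. */
PROC SURVEYMEANS DATA=TOY SUM STD;
    WEIGHT WT;
    CLUSTER HOSP;
    STRATA STRATUM;
    VAR Y;
    DOMAIN FLAG;
RUN;
```

The Data Summary for the first run reports 4 clusters, while the second reports all 6. The estimated sums for FLAG=1 match, but the "Std Dev" values differ because the first run ignores the two hospitals that contributed no records to the subset.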

There are two methods you can use to account for all of the hospitals in the sample:

The recommended method uses all of the records in the core file and identifies discharges of interest.
The alternate method subsets the database and creates "dummy" records for hospitals in every stratum to ensure the appropriate calculation of standard errors. This method is sometimes necessitated by computer memory limitations and may be of particular use when working with the Nationwide Emergency Department Sample, which contains roughly 30 million unweighted observations.

We will look at both methods.


Subsets: Recommended Method

The recommended method for calculating standard errors requires more disk space and CPU time than the alternate method because the HCUP nationwide databases have a large number of records, all of which are involved in the recommended method. This may present a challenge in terms of disk space or software capabilities when using a database such as the 2013 NEDS, which contains roughly 30 million unweighted observations. In this case, the alternate method, which we will look at shortly, may be more appropriate. The code for the recommended method is shown below, followed by an explanation of its key statements.


LIBNAME NIS2013 "C:\";

/* FLAG CABG DISCHARGES; ALL RECORDS ARE KEPT */
DATA CABGSUBSET;
    SET NIS2013.NIS_2013_CORE;
    LENGTH DISCHGS CABG 3;
    RETAIN DISCHGS 1;
    IF prccs1=44 THEN CABG=1;
    ELSE CABG=0;
RUN;

PROC SURVEYMEANS DATA=CABGSUBSET SUM STD MEAN STDERR MISSING;
    WEIGHT discwt;
    CLASS died;
    FORMAT died FDIED.;
    CLUSTER hosp_nis;
    STRATA nis_stratum;
    VAR DISCHGS los died totchg;
    DOMAIN CABG;
RUN;

SET NIS2013.NIS_2013_CORE: Keep all observations in the CORE file.

RETAIN DISCHGS 1: Create a dummy variable to ensure that every observation will be included in the discharge count.

IF prccs1=44 THEN CABG=1: Flag discharges for which coronary artery bypass graft, or CABG, was the principal procedure performed. PRCCS1 is the data element in which the CCS principal procedure is stored, and the CCS code for CABG is 44. For more information on Clinical Classifications Software (CCS) and CCS codes, visit the HCUP-US Tools & Software page.

ELSE CABG=0: Set the flag to 0 for all other discharges so that every record has a non-missing value of CABG.

DOMAIN CABG: Use the CABG flag in the SAS DOMAIN statement in the SURVEYMEANS procedure. The DOMAIN statement requests analyses for a subpopulation (i.e., CABG procedures) and enables appropriate calculations for statistics in each domain.


Subsets: Recommended Method Results

The data summary shows that the output accounts for all 4,363 hospitals in the sample and all 7,119,563 unweighted observations. The first set of statistics, where CABG equals zero, is for discharges for which CABG was not the principal procedure. The second set, where CABG equals one, is for discharges for which CABG was the principal procedure.


                                                                 The SURVEYMEANS Procedure

                                                                       Data Summary

                                                           Number of Strata                 202
                                                           Number of Clusters              4363
                                                           Number of Observations       7119563
                                                           Sum of Weights              35597792


                                                                  Class Level Information
 
             CLASS
             Variable    Label                            Levels    Values

             DIED        Died during hospitalization           4     .: Missing .A: Invalid  0: Did not die in hospital  1: Died in hospital  

                                                                 Domain Statistics in CABG
 
                                                                                                       Std Error
CABG    Variable    Level                          Label                                   Mean          of Mean                  Sum              Std Dev
----------------------------------------------------------------------------------------------------------------------------------------------------------
   0    DISCHGS                                                                            1.00             0.00           35,440,072              294,316
        LOS                                        Length of stay (cleaned)                4.52             0.02          160,334,536            1,449,387
        TOTCHG                                     Total charges (cleaned)            38,971.03           476.96    1,353,657,499,120       21,351,712,415
        DIED         .: Missing                    Died during hospitalization             0.00             0.00                9,545                2,356
                    .A: Invalid                    Died during hospitalization             0.00             0.00                3,545                  854
                     0: Did not die in hospital    Died during hospitalization             0.98             0.00           34,757,392              288,803
                     1: Died in hospital           Died during hospitalization             0.02             0.00              669,590                7,935
   1    DISCHGS                                                                            1.00             0.00              157,720                4,347
        LOS                                        Length of stay (cleaned)                9.27             0.06            1,461,960               41,387
        TOTCHG                                     Total charges (cleaned)           160,477.45         2,469.74       24,986,340,094          738,469,446
        DIED         .: Missing                    Died during hospitalization             0.00             0.00                   40                   18
                    .A: Invalid                    Died during hospitalization             0.00             0.00                   30                   25
                     0: Did not die in hospital    Died during hospitalization             0.98             0.00              154,730                4,275
                     1: Died in hospital           Died during hospitalization             0.02             0.00                2,920                  142
----------------------------------------------------------------------------------------------------------------------------------------------------------

Results show an estimated total of 157,720 hospitalizations in which CABG is the principal procedure, with a standard error of 4,347 (labeled "Std Dev" in the output).

The average length of stay, indicated as LOS, is estimated at 9.27 days with a standard error of 0.06 days.

The estimated average total charges were $160,477.45 with a standard error of $2,469.74.

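The mean of the flags indicating death during hospitalization was 0.02. In other words, about 2 percent of CABG stays resulted in death during hospitalization (2,920 of 157,720 weighted discharges, or roughly 1.9 percent). The standard error is shown as 0.00 because it is smaller than 0.005 and the output is rounded to two decimal places.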
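As a worked example of using these results, an approximate 95 percent confidence interval for the estimated number of CABG hospitalizations can be computed from the total (157,720) and the value reported next to it under "Std Dev" (4,347). The 1.96 multiplier is an assumption based on a normal approximation; SAS itself uses a t distribution, which gives nearly the same limits with this many clusters.

```latex
\[
157{,}720 \pm 1.96 \times 4{,}347
  = 157{,}720 \pm 8{,}520
  \quad\Rightarrow\quad (149{,}200,\; 166{,}240)
\]
```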


Verification of Results

The results of the example analysis can be verified using HCUPnet.

Here are the results of a query corresponding to our SAS program.

When the results of the SAS program are compared to HCUPnet output, all of the estimates are the same.


Subsets: Alternate Method

The alternate method for calculating appropriate standard errors is to subset the nationwide database to the observations of interest and then append "dummy" observations so that every hospital in the nationwide database is represented, including hospitals that have no records in the subset. The dummy observations ensure that all the hospitals in the sample are taken into account, resulting in the accurate calculation of standard errors.

To do this, you must concatenate the subset of interest with the HOSPITAL file. The program below appends one dummy observation for every hospital in the HOSPITAL file, whether or not the hospital is already represented in the subset.

  
LIBNAME NIS2013 "C:\";   

/* CREATE SUBSET OF CABG PROCEDURES */
DATA CABGSUBSET;
    SET NIS2013.NIS_2013_CORE;
    LENGTH DISCHGS 3;
    RETAIN DISCHGS 1;
    IF prccs1=44;   /* KEEP ONLY CABG DISCHARGES */
RUN;

/* CREATE ANALYSIS FILE */
DATA CABGSUBSET;
    SET CABGSUBSET
        NIS2013.NIS_2013_HOSPITAL (IN=INHOSP KEEP=hosp_nis nis_stratum)
    ;
    LENGTH INSUBSET 3;
    INSUBSET = 1;
    IF INHOSP THEN DO;
        INSUBSET = 2;   /* ASSIGN A VALUE OUTSIDE THE SUBSET */
        discwt   = 1;   /* ASSIGN A VALID WEIGHT */
        /* ASSIGN ANALYSIS VARIABLES TO 0 */
        DISCHGS  = 0;
        los      = 0;
        died     = 0;
        totchg   = 0;
    END;
RUN;

TITLE "CABG Subset Statistics Using Alternative Method";
PROC SURVEYMEANS DATA=CABGSUBSET SUM STD MEAN STDERR MISSING;
    WEIGHT discwt;
    CLASS died;   
    FORMAT died FDIED.;
    CLUSTER hosp_nis;
    STRATA nis_stratum;
    VAR DISCHGS los died totchg;
    DOMAIN INSUBSET;
RUN;

The Hospital File is a supplemental file which is provided with the NIS Core File. It contains a few key variables for each hospital included in the nationwide database.

Constructing this smaller database allows you to work around any memory limitations.

Including dummy observations for each of the hospitals in the database ensures that the statistics you calculate will be accurate. The estimates produced with the alternate method are the same as those produced with the recommended method.

NIS2013.NIS_2013_HOSPITAL: Append dummy observations from the HOSPITAL file. The IN= variable INHOSP indicates which file the observation came from. In this case, INHOSP=1 indicates that the observation came from the HOSPITAL file.

INSUBSET = 1: Create a flag to indicate observations that came from the CABG subset.

IF INHOSP THEN DO; INSUBSET = 2: Set the value of INSUBSET to 2 to indicate that the observation did not come from the CABG subset (i.e., it is a dummy hospital record rather than a discharge with CCS code 44 for CABG).

discwt = 1: Assign a valid weight value to the dummy observations from the HOSPITAL file to ensure that every hospital will be included in the standard error calculations.

DISCHGS=0; los=0; died=0; totchg=0: Assign non-missing values to the variables of interest for the dummy observations from the HOSPITAL file so that these records, and therefore every hospital, are not excluded from the standard error calculations.

DOMAIN INSUBSET: The variable INSUBSET is used to indicate whether or not an observation came from the CABG subset. In this case, the statistics will be calculated separately for observations that came from the CABG subset and those that did not. Thus, we will only be interested in the results for INSUBSET = 1.


Subsets: Alternate Method Results

The alternate method produces the same correct statistical output as the recommended method. Again, results of the analysis can be verified using HCUPnet.

                                                                  The SURVEYMEANS Procedure

                                                                       Data Summary

                                                           Number of Strata                 202
                                                           Number of Clusters              4363
                                                           Number of Observations         35907
                                                           Sum of Weights            162083.005

                                                               Domain Statistics in INSUBSET
 
                                                                                                        Std Error
  INSUBSET   Variable   Level                         Label                                  Mean         of Mean                 Sum             Std Dev
  -------------------------------------------------------------------------------------------------------------------------------------------------------
         1   DISCHGS                                                                         1.00            0.00             157,720               4,347
             LOS                                      Length of stay (cleaned)               9.27            0.06           1,461,960              41,387
             TOTCHG                                   Total charges (cleaned)          160,477.45        2,469.74      24,986,340,094         738,469,446
             DIED        .: Missing                   Died during hospitalization            0.00            0.00                  40                  18
                        .A: Invalid                   Died during hospitalization            0.00            0.00                  30                  25
                         0: Did not die in hospital   Died during hospitalization            0.98            0.00             154,730               4,275
                         1: Died in hospital          Died during hospitalization            0.02            0.00               2,920                 142
         2   DISCHGS                                                                         0.00            0.00                   0                   0
             LOS                                      Length of stay (cleaned)               0.00            0.00                   0                   0
             TOTCHG                                   Total charges (cleaned)                0.00            0.00                   0                   0
             DIED        .: Missing                   Died during hospitalization            0.00            0.00                   0                   0
                        .A: Invalid                   Died during hospitalization            0.00            0.00                   0                   0
                         0: Did not die in hospital   Died during hospitalization            1.00            0.00               4,363                   0
                         1: Died in hospital          Died during hospitalization            0.00            0.00                   0                   0
  -------------------------------------------------------------------------------------------------------------------------------------------------------

Return to Contents

Verification of Results

Remember, if the alternate method is not applied correctly and not all hospitals in the sample are included in the analysis, the standard errors will be incorrect.

The SURVEYMEANS Procedure

                                                                       Data Summary

                                                           Number of Strata                 124
                                                           Number of Clusters              1110
                                                           Number of Observations         31544
                                                           Sum of Weights            157720.005


                                                                  Class Level Information
 
             CLASS
             Variable    Label                            Levels    Values

             DIED        Died during hospitalization           4     .: Missing .A: Invalid  0: Did not die in hospital  1: Died in hospital  


                                                                        Statistics
 
                                                                                                   Std Error
    Variable    Level                          Label                                   Mean          of Mean                  Sum              Std Dev
    --------------------------------------------------------------------------------------------------------------------------------------------------
    DISCHGS                                                                            1.00             0.00              157,720                3,439
    LOS                                        Length of stay (cleaned)                9.27             0.06            1,461,960               33,676
    TOTCHG                                     Total charges (cleaned)           160,477.45         2,373.93       24,986,340,094          615,460,503
    DIED         .: Missing                    Died during hospitalization             0.00             0.00                   40                   18
                .A: Invalid                    Died during hospitalization             0.00             0.00                   30                   25
                 0: Did not die in hospital    Died during hospitalization             0.98             0.00              154,730                3,385
                 1: Died in hospital           Died during hospitalization             0.02             0.00                2,920                  134
    --------------------------------------------------------------------------------------------------------------------------------------------------

The output above is an example from a program that does not account for all hospitals in the sample. The Data Summary reports only 124 strata and 1,110 clusters, compared with 202 strata and 4,363 clusters when the complete sample is used. Standard errors produced when not all hospitals are accounted for are incorrect and could lead to erroneous conclusions in your research. Here, for example, the standard error of mean total charges is 2,373.93 rather than the correct 2,469.74. It is critical to ensure you obtain a correct standard error.
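The comparison of the two Data Summary blocks can be written as a simple check. The Python sketch below is only an illustration: the function name is hypothetical, and the counts and standard errors are copied from the correct and incorrect outputs shown in this tutorial.

```python
def check_design_coverage(observed, expected):
    """List the design counts that fall short of the complete sample."""
    problems = []
    for key in ("strata", "clusters"):
        if observed[key] < expected[key]:
            problems.append(f"{key}: {observed[key]} of {expected[key]} accounted for")
    return problems

# Counts from the Data Summary of the correct (alternate method) output.
full_sample = {"strata": 202, "clusters": 4363}
# Counts from the Data Summary of the output that dropped hospitals.
subset_only = {"strata": 124, "clusters": 1110}

problems = check_design_coverage(subset_only, full_sample)
for line in problems:
    print(line)

# Standard error of mean total charges (TOTCHG) from the two outputs.
se_correct, se_incorrect = 2469.74, 2373.93
understatement = (se_correct - se_incorrect) / se_correct
print(f"Standard error understated by {understatement:.1%}")
```

With these figures, the incorrect run understates the standard error of mean total charges by roughly 4 percent, even though the mean itself is identical in both outputs.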

Return to Contents

Using the Z-Test Calculator

Once you have calculated standard errors for the subset of discharges you are studying, you may want to check to see if there are any statistically significant differences between outcomes or measures of hospital stays in your subset and other subsets.

The Z-test calculator is a convenient way to do just that. You can access it by clicking the Z-test calculator link below the results on any HCUPnet query results page.

The Z-test calculator allows you to test the significance of the difference between two weighted counts, means, or percentages.
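For readers who want to see the arithmetic behind such a test, here is a minimal Python sketch of a two-sided z-test for the difference between two estimates. It assumes the two estimates are independent, and it is not necessarily the exact computation performed by the HCUPnet calculator. The function name is hypothetical. The first estimate and standard error come from the CABG length-of-stay results above; the comparison values are invented for illustration.

```python
from math import erfc, sqrt

def z_test(estimate_1, se_1, estimate_2, se_2):
    """Two-sided z-test for the difference between two independent estimates.

    Returns (standard error of the difference, z statistic, p-value).
    """
    difference = estimate_1 - estimate_2
    # Variances of independent estimates add.
    se_difference = sqrt(se_1 ** 2 + se_2 ** 2)
    z = difference / se_difference
    # Two-sided p-value under the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return se_difference, z, p_value

# CABG mean length of stay and its standard error, from the output above.
cabg_los, cabg_se = 9.27, 0.06
# Hypothetical comparison values, NOT taken from HCUP data.
other_los, other_se = 4.50, 0.02

se_diff, z, p = z_test(cabg_los, cabg_se, other_los, other_se)
print(f"SE of difference = {se_diff:.4f}, z = {z:.2f}, p < 0.001: {p < 0.001}")
```

The same function applies to weighted counts or percentages, provided each estimate is supplied with a standard error calculated as described in this tutorial.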

Return to Contents

Z-Test Calculator LOS

To test whether the length of stay for discharges with a principal procedure of CABG differs significantly from that for stays without a principal CABG procedure, select the Z-test calculator.

Enter the estimated length of stay and corresponding standard error for the CABG discharges, and then enter the estimated length of stay and corresponding standard error for discharges for which CABG was not the principal procedure. Then select calculate. The calculator provides the associated standard error, z statistic, and p-value for the test.

As you can see, the difference between the two estimates is statistically significant at p < 0.001.

Note that estimates for non-CABG cases are valid only under the recommended method of calculating standard errors, which includes both CABG and non-CABG cases along with a CABG indicator. Under the alternate method, non-CABG estimates are not valid because the file contains only dummy records for the non-CABG cases.

Return to Contents

Z-Test Calculator Trend

Perhaps I am also interested in testing whether there has been a statistically significant change in the number of hospital stays with CABG between 2003 and 2013.

To test if the number of discharges with a principal procedure of CABG in 2013 is significantly different from that in 2003, enter the 2003 estimate and corresponding standard error and then the 2013 estimate and corresponding standard error. Select calculate. The calculator provides the associated standard error, z statistic and p-value for the test.

As you can see, the difference between the two estimates is statistically significant at p < 0.001.

Return to Contents

Key Points

As you calculate sample statistics and standard errors from the HCUP nationwide databases, you should consider the following key points:

The HCUP nationwide databases are not simple random samples and the usual variance calculations cannot be used.
When using the HCUP nationwide databases to produce national and regional estimates, a statistical programming package that incorporates the complex sample design into the data analysis must be used.
When calculating statistics such as standard errors, all hospitals in the sample must always be accounted for, even if you are only interested in a subset of records. This can be accomplished using either of the methods outlined in this tutorial.

Return to Contents

Resources and Other Training

If you are looking for more information on the subject matter covered here, several resources are available on the HCUP User Support (HCUP-US) website: hcup-us.ahrq.gov.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has research personnel available to respond to technical questions you may have. Inquiries are answered within three business days.

Thank you for accessing this module. There are several other HCUP Online Tutorials. Take a look to see if there are other topics that could be helpful to you.

If you have any feedback regarding this module, please email us at hcup@ahrq.gov.

Detailed documentation of HCUP is available on the HCUP User Support website (https://hcup-us.ahrq.gov). For documentation on each of the HCUP national databases, click on the links below:

Special Methods Documents are available at https://hcup-us.ahrq.gov/reports/methods.jsp. Specific reports of interest to this module include:

User Support

HCUP Calculating Standard Errors - Accessible Version