Elite Biostats: ROC curve in BioStatistics

The ROC (Receiver Operating Characteristic) curve is used to evaluate how accurately a test discriminates disease cases from normal cases, i.e. its diagnostic performance. It can also be used to compare two or more tests [1]. The ROC curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate). Each point on the curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The Area Under the Curve (AUC) summarizes how well the test distinguishes diseased from normal cases.

When you apply the test to two populations, one with the disease and one without, you will rarely observe a perfect separation between the two groups. Assuming the test results are normally distributed in each group, the two distributions will overlap [1].

For every possible cut-off point or criterion value you select to discriminate between the two populations, there will be some cases with the disease correctly classified as positive (TP = True Positive fraction), but some cases with the disease will be classified negative (FN = False Negative fraction). On the other hand, some cases without the disease will be correctly classified as negative (TN = True Negative fraction), but some cases without the disease will be classified as positive (FP = False Positive fraction).
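
The classification at a single cut-off can be sketched in a few lines of Python. This is only an illustration: the marker values, the cut-off of 5.0 and the function name confusion_counts are made up, and the sketch assumes that higher test values indicate disease.

```python
def confusion_counts(diseased, healthy, cutoff):
    """Count TP, FN, TN and FP when values >= cutoff are called positive.

    Assumption: higher test values indicate disease.
    """
    tp = sum(1 for x in diseased if x >= cutoff)  # diseased, test positive
    fn = len(diseased) - tp                       # diseased, test negative
    fp = sum(1 for x in healthy if x >= cutoff)   # healthy, test positive
    tn = len(healthy) - fp                        # healthy, test negative
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Hypothetical marker values for the two groups (not real data).
diseased = [4.1, 5.3, 6.0, 6.8, 7.5]
healthy = [2.0, 3.1, 3.9, 4.6, 5.5]

counts = confusion_counts(diseased, healthy, cutoff=5.0)
print(counts)
```

Because the two groups overlap, no cut-off in this toy data set puts every case in the correct cell.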

Several statistics can be derived from TP, FP, TN and FN [1].

The different fractions (TP, FP, TN, FN) are represented in the following table.

           Disease
Test       Present               n       Absent                n       Total
Positive   True Positive (TP)    a       False Positive (FP)   c       a + c
Negative   False Negative (FN)   b       True Negative (TN)    d       b + d
Total                            a + b                         c + d

Sensitivity: probability that a test result will be positive when the disease is present (true positive rate, expressed as a percentage).
    = a / (a + b)

Specificity: probability that a test result will be negative when the disease is not present (true negative rate, expressed as a percentage).
    = d / (c + d)

Positive likelihood ratio: ratio between the probability of a positive test result given the presence of the disease and the probability of a positive test result given the absence of the disease.
    = True positive rate / False positive rate = Sensitivity / (1 - Specificity)

Negative likelihood ratio: ratio between the probability of a negative test result given the presence of the disease and the probability of a negative test result given the absence of the disease.
    = False negative rate / True negative rate = (1 - Sensitivity) / Specificity

Positive predictive value: probability that the disease is present when the test is positive (expressed as a percentage).
    = a / (a + c)

Negative predictive value: probability that the disease is not present when the test is negative (expressed as a percentage).
    = d / (b + d)
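
As a quick check of these formulas, the sketch below computes all six statistics from the four cell counts. The function name diagnostic_stats and the counts passed to it are hypothetical; a, b, c and d follow the table (a = TP, b = FN, c = FP, d = TN), and the results are returned as fractions rather than percentages.

```python
def diagnostic_stats(a, b, c, d):
    """Compute test statistics from a 2x2 table.

    a = true positives, b = false negatives,
    c = false positives, d = true negatives.
    """
    sensitivity = a / (a + b)
    specificity = d / (c + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": sensitivity / (1 - specificity),
        "LR-": (1 - sensitivity) / specificity,
        "PPV": a / (a + c),
        "NPV": d / (b + d),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
stats = diagnostic_stats(a=80, b=20, c=10, d=90)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Note that this sketch does not guard against division by zero: the positive likelihood ratio is undefined when specificity is exactly 1, and the negative likelihood ratio when it is exactly 0.
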

When you select a higher criterion value, the false positive fraction decreases and specificity increases, but the true positive fraction and sensitivity decrease. Conversely, when you select a lower criterion value, the true positive fraction and sensitivity increase, but the false positive fraction also increases, so the true negative fraction and specificity decrease.

The ROC curve

In a Receiver Operating Characteristic (ROC) curve, the true positive rate (sensitivity) is plotted as a function of the false positive rate (1 - specificity, or 100 - specificity when expressed as a percentage) for different cut-off points. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination (no overlap between the two distributions) has an ROC curve that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test [2].
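
To make the construction concrete, here is a minimal Python sketch that sweeps the cut-off over every observed value and records one (1 - specificity, sensitivity) pair per cut-off. The data and the function name roc_points are invented for illustration, and the sketch assumes that values at or above the cut-off are called positive.

```python
def roc_points(diseased, healthy):
    """Return (false positive rate, sensitivity) pairs, one per cut-off.

    Assumption: values >= cut-off are called positive.
    """
    # Try every observed value as a cut-off, from highest to lowest.
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above every value: nothing is positive
    for cutoff in cutoffs:
        tpr = sum(x >= cutoff for x in diseased) / len(diseased)
        fpr = sum(x >= cutoff for x in healthy) / len(healthy)
        points.append((fpr, tpr))
    return points


# Hypothetical marker values for the two groups (not real data).
diseased = [4.1, 5.3, 6.0, 6.8, 7.5]
healthy = [2.0, 3.1, 3.9, 4.6, 5.5]

points = roc_points(diseased, healthy)
for fpr, tpr in points:
    print(f"1 - specificity = {fpr:.1f}, sensitivity = {tpr:.1f}")
```

Lowering the cut-off moves along the curve from the lower left corner (nothing called positive) to the upper right corner (everything called positive), which is the trade-off described above.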

An ROC curve demonstrates several things [3]:

It shows the trade-off between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
The slope of the tangent line at a cut point gives the likelihood ratio (LR) for that value of the test.
The area under the curve is a measure of test accuracy.
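
The last point can be illustrated numerically. The sketch below computes the AUC in two ways that should agree: by the trapezoidal rule over the empirical ROC points, and as the fraction of (diseased, healthy) pairs in which the diseased case has the higher value, with ties counted as one half. The data and function names are hypothetical, and higher values are assumed to indicate disease.

```python
def auc_pairwise(diseased, healthy):
    """AUC as the share of (diseased, healthy) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through (fpr, tpr) points.

    The points must be ordered by increasing false positive rate.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical marker values for the two groups (not real data).
diseased = [4.1, 5.3, 6.0, 6.8, 7.5]
healthy = [2.0, 3.1, 3.9, 4.6, 5.5]

# Empirical ROC points: one per observed cut-off, from highest to lowest.
cutoffs = sorted(set(diseased + healthy), reverse=True)
points = [(0.0, 0.0)] + [
    (sum(y >= c for y in healthy) / len(healthy),
     sum(x >= c for x in diseased) / len(diseased))
    for c in cutoffs
]

print(auc_pairwise(diseased, healthy))
print(auc_trapezoid(points))
```

An AUC of 0.5 corresponds to the 45-degree diagonal (no discrimination), and an AUC of 1 corresponds to perfect separation of the two groups.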

ROC curve in SAS
In SAS, the ROC plot is built into PROC LOGISTIC and can be requested as follows [3]:

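ods graphics on;   /* enable ODS Graphics so the plot can be produced */
ods html;          /* send output to the HTML destination */
proc logistic data=mydata plots(only)=(roc);   /* request only the ROC plot */
  model Y=marker;  /* outcome Y modeled on the test result "marker" */
run;
ods html close;
ods graphics off;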

ROC curve in R

You can use the function "roccurve" to plot an ROC curve in R. You can also call this function from SAS by using a SUBMIT block.

R code can be run from SAS with the SUBMIT statement [3]. The SUBMIT statement takes options, and it is the R option that directs the enclosed code to R. A submit block without the option runs SAS code, while a block with it runs R code:

submit;
<SAS data step or procedure code to be executed>
endsubmit;
submit / R;
<R code to be executed>
endsubmit;

R code is typically submitted from a SAS session because the user performs data manipulation (and perhaps some analyses) in SAS, while R offers some data analysis functions that are not available in SAS. The data therefore exist in the SAS session and must be passed to the R session. The submit block directs code to an R session, but the user also needs to exchange data between SAS and R, often in both directions.

References:
1. http://www.medcalc.org/manual/roc-curves.php
2. Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 39:561-577.
3. ROC curve in SAS: http://labs.fhcrc.org/pepe/dabs/ROC_Curve_Plotting_in_SAS_9_2.pdf

Monday, October 8, 2012
