TY - JOUR
T1 - Interobserver variation in breast cancer grading
T2 - A statistical modeling approach
AU - Chowdhury, Nilotpal
AU - Pai, Muktha R.
AU - Lobo, Flora D.
AU - Kini, Hema
AU - Varghese, Rebecca
PY - 2006/8/1
Y1 - 2006/8/1
N2 - OBJECTIVE: To study random and systematic error in breast cancer grading, to identify the sources of disagreement and to measure the reliability of graders so that appropriate corrective action can be taken. STUDY DESIGN: Five independent observers graded 50 breast carcinoma slides from 50 consecutive breast cancer specimens according to the Nottingham criteria. The polychoric correlation was used to measure association. Stuart-Maxwell and McNemar tests were used to test equality of thresholds. RESULTS: The polychoric correlation among observers was high (mean = 0.803, 0.712, 0.797 and 0.602 for the final grade, tubule formation, nuclear pleomorphism and mitotic figures, respectively). However, there were significant differences in thresholds (6, 7, 7 and 9 of the 10 observer pairs showed significant differences in classification of grades/scores for final grade, tubule formation, nuclear pleomorphism and mitotic counts, respectively). CONCLUSION: The high polychoric correlations suggest that random error in grading breast cancers in this study was low, confirming the underlying reliability of grading and graders. However, significant differences in thresholds lower raw agreement. Such differences may be rectified by increased intradepartmental discussion.
AB - OBJECTIVE: To study random and systematic error in breast cancer grading, to identify the sources of disagreement and to measure the reliability of graders so that appropriate corrective action can be taken. STUDY DESIGN: Five independent observers graded 50 breast carcinoma slides from 50 consecutive breast cancer specimens according to the Nottingham criteria. The polychoric correlation was used to measure association. Stuart-Maxwell and McNemar tests were used to test equality of thresholds. RESULTS: The polychoric correlation among observers was high (mean = 0.803, 0.712, 0.797 and 0.602 for the final grade, tubule formation, nuclear pleomorphism and mitotic figures, respectively). However, there were significant differences in thresholds (6, 7, 7 and 9 of the 10 observer pairs showed significant differences in classification of grades/scores for final grade, tubule formation, nuclear pleomorphism and mitotic counts, respectively). CONCLUSION: The high polychoric correlations suggest that random error in grading breast cancers in this study was low, confirming the underlying reliability of grading and graders. However, significant differences in thresholds lower raw agreement. Such differences may be rectified by increased intradepartmental discussion.
UR - http://www.scopus.com/inward/record.url?scp=33746684302&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746684302&partnerID=8YFLogxK
M3 - Article
C2 - 16927641
AN - SCOPUS:33746684302
SN - 0884-6812
VL - 28
SP - 213
EP - 218
JO - Analytical and Quantitative Cytology
JF - Analytical and Quantitative Cytology
IS - 4
ER -