Objective To compare the diagnostic performance of five thyroid ultrasound classification systems and to determine which system is optimal for evaluating thyroid nodules and reducing the rate of unnecessary biopsy. Methods In this prospective study, 1,010 nodules referred for biopsy during a 2-year period were classified using five classification systems: the Kwak Thyroid Imaging Reporting and Data System (Kwak TI-RADS), the European TI-RADS (EU TI-RADS), the Korean TI-RADS (K TI-RADS), the American College of Radiology TI-RADS (ACR TI-RADS), and the American Thyroid Association (ATA) classification. After fine-needle aspiration biopsy, the classifications were compared for all nodules and, separately, for nodules 1 to 3 cm in size. Sensitivity, specificity, and interobserver agreement were evaluated for each classification system. Results Of the 939 nodules finally classified according to the surgical histopathology and cytology results (after exclusion of Bethesda 3 nodules), 73 (7.8%) were malignant and 866 (92.2%) were benign. Sensitivity was highest for the ACR TI-RADS (94.5%) and lowest for the Kwak TI-RADS (69%). After exclusion of small (<1 cm) and large (>3 cm) nodules, sensitivity was highest for the ATA classification (97.8%), followed by the ACR TI-RADS (91.3%). There was substantial agreement among all classification systems except the Kwak TI-RADS (fair agreement). Conclusions The ACR TI-RADS was the most sensitive ultrasound risk stratification system for all nodules, while the Kwak TI-RADS was the most specific, ie, the most capable of excluding benign nodules, based on the combined cytological and histopathological results. The ATA classification and the ACR TI-RADS were the most sensitive classification systems for nodules 1 to 3 cm in size. The ACR TI-RADS had higher sensitivity than the Bethesda classification system when compared against the histopathological results.
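For readers less familiar with the two accuracy measures reported above, the following is a minimal sketch of how sensitivity and specificity are computed from counts of correctly and incorrectly classified nodules. The function names are illustrative only. The abstract reports 73 malignant nodules and a sensitivity of 94.5% for the ACR TI-RADS but does not give the underlying counts; the split of 69 detected and 4 missed malignant nodules used here is an assumption chosen because it reproduces the reported percentage. The specificity example uses toy numbers that do not come from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of malignant nodules that the system flags as suspicious."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of benign nodules that the system classifies as not suspicious."""
    return true_neg / (true_neg + false_pos)


# Reported in the abstract: 73 malignant nodules in total.
malignant_total = 73

# ASSUMPTION: 69 detected / 4 missed is not stated in the abstract; it is one
# split that is consistent with the reported 94.5% sensitivity.
assumed_true_pos = 69
assumed_false_neg = malignant_total - assumed_true_pos

acr_sensitivity_pct = round(100 * sensitivity(assumed_true_pos, assumed_false_neg), 1)
print(f"ACR TI-RADS sensitivity under the assumed split: {acr_sensitivity_pct}%")

# Toy numbers only (NOT study data): 80 benign nodules correctly classified
# as benign and 20 benign nodules flagged as suspicious.
toy_specificity_pct = round(100 * specificity(true_neg=80, false_pos=20), 1)
print(f"Specificity for the toy example: {toy_specificity_pct}%")
```

A system can trade one measure against the other: lowering the threshold for recommending biopsy tends to raise sensitivity and lower specificity, which is consistent with one system being the most sensitive and another the most specific.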