Improvements to text-to-speech (TTS) systems are best assessed by periodic listening tests. This paper reports the results of two listening-test methodologies that measure perceived speech quality, one using a single-item scale and the other a nine-item scale. The tests support two conclusions. First, both the TTS system and the individual synthesized sentences used to test it affect a listener's perception. Second, when pragmatic considerations such as respondent fatigue necessitate a shorter scale, the single-item scale can be a reliable and valid substitute for the nine-item scale, given appropriate empirical justification.