Pain intensity is the most commonly used outcome domain in pain clinical trials. To minimize the chances of type II error (ie, concluding that a treatment has no beneficial effect when in fact it does), the measure of pain intensity used should be sensitive to changes produced by effective pain treatments. Here we sought to identify the combination of pain intensity ratings that best balances the need for reliability and validity against the need to minimize assessment burden. We conducted secondary analyses of data from a completed 4-arm clinical trial of psychological pain treatments (N = 164 adults). Current, worst, least, and average pain intensity in the past 24 hours were each assessed 4 times before and 4 times after treatment using 0 to 10 numerical rating scales (Numerical Rating Scale-11). We created a variety of composite scores from these ratings and evaluated their reliability (Cronbach's alpha) and validity (ie, associations with a gold standard score created by averaging 16 ratings, and sensitivity for detecting between-group differences in treatment efficacy). For each measure, reliability increased with the number of ratings used to create it, and ratings from 3 or more days were needed to obtain adequately strong associations with the gold standard. Regarding sensitivity, the findings suggest that composite scores made up of ratings from 4 days are needed to maximize the chances of detecting treatment effects, especially with smaller sample sizes. In conclusion, using data from 3 or 4 days of assessment may be the best practice.
PERSPECTIVE: Composite scores made up of at least 3 days of pain ratings appear to be needed to maximize reliability and validity while minimizing assessment burden.
TRIAL REGISTRATION: clinicaltrials.gov NCT01800604.
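As a rough illustration of the reliability analysis described above, the sketch below averages a set of 0 to 10 ratings into a composite score and computes Cronbach's alpha with the standard textbook formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the summed score). The abstract does not give the authors' computation, so this is an assumption about the general approach rather than their method. The function names `composite_score` and `cronbach_alpha` are hypothetical, and the example ratings are invented for illustration, not trial data.

```python
from statistics import mean, variance


def composite_score(ratings):
    """Average one participant's 0-10 ratings into a single composite score."""
    return mean(ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for a participants-by-items table of ratings.

    rows: list of participants, each a list of k item ratings
          (for example, k = 4 daily "average pain" ratings).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    if any(len(r) != k for r in rows):
        raise ValueError("every participant needs the same number of items")
    # Sample variance of each item (column) across participants.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented ratings: 4 participants x 3 days of "average pain" (0-10).
example = [
    [3, 4, 5],
    [5, 6, 7],
    [7, 8, 9],
    [2, 3, 4],
]
alpha = cronbach_alpha(example)
composites = [composite_score(r) for r in example]
```

In this toy table the three days rise and fall together across participants, so alpha comes out at its upper bound of 1. Real ratings are noisier, and alpha typically increases as more ratings are included, which matches the pattern the abstract reports.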