We performed a human study through Amazon Mechanical Turk using the FaceScrub dataset in conjunction with the MegaFace dataset.

Test Set

Since all the identities in the FaceScrub dataset are celebrities, we sorted the names in FaceScrub by the number of results that Google image search returns per person, as a measure of popularity, and chose the 50 most popular and the 50 least popular people as the test set. Each person had 100 photos; we selected one photo as the probe image and used the other 99 as gallery images. We then produced 99 positive pairs per person and randomly selected 10K photos from our MegaFace dataset as distractors. This results in a total of 100 x (99 + 10K) = 1,009,900 pairs.
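
The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_test_set and its inputs (a popularity dictionary, a photo dictionary, a distractor list) are hypothetical, the first photo of each person is arbitrarily taken as the probe, and the same sampled distractors are paired with every probe.

```python
import random

def build_test_set(popularity, photos, distractors,
                   n_each=50, n_distractors=10_000, seed=0):
    """Return (probe, other, is_same_person) pairs.

    popularity:  dict name -> search-result count (hypothetical input)
    photos:      dict name -> list of photo ids for that person
    distractors: list of distractor photo ids
    """
    rng = random.Random(seed)
    # Sort names from least to most popular.
    ranked = sorted(popularity, key=popularity.get)
    # Keep the n_each least popular and the n_each most popular people.
    identities = ranked[:n_each] + ranked[-n_each:]
    # Assumption: one distractor sample is shared by all probes.
    sampled = rng.sample(distractors, n_distractors)
    pairs = []
    for name in identities:
        # Assumption: the first photo serves as the probe.
        probe, *gallery = photos[name]
        pairs += [(probe, g, True) for g in gallery]
        pairs += [(probe, d, False) for d in sampled]
    return pairs
```

With 100 identities, 100 photos each, and 10,000 distractors, the returned list has 100 x (99 + 10,000) = 1,009,900 entries.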


We presented turkers with 10 pairs per page and asked them to click on each pair that showed the same person.

We collected the pairs that received one or more clicks for a sorting experiment. Each collected pair includes a probe photo, so the collected pairs define a set of possible matches per probe. We generated triples consisting of a probe and two of its possible matches, presented 10 triples per page, and asked which of the two matches showed the person in the probe. This sorting determined the position of each gallery photo relative to the distractor images.
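
One simple way to turn such two-way choices into an ordering is to count how often each candidate is preferred. The sketch below is an assumption made for illustration, not necessarily the aggregation used in the study; rank_candidates and the vote format (a, b, winner) are hypothetical.

```python
from collections import Counter

def rank_candidates(candidates, votes):
    """Order one probe's candidates by the number of comparisons won.

    candidates: list of candidate photo ids for the probe
    votes:      iterable of (a, b, winner) tuples, one per answered triple
    """
    wins = Counter({c: 0 for c in candidates})
    for a, b, winner in votes:
        if winner not in (a, b):
            raise ValueError("winner must be one of the two candidates")
        wins[winner] += 1
    # sorted() is stable, so ties keep their input order.
    return sorted(candidates, key=lambda c: -wins[c])
```

The 1-based index of a gallery photo in the returned list then gives its position relative to the distractors that were also clicked for that probe.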


Group           Rank-1    Rank-10
All             23.9      91.13
Males           23.35     89.98
Females         24.01     92.5
Less Popular    22.7      90.9
More Popular    25.1      91.3
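
For reference, a Rank-k figure is commonly computed as the percentage of gallery photos placed within the top k positions. The sketch below assumes that definition applies to the columns above; rank_k_rate and its input (a list of 1-based positions) are hypothetical.

```python
def rank_k_rate(ranks, k):
    """Percentage of gallery photos whose 1-based position is at most k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)
```

For example, positions [1, 3, 12, 7] give a Rank-1 rate of 25.0 and a Rank-10 rate of 75.0.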


Group           TAR @ 2 × 10^-3    TAR @ 5 × 10^-2
All             41.6               76.5
Males           43.7               79.0
Females         39.4               73.9
Less Popular    39.4               74.7
More Popular    43.6               78.2