The MF2 training dataset is the largest publicly available facial recognition dataset by number of identities, with 4.7 million faces, 672K identities, and their respective bounding boxes. All images were obtained from Flickr (Yahoo's dataset) and are licensed under Creative Commons.
If you wish to request access to the dataset, please follow the instructions on the challenge page.
Loosely Cropped (Padded)
([0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]): each file is ~65 GB and contains up to 50K identities.
Tightly Cropped (Face detection box region only)
(tar.gz): 159 GB, contains all 672K identities.
Metadata (Detection coordinates, full image URLs, landmarks)
(tar.gz): 1.4 GB, contains a corresponding JSON Metadata file for each face image.
1M Disjoint Distractors
Contains 1M unlabeled faces that are not found in the MF2 training set. For use with challenge 2.
Tightly Cropped (tar.gz): 24 GB
Loosely Cropped (tar.gz): 68 GB
By downloading the dataset, you agree to the following terms:
[RESEARCHER_FULLNAME] (the "Researcher") has requested permission to use the MegaFace database (the "Database") at the University of Washington. In exchange for such permission, Researcher hereby agrees to the following terms and conditions:
Each folder is named [FlickrID]_identity_[IdentityID] and represents one training identity. For example, 100003507@N04_identity_10 is the 11th identity (index 10, zero-indexed) found in the Flickr account identified by 100003507@N04.
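The folder naming convention above can be split back into its two parts. The following is a minimal sketch, not part of the dataset tooling; the `IdentityFolder` type and the `parse_identity_folder` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class IdentityFolder(NamedTuple):
    flickr_id: str       # Flickr account ID, e.g. "100003507@N04"
    identity_index: int  # zero-indexed identity within that account


def parse_identity_folder(name: str) -> IdentityFolder:
    """Split a folder name into the Flickr account ID and the identity index."""
    # Split on the last "_identity_" so the account ID is kept whole.
    flickr_id, sep, index = name.rpartition("_identity_")
    if not sep or not flickr_id or not index.isdecimal():
        raise ValueError(f"not an identity folder name: {name!r}")
    return IdentityFolder(flickr_id, int(index))


folder = parse_identity_folder("100003507@N04_identity_10")
print(folder.flickr_id, folder.identity_index)
```

Since the index is zero-based, `identity_index + 1` gives the ordinal position of the identity within the account.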
Files are named in the form ##########_#. The part of the name before the "_" identifies the image the face came from, and the part after it is the number assigned to that face within the image.
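A file name of this form can be separated into the source image identifier and the face number. This is a minimal sketch under two assumptions that the text does not confirm: that any file extension (such as `.jpg` or `.json`) should be ignored, and that the face number is a plain decimal integer. The function name `parse_face_filename` is hypothetical.

```python
from pathlib import Path
from typing import Tuple


def parse_face_filename(filename: str) -> Tuple[str, int]:
    """Return (image_id, face_number) for a name of the form ##########_#."""
    # Assumption: drop any directory part and everything after the first dot.
    stem = Path(filename).name.split(".", 1)[0]
    # The last "_" separates the image identifier from the face number.
    image_id, sep, face_number = stem.rpartition("_")
    if not sep or not image_id or not face_number.isdecimal():
        raise ValueError(f"unexpected face file name: {filename!r}")
    # The image identifier is kept as a string to avoid assuming its format.
    return image_id, int(face_number)


print(parse_face_filename("1234567890_2.jpg"))
```

Two files that share the same `image_id` come from the same full image and differ only in which detected face they contain.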
The full image URL is stored in the Metadata JSON object for each image. It is a static link to the full image in which the face was found.
The loose bounding box is the expanded face detection region. It is stored in the Metadata JSON object for each image. Its coordinates (left, right, top, bottom) are relative to the full Flickr image.
The tight bounding box (the face detection box region only) is stored in the Metadata JSON object for each image. Its coordinates (left, right, top, bottom) are relative to the full Flickr image and are completely contained within the loose crop's coordinates.
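Because both boxes are expressed in full-image coordinates, the tight box can be moved into the frame of the loosely cropped image by subtracting the loose box's left and top values. The sketch below assumes the loose crop is an unscaled cut whose top-left pixel sits at the loose box's (left, top), which the text does not state explicitly. The JSON key names are not given here, so the boxes are passed as plain tuples; `Box`, `contains`, and `to_crop_frame` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Field order follows the (left, right, top, bottom) convention above.
    left: float
    right: float
    top: float
    bottom: float


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies completely within `outer`."""
    return (outer.left <= inner.left and inner.right <= outer.right
            and outer.top <= inner.top and inner.bottom <= outer.bottom)


def to_crop_frame(box: Box, crop: Box) -> Box:
    """Re-express a full-image `box` relative to the top-left of `crop`."""
    if not contains(crop, box):
        raise ValueError("box is not contained in the crop region")
    return Box(box.left - crop.left, box.right - crop.left,
               box.top - crop.top, box.bottom - crop.top)


loose = Box(left=100, right=300, top=50, bottom=250)   # made-up values
tight = Box(left=150, right=250, top=90, bottom=200)   # made-up values
print(to_crop_frame(tight, loose))
```

The containment check mirrors the guarantee that the tight coordinates fall inside the loose ones; a failure would point to mismatched metadata rather than a normal case.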
Landmarks are stored in the Metadata JSON object for each image. They are 68 points detected by the DLIB framework, given as x/y coordinates relative to the tight face bounding box.
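Since the landmark coordinates are relative to the tight face bounding box while the boxes themselves are relative to the full Flickr image, plotting landmarks on the full image requires an offset. The sketch below assumes the landmark origin is the top-left corner of the tight box and that no scaling is involved; neither is confirmed by the text. The function name `landmarks_to_full_image` is hypothetical, and the inputs are plain tuples because the JSON key names are not specified here.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
# Tight box in the (left, right, top, bottom) order used by the metadata.
BoxTuple = Tuple[float, float, float, float]


def landmarks_to_full_image(landmarks: Iterable[Point],
                            tight_box: BoxTuple) -> List[Point]:
    """Shift box-relative (x, y) landmarks into full-image coordinates."""
    left, _right, top, _bottom = tight_box
    # Assumption: (0, 0) in landmark space is the tight box's top-left corner.
    return [(x + left, y + top) for x, y in landmarks]


tight_box = (150, 250, 90, 200)        # made-up values
relative = [(0, 0), (10, 20)]          # two made-up points, not a full set of 68
print(landmarks_to_full_image(relative, tight_box))
```

The same shift with the signs reversed would convert full-image points back into the tight box's frame.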