MegaFace

The MF2 training dataset is the largest publicly available facial recognition dataset (by number of identities), with 4.7 million faces, 672K identities, and their respective bounding boxes. All images were obtained from Flickr (Yahoo's dataset) and are licensed under Creative Commons.

To request access to the dataset, please follow the instructions on the challenge page.

Download

Loosely Cropped (Padded)

([0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]): 14 files, each ~65 GB and containing up to 50K identities.

Tightly Cropped (Face detection box region only)

(tar.gz): 159 GB, containing all 672K identities.

Metadata (Detection coordinates, full image URLs, landmarks)

(tar.gz): 1.4 GB, containing a corresponding JSON metadata file for each face image.

1M Disjoint Distractors

Contains 1M unlabeled faces that do not appear in the MF2 training set, for use with Challenge 2.

Tightly Cropped (tar.gz): 24 GB

Loosely Cropped (tar.gz): 68 GB

License

To download the dataset, you must agree to the following terms:

[RESEARCHER_FULLNAME] (the "Researcher") has requested permission to use the MegaFace database (the "Database") at the University of Washington. In exchange for such permission, Researcher hereby agrees to the following terms and conditions: Researcher shall use the Database only for non-commercial research and educational purposes. University of Washington makes no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify the University of Washington, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted images that he or she may create from the Database. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions. The University of Washington reserves the right to terminate Researcher's access to the Database at any time. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer. The law of the State of Washington shall apply to all disputes under this agreement.

Documentation

Folder Naming

Each folder is named [FlickrID]_identity_[IdentityID] and corresponds to one training identity. For example, 100003507@N04_identity_10 is the 11th (zero-indexed) identity found in the Flickr account identified by 100003507@N04.
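A folder name can be split back into its two parts with plain string handling. This is a minimal sketch; `parse_identity_folder` and `IdentityFolder` are hypothetical helper names, not part of any MegaFace tooling. Splitting on the last `_identity_` keeps the `@` in Flickr IDs intact.

```python
from typing import NamedTuple


class IdentityFolder(NamedTuple):
    flickr_id: str       # Flickr account identifier, e.g. "100003507@N04"
    identity_index: int  # zero-indexed identity within that account


def parse_identity_folder(name: str) -> IdentityFolder:
    """Split a folder name of the form [FlickrID]_identity_[IdentityID]."""
    # rpartition splits on the last occurrence of the separator.
    flickr_id, sep, index = name.rpartition("_identity_")
    if not sep or not flickr_id or not index.isdigit():
        raise ValueError(f"unexpected folder name: {name!r}")
    return IdentityFolder(flickr_id, int(index))


folder = parse_identity_folder("100003507@N04_identity_10")
print(folder.flickr_id)           # 100003507@N04
print(folder.identity_index + 1)  # 11, i.e. the 11th identity in that account
```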

File Naming

Files are named in the form ##########_#. The part of the name before the "_" identifies the image the face comes from, and the part after it is the number assigned to that face within the image.
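The sketch below splits a face file name into its image part and face number, which makes it easy to group faces cropped from the same Flickr image. It assumes the files may carry an extension such as `.jpg` (the page does not say which) and strips it first; `parse_face_file`, `FaceFile`, and the example names are hypothetical. The image part is kept as a string so any leading zeros survive.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import NamedTuple


class FaceFile(NamedTuple):
    image_id: str    # identifies the source Flickr image
    face_index: int  # number assigned to this face within that image


def parse_face_file(name: str) -> FaceFile:
    """Split a face file name of the form ##########_# (extension optional)."""
    stem = PurePath(name).stem  # drops a trailing ".jpg" etc., if present
    image_id, sep, face = stem.rpartition("_")
    if not sep or not image_id.isdigit() or not face.isdigit():
        raise ValueError(f"unexpected file name: {name!r}")
    return FaceFile(image_id, int(face))


# Group hypothetical file names by the image they were cropped from.
names = ["1234567890_0.jpg", "1234567890_1.jpg", "9876543210_0.jpg"]
by_image = defaultdict(list)
for n in names:
    f = parse_face_file(n)
    by_image[f.image_id].append(f.face_index)
print(dict(by_image))  # {'1234567890': [0, 1], '9876543210': [0]}
```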

Full Flickr Image

Stored in the Metadata JSON object for each image. This is a static link to the full downloaded image in which the face was found.
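Reading the link out of a metadata file only needs the standard `json` module. This page does not give the JSON key names, so `URL_KEY = "full_img_url"` below is a HYPOTHETICAL placeholder; open one of the metadata files and substitute the real key. The download helper uses `urllib.request.urlretrieve`, needs network access, and may fail if the Flickr link no longer resolves.

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path
from typing import Any

# HYPOTHETICAL key name: check an actual metadata file for the real one.
URL_KEY = "full_img_url"


def load_metadata(path: str | Path) -> dict[str, Any]:
    """Read one per-face metadata JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def full_image_url(meta: dict[str, Any], key: str = URL_KEY) -> str:
    """Return the static link to the full Flickr image the face came from."""
    if key not in meta:
        raise KeyError(f"{key!r} not in metadata; available keys: {sorted(meta)}")
    return meta[key]


def download_full_image(meta: dict[str, Any], dest: str | Path, key: str = URL_KEY) -> Path:
    """Fetch the full image to dest (requires network access)."""
    urllib.request.urlretrieve(full_image_url(meta, key), str(dest))
    return Path(dest)
```

Failing with the list of available keys, rather than silently returning `None`, makes a wrong key name obvious on the first file you try.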

Loose Box Coordinates

Expanded face detection region. Stored in the Metadata JSON object for each image. These coordinates (left, right, top, bottom) are relative to the full Flickr image.
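Note that the page lists box coordinates as (left, right, top, bottom), while many imaging libraries expect (left, top, right, bottom). The small `Box` class below is a sketch that keeps the page's order and reorders on demand. `Box.from_mapping` assumes the JSON stores the four values under keys named `left`, `right`, `top`, and `bottom`; that layout is a HYPOTHETICAL assumption, not something this page confirms.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Box:
    """A box in full-Flickr-image pixels, in the page's (left, right, top, bottom) order."""
    left: float
    right: float
    top: float
    bottom: float

    @classmethod
    def from_mapping(cls, d: Mapping[str, float]) -> Box:
        # HYPOTHETICAL JSON layout: one key per side.
        return cls(d["left"], d["right"], d["top"], d["bottom"])

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top

    def to_pil(self) -> tuple:
        """Reorder to (left, upper, right, lower), rounded to whole pixels."""
        return (round(self.left), round(self.top), round(self.right), round(self.bottom))


loose = Box.from_mapping({"left": 10, "right": 110, "top": 20, "bottom": 140})
print(loose.width, loose.height)  # 100 120
print(loose.to_pil())             # (10, 20, 110, 140)
```

With Pillow installed, `Image.open(path).crop(loose.to_pil())` would cut the loose region out of a downloaded full image, since `Image.crop` takes a (left, upper, right, lower) tuple.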

Tight Box Coordinates

Stored in the Metadata JSON object for each image. These coordinates (left, right, top, bottom) are relative to the full Flickr image and are completely contained within the loose box coordinates.
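Because both boxes share the full image's coordinate system, the containment property can be checked directly, and the tight box can be shifted into the loosely cropped image's own coordinates by subtracting the loose box's left and top. This sketch ASSUMES each loosely cropped file is an unscaled crop whose top-left pixel corresponds to (loose.left, loose.top); if the crops were resized, the offsets would also need scaling. The helper names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Full-image coordinates in the page's (left, right, top, bottom) order."""
    left: float
    right: float
    top: float
    bottom: float


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies completely within outer."""
    return (outer.left <= inner.left and inner.right <= outer.right
            and outer.top <= inner.top and inner.bottom <= outer.bottom)


def tight_in_loose_crop(tight: Box, loose: Box) -> Box:
    """Express the tight box relative to the loose crop's top-left corner.

    Assumes the loose crop is unscaled and starts at (loose.left, loose.top).
    """
    if not contains(loose, tight):
        raise ValueError("tight box is not inside the loose box")
    return Box(tight.left - loose.left, tight.right - loose.left,
               tight.top - loose.top, tight.bottom - loose.top)


loose = Box(10, 110, 20, 140)
tight = Box(30, 90, 40, 120)
print(tight_in_loose_crop(tight, loose))  # Box(left=20, right=80, top=20, bottom=100)
```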

Landmarks

Stored in the Metadata JSON object for each image. The landmarks are 68 points detected by the dlib library, given as x/y coordinates relative to the tight face bounding box.
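Since the landmarks are relative to the tight box, placing them on the full Flickr image means adding the tight box's left and top offsets to every point. The sketch below ASSUMES the landmarks arrive as a sequence of 68 (x, y) pairs; the actual JSON layout is not described on this page, and `landmarks_to_full_image` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

NUM_LANDMARKS = 68  # dlib's 68-point layout, as stated on this page


def landmarks_to_full_image(points: Sequence[Sequence[float]],
                            tight_left: float, tight_top: float) -> List[Point]:
    """Shift landmarks from tight-box coordinates to full-image coordinates."""
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} points, got {len(points)}")
    return [(x + tight_left, y + tight_top) for x, y in points]


# Synthetic landmarks, just to show the shift.
points = [(i, 2 * i) for i in range(NUM_LANDMARKS)]
full = landmarks_to_full_image(points, tight_left=30, tight_top=40)
print(full[0], full[-1])  # (30, 40) (97, 174)
```

Under the same unscaled-crop assumption as for the tight box, subtracting the loose box's left and top from these full-image points would place them on the loosely cropped image instead.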

MF2 Training Dataset