The MegaFace dataset is the largest publicly available facial recognition dataset, with a million faces and their respective bounding boxes. All images were obtained from Flickr (Yahoo's dataset) and are licensed under Creative Commons.

To request access to the dataset, please follow the instructions on the challenge page.

Download: (.zip) (.tar.gz) — 65 GB

License


By downloading the dataset, you agree to the following terms:

[RESEARCHER_FULLNAME] (the "Researcher") has requested permission to use the MegaFace database (the "Database") at the University of Washington. In exchange for such permission, Researcher hereby agrees to the following terms and conditions:

  1. Researcher shall use the Database only for non-commercial research and educational purposes.
  2. University of Washington makes no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose.
  3. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify the University of Washington, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted images that he or she may create from the Database.
  4. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions.
  5. The University of Washington reserves the right to terminate Researcher's access to the Database at any time.
  6. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
  7. The law of the State of Washington shall apply to all disputes under this agreement.

Documentation


Folders

The top-level folders are named after the first three digits of the user IDs they contain.

Within them, the faces are sorted into folders by user. These folders are named in the form: ########@N##. The part of the name before the "@" is the ID of the user the images come from. Each folder contains that user's images and their bounding box data.
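
The folder layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `user_folder_path`, the root directory name, and the example folder name `12345678@N00` are made up for the example and are not part of the dataset.

```python
from pathlib import PurePosixPath


def user_folder_path(root, user_folder):
    """Build the path to a user folder such as '12345678@N00'.

    Assumes the layout described above: a top-level folder named after
    the first three digits of the user ID, containing the user folder.
    """
    # The user ID is the part of the folder name before the "@".
    user_id = user_folder.split("@", 1)[0]
    # The top-level folder is the first three digits of that ID.
    return PurePosixPath(root) / user_id[:3] / user_folder


# Hypothetical root and folder name, for illustration only.
example = user_folder_path("MegaFace", "12345678@N00")
```

Here `example` points at the user folder nested under its three-digit top-level folder.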

Faces

These files are named in the form: ##########_#. The part of the name before the "_" identifies the image the face is from and the part after identifies the number associated with that face in the image.
Each accompanying JSON file uses the same name as its corresponding face file with ".json" appended. It contains information on the bounding box, rotation, confidence, and landmarks.
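
As a rough sketch of how the naming scheme could be used, the helpers below split a face file name into its image ID and face number, and locate and load the matching JSON file. The function names, the `.jpg` extension in the usage note, and the example IDs are assumptions made for illustration. The key names inside the JSON are not listed here, so the loader returns the parsed content as is rather than guessing at its structure.

```python
import json
from pathlib import Path


def parse_face_name(file_name):
    """Split '<image id>_<face number>' into (image_id, face_number).

    Any file extension is dropped first; the presence of an extension
    is an assumption, since the naming scheme above does not mention one.
    """
    stem = file_name.split(".", 1)[0]
    image_id, face_number = stem.rsplit("_", 1)
    return image_id, int(face_number)


def annotation_path(face_path):
    """Return the JSON path: the face file name with '.json' appended."""
    face_path = Path(face_path)
    return face_path.with_name(face_path.name + ".json")


def load_annotation(face_path):
    """Load the JSON file that accompanies a face file."""
    with open(annotation_path(face_path), encoding="utf-8") as handle:
        return json.load(handle)
```

For a hypothetical face file `1234567890_2.jpg`, `parse_face_name` yields the image ID `1234567890` and face number `2`, and `annotation_path` points to `1234567890_2.jpg.json` in the same folder.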