Abstract | Fraunhofer IGD evaluated the recognition accuracy of four commercial off-the-shelf face identification systems for use in forensics. Such systems provide candidate lists of selectable length (here: 100 candidates) to be checked by experts in forensic face recognition. The German Federal Criminal Police Office (BKA) provided the digital face images used in the evaluation. In each system under test, approximately 4.8 million frontal portraits of approximately 3 million test subjects served as reference images. The reference databases remained unchanged for all searches. If available, several frontal portraits were enrolled per test subject without linking them together. The following probe images were used:
- 10,000 mated (i.e. paired with references from the same test subject) frontal portraits of 10,000 randomly selected test subjects,
- 10,000 non-mated (i.e. not paired with any references from the same test subject) frontal portraits of 10,000 randomly selected test subjects,
- 10,000 mated frontal portraits of 10,000 test subjects wearing glasses,
- 10,000 mated frontal portraits of 10,000 bearded test subjects,
- 600 mated frontal portraits of 147 test subjects, taken a known period of time after the corresponding reference portrait (up to about 9 years),
- 10,000 mated half-profile portraits (i.e. head rotated by 45° around the vertical axis) of 10,000 randomly selected test subjects,
- 10,000 non-mated half-profile portraits of 10,000 randomly selected test subjects,
- up to 257 mated portraits from different angles of 181 test subjects: images on which the head is only rotated by 10°, 20°, 30°, 45°, 60°, 70°, 80°, or 90° in one direction around the vertical axis (»yaw angle«), images on which the head is only lowered or raised by -45°, -30°, -20°, -10°, 10°, 20°, 30°, or 45° around the transverse axis (»pitch angle«), images on which the head is only inclined by 10°, 20°, 30°, or 45° in one direction around the longitudinal axis (»roll angle«).
The most interesting metric for use in forensics is the false-negative identification rate at rank 100 (abbreviated rank-100 FNIR). When searching against the 10,000 mated frontal portraits of randomly selected test subjects, the best systems under test achieved a rank-100 FNIR of 0.3% ± 0.1%. The rank-100 FNIR values for searches against frontal portraits of test subjects with glasses are not significantly higher than those for searches against frontal portraits of randomly selected test subjects. For the best system under test in this category, the rank-100 FNIR value for searches against frontal portraits of bearded subjects (0.4% ± 0.1%) is also not much higher than that for searches against frontal portraits of randomly selected test subjects. With the available data, no dependency of the FNIR on the time elapsed since the reference portrait was taken could be determined for the systems under test. To evaluate the influence of image quality on recognition accuracy, the quality of copies of the 10,000 mated frontal portraits of randomly selected test subjects was degraded in different ways and to varying degrees. The best systems under test in this category show no significant increase in rank-100 FNIR if the quality reductions result in a peak signal-to-noise ratio of at least 20 dB. When searching against half-profile portraits, the best system under test achieved a rank-100 FNIR of 3.0% ± 0.3%. Searches against high-quality probe portraits with a yaw angle of up to 30° led to rank-100 FNIR values similar to those achieved in searches against frontal portraits. Searches against high-quality probe portraits with a pitch angle of up to 20° also led to rank-100 FNIR values similar to those achieved in searches against frontal portraits.
To compensate for the weaknesses of the individual face identification systems, the candidate lists of any two systems were merged in a simple way at rank level (by means of the Borda count method) into joint candidate lists. The rank-100 FNIR values of the best system pairs are only about half the rank-100 FNIR values of the best individual systems under test. The evaluation results apply to the targets of evaluation only in the respective tested configuration. They should not be construed as maximum-effort full-capability results. |
---|
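
The rank-level fusion and the rank-100 FNIR mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in the evaluation. The function names `borda_merge` and `fnir_at_rank`, the point scheme (rank 1 earns `length` points, a candidate missing from a list earns none), and the tie-breaking by candidate ID are all assumptions made for illustration; the abstract states only that the Borda count method was applied at rank level.

```python
from collections import defaultdict


def borda_merge(lists, length=100):
    """Merge ranked candidate lists (best candidate first) via Borda count.

    Hypothetical helper: assumes each candidate ID occurs at most once per list.
    """
    scores = defaultdict(int)
    for ranked in lists:
        for position, candidate in enumerate(ranked[:length]):
            # Assumed point scheme: rank 1 earns `length` points, rank `length`
            # earns 1 point; a candidate absent from a list earns nothing from it.
            scores[candidate] += length - position
    # Highest total first; ties broken by candidate ID for determinism (assumption).
    merged = sorted(scores, key=lambda c: (-scores[c], c))
    return merged[:length]


def fnir_at_rank(searches, k=100):
    """Fraction of mated searches whose true mate is missing from the top k.

    `searches` is a sequence of (mate_id, candidate_list) pairs.
    """
    misses = sum(1 for mate, candidates in searches if mate not in candidates[:k])
    return misses / len(searches)


# Toy example with list length 3 instead of 100.
system_a = ["s1", "s2", "s3"]
system_b = ["s3", "s1", "s4"]
joint = borda_merge([system_a, system_b], length=3)
print(joint)  # ['s1', 's3', 's2']
print(fnir_at_rank([("s1", joint), ("s4", joint)], k=3))  # 0.5
```

In the toy example, `s1` collects 3 + 2 = 5 points and `s3` collects 1 + 3 = 4, so both outrank `s2` (2 points) and `s4` (1 point). A candidate ranked moderately well by both systems can therefore displace one ranked highly by only a single system, which is one way that fusion can offset the weaknesses of individual systems.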