Whether used to access your phone, to move through an airport, to access digital services, or to aid in forensic investigations, face recognition has become broadly used and broadly discussed in media, technology, and political circles. However, its functionality and use are often misunderstood. Here, we will discuss face recognition (also commonly referred to as “facial recognition”) from a variety of perspectives with a goal of demystifying this impactful but often confusing technology.
1. What is Face Recognition?
Face recognition is the automated comparison of one face image and one or more other face images for the sake of establishing or verifying identity. These faces could be two dimensional (like a selfie) or three dimensional (like from a specialized sensor on a smartphone), they could be from a photo (like a passport) or a video (like from a security camera), they could be from a basic camera (like a webcam) or a dedicated reader (like an airport eGate terminal). In any case, the features of one face are extracted and compared with the features of another face (or faces) to understand their similarity and support assertions about identity.
In and of itself, face recognition does not suggest a specific use, which is part of the reason the technology is misunderstood. The same core technology can be used to access a smartphone app or identify a wanted criminal. No doubt, the implications of these uses are very different, and should be treated as such in terms of policy, configuration, and deployment. But in each case, the goal is to establish or verify a presented identity. In other words, face recognition technology may be “one thing” but face recognition solutions are anything but that.
2. How Face Recognition Works
A conventional perspective on face recognition is that it maps a face image, looking at the geometry of various features (such as the distance between the eyes) and applying an algorithm to determine if the geometries on images are close enough to each other to suggest a match. However, this is an outdated perspective: Essentially all leading face recognition technologies rely on Convolutional Neural Networks (“CNNs”) or similar machine learning / Artificial Intelligence technology.
CNNs don’t measure specific geometries of the face. They look at images in a similar way to how biological organisms look at the world, looking at shapes, contours, shading, and geometries in a combined snapshot. (There are myriad resources online for understanding CNNs – a good place to start is Wikipedia’s entry on the topic.) This approach has fundamentally shifted the accuracy of face recognition, bringing it from one of the less accurate biometric identification technologies to perhaps the most accurate available.
Face recognition systems typically work through a combination of a few different steps, each with their own AI technology integrated. Some people refer to these each or in combination as “algorithms” although AI technologists would more frequently refer to these as “models.” They include:
- Face detection – Before you can compare face images, you have to find faces in an image. Good face detection finds faces even when the environmental or image quality conditions are poor. On the other side, there are many examples of poorly performing technologies that don’t effectively detect faces, let alone match them.
- Face landmark detection – Here, the system finds key landmarks on the face. As noted above, these landmarks are not used for geometric analysis in matching itself. Rather, they are used to properly align the face for optimal matching.
- Face image quality analysis – This is an often overlooked but critical aspect of face recognition. One of the great mistakes that can be made with face recognition (or any biometric identification or computer vision classification problem) is to try to compare poor quality images. Poor quality images result in poor matching results. A properly implemented system should assess all frames for image quality, only passing sufficient quality images for matching. (For more information on image quality, see Face Recognition & Biometric Image Quality).
- Template / embedding creation – Once a face is detected and determined to be of sufficient quality, it is converted to a numerical representation, which is a vector and not something that looks at all like the face itself. This is typically referred to as a “template” in the biometric industry, and an “embedding” in the AI industry – these are equivalent when it comes to face recognition.
- Template comparison – Face matching happens by comparing two templates. As the templates are actually just vectors, template comparison is accomplished by calculating the distance between these two vectors using a measure such as Euclidean distance or a similar calculation. The smaller the distance, the closer the match between the templates.
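As a rough illustration of the template comparison step, the sketch below computes the Euclidean distance between short, made-up vectors. The function names, the 4-dimensional templates, and the L2 normalization step are assumptions chosen for illustration; they are not taken from any particular vendor's implementation, and real templates are usually much longer.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so distances are on a comparable scale."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two templates of equal dimension."""
    if len(a) != len(b):
        raise ValueError("templates must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical templates (illustrative values only).
enrolled = l2_normalize([0.9, 0.1, 0.3, 0.2])
same_person = l2_normalize([0.85, 0.15, 0.28, 0.22])
other_person = l2_normalize([0.1, 0.9, 0.2, 0.7])

d_same = euclidean_distance(enrolled, same_person)
d_other = euclidean_distance(enrolled, other_person)

# A smaller distance indicates a closer match between templates.
print(d_same < d_other)
```

In practice the distance is compared against a configured threshold to decide whether two templates are treated as a match.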
3. How Face Recognition is Deployed
We can look at the deployment of face recognition from a number of perspectives. Here, we will focus on high level use cases as well as how the technology is “packaged” into a product.
Use Cases: 1:1 Verification vs. 1:n Identification vs. 1:N Identification
While at its core face recognition compares the characteristics of one face with another, this can be extended to compare one face with a group of others. Comparing one face with another is typically referred to as 1:1 (“one-to-one”) verification. Here, the goal is to see if a presented identity matches one on record, answering the question “Are you who you say you are?” One image can also be compared against a group of others, which is referred to as identification. Identification is typically categorized as 1:n (“one-to-few”) or 1:N (“one-to-many”). In either case, identification typically answers the question “Who are you?” or “Have I seen you before?” The specific use cases and implications for 1:1, 1:n, and 1:N will be discussed below.
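The difference between verification and identification can be sketched in a few lines. This is a minimal illustration, assuming templates are plain lists of numbers compared by Euclidean distance; the `verify` and `identify` functions, the gallery contents, and the threshold value are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two templates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(probe, reference, threshold):
    """1:1 verification: does the probe match the one claimed identity?"""
    return distance(probe, reference) <= threshold


def identify(probe, gallery, threshold):
    """1:N identification: return the closest enrolled identity, or None.

    `gallery` maps an identity label to its enrolled template.
    """
    best_id, best_dist = None, float("inf")
    for identity, template in gallery.items():
        d = distance(probe, template)
        if d < best_dist:
            best_id, best_dist = identity, d
    return best_id if best_dist <= threshold else None


# Hypothetical enrolled templates and threshold (illustrative values only).
gallery = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.9, 0.2],
}
THRESHOLD = 0.2
probe = [0.88, 0.12, 0.31]

claimed_ok = verify(probe, gallery["alice"], THRESHOLD)   # "Are you who you say you are?"
who = identify(probe, gallery, THRESHOLD)                 # "Who are you?"
stranger = identify([0.5, 0.5, 0.9], gallery, THRESHOLD)  # nobody is close enough

print(claimed_ok, who, stranger)
```

Verification performs a single comparison, while identification performs one comparison per enrolled identity, so the chance of a false positive generally grows with the size of the gallery.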
Packaging Face Recognition
Face recognition is just a technology, and it needs to be made available in a consumable form for it to be valuable. The form of this “packaging” depends on a variety of factors, including the use case, the controls around deployment, and a given face recognition vendor’s strategy. Some of the typical packages for face recognition include:
- As a series of models: As noted, face recognition is fundamentally a series of models. It is possible to share the technology in this way, but it is highly uncommon as models are not particularly user or developer friendly.
- As an SDK: An SDK (Software Development Kit) integrates the face recognition technology into a developer-friendly library that is deployable on a certain chipset architecture, operating system, and programming language. SDKs are very flexible, but also take more development time than the next level up. Meanwhile, if face recognition is being deployed in an embedded system such as an automobile or access control device, the SDK is the standard approach due to the tightly integrated and resource constrained nature of embedded systems.
- As a Docker container: Docker containers are extremely popular as they integrate the SDK functionality and all other system dependencies into a single package and expose that functionality via a simple web service API using a technology like REST or gRPC. This approach is less flexible than the SDK, but extremely efficient to integrate and manage.
- As a Service: In some cases, face recognition can be deployed by API as a hosted web service. This is the easiest from a development perspective, but offers the least amount of flexibility and control. In contrast to other technologies, it is not as common for face recognition due to appropriately heightened sensitivities around privacy and security.
4. How Face Recognition is Measured
As with any piece of software, speed, memory footprint, and storage requirements are all important characteristics of face recognition. However, for face recognition, the most important characteristic is accuracy. Accuracy can typically be measured as a combination of the following:
- Failure to Acquire (FTA) or Failure to Enroll (FTE) rates: This is a measure of the face recognition system’s inability to detect, capture, or create a template from a face in an image. Operationally, this has a similar implication to a false negative, but the implications on system design and technology selection are different, which is why this is a valuable metric on its own.
- False Non-Match Rate (FNMR) or False Negative Identification Rate (FNIR): FNMR and FNIR both measure the likelihood of a system to not properly identify a given person, i.e. to deliver a false negative (e.g. “Susan tries to access her phone, but it says the face isn’t Susan”). FNMR is used for 1:1 verification and FNIR is used for 1:N identification.
- False Match Rate (FMR) or False Positive Identification Rate (FPIR): FMR and FPIR both measure the likelihood of a system to misidentify one person as another, i.e. to deliver a false positive (e.g. “Steve tries to enter a building, but the system says the face is actually George’s”). FMR is used for 1:1 verification and FPIR is used for 1:N identification.
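To make the 1:1 metrics concrete, the sketch below computes FNMR and FMR at a single threshold from two lists of comparison distances. The distances are made up for illustration, and the convention that a comparison is accepted when its distance is at or below the threshold is an assumption; real evaluations use far larger sets of comparisons.

```python
def fnmr(genuine_distances, threshold):
    """Fraction of same-person comparisons rejected (distance above threshold)."""
    misses = sum(1 for d in genuine_distances if d > threshold)
    return misses / len(genuine_distances)


def fmr(impostor_distances, threshold):
    """Fraction of different-person comparisons accepted (distance at or below threshold)."""
    hits = sum(1 for d in impostor_distances if d <= threshold)
    return hits / len(impostor_distances)


# Made-up distances (illustrative values only).
genuine = [0.20, 0.25, 0.30, 0.35, 0.70]  # same person in both images
impostor = [0.55, 0.80, 0.90, 1.00, 1.10, 1.20, 1.25, 1.30, 1.35, 1.40]  # different people
threshold = 0.6

fnmr_value = fnmr(genuine, threshold)  # 1 of 5 genuine comparisons is rejected
fmr_value = fmr(impostor, threshold)   # 1 of 10 impostor comparisons is accepted

print(f"FNMR={fnmr_value:.2f}  FMR={fmr_value:.2f}")
```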
Any system is configured to balance the likelihood of false negatives and false positives. Although it depends on the use case, one way to look at this is that false negatives drive user experience whereas false positives drive security. For leading 1:1 face recognition systems, FMR is typically measured at ~1 in 1 million (i.e. 1e-6 or 0.000001) and FNMR is typically measured at ~1 in 1,000 (i.e. 1e-3 or 0.001 or 0.1%). The tradeoff between false positives and false negatives is displayed in an ROC (“Receiver Operating Characteristic”) or DET (“Detection Error Tradeoff”) curve. Here is an example of DET curves for one of Paravision’s submissions to NIST FRVT (more on NIST below):
Note that (i) higher False Match Rates correlate to lower False Non-Match Rates, and vice versa, and (ii) different image types have different characteristics. There is not a single ROC / DET curve for a specific face recognition technology. Rather, there is a single curve for a specific technology when measured against a specific dataset. Different datasets with similar image characteristics should deliver similar curves, whereas different datasets with different image characteristics may deliver very different curves.
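The two observations above can be reproduced with a small threshold sweep. This is only a sketch: the `det_points` helper, the two dataset names, and all distances are hypothetical, and the comparison is assumed to be distance-based, with a match declared at or below the threshold.

```python
def det_points(genuine, impostor, thresholds):
    """Return (threshold, FMR, FNMR) for each candidate distance threshold."""
    points = []
    for t in thresholds:
        fmr = sum(1 for d in impostor if d <= t) / len(impostor)
        fnmr = sum(1 for d in genuine if d > t) / len(genuine)
        points.append((t, fmr, fnmr))
    return points


# Made-up distances for two hypothetical datasets with different image characteristics.
datasets = {
    "controlled": {
        "genuine": [0.10, 0.15, 0.20, 0.25, 0.30],
        "impostor": [0.80, 0.90, 1.00, 1.10, 1.20],
    },
    "unconstrained": {
        "genuine": [0.30, 0.45, 0.60, 0.75, 0.90],
        "impostor": [0.50, 0.70, 0.85, 1.00, 1.15],
    },
}
thresholds = [0.2, 0.4, 0.6, 0.8, 1.0]

curves = {
    name: det_points(data["genuine"], data["impostor"], thresholds)
    for name, data in datasets.items()
}

for name, points in curves.items():
    print(name)
    for t, fmr, fnmr in points:
        print(f"  threshold={t:.1f}  FMR={fmr:.2f}  FNMR={fnmr:.2f}")
```

Within each dataset, loosening the threshold can only raise FMR and lower FNMR, which is the tradeoff a DET curve plots. The same matcher and thresholds produce different points on the two datasets, which is why a curve is only meaningful alongside the dataset it was measured on.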
The role of NIST FRVT
The National Institute of Standards and Technology (NIST) Face Recognition Technology Evaluation (FRTE, previously called NIST FRVT) is the preeminent method for assessment of face recognition technology, and nearly every major face recognition vendor around the world submits to NIST FRTE. For more detail on NIST FRTE (previously FRVT), please see Paravision’s Introduction to NIST FRTE.
5. Primary Applications for Face Recognition
As noted above, face recognition technology can be applied to a wide range of applications. These range from 1:1 verification for PC login to 1:N identification for surveillance, and everywhere between. This is not to suggest all applications are appropriate or ethical uses – more on this in the next section. Typical use cases include:
- Smartphone / PC login – Made popular by Apple’s FaceID, this is probably the most broadly deployed use case for face recognition. Are you allowed access to this device? This replaces the inconvenient and insecure use of a PIN with a face comparison. This is typically 1:1 verification.
- ID Verification / Digital ID – Similar in use to login, face recognition is used here to confirm the identity of someone applying for or logging into a digital service, such as a government portal, gaming site, gig economy app, or otherwise. This is typically 1:1 for verification against a document, in some cases complemented by 1:N for duplicate record checks or biometric-only login.
- Travel and borders – Here, face recognition augments or replaces the repeated manual identity checks that happen in air travel, from check-in to bag drop to security to immigration to boarding. Faces might be compared against a passport or other travel document (typically a 1:1 verification), or against a trusted traveler service such as CLEAR or Global Entry (typically 1:N identification).
- Enterprise security – In this use case, faces can replace ID cards to allow faster and more secure access to facilities, while screening for known individuals who may pose a security risk. This is typically 1:N identification for tokenless access or 1:1 verification if deployed with a second factor such as a card or mobile device.
- Defense, intelligence, and forensic applications – For government applications, face recognition can help identify known or wanted individuals in a variety of use cases. Applications in Western countries are most typically oriented at comparison of captured imagery or video under the guidance of a warrant. This is typically 1:N identification against a watchlist.
- Surveillance – Face recognition can be deployed for real-time surveillance in a variety of applications, as has notably been the case in China and other countries. This is typically 1:N identification against a watchlist or other database.
6. The Role of Ethics in Face Recognition
No doubt, the societal implications of the spectrum of use cases are critical to consider, and a considered approach typically results in limits being applied to the use of a technology. Simply because a technology may be deployed in a given application doesn’t mean it should be deployed. One example of this is the surveillance application noted above.
Industry leaders are strongly advocating for ethical and legal frameworks that set appropriate guardrails for the use of face recognition technology. Paravision, for example, operates under a set of AI Principles that were developed with the help of an outside AI Ethics Advisor and to which all employees adhere.
An ethical approach to face recognition is not limited to use case consideration and limitations. Like any AI technology, it must also consider the data that is used to develop (i.e. train) the technology, and the impact of demographics on performance. Demographic variation, often referred to as algorithmic bias, is especially important in face recognition due to the fundamental reliance on human characteristics.
Various studies have addressed demographic performance in leading face recognition technologies. Paramount here are NIST FRTE (previously NIST FRVT) as well as the DHS Biometric Technology Rally. Importantly, leading modern face recognition technology has shown the ability to deliver very low error rates across demographic groups in contrast to earlier or poorer performing technologies.
For more information on this important topic, please see Paravision’s Trust page, Key Considerations for Choosing a Face Recognition Partner, and Face Recognition Best Practices – Developing Policies for Consumer-Facing Deployments.
7. How Face Recognition Compares With Other Biometric Modalities
Face recognition is one of a number of leading biometric identification and authentication technologies. The development of modern AI software, acceleration frameworks, hardware, and imaging systems has created distinct advantages for face recognition, but there are values to other modalities as well. Just like use case considerations, selection of a biometric modality should be a considered activity. This is not an exhaustive list, but may help to frame a deeper analysis.
8. Capabilities Frequently Conflated or Confused With Face Recognition
Consumer media and other stakeholders often mistakenly conflate or confuse related technologies with face recognition. A primary example of this is emotion detection. Emotion detection is a fraught and poorly understood “technology” that is often referred to as face recognition, even though it is another technology altogether. Because of the problematic use cases and technology basis for emotion detection, this often reflects negatively, but mistakenly, on face recognition. While each looks at images of the face and uses AI technology, they are otherwise unrelated, based on fundamentally different considerations, and fundamentally different in terms of accuracy, benchmarking, and use.
9. Capabilities That Should Be Considered for Deployment With Face Recognition
Certain technologies are not face recognition per se, but should be closely considered when deploying face recognition. Primary among these is liveness detection, also referred to as anti-spoofing or Presentation Attack Detection. Whereas face recognition compares the similarity of face images, liveness looks at any given face and determines the likelihood it is authentic, and not a physical spoof (such as a printout, digital display, or mask) or a digital spoof (i.e. a deepfake). This is an important pairing for face recognition, especially in unattended use cases like access control or digital ID verification: A secure solution should not only determine that a face matches, but that it is really and authentically being presented. For more information on this topic, please see Paravision’s white paper Authentic Identity: Challenges and Opportunities in Physical and Digital Domains.
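One simple way to picture this pairing is a decision that requires both checks to pass. The sketch below is an assumption-laden illustration, not a description of any specific product: the `accept` function, the score conventions (higher liveness score means more likely authentic, lower match distance means more similar), and both threshold values are hypothetical.

```python
def accept(match_distance, liveness_score, match_threshold=0.6, liveness_threshold=0.9):
    """Accept only if the face is judged authentic AND it matches the reference."""
    is_live = liveness_score >= liveness_threshold
    is_match = match_distance <= match_threshold
    return is_live and is_match


# Illustrative cases with made-up scores.
genuine_user = accept(match_distance=0.25, liveness_score=0.97)   # live and matching
printed_photo = accept(match_distance=0.20, liveness_score=0.10)  # matches, but likely a spoof
wrong_person = accept(match_distance=1.10, liveness_score=0.98)   # live, but does not match

print(genuine_user, printed_photo, wrong_person)
```

The printed-photo case shows why matching alone is not sufficient in unattended use cases: the presented face can be highly similar to the enrolled one and still not be authentically presented.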