Hello, device!
Can you recognize me without violating my privacy?

Automated biometric recognition systems learn more about individuals than humans do. In this blog post, we highlight the differences between natural and automated recognition to emphasize the resulting privacy concerns. We explain the automated recognition process and discuss the severity of its privacy and security issues. We conclude by describing how modern systems approach the mitigation of those issues.

Figure: Picture illustrating a device trying to recognize an individual through automated facial recognition.

Natural Recognition vs. Automated Recognition.   Human beings recognize each other by their appearance (faces, eyes, body shapes, etc.). When two people meet for the first time, they unconsciously launch a mutual enrollment process: each extracts the other’s physiological characteristics and stores them in memory under an identifiable label, most likely the other person’s name together with some extra information that distinguishes her/him from others. Later, when they meet again, each recalls the stored characteristics from that label and compares them with the ones freshly extracted from the person in front of them; the person is recognized only if the characteristics match. All of this happens naturally in the blink of an eye.

Biometric recognition machinery has automated this natural process. It has been trained to represent human physiological characteristics as compact representations, called biometric feature vectors, and to recognize humans accurately, even surpassing natural human recognition. This technological advance seems impeccable; however, it has raised numerous issues spanning both privacy and security.

Humans can learn only what they see, which lets them answer direct questions such as: Is this person A, whom I know? Have I met this person before? However, merely from observing someone’s appearance, it is hard for humans to answer more probing questions such as: What is this person’s occupation? What health problems is this person facing? Modern biometric recognition systems, by contrast, are trained to recognize humans accurately from their physiological characteristics, yet they end up learning information that exceeds what can naturally be observed, which allows them to answer questions that go beyond identity. Therefore, what is of no privacy consequence in natural human recognition becomes a privacy concern in automated recognition.

Sensitivity of biometric data.   During the enrollment and authentication phases, biometric recognition systems process raw biometric samples (e.g., facial images) by transforming them into compact representations (facial feature vectors). A feature vector can be visualized as a point in the feature space, where each point represents a biometric sample; samples coming from the same individual should lie very close to each other, while those coming from different individuals should be far apart. For authentication, the recognition system decides match or no match based on how close a reference feature vector (processed and stored during enrollment) and a probe feature vector (freshly processed during authentication) are to each other. The metric behind such a decision can be either a similarity or a dissimilarity measure. A similarity measure, such as the cosine similarity, outputs a similarity score that results in a match only when it is above a biometric threshold. In contrast, a dissimilarity measure, such as the Euclidean distance, outputs a dissimilarity score that results in a match only when it is below a biometric threshold.

Thus, the most critical forms of biometric data in recognition systems are feature vectors and (dis-)similarity scores. On the one hand, they are decisive for accurate recognition; on the other hand, they threaten the privacy of individuals by exposing information that exceeds their identity. So, apart from an individual’s identity, what kind of information can modern biometric recognition systems infer from processing biometric data, and what are the consequences of such inference? Studies have shown that it is possible to infer personal information (e.g., gender, age, health condition, ethnicity, occupation) from biometric feature vectors [1]. Going further, researchers have reconstructed raw biometric samples (e.g., facial images) of individuals via model inversion attacks [2]. Another line of research looks at the reconstruction of raw biometric samples from a (dis-)similarity score [3, 4].

Direct access to biometric feature vectors enables undesirable inference of personal information via soft biometric classification models. This intensifies the severity of social issues, such as gender inequality and discrimination, that stem from biased decision-making models. Such models violate the privacy of individuals by performing classification tasks other than recognition. An elaboration on fairness and soft biometric privacy can be found in the blog post written by our fellow ESR Zohra Rezgui [5]. Additionally, possessing a biometric feature vector or a (dis-)similarity score in clear text permits the reconstruction of the corresponding raw biometric sample, leading to security issues such as identity fraud and impersonation attacks. For instance, an attacker can use the reconstructed sample to claim the identity of the sample’s owner by presenting a spoofed version of it to the targeted system. Therefore, biometric feature vectors are extremely sensitive and require strong protection at both the storage and processing levels.
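
To make the match decision described in this section concrete, here is a minimal Python sketch of both kinds of measure. The three-dimensional vectors and the thresholds (0.8 and 0.5) are made up for illustration; real feature vectors have hundreds of dimensions, and real thresholds are tuned per system.

```python
import math


def cosine_similarity(a, b):
    """Similarity score: higher means the two vectors are more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Dissimilarity score: lower means the two vectors are more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_match_similarity(reference, probe, threshold):
    # A similarity score yields a match only when it is ABOVE the threshold.
    return cosine_similarity(reference, probe) > threshold


def is_match_dissimilarity(reference, probe, threshold):
    # A dissimilarity score yields a match only when it is BELOW the threshold.
    return euclidean_distance(reference, probe) < threshold


# Hypothetical toy feature vectors (assumed values, not real biometric data).
reference = [0.9, 0.1, 0.4]      # stored at enrollment
same_person = [0.8, 0.2, 0.5]    # fresh probe from the same individual
other_person = [-0.3, 0.9, -0.2]  # fresh probe from someone else

print(is_match_similarity(reference, same_person, threshold=0.8))
print(is_match_similarity(reference, other_person, threshold=0.8))
print(is_match_dissimilarity(reference, same_person, threshold=0.5))
print(is_match_dissimilarity(reference, other_person, threshold=0.5))
```

Note that both the feature vectors and the scores appear here in clear text, which is exactly the exposure this section warns about.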

Mitigation of biometric privacy concerns.   Aware of the fragility of biometric data, the academic and industrial communities are enhancing both the privacy and the security of their biometric systems’ designs. This effort gave birth to a branch of biometrics called biometric template protection schemes (BTPs) [6]: schemes that aim to protect biometric data throughout its lifecycle while maintaining recognition performance. BTPs come in different flavors, each approaching the privacy challenges of biometrics with distinct techniques. Essentially, they process the biometric data either in the unencrypted domain or in the encrypted domain. Processing in the encrypted domain has several advantages over the unencrypted one: 1) the protected biometric feature vector is irreversible, 2) two protected biometric feature vectors are unlinkable even if they protect the same biometric sample, and 3) the recognition accuracy of the unprotected system is preserved when encryption is applied.

Processing in the encrypted domain relies mainly on the cryptographic technology known as homomorphic encryption (HE) [7], which allows encrypted data to be processed “in use,” unlike classical encryption, which protects data only “at rest” and “in transit.” HE allows arithmetic operations to be performed over encrypted data without intermediate decryption. HE-based BTPs have recently been gaining attention because they aim to protect biometric data throughout its entire lifecycle (at rest, in transit, and in use). Typically, an HE-based BTP encrypts the reference template and the probe, measures their (dis-)similarity while they are encrypted, compares the (dis-)similarity score with the biometric threshold under encryption, and eventually outputs the comparison outcome in encrypted form: a ciphertext containing match or no match. This encrypted comparison outcome is then forwarded to the party interested in verifying the physical presence of the individual in question.

An HE scheme can support two modes: the single-key mode, where only a single party can decrypt, and the threshold-key mode (or its extension, multi-key HE), where decryption has to be performed jointly by the parties involved in the encryption. In the single-key case, the verifying party holds the decryption key so that it can decrypt the comparison outcome. However, the same key could be used to reveal the protected biometric data; hence, the verifying party should store neither the protected reference template nor the protected probe. As for any application involving encryption, key management is crucial to the design of HE-based BTPs. These are only a few of the constraints encountered when designing an HE-based BTP, where one must ensure that the system’s keys are kept safe and that the encrypted biometric data is stored and processed by the right parties. In conclusion, if HE-based BTPs are well designed and the underlying HE scheme is secure, then devices can verify the physical presence of individuals without violating their privacy.
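
To give a feel for what “arithmetic over encrypted data” means, here is a toy Python sketch. It is not an HE-based BTP as described above: it uses a textbook Paillier scheme, which is only additively homomorphic, with parameters far too small to be secure. It also simplifies the flow in two ways, both assumptions made for this illustration: only the reference is encrypted while the probe stays in clear text, and the threshold comparison happens after decryption rather than under encryption. The integer vectors and the threshold are made up.

```python
import math
import secrets

# Toy Paillier parameters built from two known Mersenne primes.
# These are illustrative only and far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to g = N + 1


def encrypt(m):
    """Encrypt an integer; negative values are encoded modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 equals 1 + m*N mod N^2.
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def encrypted_inner_product(enc_reference, probe):
    """Return an encryption of sum(reference_i * probe_i).

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext
    to a power multiplies its plaintext by that (clear-text) scalar.
    """
    acc = encrypt(0)
    for c, y in zip(enc_reference, probe):
        acc = (acc * pow(c, y % N, N2)) % N2
    return acc


# Hypothetical feature vectors, quantized to integers for this sketch.
reference = [12, -7, 30, 5]
probe = [10, -5, 28, 7]
THRESHOLD = 900  # assumed similarity threshold

enc_reference = [encrypt(x) for x in reference]            # stored encrypted
enc_score = encrypted_inner_product(enc_reference, probe)  # no decryption here
score = decrypt(enc_score)  # only the key holder can do this (1030 for these vectors)
print("match" if score > THRESHOLD else "no match")
```

The party computing `encrypted_inner_product` never sees the reference vector or the score in clear text. A full HE-based BTP would go further by also keeping the probe and the threshold comparison encrypted, which requires a more capable HE scheme than this toy one.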

References:

[1] A. Acien, A. Morales, R. Vera-Rodriguez, I. Bartolome, and J. Fierrez, “Measuring the gender and ethnicity bias in deep models for face recognition,” in Iberoamerican Congress on Pattern Recognition, pp. 584–593, Springer, 2018.

[2] Y. Zhang, R. Jia, H. Pei, W. Wang, B. Li, and D. Song, “The secret revealer: Generative model-inversion attacks against deep neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 253–261, 2020.

[3] P. Mohanty, S. Sarkar, and R. Kasturi, “From scores to face templates: A model-based approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2065–2078, 2007.

[4] P. Mohanty, S. Sarkar, and R. Kasturi, “Reconstruction of biometric image templates using match scores,” Apr. 24 2012. US Patent 8,165,352.

[5] Z. Rezgui, “Soft biometric privacy and fairness: two sides of the same coin?,” January 2022.

[6] M. Sandhya and M. V. Prasad, “Biometric template protection: A systematic literature review of approaches and modalities,” Biometric Security and Privacy, 2017.

[7] S. Erabelli, “What is homomorphic encryption?,” April 2022.

This blog post was written by Amina Bassit. Since May 2020, she has been a Ph.D. candidate at the University of Twente. Her research within the PriMa project focuses on integrating biometric recognition with homomorphic encryption to mitigate recognition’s privacy and security issues.

Amina Bassit