On December 14, 2022, the Supreme Court of Appeal of the French Republic found that ethnicity estimation based on family naming practices may be taken into account legally. Why?
The linguistic origin of a name is closely related to ethnicity: anthropology and human biology have pointed out that surname origin reflects both genetic and cultural transmission. This idea has been applied in epidemiology, geography, sociology, economics, and demography. In practice, names are defined, changed, or adopted by people at various moments in history and over the life course; they are inevitably the product of social naming practices, which occur within specific ethnic, cultural, spatial, and temporal contexts.
Today, names are collected in almost every micro-level dataset, including administrative registers, consumer data, and social surveys. In most of these datasets, names remain confidential and are not disclosed, whereas ethnic affiliation is collected only in special purpose-built applications. Yet ethnicity is an important component of many contemporary social debates, including those on inequalities in health outcomes and life chances, identity, social integration, and segregation. Name-based classifications are highly useful for generating information on ethnicity without disclosing person-identifiable data. Surname affinities determined by name classification tools may therefore be used to observe changing relations between naming practices and ethnic identities as a facet of social integration, discrimination, and cosmopolitanism in an increasingly diverse society.
PROS and CONS
This blog post further considers the strengths and weaknesses of this approach. Research shows that using someone’s name to assign their ethnicity can help fill gaps when ethnicity data is not readily available.
Modern methods may use a person's forename, surname, or both. Forenames often carry information about gender, historical naming trends, cultural background, and nationality. Surnames carry information about family lineage and ethnic origin. Using both can increase the confidence with which someone's ethnicity is predicted. But a name-based ethnic classification will never fully represent the ethnic breakdown of a population, because it does not correspond to a person's subjective identification with an ethnic group.
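The gain from combining forenames and surnames can be illustrated with a minimal naive Bayes sketch in Python. This is not MONDONOMO's method: it assumes that, within each group, forename and surname occur independently of each other, and the function `combine_name_evidence`, the groups "A" and "B", and all the frequencies are hypothetical.

```python
from typing import Dict

# P(name | group) tables; in practice these would be estimated from a large
# reference list of names. All labels and numbers here are made up.
Likelihoods = Dict[str, Dict[str, float]]


def combine_name_evidence(
    forename: str,
    surname: str,
    forename_lik: Likelihoods,
    surname_lik: Likelihoods,
    prior: Dict[str, float],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Posterior P(group | forename, surname) under a naive Bayes assumption:
    forename and surname are treated as independent given the group.
    A name missing from a group's table gets the small `floor` likelihood for
    that group; a name missing from every table therefore contributes equally
    to all groups and cancels out."""
    scores = {}
    for group, p in prior.items():
        lf = forename_lik.get(group, {}).get(forename, floor)
        ls = surname_lik.get(group, {}).get(surname, floor)
        scores[group] = p * lf * ls
    total = sum(scores.values())
    return {group: s / total for group, s in scores.items()}


prior = {"A": 0.5, "B": 0.5}
forenames = {"A": {"john": 0.03}, "B": {"john": 0.01}}
surnames = {"A": {"smith": 0.02}, "B": {"smith": 0.01}}

# Surname alone (the forename "zed" is unseen, so it cancels out):
print(round(combine_name_evidence("zed", "smith", forenames, surnames, prior)["A"], 3))  # → 0.667
# Forename and surname together:
print(round(combine_name_evidence("john", "smith", forenames, surnames, prior)["A"], 3))  # → 0.857
```

Here the surname alone gives group A a probability of about 0.67, while adding the forename raises it to about 0.86. The independence assumption is a simplification, since forename choice is itself shaped by family background.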
Attributing ethnicity from names has limitations. For the best attribution, the underlying reference dataset should be a large sample of names that includes multiple spellings and variations and that matches the target list as closely as possible in time period and geography. However, some data sources are incomplete and not fully representative of the population. For example, assigning ethnicity from names will misrepresent some groups, including:
• people from a mixed ethnic background – this is because surnames are often inherited from the father only
• Black Caribbean people – this is because of similarities between Caribbean and British surnames
• people from predominantly Muslim countries, such as Pakistan and Somalia – this is because some Muslim surnames are common across many countries in Asia and Africa
• people who married someone of a different ethnicity and took their partner's surname
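One practical way to make a reference list cover multiple spellings and variations of a name is to normalize names before matching them. The sketch below is an illustration under stated assumptions rather than a standard: `normalize_name` and `build_index` are hypothetical helpers that strip accents, case-fold, and drop non-letters using Python's standard `unicodedata` module, and the group labels are placeholders.

```python
import unicodedata
from typing import Dict, Iterable, Set, Tuple


def normalize_name(name: str) -> str:
    """Reduce a name to a crude matching key (an illustrative rule set):
    strip accents, case-fold, and keep letters only."""
    # NFKD splits accented letters into a base letter plus a combining mark,
    # and the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop apostrophes, hyphens, spaces, digits, etc.
    # Letters with no decomposition (e.g. Polish "ł") are kept unchanged;
    # a real pipeline would need a transliteration table for those.
    return "".join(ch for ch in no_marks.casefold() if ch.isalpha())


def build_index(reference: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized name to the set of groups it appears under."""
    index: Dict[str, Set[str]] = {}
    for name, group in reference:
        index.setdefault(normalize_name(name), set()).add(group)
    return index


print(normalize_name("O'Malley"), normalize_name("Kovács"), normalize_name("KOVACS"))
# → omalley kovacs kovacs
```

Such folding is a blunt instrument: the same rules that merge genuine variants can also merge distinct names, and they do nothing for transliteration differences between scripts.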
The relationship between ethnicity and names is specific to particular times, places, and groups of people. This affects the overall accuracy of the assigned ethnicity category, because a prediction based on someone's name may not match the answer they would give if asked directly. For instance, you may tie yourself in knots trying to attribute an ethnicity to multi-ethnic combinations such as Takeshi Kovacs, Sakura Mikolajczak, Chandraharam O'Malley, or Margaret Cho.
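Where a sample with self-reported ethnicity is available, the gap between name-based and self-reported categories can be measured directly. The following Python sketch computes overall agreement and per-group agreement; the function `concordance` and the records are hypothetical, and the labels "A", "B", and "C" stand in for whatever ethnic categories a study uses.

```python
from collections import Counter
from typing import Dict, List, Tuple


def concordance(pairs: List[Tuple[str, str]]) -> Tuple[float, Dict[str, float]]:
    """Compare name-based labels with self-reported ones for the same people.

    `pairs` holds (self_reported, name_based) labels. Returns the overall share
    of matching labels and, for each self-reported group, the share of its
    members whose name-based label matches (per-group recall)."""
    if not pairs:
        raise ValueError("no records to compare")
    totals = Counter(reported for reported, _ in pairs)
    hits = Counter(reported for reported, predicted in pairs if reported == predicted)
    overall = sum(hits.values()) / len(pairs)
    per_group = {group: hits[group] / n for group, n in totals.items()}
    return overall, per_group


# Hypothetical records: (self-reported, name-based)
sample = [("A", "A"), ("A", "A"), ("A", "B"), ("B", "B"), ("C", "A"), ("C", "C")]
overall, per_group = concordance(sample)
print(overall, per_group)
```

In this toy sample, overall agreement is 4 out of 6, but only half of the people who self-report as C are labelled C from their names. Per-group figures of this kind reveal the systematic misrepresentation of specific groups, which a single overall accuracy figure can hide.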
MONDONOMO encourages anyone who uses names to attribute ethnicity in a dataset to fully understand the limitations of the approach. While working to overcome these limitations, we aim to help people use and interpret the data correctly. The information MONDONOMO presents could include:
• details of the classification – for example the statistical modelling or algorithm used
• what level of confidence the analysts have in the ethnicity assigned
• details of the underlying datasets that have been used in the attribution method, such as timeliness and coverage
• any known limitations, such as unavailable ethnic group categories
• levels of unknown attribution
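Reporting confidence and levels of unknown attribution can be as simple as declining to assign a group when the top probability is too low. The Python sketch below assumes each record already has a probability for each group from some classification model; `assign_with_confidence`, `unknown_rate`, and the default threshold of 0.8 are hypothetical choices, not MONDONOMO's settings.

```python
from typing import Dict, List, Optional, Tuple


def assign_with_confidence(
    posterior: Dict[str, float], threshold: float = 0.8
) -> Tuple[Optional[str], float]:
    """Return (group, confidence) for the most probable group, or
    (None, confidence) when that probability is below `threshold`,
    meaning the record is reported as 'unknown'."""
    group, prob = max(posterior.items(), key=lambda item: item[1])
    return (group if prob >= threshold else None), prob


def unknown_rate(posteriors: List[Dict[str, float]], threshold: float = 0.8) -> float:
    """Share of records left unattributed at the given threshold."""
    if not posteriors:
        return 0.0
    unknown = sum(
        1 for p in posteriors if assign_with_confidence(p, threshold)[0] is None
    )
    return unknown / len(posteriors)


# Hypothetical per-record probabilities over two placeholder groups.
records = [{"A": 0.9, "B": 0.1}, {"A": 0.6, "B": 0.4}, {"A": 0.2, "B": 0.8}]
print([assign_with_confidence(p) for p in records])
# → [('A', 0.9), (None, 0.6), ('B', 0.8)]
print(unknown_rate(records), unknown_rate(records, threshold=0.5))
```

Raising the threshold tends to make the assigned categories more reliable at the cost of more unknown records, so publishing the threshold together with the resulting unknown rate lets users judge that trade-off.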