Sociocultural Insensitivity and Problems with Psychological Testing: An Overview
by Kimberly Hall

In 2021, the American Psychological Association (APA) released a formal apology to people of color for its role in both shaping and maintaining race-based social hierarchy in the United States. In this statement, the APA acknowledges that psychological assessment in research and diagnostic practice (notably intelligence testing, with its early attempts to define and discretely measure an individual’s capacity for ‘correct’ learning and advanced thought) has long been used to amplify White supremacy and justify inequitable treatment of non-White communities. It has done so through the active exclusion of those communities from research and discourse, and through the promotion of eugenics and biological essentialism regarding societal structure, segregation, interracial relationships, and forced sterilization. The APA’s detailed acknowledgment serves as an important step towards creating a more inclusive and firmly anti-racist psychology.
As researchers like Holden and Tanenbaum (2023) point out, the US population is becoming increasingly diverse on multiple fronts, with a minority-majority population projected by the year 2050. This ongoing demographic shift makes the need for equity and cross-cultural sensitivity in psychological assessment more relevant than ever. However, many traditional psychological models, diagnostic standards, and testing methods were not designed to take social and cultural background into account. In this article, I will give a brief overview of some of the general problems produced by a lack of sociocultural sensitivity in such tests, and discuss some potential avenues used by researchers for correcting those problems.
Common Problems with Psychological Tests
Lack of Diversity and Sociocultural (Mis)Matching
A significant and multifaceted challenge in many areas of psychological testing is removing as much bias from a testing situation as possible, so as not to skew the data and produce results which do not reflect reality. This challenge spans both method bias (inequality resulting from the way a test is developed or administered) and construct validity (how well a particular test or measurement reflects the intended construct, which itself may not be directly measurable, as with concepts like intelligence or self-esteem).
As many researchers have pointed out, culture provides people with a variety of models for ways of thinking, feeling, acting, interacting, and communicating (Ardila, 2005; Fernández & Abe, 2018; Ng et al., 2018; Skeggs et al., 2023), and all of these factors will differ based on sociocultural background. Historically, though, psychological theories, models of intelligence, and constructs of cognition have been developed by particular people for particular people–typically White, frequently male, and often upper- or upper-middle class (Holden & Tanenbaum, 2023)–and a number of popular tests which are still used today, such as the Stanford-Binet test or the Wechsler Intelligence Scale for Children, have been based on these models and theories.
Even the conventions of psychometric testing have their roots in specific cultural contexts (Ardila, 2005). Psychometric tests tend to use individual-oriented procedures, administered under time constraints and in isolated environments, and often rely on the assumption that examinees will strive for their best performance, all of which favors participants or examinees from more individualistic or highly competitive cultures. Even the social appropriateness of questions about one’s emotional landscape or mental status (the idea that such questions are not disrespectful, nor too intimate for strangers to ask) generally favors White Westerners (Ardila, 2005).
Lack of diversity creates problems from the very beginning. Any amount of sociocultural mismatch between test developers, test administrators, and test participants–be it race/ethnicity, gender, dis/ability, education level, language, or any number of interrelated factors–inevitably results in tests which favor particular groups of people, either because of the particular method used or the construct it is meant to be measuring (Ardila, 2005; Holden & Tanenbaum, 2023). On the flip side, greater sociocultural overlap between test participants or examinees and a given test’s developer tends to result in higher test scores (Ardila, 2005; Skeggs et al., 2023).
This occurs for multiple reasons, a significant one being that communication is generally easier with people who are more similar to oneself. A test developer may rely on the assumption that the administrator and participant/examinee will approach the test, interpret the instructions, relay information, and understand the results the same way (Ardila, 2005; Ng et al., 2018). Problems of method and item bias occur when specific aspects of a test favor those with similar backgrounds, producing inequitable results within diverse sample populations.
Access and construct validity also come into play: the way a construct is measured in a given test may only adequately represent its meaning within specific subcultures, or items on the test might relate to irrelevant or distracting variables (Fernández & Abe, 2018; Holden & Tanenbaum, 2023). These issues create unfair disadvantages for anyone whose background attaches different characteristics or significance to a given construct than the researchers’ operational definition does.
The impact of culture also applies to physiological brain activity–differing social and cultural backgrounds have been shown to correlate with different levels of brain activation in response to visual stimuli, simple attention tasks, facial recognition, and fear expressions (Fernández & Abe, 2018). Even brain atrophy in dementia patients can differ based on cultural cognitive reserve and multilingualism (Skeggs et al., 2023).
Simple cultural mismatch between test developers and test takers is one challenge to take into consideration; accounting for marginalization outside of the testing situation is another. Different circumstances of marginalization are known to impact cognitive and emotional development (Holden & Tanenbaum, 2023), which adds another layer of complexity to the development of appropriate psychological tests, the parameters of their administration, and the interpretation of their results. Compounding stresses, social isolation, a lack of resources to promote healthy growth, and the slippery slope of learned helplessness can all negatively impact test-related cognitive functions like working memory, attention control, inhibition, planning, and problem-solving. And even if a test is well-designed to account for these factors, researchers still need to find ways to effectively reach marginalized communities, who may simply lack the time or ability to access study participation opportunities, or who may historically have been ostracized by the scientific community and become understandably distrustful of researchers’ intentions.
Potential Harm of Insensitive Psychological Testing
Another challenge posed by a lack of sociocultural sensitivity in cognitive and neuropsychological testing is not simply inaccuracy or differential construct validity, but the ethics of practicing science without that sensitivity, and the potential to actively harm patients, research participants, and examinees.
For example, the Boston Naming Test, a tool used to measure word retrieval in individuals with some form of language disturbance by showing the patient a series of items and asking them to name each one, infamously used to contain a noose as a possible test item (Salo et al., 2021; Wong et al., 2024). For African American communities, who have long encountered nooses as symbols of danger and hatred, being confronted with a noose on a psychological test could have a triggering psychological impact.
From a clinical perspective, a patient might experience harm when an insensitive assessment does not accurately capture the extent of their illness. As another example, Skeggs et al. (2023) found that dementia patients whose first language was not English presented with a different early disease profile than those whose first language was English, with regard to both verbal and nonverbal cognitive tasks and to behavioral symptomatology. Patterns of eventual brain atrophy also differ between those populations (Skeggs et al., 2023). However, current diagnostic criteria best fit patients whose first language is English and who primarily speak English, on cognitive, behavioral, and neurological fronts. In cases of such neuropsychological ailments, patients with linguistically diverse backgrounds may have their symptoms overlooked or misinterpreted, resulting in inappropriate or even absent treatment.
Another avenue for potential harm lies in what happens outside of psychological assessment. Take intelligence testing: society places considerable weight on standardized intelligence and IQ-test results, which benefits those who meet a certain benchmark and places limitations on those who do not. However, two of the most common intelligence tests used in American schools, the Binet and Wechsler scales, disproportionately identify low-income and non-White students as low-performing or in need of special education (Holden & Tanenbaum, 2023; Wong et al., 2024). Such programs may not actually suit their needs, and such labels can have long-reaching impacts on things like access to higher education, fairness in job-seeking, workplace support and compensation, or even bodily autonomy.
Potential Solutions
Greater Diversity in Test Development
Part of the problem in psychological testing is that many commonly used theories and models were developed in a very White, Western, male-centric context (Fernández & Abe, 2018; Holden & Tanenbaum, 2023). This problem presents a natural path for improvement: starting with more culturally sensitive and relevant models and constructs, working with more diverse development teams, and making an effort towards greater diversity in sample populations. Sampling from a wider variety of populations and subpopulations tends to increase the generalizability of results (Malik & Norman, 2023); even if generalizability is not necessarily a study’s aim, greater representation in norming samples during a test’s development allows for more nuanced interpretations of its results (Holden & Tanenbaum, 2023). From a scientific standpoint, this improves the depth of our body of knowledge. From an ethical standpoint, it facilitates greater equity in research, psychological assessment, and clinical treatment standards.
Another ethical point: greater diversity in the development process of psychological tests allows more people to have a say in how they and people like them will be treated by the scientific community. Historically, there has been a great deal of mistrust between minority populations and scientists, and not without reason (APA, 2021; Malik & Norman, 2023). Allowing a wider variety of people into the development process helps rebuild that trust and enables a more conscious approach to the research process as a whole.
New and Adapted Measures
On occasion, the solution to an insensitive test is simply to adapt it. This process is sometimes known as cultural decentering: adjusting a test for different cultural contexts while making sure that it reaches similar results (Fernández & Abe, 2018). In their study on the Boston Naming Test, for example, Salo et al. (2021) found that whether the noose item was administered or omitted (with credit granted for the omitted item), patients’ performance classifications did not significantly change. These results support the validity and accuracy of a noose-free Boston Naming Test. Meanwhile, removing the noose reduces the test’s insensitivity towards vulnerable populations and cultivates more thoughtful, culturally competent mental healthcare, which may in turn help build trust between those vulnerable populations and healthcare practitioners.
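To make that rescoring procedure concrete, here is a minimal Python sketch of the general idea: score the test with the flagged item dropped but credited, then check whether each examinee’s impairment classification changes. The cutoff, item position, and response data below are hypothetical placeholders, not the actual Boston Naming Test norms or the exact scoring rules used by Salo et al. (2021).

```python
# Minimal sketch: rescore an adapted test and check whether impairment
# classifications agree with the original scoring. All values here are
# hypothetical illustrations, not real BNT norms.

def classify(score: int, cutoff: int = 50) -> str:
    """Label a total score against a hypothetical impairment cutoff."""
    return "impaired" if score < cutoff else "unimpaired"

def score_without_item(item_scores: list[int], removed_index: int) -> int:
    """Score the adapted test: drop the flagged item but grant credit
    for it, so totals stay on the original 0-60 scale."""
    total = sum(s for i, s in enumerate(item_scores) if i != removed_index)
    return total + 1  # credit granted for the removed item

# Hypothetical 60-item response vectors (1 = correct, 0 = incorrect).
patients = {
    "patient_a": [1] * 52 + [0] * 8,
    "patient_b": [1] * 45 + [0] * 15,
}

NOOSE_INDEX = 47  # flagged item's position; placement is illustrative

for name, responses in patients.items():
    original = classify(sum(responses))
    adapted = classify(score_without_item(responses, NOOSE_INDEX))
    print(f"{name}: original={original}, adapted={adapted}, "
          f"agree={original == adapted}")
```

Because credit is granted for the dropped item, totals stay on the test’s original scale and existing cutoffs remain usable, which is what makes a direct classification-agreement check like this possible.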
Instead of adapting preexisting tests, another option is to develop new tools fit for cross-cultural purpose. Language diversity has recently become a focus of test development: a lack of nuance in translation or cross-cultural administration can introduce method bias even in an otherwise sensitive test, so some researchers have been developing visual tests for cognitive impairment (Ng et al., 2018). Tools like the Visual Cognitive Assessment Test do not require translation or interpretation on the part of either the test administrator or the patient, which can help improve accuracy in impairment detection by reducing the risk of method bias (Ng et al., 2018), and may be particularly useful in serving large multilingual communities. Drawbacks in this context tend to revolve around the time and resources required to develop and validate new tools intended for multicultural, international application.
Adjusted Norms
Another commonly-cited method for mitigating unfair disparities in psychological testing is adjusting test norms based on demographic profiles. At face value, this makes sense–if standard assessments differentially capture aspects of cognition and psychology across demographic groups, then adjusting norms for those groups would logically improve accuracy in the interpretation of those assessments. And use of adjusted norms has, in some cases, been shown to do just that (Byrd & Rivera-Mindt, 2022; Manly & Echemendia, 2007; Malik & Norman, 2023).
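For illustration, the arithmetic behind demographically adjusted norms is straightforward: a raw score is converted to a z-score against the reference group’s mean and standard deviation. The following sketch uses invented group labels and normative values purely as placeholders; real norm tables are far more granular, stratifying by age, education, language, and other factors.

```python
# Minimal sketch of demographic norm adjustment: the same raw score is
# converted to a z-score against group-specific reference statistics.
# All group labels and normative values below are hypothetical.

NORMS = {
    # group: (mean, standard deviation) of raw scores in the
    # hypothetical normative reference sample
    "group_x": (52.0, 4.0),
    "group_y": (48.0, 5.0),
}

def adjusted_z(raw_score: float, group: str) -> float:
    """Convert a raw score to a z-score relative to the group's norms."""
    mean, sd = NORMS[group]
    return (raw_score - mean) / sd

raw = 46.0
for group in NORMS:
    z = adjusted_z(raw, group)
    flag = "below typical range" if z < -1.0 else "within typical range"
    print(f"{group}: z = {z:+.2f} ({flag})")
```

Here the same raw score of 46 is flagged under one set of norms but not the other. Whether that divergence improves diagnostic accuracy or entrenches inequity depends entirely on what the group variable actually captures, which is the crux of the concerns below.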
However, standardizing adjusted norms based on particular demographic categories poses its own problems. As Byrd and Rivera-Mindt (2022) highlight, demographic constructs such as race are just that–constructs: imperfect and malleable, descriptive rather than prescriptive. By highlighting a single factor as the basis for a test’s interpretation, researchers may obscure compounding or interrelated factors (Byrd & Rivera-Mindt, 2022; Malik & Norman, 2023; Manly & Echemendia, 2007), thus weakening the science.
Some researchers also believe that defaulting to the use of such adjustments may, however inadvertently, serve to further entrench societal prejudices. In an ironic replication of the insensitivity they attempt to avoid, adjusted norms may be used to justify unequal performance standards both within and between populations (Fernandes et al., 2022; Malik & Norman, 2023), giving subsequent inequitable treatment of particular groups the veneer of scientific legitimacy. Widespread use of race-norming, for example–that is, specifically adjusting scores based on the race of the test-taker or examinee–can reinforce the pseudoscientific idea that there are inherent biological differences between races, which has historically been used to disadvantage non-White communities. In the case of diagnostic testing, such adjustments may additionally cut off a greater percentage of people in minority populations from needed mental healthcare or social services (Manly & Echemendia, 2007).
As a real-world example, the National Football League (NFL) famously applied adjusted norms in cognitive function testing during injury litigation, with lower baselines assumed for Black players than for White players. These adjusted standards made it more difficult for Black players to prove significant damage from sport-related brain injuries, and less provable damage resulted in smaller payouts than those given to White players with similar claims. The NFL only agreed to stop using race-norming in its injury settlement program in 2021.
Conclusion
Historical lack of diversity and cultural sensitivity in psychological research has created a scientific environment which can do, and has done, calculable harm to many communities. Nowadays, with both growing diversity in the US population and ongoing backlash against diversity in science, it is important that psychologists and researchers follow through on the APA’s call to action and continue to improve the social and cultural sensitivity of our own work. Otherwise, we may end up with a brand of psychology that is, at best, woefully homogeneous and representative of only the few, and at worst, toxic and actively dangerous to the many.
References
American Psychological Association. (2021, October 29). Apology to people of color for APA’s role in promoting, perpetuating, and failing to challenge racism, racial discrimination, and human hierarchy in U.S. https://www.apa.org/about/policy/racism-apology
Ardila, A. (2005). Cultural values underlying psychometric cognitive testing. Neuropsychology Review, 15(4), 185–195. https://doi.org/10.1007/s11065-005-9180-y
Byrd, D. A., & Rivera-Mindt, M. G. (2022). Neuropsychology’s race problem does not begin or end with demographically adjusted norms. Nature Reviews Neurology, 18(3), 125–126. https://doi.org/10.1038/s41582-021-00607-4
Fernandes, M., Ayede, A. I., & Blackmon, K. (2022). Addressing racial inequities in neuropsychological assessment requires international prescriptive standards, not demographically adjusted norms. Nature Reviews Neurology, 18(6), 377. https://doi.org/10.1038/s41582-022-00652-7
Fernández, A. L., & Abe, J. (2018). Bias in cross-cultural neuropsychological testing: Problems and possible solutions. Culture and Brain, 6(1), 1–35. https://doi.org/10.1007/s40167-017-0050-2
Holden, L. R., & Tanenbaum, G. J. (2023). Modern assessments of intelligence must be fair and equitable. Journal of Intelligence, 11(6), 126. https://doi.org/10.3390/jintelligence11060126
Malik, H. B., & Norman, J. B. (2023). Best practices and methodological strategies for addressing generalizability in neuropsychological assessment. Journal of Pediatric Neuropsychology, 9(2), 47–63. https://doi.org/10.1007/s40817-023-00145-5
Manly, J. J., & Echemendia, R. J. (2007). Race-specific norms: Using the model of hypertension to understand issues of race, culture, and education in neuropsychology. Archives of Clinical Neuropsychology, 22(3), 319–325. https://doi.org/10.1016/j.acn.2007.01.006
Ng, K. P., Chiew, H. J., Lim, L., Rosa-Neto, P., Kandiah, N., & Gauthier, S. (2018). The influence of language and culture on cognitive assessment tools in the diagnosis of early cognitive impairment and dementia. Expert Review of Neurotherapeutics, 18(11), 859–869. https://doi.org/10.1080/14737175.2018.1532792
Salo, S. K., Marceaux, J. C., McCoy, K. J. M., & Hilsabeck, R. C. (2021). Removing the noose item from the Boston Naming Test: A step toward antiracist neuropsychological assessment. The Clinical Neuropsychologist, 36(2), 311–326. https://doi.org/10.1080/13854046.2021.1933187
Skeggs, A., Wei, G., Landin-Romero, R., Hodges, J. R., Piguet, O., & Kumfor, F. (2023). The influence of culture and cognitive reserve on the clinical presentation of behavioural-variant frontotemporal dementia. Journal of Neurology, 270, 3192–3203. https://doi.org/10.1007/s00415-023-11638-w
Wong, C. G., Arastu, S. F., & Ng, R. (2024). Identifying culturally insensitive tests, stimuli, and assessment practices in neuropsychology: A survey of professionals and trainees. The Clinical Neuropsychologist, 39(6), 1430–1449. https://doi.org/10.1080/13854046.2024.2405064