Intelligence Testing: Accurate or Extremely Biased?

By Emily Young

In the early 1900s, psychologist Charles Spearman noticed that children who did well in one school subject were likely to do well in other subjects, and those who did poorly in one subject were likely to do poorly across all subjects. He concluded that a single underlying factor, which he called g, contributes to performance on all of these tests (Spearman 1904). The g factor is usually described as the common factor that accounts for much of the variation in individuals' performance across different tests, and it is sometimes called “general intelligence”.

Later, psychologist Raymond Cattell proposed that g has two components: fluid intelligence (denoted Gf) and crystallized intelligence (denoted Gc). Fluid intelligence is defined as abstract reasoning or logic; it is an individual’s ability to solve a novel problem or puzzle. Crystallized intelligence is more knowledge-based and is defined as the ability to use one’s learned skills, knowledge, and experience (Cattell 1987). It is important to note that while crystallized intelligence relies on knowledge, it is not a measure of knowledge itself but rather a measure of the ability to use that knowledge.

The first standardized intelligence test was created in 1905 by French psychologist Alfred Binet as a method to identify intellectual disability among French schoolchildren. The test measured intelligence by comparing an individual’s score to the average score of children of the same age (Binet 1905). The test was later revised by Lewis Terman of Stanford University and named the Stanford-Binet Intelligence Scales. The Stanford-Binet is now in its fifth edition and includes five sections: fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory.

Since the Stanford-Binet, many other standardized intelligence scales have been developed. One of the most widely used modern intelligence tests is Raven’s Progressive Matrices (RPM) (Raven et al. 2003). The test presents a series of boxes, each containing shapes that change from box to box according to a pattern, with one box left empty. The test taker must recognize the pattern and choose, from a set of options, the shape that belongs in the empty box. Unlike the Stanford-Binet, RPM is entirely visual; the test taker does not have to answer written questions, so the measured IQ does not depend on reading comprehension. This reduces the influence of variables such as native language, reading level (which varies with age), and reading disability.

An example of the type of question found on Raven’s Progressive Matrices.

So what exactly are these IQ tests measuring? The Stanford-Binet measures g through tasks that tap both Gf and Gc. Because RPM is entirely nonverbal and puzzle-based, it measures Gf almost exclusively.

This brings us to the next question: are these tests effectively measuring g?

Since their creation, modern Western intelligence tests have shown differences in average scores from group to group: on average, white Americans have scored higher than African Americans, and the rich have scored higher than the poor. On some tests, women and men score differently from task to task. Are these differences due to heritable differences in intelligence between races, genders, and socioeconomic groups? Or are environment, schooling, and stigma to blame? Or are the tests themselves flawed?

While some intelligence tests claim to be culture-fair, none of the tests created so far is completely unbiased. Serpell (1979) asked Zambian and English children to reproduce figures using wire, pencil and paper, and clay; the Zambian children performed better on the wire task, while the English children performed better on the pencil-and-paper task. Each group did better in the medium it was more accustomed to. Pencil-and-paper IQ tests may therefore be intrinsically biased toward Western culture.

Furthermore, while African Americans have historically scored lower than white Americans on intelligence tests, this gap has been narrowing in recent years (Dickens and Flynn 2006). There are two possible explanations. The first is that average intelligence is increasing faster in the black community than in the white community (measured intelligence has been rising steadily across all groups, a trend known as the Flynn effect). However, it seems more likely that since desegregation, white and black cultures have been merging and schools have been integrated, so white and black children have a better chance of receiving the same education. If this is the case, then either IQ tests measure knowledge more than their creators believe, or the tests are strongly culturally biased and that bias is shrinking as white and black culture in America converge.

Not only are intelligence tests culturally biased, but they also seem to be biased in favor of neurotypical individuals. For example, while typically developing individuals generally perform similarly on RPM and the Wechsler Adult Intelligence Scale (WAIS), individuals with autism typically score higher on RPM than on the WAIS (Bölte et al. 2009; Mottron et al. 2006). This is likely because RPM is a visual task, whereas the WAIS relies heavily on verbal tasks. Individuals with autism seem to use visual strategies to solve problems and therefore have difficulty with tasks that can only be solved verbally (Kunda and Goel 2011). While this difficulty is typically seen as a cognitive deficit, it is important to note that autistic individuals outperform neurotypical individuals on some visual tasks.

By measuring only one specific part of intelligence, then, some IQ tests portray autistic individuals as having a cognitive deficit. What if some disorders, such as autism, are not actually disorders but simply ways of thinking that differ from what is considered “normal”?

For example, Dr. Temple Grandin, an autistic woman with a PhD in animal science, uses her remarkable visual working memory to design cattle equipment that is far more humane and less anxiety-inducing than previous models. Grandin says her autism allows her to see the world in pictures: her inner thoughts are entirely devoid of language, and she thinks in extremely detailed movies. She says her visual memory and sensitivity to detail have made her so good at designing things, because details that neurotypical people gloss over are extremely important to her and end up making a large difference in the efficiency of the final product.

Temple Grandin used her remarkable visual working memory to design humane cattle-holding equipment for the agriculture industry.

Autism may not be the only example of a disorder being mischaracterized. Studies have shown that children with ADHD have, on average, lower IQs than neurotypical children (Kuntsi et al. 2004). However, in his TEDx talk, Stephen Tonti, a senior at Carnegie Mellon, explains why he believes ADHD is not a disorder but simply a difference in cognition. Tonti argues that viewing ADHD as a disorder implies that it needs to be fixed. He states that his ADHD makes him better than neurotypical individuals at some tasks, and that the world needs cognitive diversity in order to run smoothly.

Therefore, while IQ tests are intended to measure intelligence, they often measure only one type of intelligence and are therefore biased against certain groups of people. By trying to fit cognition into a box, IQ testing devalues cognitive diversity, and this may have real costs. By telling individuals that their intelligence is low when it is in fact simply different, we could not only be holding people back but also depriving the world of a diverse group of thinkers who could solve problems from different perspectives.

Even if current IQ tests are not fair across all groups, the future of intelligence testing may be brighter. As discussed previously on the Neuroethics Blog, fMRI-based intelligence testing could help eliminate biases in intelligence testing. By observing test takers’ thought processes in action, researchers would be able to see which brain pathways a subject recruits to solve a problem and whether he or she uses a visual or verbal approach, thereby observing fluid and crystallized intelligence in action.


Binet, Alfred. L'Année Psychologique 12 (1905): 191-244. Print.

Bölte, S., I. Dziobek, and F. Poustka. "Brief Report: The Level and Nature of Autistic Intelligence Revisited." Journal of Autism and Developmental Disorders 39 (2009): 678-82. Print.

Cattell, Raymond B. "The Discovery of Fluid and Crystallized General Intelligence." Intelligence: Its Structure, Growth, and Action. Amsterdam: North-Holland, 1987. 87-120. Print.

Dickens, William T., and James R. Flynn. "Black Americans Reduce the Racial IQ Gap: Evidence from Standardization Samples." Psychological Science 17.10 (2006): 913-20. Web.

Kunda, Maithilee, and Ashok K. Goel. "Thinking in Pictures as a Cognitive Account of Autism." Journal of Autism and Developmental Disorders 41.9 (2011): 1157-77. Print.

Kuntsi, J., T.C. Eley, A. Taylor, C. Hughes, P. Asherson, A. Caspi, and T.E. Moffitt. "Co-occurrence of ADHD and Low IQ Has Genetic Origins." American Journal of Medical Genetics 124B.1 (2004): 41-47. Print.

Mottron, Laurent, Michelle Dawson, Isabelle Soulières, Benedicte Hubert, and Jake Burack. "Enhanced Perceptual Functioning in Autism: An Update, and Eight Principles of Autistic Perception." Journal of Autism and Developmental Disorders 36.1 (2006): 27-43. Print.

Raven, J., J. C. Raven, and J. Court. Manual for Raven’s Progressive Matrices and Vocabulary Scales, Section I: General Overview. San Antonio: Harcourt Assessment, 2003. Print.

Serpell, Robert. "How Specific Are Perceptual Skills? A Cross-cultural Study of Pattern Reproduction." British Journal of Psychology 70.3 (1979): 365-80. Print.

Spearman, Charles E. "'General Intelligence,' Objectively Determined and Measured." American Journal of Psychology 15 (1904): 201-93. Web.

Want to cite this post?

Young, E. (2013). Intelligence Testing: Accurate, or Extremely Biased? Retrieved on , from
