A dataset with all of the Marvel characters (there are 16,376 of them!) provides fertile ground to explore society's biases. The data lists character attributes such as gender, hair color, eye color, and sexual orientation. Not only that, but it also lists whether each of these fictional characters is labeled as 'good' or 'bad.'
After learning about this dataset, I quickly loaded it into a Jupyter notebook and started zipping through cells. I found already-established trends, such as far more men being represented than women, so I kept exploring.
Digging deeper, I found a strong correlation between lighter phenotypes (blond hair and blue eyes) and whether a character is portrayed as good or bad. Since the dataset has no skin color attribute, I used these traits as a stand-in for characters with lighter complexions.
Over half of Marvel characters with lighter hair and eyes are ‘good,’ whereas just over half of all other characters are ‘bad’
(Chart: share of characters labeled good and bad. Light hair + eyes: 56% good, 29% bad. All other characters: 33% good, 51% bad. Remaining characters are neutral.)
Data: fivethirtyeight
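The post doesn't show the notebook code, so here is a minimal sketch of how the good/bad shares above could be computed with only the Python standard library. It assumes the fivethirtyeight `marvel-wikia-data.csv` file uses the columns `HAIR`, `EYE` and `ALIGN` with values like `"Blond Hair"`, `"Blue Eyes"` and `"Good Characters"`; those names, the hypothetical helpers `load_rows`, `is_light` and `alignment_shares`, and the rule that "light" means blond hair *and* blue eyes are all assumptions, not the author's exact method.

```python
import csv
from collections import Counter

# ASSUMPTION: column names and category strings follow the fivethirtyeight
# marvel-wikia-data.csv file (HAIR, EYE, ALIGN); check them against your copy.
LIGHT_HAIR = {"Blond Hair"}
LIGHT_EYES = {"Blue Eyes"}


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def is_light(row):
    """Hypothetical grouping rule: blond hair AND blue eyes."""
    return row.get("HAIR") in LIGHT_HAIR and row.get("EYE") in LIGHT_EYES


def alignment_shares(rows):
    """Return {"light": {...}, "other": {...}} mapping each ALIGN value to its share."""
    counts = {"light": Counter(), "other": Counter()}
    for row in rows:
        align = (row.get("ALIGN") or "").strip()
        if not align:
            continue  # assumption: characters with no alignment are excluded
        group = "light" if is_light(row) else "other"
        counts[group][align] += 1

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        # Divide each alignment count by the group's total; empty groups map to {}.
        shares[group] = {a: n / total for a, n in counter.items()} if total else {}
    return shares
```

In a notebook you might call `alignment_shares(load_rows("marvel-wikia-data.csv"))` and read `["light"].get("Good Characters")`; the file name is an assumption. Counting neutral characters in the denominator is what makes the good and bad shares sum to less than 100%.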
A mentor suggested I check whether the trend of lighter complexions being portrayed as "good" held over time, so I went back to the data and brought in each character's year of first appearance.
The results show that the Marvel canon started out with a wide gap between the phenotypes of characters portrayed as good and bad: lighter hair and eye colors were overwhelmingly portrayed as good. As time went on, however, that gap narrowed.
Over time, the bias towards introducing new characters with lighter hair and eyes as ‘good’ has diminished
(Chart: percent of new characters labeled Good, by year of first appearance from the 1940s to the 2010s, with one line for new characters with light hair + eyes and one for all other new characters. Annotation: "Gap has almost disappeared over time.")
Data: fivethirtyeight
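A sketch of the over-time comparison, under the same assumptions as before: the CSV has `HAIR`, `EYE`, `ALIGN` and `Year` columns, "light" means blond hair and blue eyes, and `good_share_by_year` is a hypothetical helper rather than the author's code. It tallies, for each year (or bucket of years), the share of newly introduced characters in each group who are labeled good.

```python
from collections import defaultdict

# ASSUMPTION: column names/values follow the fivethirtyeight Marvel CSV.
LIGHT_HAIR = {"Blond Hair"}
LIGHT_EYES = {"Blue Eyes"}


def is_light(row):
    """Hypothetical grouping rule: blond hair AND blue eyes."""
    return row.get("HAIR") in LIGHT_HAIR and row.get("EYE") in LIGHT_EYES


def good_share_by_year(rows, bucket=1):
    """Map each year bucket to {"light": share_good, "other": share_good}.

    A share is None when no character from that group first appeared in the bucket.
    """
    # tallies[key][group] = [good_count, total_count]
    tallies = defaultdict(lambda: {"light": [0, 0], "other": [0, 0]})
    for row in rows:
        year_text = (row.get("Year") or "").strip()
        align = (row.get("ALIGN") or "").strip()
        if not year_text or not align:
            continue  # skip characters missing a year or an alignment
        year = int(float(year_text))  # years may be stored like "1962.0"
        key = year // bucket * bucket  # bucket=10 groups years into decades
        tally = tallies[key]["light" if is_light(row) else "other"]
        tally[1] += 1
        if align == "Good Characters":
            tally[0] += 1

    result = {}
    for key in sorted(tallies):
        result[key] = {
            group: (good / total if total else None)
            for group, (good, total) in tallies[key].items()
        }
    return result
```

Single years with only a handful of new characters can swing wildly, which is why the `bucket` parameter is there; the gap for a bucket is simply `light - other` whenever both shares exist.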
The gap was closing, but it made me wonder whether legacy characters still had an outsized influence. I didn't have data on each character's number of appearances by year, but I was able to bring in each character's total number of appearances. Again, there was a stark difference in representation.
‘Good’ characters with light hair and eyes appear, on average, over three times as often as all other ‘good’ characters
(Chart: average appearances per character. Good characters with light hair + eyes: 89. All other good characters: 27.)
Data: fivethirtyeight
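The appearance comparison reduces to a grouped mean over good characters. This sketch assumes an `APPEARANCES` column holding a total count per character, alongside the `HAIR`, `EYE` and `ALIGN` columns assumed earlier; `mean_appearances_of_good` is a hypothetical name, and dropping characters with no recorded count is an assumption about how blanks were handled.

```python
# ASSUMPTION: column names/values follow the fivethirtyeight Marvel CSV.
LIGHT_HAIR = {"Blond Hair"}
LIGHT_EYES = {"Blue Eyes"}


def is_light(row):
    """Hypothetical grouping rule: blond hair AND blue eyes."""
    return row.get("HAIR") in LIGHT_HAIR and row.get("EYE") in LIGHT_EYES


def mean_appearances_of_good(rows):
    """Average APPEARANCES for good characters, split into "light" and "other"."""
    totals = {"light": [0.0, 0], "other": [0.0, 0]}  # [sum, count]
    for row in rows:
        if (row.get("ALIGN") or "").strip() != "Good Characters":
            continue  # only good characters are compared
        text = (row.get("APPEARANCES") or "").strip()
        if not text:
            continue  # assumption: characters with no recorded count are dropped
        group = "light" if is_light(row) else "other"
        totals[group][0] += float(text)
        totals[group][1] += 1
    return {g: (s / n if n else None) for g, (s, n) in totals.items()}
```

The ratio of the two means gives the headline multiple; with the averages in the chart, 89 / 27 is roughly 3.3.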