Meghan Markle and the Media - Analysis of the headlines
R
Outline
Since the first rumours of her romance with Prince Harry, Meghan Markle has been a mainstay feature on the pages of the tabloids. In my previous post, I scraped three British tabloids to create a data set with almost 11,000 headlines about the Duchess of Sussex.
In this part, I’ll perform analysis on the text and the sentiment of the headlines to see what patterns can be observed.
Executive summary
-
Both the sentiment analysis libraries I used showed a decline in the sentiment in headlines about Meghan Markle from mid-2016 until mid-2020.
-
The Sun appeared to have headlines with the most negative sentiment, while the Daily Mail generally followed the same peaks and troughs. Within my dataset, The Express had fewer headlines with less fluctuation in sentiment.
-
The events that generated the most headlines in Meghan’s life since mid-2016 were the wedding, the birth of Archie, the pregnancy announcement, feuds with staff and family (both Royal and Markle) and Archie’s christening.
-
Common positive themes across all publications were around the aforementioned life events (wedding and baby), as well as a large fan base and a nod to beauty and modernity.
-
The negative themes were around rumours of feuds with the Royals and public family fallouts with the Markles. There were also headlines around numerous security threats and racist abuse.
-
The data is available in Tableau dashboard allows a user to search for the headlines and observe sentiment for themselves.
Sentiment over time
To analyse the sentiment of the headlines over time, I wanted to aggregate the dates by week and month for easier visualisation.
library(wordcloud)
library(tidytext)
library(dplyr)
library(reshape2)
library(ggplot2)
library(lubridate)
# Create new columns with week and month date truncation
headline_sentiment <- headline_sentiment %>%
mutate(date_week = round_date(date, "week")) %>%
mutate(date_month = round_date(date, "month"))
master_df_text <- master_df_text %>%
mutate(date_week = round_date(date, "week")) %>%
mutate(date_month = round_date(date, "month"))
As outlined in my previous post, I decided to play with two separate sentiment analysis packages. The first was using the AFFIN lexicon from the tidytext package (Silge and Robinson, 2020) which measures individual words on their sentiment scaled from -5 (most negative) to 5 (most positive). The second was getting sentence-level sentiment with sentimentr (Rinker, 2019) which calculates text polarity sentiment.
AFFIN
affin_sent %>%
group_by(date_month) %>%
summarise(avg_sent = mean(sentiment)) %>%
ggplot(aes(date_month, avg_sent)) +
geom_line() +
theme_minimal() +
labs(x = "Month",
y = "Sentiment",
title = "Average sentiment of headlines about Meghan Markle by month, AFINN") +
geom_smooth(method = "loess")
sentimentr
headline_sentiment %>%
group_by(date_month) %>%
summarise(avg_sent = mean(ave_sentiment)) %>%
ggplot(aes(date_month, avg_sent)) +
geom_line() +
theme_minimal() +
labs(x = "Month",
y = "Sentiment",
title = "Average sentiment of headlines about Meghan Markle by month, sentimentr") +
geom_smooth(method = "loess")
Both the AFFIN sentiment and the sentimentr show a decline in average sentiment over time. AFFIN shows a steady incline of sentiment from mid-2016 to early 2018, between the announcement that Harry and Meghan had started dating and the first official public appearance together. From there, the decline is quite sudden between early 2018 and mid-2019. Between this time period Harry and Meghan announced their engagement, got married, announced their pregnancy and welcomed baby Archie. I noticed that, according to AFFIN, the headlines before the pregnancy announcement were broadly positive but became more negative around the time there was controversy around a New York-based baby shower and the discontent of the Markle family. While the loess line evens off between mid-2019 to mid-2020, the actual sentiment varies wildly.
The sentimentr line shows a more meandering decline from mid-2016, although the actual sentiment is just as volatile. One notable low point of sentiment are some legal issues experienced by Meghan’s brother’s in early-2017. Another is in early-2019 - although I cannot pinpoint an exact subject for this, I can see multiple headlines around sexist and racist comments received, terrorist attack risks and feuds with the rest of the Royal family.
Sentiment over time by publication
sentimentr
headline_sentiment %>%
group_by(date_month, publication) %>%
summarise(avg_sent = mean(ave_sentiment)) %>%
ggplot(aes(date_month, avg_sent, fill = publication, colour = publication)) +
geom_line() +
theme_minimal() +
labs(x = "Month",
y = "Sentiment",
title = "Average sentiment of headlines about Meghan Markle by month, sentimentr",
color = "Publication") +
scale_colour_manual(values=c("#2A2222", "#004DB3", "#EB1801"))
When sentimentr is broken down by publication, we can see that The Express does not seem to have contributed as many headlines as The Sun and the Daily Mail, and those it has contributed appear to be relatively steady in sentiment.
The Sun appears to have wildly fluctuating sentiment, with a notable high in mid-2017 (just before the first public appearance). The sentiment appears to stay continuously negative from this point onwards. A particular low point of sentiment is between the pregnancy being announced and the baby shower. This appears to be focussed around a perceived fallout with Kate Middleton, rumours of arguments with staff and ongoing family issues.
The Daily Mail also shows fluctuating sentiment, with highs and lows that reflect the pattern shown by The Sun. The main deviation from The Sun appears to be from 2020 onwards where the Daily Mail headlines get more positive while The Sun’s get more negative.
AFFIN
affin_sent %>%
group_by(date_month, publication) %>%
summarise(avg_sent = mean(sentiment)) %>%
ggplot(aes(date_month, avg_sent, fill = publication, colour = publication)) +
geom_line() +
theme_minimal() +
labs(x = "Month",
y = "Sentiment",
title = "Average sentiment of headlines about Meghan Markle by month, AFFIN",
color = "Publication") +
scale_colour_manual(values=c("#2A2222", "#004DB3", "#EB1801"))
The AFFIN sentiment plotted over time tells a similar story to sentimentr with regards to The Express and The Sun. AFFIN does, however, appear to give much more positive sentiment to headlines in Q4 2018 (around the time the Duchess announced her pregnancy).
Weeks of interest
I want to find the weeks across all publications that generated the highest number of headlines to see if that corresponded to any major life events in the Meghan’s life.
headline_sentiment %>%
group_by(date_week) %>%
summarise(num_headlines = n()) %>%
arrange(desc(num_headlines)) %>%
head(5)
To create wordclouds from the headlines generated during these weeks, I need to remove the punctuation and create some additional stop words to remove, including “Meghan” and “Markle”.
# Replace all punctuation with no space
master_df_text$word <- gsub('[[:punct:] ]+','', master_df_text$word)
# Create additional stopwords
add_stops <- c("meghan", "markle", "meghans", "markles", "prince", "harry", "royal", "meghan's", "markle's")
Week of 20-05-2018 - Meghan Markle marries Price Harry
master_df_text %>%
filter(date_week == '2018-05-20') %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100))
As expected, a lot of the words here are wedding related (wedding, day, tiara, bride, guests, tradition, gown). There seems to be a broadly positive sentiment (friend, history, perfect, modern, beautiful, tribute, moment) with a note to some notable guests on the day (Oprah, Clooney, stars, invited, George, Stella, McCartney, celebrity).
Week of 05-05-2019 - The birth of Archie
master_df_text %>%
filter(date_week == '2019-05-05') %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100))
Words related to the birth were prevalent (baby, Archie, birth, family, newborn, mum, mother, born, due, boy, induced) along with a dusting of family members both Royal (Queen, Harry, Kate, William) and on Meghan’s side (father, sister, Thomas, Ragland).
Week of 21-10-2018 - Meghan and Harry announce their pregnancy
master_df_text %>%
filter(date_week == '2018-10-21') %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100))
Words related to pregnancy are used (baby, bump, pregnancy, pregnant, due) as well as some mentions of the Royal couple’s visit to Australia (tour, Australia, Melbourne, Bondi, Sydney, Fiji, Australian, beach).
Week of 09-12-2018 - Feud with with Markle family, argument with Kate Middleton, rumours of issues with staff
master_df_text %>%
filter(date_week == '2018-12-09') %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100))
The most commonly used words here relate to other life events (wedding, baby, pregnant) some other controversies come to light here such as family feuds (Dad, rift, Kate, Samantha, Thomas, family) and living arrangements (Frogmore, Cottage).
Week of 07-07-2019 - Archie’s christening
master_df_text %>%
filter(date_week == '2019-07-07') %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100))
During this week, Archie’s christening took place (christening, Archie, godparents). There was some controversy over the refusal to make the godparents of Archie public information (godparents, secret, public), more feud rumors (feud, Kate, family, Thomas), and a visit to Wimbledon with Serena Williams where a member of the public was asked not to take photos (Serena, Wimbledon, snub, photo).
Which headlines have the highest and lowest sentiment?
Now, I will create wordclouds from the headlines that, according to sentimentr, have the most positive and negative sentiments.
Words in the headlines with the most positive sentiment
headline_sentiment %>%
arrange(desc(ave_sentiment)) %>%
select(headline, ave_sentiment) %>%
head(100) %>%
unnest_tokens(word, headline) %>%
anti_join(stop_words) %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100, colors = "#229954"))
Key themes from the most positive headlines were the pregnancy and birth of Archie (birth, baby, child, boy), the wedding (bride, dress, wedding) and general character attributes about the Duchess (Duchess, star, fans, woman, charity, sweet, modern, smart). Oddly Piers Morgan, an outspoken critic of the Duchess, makes an appearance. An example of one of these headlines is PIERS MORGAN: Hearty congratulations, Harry, you picked a real keeper (even if your romance did destroy my beautiful friendship with the amazing Meghan Markle).
Words in the headlines with the most negative sentiment
headline_sentiment %>%
arrange(desc(ave_sentiment)) %>%
select(headline, ave_sentiment) %>%
tail(100) %>%
unnest_tokens(word, headline) %>%
anti_join(stop_words) %>%
filter(!word %in% add_stops) %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100, colors = "#C70039"))
The themes of the headlines with the most negative sentiment focussed around the rumoured feuds that the tabloids have plagued Meghan about (feud, Dad, Sister, father, Samantha, Kate, letter). Some of the more violent and distessing words (attack, racist, death forced, shock, warning, terror, racism) have a more upsetting context. Examples of these are:
-
Kate Middleton and Meghan Markle victims of shocking ‘racist and sexist’ attacks
-
Palace staff are forced to spend hours moderating ‘hundreds of thousands’ of vile sexist and racist comments on Palace social media pages aimed at Kate and Meghan fuelled by rivalry between the duchesses’ warring fans
-
Website targeting Meghan Markle and Prince Harry refuses to take down racist abuse
-
Prince Harry’s girlfriend Meghan Markle on death threat ordeal: ‘People wanted to kill me’
One headline even makes the claim the Duchess is related to America’s first serial killer.
Word clouds by publication
Finally, I wanted to see what positive and negative words were used the most in headlines used by each publication. This isn’t as accurate as it could be since each word is taken out of the context it was originally expressed in - for example, trump is considered positive while bump is negative.
-
All three publications have racism within the negative category, which is interesting considering a lot of the attention focussed on the Duchess has been considered as racist by some observers.
-
The Express appeared to focus on scandal (rumours, shock, furious, criticism, blunder, rift) as a high proportion of all the headlines published about the Duchess.
-
The Sun and The Express both often referred to the Duchess as sexy in among other positive character traits (cute, stunning, perfect, smart, powerful, confident). The Daily Mail used words such as sparkle, trendy, smart, modern, stunning, beauty, lovely, chic and adorable.
-
In all publications, fans are hugely present within the headlines.
Particular thanks to Text Mining with R - Wordclouds (Silge and Robinson, 2020) for inspiration creating these word clouds.
The Sun
master_df_text %>%
filter(publication == "The Sun",
!word %in% add_stops) %>%
inner_join(get_sentiments("bing")) %>%
count(word, sentiment, sort = TRUE) %>%
acast(word ~ sentiment, value.var = "n", fill = 0) %>%
comparison.cloud(colors = c("#8d0e00", "#EB1801"),
max.words = 100)
Daily Mail
master_df_text %>%
filter(publication == "Daily Mail",
!word %in% add_stops) %>%
inner_join(get_sentiments("bing")) %>%
count(word, sentiment, sort = TRUE) %>%
acast(word ~ sentiment, value.var = "n", fill = 0) %>%
comparison.cloud(colors = c("#002e6b", "#004DB3"),
max.words = 100)
The Express
master_df_text %>%
filter(publication == "The Express",
!word %in% add_stops) %>%
inner_join(get_sentiments("bing")) %>%
count(word, sentiment, sort = TRUE) %>%
acast(word ~ sentiment, value.var = "n", fill = 0) %>%
comparison.cloud(colors = c("#8c0f1b", "#EA1A2E"),
max.words = 100)
Visualise on Tableau Public
The final results are held in a publicly hosted Tableau dashboard. This plots sentiment (using AFFIN) over time while also showing the density of headlines per week, key life milestones and a search bar to find headlines relating to specific topics.