Meghan Markle and the Media - Analysis of the headlines

R

Outline

Since the first rumours of her romance with Prince Harry, Meghan Markle has been a mainstay of the tabloid pages. In my previous post, I scraped three British tabloids to create a data set of almost 11,000 headlines about the Duchess of Sussex.

In this part, I’ll analyse the text and sentiment of the headlines to see what patterns emerge.

Executive summary

Sentiment over time

To analyse the sentiment of the headlines over time, I wanted to aggregate the dates by week and month for easier visualisation.

library(wordcloud)
library(tidytext)
library(dplyr)
library(reshape2)
library(ggplot2)
library(lubridate)
# Create new columns with the date rounded to the nearest week and month
headline_sentiment <- headline_sentiment %>% 
  mutate(date_week = round_date(date, "week")) %>% 
  mutate(date_month = round_date(date, "month"))

master_df_text <- master_df_text %>% 
  mutate(date_week = round_date(date, "week")) %>% 
  mutate(date_month = round_date(date, "month"))
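
One detail worth knowing about the code above: round_date() snaps each date to the nearest week or month boundary rather than truncating it, while lubridate's floor_date() is the truncating variant. The sketch below compares the two on a single example date picked for illustration (19 May 2018, a Saturday). It assumes lubridate's default week start of Sunday.

```r
library(lubridate)

# Example date chosen for illustration: Saturday 19 May 2018
d <- as.Date("2018-05-19")

# Rounding goes to the NEAREST boundary (weeks start on Sunday by default)
rounded_week  <- round_date(d, "week")   # nearest Sunday: 2018-05-20
rounded_month <- round_date(d, "month")  # nearest month start: 2018-06-01

# Flooring always goes BACK to the start of the current period
floored_week  <- floor_date(d, "week")   # 2018-05-13
floored_month <- floor_date(d, "month")  # 2018-05-01

print(c(rounded_week, floored_week, rounded_month, floored_month))
```

With rounding, a headline from the second half of a month is counted under the following month, which is worth bearing in mind when reading the monthly charts.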

As outlined in my previous post, I decided to play with two separate approaches to sentiment analysis. The first uses the AFINN lexicon, accessed through the tidytext package (Silge and Robinson, 2020), which scores individual words from -5 (most negative) to 5 (most positive). The second uses sentimentr (Rinker, 2019), which calculates a polarity score at the sentence level.
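
To make the word-level approach concrete, here is a minimal sketch of lexicon scoring in base R. The lexicon and its scores are invented for illustration (they are not real AFINN values), score_headline() is a hypothetical helper, and averaging the matched word scores is just one plausible way to roll words up to a headline, not necessarily what was done for this data set.

```r
# Toy lexicon: words and scores are made up for illustration,
# NOT taken from the real AFINN list
toy_lexicon <- data.frame(
  word  = c("perfect", "beautiful", "feud", "attack"),
  value = c(3, 3, -2, -3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean score of the words found in the lexicon
score_headline <- function(headline, lexicon) {
  # Lower-case the headline and split on anything that is not a letter or apostrophe
  words <- strsplit(tolower(headline), "[^a-z']+")[[1]]
  words <- words[words != ""]
  # Look up each word; words missing from the lexicon come back as NA
  scores <- lexicon$value[match(words, lexicon$word)]
  scores <- scores[!is.na(scores)]
  # A headline with no scored words has no sentiment value
  if (length(scores) == 0) return(NA_real_)
  mean(scores)
}

example_score <- score_headline("A perfect day ends in family feud", toy_lexicon)
print(example_score)
```

Here "perfect" (3) and "feud" (-2) are the only matched words, so the headline averages out as mildly positive even though a reader might not see it that way. This is the main limitation of scoring words in isolation.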

AFINN

affin_sent %>% 
  group_by(date_month) %>% 
  summarise(avg_sent = mean(sentiment)) %>% 
  ggplot(aes(date_month, avg_sent)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Month",
         y = "Sentiment",
         title = "Average sentiment of headlines about Meghan Markle by month, AFINN") +
  geom_smooth(method = "loess")

AFINN sentiment time series

sentimentr

headline_sentiment %>% 
  group_by(date_month) %>% 
  summarise(avg_sent = mean(ave_sentiment)) %>% 
  ggplot(aes(date_month, avg_sent)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Month",
         y = "Sentiment",
         title = "Average sentiment of headlines about Meghan Markle by month, sentimentr") +
  geom_smooth(method = "loess")

sentimentr sentiment time series

Both AFINN and sentimentr show a decline in average sentiment over time. AFINN shows a steady rise in sentiment from mid-2016 to early 2018, between the announcement that Harry and Meghan had started dating and their first official public appearance together. From there, the decline between early 2018 and mid-2019 is quite sudden. During this period Harry and Meghan announced their engagement, got married, announced their pregnancy and welcomed baby Archie. I noticed that, according to AFINN, the headlines before the pregnancy announcement were broadly positive but became more negative around the time of the controversy over a New York-based baby shower and the discontent of the Markle family. While the loess line levels off between mid-2019 and mid-2020, the actual sentiment varies wildly.

The sentimentr line shows a more meandering decline from mid-2016, although the actual sentiment is just as volatile. One notable low point coincides with legal issues experienced by Meghan’s brother in early 2017. Another is in early 2019; although I cannot pinpoint an exact subject for this, I can see multiple headlines about sexist and racist comments received, terrorist attack risks and feuds with the rest of the Royal family.

Sentiment over time by publication

sentimentr

headline_sentiment %>% 
  group_by(date_month, publication) %>% 
  summarise(avg_sent = mean(ave_sentiment)) %>% 
  ggplot(aes(date_month, avg_sent, fill = publication, colour = publication)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Month",
         y = "Sentiment",
         title = "Average sentiment of headlines about Meghan Markle by month, sentimentr",
         color = "Publication") +
  scale_colour_manual(values=c("#2A2222", "#004DB3", "#EB1801"))

sentimentr by publication

When sentimentr is broken down by publication, we can see that The Express does not seem to have contributed as many headlines as The Sun and the Daily Mail, and those it has contributed appear to be relatively steady in sentiment.

The Sun appears to have wildly fluctuating sentiment, with a notable high in mid-2017 (just before the first public appearance). The sentiment appears to stay continuously negative from this point onwards. A particular low point of sentiment is between the pregnancy being announced and the baby shower. This appears to be focussed around a perceived fallout with Kate Middleton, rumours of arguments with staff and ongoing family issues.

The Daily Mail also shows fluctuating sentiment, with highs and lows that reflect the pattern shown by The Sun. The main deviation from The Sun appears to be from 2020 onwards where the Daily Mail headlines get more positive while The Sun’s get more negative.

AFINN

affin_sent %>% 
  group_by(date_month, publication) %>% 
  summarise(avg_sent = mean(sentiment)) %>% 
  ggplot(aes(date_month, avg_sent, fill = publication, colour = publication)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Month",
         y = "Sentiment",
         title = "Average sentiment of headlines about Meghan Markle by month, AFINN",
         color = "Publication") +
  scale_colour_manual(values=c("#2A2222", "#004DB3", "#EB1801"))

AFINN by publication

The AFINN sentiment plotted over time tells a similar story to sentimentr with regard to The Express and The Sun. AFINN does, however, appear to give much more positive sentiment to headlines in Q4 2018 (around the time the Duchess announced her pregnancy).

Weeks of interest

I want to find the weeks across all publications that generated the highest number of headlines, to see if they correspond to any major events in Meghan’s life.

headline_sentiment %>% 
  group_by(date_week) %>% 
  summarise(num_headlines = n()) %>% 
  arrange(desc(num_headlines)) %>% 
  head(5)

Weeks with most headlines

To create wordclouds from the headlines generated during these weeks, I need to remove the punctuation and create some additional stop words to remove, including “Meghan” and “Markle”.

# Remove all punctuation and spaces (replace them with an empty string)
master_df_text$word <- gsub('[[:punct:] ]+','', master_df_text$word)

# Create additional stopwords
add_stops <- c("meghan", "markle", "meghans", "markles", "prince", "harry", "royal", "meghan's", "markle's")
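
As a quick check of what this cleaning step does, the sketch below runs the same regular expression and stop-word list over a handful of made-up tokens (the example words are mine, not from the data set).

```r
# Made-up tokens for illustration
words <- c("meghan's", "mother-in-law", "royal!", "new york")

# Same pattern as above: strip runs of punctuation and spaces
cleaned <- gsub("[[:punct:] ]+", "", words)
print(cleaned)

# Same stop-word list as above
add_stops <- c("meghan", "markle", "meghans", "markles", "prince", "harry",
               "royal", "meghan's", "markle's")

# Keep only the tokens that are not stop words
kept <- cleaned[!cleaned %in% add_stops]
print(kept)
```

Because the apostrophe is stripped before filtering, "meghan's" becomes "meghans" and is caught by that entry in the stop-word list; the entries that still contain an apostrophe are never matched after cleaning, so they are harmless but redundant.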

Week of 20-05-2018 - Meghan Markle marries Prince Harry

master_df_text %>% 
  filter(date_week == '2018-05-20') %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

Wedding word cloud

As expected, a lot of the words here are wedding-related (wedding, day, tiara, bride, guests, tradition, gown). There seems to be a broadly positive sentiment (friend, history, perfect, modern, beautiful, tribute, moment), with mentions of some notable guests on the day (Oprah, Clooney, stars, invited, George, Stella, McCartney, celebrity).

Week of 05-05-2019 - The birth of Archie

master_df_text %>% 
  filter(date_week == '2019-05-05') %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

Birth of Archie word cloud

Words related to the birth were prevalent (baby, Archie, birth, family, newborn, mum, mother, born, due, boy, induced) along with a dusting of family members both Royal (Queen, Harry, Kate, William) and on Meghan’s side (father, sister, Thomas, Ragland).

Week of 21-10-2018 - Meghan and Harry announce their pregnancy

master_df_text %>% 
  filter(date_week == '2018-10-21') %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

Pregnancy word cloud

Words related to pregnancy are used (baby, bump, pregnancy, pregnant, due) as well as some mentions of the Royal couple’s visit to Australia (tour, Australia, Melbourne, Bondi, Sydney, Fiji, Australian, beach).

Week of 09-12-2018 - Feud with Markle family, argument with Kate Middleton, rumours of issues with staff

master_df_text %>% 
  filter(date_week == '2018-12-09') %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

Feud word cloud

Although the most commonly used words here relate to other life events (wedding, baby, pregnant), some other controversies come to light, such as family feuds (Dad, rift, Kate, Samantha, Thomas, family) and living arrangements (Frogmore, Cottage).

Week of 07-07-2019 - Archie’s christening

master_df_text %>% 
  filter(date_week == '2019-07-07') %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

Christening word cloud

During this week, Archie’s christening took place (christening, Archie, godparents). There was some controversy over the refusal to make the godparents of Archie public information (godparents, secret, public), more feud rumours (feud, Kate, family, Thomas), and a visit to Wimbledon with Serena Williams where a member of the public was asked not to take photos (Serena, Wimbledon, snub, photo).

Which headlines have the highest and lowest sentiment?

Now, I will create wordclouds from the headlines that, according to sentimentr, have the most positive and negative sentiments.

Words in the headlines with the most positive sentiment

headline_sentiment %>% 
  arrange(desc(ave_sentiment)) %>% 
  select(headline, ave_sentiment) %>% 
  head(100) %>% 
  unnest_tokens(word, headline) %>% 
  anti_join(stop_words) %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100, colors = "#229954"))

Positive word cloud

Key themes from the most positive headlines were the pregnancy and birth of Archie (birth, baby, child, boy), the wedding (bride, dress, wedding) and general character attributes about the Duchess (Duchess, star, fans, woman, charity, sweet, modern, smart). Oddly Piers Morgan, an outspoken critic of the Duchess, makes an appearance. An example of one of these headlines is PIERS MORGAN: Hearty congratulations, Harry, you picked a real keeper (even if your romance did destroy my beautiful friendship with the amazing Meghan Markle).

Words in the headlines with the most negative sentiment

headline_sentiment %>% 
  arrange(desc(ave_sentiment)) %>% 
  select(headline, ave_sentiment) %>% 
  tail(100) %>% 
  unnest_tokens(word, headline) %>% 
  anti_join(stop_words) %>% 
  filter(!word %in% add_stops) %>% 
  count(word) %>%
  with(wordcloud(word, n, max.words = 100, colors = "#C70039"))

Negative word cloud

The headlines with the most negative sentiment centred on the rumoured feuds that the tabloids have plagued Meghan about (feud, Dad, Sister, father, Samantha, Kate, letter). Some of the more violent and distressing words (attack, racist, death, forced, shock, warning, terror, racism) have a more upsetting context. Examples of these are:

One headline even makes the claim the Duchess is related to America’s first serial killer.

Word clouds by publication

Finally, I wanted to see which positive and negative words were used most in the headlines published by each publication. This isn’t as accurate as it could be, since each word is taken out of its original context: for example, trump is considered positive while bump is negative.

Particular thanks to Text Mining with R - Wordclouds (Silge and Robinson, 2020) for inspiration creating these word clouds.

The Sun

master_df_text %>% 
  filter(publication == "The Sun",
         !word %in% add_stops) %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#8d0e00", "#EB1801"),
                   max.words = 100)

Sun word cloud
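
For readers unfamiliar with the acast() step used in these publication word clouds, the sketch below shows the matrix it produces, using invented word counts rather than the real data. comparison.cloud() expects one row per word and one column per group, and it assigns its colours in column order.

```r
library(reshape2)

# Invented counts for illustration (not taken from the headlines data set)
toy_counts <- data.frame(
  word      = c("feud", "perfect", "rift"),
  sentiment = c("negative", "positive", "negative"),
  n         = c(4, 3, 2),
  stringsAsFactors = FALSE
)

# One row per word, one column per sentiment; missing combinations become 0
term_matrix <- acast(toy_counts, word ~ sentiment, value.var = "n", fill = 0)
print(term_matrix)
```

The columns come out in alphabetical order ("negative", then "positive"), so in the calls in this section the first colour in each pair is applied to negative words and the second to positive words.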

Daily Mail

master_df_text %>% 
  filter(publication == "Daily Mail",
         !word %in% add_stops) %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#002e6b", "#004DB3"),
                   max.words = 100)

Daily Mail word cloud

The Express

master_df_text %>% 
  filter(publication == "The Express",
         !word %in% add_stops) %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#8c0f1b", "#EA1A2E"),
                   max.words = 100)

Express word cloud

Visualise on Tableau Public

The final results are held in a publicly hosted Tableau dashboard. This plots sentiment (using AFINN) over time while also showing the density of headlines per week, key life milestones and a search bar to find headlines relating to specific topics.