Trying to read the tea leaves of box office futures through the ever-changing lens of social media is a maddeningly difficult yet strangely satisfying endeavor. The growing volume of available data keeps expanding what is possible. This is a new science that offers no road map to success, but its flashes of brilliance and ease of use stoke the drive to fine-tune the data and create better forecasts.
The main challenge is that the social media landscape is constantly evolving. Demographics shift, the ways people use each platform change, and new platforms rise while others decline. Over the last 10 years, these shifts have pushed me through three resets of tracking methodology, each one an effort to improve prediction accuracy and keep up with the changing landscape. The latest came two months ago, when we modified our tracking system at Box Office Pro to account for Instagram as well as Facebook and Twitter, bringing our total tracked variables per film to 74. While that is a gold mine, it also means a massive onslaught of data collected every day for over 100 upcoming titles.
So how do we start to tackle the data set? Deciding which of the 74 variables are useful for analyzing a movie’s potential is no small feat. First, we need to collect data for a couple of months to build up some historical data points; then we take the opening-day and opening-weekend box office for those films and correlate each variable against those results. The data period of interest for our most recent collection was the Friday-to-Thursday week before release. From that analysis, I identified the following 10 variables as the most useful for predicting opening-day box office.
Top 10 Social Media Variables in Predicting Opening-Day Box Office
- Facebook fans added
- Instagram post comments
- Facebook post shares
- Facebook post interactions
- Facebook power rank
- Facebook post likes
- Instagram post video views
- Facebook post views
- Facebook wow post reaction
- Twitter page likes
There are some caveats, however, associated with the above list. First, our sample is not yet large enough to be statistically significant, which means that over time these variables could very well change in importance. Second, it quickly became clear that we needed to exclude Avengers: Endgame from the data set, because its numbers as the top performer (by far) both on social media and at the box office dramatically skewed virtually all data points. Third, we are no longer able to track individual tweet strings on Twitter for films, as we have in the past. That was a great resource and would no doubt be on this list if it were still available.
Getting back to the list, what this means is that, over the last couple of months, the number of fans added on a film’s official Facebook page during the Friday-to-Thursday week before release explained more of the variance in opening-day box office than any other single variable of the 74.
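As a rough sketch of the ranking step described above, the following correlates each tracked variable with opening-day box office and sorts the variables by how much of the variance each one explains (r squared), with the option of leaving an outlier title out. Everything in it is an assumption for illustration: the variable names, the helper functions, and especially the numbers, which are made up and are not our tracking data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

def rank_variables(films, target="opening_day", exclude=()):
    """Rank every tracked variable by r^2 against the target, highest first."""
    rows = [f for f in films if f["title"] not in exclude]
    y = [f[target] for f in rows]
    variables = [k for k in rows[0] if k not in ("title", target)]
    scored = []
    for name in variables:
        r = pearson([f[name] for f in rows], y)
        scored.append((name, r * r))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up numbers for illustration only; opening_day is in millions.
films = [
    {"title": "Film A", "fb_fans_added": 1200, "ig_post_comments": 300, "opening_day": 4.0},
    {"title": "Film B", "fb_fans_added": 2500, "ig_post_comments": 900, "opening_day": 8.5},
    {"title": "Film C", "fb_fans_added": 800, "ig_post_comments": 700, "opening_day": 3.0},
    {"title": "Film D", "fb_fans_added": 4000, "ig_post_comments": 500, "opening_day": 13.0},
    {"title": "Outlier", "fb_fans_added": 90000, "ig_post_comments": 60000, "opening_day": 150.0},
]

# Rank with the hypothetical outlier title left out of the sample.
ranking = rank_variables(films, exclude=("Outlier",))
for name, r2 in ranking:
    print(f"{name}: r^2 = {r2:.2f}")
```

With these invented numbers, calling `rank_variables(films)` without the `exclude` argument pushes both variables close to an r squared of 1, because a single enormous title dominates every column. That is the same kind of skew that led us to drop Avengers: Endgame.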
Now that we have found some of the needles in the haystack, we can begin to use these numbers as another useful data point for our predictions and forecasts. There are numerous areas I need to expand on. More historical films for all variables would be a huge help. Creating subgroups for different genres could also be very beneficial, as different genres could have different variables explaining more of the opening-day box office variance. For instance, young-adult-targeted comedies might have Instagram post video views as the most strongly correlated variable for box office earnings, while animated films might find the Facebook power rank (likes + shares + comments) on top. Another asset this new information affords is a wonderful yardstick for sequels, as we continue to track them in the coming years. Knowing exactly what John Wick 3’s numbers showed would be one of the most important indicators for John Wick 4, for instance, revealing shifts in online demand, and thus the sequel’s potential performance, before it actually opens.
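The genre idea above could be sketched along these lines: compute the power rank as likes + shares + comments, split the films by genre, and report which variable correlates most strongly with opening-day box office within each group. This is only a sketch under stated assumptions. The field names, the three-film minimum, and all of the numbers are hypothetical, and the sample data is deliberately constructed so that the two genres come out differently.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

def with_power_rank(film):
    """Return a copy of the film with power rank = likes + shares + comments."""
    rank = film["fb_likes"] + film["fb_shares"] + film["fb_comments"]
    return {**film, "fb_power_rank": rank}

def top_variable_by_genre(films, variables, target="opening_day", min_films=3):
    """For each genre, name the variable with the highest r^2 against the target."""
    by_genre = defaultdict(list)
    for film in films:
        by_genre[film["genre"]].append(film)
    result = {}
    for genre, rows in by_genre.items():
        if len(rows) < min_films:
            continue  # too few films to say anything about this genre
        y = [f[target] for f in rows]
        scores = {v: pearson([f[v] for f in rows], y) ** 2 for v in variables}
        result[genre] = max(scores, key=scores.get)
    return result

# Made-up rows: (genre, likes, shares, comments, Instagram video views, opening day).
raw = [
    ("comedy", 100, 50, 30, 1000, 2.0),
    ("comedy", 300, 20, 10, 3000, 6.0),
    ("comedy", 50, 40, 60, 5000, 10.0),
    ("comedy", 200, 80, 20, 2000, 4.0),
    ("animated", 400, 100, 100, 4000, 6.0),
    ("animated", 800, 300, 100, 1000, 12.0),
    ("animated", 200, 50, 50, 3000, 3.0),
    ("animated", 600, 200, 100, 2000, 9.0),
]
keys = ("genre", "fb_likes", "fb_shares", "fb_comments", "ig_video_views", "opening_day")
films = [with_power_rank(dict(zip(keys, row))) for row in raw]

leaders = top_variable_by_genre(films, ["ig_video_views", "fb_power_rank"])
for genre, variable in leaders.items():
    print(f"{genre}: {variable}")
```

Skipping genres with fewer than three films is a design choice for the sketch, not a statistical threshold; with real data the cutoff would need to be far higher before the per-genre rankings meant anything.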
There is no such thing as a silver bullet when it comes to social media tracking for box office potential. In the past, some have cherry-picked data or relied on very small samples and claimed huge predictive success that didn’t stand up over time. The fact is, anything that can offer an informed guesstimate is a win, especially when the data is freely available. I look forward to adding to our 74 variables where it makes sense and fine-tuning the methodology to look at data from more than just a week out.