100 Years of Set Locations


A few years ago, my sister convinced me to see a film she’d just watched, called The Fall. Within minutes of popping in the Blu-ray, I was hooked; I couldn’t look away. The film follows five characters wandering the globe, and it transports you from the Namib Desert to the rice terraces of Bali to Prague’s Charles Bridge as effortlessly as most films pan left. One fan has even devoted an entire blog to covering the locations. And in a world where 127 Hours and Buried can compete for minimalist sets, I always love to see a film that instead embodies exploration. So after watching, I immediately headed to IMDB to see a full list of set locations, and wasted a day looking up every one.

Recently, I’ve found myself seeking this information more often, out of both personal and local interest. To make this easier, I’ve taken the top 2000 films from 1910-2010 according to IMDB, and fed their locations into the Google Map below. You can browse in the embedded window or click the “full screen” link to see a larger version. The map starts zoomed out to show the big picture, but zoom in, click on each marker for more information, and enjoy exploring for yourself:

Full Screen

Notes: To accomplish this, I used Google’s new data management tool: Fusion Tables. I was completely amazed by its functionality; all I needed to do was enter the location data from IMDB, and the map was generated automatically. These data were taken from the source linked in the map, and the 2000 films were selected as the top films from 1910-2010 by most IMDB user votes, as I believed that was the best simple approximation of general interest in each film. Some locations couldn’t be found by Google Maps – and in what has to be the most entertaining error handling I’ve seen, those data points are mapped to the Bermuda Triangle. And finally, The Fall actually isn’t in the top 2000 film list, so after adding it, the totals are 2001 films and 9736 locations.

Actuals and Estimates – Studio Accuracy in Box Office Prediction


Every Sunday, each film studio estimates and reports its weekend box office results. On Monday, it announces the actual results. And in between, hundreds of newspaper articles, trade magazine pieces, and online posts are written about the weekend’s box office winners. Movie studios care about these stories because many people presumably follow the “wisdom of crowds.” In other words, people may hear about a blockbuster and think: “If so many others saw this movie, maybe I should too.” In addition, movie studios often use opening weekend box office numbers in their advertising for the following week (e.g. “See the #1 movie in America”). Consequently, studios have an incentive to inflate their Sunday estimates. So a little over a month ago, Neil Malhotra, an associate professor at the University of Pennsylvania, suggested that he and I look at the differences between those predictions and the reality.

We began by collecting data on first weekend “estimates” and “actuals” for the 1,064 movies that opened between 2003 and 2010 in at least 1,000 theaters. The first test we performed was simply to see if the estimates were accurate. If Sunday estimates were not biased upwards, then the estimates would fall above the actuals about as often as below. However, there were more than twice as many overestimates as underestimates (716 to 349), which is highly unlikely to be due to chance alone. To put this in context, this is about as likely as flipping a coin 98 times and seeing only heads, which means that studios are consistently overestimating. On average, studio estimates for Sunday revenues were higher than the actuals by 6.38%. But interestingly, the amount of inflation varies depending on several factors, including the newness of the film being estimated. Below, we plot the average inflation across all films for each of their first five weekends.
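The coin-flip comparison above can be checked with an exact one-sided sign test. The sketch below is only an illustration, not necessarily the test we ran: it assumes each film is an independent fair coin flip under the "no bias" hypothesis, uses the 716 and 349 counts quoted above, and the function name `sign_test_upper_tail` is made up for this example.

```python
from fractions import Fraction
from math import comb, log2


def sign_test_upper_tail(overestimates, trials):
    """Exact P(X >= overestimates) when X ~ Binomial(trials, 1/2)."""
    favorable = sum(comb(trials, k) for k in range(overestimates, trials + 1))
    return Fraction(favorable, 2 ** trials)


# Counts quoted in the text (assumption: ties, if any, are excluded).
overestimates, underestimates = 716, 349
trials = overestimates + underestimates

p_value = sign_test_upper_tail(overestimates, trials)

# Length of an all-heads run from a fair coin that is equally unlikely.
equivalent_heads = -log2(float(p_value))

print(f"one-sided p-value: {float(p_value):.3g}")
print(f"equivalent run of heads: {equivalent_heads:.1f}")
```

Using exact integer arithmetic avoids the underflow and approximation error that a normal approximation would have this far out in the tail.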

You can see that the opening weekend has far higher inflation than the following weekends. This leads to two plausible theories: 1) There’s much more press about opening-weekend results than other weekends, so studios have more reason to inflate the first weekend than following weekends, or 2) The first weekend is inherently the least predictable, and in cases of uncertainty, studios choose to go with optimistic numbers.

From here, Neil and I wanted to see a breakdown of inflation by studios. Among only the opening weekends, we found the following results:

From this chart, it becomes immediately clear that inflation levels vary greatly across studios. Sony is the biggest inflator of the large studios, with an average Sunday inflation of over 10%, and Rogue Pictures is a close second. We should also mention that DreamWorks’ and Weinstein’s inflations are within the margin of error, and not statistically distinguishable from zero inflation.

Finally, Neil and I wanted to test the theory that studios would inflate their estimates to achieve a #1 ranking and accordingly generate better press. To examine this, we started with the revenue difference between the top film and the second film at the end of each Saturday (i.e. the two films contending for the #1 ranking). Then we compared that gap to the inflation in those two films’ estimates. The idea is that we should see a negative relationship; in cases where two films are in a close fight for first place, studios would have more incentive to inflate than in cases where the result is a foregone conclusion. Here are the results visually:

As you can see above, we do see a decline in bias as the gap between the two contenders increases. The line in the center of the chart represents the regression line, with a slope of just under -0.1% per million dollars. That is, according to the data, a perfectly even race for first place would result in an average bias (for both the first and second place films) of 6.6%, while a $50M gap between the two leaders would result in an average bias of 1.6% – which is consistent with our theory that studios are more likely to inflate their numbers when doing so grants (or sustains) a first place ranking. However, only a small portion of the variation in results is explained by this theory, which leads us to believe that there are other, unknown factors at work.
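To make the regression line concrete, here is a small sketch that evaluates it at a few Saturday-night gaps. The intercept (6.6%) comes from the text; the slope is rounded to -0.1% per million dollars, which is an approximation of the fitted value, and `predicted_bias_pct` is a name invented for this example.

```python
# Rounded coefficients read off the regression line described above.
INTERCEPT_PCT = 6.6            # predicted bias when the two films are tied
SLOPE_PCT_PER_MILLION = -0.1   # assumption: rounded from "just under -0.1%"


def predicted_bias_pct(gap_millions):
    """Predicted average Sunday inflation (%) for a given #1 vs #2 gap in $M."""
    return INTERCEPT_PCT + SLOPE_PCT_PER_MILLION * gap_millions


for gap in (0, 10, 25, 50):
    print(f"${gap}M gap -> {predicted_bias_pct(gap):.1f}% predicted bias")
```

Because only a small share of the variation is explained by this line, these values are averages along the trend rather than predictions for any single weekend.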

Notes: Neil and I restricted the analysis to films showing in at least 1,000 theaters, as it’s unreasonable to expect accurate estimation in smaller releases. All numbers presented only reflect the difference between Sunday estimates and Sunday actuals, factoring out the Friday and Saturday results (from published dailies), since those are already known at the time of estimation. All data are from Box Office Mojo, cross-checked against a second source when possible.
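As a rough illustration of how the Friday and Saturday dailies are factored out, the sketch below backs out the Sunday figures from weekend totals and computes the inflation relative to the Sunday actual. The function name, the choice of the Sunday actual as the denominator, and the dollar figures are all assumptions made for the example, not numbers from our data set.

```python
def sunday_inflation_pct(weekend_estimate, weekend_actual, friday, saturday):
    """Percent by which the implied Sunday estimate exceeds the Sunday actual.

    Friday and Saturday grosses are known when the estimate is made,
    so they are subtracted from both weekend totals.
    """
    sunday_estimate = weekend_estimate - friday - saturday
    sunday_actual = weekend_actual - friday - saturday
    if sunday_actual <= 0:
        raise ValueError("Sunday actual must be positive")
    return 100.0 * (sunday_estimate - sunday_actual) / sunday_actual


# Hypothetical film, all figures in millions of dollars.
inflation = sunday_inflation_pct(
    weekend_estimate=30.0,
    weekend_actual=29.4,
    friday=11.0,
    saturday=10.4,
)
print(f"Sunday inflation: {inflation:.1f}%")
```

In this made-up case the weekend totals differ by only about 2%, but the whole gap sits in the Sunday number, so the Sunday inflation is several times larger.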