—Galactic Horse Knows All The Best Places (NASA)
Trust, Complexity, And The 5-Star Rating
No two rating systems are created equal. Yelp and Netflix, for example, both feature five-star ratings and reviews, yet the relationships we have with each are full of nuance and have very different effects on our motivations for using the sites and contributing reviews, our assessment of the reviews and those who wrote them, our trust in the various user communities, and on the actions we take when using each site.
A handful of us recently spent some time discussing rating systems and user reviews, prompted by Christina Cacioppo’s post on the topic from late last year. Our conversation coalesced around the variables that affect our interpretation of various 5-star ratings, and the ways they are deployed by different sites. Below are two hypotheses about the differences between crowdsourced recommendation systems.
1. The primary user benefit of a rating system affects our motivations to rate and review, our rating behavior, and our perception of the aggregate ratings (and the collective who contributed them).
Rating films on Netflix helps the system learn about you, affecting the experience you as an individual will enjoy later. Rating venues on Yelp primarily affects the collective: the experience the community as a whole may have in the future. This “public vs. private” distinction can affect our use of each site. Ratings in Netflix’s system have a personal utility value – you rate movies and programs because you are trying to help the system learn about you, and after doing so, are recommended content you are more likely to enjoy.
On the flip side, there is a performance element to rating behavior on Yelp. The site delivers a degree of social utility, and users are influenced by the fact that reviews submitted to the site are publicly displayed. This is an instance of the Hawthorne Effect, the phenomenon whereby people modify their behavior when they know they are being watched or observed. (Public postings are Yelp’s default, whereas you have to dig a little to find an individual’s reviews on Netflix.) Writing in public, users might be slightly more likely to post reviews that demonstrate their support of particular businesses they enjoy, deriving some social capital from being seen to endorse particular venues. Similarly, Yelp offers the opportunity to bring public justice to venues that disappoint users.
2. The level of trust people place in an aggregate rating system is determined by the cohesiveness of the community and the sophistication of the recommendation engine.
Trust introduces an interesting dynamic to socially-fueled recommendation systems. There are thousands of Yelp users active in NYC, a community with diverse tastes and ideas about what constitutes a good dining experience. The sheer size of the community can create as much noise as it does signal. One person’s 2-star review filled with cringing over $17 pancakes will help some users, but be entirely irrelevant to others who don’t think $17 pancakes warrant such a dismal review. As Percolate co-founder Noah Brier noted in 2009, the integrity of aggregate rating systems is inherently challenged when there are unclear values for the rating scales they employ.
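To make the noise-versus-signal point concrete, here is a minimal sketch in Python. The two venues and their ratings are invented for illustration, and `summarize` is a hypothetical helper, not anything Yelp exposes. It shows how two venues can share the same average star rating while one reflects rough consensus and the other a split crowd.

```python
from statistics import mean, pstdev


def summarize(ratings):
    """Return (average, spread) for a list of 1-5 star ratings.

    Spread is the population standard deviation: a rough measure of
    how much reviewers disagree with one another.
    """
    return round(mean(ratings), 2), round(pstdev(ratings), 2)


# Hypothetical data: both venues average 3.5 stars.
consensus_venue = [3, 4, 3, 4, 4, 3]  # reviewers broadly agree
divisive_venue = [5, 5, 5, 2, 2, 2]   # fans vs. the $17-pancake skeptics

print(summarize(consensus_venue))  # (3.5, 0.5)
print(summarize(divisive_venue))   # (3.5, 1.5)
```

The aggregate alone cannot tell these two venues apart. The spread hints at which one depends on whether the reviewers share your values.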
An alternative to Yelp’s public free-for-all is Foursquare Explore, which serves recommendations based on a much smaller selection of users picked from one’s existing community. The level of trust in the recommendations offered is likely to rise when you personally know the reviewers. Foursquare Explore kicks this up two more notches by providing a venue check-in count from each of your friends. This provides a helpful additional parameter, letting users know how often their friends frequent the places they recommend. In the screenshot below, for instance, we can see that a friend whose taste in food we respect has checked into the recommended bar 40 times, a much more powerful recommendation than a simple, “Twelve people in your network have been here.”
Another contributor to trust is a sophisticated algorithm, or at least the perception of one. A significant degree of the public’s perception of the value and reliability of Netflix comes from the purported sophistication of its algorithm. The company has not only invested heavily in making it as accurate as possible, but has also promoted this fact significantly. In doing so, it has built public trust in the sophistication, correctness and value of the recommendations it powers. We believe this public perception of sophistication affects the way people apply ratings within the system: users end up being more honest with Netflix not only because they are invested in the selections it provides them, but because they trust the algorithm to provide useful recommendations.
These musings remind us that, as with most digital experiences, context is everything. When consulting a socially-fueled recommendation, it pays to take things like primary end-user benefit, diversity of audience, size of user base, and sophistication of the recommendation engine into account. Just aggregating reviews is not enough: true advantage comes from creating a system users trust enough to use, which lowers the opportunity cost of their endeavors and provides understandable feedback about the experience they’re considering. Balancing these things out isn’t simple, but valid attempts are more likely to succeed.
This post was co-written by Strategist Joshua Green.

