January 05, 2010

Waking a Sleeping Chowhound: Another Star-Ratings Misstep?

Adding new social media features to established communities is always disruptive and not always a good idea.

In Chowhound Comes of Age (For Better or Worse), Luke Tsai writes about how and why the addition of "industry-standard 5-star ratings" to restaurants on Chowhound.com has shaken the community there.

...Log on to the Chowhound message board for the San Francisco Bay Area and you'll find lengthy threads about where to find, say, the most decadent slice of chocolate cake or the best pajeon (Korean seafood pancakes) in the East Bay. You'll find highly technical analyses of the roasting and brewing methodology of local coffee purveyors.
...Up until fairly recently, one thing you wouldn't find on Chowhound was the kind of star ratings system favored by almost every other restaurant guide, whether in print or on the web — from Frommer's to Zagat to Yelp. On Chowhound, you couldn't give a restaurant any kind of quantitative rating.

In short, it was a set of message boards about food, for and by chowhounds: self-selected folks who liked to go off the beaten track to find something interesting to eat. Specifically, they liked to go to the places that were unrated, or rated poorly on other sites, to find diamonds in the rough, especially unusual items.

It had no ads, no ratings, no shills (because of strong moderation), and no membership fees. It bootstrapped as a contribution-financed community site. Eventually it was sold to CNET, which was sold to CBS, which has added ads and ratings in an attempt to capture revenue.

...Jacquilynne Schlesier, the site's community manager, has been helping to moderate Chowhound since the pre-CNET all-volunteer days. "Our users are incredibly passionate and incredibly knowledgeable," Schlesier says. "But it can be a little daunting if you're someone who's not a long-term chowhound." To help make the process less intimidating, they've revamped the site's restaurant listings — individual pages that have all the basic information about a particular restaurant along with links to relevant discussions on the message boards. It's on these pages that the star-rating feature appears.

Generating revenue is a good goal. Most food sites that make money have ratings. Your typical product manager would get this far in their reasoning and implement an industry-standard 5-star rating system. This is what CBS/Chowhound apparently did.

But according to many of the site's devotees, the latest set of changes is particularly "unchowish," in large part because of the star-rating feature. ... Among other criticisms, [the founder of Chowhound] questions how it's possible to "rate a bakery that is horrendous except for one item so great it's worth a 100-mile trip along the same rating scale as a pretty-good diner, an inconsistent high-end sushi place, and an exemplary Italian-ice cart."

This is an excellent point. There is a context mismatch between the discussions (interesting food items) and the rating for a restaurant overall.

Why bother asking Chowhound users for a star rating? It's not like they were clamoring for this feature. This looks like Yelp envy to me. I saw similar lazy product design while at Yahoo! around the time Digg originally exploded in growth - property after property wanted to add "Thumbs up" buttons to everything from the weather to search results. [This was a bad design choice for almost all of them - fortunately, during this me-too frenzy, the legal mess from the posting of the DVD crack key helped most Yahoo! product managers figure out that the Yahoo! audience and Digg's were almost mutually exclusive.]

After spending some time at Chowhound, I've noticed that those participating in the discussions aren't rating much. I couldn't find a restaurant in my area with more than 5 reviews, and five is probably the absolute minimum number of ratings that should be required for the average rating to mean anything. And even then, the average overall is going to be 4.5 stars - familiar to outside users, sure, but in the end pretty useless as a gauge of quality. And, unless CBS is going to buy ratings from someone else, they will never have enough to be useful in a regional search. Bootstrapping 5-star ratings from scratch is a big mistake.
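To make the minimum-count point concrete, here is a minimal Python sketch. The threshold of five comes from the paragraph above; the function name `displayable_average` and the sample ratings are made up for illustration and are not anything Chowhound actually runs.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_RATINGS = 5  # assumption: the "absolute minimum" suggested in the text


def displayable_average(stars: Sequence[int],
                        min_ratings: int = MIN_RATINGS) -> Optional[float]:
    """Return the mean star rating, or None when too few ratings exist to show one."""
    if len(stars) < min_ratings:
        return None
    return round(mean(stars), 1)


# Hypothetical restaurants: one with too few ratings, one with the usual positive skew.
print(displayable_average([5, 4, 5]))           # None: only three ratings
print(displayable_average([5, 5, 4, 5, 4, 4]))  # 4.5
```

Note that the second result illustrates the other half of the complaint: even once the threshold is met, a positively skewed average says little about quality.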

If not 5-star overall ratings, what else?

Clearly the staff needs to find revenue, and advertising is what they've bet the farm on - so increasing the number of users and user-engagement is required. They had to do something.

But, given just the things discussed in this post, there are several other reputation-based things they could try instead...

1) Let the active board posters determine the context! If it's Best Pastrami Sandwich or Most Exotic Menu - let them give the awards to the restaurant. The simplest implementation of this is tagging, but allowing users to create award categories makes search-ranking easier.

2) Allow discussions/posts to be tagged as well - both with the name of the places that are discussed as well as the same user-generated topics...

3) Allow users to mark a place as a "favorite" which both increases the popularity of the place and puts that place on their profile. Combined with tagging, this is an advertiser's dream!

4) Implement a karma system for contributors to discussions, increasing the search-rank value of the businesses they discuss, tag, favorite, etc.
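As a rough illustration of how suggestions 1 and 4 could combine, the sketch below ranks places inside one user-created award category by the summed karma of the people who gave the award. Everything here is hypothetical: the `Endorsement` record, the `rank_places` function, the karma values, and the baseline weight of 1.0 for users without karma are assumptions, not a description of any real site.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endorsement:
    user: str
    place: str
    category: str  # user-created award or tag, e.g. "Best Pastrami Sandwich"


def rank_places(endorsements, karma, category):
    """Rank places within one category by the summed karma of their endorsers."""
    scores = defaultdict(float)
    seen = set()
    for e in endorsements:
        key = (e.user, e.place, e.category)
        if e.category != category or key in seen:
            continue  # skip other categories and duplicate votes
        seen.add(key)
        scores[e.place] += karma.get(e.user, 1.0)  # unknown users get a baseline weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


karma = {"alice": 3.0, "bob": 1.0}  # made-up contributor karma
awards = [
    Endorsement("alice", "Deli A", "Best Pastrami Sandwich"),
    Endorsement("bob", "Deli B", "Best Pastrami Sandwich"),
    Endorsement("carol", "Deli B", "Best Pastrami Sandwich"),
    Endorsement("bob", "Deli B", "Best Pastrami Sandwich"),  # duplicate, ignored
    Endorsement("alice", "Cafe C", "Most Exotic Menu"),
]
print(rank_places(awards, karma, "Best Pastrami Sandwich"))
# [('Deli A', 3.0), ('Deli B', 2.0)]
```

The design choice worth noticing is that the category supplies the context: a place is never scored "overall", only within an award that the posters themselves defined.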

All of these techniques are discussed in detail in our upcoming O'Reilly/Yahoo! Press book: Building Web Reputation Systems, also available in searchable draft form on our wiki.

The Chowhounds have valuable expertise they are sharing; they deserve better tools than a poor copy of every other restaurant site!

December 16, 2009

The Sensical Moment: Asking for User Opinion When the Time is Right

If you're asking for explicit user opinions in your reputation system (ratings, reviews or even just a simple “Like”), pay special attention to exactly when you are asking for them. You'll get better data if you try to gather opinions when it makes most sense to do so: try to find the sensical moments to solicit user input.

Ideally, you'll catch reviewers in moments where they're…

Sufficiently Invested

Can you make it too easy for users to give reviews? You may not think so—if you're in the early stages of deploying your reputation system (or building your site), then you're probably more worried about getting people to use the system at all. And putting obstacles in front of potential reviewers certainly doesn't sound like a good way to alleviate those fears. But, long-term, the success of your reputation system will depend on quality, honest and unbiased opinions.

It may well be in your best interest to limit those who can, and cannot, give ratings. Require that users register, at least. Plain and simple. It should be the bare minimum level of investment that a user should make to voice an opinion on your site.

You may want to go even further. Yahoo! Answers, for instance, limits certain functions (rating questions & answers) to only those users who've achieved a certain status (Level 2) on the site.

Recommendation: Make it easy, but not too easy, for users to give an opinion. Bake in some degree of accountability and ownership for publicly stated opinions.
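A gate like this can be very small. The sketch below assumes a hypothetical `User` record with a `registered` flag and a numeric `level`; the Level 2 threshold echoes the Yahoo! Answers example above, but the code is an illustration, not that site's implementation.

```python
from dataclasses import dataclass

MIN_LEVEL_TO_RATE = 2  # assumption, loosely modeled on the Level 2 example


@dataclass
class User:
    name: str
    registered: bool
    level: int = 1


def can_rate(user: User, min_level: int = MIN_LEVEL_TO_RATE) -> bool:
    """Only registered users who have reached the minimum level may rate."""
    return user.registered and user.level >= min_level


print(can_rate(User("drive-by", registered=False)))         # False
print(can_rate(User("newcomer", registered=True)))          # False
print(can_rate(User("regular", registered=True, level=2)))  # True
```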

Appropriately Informed

Don't ask your users to provide opinions on things they haven't experienced. This may be tricky, because the temptation will be strong to make rating objects as easy and low-friction as possible, which typically means putting rating controls in an easy-to-find location and keeping them there consistently. But consider the reputation value of 5-star ratings on YouTube (which we covered here only recently): do you suppose those generally-lackluster ratings distributions would improve if YouTube only allowed users to rate a video after first watching it? (To completion?)

This shortcoming is not limited to YouTube: years ago, Saleem Khan noted a trend on Digg where people were Digging up submissions with no way to have actually read the associated articles. (They couldn't have read them—the articles in question had gone offline before the favorable reviews continued to pour in.)

And even Apple has fallen victim to this oversight. Early iterations of the App Store rating system allowed for anyone to rate an iPhone app—whether they'd ever actually installed the app or not! This violates the "sufficient investment" principle, above, but it also seriously calls into question those reviewers' qualification to review. There's simply no way those ratings could have carried any real value—the reviewers weren't making informed decisions.

Apple eventually fixed this oversight. Now, you're given the opportunity to rate any app from the App Store interface, but when you try to do so for an app you've never tried?

MustOwn.png


Recommendation: Place ratings inputs either spatially or temporally downstream of the act of consumption.
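One way to enforce "downstream of consumption" in software is to accept a rating only after the system has seen evidence of use. The following is a minimal sketch under that assumption; the `RatingBox` class and its method names are invented for illustration, and what counts as consumption (installing, watching to completion, purchasing) is left to the caller.

```python
class RatingBox:
    """Accepts a rating only from users who have already consumed the item."""

    def __init__(self):
        self._consumed = set()  # (user, item) pairs with evidence of use
        self._ratings = {}      # (user, item) -> stars

    def record_consumption(self, user, item):
        self._consumed.add((user, item))

    def rate(self, user, item, stars):
        """Store the rating and return True, or return False if the user has not consumed the item."""
        if (user, item) not in self._consumed:
            return False
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._ratings[(user, item)] = stars
        return True


box = RatingBox()
print(box.rate("ann", "video-1", 5))  # False: not watched yet
box.record_consumption("ann", "video-1")
print(box.rate("ann", "video-1", 4))  # True
```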

But Not Overly Biased

Although Apple addressed that problem, they also introduced a new one. Now, when iPhone users attempt to delete an app from their device, they are asked to first rate the app.

iphone-rate.jpg

This is, of course, a horrible time to ask a user to rate an application: after they've decided that they no longer need the app, and just as they're in the process of deleting it. Even an app that a user loved may fare poorly under these circumstances.

Perhaps it's truly a horrible app—in which case a bad rating would be justified—or perhaps the user just no longer has any use for it. (Maybe it's a game that he or she has already beaten, or a Twitter client made superfluous by a newer, sexier alternative.) By the time a user is uninstalling an iPhone app, the love affair with that app—if there ever was one—is unmistakably on the wane, and the average ratings likely reflect that fact.

Recommendation: Don't ask for ratings at the low-point of a user's relationship to the rated object.

And Not Too Distracted

Another major sin of the App Store's "parting shot" rating request is that it makes the act of rating into a roadblock. In this excellent comment, PJ Cabrera makes the point:

Who knows how many users are just inputting anything just to move on, without paying attention to what they're doing[?]

True, there is a "No Thanks" button, but its meaning is ambiguous and some reviewers may mistake its intent (perhaps reading it as a "Cancel this deletion" action instead.) It is hard for users to give honest and considered opinions when they are still caught up in the experience that you're asking them to evaluate.

It's common practice, when buying a new car, to receive a customer satisfaction survey from the manufacturer. (This survey is used as an input into the car-selling reputation of the dealership you bought from.) Why do you suppose that the manufacturers will typically wait a week or more before sending you the survey? It's because they know that, with a little time and distance from the (often stressful) day of the transaction, you're more likely to give a measured, thoughtful and accurate assessment of the transaction. (You're probably also more inclined to give a positive review, but that's a discussion for another post.)

Recommendation: Respect the primary tasks that a user may be engaged in on your site. Don't interrupt them unnecessarily in order to solicit ratings.

Special thanks to Laurent Stanevich for providing the iPhone app rating screenshot.

December 09, 2009

A Sneak-Peek at Reputation Concepts

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, Bryce shares a simple work-in-progress and solicits your input to make it better.

Once upon a time, in (what feels like) a previous life, I illustrated some moderately well-received concept maps: diagrams intended to communicate some simple concepts about software systems and show the interrelationships between their moving parts.

Throughout work on Building Web Reputation Systems, it has always been my intent to attempt a compelling, engaging and fun-to-read concept map. Something to demonstrate the concepts that we've drawn on throughout the book. That was my intent anyway—it just never occurred to me how much work writing a book was going to be. So it hasn't been until fairly recently (like… um, tonight, actually) that I've been able to start pulling something together.

Adhering to our open policy, here, then, is that very first rough-and-ugly (and incomplete!) sketch. (Click it for the full version on Flickr.)

RepConcepts.png

I usually don't use Omnigraffle in the design of these concept maps, but its looseness and speed of idea-capture just felt right for this one, so I'll probably let the general shape of the map simmer for a while in it before moving it over to Illustrator for some fun touches and polish.

This sketch is, admittedly, incomplete. I have a paper version, drafted beforehand, that's easily 150% this size (in terms of # of concepts and linkages.) Please feel free to comment here, or over on Flickr. Hopefully you've enjoyed this brief light interlude, and I'll share more about the progress on the Reputation Systems Concept Map as it evolves.

December 02, 2009

The Cake is a Lie: Reputation, Facebook Apps, and "Consent" User Interfaces

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, Randy comes back from the IIW with a simple idea for improving application permissioning.

In early November, I attended the 9th meeting of the Internet Identity Workshop. One of the working sessions I attended was on Social Consent user interface design. After the session, I had an insight that reputation might play a pivotal role in solving one of the key challenges presented. I shared my detailed, yet simple, idea with Kevin Marks and he encouraged me to share my thoughts through a blog post—so here goes…

The Problem: Consent Dialogs

The technical requirements for the dialog are pretty simple: applications have to ask users for permission to access their sensitive personal data in order to produce the desired output—whether that's to create an invitation list, or to draw a pretty graph, or to create a personalized high-score table including your friends, or to simply sign and attach an optional profile photo to a blog comment.

The problem, however, is this—users often don't understand what they are being asked to provide, or the risks posed by granting access. It's not uncommon for a trivial quiz application to request access to virtually the same amount of data as much more "heavyweight" applications (like, say, an app to migrate your data between social networks.) Explaining this to users—in any reasonable level of detail—just before running the application causes them to (perhaps rightfully) get spooked and abandon the permission grant.

Conflicting Interests

The platform providers want to make sure that their users are making as informed a decision as possible, and that unscrupulous applications don't take advantage of their users.

The application developers want to keep the barriers to entry as low as possible. This fact creates a lot of pressure to (over)simplify the consent flow. One designer quipped that it reduces the user decision to a dialog with only two buttons: "Go" and "Go Away" (and no other text.)

The working group made no real progress. Kevin proposed creating categories, but that didn't get anywhere because it just moved the problem onto user education—"What permissions does QuizApp grant again?"

Reputation to the Rescue?

All consent dialogs of this stripe suffer from the same problem: Users are asked to make a trust decision about an application that, by definition, they know nothing about!

This is where identity meets trust, and that's the kind of problem that reputation is perfect for. Applications should have reputations in the platform's database. That reputation can be displayed as part of the information provided when granting consent.

Here's one proposed model (others are possible; this is offered as an exemplar).

The Cake is a Lie: Your Friends as Canaries in the Coal Mine of New Apps

First a formalism: when an application wants to access a user's private Information (I), they have a set of intended Purposes (P) they wish to use it for. Therefore, the consent could be phrased thusly:

"If you let me have your (I), I will give you (P). [Grant] [Deny]"

Example: "If you give me access to your friends list, I will give you cake."

In this system, I propose that the applications be compelled to declare this formulation as part of the consent API call. (P) would be stored along with the app's record in the platform database. So far, this is only slightly different from what we have now, and of course, the application could omit or distort the request.

This is where the reputation comes in. Whenever a user uninstalls an application, the platform asks the user for a reason (abusive use of data being one of the choices) and specifically asks whether the promise of (P) was kept.

"Did this application give you the [cake] it promised?"

All negative feedback is kept—to be re-used later when other new users install the app and encounter the consent dialog. If they have friends who have already uninstalled this application and complained that its "If (I) then (P)" promise was false, then the moral equivalent of this would appear scrawled in the consent box:


"Randy says the [cake] was unsatisfactory.
Bryce says the [cake] was unsatisfactory.
Pamela says the application spammed her friends list."
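The flow above can be sketched in a few lines of Python. This is only an illustration of the proposal: the `AppRecord` structure and the function names are hypothetical, the platform database is reduced to an in-memory list, and the friend graph is passed in as a plain set of names.

```python
from dataclasses import dataclass, field


@dataclass
class AppRecord:
    name: str
    information: str  # (I): what the app asks the user for
    purpose: str      # (P): what the app promises in return
    complaints: list = field(default_factory=list)  # (user, message) pairs


def consent_prompt(app):
    """Phrase the consent request in the 'If (I) then (P)' form."""
    return (f"If you let me have your {app.information}, "
            f"I will give you {app.purpose}. [Grant] [Deny]")


def record_uninstall(app, user, promise_kept, other_reason=None):
    """Keep negative feedback gathered at uninstall time; positive answers are not stored."""
    if not promise_kept:
        app.complaints.append((user, f"the {app.purpose} was unsatisfactory"))
    if other_reason:
        app.complaints.append((user, other_reason))


def friend_warnings(app, friends):
    """Return the complaints made by people in the installing user's friend set."""
    return [f"{user} says {message}." for user, message in app.complaints
            if user in friends]


app = AppRecord("QuizApp", information="friends list", purpose="cake")
record_uninstall(app, "Randy", promise_kept=False)
record_uninstall(app, "Pamela", promise_kept=True,
                 other_reason="the application spammed her friends list")
record_uninstall(app, "Stranger", promise_kept=False)

print(consent_prompt(app))
for line in friend_warnings(app, friends={"Randy", "Pamela"}):
    print(line)
```

Filtering by the friend set is what keeps the warnings personal; dropping that filter would give the "not limiting it to friends" variant.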

Afterthoughts

Lots of improvements are possible (not limiting it to friends, and letting early-adopters know that they are canaries in the coal mine.) These are left for future discussion.

Sure, this doesn't help early adopters.

But application reputation quickly shuts down apps that do obviously evil stuff.

Most importantly, it provides some insight to users by which they can make more informed consent decisions.

(And if you don't get the cake reference, you obviously haven't been playing Portal.)

December 01, 2009

Pardon our dust...

The book is coming into its next phase, and we're cleaning up all the messy bits before we hand it off to O'Reilly. This entails renaming every image file and other grubbiness that is bound to break things for a day or two.

Please bear with us while we get things back in order on the blog and wiki.

Bryce and Randy

November 18, 2009

Reputation is Identity

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry discusses the ways that reputation can make for richer user identities on your site. It is lightly adapted from our draft of Chapter 8.

Imagine you're at a party, and your friend Ted wants you to meet his friend, Mary. He might very well say something like… "I want you to meet my friend, Mary. She's the brunette over by the buffet line." A fine beginning, to be sure. It helps to know who you're dealing with. But now imagine that Ted ended there as well. He doesn't take you by the arm, walk you over to Mary, and introduce you face to face. Maybe he walks off to get another drink. Um… this does not bode well for your new friendship with Mary.

Sadly, until fairly recently, this has been the state of identity on much of the Web. When people were represented at all, they were often nothing more than a meager collection of sparse data elements: a username; maybe an avatar; just enough identifying characteristics that you might recognize them again later, but not much else.

With the advent of social on the web, things have improved. Perhaps the biggest improvement has been that people's relationships now form a sizable component of their identity and presence on most sites. Now, mutual friends or acquaintances can act as a natural entree to forming new relationships. So at least Ted now will go that extra step and walk you over to that buffet table for a proper introduction.

But, you still won't know much about Mary, will you? Once introductions are out of the way, what will you possibly have to talk about? The addition of reputation to your site will provide that much-needed final dimension to your users' identities: depth. Wouldn't it be nice to review a truly rich and deep view of Mary's identity on your site before deciding what you and she will or won't have in common?

Here are but a few reasons why user identities on your site will be stronger with reputation than they would be without.

  • Reputation is based on history and the simple act of recording those histories – a user's past actions, or voting history, or the history of their relationship to the site – provides you with a lot of content (and context) that you can present to other users. This is a much richer model of identity than just a display-name and an avatar.
  • Visible histories reveal shared affinities and allow users with common interests to find each other. If you are a Top Contributor in the Board Games section of a site, then like-minded folks can find you, follow you, or invite you to participate in their activities.

    You will, however, find contexts where this is not desirable. On a question-and-answer site like Yahoo! Answers, for instance, don't be surprised to find out that many users won't want their questions about gonorrhea or chlamydia to appear as part of their historical record. Err on the side of giving your users control over what appears, or give them the ability to hide their participation history altogether.

  • A past is hard to fake. Most site identities are cheap. In and of themselves, they just don't mean much. A couple of quick form-fields, a 'Submit' button and practically anyone (or no one– bots welcome!) can become a full-fledged member of most sites. It is much harder, however, to fake a history of interaction with a site for any duration of time.

    We don't mean to imply that it can't be done – harvesting 'deep' identities is practically an offshoot industry of the MMORPG world (See the figure above.) But it does provide a fairly high participatory hurdle to jump. When done properly, user karma can assure some level of commitment and engagement from your users. (Or at least allow you to ascertain those levels quickly.)

  • Reputation disambiguates identity conflicts. Hopefully, you've moved away from publicly identifying users on your site by their unique identifier. (You have read the Tripartite Identity Pattern, right?) But this introduces a whole new headache: identity spoofing. If your public namespace doesn't guarantee uniqueness (or even if it does– it'll be hard to guard against similar-appearing/l33t-speak equivalents and the like) then you'll have this problem.

    Once your community is at scale, trolls will take great delight in appropriating others' identities – assuming the same display name, uploading the same avatar – purely in an effort to disrupt conversations. It's not a perfect defense, but always associate a contributor's identity with his or her participation history or reputation to help mitigate these occurrences. You will, at least, have armed the community with the information they need to decide who's legit and who's an interloper.

These are some of the reasons that extending user identities with reputation is useful. Chapter 8 of Building Web Reputation Systems offers a series of considerations for how to do so most effectively.

November 11, 2009

5-Star Failure?

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry confirms that poorly chosen reputation inputs will indeed yield poor results.

Pity the poor, beleaguered 5-Star rating. Not so very long ago, it was the belle of the online ratings ball: its widespread adoption by high-profile sites like Amazon, Yahoo!, and Netflix influenced a host of imitators, and—at one point—star-ratings were practically an a priori choice for site designers when considering how best to capture their users' opinions. Their no-brainer inclusion had almost reached cargo cult design status.

This has subsided in recent years, as stars have received stiff competition from hot, upstart mechanisms like "Digg-style" voting (what we, when contributing to the Yahoo! Pattern Library, rechristened as Vote to Promote) and Facebook's "Like" action (which, I guess, was, ahem, "inspired by" FriendFeed, though let us not forget that, for a time, it also flirted with Thumbs Up & Down rating of feed items). Definitely, within the past 2 or 3 years, stars' 'obvious' appeal as the ratings mechanism of choice is no longer so obvious.

Even more recently, the 5-Star rating's fall from grace is almost complete. YouTube fired the first volley, declaring that, by and large, people on YouTube overwhelmingly give 5 stars to videos on that site. (For readers of this site, you'll recall that we blogged about similar J-Curve distributions that are prevalent on Yahoo! as well.)

And then the venerable Wall Street Journal declared that On the Internet, Everyone's a Critic But They're Not Very Critical:

One of the Web's little secrets is that when consumers write online reviews, they tend to leave positive ratings: The average grade for things online is about 4.3 stars out of five.

And, just like that, as quickly as 'stars are it' rose to prominence, 'stars are dead' is rapidly becoming the accepted wisdom. (Don't believe me? Read the comments when TechCrunch covered the YouTube discovery, and you'll see folks all-but-rushing to prop up a variety of their 'preferred rating mechanism' in stars' place.)

Are stars dead?

This is, of course, the wrong way to frame the question. Stars, thumbs, favorites, or sliders: any of these ratings input mechanisms are dead-on-arrival if they're not carefully considered within the context of use. 5-Star ratings require a little more cognitive investment than a simple 'I Like This' statement, so--before designing 5-star ratings into your system--consider the following.

Will it be clear to users what you're asking them to assess? It's not entirely surprising that YouTube's ratings overwhelmingly tend toward the positive. That's a long-observed and well understood phenomenon in the social sciences called Acquiescence Bias. It is "the tendency of a respondent to agree with a statement when in doubt." And 5-star ratings, in the case of YouTube, are nothing but doubt. What, exactly, is a fair and accurate quantitative assessment for a video on YouTube? The input mechanism does provide some clues, in the form of text hints for the various ratings levels (ranging from 'Poor' to 'Awesome!') but these are highly subjective and - themselves - way too open to interpretation.

Is a scale necessary? If the primary decision you're asking users to make is 'good vs. bad' or 'I liked it' or 'I didn't', then are multiple steps of decisioning really adding anything to their evaluation?

Are comparisons being made? Should I, as a user, rate videos in comparison to other similar videos on YouTube? What, exactly, distinguishes a 5-star football-to-the-groin video from a 2-star one? Am I rating against like videos? Or all videos on YouTube? (Or every video I've ever seen!?)

Have they watched the video? One way to encourage more-thoughtful ratings is to place the input mechanism at the proper juncture: make some attempt, at least, to ensure that the user is rating the thing only after having experienced it. YouTube's 5-star mechanism is fixed and always-present, encouraging drive-by ratings, premature ratings or just general sloppiness of assessment.

So, are stars inappropriate for YouTube, at least in the way that they've designed them? Probably, yes.

To wrap up, some quick links. Check out this elegant and innovative design that the folks at Steepster recently rolled out, and think about the ways it cleverly addresses all four of the concerns listed above.

And to see a really in-depth study of 5-star ratings used effectively, check out Using 5-Star Ratings from Christopher Allen & Shannon Appelcline's excellent series on Systems for Collective Choice.


October 28, 2009

Ebay's Merchant Feedback System

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week, we explore, to some depth, one of the Web's longest-running and highest-profile reputation systems. (We also test-drive our new Google-maps powered zoomable diagrams. Wheee!)

EBay contains the Internet's most well-known and studied user reputation or karma system: seller feedback. Its reputation model, like most others that are several years old, is complex and continuously adapting to new business goals, changing regulations, improved understanding of customer needs, and the never-ending need to combat reputation manipulation through abuse.

Rather than detail the entire feedback karma model here, we'll focus on claims that are from the buyer and about the seller. An important note about eBay feedback is that buyer claims exist in a specific context: a market transaction, that is, a successful bid at auction for an item listed by a seller. This specificity leads to a generally higher-quality karma score for sellers than they would get if anyone could just walk up and rate a seller without even demonstrating that they'd ever done business with them; see Chapter 1 - Implicit Reputation.

The scrolling/zooming diagram below shows how buyers influence a seller's karma scores on eBay. Though the specifics are unique to eBay, the pattern is common to many karma systems. For an explanation of the graphical conventions used, see Chapter 2.

The reputation model in this figure was derived from the following eBay pages: http://pages.ebay.com/help/feedback/scores-reputation.html and http://pages.ebay.com/services/buyandsell/welcome.html, both current as of July 2009.

We have simplified the model for illustration, specifically by omitting the processing for the requirement that only buyer feedback and Detailed Seller Ratings (DSR) provided over the previous 12 months are considered when calculating the positive feedback ratio, DSR community averages, and–by extension–power seller status. Also, eBay reports user feedback counters for the last month and quarter, which we are omitting here for the sake of clarity. Abuse mitigation features, which are not publicly available, are also excluded.

This diagram illustrates the seller feedback karma reputation model, which is made out of typical model components: two compound buyer input claims (seller feedback and detailed seller ratings) and several roll-ups of the seller's karma: community feedback ratings (a counter), feedback level (a named level), positive feedback percentage (a ratio), and the power seller rating (a label).

The context for the buyer's claims is a transaction identifier: the buyer may not leave any feedback before successfully placing a winning bid on an item listed by the seller in the auction market. Presumably, the feedback primarily describes the quality and delivery of the goods purchased. A buyer may provide two different sets of complex claims, and the limits on each vary:

  • 1. Typically, when a buyer wins an auction, the delivery phase of the transaction starts and the seller is motivated to deliver the goods of the quality advertised in a timely manner. After either a timer expires or the goods have been delivered, the buyer is encouraged to leave feedback on the seller, a compound claim in the form of a three-level rating (positive, neutral, or negative) and a short text-only comment about the seller and/or transaction. The ratings make up the main component of seller feedback karma.
  • 2. Once each week in which a buyer completes a transaction with a seller, the buyer may leave detailed seller ratings, a compound claim of four separate 5-star ratings in these categories: item as described, communications, shipping time, and shipping and handling charges. The only use of these ratings, other than aggregation for community averages, is to qualify the seller as a power seller.

EBay displays an extensive set of karma scores for sellers: the amount of time the seller has been a member of eBay; color-coded stars; percentages that indicate positive feedback; more than a dozen statistics that track past transactions; and lists of testimonial comments from past buyers or sellers. This is just a partial list of the seller reputations that eBay puts on display.

The full list of displayed reputations almost serves as a menu of reputation types present in the model. Every process box represents a claim displayed as a public reputation to everyone, so to provide a complete picture of eBay seller reputation, we'll simply detail each output claim separately:

  • 3. The feedback score counts every positive rating given by a buyer as part of seller feedback, a compound claim associated with a single transaction. This number is cumulative for the lifetime of the account, and it generally loses its value over time; buyers tend to notice it only if it has a low value.

It is fairly common for a buyer to revise a rating, within some time limitations, and so change this score; the effect of each rating must therefore be reversible. Sellers spend a lot of time and effort working to change negative and neutral ratings to positive ratings to gain or to avoid losing a power seller rating. When this score changes, it is then used to calculate the feedback level.
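A reversible counter of this kind might look like the following minimal sketch. It assumes the system remembers the last rating left for each transaction so that a revision can first undo the earlier effect; the class and method names are hypothetical, and the time limits on revisions are left out.

```python
class FeedbackScore:
    """Lifetime count of positive ratings, with reversible updates."""

    def __init__(self):
        self.score = 0
        self._ratings = {}  # transaction_id -> last rating left for it

    def rate(self, transaction_id, rating):
        """Record a new or revised rating and return the updated score."""
        previous = self._ratings.get(transaction_id)
        if previous == "positive":
            self.score -= 1  # reverse the effect of the earlier rating
        if rating == "positive":
            self.score += 1
        self._ratings[transaction_id] = rating
        return self.score


score = FeedbackScore()
score.rate("txn-1", "positive")  # counts toward the score
score.rate("txn-2", "negative")  # does not count
score.rate("txn-2", "positive")  # buyer revises: now it counts
```

Keeping the per-transaction rating is what makes the update reversible; a bare counter could not tell a revision from a new rating.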

  • 4. The feedback level claim is a graphical representation (in colored stars) of the feedback score. This process is usually a simple data transformation and normalization process; here we've represented it as a mapping table, illustrating only a small subset of the mappings. This visual system of stars on eBay relies, in part, on the assumption that users will know that a red shooting star is a better rating than a purple star. But we have our doubts about the utility of this representation for buyers. Iconic scores such as these often mean more to their owners, and they might represent only a slight incentive for increasing activity in an environment in which each successful interaction equals cash in your pocket.
  • 5. The community feedback rating is a compound claim containing the historical counts for each of the three possible seller feedback ratings (positive, neutral, and negative) over the last 12 months, so that the totals can be presented in a table showing the results for the last month, 6 months, and year. Older ratings are decayed continuously, though eBay does not disclose how often this data is updated if new ratings don't arrive. One possibility would be to update the data whenever the seller posts a new item for sale.

The positive and negative ratings are used to calculate the positive feedback percentage.
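Both the feedback level and the community feedback rating lend themselves to a short sketch. In the code below, the star thresholds are illustrative placeholders rather than eBay's published table, the window lengths in days are our approximation of "month, 6 months, and year," and the function names are hypothetical.

```python
import bisect
from collections import Counter
from datetime import date, timedelta

# Mapping table for the feedback level. Thresholds are illustrative only.
LEVEL_THRESHOLDS = [10, 500, 1000, 100_000]
LEVEL_NAMES = ["no star", "yellow star", "purple star", "red star",
               "red shooting star"]

# Reporting windows, approximated in days (an assumption).
WINDOWS = {"1 month": 30, "6 months": 182, "12 months": 365}


def feedback_level(score):
    """Map a feedback score to a named level by threshold lookup."""
    return LEVEL_NAMES[bisect.bisect_right(LEVEL_THRESHOLDS, score)]


def community_feedback(ratings, today):
    """Count ratings per window.

    ratings is a list of (date, rating) pairs, where rating is
    'positive', 'neutral', or 'negative'. Ratings older than a
    window's cutoff simply drop out of that window's counts.
    """
    table = {}
    for label, days in WINDOWS.items():
        cutoff = today - timedelta(days=days)
        table[label] = Counter(r for d, r in ratings if d > cutoff)
    return table
```

Recomputing the table from dated ratings is the simplest way to express the decay; a production system would more likely update running counts incrementally.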

  • 6. The positive feedback percentage claim is calculated by dividing the positive feedback ratings by the sum of the positive and negative feedback ratings over the last 12 months. Note that the neutral ratings are not included in the calculation. This is a recent change reflecting eBay's confidence in the success of updates deployed in the summer of 2008 to prevent bad sellers from using retaliatory ratings against buyers who are unhappy with a transaction (known as tit-for-tat negatives). Initially this calculation included neutral ratings because eBay feared that negative feedback would be transformed into neutral ratings. It was not.
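As a worked example of this ratio, here is a minimal sketch. The function name and the sample counts are ours; returning None for a seller with no positive or negative ratings is an assumption about how to handle the empty case.

```python
def positive_feedback_percentage(positive, negative):
    """Positive ratings as a percentage of positive plus negative.

    Neutral ratings are deliberately absent from the calculation.
    Returns None when there is nothing to divide by.
    """
    total = positive + negative
    if total == 0:
        return None
    return 100.0 * positive / total


# 196 positive, 4 negative, and any number of neutral ratings:
percentage = positive_feedback_percentage(196, 4)  # 98.0
```

Had 25 neutral ratings been counted in the denominator, the same seller would show roughly 87% instead of 98%, which is why the treatment of neutrals matters so much to sellers.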

This score is an input into the power seller rating, which is highly coveted. This means that each and every individual positive and negative rating given on eBay is a critical one: it can mean the difference for a seller between acquiring power seller status or not.

  • 7. The Detailed Seller Ratings community averages are simple reversible averages for each of the four ratings categories: item as described, communications, shipping time, and shipping and handling charges. There is a limit on how often a buyer may contribute DSRs.
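A simple reversible average can be kept as a running sum and count, so that a rating can be withdrawn or aged out without rescanning history. The sketch below is one way to do that; the class name and methods are hypothetical, and one such object would be kept per DSR category.

```python
class ReversibleAverage:
    """Running average that supports undoing earlier contributions."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, stars):
        self.total += stars
        self.count += 1

    def remove(self, stars):
        """Reverse a previously added rating."""
        if self.count == 0:
            raise ValueError("nothing to remove")
        self.total -= stars
        self.count -= 1

    @property
    def value(self):
        """Current average, or None if no ratings are counted."""
        return self.total / self.count if self.count else None


shipping_time = ReversibleAverage()
shipping_time.add(5)
shipping_time.add(3)
shipping_time.remove(3)  # the 3-star rating is withdrawn or expires
```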

EBay only recently added these categories as a new reputation model because including them as factors in the overall seller feedback ratings diluted the overall quality of seller and buyer feedback. Sellers could end up in disproportionate trouble just because of a bad shipping company or a delivery that took a long time to reach a remote location. Likewise, buyers were bidding low prices only to end up feeling gouged by shipping and handling charges. Fine-grained feedback allows one-off small problems to be averaged out across the DSR community averages instead of being translated into red-star negative scores that poison trust overall. Fine-grained feedback for sellers is also actionable by them and motivates them to improve, since these DSR scores make up half of the power seller rating.

  • 8. The power seller rating, appearing next to the seller's ID, is a prestigious label that signals the highest level of trust. It includes several factors external to this model, but two critical components are the positive feedback percentage, which must be at least 98%, and the DSR community averages, which each must be at least 4.5 stars (around 90% positive). Interestingly, the DSR scores are more flexible than the feedback average, which tilts the rating toward overall evaluation of the transaction rather than the related details.
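The two reputation components of the power seller rating reduce to a pair of threshold checks. The sketch below covers only those two; the factors external to this model are not represented, and the function and constant names are our own.

```python
MIN_POSITIVE_PERCENT = 98.0  # positive feedback percentage floor
MIN_DSR_AVERAGE = 4.5        # floor for each DSR community average


def meets_reputation_thresholds(positive_percent, dsr_averages):
    """Check the two reputation inputs to the power seller rating.

    dsr_averages maps each DSR category name to its community
    average in stars. Every category must clear the floor.
    """
    if positive_percent < MIN_POSITIVE_PERCENT:
        return False
    return all(avg >= MIN_DSR_AVERAGE for avg in dsr_averages.values())
```

Because every category must clear 4.5 stars on its own, a single weak average (slow shipping, say) is enough to fail the check even when the others are perfect.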

Though the context for the buyer's claims is a single transaction or history of transactions, the context for the aggregate reputations that are generated is trust in the eBay marketplace itself. If the buyers can't trust the sellers to deliver against their promises, eBay cannot do business. When considering the roll-ups, we transform the single-transaction claims into trust in the seller, and, by extension, that same trust rolls up into eBay. This chain of trust is so integral and critical to eBay's continued success that they must continuously update the marketplace's interface and reputation systems.

October 21, 2009

User Motivations & System Incentives

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry summarizes our model for describing user motivations and incentives for participation in reputation systems.

This is a short summary of a large section of Chapter 6 of our book, Building Web Reputation Systems, entitled Incentives for User Participation, Quality, and Moderation. For this blog post, the content is being shuffled a bit. First we will name the motivations and related incentive models, then we'll describe how reputation systems interact with each motivational category. For a more detailed discussion of the incentive sub-categories, read Chapter 6.

Motivations and Incentives for social media participation:

  • Altruistic motivation: for the good of others
    • Tit-for-Tat or Pay-it-Forward incentives: "I do it because someone else did it for me first"
    • Friendship incentives: "I do it because I care about others who will consume this"
    • Know-it-All or Crusader or Opinionated incentives: "I do it because I know something everyone else needs to know"
  • Commercial motivation: to generate revenue
    • Direct revenue incentives: Extracting commercial value (better yet, cash) directly from the user as soon as possible
    • Branding incentives: Creating indirect value by promotion - revenue will follow later
  • Egocentric motivation: for self-gratification
    • Fulfillment incentives: The desire to complete a task, assigned by oneself, a friend, or the application
    • Recognition incentives: The desire for the praise of others
    • The Quest for Mastery: Personal and private motivation to improve oneself

Altruistic or Sharing Incentives

Altruistic, or sharing, incentives reflect the giving nature of users who have something to share (a story, a comment, a photo, an evaluation) and who feel compelled to share it on your site. Their incentives are internal: they may feel an obligation to another user or to a friend, or they may feel loyal to (or despise) your brand.

When you're considering reputation models that offer altruistic incentives, remember that these incentives exist in the realm of social norms: they're all about sharing, not accumulating commercial value or karma points. Avoid aggrandizing users driven by altruistic incentives. They don't want their contributions to be counted, recognized, ranked, evaluated, compensated, or rewarded in any significant way. Comparing their work to anyone else's will actually discourage them from participating.

(See more on Tit-for-Tat, Friend, and Know-it-All altruistic incentives.)

Commercial Incentives

Commercial incentives reflect people's motivation to do something for money, though the money may not come in the form of direct payment from the user to the content creator. Advertisers have a nearly scientific understanding of the significant commercial value of something they call branding. Likewise, influential bloggers know that their posts build their brand, which often involves the perception of them as subject matter experts. The standing that they establish may lead to opportunities such as speaking engagements, consulting contracts, improved permanent positions at universities or prominent corporations, or even a book deal. A few bloggers may actually receive payment for their online content, but more are capturing commercial value indirectly.

Reputation models that exhibit content control patterns based on commercial incentives must communicate a much stronger user identity. They need strong and distinctive user profiles with links to each user's valuable contributions and content. For example, as part of reinforcing her personal brand, an expert in textile design would want to share links to content that she thinks her fans will find noteworthy.

But don't confuse the need to support strong profiles for contributors with the need for a strong or prominent karma system. When a new brand is being introduced to a market, whether it's a new kind of dish soap or a new blogger on a topic, a karma system that favors established participants can be a disincentive to contribute content. A community decides how to treat newcomers, whether with open arms or with suspicion. An example of the latter is eBay, where all new sellers must "pay their dues" and bend over backward to get a dozen or so positive evaluations before the market at large will embrace them as trustworthy vendors. Whether you need karma in your commercial incentive model depends on the goals you set for your application. One possible rule of thumb: If users are going to pass money directly to other people they don't know, consider adding karma to help establish trust.

(See more on Direct revenue and Branding commercial incentives.)

Egocentric Incentives

Egocentric incentives are often exploited in the design of online computer games and many reputation-based web sites. The simple desire to accomplish a task taps into deeply hard-wired motivations described in behavioral psychology as classical and operant conditioning (which involves training subjects to respond to food-related stimuli) and schedules of reinforcement. This research indicates that people can be influenced to repeat simple tasks by providing periodic rewards, even a reward as simple as a pleasing sound.

But an individual animal's behavior in the social vacuum of a research lab is not the same as the ways in which we very social humans reflect our egocentric behaviors to one another. Humans make teams and compete in tournaments. We follow leaderboards, comparing ourselves to others and comparing the groups that we associate ourselves with. Even if our accomplishments don't help another soul or generate any revenue for us personally, we often want to feel recognized for them. Even if we don't seek accolades from our peers, we want to be able to demonstrate mastery of something, to hear the message "You did it! Good job!"

Therefore, in a reputation system based on egocentric incentives, user profiles are a key requirement. In this kind of system, users need someplace to show off their accomplishments, even if only to themselves. Almost by definition, egocentric incentives involve one or more forms of karma. Even with only a simple system of granting trophies for achievements, users will compare their collections to one another. New norms will appear that look more like market norms than social norms: people will trade favors to advance their karma, people will attempt to cheat to get an advantage, and those who feel they can't compete will opt out altogether.

Egocentric incentives and karma do provide very powerful motivations, but they are almost antithetical to altruistic ones. The egocentric incentives of many systems have been over-designed, leading to communities consisting almost exclusively of experts. Consider just about any online role-playing game that has survived more than three years. For example, to retain its highest-level users and the revenue stream they produce, World of Warcraft must continually produce new content targeted at those users. If they stop producing new content for their most dedicated users, their business will collapse. This elder-game focus stunts WoW's growth: parent company Blizzard has all but abandoned improvements aimed at acquiring new users. When new users do arrive (usually in the wake of a marketing promotion), they end up playing alone because the veteran players are only interested in the new content and don't want to bother going through the long slog of playing through the lowest levels of the game yet again.

(See more on Fulfillment, Recognition, and Quest-for-Mastery egocentric incentives.)

October 14, 2009

A Case Study: Yahoo! Answers Community Moderation

Reputation Wednesday is an ongoing series of essays about reputation-related matters. This week's entry announces two important milestones.


We are proud to announce that our Chapter 12 Case Study—Yahoo! Answers Community Content—is now available for review! This chapter is a doozy. Using the structure and guidance from the rest of the book, it attempts to describe, in detail, a project that has saved Yahoo! millions of dollars in customer care costs (and produced a stronger, more content-vibrant community in the process.) No excerpts here. It's all good stuff—go read it.

And, coinciding with this draft chapter release, Randy and I can also announce that we've achieved an important milestone for the book: draft complete status. Our editor Mary blessed it on Monday. We're expecting feedback from our early reviewers soon that will dictate the tempo and scope of re-writes, so… stay tuned! We will, of course, continue to blog here and stick faithfully to our Reputation Wednesday schedule.

Whew.