Bias in Recommendation Systems

Modern web systems rely on user feedback (such as click activity or ratings) to build ML models that personalize recommendations. Such recommender systems form the core of several popular applications that recommend short videos (Instagram, TikTok, YouTube), timeline feeds (Twitter, Facebook), the next product to buy (Amazon, eBay), etc.

User Interaction Adds Bias

Recommender systems that rely on users' behavioral data are prone to several biases. Such data can be affected by the following factors:

1. Selection bias: This arises from users' self-selection behavior. For example, a user might rate a movie they like but rarely rate one they dislike. Training on such a dataset is challenging because high ratings account for the majority of observed ratings. These ratings are not a representative sample of all ratings; that is, the rating data is missing not at random, which introduces a selection bias. Rating surveys highlight this issue: users tend to rate only the items they like.

2. Exposure bias: A user is likely to watch a recommended video even if it is not the best fit for them. The system then treats this action as positive feedback and recommends more similar videos, introducing exposure bias into the data.

3. Conformity bias: A user may be influenced by public opinion and rate items in line with the crowd rather than their true preferences. This results in conformity bias.

4. Position bias: A user will likely watch one of the top five videos returned for a YouTube search. The bias arising from the display position of an item is referred to as position bias.

Biases present in data

The above biases stem from user behavior, but there are also biases present in the training data itself, such as popularity bias: some items are more popular than others and hence generate more feedback from users, eventually skewing recommendations toward them.

Another bias commonly seen in training data is unfairness, where the recommender system discriminates against certain groups of individuals based on attributes such as gender, race, age, wealth, or education level. For example, in the context of job recommendation, it has been found that women see fewer advertisements for high-paying jobs due to gender imbalance in the training data.

Finally, due to the feedback loop in recommender systems, these biases only intensify over time, resulting in a “rich get richer” effect. A bias in the data leads to biased recommendations, which in turn affect what users are exposed to and select, causing further bias in the data.

An illustration of how popularity bias can be exacerbated through the feedback loop

Degenerate feedback loops can be detected by measuring the diversity of item popularity in the system. Item popularity typically follows a long-tail distribution, where most items receive few or no interactions. In general, the more diverse the outputs of the recommender system (i.e., the greater the entropy of the popularity distribution), the less the system suffers from degenerate feedback loops. Conversely, a low diversity score implies a homogeneous system that suffers from popularity bias.
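
As a rough illustration, here is a minimal sketch of how one might score that diversity as the normalized Shannon entropy of the item-popularity distribution (the interaction counts below are hypothetical):

```python
import numpy as np

# Toy interaction counts per item: a long-tail distribution where a few
# items receive most of the interactions (hypothetical data).
interaction_counts = np.array([5000, 1200, 800, 90, 40, 12, 5, 2, 1, 0])

def popularity_entropy(counts):
    """Shannon entropy of the item-popularity distribution, normalized
    to [0, 1] by the maximum possible entropy (uniform popularity)."""
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # ignore items with zero interactions
    entropy = -np.sum(probs * np.log2(probs))
    max_entropy = np.log2(len(counts))
    return entropy / max_entropy

score = popularity_entropy(interaction_counts)
print(f"Normalized popularity entropy: {score:.2f}")
# Values near 1 indicate diverse recommendations; values near 0 indicate
# a homogeneous system that likely suffers from a degenerate feedback loop.
```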

How to solve it?

Due to the above biases, the data observed by a recommendation system, which it uses to personalize recommendations, may fail to reflect users' true preferences.

The ubiquity of such systems in modern web companies has resulted in a growing interest in mitigating bias in recommender systems. The following methods have gained traction due to their effectiveness in reducing bias.

1. Propensity Score: Propensity scores can be estimated and fed back into the training loop to reduce bias arising from which items users get to observe. The propensity score models the probability of exposure to a specific item. These probabilities are then used to reweight the interactions during retraining. For example, if the probability of exposure to an item is high, it can be downweighted during training to reduce the effect of its exposure on the user (sketch after this list).

2. Data Imputation: Selection bias stems from missing data (e.g., users are more likely to give high ratings than low ones). Data imputation can address selection bias by filling the missing entries with pseudo-labels (sketch below).

3. Modeling Popularity Influence: Conformity bias occurs when users are influenced by popular opinion. One way to reduce its effect is to disentangle the conformity component by offsetting the item's average rating, a proxy for public opinion, from the user's ratings (sketch below).

4. Sampling: Apart from propensity-score-based weighting, sampling items during retraining can be used to address exposure bias. The sampling distribution determines which items are chosen as training instances and can serve as item confidence weights; choosing an appropriate distribution thus limits the effect of exposure (sketch below).

5. Click Models: Position bias, where an item is more likely to be interacted with because of its display position, can be mitigated using click models. The idea is to model the generative process of clicks as dependent on both the item's position and its relevance. During retraining, the positional effect can then be offset to recover the item's true relevance (sketch below).

6. Regularization: Regularization can be used to mitigate popularity bias and unfairness in recommender systems. It has been shown that introducing suitable regularization terms yields more balanced recommendation results (sketch below).

7. Rebalancing: A simple way to tackle unfairness is to rebalance the training dataset toward specific fairness objectives, such as gender parity. This can be done by re-labeling positive examples in favor of the minority class or by resampling the training data to balance the size of both groups (sketch below).

8. Adversarial Learning: Apart from regularization, adversarial learning can be employed to mitigate popularity bias and unfairness. The basic idea is to introduce an adversary that tries to confuse the recommender by pushing it toward more niche items. Eventually, the adversary learns the implicit association between popular and niche items, while the recommender learns to capture niche items that correlate with the user's history, resulting in more long-tail recommendations (sketch below).

9. Reinforcement Learning: Finally, reinforcement learning can counter the loop-amplification effect of biases by adaptively balancing exploitation and exploration. However, such methods require the policy to be deployed online for evaluation (sketch below).
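
For method 1, here is a minimal sketch of inverse propensity scoring (IPS), assuming propensity estimates are already available; all values below are hypothetical:

```python
import numpy as np

# Hypothetical observed interactions: labels, model predictions, and an
# estimated propensity (the probability the user was exposed to the item).
labels = np.array([1.0, 1.0, 0.0, 1.0])
predictions = np.array([0.9, 0.6, 0.2, 0.4])
propensities = np.array([0.8, 0.5, 0.3, 0.1])  # assumed precomputed

def ips_weighted_loss(preds, labels, propensities):
    """Squared error where each interaction is reweighted by its inverse
    propensity: frequently exposed items are downweighted, rarely exposed
    items are upweighted."""
    weights = 1.0 / np.clip(propensities, 0.05, 1.0)  # clip to limit variance
    return np.mean(weights * (preds - labels) ** 2)

print(ips_weighted_loss(predictions, labels, propensities))
```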
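
For method 2, the simplest form of imputation fills missing ratings with a fixed pessimistic pseudo-label; the matrix and the imputed value r0 below are hypothetical:

```python
import numpy as np

# Hypothetical user-item rating matrix; np.nan marks missing ratings
# (users mostly rate what they like, so observed ratings skew high).
ratings = np.array([
    [5.0, np.nan, 4.0],
    [np.nan, 5.0, np.nan],
    [4.0, np.nan, np.nan],
])

# Impute every missing entry with a pessimistic pseudo-label r0, reflecting
# the assumption that unrated items are more likely to be disliked.
r0 = 2.0  # hypothetical value; in practice it is tuned or learned
imputed = np.where(np.isnan(ratings), r0, ratings)
print(imputed)
```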
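
For method 3, one simple way to offset popularity influence is to subtract each item's average rating and model the residual, which captures the user's preference relative to public opinion (hypothetical ratings below):

```python
import numpy as np

# Hypothetical ratings grouped by item.
item_ratings = {
    "item_a": np.array([5.0, 5.0, 4.0, 5.0]),  # popular, high average
    "item_b": np.array([3.0, 2.0, 4.0]),
}

# Offset each rating by the item's average so the residual reflects the
# user's preference relative to the crowd rather than conformity.
for item, r in item_ratings.items():
    residuals = r - r.mean()
    print(item, residuals)
```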
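
For method 4, a common choice in implicit-feedback settings is popularity-weighted negative sampling: a popular item the user did not interact with is more likely a true negative. A minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 10
popularity = rng.random(n_items)          # hypothetical exposure proxy
interacted = {1, 4, 7}                    # items the user engaged with
candidates = np.array([i for i in range(n_items) if i not in interacted])

# Sample negatives proportionally to popularity: a popular item the user
# did NOT interact with is more likely a true negative, so it is sampled
# (and hence weighted) more heavily during training.
probs = popularity[candidates] / popularity[candidates].sum()
negatives = rng.choice(candidates, size=3, replace=False, p=probs)
print(negatives)
```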
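
For method 5, here is a minimal sketch of the examination hypothesis underlying many click models; the examination probabilities below are hypothetical (in practice they are estimated, e.g., from position-randomization experiments):

```python
import numpy as np

# Hypothetical examination probabilities per display position
# (position 1 is examined most often).
examination = np.array([1.0, 0.7, 0.5, 0.3, 0.2])

# Observed click-through rates of one item shown at each position.
observed_ctr = np.array([0.30, 0.22, 0.15, 0.09, 0.06])

# Examination hypothesis: P(click) = P(examined | position) * P(relevant).
# Dividing out the examination term recovers a position-debiased
# relevance estimate.
relevance = observed_ctr / examination
print(relevance)  # roughly constant => the clicks were driven by position
```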
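
For method 6, one possible regularizer (a sketch of one formulation, not the only one) penalizes the correlation between predicted scores and item popularity:

```python
import numpy as np

def regularized_loss(preds, labels, item_popularity, lam=0.1):
    """Squared error plus a penalty on the correlation between predicted
    scores and item popularity, pushing the model toward more balanced,
    less popularity-driven recommendations."""
    mse = np.mean((preds - labels) ** 2)
    corr = np.corrcoef(preds, item_popularity)[0, 1]
    return mse + lam * corr**2

preds = np.array([0.9, 0.8, 0.3, 0.2])
labels = np.array([1.0, 1.0, 0.0, 0.0])
popularity = np.array([0.95, 0.90, 0.10, 0.05])  # hypothetical
print(regularized_loss(preds, labels, popularity))
```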
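
For method 7, a minimal resampling sketch that oversamples the underrepresented group until both groups contribute equally (the group labels are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training examples tagged with a sensitive group attribute.
groups = np.array(["a"] * 80 + ["b"] * 20)  # group "b" is underrepresented
indices = np.arange(len(groups))

# Oversample the minority group so both groups contribute equally.
minority = indices[groups == "b"]
majority = indices[groups == "a"]
resampled = np.concatenate([
    majority,
    rng.choice(minority, size=len(majority), replace=True),
])
rng.shuffle(resampled)
print(np.mean(groups[resampled] == "b"))  # 0.5 after rebalancing
```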
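
For method 8, one common minimal implementation of adversarial debiasing (a variant of the idea described above) uses a gradient reversal layer: the adversary predicts item popularity from item embeddings, while the reversed gradient pushes the embeddings to discard popularity information. A PyTorch sketch with hypothetical data and dimensions:

```python
import torch
import torch.nn as nn

# Gradient reversal: identity on the forward pass, negated gradient on the
# backward pass, so the embeddings are trained to FOOL the adversary.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

n_users, n_items, dim = 100, 50, 16
user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)
# Adversary tries to predict item popularity from the item embedding.
adversary = nn.Sequential(nn.Linear(dim, 8), nn.ReLU(), nn.Linear(8, 1))
params = (list(user_emb.parameters()) + list(item_emb.parameters())
          + list(adversary.parameters()))
opt = torch.optim.Adam(params, lr=1e-2)

# Hypothetical training batch.
users = torch.randint(0, n_users, (256,))
items = torch.randint(0, n_items, (256,))
labels = torch.rand(256)                  # implicit feedback strength
popularity = torch.rand(n_items)          # normalized item popularity

for step in range(100):
    u, v = user_emb(users), item_emb(items)
    rec_loss = nn.functional.mse_loss((u * v).sum(-1), labels)
    # The adversary minimizes its popularity-prediction error, while the
    # reversed gradient pushes item embeddings to discard popularity
    # information, encouraging more long-tail recommendations.
    adv_pred = adversary(GradReverse.apply(v)).squeeze(-1)
    adv_loss = nn.functional.mse_loss(adv_pred, popularity[items])
    loss = rec_loss + 0.1 * adv_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```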
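
For method 9, here is a minimal epsilon-greedy bandit sketch of the explore/exploit trade-off, run against a simulated environment with hypothetical click rates (a stand-in for online deployment):

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 5
true_ctr = np.array([0.10, 0.30, 0.25, 0.05, 0.20])  # hidden, hypothetical
clicks = np.zeros(n_items)
shows = np.zeros(n_items)
epsilon = 0.1  # exploration rate

for t in range(10_000):
    if rng.random() < epsilon:
        item = rng.integers(n_items)                     # explore randomly
    else:
        item = np.argmax(clicks / np.maximum(shows, 1))  # exploit best CTR
    shows[item] += 1
    clicks[item] += rng.random() < true_ctr[item]        # simulated feedback

print(clicks / np.maximum(shows, 1))  # estimated CTRs approach true ones
```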

We are just getting started

At UpTrain, we are actively listening to the pain points of data scientists and are building tools to ease their lives!

We would love to hear your thoughts in the comment section :D