This document outlines the precise technical rules governing the automated systems for content moderation and risk score calculation. All user-submitted content and user accounts are evaluated according to these exact specifications.
Each piece of user-submitted content (profiles, posts, comments) is first processed by the Content Moderation Function. This function returns two outputs: the modified (filtered) content and a numerical Content Score.
**Stage 1.1: High-Priority Violations**

Content is first checked against the high-priority violation tiers. A match results in immediate removal of the content and a fixed maximum score:

- Severe violations: the content is replaced in its entirety with the placeholder `[content removed due to severe violation]` and assigned a fixed Content Score of 5.0.
- Spam/scam violations: the content is replaced in its entirety with the placeholder `[content removed due to spam/scam policy]` and assigned a fixed Content Score of 5.0.

**Stage 1.2: Filtering and Scoring**

If the content passes Stage 1.1, its Content Score starts at 0.0 and is incremented based on the following rules:
- Profanity: each matched word is replaced with a run of asterisks (`*`) equal to its length. The Content Score is incremented by +2.0 for each match.
- Links: each matched link is replaced with the placeholder `[link removed]`. The Content Score is incremented by +2.0 for each match.

After individual pieces of content are scored, the system calculates a final Risk Score for posts, comments, and users. This score is used for sorting and review in the administrative dashboard.
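The Content Moderation Function (Stage 1.1 followed by Stage 1.2) can be sketched in Python. This is a minimal illustration, not the production implementation: the pattern lists and the `moderate_content` name are assumptions, standing in for the real violation tiers and word lists defined elsewhere in the system.

```python
import re

# Illustrative stand-in patterns (assumptions, not the real tier definitions).
SEVERE_PATTERNS = ["severe_term"]        # stand-in for the severe-violation tier
SPAM_PATTERNS = ["free $$$", "buy now"]  # stand-in for the spam/scam tier
PROFANITY = ["darn"]                     # stand-in profanity list
LINK_RE = re.compile(r"https?://\S+")

def moderate_content(text: str) -> tuple[str, float]:
    """Return (filtered_content, content_score) per the rules above."""
    lowered = text.lower()
    # Stage 1.1: high-priority violations short-circuit with a fixed 5.0 score.
    if any(p in lowered for p in SEVERE_PATTERNS):
        return "[content removed due to severe violation]", 5.0
    if any(p in lowered for p in SPAM_PATTERNS):
        return "[content removed due to spam/scam policy]", 5.0
    # Stage 1.2: start at 0.0 and add +2.0 per profanity or link match.
    score = 0.0
    for word in PROFANITY:
        matches = len(re.findall(re.escape(word), text, flags=re.IGNORECASE))
        score += 2.0 * matches
        # Mask each profane word with asterisks equal to its length.
        text = re.sub(re.escape(word), lambda m: "*" * len(m.group()), text,
                      flags=re.IGNORECASE)
    links = LINK_RE.findall(text)
    score += 2.0 * len(links)
    text = LINK_RE.sub("[link removed]", text)
    return text, score
```

For example, `moderate_content("visit https://example.com today")` returns `("visit [link removed] today", 2.0)`, while any spam/scam match returns the fixed placeholder and a score of 5.0.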
**Post and Comment Risk Scores**

The Risk Score for an individual post or comment is determined by its Content Score, adjusted for the age of the author's account:

1. Compute the `base_score` for the content using the Content Moderation Function.
2. Apply an account-age adjustment to the `base_score` to get the final `risk_score`:
   - If the author's account age is below the new-account threshold: `risk_score = base_score * 1.5`.
   - Otherwise: `risk_score = base_score`.
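The per-item adjustment above reduces to a single multiplication. In this sketch, the function name and the `is_new_account` flag are illustrative assumptions; the actual account-age cutoff is checked upstream by the system.

```python
def post_risk_score(base_score: float, is_new_account: bool) -> float:
    """Apply the account-age adjustment to a post or comment Content Score.

    base_score comes from the Content Moderation Function; is_new_account
    stands in for the system's account-age threshold check (assumption).
    """
    multiplier = 1.5 if is_new_account else 1.0
    return base_score * multiplier
```

For example, a post with a Content Score of 2.0 from a new account yields `post_risk_score(2.0, True) == 3.0`, while the same content from an established account keeps its base score of 2.0.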
**User Risk Scores**

The Risk Score for a user is a weighted combination of their content scores, designed to reflect their overall behavior. It is built from three inputs:

- `profile_score`: the Content Score of the user's profile.
- `average_post_score`: the mean Content Score of the user's posts. If the user has no posts, this value is 0.
- `average_comment_score`: the mean Content Score of the user's comments. If the user has no comments, this value is 0.

The `content_risk_score` is calculated using the following formula:

`content_risk_score = (profile_score * 1) + (average_post_score * 3) + (average_comment_score * 1)`
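The formula above can be computed directly. This is a minimal sketch: the function name is illustrative, the 1/3/1 weights are taken from the formula, and empty post or comment lists contribute 0 as specified.

```python
def content_risk_score(profile_score: float,
                       post_scores: list[float],
                       comment_scores: list[float]) -> float:
    """Combine a user's content scores with the 1/3/1 weighting above.

    Users with no posts or no comments contribute 0 for that term.
    """
    avg_post = sum(post_scores) / len(post_scores) if post_scores else 0.0
    avg_comment = (sum(comment_scores) / len(comment_scores)
                   if comment_scores else 0.0)
    return (profile_score * 1) + (avg_post * 3) + (avg_comment * 1)
```

For example, a profile score of 1.0 with posts scoring [2.0, 4.0] and no comments gives `1.0 + 3.0 * 3 + 0.0 = 10.0`.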
The final `user_risk_score` is determined by applying a multiplier to the `content_risk_score` based on account age:

- Accounts below the first account-age threshold: `user_risk_score = content_risk_score * 1.5`
- Accounts between the first and second thresholds: `user_risk_score = content_risk_score * 1.2`
- Accounts above the second threshold: `user_risk_score = content_risk_score`

The `user_risk_score` is capped at a maximum value of 5.0.

Finally, a risk score is translated into a human-readable label based on these thresholds: