title: Content Moderator Agent
| status: active
---
goal: Review user-generated content for policy compliance
  Automatically flag or remove content that violates community guidelines.
  Maintain a complete audit trail of all moderation decisions.
---
step: Receive content for review
  Accept content submissions from the queue. Each item includes
  the content body, author ID, submission timestamp, and content type.
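The queue item described above can be sketched as a small record type. This is an illustrative shape only; the field names (`content_id`, `body`, etc.) are assumptions, not a confirmed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Submission:
    """One item pulled from the moderation queue (hypothetical field names)."""
    content_id: str
    body: str
    author_id: str
    submitted_at: datetime
    content_type: str  # e.g. "post", "comment", "image_caption"

# Example item as it might arrive from the queue:
item = Submission(
    content_id="c-123",
    body="Check out this great deal!!!",
    author_id="u-9",
    submitted_at=datetime.now(timezone.utc),
    content_type="comment",
)
```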
---
step: Run automated checks
| tool: ai.classifyContent
  Score the content across multiple dimensions:
  *Spam* — promotional, repetitive, or bot-generated content
  *Harassment* — targeted abuse, threats, or bullying
  *Misinformation* — verifiably false claims on sensitive topics
  *Adult content* — explicit material outside designated areas
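Downstream decisions key off the worst-scoring dimension. Assuming `ai.classifyContent` returns a mapping of dimension name to a 0-1 score (an assumption about the tool's output, not a documented contract), a small helper can pick that dimension out:

```python
# The four policy dimensions listed in this step.
POLICY_DIMENSIONS = ("spam", "harassment", "misinformation", "adult_content")

def max_violation(scores: dict[str, float]) -> tuple[str, float]:
    """Return the dimension with the highest violation score and that score.

    Missing dimensions are treated as 0.0.
    """
    dim = max(POLICY_DIMENSIONS, key=lambda d: scores.get(d, 0.0))
    return dim, scores.get(dim, 0.0)

worst = max_violation(
    {"spam": 0.10, "harassment": 0.95, "misinformation": 0.00, "adult_content": 0.20}
)
```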
---
policy: Moderation thresholds
  Content scoring 0.9 or above on any policy dimension is removed automatically.
  Content scoring from 0.7 up to (but not including) 0.9 is queued for human review.
  Content scoring below 0.7 is approved automatically.
  All automated decisions are logged with their confidence scores.
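The three bands above can be sketched as a routing function. The threshold constants and the convention that 0.9 itself triggers auto-removal are as stated in this policy; the route labels are illustrative.

```python
AUTO_REMOVE_THRESHOLD = 0.9   # 0.9 or above: removed automatically
HUMAN_REVIEW_THRESHOLD = 0.7  # 0.7 up to 0.9: queued for a human moderator

def route(score: float) -> str:
    """Map the highest violation score to one of the three moderation routes."""
    if score >= AUTO_REMOVE_THRESHOLD:
        return "auto_remove"
    if score >= HUMAN_REVIEW_THRESHOLD:
        return "human_review"
    return "auto_approve"
```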
---
gate: Human review required
| condition: confidence score between 0.7 and 0.9
  Route the content to a human moderator with the AI assessment,
  similar past decisions, and relevant policy excerpts.
---
step: Record decision
| tool: audit.logDecision
  Log the moderation decision with: content ID, decision (approve/reject),
  reviewer (auto or human), confidence score, policy cited, and timestamp.
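A log entry with the fields this step lists might be assembled as below. This is a sketch of the payload handed to `audit.logDecision`; the actual tool signature is not documented here, so the dict shape is an assumption.

```python
from datetime import datetime, timezone

def build_audit_record(content_id: str, decision: str, reviewer: str,
                       confidence: float, policy: str) -> dict:
    """Assemble an audit entry mirroring the fields named in this step."""
    return {
        "content_id": content_id,
        "decision": decision,          # "approve" or "reject"
        "reviewer": reviewer,          # "auto" or a human moderator ID
        "confidence": round(confidence, 3),
        "policy_cited": policy,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = build_audit_record("c-123", "reject", "auto", 0.951, "harassment")
```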
---
step: Notify author
| tool: notification.send
  If content is rejected, notify the author with the specific policy
  violation and an appeal process link. Be factual, not accusatory.
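One way to keep the tone factual rather than accusatory is to template the message around the policy and the appeal link, as sketched here. The wording and the `appeal_url` parameter are illustrative, not a fixed template from this spec.

```python
def rejection_notice(policy: str, appeal_url: str) -> str:
    """Build a factual, non-accusatory rejection message for the author."""
    return (
        f"Your recent submission was removed because it did not comply with "
        f"our policy on {policy}. If you believe this decision was made in "
        f"error, you can appeal here: {appeal_url}"
    )

msg = rejection_notice("harassment", "https://example.com/appeals")
```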
Author: intenttext
Added: 3/6/2026