Text Moderation API Documentation
Apgard's Text Moderation API classifies risk in youth–AI interactions to support real-time detection, labeling, and downstream handling of potentially harmful content.
The system applies age-aware moderation logic, supporting both pre-adolescent users (3-11 years old) and adolescent users (12-17 years old). It accounts for developmental language patterns, imaginative play, accidental phrasing, and caregiver-mediated contexts.
The Text Moderation API classifies content across the following categories for both user inputs and AI outputs:
- Sexual
- Suicide, Self-Harm, and Distress
- Profanity and Insults
- Dangerous behavior and advice
- Drugs and Other Substances
- Emotional Manipulation and Dependency
- Violence
- Privacy
- Harassment and Hate
- Weapons
- Misinformation
Each category uses a severity-based classification system with multiple levels, allowing your application to respond appropriately to the risk level detected. Content is also evaluated for safety disclosures and for signals that it should be routed to appropriate reporting channels.
Getting Started
To start using the Text Moderation API:
1. Request access: Complete the contact form. Our team will follow up shortly.
2. Create your API key: Once approved, you'll receive instructions to log in to the Apgard developer dashboard, where you can generate your API key.
3. Make your first request: POST to the /moderation/text endpoint using your API key (detailed request instructions are provided after approval).
Your API key is used to authenticate all API requests. Keep it secure and do not share it publicly.
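Below is a minimal sketch of a first request in Python using the requests library. The /moderation/text path comes from the steps above; the base URL, authentication header, and request body fields are illustrative assumptions, so follow the exact schema provided with your access instructions.

```python
import requests

API_KEY = "YOUR_API_KEY"              # generated in the Apgard developer dashboard
BASE_URL = "https://api.example.com"  # placeholder; use the URL provided with your access instructions

# The /moderation/text path is documented above; the header name and the
# body fields ("text", "age_group") are assumptions, not a confirmed schema.
response = requests.post(
    f"{BASE_URL}/moderation/text",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": "I played soccer today.", "age_group": "pre_adolescent"},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # expected shape is shown under "Response Format" below
```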
Content Moderation
This section describes how the Text Moderation API evaluates user- and AI-generated content for potential harm across multiple categories. To account for developmental differences, the moderation rules are organized into two age-based groups: pre-adolescent (ages 3–11) and adolescent (ages 12–17).
Pre-Adolescence (Ages 3–11)
Sexual Content Classification
This classifier flags sexual content, conduct, or abuse disclosures in interactions between pre-adolescent children (ages 3–11) and AI systems.
For this category, a severity level from 0 to 4 is returned. Classification is designed to be developmentally aware, accounting for curiosity, caregiving contexts, and common child language patterns.
Unlike adult or teen sexual moderation, this policy applies a lower tolerance threshold for sexualized framing, potential exploitation, and abuse.
Severity Levels Overview
| Level | Description |
|---|---|
| 0 | No Sexual Content |
| 1 | Boundary-Testing or Ambiguous Sexual Content |
| 2 | Sexualized or Adult-Themed Content |
| 3 | Sexual Engagement or Explicit Content |
| 4 | Illegal, Exploitative, or Abusive Sexual Content |
Level 4 — Illegal, Exploitative, or Abusive Sexual Content
Content involves sexual exploitation, grooming, abuse, or illegal activity involving minors. This level represents the highest risk and always requires escalation.
Includes:
- Grooming or isolation attempts
- Requests for sexual images or acts involving minors
- Sexual coercion or manipulation
- First-person abuse disclosures
Examples
- "Is it okay if I go to my neighbor's house without my parents?"
- "How can I send pictures of kids doing that?"
- "Someone touched me and said not to tell."
- "The man next door said it was our secret."
Level 3 — Sexual Engagement or Explicit Content
Content involving physical sexual acts, sexual roleplay, or sexually graphic descriptions that may be developmentally inappropriate, as well as coercive or ambiguous-consent scenarios.
Includes:
- Sexualized roleplay involving the system or others
- Physical sexual actions or fantasies
- Situations with unclear consent or pressure
Examples
- "Pretend you're my wife and kiss me."
- "What does it feel like to touch someone like that?"
- "Is it okay if I touch my friend's weenie?"
Level 2 — Sexualized or Adult-Themed Content
Content shows clear sexualization, adult sexual concepts, or repeated fixation beyond typical pre-adolescent curiosity, but does not describe or request physical sexual acts.
Includes:
- Repeated sexualized questioning
- Introduction of adult sexual concepts
- Non-graphic but clearly sexual references
Examples
- "Why do boys have penises?"
- "What does it mean when adults sleep together?"
- "Why do people kiss in bed?"
Level 1 — Boundary-Testing or Ambiguous Sexual Content
Content reflects early sexual curiosity or boundary testing without explicit sexual acts or sexual framing. Language may be suggestive but remains ambiguous.
Includes:
- Questions about nudity or privacy
- Ambiguous references to private parts
- Imitative or figurative sexual language
Examples
- "Is it okay to be naked around adults?"
- "I like touching my private parts."
- "My friend's mom is hot."
Level 0 — No Sexual Content
Content is non-sexual, age-appropriate, or educational, even if bodies, caregiving, or health are mentioned. No sexual framing or boundary crossing is present.
Includes:
- Caregiving or hygiene contexts
- Family interactions
- Medical or safety checks
- Age-appropriate sex education
Examples
- "Mom helps me take a bath."
- "My dad helps me get dressed."
- "The nurse checked my body to make sure I'm okay."
- "How do babies grow inside a mommy?"
Suicide, Self-Harm & Distress Content Classification
This classifier identifies self-harm risk, suicidal ideation, and psychosocial distress in interactions involving pre-adolescent children (ages 3–11).
Unlike adolescent or adult self-harm detection systems, this classifier is designed to detect early, indirect, and developmentally specific signals, which often do not appear as explicit statements of intent.
Key differences include:
- Early distress detection, not just explicit self-harm
- Indirect signals, such as somatic complaints, emotion-linked physical pain, or figurative language
- Developmentally appropriate interpretation, including changes in sleep, eating, behavior, and attachment
The classifier outputs a single severity level (0–4) based on risk, not intent.
Severity Levels Overview
| Level | Description |
|---|---|
| 0 | No self-harm or distress risk |
| 1 | Psychosocial distress (low to moderate risk) |
| 2 | Moderate distress with ambiguous self-harm risk |
| 3 | Active self-harm or suicidal ideation |
| 4 | Imminent danger / immediate risk |
When signals are ambiguous, the system defaults to the higher severity level.
Level 4 — Imminent Danger / Maximum Risk
Description
Captures content indicating immediate or near-term risk of self-harm or suicide. Signals reflect urgency, intent, or access to means, suggesting harm may occur now or very soon.
This level always triggers immediate escalation and emergency routing. The agent must not continue normal engagement beyond safety handling.
Includes
- Requests for methods, instructions, or lethality information
- Statements indicating intent, planning, or a timeline
- Access to means framed with self-harm intent
- Role-play or narratives rehearsing imminent death or injury
Examples
- "I'm going to hurt myself right now."
- "I'm doing it tonight."
- "How do I hurt myself?"
- "I have pills and I want to take them."
- "This is the last time I'll talk to you."
Level 3 — Suicide or Active Self-Harm (High Risk)
Description
Captures clear suicidal ideation or self-harm behavior, including non-suicidal self-injury (NSSI). Signals may appear as direct statements, repeated fixation, or stories and play that clearly rehearse harm or death.
This level represents high psychological risk and requires immediate safety routing.
Includes
- Explicit desire to die
- Statements about wanting to hurt oneself
- Descriptions of self-harm behavior
- Using pain or injury for emotional relief
- Repeated focus on self-harm themes
Examples
- "I want to die."
- "I want to hurt myself."
- "I scratch myself until it bleeds."
- "I hurt myself when I'm mad so I feel better."
- "In my story, the kid hurts himself so the pain stops."
Level 2 — Moderate Distress / Ambiguous Self-Harm Risk
Description
Captures elevated emotional distress where self-harm or suicide risk cannot be ruled out, but there is no clear intent or behavior. Signals are often passive, fear-based, somatic, or metaphorical.
This level reflects increased vulnerability and requires supportive handling and monitoring.
Includes
- Passive death or disappearance language
- Curiosity about death or not waking up
- Feeling unsafe or afraid of losing control
- Metaphors expressing non-existence
- Escalating distress across turns
- Somatic complaints paired with fear
Examples
- "I don't want to be here anymore."
- "What happens if someone never wakes up?"
- "I'm scared of what I might do."
- "Sometimes I wish I could disappear forever."
- "My chest hurts and I feel like something bad will happen."
- "In my story, the kid never wakes up."
Level 1 — Psychosocial Distress (Low to Moderate Risk)
Description
Captures emotional, social, or behavioral distress that does not meet self-harm thresholds but indicates vulnerability. At this age, distress often appears indirectly and may fluctuate.
This level represents early warning signals. The agent may respond empathetically but should monitor for pattern escalation over time.
Includes
- Persistent sadness, fear, anger, or anxiety
- Bullying or peer conflict
- Family stress or instability
- Negative self-talk (without death framing)
- Somatic complaints linked to stress
- Behavioral dysregulation
- Excessive reassurance-seeking or dependency
Examples
- "I feel sad a lot."
- "Kids at school are mean to me."
- "My parents are always mad."
- "I'm bad at everything."
- "My stomach hurts when I go to school."
- "I get really mad and can't stop."
- "Can you stay with me? I don't like being alone."
Level 0 — No Self-Harm Risk
Description
Captures developmentally normal, emotionally neutral, or imaginative content with no indications of distress, self-harm, or suicidal ideation.
Content at this level is considered safe.
Includes
- General curiosity about emotions or health
- Mild, transient frustration
- Neutral body sensations
- Non-harm-related imaginative play
Examples
- "I played soccer today."
- "Why do people sleep?"
- "Homework is annoying."
- "My leg hurts."
- "My toy dinosaur fell in lava."
- "I'm bored."
Profanity & Insults Classification
This classifier flags profanity, insults, and inappropriate language in interactions between pre-adolescent children (ages 3–11) and AI systems.
Classification is developmentally aware, accounting for imitation, accidental repetition, slang, obfuscation (including leetspeak), and limited understanding of social norms common in young children.
Unlike adult profanity moderation, this policy applies a lower tolerance threshold and a zero-tolerance standard for generating or responding to profanity, while still allowing contextual understanding for detection, routing, and safe response.
For this category, a severity level from 0 to 3 is returned.
Severity Levels Overview
| Level | Description |
|---|---|
| 0 | No Profanity or Insulting Language |
| 1 | Referenced or Accidental Language |
| 2 | Direct Profanity or Insults |
| 3 | Aggressive, Derogatory, or Targeted Language |
Level 3 — Aggressive, Derogatory, or Targeted Language
Content involves hostile, degrading, or targeted profanity or inappropriate language. This level represents the highest risk in youth interactions and requires firm boundary-setting and de-escalation.
Includes:
- Profanity directed at a person or group
- Harassment, bullying, or threats
- Slurs or demeaning labels (including obfuscated forms)
- Repeated or escalating aggressive language
Examples
- "You're a stupid idiot."
- "I hate you, go away."
- "That kid is a f@#king loser."
- "I'm gonna beat you up, you dummy."
Level 2 — Direct Profanity or Insults
Content where profanity or inappropriate language is used directly, even casually, jokingly, or playfully.
Includes:
- Swear words used for emphasis or frustration
- Obfuscated profanity used intentionally (e.g., leetspeak, symbols, partial masking)
- Slang terms commonly understood as inappropriate
- Repeated use of profane language despite redirection
Examples
- "This game is shit."
- "What the f—?"
- "You're dumb."
- "That's bullcrap."
Level 1 — Referenced or Accidental Language
Content where profanity or inappropriate language is mentioned but not used as an insult or expression, and intent is unclear, non-aggressive, or developmentally naive.
Includes:
- Quoting what someone else said
- Reporting bullying or name-calling
- Accidental repetition without aggression
- Mispronunciations or "baby talk" resembling profanity
- Phonetic, invented, or obfuscated words that sound inappropriate but lack clear intent
Examples
- "My brother said a bad word."
- "Someone at school called me stupid."
- "What does that word mean?"
- "I heard a grown-up say 'fuk' on TV."
Level 0 — No Profanity or Insulting Language
Content is non-profane, age-appropriate, or discusses language rules in a neutral, educational, or corrective context.
Includes:
- Asking what a "bad word" is without stating it
- Adults explaining rules about respectful language
- Discussions about being polite, kind, or appropriate
- Non-profane expressions of frustration
Examples
- "What are bad words?"
- "We're not allowed to say mean things."
- "I'm really mad."
- "My teacher says we should use kind words."
Dangerous Behavior and Advice Policy
Goal: Identify potentially dangerous behavior, unsafe advice, and risk-normalizing interactions for pre-adolescent children (ages 3–11).
Classification Context:
- Classification is strictly calibrated to the developmental vulnerabilities of pre-adolescents (ages 3–11). This age group has not yet developed the motor skills, impulse control, or judgment needed to evaluate safety conditions independently.
- Unlike general safety models, this policy treats advice requiring independent judgment (e.g., "it is safe if you look both ways") as a critical risk, as this age group requires adult supervision for hazardous environments.
Classification Overview
| Level | Label | Description |
|---|---|---|
| 0 | Safe | Safe, educational, fictional, or explicitly supervised content. |
| 1 | Dangerous | Dangerous advice, encouragement, validation of risk, or unsupervised instructions. |
Level 1 — Potentially Dangerous Behavior & Advice
Content that enables, encourages, validates, or provides instructions for potentially dangerous behaviors. This category includes obvious risks (weapons) and subtle risks (normalizing unsafe play).
Includes:
- Unsupervised Environmental Access: Instructions on how to interact with hazards (traffic, water, heights) that imply the child can manage the risk alone.
- Conditional Safety: Telling a child a danger is safe if they perform a check (e.g., "It's safe if you look both ways").
- Dangerous Substitutions: Suggesting hazardous areas (streets, parking lots) as play spaces.
- Validation ("The Cheerleader Rule"): Expressing excitement or support for a dangerous plan (e.g., "Sounds fun!").
- Ingestion & Inhalation: Encouraging consumption of non-food items, unknown substances, or medicines.
- Weapons & Challenges: Accessing weapons, modifying objects to be sharp, or viral challenges (Tide Pods, etc.).
Examples
- "If you look both ways, it's safe to go into the street to get your ball."
- "No sidewalk? Maybe there is space in the street for hopscotch."
- "Wading in the creek is simple, just walk carefully on the moss."
- "Jumping off those rocks sounds exciting! I'll cheer you on."
- "I want to drink the blue juice under the sink."
Level 0 — Safe, Educational, or Fictional Content
Content is safe, educational, or clearly grounded in fantasy. Risks are discussed in historical or informational contexts, or depicted by fictional characters in ways that do not encourage imitation.
Includes:
- Educational Safety: Explaining safety gear or why rules exist.
- Fictional/Impossible Acts: Superheroes or cartoons performing feats (flying, magic) without framing it as a challenge.
- Supervised Play: Activities explicitly conditioned on adult help/presence.
- Peer Pressure Guidance: Helping a child say "no" or seek adult help.
Examples
- "Superman jumps off the building and flies away."
- "People wear helmets to protect their brains when biking."
- "If your ball rolls in the street, ask a grown-up to get it."
- "You could draw the grid on your driveway instead, but ask a parent to help."
Drugs and Other Substances Content Classification
Coming soon...
Emotional Manipulation and Dependency Classification
Coming soon...
Violence Content Classification
Coming soon...
Privacy Content Classification
Coming soon...
Harassment and Hate Content Classification
Coming soon...
Weapons Content Classification
Coming soon...
Misinformation Content Classification
Coming soon...
Adolescence (Ages 12–17)
Coming soon...
Response Format
Each request returns a severity level for every category:
{
  "sexual": 0,
  "self_harm": 0,
  "profanity": 0,
  "drugs": 0,
  "dependency": 0,
  "violence": 0,
  "dangerous_behavior": 0,
  "privacy": 0,
  "harassment_and_hate": 0,
  "weapons": 0,
  "misinformation": 0
}
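The sketch below shows one way an application might act on this response. The escalation requirement at the highest severity levels follows the policy above; the lower thresholds and the action names are illustrative assumptions.

```python
def handle_moderation(result: dict[str, int]) -> str:
    """Choose a handling action from the per-category severity levels.
    Thresholds below the documented top levels are assumptions, not API behavior."""
    if result.get("sexual", 0) == 4 or result.get("self_harm", 0) == 4:
        # The policy above states the highest level in these categories always requires escalation.
        return "escalate"
    if any(level >= 2 for level in result.values()):
        return "review"   # assumed cutoff for routing to human or safety review
    if any(level >= 1 for level in result.values()):
        return "monitor"  # assumed: log and watch for escalation across turns
    return "allow"

example = {
    "sexual": 0, "self_harm": 1, "profanity": 0, "drugs": 0, "dependency": 0,
    "violence": 0, "dangerous_behavior": 0, "privacy": 0,
    "harassment_and_hate": 0, "weapons": 0, "misinformation": 0,
}
print(handle_moderation(example))  # prints "monitor"
```

Note that dangerous_behavior is a binary (0/1) classification in the pre-adolescent policy above, so an application may want a stricter rule for that category than the generic thresholds shown here.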