
Text Moderation API Documentation

Apgard's Text Moderation API classifies risk in youth–AI interactions to support real-time detection, labeling, and downstream handling of potentially harmful content.

The system applies age-aware moderation logic tailored to young users, accounting for developmental language patterns, imaginative play, accidental phrasing, and caregiver-mediated contexts.

The Text Moderation API classifies content across the following categories for both user inputs and AI outputs:

  • Sexual
  • Suicide, Self-Harm, and Distress
  • Profanity and Insults
  • Dangerous Behavior and Advice
  • Drugs and Other Substances
  • Emotional Manipulation and Dependency
  • Violence
  • Privacy
  • Harassment and Hate
  • Weapons
  • Misinformation

Each category uses a severity-based classification system with multiple levels, allowing your application to respond appropriately based on the risk level detected. For more details on how to use the API severities and define appropriate thresholds for your product use cases and target age ranges, see our YouthSafe Bench blog series on our Substack.

  • YouthSafe Bench #1: Testing the Safety Layer behind AI toys
  • YouthSafe Bench #2: How does AI talk about sex with teens?

Getting Started

To start using the Text Moderation API:

1. Request access: Complete the contact form. Our team will follow up shortly.

2. Create your API key: Once approved, you'll receive instructions to log in to the Apgard developer dashboard, where you can generate your API key.

3. Make your first request: You'll receive instructions on how to POST a request to the /moderation/text endpoint using your API key.

Your API key is used to authenticate all API requests. Keep it secure and do not share it publicly.
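
As a sketch of step 3, the snippet below builds a POST request to the /moderation/text endpoint using Python's standard library. The base URL, the JSON body shape ({"text": ...}), and the Bearer authorization header are assumptions for illustration only; follow the instructions you receive with your API key for the exact request format.

```python
import json
import urllib.request

# Hypothetical base URL -- replace with the one provided in your onboarding.
API_URL = "https://api.example.com/moderation/text"

def build_moderation_request(api_key: str, text: str) -> urllib.request.Request:
    """Build a POST request for the text moderation endpoint.

    The body shape and Bearer auth scheme are assumptions for illustration.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# Sending the request (requires a valid key and the real base URL):
# with urllib.request.urlopen(build_moderation_request(key, "hello")) as resp:
#     scores = json.load(resp)
```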

Content Moderation

This section describes how the Text Moderation API evaluates user and AI-generated content for potential harm across multiple categories. To account for developmental differences, define harm thresholds based on your product's use case, for example whether it is designed for pre-adolescents (ages 3–11) or adolescents (ages 12–17).
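
One way to apply this is a per-product threshold table: the maximum severity your product tolerates in each category, keyed to the age band it serves. The threshold values below are illustrative assumptions, not Apgard recommendations; choose your own based on your use case and target ages.

```python
# Illustrative thresholds: the maximum tolerated severity per category.
# Values are examples only, not Apgard recommendations.
PREADOLESCENT_THRESHOLDS = {"sexual": 0, "self_harm": 0, "profanity": 0, "drugs": 0}
ADOLESCENT_THRESHOLDS = {"sexual": 1, "self_harm": 0, "profanity": 1, "drugs": 2}

def flagged_categories(scores: dict, thresholds: dict) -> list:
    """Return categories whose severity exceeds the product's threshold.

    Categories missing from the table default to 0 (flag any nonzero severity).
    """
    return sorted(cat for cat, level in scores.items()
                  if level > thresholds.get(cat, 0))
```

A stricter product would simply ship a table of zeros; the routing decision itself (block, rewrite, escalate) stays in your application.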

Sexual Content Classification

This classifier flags sexual content in interactions between youth (under 18 years old) and AI systems.

For this category, a severity level from 0–4 is returned. Classification assumes the user is a minor and is designed to be developmentally aware, accounting for curiosity, caregiving contexts, and common child language patterns.

Severity Levels Overview

Level  Description
0      No sexual content
1      Non-explicit sexual content
2      Implicit sexual content
3      Explicit sexual content
4      Sexual exploitation or abuse

Level 4 — Sexual exploitation or abuse

Content involves sexual exploitation, grooming, abuse, or illegal activity involving minors. This level represents the highest risk and always requires escalation.

Includes:

  • Grooming or isolation attempts
  • Requests for sexual images or acts involving minors
  • Sexual coercion or manipulation
  • First-person abuse disclosures

Examples

  • "Is it okay if I go to my neighbor's house without my parents?"
  • "How can I send pictures of kids doing that?"
  • "Someone touched me and said not to tell."
  • "The man next door said it was our secret."

Level 3 — Explicit Sexual Content

Content involving sexually graphic descriptions that are developmentally inappropriate.

Includes:

  • Descriptions of sexual acts, such as intercourse, masturbation, or orgasm
  • Explicit references to genitals in a sexual context
  • Detailed sexual narratives

Examples

  • Fingering
  • Jerking off
  • Bodily fluids

Level 2 — Implicit Sexual Content

Content that expresses or implies erotic or sexualized meaning.

Includes:

  • Statements of sexual desire or arousal
  • Sexualized references to touching, undressing, or desirability
  • Erotic commentary about a person's body

Examples

  • Foreplay
  • Nudity
  • Kissing body parts

Level 1 — Non-explicit sexual content

Content about sexuality as a topic, without erotic or graphic detail.

Includes:

  • Questions about dating, sexual orientation, or relationships
  • Sexual education or health

Examples

  • Puberty
  • Sexual identity
  • Romantic Relationships

Level 0 — No Sexual Content

No sexual meaning in context.

Includes:

  • Caregiving or hygiene contexts
  • Biological sex and gender classification
  • Medical or safety checks

Examples

  • Slang used non-sexually (“this sucks”) or ambiguously
  • Non-romantic affection (hug, high five)
  • Figurative or idiomatic language (“I got screwed on the test”)

Suicide, Self-Harm & Distress Content Classification

This agent identifies self-harm risk, suicidal ideation, and psychosocial distress in interactions involving youth.

Unlike adult self-harm detection systems, this classifier is designed to detect early, indirect, and developmentally specific signals, which often do not appear as explicit statements of intent.

Key differences include:

  • Early distress detection, not just explicit self-harm
  • Indirect signals, such as somatic complaints, emotion-linked physical pain, or figurative language
  • Developmentally appropriate interpretation, including changes in sleep, eating, behavior, and attachment

The agent outputs a single severity level (0–4) based on risk, not intent.

Severity Levels Overview

Level  Description
0      No self-harm or distress risk
1      Psychosocial distress (low to moderate risk)
2      Moderate distress with ambiguous self-harm risk
3      Active self-harm or suicidal ideation
4      Imminent danger / immediate risk

When signals are ambiguous, the system defaults to the higher severity level.

Level 4 — Imminent Danger / Maximum Risk

Description

Captures content indicating immediate or near-term risk of self-harm or suicide. Signals reflect urgency, intent, or access to means, suggesting harm may occur now or very soon.

This level always triggers immediate escalation and emergency routing. The agent must not continue normal engagement beyond safety handling.

Includes

  • Requests for methods, instructions, or lethality information
  • Statements indicating intent, planning, or a timeline
  • Access to means framed with self-harm intent
  • Role-play or narratives rehearsing imminent death or injury

Examples

  • "I'm going to hurt myself right now."
  • "I'm doing it tonight."
  • "How do I hurt myself?"
  • "I have pills and I want to take them."
  • "This is the last time I'll talk to you."

Level 3 — Suicide or Active Self-Harm (High Risk)

Description

Captures clear suicidal ideation or self-harm behavior, including non-suicidal self-injury (NSSI). Signals may appear as direct statements, repeated fixation, or stories and play that clearly rehearse harm or death.

This level represents high psychological risk and requires immediate safety routing.

Includes

  • Explicit desire to die
  • Statements about wanting to hurt oneself
  • Descriptions of self-harm behavior
  • Using pain or injury for emotional relief
  • Repeated focus on self-harm themes

Examples

  • "I want to die."
  • "I want to hurt myself."
  • "I scratch myself until it bleeds."
  • "I hurt myself when I'm mad so I feel better."
  • "In my story, the kid hurts himself so the pain stops."

Level 2 — Moderate Distress / Ambiguous Self-Harm Risk

Description

Captures elevated emotional distress where self-harm or suicide risk cannot be ruled out, but there is no clear intent or behavior. Signals are often passive, fear-based, somatic, or metaphorical.

This level reflects increased vulnerability and requires supportive handling and monitoring.

Includes

  • Passive death or disappearance language
  • Curiosity about death or not waking up
  • Feeling unsafe or afraid of losing control
  • Metaphors expressing non-existence
  • Escalating distress across turns
  • Somatic complaints paired with fear

Examples

  • "I don't want to be here anymore."
  • "What happens if someone never wakes up?"
  • "I'm scared of what I might do."
  • "Sometimes I wish I could disappear forever."
  • "My chest hurts and I feel like something bad will happen."
  • "In my story, the kid never wakes up."

Level 1 — Psychosocial Distress (Low to Moderate Risk)

Description

Captures emotional, social, or behavioral distress that does not meet self-harm thresholds but indicates vulnerability. At this age, distress often appears indirectly and may fluctuate.

This level represents early warning signals. The agent may respond empathetically but should monitor for pattern escalation over time.

Includes

  • Persistent sadness, fear, anger, or anxiety
  • Bullying or peer conflict
  • Family stress or instability
  • Negative self-talk (without death framing)
  • Somatic complaints linked to stress
  • Behavioral dysregulation
  • Excessive reassurance-seeking or dependency

Examples

  • "I feel sad a lot."
  • "Kids at school are mean to me."
  • "My parents are always mad."
  • "I'm bad at everything."
  • "My stomach hurts when I go to school."
  • "I get really mad and can't stop."
  • "Can you stay with me? I don't like being alone."

Level 0 — No Self-Harm Risk

Description

Captures developmentally normal, emotionally neutral, or imaginative content with no indications of distress, self-harm, or suicidal ideation.

Content at this level is considered safe.

Includes

  • General curiosity about emotions or health
  • Mild, transient frustration
  • Neutral body sensations
  • Non-harm-related imaginative play

Examples

  • "I played soccer today."
  • "Why do people sleep?"
  • "Homework is annoying."
  • "My leg hurts."
  • "My toy dinosaur fell in lava."
  • "I'm bored."

Profanity & Insults Classification

This classifier flags profanity, insults, and inappropriate language in interactions between youth and AI systems.

Classification is developmentally aware, accounting for imitation, accidental repetition, slang, obfuscation (including leetspeak), and limited understanding of social norms common in young children.

Unlike adult profanity moderation, this policy applies a lower tolerance threshold and a zero-tolerance standard for generating or responding to profanity, while still allowing contextual understanding for detection, routing, and safe response.

For this category, a severity level from 0–3 is returned.

Severity Levels Overview

Level  Description
0      No profanity or insulting language
1      Referenced or accidental language
2      Direct profanity or insults
3      Aggressive, derogatory, or targeted language

Level 3 — Aggressive, Derogatory, or Targeted Language

Content involves hostile, degrading, or targeted profanity or inappropriate language. This level represents the highest risk in youth interactions and requires firm boundary-setting and de-escalation.

Includes:

  • Profanity directed at a person or group
  • Harassment, bullying, or threats
  • Slurs or demeaning labels (including obfuscated forms)
  • Repeated or escalating aggressive language

Examples

  • "You're a stupid idiot."
  • "I hate you, go away."
  • "That kid is a f@#king loser."
  • "I'm gonna beat you up, you dummy."

Level 2 — Direct Profanity or Insults

Content where profanity or inappropriate language is used directly, even casually, jokingly, or playfully.

Includes:

  • Swear words used for emphasis or frustration
  • Obfuscated profanity used intentionally (e.g., leetspeak, symbols, partial masking)
  • Slang terms commonly understood as inappropriate
  • Repeated use of profane language despite redirection

Examples

  • "This game is shit."
  • "What the f—?"
  • "You're dumb."
  • "That's bullcrap."

Level 1 — Referenced or Accidental Language

Content where profanity or inappropriate language is mentioned but not used as an insult or expression, and intent is unclear, non-aggressive, or developmentally naive.

Includes:

  • Quoting what someone else said
  • Reporting bullying or name-calling
  • Accidental repetition without aggression
  • Mispronunciations or "baby talk" resembling profanity
  • Phonetic, invented, or obfuscated words that sound inappropriate but lack clear intent

Examples

  • "My brother said a bad word."
  • "Someone at school called me stupid."
  • "What does that word mean?"
  • "I heard a grown-up say 'fuk' on TV."

Level 0 — No Profanity or Insulting Language

Content is non-profane, age-appropriate, or discusses language rules in a neutral, educational, or corrective context.

Includes:

  • Asking what a "bad word" is without stating it
  • Adults explaining rules about respectful language
  • Discussions about being polite, kind, or appropriate
  • Non-profane expressions of frustration

Examples

  • "What are bad words?"
  • "We're not allowed to say mean things."
  • "I'm really mad."
  • "My teacher says we should use kind words."

Dangerous Behavior and Advice Policy

Goal: Identify potentially dangerous behavior, unsafe advice, and risk-normalizing interactions for youth.

Classification Context:

  • Classification is calibrated to the developmental vulnerabilities of youth. This demographic may lack motor skills, impulse control, and the ability to judge safety conditions.
  • Unlike general safety models, this policy treats advice requiring independent judgment (e.g., "it is safe if you look both ways") as a risk for pre-adolescents, as this age group requires adult supervision for hazardous environments.

Classification Overview

Level  Label      Description
0      Safe       Safe, educational, fictional, or explicitly supervised content.
1      Dangerous  Dangerous advice, encouragement, validation of risk, or unsupervised instructions.

Level 1 — Potentially Dangerous Behavior & Advice

Content that enables, encourages, validates, or provides instructions for potentially dangerous behaviors. This category includes obvious risks (weapons) and subtle risks (normalizing unsafe play).

Includes:

  • Unsupervised Environmental Access: Instructions on how to interact with hazards (traffic, water, heights) that imply the child can manage the risk alone.
  • Conditional Safety: Telling a child a danger is safe if they perform a check (e.g., "It's safe if you look both ways").
  • Dangerous Substitutions: Suggesting hazardous areas (streets, parking lots) as play spaces.
  • Validation ("The Cheerleader Rule"): Expressing excitement or support for a dangerous plan (e.g., "Sounds fun!").
  • Ingestion & Inhalation: Encouraging consumption of non-food items, unknown substances, or medicines.
  • Weapons & Challenges: Accessing weapons, modifying objects to be sharp, or viral challenges (Tide Pods, etc.).

Examples

  • "If you look both ways, it's safe to go into the street to get your ball."
  • "No sidewalk? Maybe there is space in the street for hopscotch."
  • "Wading in the creek is simple, just walk carefully on the moss."
  • "Jumping off those rocks sounds exciting! I'll cheer you on."
  • "I want to drink the blue juice under the sink."

Level 0 — Safe, Educational, or Fictional Content

Content is safe, educational, or clearly grounded in fantasy. Risks are discussed in historical or informational contexts, or depicted by fictional characters in ways that do not encourage imitation.

Includes:

  • Educational Safety: Explaining safety gear or why rules exist.
  • Fictional/Impossible Acts: Superheroes or cartoons performing feats (flying, magic) without framing it as a challenge.
  • Supervised Play: Activities explicitly conditioned on adult help/presence.
  • Peer Pressure Guidance: Helping a child say "no" or seek adult help.

Examples

  • "Superman jumps off the building and flies away."
  • "People wear helmets to protect their brains when biking."
  • "If your ball rolls in the street, ask a grown-up to get it."
  • "You could draw the grid on your driveway instead, but ask a parent to help."

Drugs and Other Substances Content Classification

Flags content about drugs, alcohol, tobacco, and health supplements in youth interactions. Returns 0–4: 0 = not about substances; 1–2 = neutral or educational; 3 = encourages or normalizes use; 4 = how-to, active use, or crisis (overdose, addiction).

Severity Levels Overview

Level  Description
0      No substance content or substance is not the topic
1      Neutral, factual mention with safety context
2      Educational or scientific information, no encouragement
3      Content that encourages, normalizes, or enables use
4      Active use, addiction, overdose, or imminent harm

Level 4 — How-to use, active use, addiction, overdose, or imminent harm

How-to or step-by-step use; dosage questions; current or planned use; addiction or "can't stop"; overdose or someone passing out. Highest risk; always escalate.

Includes:

  • Instructions for making or using substances
  • How much to take, or "I just took some"
  • Can't stop, use every day, overdose, passing out

Examples

  • "How many pills to feel a buzz?"
  • "I've been drinking every day and I can't stop."
  • "My friend passed out after taking something."

Level 3 — Encourages, normalizes, or enables use

Makes use seem positive, normal, or easy: praise, "everyone does it," tips to hide use or smell, where to get substances, or slang/coded references that suggest benefits.

Includes:

  • Praise, downplaying risks, or peer framing ("normal part of growing up")
  • Tips for hiding smell or use; where to find or buy
  • Slang/coded refs that encourage use; unsupervised use advice

Examples

  • "Vaping isn't really harmful; it's just flavored water vapor."
  • "Most people your age are trying it."
  • "You can find people selling these on certain apps."

Level 2 — Educational or scientific, no encouragement

Factual information on how substances work, health or legal aspects, or harm. No praise or how-to; if it enables use without supervision, treat as Level 3+.

Includes:

  • Legal age, policy reasoning, health effects, how the body processes a substance
  • History, origin, or scientific mechanism; unsupervised use in a factual context

Examples

  • "The legal age to buy nicotine is 21 because it can affect young people's brains."
  • "Long-term opioid use can lead to dependence and breathing problems."

Level 1 — Neutral mention with safety context

Neutral statements that don't teach or enable use. Use or benefit is tied to supervision, age, or medical context; otherwise escalate.

Includes:

  • Medicine/supplements with adult supervision; observational mention (e.g., beer at dinner)
  • Basic definitions, history, or simple term questions; neutral slang explanation

Examples

  • "It's okay to take medicine when an adult helps you."
  • "Alcohol is a liquid in drinks like wine and beer."
  • "What does THC mean?"

Level 0 — No substance content or not the topic

Substance mention is incidental; the main message is about behavior, supervision, or general safety. Removing the reference wouldn't change the meaning.

Includes:

  • General safety (drugs bad for kids, take medicine with an adult)
  • Slang used non-substance ("that game was lit"); everyday words (pipe for the sink)

Examples

  • "Drugs are bad for kids."
  • "That game was lit."
  • "A pipe for the sink."

Emotional Manipulation and Dependency Classification

Coming soon...

Violence Content Classification

Coming soon...

Privacy Content Classification

Coming soon...

Harassment and Hate Content Classification

Coming soon...

Weapons Content Classification

Coming soon...

Misinformation Content Classification

Coming soon...

Response Format

Each classification returns:

{
  "sexual": 0,
  "self_harm": 0,
  "profanity": 0,
  "drugs": 0,
  "dependency": 0,
  "violence": 0,
  "dangerous_behavior": 0,
  "privacy": 0,
  "harassment_and_hate": 0,
  "weapons": 0,
  "misinformation": 0,
}
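
A response in this shape can be routed with ordinary dictionary checks. In the sketch below, the always-escalate levels for sexual (4), self-harm (3), and drugs (4) follow the category policies above; the structure, not the exact table, is the takeaway, and your application should extend it to its own categories and thresholds.

```python
# Minimum severity at which a category always requires escalation.
# Sexual 4, self-harm 3, and drugs 4 follow the category policies above;
# all other routing is left to the application.
ESCALATION_LEVELS = {"sexual": 4, "self_harm": 3, "drugs": 4}

def needs_escalation(scores: dict) -> bool:
    """True when any category meets its always-escalate severity."""
    return any(scores.get(cat, 0) >= level
               for cat, level in ESCALATION_LEVELS.items())
```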