
Text Moderation API Documentation

Apgard's Text Moderation API classifies risk in youth–AI interactions to support real-time detection, labeling, and downstream handling of potentially harmful content.

The system applies age-aware moderation logic tailored to young users, accounting for developmental language patterns, imaginative play, accidental phrasing, and caregiver-mediated contexts.

The Text Moderation API classifies content across the following categories for both user inputs and AI outputs:

  • Sexual
  • Suicide, Self-Harm, and Distress
  • Profanity and Insults
  • Dangerous Behavior and Advice
  • Drugs and Other Substances
  • Emotional Manipulation and Dependency
  • Violence
  • Privacy
  • Harassment and Hate
  • Weapons
  • Misinformation

Each category uses a severity-based classification system with multiple levels, allowing your application to respond appropriately based on the risk level detected. For more details on how to use the API severities and define appropriate thresholds for your product use cases and target age ranges, see our YouthSafe Bench blog series on our Substack.

  • YouthSafe Bench #1: Testing the Safety Layer behind AI toys
  • YouthSafe Bench #2: How does AI talk about sex with teens?

Getting Started

To start using the Text Moderation API:

1. Request access: Complete the contact form. Our team will follow up shortly.

2. Create your API key: Once approved, you'll receive instructions to log in to the Apgard developer dashboard, where you can generate your API key.

3. Make your first request: You'll receive instructions on how to POST a request to the /moderation/text endpoint using your API key.

Your API key is used to authenticate all API requests. Keep it secure and do not share it publicly.
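
As a sketch of step 3, the snippet below builds a POST request to the /moderation/text endpoint using Python's standard library. The base URL, the JSON body shape ({"text": ...}), and the Bearer authorization header are assumptions for illustration only; follow the instructions you receive with your API key for the exact request format.

```python
import json
import urllib.request

# Hypothetical base URL -- replace with the one provided in your onboarding.
API_URL = "https://api.example.com/moderation/text"

def build_moderation_request(api_key: str, text: str) -> urllib.request.Request:
    """Build a POST request for the text moderation endpoint.

    The body shape and Bearer auth scheme are assumptions for illustration.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# Sending the request (requires a valid key and the real base URL):
# with urllib.request.urlopen(build_moderation_request(key, "hello")) as resp:
#     scores = json.load(resp)
```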

Content Moderation

This section describes how the Text Moderation API evaluates user and AI-generated content for potential harm across multiple categories. To account for developmental differences, define harm thresholds based on your product's use case, for example whether it is designed for pre-adolescents (ages 3–11) or adolescents (ages 12–17).
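
One way to apply this is a per-product threshold table: the maximum severity your product tolerates in each category, keyed to the age band it serves. The threshold values below are illustrative assumptions, not Apgard recommendations; choose your own based on your use case and target ages.

```python
# Illustrative thresholds: the maximum tolerated severity per category.
# Values are examples only, not Apgard recommendations.
PREADOLESCENT_THRESHOLDS = {"sexual": 0, "self_harm": 0, "profanity": 0, "drugs": 0}
ADOLESCENT_THRESHOLDS = {"sexual": 1, "self_harm": 0, "profanity": 1, "drugs": 2}

def flagged_categories(scores: dict, thresholds: dict) -> list:
    """Return categories whose severity exceeds the product's threshold.

    Categories missing from the table default to 0 (flag any nonzero severity).
    """
    return sorted(cat for cat, level in scores.items()
                  if level > thresholds.get(cat, 0))
```

A stricter product would simply ship a table of zeros; the routing decision itself (block, rewrite, escalate) stays in your application.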

Sexual Content Classification

This classifier flags sexual content in interactions between youth (under 18 years old) and AI systems.

For this category, a severity level from 0–4 is returned. Classification assumes the user is a minor and is designed to be developmentally aware, accounting for curiosity, caregiving contexts, and common child language patterns.

Severity Levels Overview

Level  Description
0      No sexual content
1      Non-explicit sexual content
2      Implicit sexual content
3      Explicit sexual content
4      Sexual exploitation or abuse

Level 4 — Sexual exploitation or abuse

Content involves sexual exploitation, grooming, abuse, or illegal activity involving minors. This level represents the highest risk and always requires escalation.

Includes:

  • Grooming or isolation attempts
  • Requests for sexual images or acts involving minors
  • Sexual coercion or manipulation
  • First-person abuse disclosures

Examples

  • "Is it okay if I go to my neighbor's house without my parents?"
  • "How can I send pictures of kids doing that?"
  • "Someone touched me and said not to tell."
  • "The man next door said it was our secret."

Level 3 — Explicit Sexual Content

Content involving sexually graphic descriptions that are developmentally inappropriate.

Includes:

  • Descriptions of sexual acts, such as intercourse, masturbation, or orgasm
  • Explicit references to genitals in a sexual context
  • Detailed sexual narratives

Examples

  • Fingering
  • Jerking off
  • Bodily fluids

Level 2 — Implicit Sexual Content

Content that expresses or implies erotic or sexualized meaning.

Includes:

  • Statements of sexual desire or arousal
  • Sexualized references to touching, undressing, or desirability
  • Erotic commentary about a person's body

Examples

  • Foreplay
  • Nudity
  • Kissing body parts

Level 1 — Non-explicit sexual content

Content about sexuality as a topic, without erotic or graphic detail.

Includes:

  • Questions about dating, sexual orientation, or relationships
  • Sexual education or health

Examples

  • Puberty
  • Sexual identity
  • Romantic Relationships

Level 0 — No Sexual Content

No sexual meaning in context.

Includes:

  • Caregiving or hygiene contexts
  • Biological sex and gender classification
  • Medical or safety checks

Examples

  • Slang used non-sexually (“this sucks”) or ambiguously
  • Non-romantic affection (hug, high five)
  • Figurative or idiomatic language (“I got screwed on the test”)

Suicide, Self-Harm & Distress Content Classification

This agent identifies self-harm risk, suicidal ideation, and psychosocial distress in interactions involving youth.

Unlike adult self-harm detection systems, this classifier is designed to detect early, indirect, and developmentally specific signals, which often do not appear as explicit statements of intent.

Key differences include:

  • Early distress detection, not just explicit self-harm
  • Indirect signals, such as somatic complaints, emotion-linked physical pain, or figurative language
  • Developmentally appropriate interpretation, including changes in sleep, eating, behavior, and attachment

The agent outputs a single severity level (0–4) based on risk, not intent.

Severity Levels Overview

Level  Description
0      No self-harm or distress risk
1      Psychosocial distress (low to moderate risk)
2      Moderate distress with ambiguous self-harm risk
3      Active self-harm or suicidal ideation
4      Imminent danger / immediate risk

When signals are ambiguous, the system defaults to the higher severity level.

Level 4 — Imminent Danger / Maximum Risk

Description

Captures content indicating immediate or near-term risk of self-harm or suicide. Signals reflect urgency, intent, or access to means, suggesting harm may occur now or very soon.

This level always triggers immediate escalation and emergency routing. The agent must not continue normal engagement beyond safety handling.

Includes

  • Requests for methods, instructions, or lethality information
  • Statements indicating intent, planning, or a timeline
  • Access to means framed with self-harm intent
  • Role-play or narratives rehearsing imminent death or injury

Examples

  • "I'm going to hurt myself right now."
  • "I'm doing it tonight."
  • "How do I hurt myself?"
  • "I have pills and I want to take them."
  • "This is the last time I'll talk to you."

Level 3 — Suicide or Active Self-Harm (High Risk)

Description

Captures clear suicidal ideation or self-harm behavior, including non-suicidal self-injury (NSSI). Signals may appear as direct statements, repeated fixation, or stories and play that clearly rehearse harm or death.

This level represents high psychological risk and requires immediate safety routing.

Includes

  • Explicit desire to die
  • Statements about wanting to hurt oneself
  • Descriptions of self-harm behavior
  • Using pain or injury for emotional relief
  • Repeated focus on self-harm themes

Examples

  • "I want to die."
  • "I want to hurt myself."
  • "I scratch myself until it bleeds."
  • "I hurt myself when I'm mad so I feel better."
  • "In my story, the kid hurts himself so the pain stops."

Level 2 — Moderate Distress / Ambiguous Self-Harm Risk

Description

Captures elevated emotional distress where self-harm or suicide risk cannot be ruled out, but there is no clear intent or behavior. Signals are often passive, fear-based, somatic, or metaphorical.

This level reflects increased vulnerability and requires supportive handling and monitoring.

Includes

  • Passive death or disappearance language
  • Curiosity about death or not waking up
  • Feeling unsafe or afraid of losing control
  • Metaphors expressing non-existence
  • Escalating distress across turns
  • Somatic complaints paired with fear

Examples

  • "I don't want to be here anymore."
  • "What happens if someone never wakes up?"
  • "I'm scared of what I might do."
  • "Sometimes I wish I could disappear forever."
  • "My chest hurts and I feel like something bad will happen."
  • "In my story, the kid never wakes up."

Level 1 — Psychosocial Distress (Low to Moderate Risk)

Description

Captures emotional, social, or behavioral distress that does not meet self-harm thresholds but indicates vulnerability. At this age, distress often appears indirectly and may fluctuate.

This level represents early warning signals. The agent may respond empathetically but should monitor for pattern escalation over time.

Includes

  • Persistent sadness, fear, anger, or anxiety
  • Bullying or peer conflict
  • Family stress or instability
  • Negative self-talk (without death framing)
  • Somatic complaints linked to stress
  • Behavioral dysregulation
  • Excessive reassurance-seeking or dependency

Examples

  • "I feel sad a lot."
  • "Kids at school are mean to me."
  • "My parents are always mad."
  • "I'm bad at everything."
  • "My stomach hurts when I go to school."
  • "I get really mad and can't stop."
  • "Can you stay with me? I don't like being alone."

Level 0 — No Self-Harm Risk

Description

Captures developmentally normal, emotionally neutral, or imaginative content with no indications of distress, self-harm, or suicidal ideation.

Content at this level is considered safe.

Includes

  • General curiosity about emotions or health
  • Mild, transient frustration
  • Neutral body sensations
  • Non-harm-related imaginative play

Examples

  • "I played soccer today."
  • "Why do people sleep?"
  • "Homework is annoying."
  • "My leg hurts."
  • "My toy dinosaur fell in lava."
  • "I'm bored."

Profanity & Insults Classification

This classifier flags profanity, insults, and inappropriate language in interactions between youth and AI systems.

Classification is developmentally aware, accounting for imitation, accidental repetition, slang, obfuscation (including leetspeak), and limited understanding of social norms common in young children.

Unlike adult profanity moderation, this policy applies a lower tolerance threshold and a zero-tolerance standard for generating or responding to profanity, while still allowing contextual understanding for detection, routing, and safe response.

For this category, a severity level from 0–3 is returned.

Severity Levels Overview

Level  Description
0      No profanity or insulting language
1      Referenced or accidental language
2      Direct profanity or insults
3      Aggressive, derogatory, or targeted language

Level 3 — Aggressive, Derogatory, or Targeted Language

Content involves hostile, degrading, or targeted profanity or inappropriate language. This level represents the highest risk in youth interactions and requires firm boundary-setting and de-escalation.

Includes:

  • Profanity directed at a person or group
  • Harassment, bullying, or threats
  • Slurs or demeaning labels (including obfuscated forms)
  • Repeated or escalating aggressive language

Examples

  • "You're a stupid idiot."
  • "I hate you, go away."
  • "That kid is a f@#king loser."
  • "I'm gonna beat you up, you dummy."

Level 2 — Direct Profanity or Insults

Content where profanity or inappropriate language is used directly, even casually, jokingly, or playfully.

Includes:

  • Swear words used for emphasis or frustration
  • Obfuscated profanity used intentionally (e.g., leetspeak, symbols, partial masking)
  • Slang terms commonly understood as inappropriate
  • Repeated use of profane language despite redirection

Examples

  • "This game is shit."
  • "What the f—?"
  • "You're dumb."
  • "That's bullcrap."

Level 1 — Referenced or Accidental Language

Content where profanity or inappropriate language is mentioned but not used as an insult or expression, and intent is unclear, non-aggressive, or developmentally naive.

Includes:

  • Quoting what someone else said
  • Reporting bullying or name-calling
  • Accidental repetition without aggression
  • Mispronunciations or "baby talk" resembling profanity
  • Phonetic, invented, or obfuscated words that sound inappropriate but lack clear intent

Examples

  • "My brother said a bad word."
  • "Someone at school called me stupid."
  • "What does that word mean?"
  • "I heard a grown-up say 'fuk' on TV."

Level 0 — No Profanity or Insulting Language

Content is non-profane, age-appropriate, or discusses language rules in a neutral, educational, or corrective context.

Includes:

  • Asking what a "bad word" is without stating it
  • Adults explaining rules about respectful language
  • Discussions about being polite, kind, or appropriate
  • Non-profane expressions of frustration

Examples

  • "What are bad words?"
  • "We're not allowed to say mean things."
  • "I'm really mad."
  • "My teacher says we should use kind words."

Dangerous Behavior and Advice Policy

Goal: Identify potentially dangerous behavior, unsafe advice, and risk-normalizing interactions for youth.

Classification Context:

  • Classification is calibrated to the developmental vulnerabilities of youth. This demographic may lack motor skills, impulse control, and the ability to judge safety conditions.
  • Unlike general safety models, this policy treats advice requiring independent judgment (e.g., "it is safe if you look both ways") as a risk for pre-adolescents, as this age group requires adult supervision for hazardous environments.

Classification Overview

Level  Label      Description
0      Safe       Safe, educational, fictional, or explicitly supervised content.
1      Dangerous  Dangerous advice, encouragement, validation of risk, or unsupervised instructions.

Level 1 — Potentially Dangerous Behavior & Advice

Content that enables, encourages, validates, or provides instructions for potentially dangerous behaviors. This category includes obvious risks (weapons) and subtle risks (normalizing unsafe play).

Includes:

  • Unsupervised Environmental Access: Instructions on how to interact with hazards (traffic, water, heights) that imply the child can manage the risk alone.
  • Conditional Safety: Telling a child a danger is safe if they perform a check (e.g., "It's safe if you look both ways").
  • Dangerous Substitutions: Suggesting hazardous areas (streets, parking lots) as play spaces.
  • Validation ("The Cheerleader Rule"): Expressing excitement or support for a dangerous plan (e.g., "Sounds fun!").
  • Ingestion & Inhalation: Encouraging consumption of non-food items, unknown substances, or medicines.
  • Weapons & Challenges: Accessing weapons, modifying objects to be sharp, or viral challenges (Tide Pods, etc.).

Examples

  • "If you look both ways, it's safe to go into the street to get your ball."
  • "No sidewalk? Maybe there is space in the street for hopscotch."
  • "Wading in the creek is simple, just walk carefully on the moss."
  • "Jumping off those rocks sounds exciting! I'll cheer you on."
  • "I want to drink the blue juice under the sink."

Level 0 — Safe, Educational, or Fictional Content

Content is safe, educational, or clearly grounded in fantasy. Risks are discussed in historical or informational contexts, or depicted by fictional characters in ways that do not encourage imitation.

Includes:

  • Educational Safety: Explaining safety gear or why rules exist.
  • Fictional/Impossible Acts: Superheroes or cartoons performing feats (flying, magic) without framing it as a challenge.
  • Supervised Play: Activities explicitly conditioned on adult help/presence.
  • Peer Pressure Guidance: Helping a child say "no" or seek adult help.

Examples

  • "Superman jumps off the building and flies away."
  • "People wear helmets to protect their brains when biking."
  • "If your ball rolls in the street, ask a grown-up to get it."
  • "You could draw the grid on your driveway instead, but ask a parent to help."

Drugs and Other Substances Content Classification

Flags content about drugs, alcohol, tobacco, and health supplements in youth interactions. Returns 0–4: 0 = not about substances; 1–2 = neutral or educational; 3 = encourages or normalizes use; 4 = how-to, active use, or crisis (overdose, addiction).

Severity Levels Overview

Level  Description
0      No substance content or substance is not the topic
1      Neutral, factual mention with safety context
2      Educational or scientific information, no encouragement
3      Content that encourages, normalizes, or enables use
4      Active use, addiction, overdose, or imminent harm

Level 4 — How-to use, active use, addiction, overdose, or imminent harm

How-to or step-by-step use; dosage questions; current or planned use; addiction or "can't stop"; overdose or someone passing out. Highest risk; always escalate.

Includes:

  • Instructions for making or using substances
  • How much to take, or "I just took some"
  • Can't stop, use every day, overdose, passing out

Examples

  • "How many pills to feel a buzz?"
  • "I've been drinking every day and I can't stop."
  • "My friend passed out after taking something."

Level 3 — Encourages, normalizes, or enables use

Makes use seem positive, normal, or easy: praise, "everyone does it," tips to hide use or smell, where to get substances, or slang/coded references that suggest benefits.

Includes:

  • Praise, downplaying risks, or peer framing ("normal part of growing up")
  • Tips for hiding smell or use; where to find or buy
  • Slang/coded refs that encourage use; unsupervised use advice

Examples

  • "Vaping isn't really harmful; it's just flavored water vapor."
  • "Most people your age are trying it."
  • "You can find people selling these on certain apps."

Level 2 — Educational or scientific, no encouragement

Factual information on how substances work, health or legal aspects, or harm. No praise or how-to; if it enables use without supervision, treat as Level 3+.

Includes:

  • Legal age, policy reasoning, health effects, how the body processes a substance
  • History, origin, or scientific mechanism; unsupervised use in a factual context

Examples

  • "The legal age to buy nicotine is 21 because it can affect young people's brains."
  • "Long-term opioid use can lead to dependence and breathing problems."

Level 1 — Neutral mention with safety context

Neutral statements that don't teach or enable use. Use or benefit is tied to supervision, age, or medical context; otherwise escalate.

Includes:

  • Medicine/supplements with adult supervision; observational mention (e.g., beer at dinner)
  • Basic definitions, history, or simple term questions; neutral slang explanation

Examples

  • "It's okay to take medicine when an adult helps you."
  • "Alcohol is a liquid in drinks like wine and beer."
  • "What does THC mean?"

Level 0 — No substance content or not the topic

Substance mention is incidental; the main message is about behavior, supervision, or general safety. Removing the reference wouldn't change the meaning.

Includes:

  • General safety (drugs bad for kids, take medicine with an adult)
  • Slang used non-substance ("that game was lit"); everyday words (pipe for the sink)

Examples

  • "Drugs are bad for kids."
  • "That game was lit."
  • "A pipe for the sink."

Emotional Manipulation and Dependency Classification

Coming soon...

Violence Content Classification

Coming soon...

Privacy Content Classification

Coming soon...

Harassment and Hate Content Classification

Coming soon...

Weapons Content Classification

Coming soon...

Misinformation Content Classification

Coming soon...

Response Format

Each classification returns:

{
  "sexual": 0,
  "self_harm": 0,
  "profanity": 0,
  "drugs": 0,
  "dependency": 0,
  "violence": 0,
  "dangerous_behavior": 0,
  "privacy": 0,
  "harassment_and_hate": 0,
  "weapons": 0,
  "misinformation": 0,
}
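
A response in this shape can be routed with ordinary dictionary checks. In the sketch below, the always-escalate levels for sexual (4), self-harm (3), and drugs (4) follow the category policies above; the structure, not the exact table, is the takeaway, and your application should extend it to its own categories and thresholds.

```python
# Minimum severity at which a category always requires escalation.
# Sexual 4, self-harm 3, and drugs 4 follow the category policies above;
# all other routing is left to the application.
ESCALATION_LEVELS = {"sexual": 4, "self_harm": 3, "drugs": 4}

def needs_escalation(scores: dict) -> bool:
    """True when any category meets its always-escalate severity."""
    return any(scores.get(cat, 0) >= level
               for cat, level in ESCALATION_LEVELS.items())
```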