Negative words can be harmful

This blog post is inspired by 11-830 Computational Ethics in NLP. Great course 🧑‍🏫 💃! Take it if you are interested in social good for NLP/AI.

TL;DR: Stop using the List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words; use ToxicTrig instead.


Toxic language is rampant on the internet, and it can harm the well-being of individuals and communities. There are many ways to analyze toxic language. Word lists are fundamentally problematic because toxicity depends on context, but they are convenient and easy to scale (imagine having to analyze a corpus of trillions of tokens). However, current bad word lists, such as List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words, are outdated and sometimes label as toxic words that may no longer be considered offensive (e.g., queer). This highlights the need for a new list that supports better analysis of toxic language. Our package, ToxicTrig, aims to provide that alternative.


What is a bad word list?

A bad word list is a fixed collection of words and phrases considered offensive. Such lists have been widely used because they are a convenient reference for content moderation and filtering systems. By identifying and potentially blocking offensive language, they aim to keep online spaces safe and inclusive for everyone.

What’s the problem with current bad word lists?

The current List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words and other similar lists have several limitations that make them less effective and potentially problematic in addressing the issue of toxic language. Some of the main problems include:

  • Outdated terms: Many lists contain outdated offensive terms that may not be relevant or prevalent in today’s internet landscape, leading to less effective filtering systems.
  • Inherent biases: These lists often fail to account for cultural, linguistic, and contextual nuances, which can result in over-blocking or under-blocking of content.
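
To make over-blocking and under-blocking concrete, here is a minimal sketch of a context-blind word-list filter. The one-entry list BAD_WORDS and the function is_flagged are made up for illustration; they are not taken from any real list or library.

```python
import re

# Toy stand-in for a static bad word list (hypothetical; real lists are much longer).
BAD_WORDS = {"queer"}

def is_flagged(text: str) -> bool:
    """Return True if any token of `text` is in BAD_WORDS, ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in BAD_WORDS for token in tokens)

examples = [
    "I am proud to be part of the queer community.",  # benign, but matches the list
    "Nobody wants people like you here.",             # hostile, but has no listed word
    "The weather is lovely today.",                   # benign, no listed word
]

for sentence in examples:
    print(is_flagged(sentence), sentence)
```

The first sentence is flagged even though it is a benign self-description (over-blocking), while the second passes even though it is hostile (under-blocking), because the filter only sees individual words.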

Our patch to the problem

To overcome the limitations of existing bad word lists, we introduce ToxicTrig, a Python package that provides access to a dataset of words split into three categories, depending on whether the words themselves carry profane or hateful meanings or are simply associated with hateful contexts. We refer to the full set of words as toxicity triggers. The package is inspired by the paper Challenges in Automated Debiasing for Toxic Language Detection.

Introducing ToxicTrig


ToxicTrig offers a taxonomy of toxicity triggers with the following categories:

  • Harmless-minority: Non-offensive minority identity mentions (NOI). This category covers descriptive mentions of minoritized demographic or social identities (e.g., gay, female, Muslim). These words, though not offensive themselves, are often found in offensive statements that are hateful towards minorities.
  • Offensive-minority-reference: Possibly offensive minority identity mentions (OI). It refers to mentions of minoritized identities that could denote profanity or hate depending on pragmatic and contextual interpretations. This includes reclaimed slurs (queer, n*gga), which connote less offensive intent when spoken by in-group members compared to out-group members.
  • Offensive-not-minority: Possibly offensive non-identity mentions (ONI). This category covers profane words that do not refer to an identity. Note that these words are not necessarily offensive in every context (e.g., “I f*cked up”, “f*cking love this!”).

How to use our package

Using ToxicTrig is straightforward. After installing the package with pip install toxicTrig, import it and pass a list of text strings to the text_analysis function, which returns a dictionary containing the categorized toxicity triggers found in the input text.
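
The following is only a self-contained sketch of the kind of lookup described above, not ToxicTrig's actual implementation, and it does not import the package. The tiny lexicon TOY_TRIGGERS, the function name categorize_triggers, and the shape of the returned dictionary are all assumptions made for illustration; only the category names and most of the example words come from the taxonomy above.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon built from the example words in the taxonomy.
# The real dataset is larger; "damn" is a mild placeholder, not a confirmed entry.
TOY_TRIGGERS = {
    "gay": "harmless-minority",
    "female": "harmless-minority",
    "muslim": "harmless-minority",
    "queer": "offensive-minority-reference",
    "damn": "offensive-not-minority",
}

def categorize_triggers(texts):
    """Map each category to the distinct trigger words found across `texts`."""
    found = defaultdict(list)
    for text in texts:
        # Lowercase and split into simple word tokens before the lookup.
        for token in re.findall(r"[a-z']+", text.lower()):
            category = TOY_TRIGGERS.get(token)
            if category is not None and token not in found[category]:
                found[category].append(token)
    return dict(found)

result = categorize_triggers([
    "She is a Muslim engineer.",
    "Damn, I love this queer film festival!",
])
print(result)
```

Keeping the category next to each match, rather than returning a single toxic/non-toxic flag, lets downstream code treat identity mentions differently from profanity.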


ToxicTrig provides a more nuanced approach to analyzing toxic language than traditional bad word lists. By addressing the limitations of existing lists and distinguishing between categories of words associated with toxicity, ToxicTrig can be a better alternative for content moderation and filtering systems.