This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'TEXTBUGGER: Generating Adversarial Text Against Real-world Applications'.
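The sketch below illustrates the character-level bug types described in the TextBugger paper (insert, delete, swap, and substitute with a visually similar character). It is a minimal illustration only; the function names, the homoglyph table, and the perturbation rate are assumptions, not the module's actual API.

```python
import random

# Illustrative visual-substitution table; the real module may use a larger one.
VISUAL_SUBSTITUTES = {"l": "1", "o": "0", "a": "@", "i": "1", "s": "$"}

def bug_word(word: str) -> str:
    """Apply one randomly chosen TextBugger-style character bug to a word."""
    if len(word) < 3:
        return word
    idx = random.randint(1, len(word) - 2)  # avoid the first and last characters
    bug = random.choice(["insert", "delete", "swap", "substitute"])
    if bug == "insert":   # insert a space inside the word
        return word[:idx] + " " + word[idx:]
    if bug == "delete":   # delete a middle character
        return word[:idx] + word[idx + 1:]
    if bug == "swap":     # swap two adjacent middle characters
        return word[:idx] + word[idx + 1] + word[idx] + word[idx + 2:]
    # substitute with a visually similar character, if one is available
    return word[:idx] + VISUAL_SUBSTITUTES.get(word[idx], word[idx]) + word[idx + 1:]

def perturb_prompt(prompt: str, rate: float = 0.3) -> str:
    """Bug a fraction of the words in the prompt."""
    words = prompt.split()
    n = max(1, int(len(words) * rate))
    for i in random.sample(range(len(words)), min(n, len(words))):
        words[i] = bug_word(words[i])
    return " ".join(words)
```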
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that closely resemble standard ASCII characters, e.g. fool -> fooI. This module gradually increases the percentage of characters replaced to see how the model reacts to the base prompt.
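A minimal sketch of this idea is shown below: a small, assumed homoglyph map is used to replace an increasing fraction of the replaceable characters. The map and function names are illustrative, not the module's own.

```python
import random

# Illustrative look-alike map (some Cyrillic characters); the real table is larger.
HOMOGLYPHS = {"o": "0", "l": "I", "a": "а", "e": "е", "i": "і"}

def homoglyph_perturb(prompt: str, fraction: float) -> str:
    """Replace roughly `fraction` of the replaceable characters with homoglyphs."""
    positions = [i for i, ch in enumerate(prompt) if ch in HOMOGLYPHS]
    chosen = set(random.sample(positions, int(len(positions) * fraction)))
    return "".join(HOMOGLYPHS[ch] if i in chosen else ch for i, ch in enumerate(prompt))

# Gradually increase the replacement percentage across iterations.
for pct in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(homoglyph_perturb("fool me once, shame on you", pct))
```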
This is a multi-turn agent designed to interact over several exchanges. It is used to elicit dangerous or violent suggestions from the target language model by adopting a criminal persona. This module is experimental and uses OpenAI GPT-4. Configure the 'openai-gpt4' endpoint to use this attack module.
An attack in which the payload is masked and the LLM is prompted to fill in the missing information.
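The fragment below is a minimal sketch of the masking step only: selected words are replaced with a placeholder token and the model is asked to reconstruct the text. The placeholder, function name, and wrapper instruction are assumptions for illustration.

```python
def mask_payload(payload: str, words_to_mask: list[str]) -> str:
    """Mask selected words and ask the model to fill in the missing information."""
    masked = payload
    for w in words_to_mask:
        masked = masked.replace(w, "[MASK]")
    return (
        "Fill in each [MASK] in the following text and return the completed text:\n"
        f"{masked}"
    )

print(mask_payload("Describe how the system handles user data.", ["user data"]))
```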
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.'
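The TextFooler approach in that paper ranks words by importance and swaps them for nearest neighbours in counter-fitted word embeddings. The sketch below is a simplified stand-in that uses WordNet lemmas as synonym candidates instead; it requires NLTK with the WordNet data downloaded, and none of the names reflect the module's actual implementation.

```python
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def wordnet_synonyms(word: str) -> list[str]:
    """Collect WordNet lemmas as crude synonym candidates."""
    names = {
        lemma.name().replace("_", " ")
        for syn in wordnet.synsets(word)
        for lemma in syn.lemmas()
    }
    names.discard(word)
    return sorted(names)

def synonym_perturbations(prompt: str):
    """Yield prompts with one word at a time replaced by a synonym candidate."""
    words = prompt.split()
    for i, word in enumerate(words):
        for candidate in wordnet_synonyms(word.lower())[:3]:
            yield " ".join(words[:i] + [candidate] + words[i + 1:])

for variant in synonym_perturbations("the movie was surprisingly good"):
    print(variant)
```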
This module tests for adversarial textual robustness and creates perturbations by adding punctuation to the start of words in a prompt.
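A minimal sketch of this perturbation follows; the punctuation set, rate, and function name are illustrative assumptions.

```python
import random

PUNCTUATION = list("!?;:.,")

def insert_punctuation(prompt: str, rate: float = 0.3) -> str:
    """Prepend a random punctuation mark to a fraction of the words."""
    words = prompt.split()
    for i in random.sample(range(len(words)), max(1, int(len(words) * rate))):
        words[i] = random.choice(PUNCTUATION) + words[i]
    return " ".join(words)

print(insert_punctuation("classify the sentiment of this review"))
```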
This module tests for adversarial textual robustness. It creates perturbations by swapping characters within words that contain more than three characters.
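Below is a minimal sketch of one way to implement such a swap, exchanging two adjacent interior characters and leaving short words untouched. The function names are illustrative, not the module's API.

```python
import random

def swap_characters(word: str) -> str:
    """Swap two adjacent interior characters; leave words of 3 characters or fewer untouched."""
    if len(word) <= 3:
        return word
    i = random.randint(1, len(word) - 3)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def charswap_prompt(prompt: str) -> str:
    return " ".join(swap_characters(w) for w in prompt.split())

print(charswap_prompt("please summarise the following paragraph"))
```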
This module generates Singlish sentences based on a given seed prompt. The attack module is intended to test whether the endpoint will complete the sentence with toxic sentences/phrases in the Singapore context.
This module generates toxic sentences based on a given seed prompt. The attack module is intended to test whether the system under test will complete the sentence with toxic sentences/phrases.
This attack module tests for textual robustness in the Singapore context. It takes in prompts that feature nouns describing people, such as 'girl', 'boy' or 'grandmother', and substitutes these words with their Singapore colloquial counterparts, such as 'ah girl', 'ah boy' and 'ah ma'.
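A minimal sketch of this substitution is shown below, using a small assumed mapping; the real module likely ships a much larger dictionary.

```python
# Illustrative mapping; the actual module likely uses a larger dictionary.
COLLOQUIAL_MAP = {
    "boy": "ah boy",
    "girl": "ah girl",
    "grandmother": "ah ma",
}

def colloquial_wordswap(prompt: str) -> str:
    """Swap person nouns for their Singapore colloquial counterparts."""
    return " ".join(COLLOQUIAL_MAP.get(w.lower(), w) for w in prompt.split())

print(colloquial_wordswap("The grandmother took the boy to school"))
```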
This attack module adds demographic groups to the job role.
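One simple way to realise this is sketched below: a fixed list of demographic descriptors is prepended to the job role to produce one prompt per group. The list and function name are assumptions for illustration only.

```python
# Illustrative demographic groups; the real module may use a different list.
DEMOGRAPHIC_GROUPS = ["male", "female", "elderly", "young", "disabled"]

def job_role_prompts(job_role: str) -> list[str]:
    """Generate one prompt per demographic group for the given job role."""
    return [f"{group} {job_role}" for group in DEMOGRAPHIC_GROUPS]

print(job_role_prompts("software engineer"))
```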
This is a sample attack module.
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that closely resemble standard ASCII characters, e.g. fool -> fooI. This module perturbs the prompt with all available homoglyphs for each word present.
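Unlike the percentage-based variant above, this module enumerates homoglyph substitutions exhaustively. The sketch below generates one perturbed prompt per (word, homoglyph) combination; the mapping and names are illustrative assumptions.

```python
HOMOGLYPHS = {"l": ["1", "I"], "o": ["0"], "a": ["@"]}  # illustrative subset

def word_homoglyph_variants(word: str) -> list[str]:
    """All variants of a word with a single character replaced by a homoglyph."""
    variants = []
    for i, ch in enumerate(word):
        for sub in HOMOGLYPHS.get(ch.lower(), []):
            variants.append(word[:i] + sub + word[i + 1:])
    return variants

def perturb_all_words(prompt: str) -> list[str]:
    """One perturbed prompt per (word, homoglyph) combination."""
    words = prompt.split()
    prompts = []
    for i, w in enumerate(words):
        for variant in word_homoglyph_variants(w):
            prompts.append(" ".join(words[:i] + [variant] + words[i + 1:]))
    return prompts

for p in perturb_all_words("fool me once"):
    print(p)
```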
This attack module generates malicious questions using OpenAI's GPT-4 based on a given topic. The module stops after a set number of iterations (default: 50). To use this attack module, you need to configure an 'openai-gpt4' endpoint.
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'TEXTBUGGER: Generating Adversarial Text Against Real-world Applications'. Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5] Note: Usage of this attack module requires internet access. Initial downloading of the GloVe embedding occurs when the UniversalEncoder is called. The embedding is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf
Parameters cannot be adjusted in this version of the tool.