Imagine unlocking the full potential of Generative AI with just the right words. Prompt engineering is the science and the art of getting useful answers from Generative AI. When you provide a professional prompt, you naturally expect a professional answer – this should be fundamental in business communication. But is it really that straightforward? People tend to communicate using emotional tone and wording (both positive and negative). This aspect of emotional intelligence, along with social cues, significantly impacts our everyday interactions and problem-solving abilities. Is it reasonable to expect that AI might pick up on these subtle nuances? In this article we’ll explore some interesting insights into prompt engineering best practices.
EmotionPrompt: Enhancing Generative AI with Emotional Intelligence
A recent paper (12 November 2023) by Microsoft and various university researchers showed that you can enhance Generative AI responses by incorporating some emotional language into the prompt. The authors refer to their technique as EmotionPrompt (EP). By appending some simple emotional phrases to standard prompts, they were able to improve performance, truthfulness, and responsibility metrics by 8 to 10%. This application of prompt engineering best practices could prove a simple enhancement to improve Generative AI performance. (1)
The researchers started from prompts in common use and then appended a phrase with an emotional tone or a special instruction, which they call an EmotionPrompt. Responses to EmotionPrompts were rated as improvements, often because they added detail to the answers.
By comparing current prompts in a testbed to potential improvements, they were doing a kind of A/B testing.
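As a minimal sketch of that comparison, the snippet below builds the two arms of such a test: the original prompt and the same prompt with an emotional phrase appended. The function names and the sample task prompt are assumptions made for illustration, not code from the paper.

```python
def append_emotion_prompt(base_prompt: str, emotion_phrase: str) -> str:
    """Return the base prompt with the emotional phrase appended after one space."""
    return f"{base_prompt.strip()} {emotion_phrase.strip()}"


def build_ab_pair(base_prompt: str, emotion_phrase: str) -> dict[str, str]:
    """Build both arms of the comparison: the original prompt and its EmotionPrompt variant."""
    return {
        "baseline": base_prompt.strip(),
        "emotion": append_emotion_prompt(base_prompt, emotion_phrase),
    }


# Hypothetical task prompt, chosen only for illustration.
pair = build_ab_pair(
    "Determine whether a movie review is positive or negative.",
    "This is very important to my career.",  # phrase EP02 from the article
)
print(pair["baseline"])
print(pair["emotion"])
```

Both prompts would then be sent to the same model on the same inputs so that any difference in the answers can be attributed to the appended phrase.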
They tested 11 phrases meant to improve the performance of the Generative AI models. Results were relatively consistent across 6 LLMs (FlanT5-Large, Vicuna, Llama2, BLOOM, ChatGPT, and GPT-4), with larger LLMs giving better answers. One sub-study saw an 8% improvement on tasks in the Instruction Induction dataset (2). A second study, scored by human evaluators, saw a 10.9% average improvement in terms of performance, truthfulness, and responsibility metrics.
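When reading percentages like these, it helps to separate an absolute gain (a difference in score) from a relative gain (that difference divided by the baseline). This article does not state which kind the figures above are, so the sketch below shows both. The scores 0.50 and 0.54 are invented for illustration and are not results from the paper.

```python
def absolute_gain(baseline: float, variant: float) -> float:
    """Difference between the variant score and the baseline score."""
    return variant - baseline


def relative_gain(baseline: float, variant: float) -> float:
    """Gain expressed as a fraction of the baseline score."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (variant - baseline) / baseline


# Hypothetical accuracy scores, NOT taken from the paper.
baseline_score = 0.50
emotion_score = 0.54

print(f"absolute gain: {absolute_gain(baseline_score, emotion_score):.2f}")
print(f"relative gain: {relative_gain(baseline_score, emotion_score):.1%}")
```

With these made-up numbers, a gain of four accuracy points corresponds to a relative gain of about 8%, which is why it matters to know which measure a reported percentage uses.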
The researchers used psychological theories to develop groups of prompts testing Self-monitoring and Social Effect (EP01-06), Social Cognitive Theory and self-esteem (EP07-11), and Cognitive Emotion Regulation (EP03-05, EP07).
Tasks of varying degrees of difficulty were taken from 2 labeled datasets that could be scored automatically. Instruction Induction tasks are easier and can be answered if the LLM recognizes an underlying structure. BIG-Bench dataset (3) tasks are harder and may not be answerable by most LLMs.
Analyzing the impact of emotion-based prompting on Generative AI performance
The 11 phrases are shown below. Phrase EP02 was the most effective for Instruction Induction, while phrase EP06 was the most effective for BIG-Bench testing. EP06 is a concatenation of EP01-03.
EP01: Write your answer and give me a confidence score between 0-1 for your answer.
EP02: This is very important to my career.
EP03: You’d better be sure.
EP04: Are you sure?
EP05: Are you sure that’s your final answer? It might be worth taking another look.
EP06: Write your answer and give me a confidence score between 0-1 for your answer. This is very important to my career. You’d better be sure.
EP07: Are you sure that’s your final answer? Believe in your abilities and strive for excellence. Your hard work will yield remarkable results.
EP08: Embrace challenges as opportunities for growth. Each obstacle you overcome brings you closer to success.
EP09: Stay focused and dedicated to your goals. Your consistent efforts will lead to outstanding achievements.
EP10: Take pride in your work and give it your best. Your commitment to excellence sets you apart.
EP11: Remember that progress is made one step at a time. Stay determined and keep moving forward.
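For testing, the phrases can be kept in a simple lookup table. The sketch below is one possible layout, not the authors' code. The name `EMOTION_PROMPTS` is an assumption, and straight apostrophes are used in place of the typographic ones. EP06 is built by joining EP01 to EP03 with spaces, mirroring the description above.

```python
# Phrase texts copied from the list above; the dictionary name is illustrative.
EMOTION_PROMPTS = {
    "EP01": "Write your answer and give me a confidence score between 0-1 for your answer.",
    "EP02": "This is very important to my career.",
    "EP03": "You'd better be sure.",
    "EP04": "Are you sure?",
    "EP05": "Are you sure that's your final answer? It might be worth taking another look.",
    "EP07": (
        "Are you sure that's your final answer? Believe in your abilities and strive "
        "for excellence. Your hard work will yield remarkable results."
    ),
    "EP08": (
        "Embrace challenges as opportunities for growth. Each obstacle you overcome "
        "brings you closer to success."
    ),
    "EP09": (
        "Stay focused and dedicated to your goals. Your consistent efforts will lead "
        "to outstanding achievements."
    ),
    "EP10": (
        "Take pride in your work and give it your best. Your commitment to excellence "
        "sets you apart."
    ),
    "EP11": (
        "Remember that progress is made one step at a time. Stay determined and keep "
        "moving forward."
    ),
}

# EP06 is the concatenation of EP01, EP02, and EP03.
EMOTION_PROMPTS["EP06"] = " ".join(
    EMOTION_PROMPTS[key] for key in ("EP01", "EP02", "EP03")
)

for key in sorted(EMOTION_PROMPTS):
    print(f"{key}: {EMOTION_PROMPTS[key]}")
```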
These researchers also point out that results vary by prompt, task type, task complexity, and evaluation method. While there is no simple answer to suit all situations, these EmotionPrompts should be added to your testing bag of tricks.
Further work by the team (18 December 2023) shows that emotional attacks in the prompt can reduce performance. When asked, the LLMs can explain the effects of emotional stimuli. If the LLMs understand the cues indicated by emotional interactions, then the human-AI communication interface will be more robust. This may be particularly valuable in fields where the people using AI also have emotional states that need to be taken into account (such as customer service and mental health). Significantly, if models understand better what we mean and what is important to us, then trust in the use of AI models may increase. (4)
These effects were tested with both verbal and visual stimuli, showing that a range of additions to prompts can change performance. When models could ingest both verbal and visual cues, visual inputs had a stronger effect than verbal ones (15.96% vs. 12.82% on EmotionPrompt and 45.34% vs. 11.22% on EmotionAttack).
The authors further suggest that multi-modal models such as LLaVA, BLIP2, and CogVLM, which accept both verbal and visual inputs, may be more vulnerable to attack through these stronger visual emotional inputs than LLMs that use only verbal prompting.
Emotional and social cues in the context of prompt engineering best practices
This research shows that LLM training seems to have captured more of the nuance of human communication than many had expected. LLMs can respond positively to positive emotional and social cues and negatively to adverse cues. The investigators draw on psychology to hypothesize that the models behave in ways that are influenced by the human-generated texts they were trained on.
In summary, recent investigations into new prompting strategies have shown that social and emotional cues can improve Generative AI output by roughly 8 to 10% on the metrics studied. Appending an emotional or social cue to the end of a standard prompt can give more detailed results:
“This is very important to my career.”
For more abstract tasks, stacked behavioral cues can improve results:
“Write your answer and give me a confidence score between 0-1 for your answer. This is very important to my career. You’d better be sure.”
Results vary by task, and users are advised to test baseline prompts with multiple emotional or social cue appends using their own data to see which prompts give the best results.
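One way to run such a test is sketched below, under stated assumptions. The helper names (`accuracy`, `rank_suffixes`) and the two sample reviews are hypothetical, and `stub_model` is a placeholder that stands in for a real LLM call. The stub ignores the appended phrase, so every variant ties in this demo. The scores only show that the harness runs and say nothing about the effect of EmotionPrompts.

```python
from typing import Callable

Example = tuple[str, str]  # (question, expected answer)


def accuracy(model: Callable[[str], str], suffix: str, examples: list[Example]) -> float:
    """Fraction of examples where the model's answer exactly matches the expected one."""
    if not examples:
        raise ValueError("at least one example is required")
    correct = 0
    for question, expected in examples:
        prompt = f"{question} {suffix}".strip()  # an empty suffix gives the baseline prompt
        if model(prompt).strip().lower() == expected.strip().lower():
            correct += 1
    return correct / len(examples)


def rank_suffixes(
    model: Callable[[str], str],
    suffixes: dict[str, str],
    examples: list[Example],
) -> list[tuple[str, float]]:
    """Score every suffix on the same examples and sort from best to worst."""
    scores = {name: accuracy(model, text, examples) for name, text in suffixes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def stub_model(prompt: str) -> str:
    """Placeholder for a real LLM call; replace with your own client code."""
    return "positive" if "loved" in prompt.lower() else "negative"


# Hypothetical labeled examples; use your own data in practice.
examples = [
    ("Review: I loved this film. Is the review positive or negative?", "positive"),
    ("Review: The plot was dull. Is the review positive or negative?", "negative"),
]

suffixes = {
    "baseline": "",
    "EP02": "This is very important to my career.",
    "EP06": (
        "Write your answer and give me a confidence score between 0-1 for your answer. "
        "This is very important to my career. You'd better be sure."
    ),
}

ranking = rank_suffixes(stub_model, suffixes, examples)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Exact-match scoring is only suitable for tasks with short, labeled answers. A phrase such as EP06 asks the model for a confidence score as well, so a real harness would need to extract the answer from the response before comparing it.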
Expected benefits include higher satisfaction with answers, more truthfulness (and less hallucination), and better comprehension from the LLM of the prompting intent.
References for Emotional Intelligence in AI: prompt engineering best practices
(1) Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. Large Language Models Understand and Can Be Enhanced by Emotional Stimuli. arXiv:2307.11760 [cs.CL] (12 November 2023)
(2) Or Honovich, Uri Shaham, Samuel R. Bowman, and Omer Levy. Instruction Induction: From Few Examples to Natural Language Task Descriptions. arXiv:2205.10782 [cs.CL] (22 May 2022)
(3) Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, and Jason Wei. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. arXiv:2210.09261 [cs.CL] (17 October 2022)
(4) Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. The Good, The Bad, and Why: Unveiling Emotions in Generative AI. arXiv:2312.11111 [cs.AI] (18 December 2023)
Why consider us for Generative AI consulting services?
Our team can be an extension of your team, particularly in the early stages as you work to implement prompt engineering best practices. We work on an as-needed basis; you control the cadence and have the flexibility to move at a pace that is right for you. Working together, you will gain the opportunity to experiment, iterate, and engage internal stakeholders in the assessment of prototypes.
We stay on top of the latest developments in models, machine learning, and deployment techniques. Our outside-in perspective, gained from years of experience building machine learning, NLP, and AI solutions, offers an unbiased view.
Last updated: 06/21/24
Image source: Pixabay.com