Hi, this website uses essential cookies to ensure its proper operation and tracking cookies to understand how you interact with it. The latter will be set only after consent.
A Step-by-step Guide to Prompt Engineering: Best Practices, Challenges, and Examples
Explore the realm of prompt engineering and delve into essential techniques and tools for optimizing your prompts. Learn about various methods and techniques and gain insights into prompt engineering challenges.
As users increasingly rely on Large Language Models (LLMs) to accomplish their daily tasks, their concerns about the potential leakage of private data by these models have surged.
[Provide the input text here]
[Provide the input text here]
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.
Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now? Title italic
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.
English to French Translation:
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?
Lorem ipsum dolor sit amet, line first line second line third
Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now? Title italic Title italicTitle italicTitle italicTitle italicTitle italicTitle italic
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.
English to French Translation:
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?
As Artificial Intelligence continues to reshape industries, it’s often said that certain professions are on the verge of disappearing. However, with every technological shift comes the emergence of new career opportunities. In the rapidly evolving landscape of Large Language Models (LLMs), one of the most intriguing roles to consider is that of a prompt engineer.
In this article, we'll delve into the world of prompt engineering, a field at the forefront of AI innovation. We'll explore how prompt engineers play a crucial role in ensuring that LLMs and other generative AI tools deliver desired results, optimizing their performance.
Prompt engineering is the process of structuring the text sent to the generative AI so that it is correctly interpreted and understood, and leads to the expected output. Prompt engineering also refers to fine-tuning the large language models and designing the flow of communication with the large language models.
The emergence of prompt engineering: an overview
The feature of the language models that has allowed them to shake up the world and make them so unique is In-Context Learning. Before LLMs, AI systems and Natural Language Processing systems could only handle a narrow set of tasks - identifying objects, classifying network traffic, and so on. AI tools were unable to just look at some input data (say four or five examples of the task being performed) and then perform the task they were given.
But what does it have to do with prompt engineering?
Well, at a very basic level, you can take advantage of the model's ability to perform in-context learning and design a prompt with some examples of query and desired output, which should improve the quality of the model's performance for that particular task.
In-context learning is a very interesting feature because it is similar to the way humans learn - repetition allows the model to learn a new skill almost instantly. In-context learning is useful if you want to structure the model's output, or if you want it to respond in a certain style (let’s say making the model pretend that it’s a pirate).
Developing prompts and in-context learning are not the only techniques used by prompt engineers. You may also have come across terms such as pre-training, embedding and fine-tuning.
Let’s explore them.
What is model pre-training
Pre-training is basically what enables the language model to understand the structure and the semantics of the language. The generative AI model is trained on a large corpus of data, usually built by scraping content from the internet, various books, Wikipedia pages and snippets of code from public repositories on GitHub. Various sources say that GPT-3 is pre-trained on over 40 terabytes of data, which is quite a large number. Pre-training is an expensive and time-consuming process that requires technical background - when working with language models, you are most likely to use pre-trained models.
But what if you want your model to have a specific knowledge, let's say about your company's product? It wouldn't make sense to pre-train the model on a large corpus of data, including your company's knowledge base, especially if it is evolving - this approach would make the cost of running the model ridiculously high, because you would have to retrain the whole model every single time you make updates in your product or the knowledge base. That is a case in which embedding comes in handy.
What are embeddings
The use of semantic embedding allows prompt engineers to feed a small dataset of domain knowledge into the large language model. While the general knowledge of GPT-3 or GPT-4 is impressive, it will probably hallucinate if you ask it about some details of the tool you are developing (unless you are developing a tool with wide adoption and thousands of users), or about code examples for a new Python library that was released 4 months ago.
Embedding allows you to feed your data to the pre-trained LLM to provide better performance for specific tasks. On the other hand, embedding is more costly and complicated than taking advantage of in-context learning. The more documents you have, the higher the cost of creating embeddings. The embedding itself is a vector (list) of floating point numbers. You need to store these vectors somewhere - for example in Pinecone, a vector database - and that adds another cost.
Embedding makes sense if your model needs to acquire specific knowledge (in other words, it has limited information about a particular subject), but if you want your model to have certain behavioural characteristics, then fine-tuning should be the preferred approach.
What is model fine-tuning
Fine-tuning allows developers to adjust the way LLM works - it can be useful in scenarios such as changing the style, tone, format or other qualitative aspects, and increasing the reliability of producing a desired results.
The decision to fine-tune LLM models for specific applications should be made with careful consideration of the time and resources required. It is advisable to first explore the potential of prompt engineering or prompt chaining. What's more, fine-tuning involves the developer 'adding' some guidelines to the vast amount of data and knowledge - so fine-tuning is not recommended when adapting the model to perform specific tasks - embedding will simply provide the more accurate results.
To summarise, prompt engineers do not just work with the prompts themselves. They also need to have a basic understanding of the internals and limitations of Large Language Models, and they need to know how and when to use in-context learning, embedding or fine-tuning to maximise the value that LLM technology brings to the solution they are working on. Moreover, a Prompt Engineer job is not only about delivering effective prompts. The outcome of their work needs to be properly secured as well - we will discuss prompt injection attacks, one of the most common threats (and how to prevent them), further in this article.
At the end of the day - whether we use fine-tuning or not - high-quality, well-designed effective prompts are key factor when using Large Language Models. That’s why we may end this section with a citation from Andrej Karpathy’s (former Director of AI at Tesla and Researcher at OpenAI) Twitter: “The hottest new programming language is English”.
Large Language Models prompting techniques
In all AI prompting examples below, we use the GPT-3.5-turbo model, which is available either through OpenAI Playground, OpenAI API, or in ChatGPT (in this case - after fine-tuning). You can use all of the prompts in this article when communicating with the OpenAI API, but the examples are demonstrated using the OpenAI Playground to avoid creating an unnecessary technological barrier for those unfamiliar with the OpenAI API.
Zero-shot prompting
Although "zero-shot prompting" is called a technique, I'd argue that it deserves to be called that. Basically, zero-shot prompting takes advantage of the fact that large language models have extensive knowledge. You can use zero-shot prompting for simple tasks and hope that the model knows the answer.
Prompt:
Below is an example of a zero-shot prompt run on GPT-3.5 (disclaimer: in further examples, the system prompt is empty, if it’s not empty, then it’s described in the content of the article):
Single-shot prompting
As single-shot (or single-prompt) prompting we refer to all approaches in which you prompt the model with a single demonstration of the task execution.
In this case you take advantage of the In-Context Learning feature of Large Language Models, and the most basic approach is demonstrating tha
As you can see, GPT-3.5 follows the response scheme suggested in the first message. You don't really need to provide any additional instructions, just the communication scheme. GPT should comply and produce a response according to this scheme (as long as you don't deliberately try to break it with attacks such as prompt injection - this technique will be demonstrated later in this article).
The example above was a demonstration of In-Context Learning, but we know a few other single-shot prompting methods. One of them is to tell the model to follow the instructions. Of course, the more detailed the instruction, the better the result returned by a LLM, but it also comes with the caveat of higher cost, related to the higher number of tokens needed to process the prompt and generate the message.
Bear in mind that prompts written in flowery and rich language, several hundred or even thousand characters long, do not necessarily mean a better quality of message returned by the language model, but they certainly mean higher costs.
Below is an example of a single shot prompt with an instruction followed by the model.
Prompt:
Return JSON that contains the name of the software on input and its type. Input: Flask Output:
When it comes to tasks such as solving math exercises and puzzles, Chain-of-Thought prompting (by Wei, et al., 2022 - https://arxiv.org/abs/2201.11903) is very useful.
When performing this technique, you provide the model with the reasoning steps necessary to achieve the result. This way, the model is able to perform more complex tasks.
Below you will find an example of an effective prompt, that can be used for Chain-of-Thought prompting.
Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2 boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?
A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookies were given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. You have 7 cookies.
Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender bought three boxes, 4 pints in each. How many pints does bartender have now?
What is more, you can use more than one example to make Chain-of-Thought prompting more effective. Having said that, we will move on to the next category of prompts, which is called multi-shot prompting or few-shot prompting.
Multi-shot prompting
A few-shot prompting, or multi-shot prompting is an approach, in which you pass more than one example to the AI model. That technique is particularly useful when working with complex tasks - you provide a few examples and model solves the next one the same way.
When using multi-shot prompting, a prompt engineer is providing the model with multiple examples of task execution. This way, In-Context Learning is being utilised, and the model performs better on the next example, which it has to solve by itself.
Below you will find a simple demonstration of a few-shot prompt.
English to French Translation: Input: 'Hello, how are you?' Output: 'Bonjour, comment ça va?'
English to French Translation: Input: 'Thank you, I am well.' Output: 'Merci, je vais bien.'
English to French Translation: Input: 'Good morning, nice to meet you.' Output:
Automatic prompt generation
Automatic prompt generation can be achieved through frameworks, such as Automatic Prompt Engineer (https://arxiv.org/abs/2211.01910), but sometimes it’s just enough to ask LLM for generating a quality prompt. At the very beginning, let’s use the following System Prompt for our model.
System prompt:
You are a Prompt Engineer. Generate a concise prompt that is effective, precise, and will be used with LLM (language model) effectively. Employ delimiters or other approaches to make the prompt highly readable and easier to process.
Then as an user message you can input something like the following.
Create LLM prompt for generating SEO tags for the text.
The suggested system prompt generated by LLM is the following.
**Task:** Generate SEO tags for the given text. **Instructions:** Given a text, generate SEO tags that effectively describe the content for search engine optimization purposes. The SEO tags should be concise, relevant, and accurately represent the key topics and themes covered in the text. **Text:** [Provide the input text here] **Example Output:** - SEO tags: [Generated SEO tags here]
Looks good!
Prompt engineering best practices & tools
Writing effective prompts requires experience with generatie AI tools, but you can follow some general best practices to achieve your goals.
One important tip is to provide more context and perspective by including relevant information or background as part of your prompt (or system prompt). This will help the model better understand the desired response. It is also important to avoid ambiguity to get accurate and useful answers. If you have complex questions, use one of the methods described in this article - Chain of Thought or a few shot prompts. This will increase the chances of getting an accurate answer.
Last but not least, when inserting text, code examples or other data into your prompt, use delimiters such as <<content>> or {{content}}. This helps the model to distinguish specific parts of a prompt.
An example is shown below.
Prompt:
Translate the following text to French: {{I like peanut butter}} Return only translated text.
One of the most popular prompt engineering tools is Langchain.
LangChain is a platform designed to support the development of applications based on language models. The designers of LangChain believe that the most effective applications will not only use language models via an API, but will also be able to connect to other data sources and interact with their environment. Langchain enables developers to create a chatbot (or any other LLM-based application) that uses custom data - through the use of a vector database and fine-tuning. In addition, Langchain helps developers through a set of classes and functions designed to help with prompt engineering. You can also use Langchain for creating functional AI agents, which are able to use 3rd party tools. This way you can “extend” the “perception” of the LLM.
Langchain requires some experience with programming languages. If you don’t have coding experience, then you may try using Langflow: https://github.com/logspace-ai/langflow
If you are looking for an inspiration for your prompts, you can check Promp, which is a prompt marketplace, offering multiple free prompts as well: https://www.promp.io/
Here are some critical elements to consider when designing and managing prompts for generative AI models. This section will delve into the intricacies of ambiguous prompts, ethical considerations, bias mitigation, prompt injection, handling complex prompts, and interpreting model responses.
Ambiguous Prompts
When designing a prompt, you must be precise. If the prompt you've designed is ambiguous, the model will struggle to respond concisely and will therefore produce poor quality response or hallucinate.
Ethical Considerations in Prompt Design
Another complex problem in designing prompts is ethics. You don't want your chatbot to be used by malicious actors to generate ransomware, do you? You also don't want it spreading misinformation, fake news and other unethical content. Ensuring the transparency and quality of the model's messages is a tough assignment, and yet it is sometimes possible to bypass the model's guardrails using attacks such as the 'Mosaic Prompt', which has been described here: https://www.cl.cam.ac.uk/~is410/Papers/llm_censorship.pdf
Mitigating Bias and Fairness Issues
One of the problems you may encounter when designing prompts is that generative AI can be biased. There are a number of techniques you can use to prevent a model from that. If you're building your own model, you need to pay attention to the sources used during pre-training. Does the training data contain a variety of perspectives? Have you considered different age groups, genders and cultures? Similar steps need to be taken with fine-tuning data. Last but not least, a good starting point for an unbiased, trustworthy LLM is a system prompt in which you simply instruct the AI to be respectful:
As an inclusive AI, you are committed to promoting respect and understanding for all users from various backgrounds. Thus, it's crucial to conduct discussions and make inquiries that are respectful towards all religions, nationalities, cultures, races, gender identities, disabilities, ages, economic statuses, and sexual orientations. Strive to engage in conversations that are free from stereotypes and any form of bias or prejudice. Focus your responses on helping, assisting, learning, and providing neutral,fact-based information.
Prompt Injection is a new vulnerability class characteristic for Generative AI. If you want to learn more about attack and prevention methods, check this article. If you want to test your LLM hacking skills, you need to check Gandalf by Lakera! This is not the only security threat related to Large Language Models - you can find a list of LLM-related threats in Top10 for LLM document released by the OWASP foundation. If you want to protect your LLMs against prompt injections, jailbreaks and system prompt leaks, you should check Lakera Guard tool.
Handling Long and Complex Prompts
Dealing with long and complex prompts is another challenge - it may be slightly easier using tools such as LangChain or Langflow, but ultimately the longer your prompt, the higher the token usage and the higher the cost of using a model. You need to maintain a balance between the complexity of the prompt and its length, otherwise you will face very high maintenance costs.
Interpreting and Debugging Model Responses
Another issue is debugging and interpreting the model's responses. For example, if you're working with code generation, it's very likely that there will be vulnerabilities in the code generated by LLM. Another challenge is citing sources - generative AI may just "make up" the sources, so any information that LLM returns should be independently verified.
What’s next for prompt engineering
Generative AI technology has increasingly interesting and more and more advanced capabilities, and we can expect prompt engineering to become more nuanced.
As an aspiring prompt engineer, you should spend some time experimenting with tools such as Langchain and developing generative AI tools. You should also keep up to date with the latest technologies, as prompt engineering is evolving extremely quickly.
To sum up, Prompt Engineering as a field is still in its early stages and has huge potential to grow. As AI becomes an irreplaceable part of our lives, the importance of being able to speak their language will only increase. Prompt engineers have an exciting and challenging journey ahead of them.
Learn how to protect against the most common LLM vulnerabilities
Download this guide to delve into the most common LLM security risks and ways to mitigate them.
Learn what is LLM evaluation and why is it important. Explore 7 effective methods, best practices, and evolving frameworks for assessing LLMs' performance and impact across industries.
Discover the top 11 open-source Large Language Models (LLMs) of 2023 that are shaping the landscape of AI. Explore their features, benefits, and challenges in this comprehensive guide to stay updated on the latest developments in the world of language technology.