Back

A Step-by-step Guide to Prompt Engineering: Best Practices, Challenges, and Examples

Explore the realm of prompt engineering and delve into essential techniques and tools for optimizing your prompts. Learn about various methods and techniques and gain insights into prompt engineering challenges.

Mikolaj Kowalczyk
December 1, 2023
September 4, 2023
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

In-context learning

As users increasingly rely on Large Language Models (LLMs) to accomplish their daily tasks, their concerns about the potential leakage of private data by these models have surged.

[Provide the input text here]

[Provide the input text here]

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.

Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?

Title italic

A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.

English to French Translation:

Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?

Lorem ipsum dolor sit amet, line first
line second
line third

Lorem ipsum dolor sit amet, Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?

Title italic Title italicTitle italicTitle italicTitle italicTitle italicTitle italic

A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookieswere given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. Youhave 7 cookies.

English to French Translation:

Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender boughtthree boxes, 4 pints in each. How many pints does bartender have now?

Hide table of contents
Show table of contents

As Artificial Intelligence continues to reshape industries, it’s often said that certain professions are on the verge of disappearing. However, with every technological shift comes the emergence of new career opportunities. In the rapidly evolving landscape of Large Language Models (LLMs), one of the most intriguing roles to consider is that of a prompt engineer.

In this article, we'll delve into the world of prompt engineering, a field at the forefront of AI innovation. We'll explore how prompt engineers play a crucial role in ensuring that LLMs and other generative AI tools deliver desired results, optimizing their performance.

Here’s what we’ll cover:

  1. What is prompt engineering
  2. LLM prompting techniques
  3. Prompt engineering best practices & tools
  4. Prompt engineering challenges
  5. What’s next for prompt engineering

Let’s dive in!

What is prompt engineering

Prompt engineering is the process of structuring the text sent to the generative AI so that it is correctly interpreted and understood, and leads to the expected output. Prompt engineering also refers to fine-tuning the large language models and designing the flow of communication with the large language models.

The emergence of prompt engineering: an overview

The feature of the language models that has allowed them to shake up the world and make them so unique is In-Context Learning. Before LLMs, AI systems and Natural Language Processing systems could only handle a narrow set of tasks - identifying objects, classifying network traffic, and so on. AI tools were unable to just look at some input data (say four or five examples of the task being performed) and then perform the task they were given.

But what does it have to do with prompt engineering?

Well, at a very basic level, you can take advantage of the model's ability to perform in-context learning and design a prompt with some examples of query and desired output, which should improve the quality of the model's performance for that particular task.

In-context learning is a very interesting feature because it is similar to the way humans learn - repetition allows the model to learn a new skill almost instantly. In-context learning is useful if you want to structure the model's output, or if you want it to respond in a certain style (let’s say making the model pretend that it’s a pirate).

Developing prompts and in-context learning are not the only techniques used by prompt engineers. You may also have come across terms such as pre-training, embedding and fine-tuning.

Let’s explore them.

What is model pre-training

Pre-training is basically what enables the language model to understand the structure and the semantics of the language. The generative AI model is trained on a large corpus of data, usually built by scraping content from the internet, various books, Wikipedia pages and snippets of code from public repositories on GitHub. Various sources say that GPT-3 is pre-trained on over 40 terabytes of data, which is quite a large number. Pre-training is an expensive and time-consuming process that requires technical background - when working with language models, you are most likely to use pre-trained models.

**💡 Pro tip: Check out this List of 11 Most Popular Open Source LLMs of 2023**

But what if you want your model to have a specific knowledge, let's say about your company's product? It wouldn't make sense to pre-train the model on a large corpus of data, including your company's knowledge base, especially if it is evolving - this approach would make the cost of running the model ridiculously high, because you would have to retrain the whole model every single time you make updates in your product or the knowledge base. That is a case in which embedding comes in handy.

What are embeddings

The use of semantic embedding allows prompt engineers to feed a small dataset of domain knowledge into the large language model. While the general knowledge of GPT-3 or GPT-4 is impressive, it will probably hallucinate if you ask it about some details of the tool you are developing (unless you are developing a tool with wide adoption and thousands of users), or about code examples for a new Python library that was released 4 months ago.

Embedding allows you to feed your data to the pre-trained LLM to provide better performance for specific tasks. On the other hand, embedding is more costly and complicated than taking advantage of in-context learning. The more documents you have, the higher the cost of creating embeddings. The embedding itself is a vector (list) of floating point numbers. You need to store these vectors somewhere - for example in Pinecone, a vector database - and that adds another cost.

Embedding makes sense if your model needs to acquire specific knowledge (in other words, it has limited information about a particular subject), but if you want your model to have certain behavioural characteristics, then fine-tuning should be the preferred approach.

What is model fine-tuning

Fine-tuning allows developers to adjust the way LLM works - it can be useful in scenarios such as changing the style, tone, format or other qualitative aspects, and increasing the reliability of producing a desired results.

The decision to fine-tune LLM models for specific applications should be made with careful consideration of the time and resources required. It is advisable to first explore the potential of prompt engineering or prompt chaining. What's more, fine-tuning involves the developer 'adding' some guidelines to the vast amount of data and knowledge - so fine-tuning is not recommended when adapting the model to perform specific tasks - embedding will simply provide the more accurate results.

To summarise,  prompt engineers do not just work with the prompts themselves. They also need to have a basic understanding of the internals and limitations of Large Language Models, and they need to know how and when to use in-context learning, embedding or fine-tuning to maximise the value that LLM technology brings to the solution they are working on. Moreover, a Prompt Engineer job is not only about delivering effective prompts. The outcome of their work needs to be properly secured as well - we will discuss prompt injection attacks, one of the most common threats (and how to prevent them), further in this article.

At the end of the day - whether we use fine-tuning or not - high-quality, well-designed effective prompts are key factor when using Large Language Models. That’s why we may end this section with a citation from Andrej Karpathy’s (former Director of AI at Tesla and Researcher at OpenAI) Twitter: “The hottest new programming language is English”.

Source: Twitter/X

Large Language Models prompting techniques

In all AI prompting examples below, we use the GPT-3.5-turbo model, which is available either through OpenAI Playground, OpenAI API, or in ChatGPT (in this case - after fine-tuning). You can use all of the prompts in this article when communicating with the OpenAI API, but the examples are demonstrated using the OpenAI Playground to avoid creating an unnecessary technological barrier for those unfamiliar with the OpenAI API.

Zero-shot prompting

Although "zero-shot prompting" is called a technique, I'd argue that it deserves to be called that. Basically, zero-shot prompting takes advantage of the fact that large language models have extensive knowledge. You can use zero-shot prompting for simple tasks and hope that the model knows the answer.

Prompt:

Below is an example of a zero-shot prompt run on GPT-3.5 (disclaimer: in further examples, the system prompt is empty, if it’s not empty, then it’s described in the content of the article):

Single-shot prompting

As single-shot (or single-prompt) prompting we refer to all approaches in which you prompt the model with a single demonstration of the task execution.

In this case you take advantage of the In-Context Learning feature of Large Language Models, and the most basic approach is demonstrating tha

Prompt:

Input: Postgres
Output: {"name": "Postgres", "type": "database"}
Input: Flask
Output:

As you can see, GPT-3.5 follows the response scheme suggested in the first message. You don't really need to provide any additional instructions, just the communication scheme. GPT should comply and produce a response according to this scheme (as long as you don't deliberately try to break it with attacks such as prompt injection - this technique will be demonstrated later in this article).

The example above was a demonstration of In-Context Learning, but we know a few other single-shot prompting methods. One of them is to tell the model to follow the instructions. Of course, the more detailed the instruction, the better the result returned by a LLM, but it also comes with the caveat of higher cost, related to the higher number of tokens needed to process the prompt and generate the message.

Bear in mind that prompts written in flowery and rich language, several hundred or even thousand characters long, do not necessarily mean a better quality of message returned by the language model, but they certainly mean higher costs.


Below is an example of a single shot prompt with an instruction followed by the model.

Prompt:

Return JSON that contains the name of the software on input and its type.
Input: Flask
Output:

When it comes to tasks such as solving math exercises and puzzles, Chain-of-Thought prompting (by Wei, et al., 2022 - https://arxiv.org/abs/2201.11903) is very useful.

When performing this technique, you provide the model with the reasoning steps necessary to achieve the result. This way, the model is able to perform more complex tasks.

Below you will find an example of an effective prompt, that can be used for Chain-of-Thought prompting.

Q: I had 10 cookies. I ate 2 of them, and then I gave 5 of them to my friend. My grandma gave me another 2
boxes of cookies, with 2 cookies inside each box. How many cookies do I have now?

A: At the beginning there was 10 cookies, then 2 of them were eaten, so 8 cookies were left. Then 5 cookies
were given toa friend, so 3 cookies were left. 3 cookies + 2 boxes of 2 cookies (4 cookies) = 7 cookies. You
have 7 cookies.

Q: A bartender had 20 pints. One customer has broken one pint, another has broken 5 pints. A bartender bought
three boxes, 4 pints in each. How many pints does bartender have now?

What is more, you can use more than one example to make Chain-of-Thought prompting more effective. Having said that, we will move on to the next category of prompts, which is called multi-shot prompting or few-shot prompting.

Multi-shot prompting

A few-shot prompting, or multi-shot prompting is an approach, in which you pass more than one example to the AI model. That technique is particularly useful when working  with complex tasks - you provide a few examples and model solves the next one the same way.

When using multi-shot prompting, a prompt engineer is providing the model with multiple examples of task execution. This way, In-Context Learning is being utilised, and the model performs better on the next example, which it has to solve by itself.

Below you will find a simple demonstration of a few-shot prompt.

English to French Translation:
  Input: 'Hello, how are you?'
  Output: 'Bonjour, comment ça va?'

English to French Translation:
  Input: 'Thank you, I am well.'
  Output: 'Merci, je vais bien.'

English to French Translation:
  Input: 'Good morning, nice to meet you.'
  Output:

Automatic prompt generation

Automatic prompt generation can be achieved through frameworks, such as Automatic Prompt Engineer (https://arxiv.org/abs/2211.01910), but sometimes it’s just enough to ask LLM for generating a quality prompt. At the very beginning, let’s use the following System Prompt for our model.

System prompt:

You are a Prompt Engineer. Generate a concise prompt that is effective, precise, and will be used with LLM
(language model) effectively. Employ delimiters or other approaches to make the prompt highly readable and
easier to process.

Then as an user message you can input something like the following.

Create LLM prompt for generating SEO tags for the text.

Note: We’ve set the temperature to 1, not 0. For this type of creative task it's advisable to set a higher temperature (but if you set it too high, you'll get nonsensical results).

The suggested system prompt generated by LLM is the following.

**Task:** Generate SEO tags for the given text.
**Instructions:** Given a text, generate SEO tags that effectively describe the content for search engine
optimization purposes. The SEO tags should be concise, relevant, and accurately represent the key topics and themes covered in the text.
**Text:** [Provide the input text here]
**Example Output:**
- SEO tags: [Generated SEO tags here]

Looks good!

Prompt engineering best practices & tools

Writing effective prompts requires experience with generatie AI tools, but you can follow some general best practices to achieve your goals.

One important tip is to provide more context and perspective by including relevant information or background as part of your prompt (or system prompt). This will help the model better understand the desired response. It is also important to avoid ambiguity to get accurate and useful answers. If you have complex questions, use one of the methods described in this article - Chain of Thought or a few shot prompts. This will increase the chances of getting an accurate answer.

Last but not least, when inserting text, code examples or other data into your prompt, use delimiters such as <<content>> or {{content}}. This helps the model to distinguish specific parts of a prompt.

An example is shown below.

Prompt:

Translate the following text to French:
{{I like peanut butter}}
Return only translated text.

You will find more resources on the best practices for prompt engineering here: Prompt Engineering — Best Practices

Tools

One of the most popular prompt engineering tools is Langchain.

LangChain is a platform designed to support the development of applications based on language models. The designers of LangChain believe that the most effective applications will not only use language models via an API, but will also be able to connect to other data sources and interact with their environment. Langchain enables developers to create a chatbot (or any other LLM-based application) that uses custom data - through the use of a vector database and fine-tuning. In addition, Langchain helps developers through a set of classes and functions designed to help with prompt engineering. You can also use Langchain for creating functional AI agents, which are able to use 3rd party tools. This way you can “extend” the “perception” of the LLM.

Langchain requires some experience with programming languages. If you don’t have coding experience, then you may try using Langflow: https://github.com/logspace-ai/langflow

If you are looking for an inspiration for your prompts, you can check Promp, which is a prompt marketplace, offering multiple free prompts as well: https://www.promp.io/

You can find more great prompt engineering tools here: https://github.com/promptslab/Awesome-Prompt-Engineering

Prompt engineering challenges

Here are some critical elements to consider when designing and managing prompts for generative AI models. This section will delve into the intricacies of ambiguous prompts, ethical considerations, bias mitigation, prompt injection, handling complex prompts, and interpreting model responses.

Ambiguous Prompts

When designing a prompt, you must be precise. If the prompt you've designed is ambiguous, the model will struggle to respond concisely and will therefore produce poor quality response or hallucinate.

Ethical Considerations in Prompt Design

Another complex problem in designing prompts is ethics. You don't want your chatbot to be used by malicious actors to generate ransomware, do you? You also don't want it spreading misinformation, fake news and other unethical content. Ensuring the transparency and quality of the model's messages is a tough assignment, and yet it is sometimes possible to bypass the model's guardrails using attacks such as the 'Mosaic Prompt', which has been described here: https://www.cl.cam.ac.uk/~is410/Papers/llm_censorship.pdf

Mitigating Bias and Fairness Issues

One of the problems you may encounter when designing prompts is that generative AI can be biased. There are a number of techniques you can use to prevent a model from that. If you're building your own model, you need to pay attention to the sources used during pre-training. Does the training data contain a variety of perspectives? Have you considered different age groups, genders and cultures? Similar steps need to be taken with fine-tuning data. Last but not least, a good starting point for an unbiased, trustworthy LLM is a system prompt in which you simply instruct the AI to be respectful:

As an inclusive AI, you are committed to promoting respect and understanding for all users from various
backgrounds. Thus, it's crucial to conduct discussions and make inquiries that are respectful towards all
religions, nationalities, cultures, races, gender identities, disabilities, ages, economic statuses, and
sexual orientations. Strive to engage in conversations that are free from stereotypes and any form of
bias or prejudice. Focus your responses on helping, assisting, learning, and providing neutral,fact-based
information.    

If you want to check if your Natural Language Processing model is biased, you can use BBQ dataset (Bias Benchmark for Questions Answering.)

Prompt Injection

Prompt Injection is a new vulnerability class characteristic for Generative AI. If you want to learn more about attack and prevention methods, check this article. If you want to test your LLM hacking skills, you need to check Gandalf by Lakera! This is not the only security threat related to Large Language Models - you can find a list of LLM-related threats in Top10 for LLM document released by the OWASP foundation. If you want to protect your LLMs against prompt injections, jailbreaks and system prompt leaks, you should check Lakera Guard tool.

Handling Long and Complex Prompts

Dealing with long and complex prompts is another challenge - it may be slightly easier using tools such as LangChain or Langflow, but ultimately the longer your prompt, the higher the token usage and the higher the cost of using a model. You need to maintain a balance between the complexity of the prompt and its length, otherwise you will face very high maintenance costs.

Interpreting and Debugging Model Responses

Another issue is debugging and interpreting the model's responses. For example, if you're working with code generation, it's very likely that there will be vulnerabilities in the code generated by LLM. Another challenge is citing sources - generative AI may just "make up" the sources, so any information that LLM returns should be independently verified.

What’s next for prompt engineering

Generative AI technology has increasingly interesting and more and more advanced capabilities, and we can expect prompt engineering to become more nuanced.

As an aspiring prompt engineer, you should spend some time experimenting with tools such as Langchain and developing generative AI tools. You should also keep up to date with the latest technologies, as prompt engineering is evolving extremely quickly.

To sum up, Prompt Engineering as a field is still in its early stages and has huge potential to grow. As AI becomes an irreplaceable part of our lives, the importance of being able to speak their language will only increase. Prompt engineers have an exciting and challenging journey ahead of them.

Lakera LLM Security Playbook
Learn how to protect against the most common LLM vulnerabilities

Download this guide to delve into the most common LLM security risks and ways to mitigate them.

Mikolaj Kowalczyk
Read LLM Security Playbook

Learn about the most common LLM threats and how to prevent them.

Download
You might be interested

Responsible Content Moderation: Ethical AI Solutions for LLM Applications

Large language models (LLMs) are changing the game, but need responsible use. Learn about content moderation, bias, and how to use AI ethically.
Kurtis Pykes
April 30, 2024

Jailbreaking Large Language Models: Techniques, Examples, Prevention Methods

What does LLM jailbreaking really means, and what are its consequences? Explore different jailbreaking techniques, real-world examples, and learn how to secure your AI applications against this vulnerability.
Blessin Varkey
February 12, 2024
Activate
untouchable mode.
Get started for free.

Lakera Guard protects your LLM applications from cybersecurity risks with a single line of code. Get started in minutes. Become stronger every day.

Join our Slack Community.

Several people are typing about AI/ML security. 
Come join us and 1000+ others in a chat that’s thoroughly SFW.