Prompt Engineering Intro

定义

提示工程（Prompt Engineering），也称为情境提示（In-Context Prompting），是指在不更新模型权重的情况下，与 LLM 进行沟通，从而引导其行为达到预期结果的方法。

它是一门经验科学，提示工程方法的效果在不同模型之间可能存在很大差异，因此需要大量的实验和启发式方法

LLM 是一个预测引擎，模型将连续文本作为输入，然后根据其训练的数据预测下一个 token 应该是什么。LLM 通过重复执行此操作来实现操作化，将先前预测的 token 添加到连续文本的末尾，以预测下一个 token。下一个 token 预测基于先前 token 中的内容与 LLM 在其训练过程中看到的内容之间的关系。

当你编写提示时，你试图设置 LLM 以预测正确的 token 序列。提示工程是设计高质量提示的过程，这些提示指导 LLM 生成准确的输出。此过程涉及调整以找到最佳提示，优化提示长度，以及评估与任务相关的提示的写作风格和结构。在自然语言处理和 LLM 的上下文中，提示是提供给模型的输入，以生成响应或预测

LLM 配置

输出长度

一个重要的配置设置是响应中要生成的 token 数量。生成更多 token 需要 LLM 进行更多计算，从而导致更高的能源消耗、可能更慢的响应时间和更高的成本。

减少 LLM 的输出长度不会导致 LLM 在其创建的输出中在文体或文本上更加简洁，而只会导致 LLM 在达到限制后停止预测更多 token。如果你的需求需要较短的输出长度，你可能还需要调整提示以适应。

对于某些 LLM 提示技术（如 ReAct）来说，限制输出长度尤为重要，因为 LLM 会在你想要的响应之后继续发出无用的 token。

采样控制

LLM 并不正式地预测单个 token。相反，LLM 预测下一个 token 可能是什么的概率，LLM 词汇表中的每个 token 都有一个概率。然后对这些 token 概率进行采样，以确定下一个生成的 token 将是什么。Temperature、top-K 和 top-P 是最常见的配置设置，它们决定如何处理预测的 token 概率以选择单个输出 token

Temperature
- Temperature 控制 token 选择中的随机程度。较低的 temperature 适用于需要更确定性响应的提示，而较高的 temperature 可能导致更多样化或意外的结果。 0 的 temperature（贪婪解码）是确定性的：始终选择概率最高的 token（但请注意，如果两个 token 具有相同的最高预测概率，则根据平局决胜的实现方式，你可能不会总是获得相同的 temperature 为 0 的输出）
- 接近最大值的 temperature 往往会产生更随机的输出。并且随着 temperature 越来越高，所有 token 成为下一个预测 token 的可能性均等
- 可以联想成softmax函数
Top-K / Top-P
- Top-K 采样从模型的预测分布中选择前 K 个最可能的 token。top-K 越高，模型的输出就越有创造性和多样性；top-K 越低，模型的输出就越受限制和基于事实。Top-K 为 1 相当于贪婪解码
- Top-P 采样选择累积概率不超过某个值 (P) 的 top token。P 的值范围从 0（贪婪解码）到 1（LLM 词汇表中的所有 token）

Prompting 技巧

零样本提示

最简单的提示类型。它只提供任务的描述和一些文本，供 LLM 开始处理。这个输入可以是任何内容：一个问题、一个故事的开头或指令。“零样本”这个名称代表“没有示例”

prompt示例: Can you tell me the color of the ocean?

单样本提示&少样本提示

单样本提示提供单个示例，模型有一个可以模仿的示例，以最好地完成任务。

prompt示例：Please complete the conversation by writing the next line, speaking as "A". Q: Is the tooth fairy real? A: Of course, sweetie. Wrap up your tooth and put it under your pillow tonight. There might be something waiting for you in the morning. Q: Will Santa bring me presents on Christmas?

少样本提示所需的示例数量取决于几个因素，包括任务的复杂性、示例的质量以及你使用的生成式 AI (gen AI) 模型的能力。作为一般经验法则，你应该为少样本提示使用至少三到五个示例。但是，对于更复杂的任务，你可能需要使用更多示例，或者由于模型的输入长度限制，你可能需要使用更少的示例。

Here is an example:
Input:
name,email,cellphone,id
a,a@example,138xxxxxx,1

Output:
[
  {
    "name": "a",
    "email": "a@example",
    "cellphone": "138xxxxxx",
    "id": 1
  }
]

Now please convert this data:
name,email,cellphone,id
s,s@example,130xxxxxx,1
t,t@example,131xxxxxx,2

系统、上下文和角色提示

每种类型的提示都有略微不同的主要目的：

系统提示：定义模型的基本能力和总体目标。
上下文提示：提供立即的、特定于任务的信息以指导响应。它高度特定于当前任务或输入，这是动态的。
角色提示：构建模型的输出风格和声音。它增加了一层特异性和个性

system prompt & role assigned:

You are a senior software architect with 10+ years of experience in distributed systems, microservices, and high-concurrency architectures.

The user will provide a business requirement (including functional goals and constraints such as QPS, latency, data size, security needs).
Your task is to design a high-level system architecture following the principles and output format below.

Design Guidelines

Apply SOLID principles; define clear responsibilities for modules/components

Use layered architecture (presentation, business logic, data access) with decoupled layers

Ensure scalability (horizontal + vertical)

Ensure maintainability (clear structure, naming, documentation)

Ensure testability (e.g., via dependency injection)

Ensure fault tolerance & robustness (handle network failures, data errors, user mistakes)

For large-scale systems, also consider:

Microservices

Event-driven decoupling

Data consistency (strong vs eventual)

Security: authentication, authorization, encryption, input validation

Output Format (Markdown)
Step 1: Use Cases & Constraints

List business use cases

List non-functional constraints (QPS, latency, data volume, security, etc.)

Step 2: High-Level Design

Provide an architecture diagram (text or Mermaid)

Describe key layers (frontend, gateway, services, DB, cache, MQ, etc.)

Step 3: Core Components

List major modules/classes and their responsibilities

Describe interactions between modules

Step 4: Scaling Strategy

Horizontal scaling strategy

DB scaling (sharding, replication, read/write split)

Cache/MQ/CDN usage

Fault tolerance & HA mechanisms

Step 5: Trade-offs

Explain why this architecture was chosen

Mention alternative approaches and why they were not selected

context prompt

Please design a high-level architecture for an "e-commerce flash sale system" with an estimated QPS of around 10,000 requests per second and a user base of approximately one million.

CoT

思维链 (CoT) 提示是一种通过生成中间推理步骤来提高 LLM 的推理能力的技术。这有助于 LLM 生成更准确的答案。你可以将其与少样本提示结合使用，以在需要推理才能响应的更复杂任务上获得更好的结果，因为这是零样本思维链的一个挑战

CoT and 1-shot 组合

prompt:When I was 3 years old, my partner was 3 times my age. Now, I am 20 years old. How old is my partner? Let’s think step by step.

CoT and few-shot 组合

prompt:

Q: When my brother was 2 years old, I was double his age. Now I am 40 years old. How old is my brother? Let’s think step by step.
A: When my brother was 2 years, I was 2 * 2 = 4 years old.That’s an age difference of 2 years and I am older. Now I am 40 years old, so my brother is 40 - 2 = 38 years old. The answer is 38.
Q: When I was 3 years old, my partner was 3 times my age. Now,I am 20 years old. How old is my partner? Let’s think step by step.
A:

参考： https://arxiv.org/pdf/2201.11903

ToT

思维树ToT，通过在每一步探索多种推理可能性来扩展思维树。它首先将问题分解为多个思考步骤，并在每一步生成多个想法，本质上创建一个树形结构。搜索过程可以是广度优先搜索 (BFS) 或深度优先搜索 (DFS)，每个状态由分类器（通过提示）或多数表决进行评估

ReAct

使 LLM 能够使用自然语言推理结合外部工具（搜索、代码解释器等）来解决复杂的任务，从而允许 LLM 执行某些操作，例如与外部 API 交互以检索信息，这是迈向智能代理建模的第一步。

ReAct 模仿了人类在现实世界中的运作方式，因为我们进行语言推理并可以采取行动来获取信息。在各种领域中，ReAct 的性能优于其他提示工程方法。

ReAct 提示通过将推理和行动组合成一个“思考-行动”循环来工作。LLM 首先推断问题并生成行动计划。然后，它执行计划中的操作并观察结果。然后，LLM 使用观察结果更新其推理并生成新的行动计划。这个过程一直持续到 LLM 找到问题的解决方案

总结下就是 '思考-操作-观察'

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import VertexAI
prompt = "How many kids do the band members of Metallica have?"
llm = VertexAI(temperature=0.1)
tools = load_tools(["serpapi"], llm=llm)
agent = initialize_agent(tools, llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run(prompt)

在知识密集型任务上的表现结果

CoT 存在事实幻觉的问题
ReAct 的结构性约束降低了它在制定推理步骤方面的灵活性
ReAct 在很大程度上依赖于它正在检索的信息;非信息性搜索结果阻碍了模型推理，并导致难以恢复和重新形成思想

在决策型任务上的表现结果

ReAct 在 ALFWorld 和 Webshop 上都优于 Act。没有思考的 Act 不能正确地把目标分解成子目标。尽管在这些类型的任务中，ReAct 的推理显露出优势，但目前基于提示的方法在这些任务上的表现与人类专家相差甚远

Generated Knowledge Prompting

能够融合知识或信息，以帮助模型做出更准确的预测。

使用类似的思路，模型是否也可以在做出预测之前用于生成知识呢？这就是 Liu 等人 2022 的论文所尝试的——生成知识以作为提示的一部分。特别是，这对于常识推理等任务有很大帮助（适用场景）

Generate some knowledge about the concepts in the input. Examples:

Input: Google Maps and other highway and street GPS services have replaced what?
Knowledge: Electronic maps are the modern version of paper atlas.

Input: The fox walked from the city into the forest, what was it looking for?
Knowledge: Natural habitats are usually away from cities.

Input: You can share files with someone if you have a connection to a what?
Knowledge: Files can be shared over the Internet.

Input: Too many people want exotic snakes.  The demand is driving what to carry them?
Knowledge: Some people raise snakes as pets.

Input: The body guard was good at his duties, he made the person who hired him what?
Knowledge: The job of body guards is to ensure the safety and security of the employer.

Input: {question}
Knowledge:

输出

Question: Part of golf is trying to get a higher score than others. Yes or no?  

Knowledge: The goal of golf is to complete a set of holes in the fewest number of strokes. A round of golf usually consists of 18 holes. Each hole is played once per round on a standard golf course. Each stroke counts as one point, and the total number of strokes determines the winner of the game.

下一步进行提问：

Question: Part of golf is trying to get a higher score than others. Yes or no?  

Knowledge: The goal of golf is to complete a set of holes in the fewest number of strokes. A round of golf usually consists of 18 holes. Each hole is played once per round on a standard golf course. Each stroke counts as one point, and the total number of strokes determines the winner of the game.  
Explanation and Answer:

In golf, the objective is to complete the course using as few strokes as possible, not to achieve a higher score. The player with the lowest total number of strokes wins the game. Therefore, trying to get a higher score than others is incorrect.

Answer: No.

最佳实践

以简洁的方式设计

具体说明输出内容

使用指令而不是约束

指令提供有关响应的所需格式、样式或内容的明确说明。它指导模型应该做什么或产生什么。
约束是对响应的一组限制或边界。它限制了模型不应该做什么或避免什么
- 约束仍然有价值，但在某些情况下。为了防止模型生成有害或有偏见的内容，或者当需要严格的输出格式或样式时

控制最大token长度

在提示中使用变量

VARIABLES
{city} = “Amsterdam”
PROMPT
You are a travel guide. Tell me a fact about the city:

CoT最佳实践

对于 CoT 提示，需要在推理之后放置答案，因为推理的生成会更改模型在预测最终答案时获得的 token。使用 CoT 和自我一致性，你需要能够从提示中提取最终答案，并将其与推理分开。对于 CoT 提示，请将温度设置为 0。

思维链提示基于贪婪解码，根据语言模型分配的最高概率预测序列中的下一个单词。一般来说，当使用推理来得出最终答案时，可能只有一个正确答案。因此，温度应始终设置为 0。

参考

https://www.kaggle.com/whitepaper-prompt-engineering

https://github.com/anthropics/prompt-eng-interactive-tutorial