
Gemma 2 2B: Prompt Engineering Considerations for Small LLMs

news Source: original 2024/9/20 6:44:36

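The effectiveness of a large language model depends heavily on the instructions we give it. Prompt engineering is the process of crafting questions in a way that gets the best output from an LLM. It is a key step in implementing any LLM-based feature. Prompt engineering is also iterative: if you have tried different LLMs, you have probably noticed that the prompt needs to be adjusted for each one to get better results.

The same applies to models of different sizes.

Chat interfaces powered by large LLMs, such as Gemini or ChatGPT, can often produce satisfying results with minimal prompting effort. When you work with a default, smaller LLM that has not been fine-tuned, however, you need to adapt your approach.

Smaller LLMs are less capable and have a smaller pool of information to draw from.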

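Definition and Use Cases of Small LLMs

Defining "small LLMs"

In artificial intelligence, language models (LLMs) can be grouped into different scales by their number of parameters. "Small LLMs" usually refers to models with somewhere between a few million and several billion parameters. Compared with large models that have tens or even hundreds of billions of parameters, they are more limited in capability and breadth of knowledge, but they have advantages in resource consumption and deployment flexibility.

Parameter range

For the purposes of this article, small LLMs range from a few million parameters up to just under 30 billion (30B), which is why models such as Gemma 27B still count as small here. Models of this size can handle basic language understanding and generation tasks, but they may fall short of large models on complex tasks and on accuracy.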
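
To make the resource trade-off concrete, here is a rough back-of-the-envelope sketch of how parameter count translates into memory for the weights alone. The function name, the round parameter counts, and the bits-per-parameter figures are illustrative assumptions (16-bit and 4-bit storage are common choices), and the estimate ignores activations, the KV cache, and runtime overhead.

```javascript
// Rough estimate of the memory needed to hold model weights only.
// Assumption: every parameter is stored with the same number of bits.
function estimateWeightMemoryGB(paramCount, bitsPerParam) {
  const bytes = paramCount * (bitsPerParam / 8);
  return bytes / 1e9; // decimal gigabytes
}

// Hypothetical comparison using round numbers: a 2B model versus a 27B model.
for (const [name, params] of [["2B", 2e9], ["27B", 27e9]]) {
  const fp16 = estimateWeightMemoryGB(params, 16);
  const int4 = estimateWeightMemoryGB(params, 4);
  console.log(`${name}: ~${fp16} GB at 16-bit, ~${int4} GB at 4-bit`);
}
```

Even as a rough figure, this suggests why the smallest models are the realistic candidates for on-device use, while the larger "small" models fit better on a server.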

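Use cases

Typical use cases for small LLMs include:

On-device/in-browser generative AI: for example, using the Gemma 2B model together with MediaPipe's LLM Inference API, which can run even on CPU-only devices.
Custom server-side generative AI: developers can deploy small models such as Gemma 2B, Gemma 7B, or Gemma 27B on their own servers and fine-tune them as needed.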

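Getting Started with Small LLMs

Differences between large and small LLMs

Large LLMs such as Gemini or ChatGPT can usually produce satisfying results from a simple prompt, whereas small LLMs need more carefully designed prompts to perform at their best. Small LLMs are comparatively limited in how much information they can process and how well they understand context.

Design detailed, specific prompts

When designing prompts for small LLMs, provide more context and explicit format requirements. This helps the model understand the task more accurately and produce suitable output.
Provide context and precise format instructions: a detailed prompt example
For example, when you need a product rating derived from a user review, you can use a prompt template like the following: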

Based on a user review, provide a product rating as an integer between 1 and 5. Only output the integer. Review: "${review}"
Rating:
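
One way to wire this template into code is sketched below. `buildRatingPrompt` and `parseRating` are hypothetical helper names, not part of any library. The parsing step is an added safeguard, on the assumption that a small model may occasionally emit extra text despite the "Only output the integer" instruction.

```javascript
// Fill the prompt template from the text with a concrete review.
function buildRatingPrompt(review) {
  return (
    "Based on a user review, provide a product rating as an integer " +
    "between 1 and 5. Only output the integer. " +
    `Review: "${review}"\nRating:`
  );
}

// Extract a rating from raw model output. Returns null when the output
// contains no standalone digit from 1 to 5.
function parseRating(modelOutput) {
  const match = modelOutput.match(/\b[1-5]\b/);
  return match ? Number(match[0]) : null;
}

// Illustrative usage with an invented review and invented model outputs.
console.log(buildRatingPrompt("Great pockets, but the zipper jammed."));
console.log(parseRating(" 4\n"));
console.log(parseRating("no rating here"));
```

Returning `null` instead of a default value lets the caller decide whether to retry, fall back, or discard the response.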

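One-shot, few-shot, and multi-shot examples, and chain-of-thought prompting

Providing concrete rating examples helps the model understand the rating scale. For example, show how positive reviews map to high ratings and how negative reviews map to low ratings.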
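
As a minimal sketch of few-shot prompting, the helper below places example review/rating pairs between the task instruction and the new review. The function name `buildFewShotPrompt` and the two sample reviews are made up for illustration; in practice the examples would be chosen to cover the full rating range.

```javascript
// Build a few-shot prompt: instruction, worked examples, then the new review.
function buildFewShotPrompt(examples, review) {
  const instruction =
    "Based on a user review, provide a product rating as an integer " +
    "between 1 and 5. Only output the integer.";
  const shots = examples
    .map((ex) => `Review: "${ex.review}"\nRating: ${ex.rating}`)
    .join("\n\n");
  return `${instruction}\n\n${shots}\n\nReview: "${review}"\nRating:`;
}

// Illustrative examples (invented for this sketch).
const examples = [
  { review: "Fantastic quality, exceeded my expectations.", rating: 5 },
  { review: "Broke after two days and support never replied.", rating: 1 },
];

console.log(buildFewShotPrompt(examples, "Decent, but a little overpriced."));
```

Passing a single example gives a one-shot prompt, and a longer list gives a multi-shot prompt, so the same helper covers all three variants.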

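Chain-of-thought prompting guides the model through logical reasoning by showing the steps used to solve a problem, which helps small LLMs perform better on complex tasks. For example: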

Analyze a product review, and then based on your analysis give me the corresponding rating (integer). The rating should be an integer between 1 and 5. 1 is the worst rating, and 5 is the best rating. A strongly dissatisfied review that only mentions issues should have a rating of 1 (worst). A strongly satisfied review that only mentions positives and upsides should have a rating of 5 (best). Be opinionated. Use the full range of possible ratings (1 to 5).

Here are some examples of reviews and their corresponding analyses and ratings:

Review: 'Stylish and functional. Not sure how it'll handle rugged outdoor use, but it's perfect for urban exploring.' Analysis: The reviewer appreciates the product's style and basic functionality. They express some uncertainty about its ruggedness but overall find it suitable for their intended use, resulting in a positive, but not top-tier rating. Rating (integer): 4

Review: 'It's a solid backpack at a decent price. Does the job, but nothing particularly amazing about it.' Analysis: This reflects an average opinion. The backpack is functional and fulfills its essential purpose. However, the reviewer finds it unremarkable and lacking any standout features deserving of higher praise. Rating (integer): 3

Review: 'The waist belt broke on my first trip! Customer service was unresponsive too. Would not recommend.' Analysis: A serious product defect and poor customer service experience naturally warrants the lowest possible rating. The reviewer is extremely unsatisfied with both the product and the company. Rating (integer): 1

Review: 'Love how many pockets and compartments it has. Keeps everything organized on long trips. Durable too!' Analysis: The enthusiastic review highlights specific features the user loves (organization and durability), indicating great satisfaction with the product. This justifies the highest rating. Rating (integer): 5

Review: 'The straps are a bit flimsy, and they started digging into my shoulders under heavy loads.' Analysis: While not a totally negative review, a significant comfort issue leads the reviewer to rate the product poorly. The straps are a key component of a backpack, and their failure to perform well under load is a major flaw. Rating (integer): 1

Now, here is the review you need to assess:
Review: "${review}"
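
Because a chain-of-thought prompt makes the model write its analysis before the rating, the response has to be post-processed to recover the number. The sketch below assumes the model follows the `Rating (integer): N` pattern used in the examples. `extractCotRating` is a hypothetical helper, and it takes the last match in case the analysis text itself repeats the pattern.

```javascript
// Pull the final "Rating (integer): N" out of a chain-of-thought response.
// Returns null if the pattern is absent or N falls outside 1..5.
function extractCotRating(response) {
  const matches = [...response.matchAll(/Rating \(integer\):\s*(\d+)/g)];
  if (matches.length === 0) return null;
  const rating = Number(matches[matches.length - 1][1]);
  return rating >= 1 && rating <= 5 ? rating : null;
}

// Invented sample response for illustration.
const sample =
  "Analysis: The reviewer likes the design but reports a broken zipper, " +
  "so satisfaction is mixed. Rating (integer): 3";
console.log(extractCotRating(sample));
```

Validating the range here guards against a small model drifting from the requested format, which is more likely as prompts and outputs get longer.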