Solved: Exceeding the maximum token limit in Azure OpenAI (with Java)


I'm building a chat that returns queries based on the questions you ask it about a specific database. For this I use Azure OpenAI and Java with Spring Boot.

My problem comes here:

How can I make the AI remember the previous questions without passing the context back to it? What I want is to greatly reduce token consumption. Depending on what the user asks, if the question contains a keyword, for example 'users', I add the information for that table to the context, and that information is huge (field names, data types, and descriptions). So after several questions the token usage rises to more than 10,000.


I can't show all the code since it's a project for my company.


What I'm currently doing is adding the referenced table and the main context ("you are an SQL-based chat...") to the context. To make the chat remember, I have tried saving the history in Java and passing it back as context, but this exceeds the token limit pretty fast.


This is what I'm currently doing (no remembering from the AI):


chatMessages.add(new ChatMessage(ChatRole.SYSTEM, context));
chatMessages.add(new ChatMessage(ChatRole.USER, question));
ChatCompletions chatCompletions = client.getChatCompletions(deploymentOrModelId, new ChatCompletionsOptions(chatMessages));


As far as I know, there is no cheap way to make the LLM (Azure OpenAI in this case) remember your context. As you said, sending context (and a huge chunk of it) on each call gets pricey really fast. That being said, you could change the approach and try other techniques that mimic memory, such as summarizing the previous questions and sending that summary as content. Instead of a long string with 20 questions and answers, you send a short summary of what the user has been asking about. That keeps your prompt short and still somewhat "aware" of the conversation.
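
As a rough illustration of the summary idea, here is a minimal sketch in plain Java. The class `SummaryMemory` and its methods are hypothetical names, not part of the Azure SDK, and the `summarizer` function is an assumption: in a real application it would most likely be a separate, cheap LLM call that condenses the text it receives.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper: keeps a short running summary instead of the full transcript. */
class SummaryMemory {
    // Assumption: in practice this would wrap an LLM call that condenses its input.
    private final UnaryOperator<String> summarizer;
    private String summary = "";

    SummaryMemory(UnaryOperator<String> summarizer) {
        this.summarizer = summarizer;
    }

    /** Folds the latest question/answer pair into the running summary. */
    void record(String question, String answer) {
        String input = "Previous summary: " + summary
                + "\nUser asked: " + question
                + "\nAssistant answered: " + answer;
        summary = summarizer.apply(input);
    }

    /** Returns the base instructions, followed by the summary if there is one. */
    String buildSystemPrompt(String baseContext) {
        if (summary.isEmpty()) {
            return baseContext;
        }
        return baseContext + "\nConversation so far (summary): " + summary;
    }
}
```

With something like this, the string returned by `buildSystemPrompt(context)` would take the place of `context` in the SYSTEM message, and `record(question, answer)` would be called after each response. The summarization call costs tokens too, so this only pays off when the summary is much shorter than the history it replaces.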

There are also conversation buffers (keeping the chat history in memory and sending it to the LLM each time, as you did), but the history gets long pretty fast. To address that, you could configure a buffer window, for example limiting the conversation memory to the last 3 questions, which should help keep the token count manageable.
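
The buffer window can be sketched in plain Java as follows. `WindowedHistory` and `Exchange` are hypothetical names used only for illustration, and the window size of 3 in the usage note is just the example figure from above, not a recommended value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: remembers only the most recent question/answer pairs. */
class WindowedHistory {
    /** One question and the answer the model gave to it. */
    static final class Exchange {
        final String question;
        final String answer;

        Exchange(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    private final int maxExchanges;
    private final Deque<Exchange> window = new ArrayDeque<>();

    WindowedHistory(int maxExchanges) {
        if (maxExchanges < 1) {
            throw new IllegalArgumentException("maxExchanges must be at least 1");
        }
        this.maxExchanges = maxExchanges;
    }

    /** Stores a new exchange and drops the oldest ones beyond the window size. */
    void add(String question, String answer) {
        window.addLast(new Exchange(question, answer));
        while (window.size() > maxExchanges) {
            window.removeFirst();
        }
    }

    /** Returns the remembered exchanges, oldest first. */
    List<Exchange> recent() {
        return new ArrayList<>(window);
    }
}
```

Assuming the same message list as in the question, each remembered `Exchange` from `new WindowedHistory(3)` would be added as a USER message followed by an ASSISTANT message, between the SYSTEM message and the new question. Older exchanges are simply forgotten, which is the trade-off of this approach.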


There are several ways to manage this, but as far as I know there is no "perfect memory", or at least not one that is worth paying for. If you could tell us a bit more about how good the bot's memory needs to be, or about the specific use case, maybe we can be more precise. Good luck!



