
Recall Mechanism for Knowledge Graph Question Answering: llm-graph-builder

Source: original, 2024/9/20 5:29:47

Background

Using Neo4j's open-source llm-graph-builder (LGB below) as an example, this post explains how knowledge is recalled in graph + RAG mode.

How It Works

Graph + RAG mode keeps the core idea of RAG: vectors are still the means of semantic recall.

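In LGB, the system first embeds the user's question to obtain its vector representation.
It then uses the question vector to recall relevant document chunks from Neo4j and groups them by the document they belong to.
From the recalled chunks, it finds the entities that were extracted from those chunks, ranks the entities by the number of chunks they are linked to, and keeps the top 25.
Starting from each entity e, it looks for paths of at most one relationship hop (excluding HAS_ENTITY and PART_OF relationships). A path must end at a node that is neither a Chunk nor a Document, so the paths never contain nodes of those types.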

collect { OPTIONAL MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,1}(:!Chunk&!Document) RETURN path }
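The first three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: in LGB the embedding comes from an embedding model and the similarity search is performed inside Neo4j, whereas here a brute-force cosine similarity over toy two-dimensional vectors stands in for both. The names `CHUNKS`, `recall_chunks`, `group_by_document` and `top_entities` are hypothetical and do not appear in LGB.

```python
from collections import Counter, defaultdict
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall_chunks(question_vec, chunks, top_k=2):
    """Score every chunk against the question vector and keep the best top_k."""
    scored = [{"chunk": c, "score": cosine(question_vec, c["embedding"])} for c in chunks]
    scored.sort(key=lambda s: s["score"], reverse=True)
    return scored[:top_k]

def group_by_document(scored):
    """Group recalled chunks by the document they belong to."""
    groups = defaultdict(list)
    for s in scored:
        groups[s["chunk"]["doc"]].append(s)
    return dict(groups)

def top_entities(scored, limit=25):
    """Rank entities by how many recalled chunks mention them; keep the first `limit`."""
    counts = Counter(e for s in scored for e in s["chunk"]["entities"])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # name breaks ties
    return [name for name, _ in ranked][:limit]

# Toy data (assumption): real embeddings have hundreds of dimensions.
CHUNKS = [
    {"id": "c1", "doc": "a.pdf", "embedding": [1.0, 0.0], "entities": ["Neo4j", "Cypher"]},
    {"id": "c2", "doc": "a.pdf", "embedding": [0.9, 0.1], "entities": ["Neo4j"]},
    {"id": "c3", "doc": "b.pdf", "embedding": [0.0, 1.0], "entities": ["Python"]},
]

hits = recall_chunks([1.0, 0.0], CHUNKS, top_k=2)
by_doc = group_by_document(hits)
entities = top_entities(hits)
```

With this toy data the two chunks of `a.pdf` are recalled, and `Neo4j` ranks ahead of `Cypher` because it is linked to more of the recalled chunks.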

From the paths found for these entities, collect the de-duplicated relationships and nodes and return them:

// de-duplicate nodes and relationships across chunks
RETURN collect{ unwind paths as p unwind relationships(p) as r return distinct r} as rels,
collect{ unwind paths as p unwind nodes(p) as n return distinct n} as nodes, entities
}
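To make the path expansion and de-duplication concrete, here is a small Python sketch of the same idea over an in-memory edge list. It is an approximation, not the query LGB runs: the graph representation (tuples of start, type, end plus a label map) and the function name `expand_one_hop` are assumptions made for the example.

```python
EXCLUDED_REL_TYPES = {"HAS_ENTITY", "PART_OF"}
EXCLUDED_LABELS = {"Chunk", "Document"}

def expand_one_hop(entities, relationships, labels):
    """Return de-duplicated (nodes, rels) reachable in at most one hop.

    relationships: list of (start, type, end) tuples
    labels: mapping of node id -> set of labels
    """
    seeds = set(entities)
    nodes = set(seeds)   # the zero-hop path contributes the entity itself
    rels = set()         # a set removes duplicates across entities
    for start, rel_type, end in relationships:
        if rel_type in EXCLUDED_REL_TYPES:
            continue
        # The pattern is undirected, so try the relationship in both directions.
        for source, target in ((start, end), (end, start)):
            if source in seeds and not (labels.get(target, set()) & EXCLUDED_LABELS):
                nodes.add(target)
                rels.add((start, rel_type, end))
    return sorted(nodes), sorted(rels)

# Toy graph (assumption).
LABELS = {
    "Neo4j": {"Company"},
    "Cypher": {"Language"},
    "GraphDB": {"Concept"},
    "c1": {"Chunk"},
}
RELS = [
    ("Neo4j", "DEVELOPS", "Cypher"),
    ("c1", "HAS_ENTITY", "Neo4j"),      # excluded relationship type
    ("Neo4j", "MENTIONED_IN", "c1"),    # ends at a Chunk node, so it is dropped
    ("Cypher", "QUERIES", "GraphDB"),   # two hops from Neo4j, out of range
]

found_nodes, found_rels = expand_one_hop(["Neo4j"], RELS, LABELS)
```

Only the `DEVELOPS` relationship survives here: the other three are ruled out by the relationship-type filter, the label filter, and the one-hop limit respectively.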

The collected chunk texts, entities, relationships and nodes are then combined into a fixed text structure:

// generate metadata and text components for chunks, nodes and relationships
WITH d, avg_score,
     [c IN chunks | c.chunk.text] AS texts,
     [c IN chunks | {id: c.chunk.id, score: c.score}] AS chunkdetails,
     apoc.coll.sort([n IN nodes |
       coalesce(apoc.coll.removeAll(labels(n), ['__Entity__'])[0], "") + ":" +
       n.id + (CASE WHEN n.description IS NOT NULL THEN " (" + n.description + ")" ELSE "" END)
     ]) AS nodeTexts,
     apoc.coll.sort([r IN rels
       // optional filter if we limit the node-set
       // WHERE startNode(r) in nodes AND endNode(r) in nodes
       |
       coalesce(apoc.coll.removeAll(labels(startNode(r)), ['__Entity__'])[0], "") + ":" +
       startNode(r).id +
       " " + type(r) + " " +
       coalesce(apoc.coll.removeAll(labels(endNode(r)), ['__Entity__'])[0], "") + ":" + endNode(r).id
     ]) AS relTexts,
     entities
// combine texts into response-text
WITH d, avg_score, chunkdetails,
     "Text Content:\n" +
     apoc.text.join(texts, "\n----\n") +
     "\n----\nEntities:\n" +
     apoc.text.join(nodeTexts, "\n") +
     "\n----\nRelationships:\n" +
     apoc.text.join(relTexts, "\n") AS text,
     entities
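The string assembly in this query can be hard to read, so the following Python sketch mirrors it step by step. It is an illustrative equivalent only: the dictionary shapes for nodes and relationships and the names `first_label`, `node_text`, `rel_text` and `build_context` are assumptions made for the example.

```python
def first_label(node):
    """First label other than '__Entity__', or '' if none is left."""
    remaining = [label for label in node["labels"] if label != "__Entity__"]
    return remaining[0] if remaining else ""

def node_text(node):
    """Render a node as 'Label:id', plus ' (description)' when one exists."""
    description = node.get("description")
    suffix = f" ({description})" if description is not None else ""
    return f"{first_label(node)}:{node['id']}{suffix}"

def rel_text(rel):
    """Render a relationship as 'Label:startId TYPE Label:endId'."""
    start, end = rel["start"], rel["end"]
    return f"{first_label(start)}:{start['id']} {rel['type']} {first_label(end)}:{end['id']}"

def build_context(texts, nodes, rels):
    """Join chunk texts, sorted node lines and sorted relationship lines."""
    node_texts = sorted(node_text(n) for n in nodes)
    rel_texts = sorted(rel_text(r) for r in rels)
    return (
        "Text Content:\n" + "\n----\n".join(texts)
        + "\n----\nEntities:\n" + "\n".join(node_texts)
        + "\n----\nRelationships:\n" + "\n".join(rel_texts)
    )

# Toy data (assumption).
neo4j = {"id": "Neo4j", "labels": ["__Entity__", "Company"], "description": "graph database vendor"}
cypher = {"id": "Cypher", "labels": ["Language", "__Entity__"]}

context = build_context(
    ["chunk one", "chunk two"],
    [neo4j, cypher],
    [{"start": neo4j, "type": "DEVELOPS", "end": cypher}],
)
print(context)
```

Sorting the node and relationship lines, as `apoc.coll.sort` does in the query, keeps the assembled text stable regardless of the order in which the graph returns them.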

Finally, the combined text is returned together with its score and metadata:

RETURN text, avg_score as score, {length:size(text), source: COALESCE( CASE WHEN d.url CONTAINS "None" THEN d.fileName ELSE d.url END, d.fileName), chunkdetails: chunkdetails} AS metadata
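The `source` expression in this RETURN clause packs several cases into one line. A minimal Python sketch of the same fallback logic, assuming the document is available as a dictionary with optional `url` and `fileName` keys (the function names `pick_source` and `build_result` are hypothetical):

```python
def pick_source(url, file_name):
    """Prefer the URL unless it is missing or contains the string 'None'."""
    if url is None or "None" in url:
        return file_name
    return url

def build_result(text, avg_score, doc, chunkdetails):
    """Shape one result row: the text, its score, and a metadata dictionary."""
    return {
        "text": text,
        "score": avg_score,
        "metadata": {
            "length": len(text),
            "source": pick_source(doc.get("url"), doc.get("fileName")),
            "chunkdetails": chunkdetails,
        },
    }

row = build_result(
    "abc",
    0.9,
    {"url": "None", "fileName": "a.pdf"},
    [{"id": "c1", "score": 0.9}],
)
```

A document whose `url` was stored as the literal string "None" falls back to its file name, which appears to be the case the CONTAINS check guards against.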

That is the full process by which LGB recalls document content.