
GraphRAG in Practice

It has been a while since I tried out anything new, so today let's take GraphRAG for a spin.

As the name suggests, it is a retrieval-augmented generation method that builds and queries a knowledge graph to power RAG.

1. Environment setup

conda create -n GraphRAG python=3.11
conda activate GraphRAG
pip install graphrag
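
To confirm the install succeeded before moving on, a quick sanity check (it only imports the package; nothing else is assumed):

python3 -c "import graphrag; print('graphrag imported OK')"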

2. Building the GraphRAG index

mkdir -p ./ragtest/input
# This book is a detailed guide to using prompt-engineering techniques to steer language models such as ChatGPT toward high-quality text.
curl https://raw.githubusercontent.com/win4r/mytest/main/book.txt > ./ragtest/input/book.txt

# Initialize the workspace
python3 -m graphrag.index --init --root ./ragtest

This creates, among other things, a .env file and a settings.yaml under ./ragtest. Next, fill in .env. You can put an OpenAI key in directly:

GRAPHRAG_API_KEY=sk-ZZvxAMzrl.....................

Or, if you plan to use Ollama, any placeholder value works:

GRAPHRAG_API_KEY=ollama
1) If you are using Ollama:
Open settings.yaml and find the line
# api_base: https://<instance>.openai.azure.com
Uncomment it and change it to api_base: http://127.0.0.1:11434/v1
Also change model to llama3 (or whichever model your Ollama instance serves).
2) If you are using an OpenAI key, set the model instead, e.g. model: gpt-3.5-turbo-1106
Around line 28 of settings.yaml there is also an embeddings model section; adjust it to match your choice.
Note that the embeddings model must be an OpenAI one.
So if you chose an Ollama model above, you must explicitly set the embeddings endpoint to api_base: https://api.openai.com/v1;
otherwise, when the pipeline reaches the embeddings step, it inherits the Ollama base_url configured above and fails.
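
A quick way to double-check that the two endpoints ended up where you intend (a minimal sketch, assuming PyYAML is installed; the key paths match the settings.yaml shown below):

import yaml  # assumption: PyYAML is available (pip install pyyaml)

# Load the generated config and print the two api_base values, which should
# point at Ollama (chat model) and OpenAI (embeddings) respectively.
with open("./ragtest/settings.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print("chat llm api_base:      ", cfg["llm"].get("api_base"))
print("embeddings llm api_base:", cfg["embeddings"]["llm"].get("api_base"))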
# Run the indexing pipeline
python3 -m graphrag.index --root ./ragtest

When this finishes, the index is built.
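
Per the storage section of the config below, the artifacts land under ./ragtest/output/<timestamp>/artifacts as Parquet files. A small sketch to peek at what was produced (only the output layout from settings.yaml is assumed):

from pathlib import Path

# List the Parquet artifacts written by the indexing run
# (the layout follows storage.base_dir in settings.yaml).
for p in sorted(Path("./ragtest/output").glob("*/artifacts/*.parquet")):
    print(p.name)

For reference, here is the full settings.yaml from this run: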
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://192.168.1.138:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    api_base: https://api.openai.com/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
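
The chunks section above (size 300, overlap 100) is worth making concrete. Here is a minimal sketch of fixed-size chunking with overlap; it is purely illustrative: GraphRAG itself chunks cl100k_base tokens, while this demo just splits on whitespace, and chunk_tokens is a name invented for the sketch:

def chunk_tokens(tokens, size=300, overlap=100):
    """Yield windows of `size` tokens; adjacent windows share `overlap` tokens."""
    step = size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        if window:
            yield window
        if start + size >= len(tokens):
            break

words = open("./ragtest/input/book.txt", encoding="utf-8").read().split()
print(sum(1 for _ in chunk_tokens(words)), "chunks")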

3. Global and local search

python3 -m graphrag.query \
--root ./ragtest \
--method global \
"show me some Prompts about Interpretable Soft Prompts."

python3 -m graphrag.query \
--root ./ragtest \
--method local \
"show me some Prompts about Knowledge Generation."

4. Visualization

# pip3 install chainlit
import chainlit as cl
import subprocess


@cl.on_chat_start
def start():
    cl.user_session.set("history", [])


@cl.on_message
async def main(message: cl.Message):
    history = cl.user_session.get("history")
    # Extract the text content from the Message object
    query = message.content
    # Build the command as an argument list; since subprocess.run is called
    # without shell=True, no shlex.quote is needed (quoting here would in
    # fact corrupt the query string)
    cmd = [
        "python3", "-m", "graphrag.query",
        "--root", "./ragtest",
        "--method", "local",
        query,
    ]
    # Run the command and capture its output
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        output = result.stdout
        # Keep only the content after "SUCCESS: Local Search Response:"
        response = output.split("SUCCESS: Local Search Response:", 1)[-1].strip()
        history.append((query, response))
        cl.user_session.set("history", history)
        await cl.Message(content=response).send()
    except subprocess.CalledProcessError as e:
        error_message = f"An error occurred: {e.stderr}"
        await cl.Message(content=error_message).send()
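
Save this as app.py and launch it with chainlit run app.py -w; by default Chainlit serves the chat UI at http://localhost:8000.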
