GraphRAG+ollama+LM Studio+chainlit

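Here we go one step further and switch the embedding model to a local one as well, getting familiar with the workflow and picking up a few new things along the way.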

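1. The environment is the same as before. The first step is to download LM Studio.

Then you will get stuck when trying to download the nomic model inside LM Studio: search does not work, and neither does downloading.

The workaround is described here:
lm studio 0.2.24国内下载模型_lm studio 国内源-CSDN博客 (CSDN blog post on downloading models in China with LM Studio 0.2.24)

After following that tutorial I still could not download the model. Search started working, but that alone is of little use.

So download the model directly from Hugging Face and place it in the LM Studio models folder: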

C:\Users\Administrator\.cache\lm-studio\models

Note that a specific folder layout is required:

C:\Users\Administrator\.cache\lm-studio\models\Publisher\Repository
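The model is only recognized when it sits in a folder of this form (publisher, then repository). Once it is there, load the model in LM Studio.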
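The folder layout above can be sketched in a few lines of Python. This is only an illustration: the function name `install_model` is made up for this example, and `Publisher` / `Repository` are the placeholder folder names from the path above, not real values.

```python
import shutil
from pathlib import Path


def install_model(gguf_file: Path, models_root: Path, publisher: str, repository: str) -> Path:
    """Copy a downloaded GGUF file into models_root/publisher/repository.

    Hypothetical helper: it only mirrors the two-level folder layout
    described above and returns the final location of the model file.
    """
    target_dir = models_root / publisher / repository
    target_dir.mkdir(parents=True, exist_ok=True)  # create both folder levels if missing
    target = target_dir / gguf_file.name
    shutil.copy2(gguf_file, target)  # keep the original download untouched
    return target


# Example call (paths are assumptions, adjust them to your machine):
# install_model(
#     Path.home() / "Downloads" / "nomic-embed-text-v1.5.Q5_K_M.gguf",
#     Path.home() / ".cache" / "lm-studio" / "models",
#     "Publisher",
#     "Repository",
# )
```

After copying, the model should show up in LM Studio's local model list; if it does not, the folder depth is the first thing to check.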

Then modify the configuration:

settings.yaml

## LM Studio runs on another computer of mine, so the embeddings api_base below uses that machine's IP (192.168.1.127)
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://127.0.0.1:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: Publisher/Repository/nomic-embed-text-v1.5.Q5_K_M.gguf
    api_base: http://192.168.1.127:1234/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
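Before starting a long indexing run, it can help to confirm that the embedding endpoint in the config actually answers. The sketch below uses only the Python standard library and assumes that LM Studio's local server accepts the OpenAI-style `POST /v1/embeddings` request. The function names `build_embedding_request` and `embed` are hypothetical helpers for this example; the address, key and model name are the ones from the config above.

```python
import json
import urllib.request


def build_embedding_request(api_base: str, api_key: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(api_base: str, api_key: str, model: str, texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Send the request and return one embedding vector per input text.

    Assumes the server replies in the OpenAI response shape:
    {"data": [{"embedding": [...]}, ...]}.
    """
    request = build_embedding_request(api_base, api_key, model, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]


# Example call, using the values from settings.yaml (requires the server to be running):
# vectors = embed(
#     "http://192.168.1.127:1234/v1",
#     "lm-studio",
#     "Publisher/Repository/nomic-embed-text-v1.5.Q5_K_M.gguf",
#     ["hello world"],
# )
# print(len(vectors[0]))
```

If this call fails with a connection error, the usual suspects are the IP address, the port, or the LM Studio server not being started on the other machine.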

Convert the PDF to Markdown, then the Markdown to txt

# Test document: https://github.com/win4r/mytest/blob/main/book.pdf
pip install marker-pdf
marker_single ./book.pdf ./pdftxt --batch_multiplier 2 --max_pages 60 --langs English
# Markdown to txt
python markdown_to_text.py book.md book.txt
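The post does not show the contents of markdown_to_text.py. As a rough stand-in, a Markdown-to-text step could look like the sketch below. It is not the author's script: it is a regex-based approximation that strips the most common Markdown markers, and the function names are made up for this example.

```python
import re
import sys
from pathlib import Path


def markdown_to_text(markdown: str) -> str:
    """Strip common Markdown syntax and return plain text (rough approximation)."""
    text = re.sub(r"`{3}.*?`{3}", "", markdown, flags=re.DOTALL)  # drop fenced code blocks
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)  # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # links -> link text
    text = re.sub(r"^[ ]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"^[ ]{0,3}>[ ]?", "", text, flags=re.MULTILINE)  # blockquote markers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)  # list bullets
    text = re.sub(r"\*{1,2}|`", "", text)  # emphasis and inline-code markers
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip() + "\n"


def convert_file(source: Path, target: Path) -> None:
    """Read a Markdown file and write the plain-text version as UTF-8."""
    target.write_text(markdown_to_text(source.read_text(encoding="utf-8")), encoding="utf-8")


# Same call shape as in the post: python markdown_to_text.py book.md book.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(Path(sys.argv[1]), Path(sys.argv[2]))
```

The output is written as UTF-8, which matches `file_encoding: utf-8` in settings.yaml, and the `.txt` extension matches the `file_pattern` there, so the result can be placed in the `input` folder.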
