
A Simplified Version of the Qdrant Official Quickstart and Tutorials


Notes:

  • First published: 2024-08-28
  • Qdrant official documentation: https://qdrant.tech/documentation/

About

These are simplified notes on a small part of the official Qdrant documentation; for more, please read the official docs.

Deploying Qdrant Locally with Docker

docker pull qdrant/qdrant
docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

With the default configuration, all data is stored in ./qdrant_storage.

Quickstart

Install the qdrant-client package (Python):

pip install qdrant-client

Initialize the client:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

All vector data is stored in Qdrant collections. Create a collection named test_collection that uses the dot product as the metric for comparing vectors.

from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="test_collection",
    vectors_config=VectorParams(size=4, distance=Distance.DOT),
)
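
This check is not in the official quickstart, but you can confirm the collection was created as intended (a minimal sketch, assuming the client from above):

info = client.get_collection(collection_name="test_collection")
print(info.status, info.points_count)  # points_count stays 0 until vectors are upserted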

Add vectors with payloads. A payload is data associated with a vector.

from qdrant_client.models import PointStruct

operation_info = client.upsert(
    collection_name="test_collection",
    wait=True,
    points=[
        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"city": "Berlin"}),
        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"city": "London"}),
        PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={"city": "Moscow"}),
        PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={"city": "New York"}),
        PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={"city": "Beijing"}),
        PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44], payload={"city": "Mumbai"}),
    ],
)

print(operation_info)
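
Not part of the quickstart, but as a quick sanity check you can read points back by id (a sketch using the same client; by default retrieve returns payloads without vectors):

points = client.retrieve(collection_name="test_collection", ids=[1, 2])
for point in points:
    print(point.id, point.payload)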

Run a query:

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    limit=3,
).points

print(search_result)

Output:

[{"id": 4,"version": 0,"score": 1.362,"payload": null,"vector": null},{"id": 1,"version": 0,"score": 1.273,"payload": null,"vector": null},{"id": 3,"version": 0,"score": 1.208,"payload": null,"vector": null}
]

Add a filter:

from qdrant_client.models import Filter, FieldCondition, MatchValue

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    query_filter=Filter(
        must=[FieldCondition(key="city", match=MatchValue(value="London"))]
    ),
    with_payload=True,
    limit=3,
).points

print(search_result)

Output:

[{"id": 2,"version": 0,"score": 0.871,"payload": {"city": "London"},"vector": null}
]

Tutorials

Getting Started with Semantic Search

Install the dependencies:

pip install sentence-transformers

Import the modules:

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

Use the all-MiniLM-L6-v2 encoder as the embedding model (an embedding model converts raw data into embeddings).

encoder = SentenceTransformer("all-MiniLM-L6-v2")
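
A quick check, not in the tutorial, to see the vector size this model produces (the sample sentence is made up):

sample = encoder.encode("A man travels through time.")
print(sample.shape)  # (384,) for all-MiniLM-L6-v2
print(encoder.get_sentence_embedding_dimension())  # 384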

Add a dataset:

documents = [
    {"name": "The Time Machine", "description": "A man travels through time and witnesses the evolution of humanity.", "author": "H.G. Wells", "year": 1895},
    {"name": "Ender's Game", "description": "A young boy is trained to become a military leader in a war against an alien race.", "author": "Orson Scott Card", "year": 1985},
    {"name": "Brave New World", "description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.", "author": "Aldous Huxley", "year": 1932},
    {"name": "The Hitchhiker's Guide to the Galaxy", "description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.", "author": "Douglas Adams", "year": 1979},
    {"name": "Dune", "description": "A desert planet is the site of political intrigue and power struggles.", "author": "Frank Herbert", "year": 1965},
    {"name": "Foundation", "description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.", "author": "Isaac Asimov", "year": 1951},
    {"name": "Snow Crash", "description": "A futuristic world where the internet has evolved into a virtual reality metaverse.", "author": "Neal Stephenson", "year": 1992},
    {"name": "Neuromancer", "description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.", "author": "William Gibson", "year": 1984},
    {"name": "The War of the Worlds", "description": "A Martian invasion of Earth throws humanity into chaos.", "author": "H.G. Wells", "year": 1898},
    {"name": "The Hunger Games", "description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.", "author": "Suzanne Collins", "year": 2008},
    {"name": "The Andromeda Strain", "description": "A deadly virus from outer space threatens to wipe out humanity.", "author": "Michael Crichton", "year": 1969},
    {"name": "The Left Hand of Darkness", "description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.", "author": "Ursula K. Le Guin", "year": 1969},
    {"name": "The Three-Body Problem", "description": "Humans encounter an alien civilization that lives in a dying system.", "author": "Liu Cixin", "year": 2008},
]

Store the embedding data in memory:

client = QdrantClient(":memory:")
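
In-memory mode is convenient for experiments, but the data is lost when the process exits; if you want persistence, one variation (assuming the Docker instance from the quickstart is still running) is:

client = QdrantClient(url="http://localhost:6333")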

Create a collection:

client.create_collection(collection_name="my_books",vectors_config=models.VectorParams(size=encoder.get_sentence_embedding_dimension(),  # Vector size is defined by used modeldistance=models.Distance.COSINE,),
)

Upload the data:

client.upload_points(
    collection_name="my_books",
    points=[
        models.PointStruct(
            id=idx, vector=encoder.encode(doc["description"]).tolist(), payload=doc
        )
        for idx, doc in enumerate(documents)
    ],
)

Ask a question:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    limit=3,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)

Output:

{'name': 'The War of the Worlds', 'description': 'A Martian invasion of Earth throws humanity into chaos.', 'author': 'H.G. Wells', 'year': 1898} score: 0.570093257022374
{'name': "The Hitchhiker's Guide to the Galaxy", 'description': 'A comedic science fiction series following the misadventures of an unwitting human and his alien friend.', 'author': 'Douglas Adams', 'year': 1979} score: 0.5040468703143637
{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

Narrow down the query with a filter:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=2000))]
    ),
    limit=1,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)

Output:

{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

A Simple Neural Search

Download the sample dataset:

wget https://storage.googleapis.com/generall-shared-data/startups_demo.json

Install SentenceTransformer and the other dependencies:

pip install sentence-transformers numpy pandas tqdm

Import the modules:

from sentence_transformers import SentenceTransformer
import numpy as np
import json
import pandas as pd
from tqdm.notebook import tqdm

Create the sentence encoder:

model = SentenceTransformer(
    "all-MiniLM-L6-v2", device="cuda"
)  # or device="cpu" if you don't have a GPU

Read the data:

df = pd.read_json("./startups_demo.json", lines=True)

Create an embedding vector for each description. Internally, encode splits the input into batches to speed up processing.

vectors = model.encode(
    [row.alt + ". " + row.description for row in df.itertuples()],
    show_progress_bar=True,
)
vectors.shape
# > (40474, 384)
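
The batch size can also be set explicitly; a sketch of the same call with encode's batch_size argument (64 is an arbitrary choice):

vectors = model.encode(
    [row.alt + ". " + row.description for row in df.itertuples()],
    batch_size=64,  # how many sentences are encoded per forward pass
    show_progress_bar=True,
)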

Save the vectors to an .npy file:

np.save("startup_vectors.npy", vectors, allow_pickle=False)

Start the Docker service:

docker pull qdrant/qdrant
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Create a Qdrant client:

# Import client library
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient("http://localhost:6333")

Create the collection; 384 is the output dimensionality of the embedding model (all-MiniLM-L6-v2).

if not client.collection_exists("startups"):
    client.create_collection(
        collection_name="startups",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
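
Instead of hardcoding 384, you could derive the size from the encoder, as in the previous tutorial (a sketch, assuming the model object defined earlier is still in scope):

if not client.collection_exists("startups"):
    client.create_collection(
        collection_name="startups",
        vectors_config=VectorParams(
            size=model.get_sentence_embedding_dimension(),  # 384 for all-MiniLM-L6-v2
            distance=Distance.COSINE,
        ),
    )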

Load the data:

fd = open("./startups_demo.json")

# payload is now an iterator over startup data
payload = map(json.loads, fd)

# Load all vectors into memory, numpy array works as iterable for itself.
# Other option would be to use Mmap, if you don't want to load all data into RAM
vectors = np.load("./startup_vectors.npy")

Upload the data to Qdrant:

client.upload_collection(
    collection_name="startups",
    vectors=vectors,
    payload=payload,
    ids=None,  # Vector ids will be assigned automatically
    batch_size=256,  # How many vectors will be uploaded in a single request?
)
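
Not in the tutorial, but you can verify the upload by counting the points; the count should match the number of vectors encoded earlier (40474):

print(client.count(collection_name="startups", exact=True))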

Create a neural_searcher.py file:

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # 5 of the closest results is enough
        )
        # `search_result` contains found vector ids with similarity scores along with stored payload
        # In this function you are interested in payload only
        payloads = [hit.payload for hit in search_result]
        return payloads
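
A quick way to try the class (the query text here is made up):

searcher = NeuralSearcher(collection_name="startups")
for payload in searcher.search(text="artificial intelligence for healthcare"):
    print(payload)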

Deploy with FastAPI. First, install the dependencies:

pip install fastapi uvicorn
Then extend the searcher with a filtered search method (search_in_berlin) and wire it into a FastAPI app:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None,  # If you don't want any filters for now
            limit=5,  # 5 of the closest results is enough
        )
        # `search_result` contains found vector ids with similarity scores along with stored payload
        # In this function you are interested in payload only
        payloads = [hit.payload for hit in search_result]
        return payloads

    def search_in_berlin(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        city_of_interest = "Berlin"
        # Define a filter for cities
        city_filter = Filter(
            **{
                "must": [
                    {
                        "key": "city",  # Store city information in a field of the same name
                        "match": {  # This condition checks if payload field has the requested value
                            "value": city_of_interest
                        },
                    }
                ]
            }
        )

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.query_points(
            collection_name=self.collection_name,
            query=vector,
            query_filter=city_filter,
            limit=5,
        ).points
        # `search_result` contains found vector ids with similarity scores along with stored payload
        # In this function you are interested in payload only
        payloads = [hit.payload for hit in search_result]
        return payloads


from fastapi import FastAPI

app = FastAPI()

# Create a neural searcher instance
neural_searcher = NeuralSearcher(collection_name="startups")


@app.get("/api/search")
def search_startup(q: str):
    return {"result": neural_searcher.search(text=q)}


@app.get("/api/search_in_berlin")
def search_startup_filter(q: str):
    return {"result": neural_searcher.search_in_berlin(text=q)}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8001)
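
Once the service is running, you can query the endpoints defined above; a sketch using the requests package (assumed to be installed; the query string is arbitrary):

import requests

response = requests.get(
    "http://localhost:8001/api/search", params={"q": "web development in Berlin"}
)
print(response.json())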

If you are running this inside a Jupyter notebook, you also need to add:

import nest_asyncio
nest_asyncio.apply()

Install nest_asyncio:

pip install nest_asyncio

Using Qdrant Asynchronously

Qdrant natively supports async:

import asyncio

import qdrant_client
from qdrant_client import models


async def main():
    client = qdrant_client.AsyncQdrantClient("localhost")

    # Create a collection
    await client.create_collection(
        collection_name="my_collection",
        vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
    )

    # Insert a vector
    await client.upsert(
        collection_name="my_collection",
        points=[
            models.PointStruct(
                id="5c56c793-69f3-4fbf-87e6-c4bf54c28c26",
                payload={
                    "color": "red",
                },
                vector=[0.9, 0.1, 0.1, 0.5],
            ),
        ],
    )

    # Search for nearest neighbors
    points = (
        await client.query_points(
            collection_name="my_collection",
            query=[0.9, 0.1, 0.1, 0.5],
            limit=2,
        )
    ).points

    # Your async code using AsyncQdrantClient might be put here
    # ...


asyncio.run(main())
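
The main benefit of the async client is issuing several operations concurrently; a small sketch (assuming the collection created above; the second query vector is made up):

import asyncio

import qdrant_client


async def run_queries():
    client = qdrant_client.AsyncQdrantClient("localhost")
    # Run two searches concurrently instead of one after the other
    responses = await asyncio.gather(
        client.query_points(
            collection_name="my_collection", query=[0.9, 0.1, 0.1, 0.5], limit=2
        ),
        client.query_points(
            collection_name="my_collection", query=[0.1, 0.9, 0.1, 0.5], limit=2
        ),
    )
    for response in responses:
        print(response.points)


asyncio.run(run_queries())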
