当前位置: 首页 > news >正文

TemplateHit中提取query和hit比对上序列索引的映射字典

template_hits(Sequence[TemplateHit]数据格式)来自结构数据库搜索结果 python运行hhsearch二进制命令的包装器类  映射索引计算:TemplateHit 中含有 indices_query,需要换算成在原始query序列中的index,hit 中indices_hit 需要减去最小index(-1 gap 除外)

import pickle 
import dataclasses
from typing import Optional, List, Sequence, Mapping@dataclasses.dataclass(frozen=True)
class TemplateHit:"""Class representing a template hit."""index: intname: straligned_cols: intsum_probs: Optional[float]query: strhit_sequence: strindices_query: List[int]indices_hit: List[int]### 读入Sequence[TemplateHit]数据
with open('test_pdb_hits.pkl', 'rb') as file:# 使用 pickle.load 从文件中加载对象test_pdb_hits = pickle.load(file)#test_pdb_hits.pkl由python运行hhsearch二进制命令的包装器类 的结果 template_hits 保存得到
#import pickle
#with open('test_pdb_hits.pkl', 'wb') as file:
#  pickle.dump(template_hits, file)def build_query_to_hit_index_mapping(hit_query_sequence: str,hit_sequence: str,indices_hit: Sequence[int],indices_query: Sequence[int],original_query_sequence: str) -> Mapping[int, int]:"""Gets mapping from indices in original query sequence to indices in the hit.hit_query_sequence and hit_sequence are two aligned sequences containing gapcharacters. hit_query_sequence contains only the part of the original querysequence that matched the hit. When interpreting the indices from the .hhr, weneed to correct for this to recover a mapping from original query sequence tothe hit sequence.Args:hit_query_sequence: The portion of the query sequence that is in the .hhrhithit_sequence: The portion of the hit sequence that is in the .hhrindices_hit: The indices for each aminoacid relative to the hit sequenceindices_query: The indices for each aminoacid relative to the original querysequenceoriginal_query_sequence: String describing the original query sequence.Returns:Dictionary with indices in the original query sequence as keys and indicesin the hit sequence as values."""# If the hit is empty (no aligned residues), return empty mappingif not hit_query_sequence:return {}# Remove gaps and find the offset of hit.query relative to original query.hhsearch_query_sequence = hit_query_sequence.replace('-', '')hit_sequence = hit_sequence.replace('-', '')hhsearch_query_offset = original_query_sequence.find(hhsearch_query_sequence)print(f"hhsearch_query_offset:{hhsearch_query_offset}")# Index of -1 used for gap characters. Subtract the min index ignoring gaps.min_idx = min(x for x in indices_hit if x > -1)fixed_indices_hit = [x - min_idx if x > -1 else -1 for x in indices_hit]print(f"fixed_indices_hit:{fixed_indices_hit}")min_idx = min(x for x in indices_query if x > -1)fixed_indices_query = [x - min_idx if x > -1 else -1 for x in indices_query]print(f"fixed_indices_query:{fixed_indices_query}")# Zip the corrected indices, ignore case where both seqs have gap characters.mapping = {}for q_i, q_t in zip(fixed_indices_query, fixed_indices_hit):if q_t != -1 and q_i != -1:if (q_t >= len(hit_sequence) orq_i + hhsearch_query_offset >= len(original_query_sequence)):continuemapping[q_i + hhsearch_query_offset] = q_treturn mappinghit = test_pdb_hits[0]
input_fasta_file = 'Q94K49.fasta'
## 从fasta文件提取 query_sequence(str格式)
query_sequence = ""
with open(input_fasta_file) as f:for line in f.readlines():if line.startswith(">"):continuequery_sequence += line.strip()print(f"hit.query:{hit.query}")
print(f"hit.hit_sequence:{hit.hit_sequence}")
print(f"hit.indices_hit:{hit.indices_hit}")
print(f"hit.indices_query:{hit.indices_query}")
print(f"query_sequence:{query_sequence}")##query和hit序列比对上的氨基酸在各自多肽链上索引的对应字典
mapping = build_query_to_hit_index_mapping(hit.query, hit.hit_sequence, hit.indices_hit, hit.indices_query,query_sequence)
print(mapping)

相关文章:

  • 用户运营:如何搭建用户分析体系
  • Centos 7 在线安装(RPM) PostgreSQL 14 15 16
  • ChatGPT 使用入门
  • C++学习 --函数对象
  • AWS EC2 如何 使用 SSM会话管理器登陆
  • python——第十五天
  • boa服务器移植
  • 关闭vscode打开的本地服务器端口
  • Linux系统iptables扩展
  • Android系统分析
  • HarmonyOS(十)——@Styles装饰器和stateStyles(多态样式)双剑合并
  • QT linux下应用程序打包
  • 序列化-Serializable和Parcelable
  • MySQL之JDBC
  • 内网渗透之如何批量PTH获取主机权限?
  • [分享]iOS开发 - 实现UITableView Plain SectionView和table不停留一起滑动
  • Android 控件背景颜色处理
  • GraphQL学习过程应该是这样的
  • iOS动画编程-View动画[ 1 ] 基础View动画
  • Javascript编码规范
  • java概述
  • JS专题之继承
  • MobX
  • PHP变量
  • Redash本地开发环境搭建
  • Redux系列x:源码分析
  • Service Worker
  • 一天一个设计模式之JS实现——适配器模式
  • 云栖大讲堂Java基础入门(三)- 阿里巴巴Java开发手册介绍
  • 追踪解析 FutureTask 源码
  • 1.Ext JS 建立web开发工程
  • ​LeetCode解法汇总2670. 找出不同元素数目差数组
  • ###STL(标准模板库)
  • #我与Java虚拟机的故事#连载12:一本书带我深入Java领域
  • (1)(1.11) SiK Radio v2(一)
  • (C语言版)链表(三)——实现双向链表创建、删除、插入、释放内存等简单操作...
  • (M)unity2D敌人的创建、人物属性设置,遇敌掉血
  • (NO.00004)iOS实现打砖块游戏(十二):伸缩自如,我是如意金箍棒(上)!
  • (Redis使用系列) Springboot 实现Redis 同数据源动态切换db 八
  • (动态规划)5. 最长回文子串 java解决
  • (规划)24届春招和25届暑假实习路线准备规划
  • (七)c52学习之旅-中断
  • (十七)Flask之大型项目目录结构示例【二扣蓝图】
  • (算法)Game
  • (五)c52学习之旅-静态数码管
  • (幽默漫画)有个程序员老公,是怎样的体验?
  • .describe() python_Python-Win32com-Excel
  • .net core 6 集成 elasticsearch 并 使用分词器
  • .NET Core SkiaSharp 替代 System.Drawing.Common 的一些用法
  • .net core 控制台应用程序读取配置文件app.config
  • .net6解除文件上传限制。Multipart body length limit 16384 exceeded
  • .NET连接MongoDB数据库实例教程
  • .net网站发布-允许更新此预编译站点
  • @ModelAttribute 注解
  • [ 云计算 | AWS ] AI 编程助手新势力 Amazon CodeWhisperer:优势功能及实用技巧