当前位置: 首页 > news >正文

nova VirtualInterfaceCreateException (by quqi99)

作者:张华 发表于:2022-09-01
版权声明:可以任意转载,转载时请务必以超链接形式标明文章原始出处和作者信息及本版权声明

问题

虚机有时候会报下列错误:

nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
...
2022-07-19 13:50:32.084 147039 WARNING nova.virt.libvirt.driver [req-7b6da117-d40c-4bac-9f82-50c4266e1617 66d3188e9f24466f8d9c3905f178d12a ca6332100f1d42e4aa94aa2c37f243e4 - d476d9f579154d49961c29942a76d1c0 d476d9f579154d49961c29942a76d1c0] [instance: 6fb0fc5d-4fa3-435c-9f60-b9953d11adb7] Timeout waiting for [('network-vif-plugged', '864b25d9-276b-4644-916a-7637762b93e2')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds

初步分析

没问题的虚机有下列日志:

Provisioning for port 2e96ade5-3a66-4045-b631-597995c07d5b completed by entity DHCP. provisioning_complete /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:133
Provisioning for port 2e96ade5-3a66-4045-b631-597995c07d5b completed by entity L2. provisioning_complete /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:133

有问题的虚机有下列日志, 它比上面的少了’completed by entity L2. provisioning_complete '.

For every failed VM (due to the vif plugged event timeout) neutron server log contain only following
Provisioning for port 864b25d9-276b-4644-916a-7637762b93e2 completed by entity DHCP. provisioning_complete /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:133

这样它就不会像下列这样将port设置为ACTIVE(https://docs.openstack.org/neutron/latest/contributor/internals/provisioning_blocks.html), 然后会引起nova端报的超时问题.

Transition to ACTIVE for port object 864b25d9-276b-4644-916a-7637762b93e2 will not be triggered until provisioned by entity L2. add_provisioning_component /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:73

neutron-l2-agent有时会花47分钟来完成一个循环.

2022-08-04 19:24:53.758 95691 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5184b270-3ac4-4777-a974-8d629599cd88 - - - - -] Agent rpc_loop - iteration:2935733 started
2022-08-04 20:11:40.496 95691 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5184b270-3ac4-4777-a974-8d629599cd88 - - - - -] Agent rpc_loop - iteration:2935733 completed. Processed ports statistics: {'regular': {'added': 0, 'updated': 2, 'removed': 2}}. Elapsed:2806.738

下列脚本能将所有neutron-l2-agent的循环时间给列出来(下面只贴出了部分输出, 它显示有时候循环花很长时间):

$ for i in `seq 2 22`; do echo "neutron-openvswitch-agent.log.$i.gz: " ; zgrep "iteration.* completed" neutron-openvswitch-agent.log.$i.gz | awk '{ print $(NF) }' | cut -d ":" -f2 | sort -gr | head -n 6; done
neutron-openvswitch-agent.log.2.gz:
...
neutron-openvswitch-agent.log.5.gz:
2806.738
2803.673
1410.152
1410.006
925.405
925.316

在上面2022-08-04 19:24:53 - 2022-08-04 20:11:40这段时间也都能看到所报的错误,从而验证所报的错误应该就是这个循环造成的.

$zgrep "Timeout waiting for \[('network-vif-plugged'" nova-compute.log.5.gz                 
2022-08-04 19:30:54.076 147039 WARNING nova.virt.libvirt.driver [req-acfb3bd2-565f-48e7-91e8-f109446895c1 f7ca0fcc30cf47c0b60b260bab9dd7bd e14e421176884fb492a71e1c9be13fa3 - 6113214bdbed491c880c23149c25b7cb 6113214bdbed491c880c23149c25b7cb] [instance: 94d4ab46-799d-44fc-9991-9a57ec18e950] Timeout waiting for [('network-vif-plugged', '43b7f269-0eae-44c4-990f-53c98743e12b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds
...
2022-08-04 20:10:06.211 147039 WARNING nova.virt.libvirt.driver [req-5b035412-a797-453a-b3e8-098dead294dd f7ca0fcc30cf47c0b60b260bab9dd7bd e14e421176884fb492a71e1c9be13fa3 - 6113214bdbed491c880c23149c25b7cb 6113214bdbed491c880c23149c25b7cb] [instance: 19da6ffc-dcc1-4620-b3b2-2018a3355b8b] Timeout waiting for [('network-vif-plugged', '71190281-53ce-4558-82cb-0240e71b9245')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds

进一步分析 - focus在neutron-l2-agent端

这段时间neutron-l2-agent在做什么呢?它在更新SG.

2022-07-28 21:10:34.988 95691 INFO neutron.agent.securitygroups_rpc [req-0be98a3b-1674-486c-aeb4-67d49e413757 f7ca0fcc30cf47c0b60b260bab9dd7bd e14e421176884fb492a71e1c9be13fa3 - - -] Security group member upd
ated {'1f4da925-7e78-4e7c-ac07-8eae09fc2fda', '4de2b2bf-5302-4cf4-96de-949ea78ca8eb'}

通过下列方式要来neutron DB.

pass=`juju run --unit mysql/0 'leader-get root-password'`
juju run --unit mysql/0 "mysqldump -u root --password=$pass --single-transaction --skip-lock-tables --set-gtid-purged=OFF --databases neutron --quick --result-file=/tmp/neutron.sql"

分析到有的SG中979上的LB, 有高达1994个active ports (This is for security group:4d15a8fa-3adc-4750-87da-c3d26902dbca network:4a57f171-1e6b-4ae4-9ec5-2203144817b7 (lb-mgmt-net).):

select sgb.security_group_id, p.network_id, n.name, count(sgb.port_id) from securitygroupportbindings sgb join ports p on p.id=sgb.port_id join networks n on n.id=p.network_id group by 1,2 order by 4 desc limit 10;
+--------------------------------------+--------------------------------------+-------------------+--------------------+
| security_group_id                    | network_id                           | name              | count(sgb.port_id) |
+--------------------------------------+--------------------------------------+-------------------+--------------------+
| 4d15a8fa-3adc-4750-87da-c3d26902dbca | 4a57f171-1e6b-4ae4-9ec5-2203144817b7 | lb-mgmt-net       |               1994 |
| 7f9e022e-f12d-4526-8e11-b09f5ed2e0e5 | 4a8c719c-04e7-490c-b7d2-b99e79e7e79f | network_lb_az3    |                241 |

select count(id) from load_balancer where provisioning_status='ACTIVE';
+-----------+
| count(id) |
+-----------+
|       979 |
+-----------+
...

相关代码如下:

#https://github.com/openstack/neutron/blob/stable/ussuri/neutron/agent/securitygroups_rpc.py#L214
    def security_groups_member_updated(self, security_groups):
        LOG.info("Security group "
                 "member updated %r", security_groups)
        self._security_group_updated(
            security_groups,
            'security_group_source_groups',
            'sg_member')
            
#https://github.com/openstack/neutron/blob/stable/ussuri/neutron/agent/securitygroups_rpc.py#L206
    def security_groups_rule_updated(self, security_groups):
        LOG.info("Security group "
                 "rule updated %r", security_groups)
        self._security_group_updated(
            security_groups,
            'security_groups',
            'sg_rule')

这个是FW rpc的实现机制:
https://github.com/openstack/neutron/blob/stable/ussuri/doc/source/contributor/internals/openvswitch_firewall.rst#firewall-api-calls
https://github.com/openstack/neutron/blob/stable/ussuri/doc/source/contributor/internals/rpc_callbacks.rst

There are two main calls performed by the firewall driver in order to either create or update a port with security groups - prepare_port_filter and update_port_filter. Both methods rely on the security group objects that are already defined in the driver and work similarly to their iptables counterparts. The definition of the objects will be described later in this document. prepare_port_filter must be called only once during port creation, and it defines the initial rules for the port. When the port is updated, all filtering rules are removed, and new rules are generated based on the available information about security groups in the driver.

Security group rules can be defined in the firewall driver by calling update_security_group_rules, which rewrites all the rules for a given security group. If a remote security group is changed, then update_security_group_members is called to determine the set of IP addresses that should be allowed for this remote security group. Calling this method will not have any effect on existing instance ports. In other words, if the port is using security groups and its rules are changed by calling one of the above methods, then no new rules are generated for this port. update_port_filter must be called for the changes to take effect.

All the machinery above is controlled by security group RPC methods, which mean the firewall driver doesn't have any logic of which port should be updated based on the provided changes, it only accomplishes actions when called from the controller.

推荐应该call了下列代码路径:

security_groups_rule_updated ->
and
security_groups_member_updated ->
_security_group_updated ->
refresh_firewall ->
_apply_port_filter(device_ids, update_filter=True) ->
firewall.update_port_filter(device) ->
_set_port_filters(of_port) ->
add_flows_from_rules ->
conj_ip_manager.update_flows_for_vlan ->
_update_flows_for_vlan_subr ->
delete_flows_for_flow_state ->
delete_flow_for_ip

是有可能触发了下列lp bug吗?
https://bugs.launchpad.net/neutron/+bug/1975674
https://bugs.launchpad.net/neutron/+bug/1813703

相关文章:

  • 会员积分制度有哪些好处?
  • orc文件的读写及整合hive
  • High Performance Computing 综述
  • DoozyUI⭐️十六、Progressor Group:可视化帮手,进度条组自动求平均值
  • 高德地图 INVALID_USER_SCODE 10008 错误
  • 2020中青杯A题集成电路通道布线数学建模全过程论文及程序
  • 神经系统图 基本结构图,大脑神经网络结构图片
  • 堆排序算法用数组模拟二叉树,求A[K](K>0)的父节点坐标
  • 【Unity】物理引擎、生命周期物理阶段、刚体、碰撞体、触发器、物理材质
  • 苹果证书在线制作
  • redux太繁琐?一文入门学会使用mobx简化项目的状态管理
  • python获取获得一个月中的最后一天
  • 用代码过中秋,python海龟月饼你要不要尝一口?
  • 洛谷刷题C语言:FAKTOR、BELA、PUTOVANJE、使用三个系统程度的能力、R.I.P.
  • Rust之Sea-orm快速入门指南
  • 时间复杂度分析经典问题——最大子序列和
  • 【comparator, comparable】小总结
  • 4月23日世界读书日 网络营销论坛推荐《正在爆发的营销革命》
  • Java新版本的开发已正式进入轨道,版本号18.3
  • JAVA之继承和多态
  • jdbc就是这么简单
  • leetcode讲解--894. All Possible Full Binary Trees
  • windows-nginx-https-本地配置
  • 融云开发漫谈:你是否了解Go语言并发编程的第一要义?
  • 微信开源mars源码分析1—上层samples分析
  • 小程序01:wepy框架整合iview webapp UI
  • 小程序开发之路(一)
  • 新手搭建网站的主要流程
  • ​iOS安全加固方法及实现
  • #if和#ifdef区别
  • (c语言版)滑动窗口 给定一个字符串,只包含字母和数字,按要求找出字符串中的最长(连续)子串的长度
  • (Forward) Music Player: From UI Proposal to Code
  • (Java岗)秋招打卡!一本学历拿下美团、阿里、快手、米哈游offer
  • (ZT)一个美国文科博士的YardLife
  • (论文阅读22/100)Learning a Deep Compact Image Representation for Visual Tracking
  • (牛客腾讯思维编程题)编码编码分组打印下标(java 版本+ C版本)
  • (三)模仿学习-Action数据的模仿
  • (转)关于如何学好游戏3D引擎编程的一些经验
  • (转)原始图像数据和PDF中的图像数据
  • (总结)Linux下的暴力密码在线破解工具Hydra详解
  • .Net 4.0并行库实用性演练
  • .NET 反射 Reflect
  • .NET 事件模型教程(二)
  • .net利用SQLBulkCopy进行数据库之间的大批量数据传递
  • .net下简单快捷的数值高低位切换
  • @Autowired多个相同类型bean装配问题
  • @Transient注解
  • [《百万宝贝》观后]To be or not to be?
  • [Android实例] 保持屏幕长亮的两种方法 [转]
  • [BZOJ1089][SCOI2003]严格n元树(递推+高精度)
  • [C#]C# winform部署yolov8目标检测的openvino模型
  • [C++]指针与结构体
  • [Codeforces] combinatorics (R1600) Part.2
  • [hdu 2826] The troubles of lmy [简单计算几何 - 相似]
  • [Java]深入剖析常见排序