
【Computer Vision】24-Object Detection

Table of Contents

  • 24-Object Detection
    • 1. Introduction
    • 2. Methods
      • 2.1 Sliding Window
      • 2.2 R-CNN: Region-Based CNN
      • 2.3 Fast R-CNN
      • 2.4 Faster R-CNN: Learnable Region Proposals
      • 2.5 Results of object detection
    • 3. Summary
    • Reference

24-Object Detection

1. Introduction

  1. Task Definition

    Input: Single RGB Image

    Output: A set of detected objects;

    For each object predict:

    • Category label (from fixed, known set of categories)

    • Bounding box (four numbers: x, y, width, height)

  2. Challenges

    • Multiple outputs: Need to output variable numbers of objects per image
    • Multiple types of output: Need to predict "what" (category label) as well as "where" (bounding box)
    • Large images: Classification works at 224x224; need higher resolution for detection, often ~800x600
  3. Detecting a single object

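    A single CNN handles this with two branches: one outputs the category label and the other outputs the box.

    Problem: images can contain more than one object, and the number of objects varies from image to image. Running a single-object detector separately for every object would also be inefficient.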
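A minimal sketch of how the two branches are usually trained together. It assumes a softmax cross-entropy loss on the class scores and an L2 loss on the four box numbers, summed into one multitask loss; `box_weight` is a hypothetical name for the weighting hyperparameter, not something taken from the original notes.

```python
import math

def softmax_cross_entropy(scores, label):
    """Classification branch: -log softmax(scores)[label], computed stably."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

def l2_box_loss(pred_box, gt_box):
    """Regression branch: squared error over (x, y, width, height)."""
    return sum((p - g) ** 2 for p, g in zip(pred_box, gt_box))

def multitask_loss(class_scores, label, pred_box, gt_box, box_weight=1.0):
    """Weighted sum of the two branch losses (box_weight is a hypothetical knob)."""
    return softmax_cross_entropy(class_scores, label) + box_weight * l2_box_loss(pred_box, gt_box)

# Two categories with equal scores; predicted box is off by 2 in height
print(multitask_loss([0.0, 0.0], 1, (10, 20, 30, 40), (10, 20, 30, 42)))  # log(2) + 4 ≈ 4.693
```

Because both branches share one backbone, the single summed loss lets one backward pass update the shared features for both tasks.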

2. Methods

2.1 Sliding Window

Apply a CNN to many different crops of the image; the CNN classifies each crop as an object or as background.

Problem: this needs far too many computations.

  • Consider an image of size H x W and a box of size h x w. Such a box can be placed at (W − w + 1)(H − h + 1) positions.
  • Total possible boxes: $\sum_{h=1}^{H}\sum_{w=1}^{W}(W-w+1)(H-h+1)=\frac{H(H+1)}{2}\cdot\frac{W(W+1)}{2}$
  • An 800 x 600 image has ~58 billion (about 5.8 × 10^10) boxes! There is no way to evaluate them all.
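The closed form above can be checked numerically. This small sketch enumerates every box size and position for a tiny image, compares the result with the closed form, and then evaluates the closed form for an 800 x 600 image.

```python
def count_boxes_bruteforce(H, W):
    """Sum over every box size h x w of the number of positions it can occupy."""
    total = 0
    for h in range(1, H + 1):
        for w in range(1, W + 1):
            total += (W - w + 1) * (H - h + 1)
    return total

def count_boxes_closed_form(H, W):
    """H(H+1)/2 * W(W+1)/2, using exact integer arithmetic."""
    return (H * (H + 1) // 2) * (W * (W + 1) // 2)

assert count_boxes_bruteforce(5, 7) == count_boxes_closed_form(5, 7)
print(count_boxes_closed_form(600, 800))  # 57768120000, i.e. about 5.8e10
```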

2.2 R-CNN: Region-Based CNN

  1. Region Proposals (Selective Search)

    Selective Search is a region proposal algorithm used in object detection. It is based on computing hierarchical grouping of similar regions based on color, texture, size and shape compatibility.

    Selective Search starts by over-segmenting the image based on intensity of the pixels using a graph-based segmentation method by Felzenszwalb and Huttenlocher.


    The Selective Search algorithm takes these over-segments as initial input and performs the following steps:

    1. Add all bounding boxes corresponding to segmented parts to the list of region proposals
    2. Group adjacent segments based on similarity
    3. Go to step 1

    At each iteration, larger segments are formed and added to the list of region proposals. Hence we create region proposals from smaller segments to larger segments in a bottom-up approach.

    For how the similarity measures based on color, texture, size and shape compatibility are computed, see Selective Search for Object Detection (C++ / Python) on LearnOpenCV [2].

  2. Architecture of the network

    Each of the ~2000 selected regions is warped (resized) to the fixed input size the classification CNN expects. After passing through the convolutional network, each region yields a category prediction along with a box offset.

  3. Steps

    1. Run region proposal method to compute ~2000 region proposals
    2. Resize each region to 224x224 and run it independently through the CNN to predict class scores and a bbox transform
    3. Use scores to select a subset of region proposals to output (Many choices here: threshold on background, or per-category? Or take top K proposals per image?)
    4. Compare with ground-truth boxes
  4. Details (focus on steps 3 and 4)

    1. Intersection over Union (IoU)
      $IoU=\frac{\text{Area of Intersection}}{\text{Area of Union}}$

    2. Non-Max Suppression (NMS)

      1. Select the next highest-scoring box

      2. Eliminate lower-scoring boxes whose IoU with the selected box is greater than a threshold (e.g. 0.7)

      3. If any boxes remain, go to step 1

      Problem: NMS may eliminate "good" boxes when objects are highly overlapping.
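A minimal pure-Python sketch of IoU and of the greedy NMS loop described above. It assumes boxes are given as corner coordinates `(x1, y1, x2, y2)`; the task definition at the top uses (x, y, width, height), so convert first if needed. Libraries ship vectorized versions, but the loop structure is the same.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.7):
    """Greedy non-max suppression; returns indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:                       # step 3: repeat while boxes remain
        best = order.pop(0)            # step 1: next highest-scoring box
        keep.append(best)
        # step 2: drop lower-scoring boxes whose IoU with it exceeds the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep

boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
scores = [0.8, 0.9, 0.7]
print(nms(boxes, scores))  # [1, 2]: box 0 has IoU 0.9 with box 1 and is removed
```

Raising the threshold keeps more overlapping boxes, which is the knob that trades duplicate detections against the "good boxes eliminated" problem noted above.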

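In step 2 of the R-CNN steps, the CNN outputs a bbox transform for each region rather than a box directly. The sketch below assumes the parametrization commonly used in the R-CNN family: boxes as (center x, center y, width, height), center shifts relative to the proposal size, and width/height scaled in log space. `target_transform` gives the regression target that would map a proposal onto its matched ground-truth box during training.

```python
import math

def apply_box_transform(proposal, transform):
    """proposal: (px, py, pw, ph) = center x, center y, width, height.
    transform: (tx, ty, tw, th) predicted by the network for this region."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = transform
    bx = px + pw * tx        # shift center by a fraction of the proposal width
    by = py + ph * ty        # shift center by a fraction of the proposal height
    bw = pw * math.exp(tw)   # scale width (predicted in log space)
    bh = ph * math.exp(th)   # scale height (predicted in log space)
    return bx, by, bw, bh

def target_transform(proposal, gt):
    """Inverse mapping: the transform that turns the proposal into the ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return (gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph)

proposal = (50.0, 40.0, 20.0, 10.0)
gt = (55.0, 38.0, 30.0, 12.0)
t = target_transform(proposal, gt)
print(apply_box_transform(proposal, t))  # recovers gt up to floating-point rounding
```

Making the transform relative to the proposal size keeps targets on a similar scale for small and large proposals.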

  1. Mean Average Precision (mAP)

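    Procedure: for each category, sort all detections by score from highest to lowest. A detection is a true positive if it matches a not-yet-matched ground-truth box with IoU > 0.5 (that ground-truth box is then used up); otherwise it is a false positive. Walking down the ranked list traces a precision-recall curve; the average precision (AP) is the area under that curve, and mAP is the mean of AP over all categories.

On COCO, mAP is additionally averaged over the IoU thresholds 0.5, 0.55, ..., 0.95; for example, such a COCO mAP might be 0.4.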
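A sketch of AP for one category under the procedure above. It assumes detections are `(image_id, score, box)` tuples with corner-format boxes and ground truth is a dict from image id to boxes. It uses the non-interpolated area under the precision-recall curve (precision at each true positive times the recall step 1/num_gt); PASCAL VOC and COCO use interpolated variants, so their numbers differ slightly. `coco_style_ap` averages over the thresholds 0.5 to 0.95.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (image_id, score, box) for ONE category.
    ground_truth: dict image_id -> list of boxes of that category."""
    num_gt = sum(len(boxes) for boxes in ground_truth.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)  # highest score first
    tp, ap = 0, 0.0
    for k, (img, _score, box) in enumerate(ranked, start=1):
        # best still-unmatched ground-truth box above the IoU threshold
        best_j, best_iou = -1, iou_threshold
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if not matched[img][j] and overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:               # true positive: use up that GT box
            matched[img][best_j] = True
            tp += 1
            ap += (tp / k) / num_gt   # precision at rank k times recall increment
        # otherwise a false positive: only the rank k grows
    return ap

def mean_ap(ap_per_category):
    """mAP: average of the per-category APs."""
    return sum(ap_per_category) / len(ap_per_category)

def coco_style_ap(detections, ground_truth):
    """Average AP over IoU thresholds 0.5, 0.55, ..., 0.95."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(detections, ground_truth, t) for t in thresholds) / len(thresholds)

gt = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
dets = [("a", 0.9, (0, 0, 10, 10)), ("a", 0.8, (0, 0, 10, 10)), ("b", 0.7, (1, 1, 11, 11))]
print(average_precision(dets, gt))  # 1/2 + (2/3)/2 ≈ 0.833; the duplicate on "a" is a false positive
print(coco_style_ap(dets, gt))      # ≈ 0.633: the "b" box (IoU ≈ 0.68) fails thresholds from 0.7 up
```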

  1. Problem: Very slow! Need to do ~2k forward passes for each image!

    Solution: Run CNN before warping!
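The ~2000 proposals in step 1 come from Selective Search. Below is a heavily simplified sketch of its bottom-up loop, assuming the initial over-segmentation (each region's bounding box, pixel count and adjacency) is already available. The similarity keeps only a size term and a fill term in the spirit of Selective Search; the color and texture terms are omitted, so this is not a faithful reimplementation, and all function names are hypothetical.

```python
from itertools import count

def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def similarity(r1, r2, image_size):
    """Simplified similarity: size and fill terms only (no color or texture).
    Small regions merge early; pairs whose union box is tight merge early."""
    s_size = 1.0 - (r1["size"] + r2["size"]) / image_size
    gap = box_area(union_box(r1["box"], r2["box"])) - r1["size"] - r2["size"]
    s_fill = 1.0 - gap / image_size
    return s_size + s_fill

def selective_search(regions, neighbors, image_size):
    """regions: dict int id -> {"box": (x1, y1, x2, y2), "size": pixel count}
    neighbors: set of frozenset({id_a, id_b}) for adjacent segments.
    Returns all proposal boxes, from the initial segments up to the largest merges."""
    regions = dict(regions)
    neighbors = set(neighbors)
    proposals = [r["box"] for r in regions.values()]   # step 1 on the initial segments
    new_ids = count(max(regions) + 1)
    while neighbors:
        # step 2: merge the most similar pair of adjacent segments
        pair = max(neighbors, key=lambda p: similarity(*(regions[i] for i in p), image_size))
        a, b = tuple(pair)
        m = next(new_ids)
        regions[m] = {"box": union_box(regions[a]["box"], regions[b]["box"]),
                      "size": regions[a]["size"] + regions[b]["size"]}
        # neighbors of a or b become neighbors of the merged segment m
        updated = set()
        for p in neighbors:
            if p == pair:
                continue
            if a in p or b in p:
                (other,) = p - {a, b}
                updated.add(frozenset({other, m}))
            else:
                updated.add(p)
        neighbors = updated
        del regions[a], regions[b]
        proposals.append(regions[m]["box"])            # step 3 -> step 1: add the new box
    return proposals

# Toy 4x4 image over-segmented into four 2x2 quadrants
regions = {
    0: {"box": (0, 0, 2, 2), "size": 4}, 1: {"box": (2, 0, 4, 2), "size": 4},
    2: {"box": (0, 2, 2, 4), "size": 4}, 3: {"box": (2, 2, 4, 4), "size": 4},
}
neighbors = {frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 3}), frozenset({2, 3})}
proposals = selective_search(regions, neighbors, image_size=16)
print(len(proposals), proposals[-1])  # 7 (0, 0, 4, 4): 4 segments + 3 merges
```

Every intermediate merge is kept as a proposal, which is how the method produces boxes at many scales from one hierarchy.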

2.3 Fast R-CNN

  1. Architecture:

    • Most of the computation happens in the backbone network; this saves work for overlapping region proposals

    • The per-region network is relatively lightweight

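  2. The concrete architecture with AlexNet and ResNet backbones:

    With AlexNet, the five conv layers form the backbone and the fully connected layers form the per-region network. With ResNet, the last stage is used as the per-region network and the rest of the network is the backbone.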
  3. Details:

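    How do we crop features? Each region proposal is projected onto the backbone's feature map, divided into a fixed grid of sub-regions, and max-pooled within each sub-region to produce a fixed-size feature (RoI Pool).

    This process introduces two quantization errors, as the following example shows.

    Suppose the backbone downsamples the input image by a factor of 32 and outputs an 8x8 feature map, and a RoI has top-left and bottom-right corners (in (x, y) form) of (0, 100) and (198, 224). Mapped onto the feature map, these become (0, 100 / 32) and (198 / 32, 224 / 32). Because pixels are discrete, the coordinates are rounded down to (0, 3) and (6, 7). This is the first quantization error.

    Suppose the RoI must finally be turned into a fixed 2x2 size. The RoI is divided evenly into 2x2 sub-regions, each with width (6 - 0 + 1) / 2 = 3.5 and height (7 - 3 + 1) / 2 = 2.5. Again, because pixels are discrete, some sub-regions get width 3 and others 4, and some get height 2 and others 3. This is the second quantization error.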

  4. RoI Align in Mask R-CNN

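Note: RoI Align does not round the projected box or the sub-region boundaries. Instead, it computes feature values at regularly spaced sampling points inside each sub-region by bilinear interpolation and pools them. The number of sampling points per sub-region is a hyperparameter, usually 4.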
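The RoI Pool example worked through above, and the way RoI Align avoids its rounding, can be reproduced with a small sketch. Assumptions: the feature map is a nested list indexed `feature[y][x]` with values located at integer coordinates; RoI Pool snaps bin edges with `floor` (real implementations differ in their rounding); RoI Align averages `grid x grid = 4` bilinear samples per bin, matching the usual 4 sampling points; the pixel-center offsets that real libraries apply are ignored.

```python
import math

def roi_pool_bins(box, stride, out_size):
    """Return the snapped feature-map box and the integer bin widths/heights."""
    # Quantization 1: project onto the feature map and round down
    fx1, fy1, fx2, fy2 = (math.floor(v / stride) for v in box)
    w, h = fx2 - fx1 + 1, fy2 - fy1 + 1           # inclusive cell counts: 7 and 5
    # Quantization 2: bin edges snapped to whole cells
    x_edges = [fx1 + math.floor(k * w / out_size) for k in range(out_size + 1)]
    y_edges = [fy1 + math.floor(k * h / out_size) for k in range(out_size + 1)]
    widths = [e1 - e0 for e0, e1 in zip(x_edges, x_edges[1:])]
    heights = [e1 - e0 for e0, e1 in zip(y_edges, y_edges[1:])]
    return (fx1, fy1, fx2, fy2), widths, heights

def bilinear(feature, x, y):
    """Sample feature[y][x] at a continuous point, clamped to the map."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, box, stride, out_size=2, grid=2):
    """RoI Align: no rounding; average grid*grid bilinear samples in each bin."""
    x1, y1, x2, y2 = (v / stride for v in box)    # keep fractional coordinates
    bin_w, bin_h = (x2 - x1) / out_size, (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            samples = [bilinear(feature,
                                x1 + (j + (b + 0.5) / grid) * bin_w,
                                y1 + (i + (a + 0.5) / grid) * bin_h)
                       for a in range(grid) for b in range(grid)]
            row.append(sum(samples) / len(samples))
        out.append(row)
    return out

print(roi_pool_bins((0, 100, 198, 224), stride=32, out_size=2))
# ((0, 3, 6, 7), [3, 4], [2, 3]): both quantization errors from the example

feature = [[x + 10 * y for x in range(8)] for y in range(8)]  # toy linear 8x8 map
print(roi_align(feature, (0, 100, 198, 224), stride=32))
```

On this linear toy map, each RoI Align output equals the feature value at the exact, unrounded bin center (for example 42.484375 for the top-left bin), whereas RoI Pool works on bins whose edges have already been shifted by rounding.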

  1. Speed

    Fast R-CNN is enormously faster than R-CNN. However, most of the remaining test time is now spent computing region proposals.

2.4 Faster R-CNN: Learnable Region Proposals

  1. Architecture:

    Insert a Region Proposal Network (RPN) that predicts proposals from the backbone features.

  2. Details:

At each point of the feature map, the RPN predicts whether the corresponding anchor box contains an object. This is a binary classification, so the error is measured with a logistic regression (binary cross-entropy) loss, and the scores are predicted by a conv layer.
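A sketch of the anchor grid and the per-anchor objectness loss. The defaults `stride=16`, three scales and three aspect ratios (k = 9 anchors per location) follow the setting commonly used with Faster R-CNN and are assumptions here, as are the function names. The loss is the logistic (binary cross-entropy) loss applied to a raw conv-layer score, written in a numerically stable form.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """k = len(scales) * len(ratios) anchors (x1, y1, x2, y2) per feature-map point."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # center of this feature-map cell in image coordinates
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s / math.sqrt(r), s * math.sqrt(r)  # area s*s, h / w == r
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def objectness_loss(logit, contains_object):
    """Logistic loss: -log(sigmoid(logit)) for positives, -log(1 - sigmoid(logit)) for negatives."""
    return softplus(-logit) if contains_object else softplus(logit)

anchors = generate_anchors(2, 3)
print(len(anchors))  # 2 * 3 locations * 9 anchors = 54
```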

  1. Evaluation


  1. Improvement

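    Faster R-CNN is a two-stage object detector: the first stage (backbone and RPN) runs once per image, and the second stage (crop features, then predict the category and box offset) runs once per region.

    To get a simpler end-to-end design, single-stage detectors eliminate the second stage: instead of classifying each anchor only as object or not object, the RPN-style network directly classifies each anchor into one of the categories (or background) and predicts its box.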
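A small bookkeeping sketch of what changes in the head when the second stage is dropped, assuming k anchors per feature-map location and C object categories. The RPN predicts one objectness score per anchor, while a single-stage head predicts C + 1 scores (categories plus background) per anchor. Both are also assumed to regress 4 box-transform values per anchor, optionally one set per category; the function names and the example values (k = 9, C = 80) are hypothetical.

```python
def rpn_head_channels(k):
    """Per feature-map location: 1 objectness score and 4 box-transform values per anchor."""
    return {"scores": k, "boxes": 4 * k}

def single_stage_head_channels(k, num_classes, class_specific_boxes=False):
    """Per location: C categories + background per anchor, plus box transforms."""
    boxes = 4 * k * num_classes if class_specific_boxes else 4 * k
    return {"scores": k * (num_classes + 1), "boxes": boxes}

print(rpn_head_channels(9))               # {'scores': 9, 'boxes': 36}
print(single_stage_head_channels(9, 80))  # {'scores': 729, 'boxes': 36}
```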

2.5 Results of object detection


  • Two-stage methods (Faster R-CNN) get the best accuracy but are slower.
  • Single-stage methods (SSD) are much faster but don’t perform as well
  • Bigger backbones improve performance, but are slower
  • Diminishing returns for slower methods


These results are a few years old. Since then, GPUs have gotten faster, and performance has improved through many tricks:

  • Train longer!
  • Multiscale backbone: Feature Pyramid Networks
  • Better backbone: ResNeXt
  • Single-stage methods have improved
  • Very big models work better
  • Test-time augmentation pushes numbers up
  • Big ensembles, more data, etc.

3. Summary

Reference

[1] An Introduction to RoI Pooling Methods (with source code at the end), Zhihu (zhihu.com)

[2] Selective Search for Object Detection (C++ / Python), LearnOpenCV
