
A Detailed Guide to the PASCAL VOC Dataset: Downloading, Parsing, and Visualizing with Python and PyTorch [Object Detection + Class Segmentation]

Source: original, 2024/9/21 8:35:17

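Introduction to the PASCAL VOC Dataset

The PASCAL VOC dataset is a benchmark for the object detection and segmentation tasks in computer vision. The data and the challenge originated in the EU-funded PASCAL2 Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning. The challenge was held every year from 2005 to 2012 and was discontinued after 2012. PASCAL VOC is therefore a family of datasets released year by year over those eight years, abbreviated as VOC2005, VOC2006, and so on. Note that VOC2007 was the last release whose test annotations were made public. The images come from the flickr website and the Microsoft Research Cambridge (MSRC) dataset, so the flickr terms of use must be respected when using them.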

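* The author's guess is that the organizers chose the name PASCAL rather than PASMCL, the literal initialism of "pattern analysis, statistical modelling, and computational learning visual object classes", because PASCAL is shorter and easier to recognize: the pascal is also the standard unit of pressure in physics.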

PASCAL VOC homepage: http://host.robots.ox.ac.uk/pascal/VOC/
Visual Object Classes Challenge 2012 (VOC2012): http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
The PASCAL Visual Object Classes Challenge 2007: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html

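Summary of the PASCAL VOC Datasets by Year

Year	Statistics	Tasks	Notes
2005	4 classes: bicycles, cars, motorbikes, people; 1,578 images, 2,209 annotated objects; subsets: train, validation, test	classification, detection	
2006	10 classes: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep; 2,618 images, 4,754 annotated objects; subsets: train, validation, test	classification, detection	
2007	20 classes: person; bird, cat, cow, dog, horse, sheep; aeroplane, bicycle, boat, bus, car, motorbike, train; bottle, chair, dining table, potted plant, sofa, tv/monitor; 9,963 images, 24,640 annotated objects; subsets: train, validation, test	classification, detection, segmentation, person layout	Last year with public test annotations; the 20 classes are fixed from here on; `truncated` flag added to the annotations; evaluation metric changed from ROC-AUC to AP
2008	Same 20 classes as 2007; train+validation and test are split roughly 1:1; train/val: 4,340 images, 10,363 annotated objects	classification, detection, segmentation, person layout	`occluded` flag added to the annotations; the segmentation and person layout subsets include the VOC2007 data
2009	Same 20 classes as 2007; train/val: 7,054 images, 17,218 ROI annotations, 3,211 segmentations	classification, detection, segmentation, person layout	From this year on, each release consists of the previous years' images plus new images
2010	Same 20 classes as 2007; train/val: 10,103 images, 23,374 ROI annotations, 4,203 segmentations	classification, detection, segmentation, person layout, action classification	AP is now computed over all data points instead of TREC-style sampling
2011	Same 20 classes as 2007; train/val: 11,530 images, 27,450 ROI annotations, 5,034 segmentations	classification, detection, segmentation, person layout, action classification	Action classification extended to "10 classes + other"; layout annotation is not complete, so not every person in every image is annotated
2012	Same 20 classes as 2007; train/val: 11,530 images, 27,450 ROI annotations, 6,929 segmentations	classification, detection, segmentation, person layout, action classification	People in the action classification dataset are additionally annotated with a reference point on the body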

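Downloading the Dataset

In VOC2005 and VOC2006 the number of images and the set of object classes were still changing. The 20 object classes were only fixed in VOC2007, whose labels are fairly complete; VOC2007 is also the last release that ships a fully annotated public test subset. From VOC2008 to VOC2012 the test annotations were not released publicly, while the number of images and the label quality kept improving. VOC2012 is the final release of the series and the most complete public PASCAL VOC dataset, although it has no public test subset.
For these reasons VOC2007 and VOC2012 are the two releases normally used. The train/val subsets of VOC2007 and VOC2012 are merged and used for training and validation during model development, 16,551 images in total. The VOC2007 test subset, 4,952 images, is then used on its own for testing.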

Downloading from the official URLs

Click the links below to download:

VOC2007：http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
- train/val：http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
- test：http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
VOC2012：http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
- train/val：http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
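For scripted downloads, the same archives can be fetched with the Python standard library. The following is only a minimal sketch: the names VOC_ARCHIVES, archive_path and download_and_extract are illustrative, not part of any official tool, and the sketch assumes each archive unpacks into a VOCdevkit/VOC<year> folder under the chosen root.

```python
import tarfile
import urllib.request
from pathlib import Path

# Official archive URLs, copied from the list above.
VOC_ARCHIVES = {
    ("2007", "trainval"): "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar",
    ("2007", "test"): "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar",
    ("2012", "trainval"): "http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar",
}


def archive_path(root: str, year: str, split: str) -> Path:
    """Local path where the archive for (year, split) is stored."""
    url = VOC_ARCHIVES[(year, split)]
    return Path(root) / url.rsplit("/", 1)[-1]


def download_and_extract(root: str, year: str, split: str) -> Path:
    """Download one archive (skipped if the file exists) and unpack it under root."""
    target = archive_path(root, year, split)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(VOC_ARCHIVES[(year, split)], target)
    with tarfile.open(target) as tar:
        tar.extractall(path=root)
    # Assumed layout after extraction: <root>/VOCdevkit/VOC<year>/
    return Path(root) / "VOCdevkit" / f"VOC{year}"
```

Calling download_and_extract("voc", "2007", "test") would fetch the VOC2007 test archive into the voc folder; nothing is downloaded until the function is called.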

Downloading through the PyTorch API

torchvision.datasets.VOCDetection: https://pytorch.org/vision/0.17/generated/torchvision.datasets.VOCDetection.html
torchvision.datasets.VOCSegmentation: https://pytorch.org/vision/main/generated/torchvision.datasets.VOCSegmentation.html
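The usual combination described earlier (VOC2007 and VOC2012 train/val for training, VOC2007 test for testing) can be assembled from these classes roughly as follows. This is a sketch, not the only way to do it: TRAIN_SPLITS, TEST_SPLIT and build_voc_07_12 are names made up for this example, and torch and torchvision are assumed to be installed when the function is called.

```python
from typing import Any, List, Tuple

# (year, image_set) pairs for the merged training data and for the test data.
TRAIN_SPLITS: List[Tuple[str, str]] = [("2007", "trainval"), ("2012", "trainval")]
TEST_SPLIT: Tuple[str, str] = ("2007", "test")


def build_voc_07_12(root: str = "voc", download: bool = False) -> Tuple[Any, Any]:
    """Return (train_dataset, test_dataset) for the detection task."""
    # Imported lazily so the constants above can be reused without PyTorch.
    from torch.utils.data import ConcatDataset
    from torchvision.datasets import VOCDetection

    train = ConcatDataset(
        [
            VOCDetection(root=root, year=year, image_set=image_set, download=download)
            for year, image_set in TRAIN_SPLITS
        ]
    )
    year, image_set = TEST_SPLIT
    test = VOCDetection(root=root, year=year, image_set=image_set, download=download)
    return train, test
```

With download=True the archives are fetched on first use, so the first call can take a long time.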

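Parsing the Dataset

Object Detection Dataset

After downloading, the VOC2007 object detection dataset has the folder structure below (this should be the case whether it was downloaded directly from the URL or extracted through the PyTorch API):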

VOCdevkit/VOC2007/
├─ Annotations/
│  ├─ 000001.xml
│  ├─ 000002.xml
│  └─ ...
├─ ImageSets/
│  ├─ Layout/
│  │  ├─ train.txt
│  │  ├─ trainval.txt
│  │  └─ val.txt
│  ├─ Main/
│  │  ├─ aeroplane_train.txt
│  │  ├─ aeroplane_trainval.txt
│  │  ├─ aeroplane_val.txt
│  │  └─ ...
│  └─ Segmentation/
│     ├─ train.txt
│     ├─ trainval.txt
│     └─ val.txt
├─ JPEGImages/
│  ├─ 000001.jpg
│  ├─ 000002.jpg
│  └─ ...
├─ SegmentationClass/
│  ├─ 000001.png
│  ├─ 000002.png
│  └─ ...
└─ SegmentationObject/
   ├─ 000001.png
   ├─ 000002.png
   └─ ...
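The text files under ImageSets define which images belong to each split. A small sketch of reading them is given here, under two assumptions taken from the devkit documentation rather than from this article: a split file such as Segmentation/train.txt holds one image id per line, and a per-class file such as Main/aeroplane_train.txt holds an image id followed by 1 (positive), -1 (negative) or 0 (difficult). The function names and the sample contents are made up for illustration.

```python
from typing import Dict, List


def parse_image_ids(text: str) -> List[str]:
    """Image ids from a split file such as ImageSets/Segmentation/train.txt."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def parse_class_labels(text: str) -> Dict[str, int]:
    """Map image id -> label from a file such as ImageSets/Main/aeroplane_train.txt."""
    labels: Dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            image_id, label = line.split()
            labels[image_id] = int(label)
    return labels


# Made-up file contents, for illustration only.
print(parse_image_ids("000032\n000033\n"))
print(parse_class_labels("000012 -1\n000017  1\n000023  0\n"))
```

An image id maps to JPEGImages/<id>.jpg and Annotations/<id>.xml, which is how the splits connect to the other folders.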

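For object detection, use the torchvision.datasets.VOCDetection API to download, extract, and read the data. It loads the JPEG images under the JPEGImages folder and returns them as PIL.Image objects, and it parses the XML files under the Annotations subfolder and returns them as Python dictionaries that serve as the labels.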
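The annotation files can also be read without torchvision. The sketch below uses only the standard library; SAMPLE_XML is a shortened, made-up annotation that contains just the tags used in this article (filename, size, object, name, pose, bndbox), and parse_objects is an illustrative helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up annotation in the VOC XML layout, for illustration only.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name><pose>Left</pose>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><pose>Frontal</pose>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(xml_text: str) -> List[Tuple[str, int, int, int, int]]:
    """Return (name, xmin, ymin, xmax, ymax) for every object in one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = [int(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


print(parse_objects(SAMPLE_XML))
```

For a real file, read the text of Annotations/<id>.xml and pass it to parse_objects.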

A Python script that loads the images and displays the object detection data:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
@File: **
@Date: 2024-08-27
@Author: KRISNAT
@Version: 0.0.0
@Email: **
@Copyright: (C)Copyright 2024, KRISNAT
@Desc: None
'''
import cv2
import numpy as np
import PIL.Image
import matplotlib.pyplot as plt
from torchvision.datasets import VOCDetection

# Download VOC2007 for the detection task, or load it from the local folder
voc2007_det_train = VOCDetection(root='voc', year='2007', image_set='train', download=True)


def draw_bndbox(image: PIL.Image.Image, objects=None) -> PIL.Image.Image:
    """Draw the bounding boxes of a PASCAL VOC image."""
    if objects is None:
        return image
    # Be careful: depending on the torchvision version, an image with only one
    # object may yield a single dict instead of a list of dicts, so normalize.
    if isinstance(objects, dict):
        objects = [objects]
    elif not isinstance(objects, list):
        raise TypeError("objects can only be a dict or a list of dicts.")

    # Convert the PIL.Image (RGB) to an OpenCV array (BGR)
    img_cv2 = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
    for obj in objects:
        name = obj["name"]
        pose = obj["pose"]
        xmin = int(obj["bndbox"]["xmin"])
        ymin = int(obj["bndbox"]["ymin"])
        xmax = int(obj["bndbox"]["xmax"])
        ymax = int(obj["bndbox"]["ymax"])
        cv2.rectangle(img_cv2, (xmin, ymin), (xmax, ymax), (0, 0, 255))  # red rectangle
        cv2.putText(img_cv2, name + ", " + pose, (xmin, ymin),
                    cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255), 1, cv2.LINE_AA)
    return PIL.Image.fromarray(cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB))


if __name__ == "__main__":
    # Show the first four images and targets
    fig = plt.figure(figsize=(10, 9))
    fig.suptitle("The first four images and labels\n"
                 "of VOC2007 train dataset for Detection in PyTorch")
    for idx, (image, label) in enumerate(voc2007_det_train):
        annotation = label["annotation"]
        size = annotation["size"]
        img_size = (size["width"], size["height"], size["depth"])
        xlabel = annotation["filename"] + "\n" + str(img_size).replace("'", "")
        ax = fig.add_subplot(2, 2, idx + 1)
        ax.imshow(draw_bndbox(image, annotation["object"]))
        ax.set_xlabel(xlabel)
        # Disable the ticks and frame of the axes
        ax.set_frame_on(False)
        ax.set_xticks([])
        ax.set_yticks([])
        # Only read the first four images and labels
        if idx >= 3:
            break
    # plt.show()
    plt.savefig("PASCAL VOC Detection.jpg", bbox_inches='tight')  # save the result

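The result is as follows:

[Figure: the first four images of the VOC2007 train dataset with their bounding boxes and labels drawn]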

Object Segmentation Dataset

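For semantic segmentation, use the torchvision.datasets.VOCSegmentation API to download, extract, and read the data. It loads the JPEG images under the JPEGImages folder and returns them as PIL.Image objects, and it loads the PNG files under the SegmentationClass subfolder and returns them as PIL.PngImagePlugin.PngImageFile objects that serve as the mask labels. Note: by default torchvision.datasets.VOCSegmentation uses the semantic segmentation masks in the SegmentationClass directory, not the instance segmentation masks in the SegmentationObject directory.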

In the semantic segmentation data, the pixel values of a mask denote the classes:
- 0=background
- 1=aeroplane
- 2=bicycle
- 3=bird
- 4=boat
- 5=bottle
- 6=bus
- 7=car
- 8=cat
- 9=chair
- 10=cow
- 11=dining table
- 12=dog
- 13=horse
- 14=motorbike
- 15=person
- 16=potted plant
- 17=sheep
- 18=sofa
- 19=train
- 20=tv/monitor
- 255=void or unlabelled
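The index list above can be turned into a lookup table, together with a color for each index. The sketch below assumes the mask PNGs follow the standard devkit color map, in which the bits of the class index are spread over the red, green and blue channels; VOC_CLASSES, VOID_INDEX and voc_palette are names chosen for this example.

```python
from typing import List, Tuple

# Class names in index order, as listed above (index 0 is background).
VOC_CLASSES: List[str] = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "dining table", "dog", "horse",
    "motorbike", "person", "potted plant", "sheep", "sofa", "train",
    "tv/monitor",
]
VOID_INDEX = 255  # void or unlabelled pixels


def voc_palette(n: int = 256) -> List[Tuple[int, int, int]]:
    """RGB color for each index, built by spreading the index bits over R, G, B."""
    palette = []
    for index in range(n):
        r = g = b = 0
        c = index
        for shift in range(7, -1, -1):
            r |= (c & 1) << shift          # bit 0 of the current triple goes to red
            g |= ((c >> 1) & 1) << shift   # bit 1 goes to green
            b |= ((c >> 2) & 1) << shift   # bit 2 goes to blue
            c >>= 3                        # move on to the next three bits
        palette.append((r, g, b))
    return palette


palette = voc_palette()
for index, name in enumerate(VOC_CLASSES):
    print(index, name, palette[index])
```

If a mask is opened with PIL in its palette mode, the array values are these indices, so VOC_CLASSES[value] gives the class name for any value other than VOID_INDEX.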

A Python script that loads the images and displays the class segmentation data:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
@File: **
@Date: 2024-08-27
@Author: KRISNAT
@Version: 0.0.0
@Email: **
@Copyright: (C)Copyright 2024, KRISNAT
@Desc: None
'''
import numpy as np
import PIL.Image
import matplotlib.pyplot as plt
from torchvision.datasets import VOCSegmentation

# Download VOC2007 for the segmentation task, or load it from the local folder
voc2007_seg_train = VOCSegmentation(root='voc', year='2007', image_set='train', download=True)

# RGBA color for each mask value; alpha=128 means translucent
COLOR_MAP = {
    0: (0, 0, 0, 128),  # background, black
    1: (247, 116, 95, 128),  # aeroplane
    2: (232, 129, 49, 128),  # bicycle
    3: (208, 142, 49, 128),  # bird
    4: (190, 150, 49, 128),  # boat
    5: (173, 156, 49, 128),  # bottle
    6: (164, 159, 49, 128),  # bus
    7: (155, 162, 49, 128),  # car
    8: (134, 167, 49, 128),  # cat
    9: (99, 174, 49, 128),  # chair
    10: (49, 178, 82, 128),  # cow
    11: (51, 176, 122, 128),  # dining table
    12: (52, 174, 142, 128),  # dog
    13: (53, 173, 157, 128),  # horse
    14: (54, 172, 170, 128),  # motorbike
    15: (54, 170, 182, 128),  # person
    16: (56, 168, 197, 128),  # potted plant
    17: (57, 166, 216, 128),  # sheep
    18: (73, 160, 244, 128),  # sofa
    19: (135, 149, 244, 128),  # train
    20: (172, 136, 244, 128),  # tv/monitor
    255: (255, 255, 255, 128),  # void or unlabelled, white
}


def draw_mask(image: PIL.Image.Image, target: PIL.Image.Image = None) -> PIL.Image.Image:
    """Draw the segmentation mask over a PASCAL VOC image."""
    if target is None:
        return image
    mask = PIL.Image.new("RGBA", image.size, (0, 0, 0, 0))  # fully transparent overlay
    # A NumPy array is indexed as [row, column] = [y, x], so its shape is
    # (height, width), whereas PIL addresses pixels as (x, y). That is why the
    # width and height look swapped after the conversion.
    target_array = np.array(target)
    for y in range(target.height):
        for x in range(target.width):
            mpv = int(target_array[y, x])  # mask pixel value
            if mpv != 0 and mpv != 255:  # skip background and void pixels
                mask.putpixel((x, y), COLOR_MAP[mpv])
    # Merge the mask and the original image
    return PIL.Image.alpha_composite(image.convert("RGBA"), mask)


if __name__ == "__main__":
    # Show the first four images and targets
    fig = plt.figure(figsize=(8, 6))
    fig.suptitle("The first four images and masks\n"
                 "of VOC2007 train dataset for Segmentation in PyTorch")
    for idx, (image, target) in enumerate(voc2007_seg_train):
        ax = fig.add_subplot(2, 2, idx + 1)
        ax.imshow(draw_mask(image, target))
        # Disable the ticks and frame of the axes
        ax.set_frame_on(False)
        ax.set_xticks([])
        ax.set_yticks([])
        if idx >= 3:
            break
    # plt.show()
    plt.savefig("PASCAL VOC SegmentationClass.jpg", bbox_inches='tight')  # save the result
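Setting pixels one at a time is slow for large images. As an alternative sketch, the same kind of overlay can be built with a NumPy lookup table; mask_to_rgba is a hypothetical helper written for this example, it takes RGB colors keyed by mask value, and any value without an entry (for instance background 0 or void 255) stays fully transparent.

```python
import numpy as np


def mask_to_rgba(mask: np.ndarray, colors: dict, alpha: int = 128) -> np.ndarray:
    """Map a (height, width) array of mask values to a (height, width, 4) RGBA overlay."""
    lut = np.zeros((256, 4), dtype=np.uint8)  # default: transparent black
    for value, (r, g, b) in colors.items():
        lut[value] = (r, g, b, alpha)
    return lut[mask]  # integer-array indexing performs the per-pixel lookup


# Tiny made-up mask: background, aeroplane, person, void
demo = np.array([[0, 1], [15, 255]], dtype=np.uint8)
overlay = mask_to_rgba(demo, {1: (247, 116, 95), 15: (54, 170, 182)})
print(overlay.shape)
```

The resulting array can be converted with PIL.Image.fromarray and passed to PIL.Image.alpha_composite together with the RGBA version of the photo.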

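The result is as follows:

[Figure: the first four images of the VOC2007 train dataset with their segmentation masks overlaid]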

