为了训练深度学习模型,经常要整理大量的标注数据,需统一不同格式的标注数据,一般情况下习惯读取TXT格式的数据。但实际中经常遇到XML格式的标注数据,在此举例:1.读取XML标注数据;2.写入TXT文件。

XML标注数据如下

<annotation verified="no"> 
 <folder>suE</folder> 
 <filename>Drivingrecord_001</filename> 
 <path>C:\Desktop\Drivingrecord_001.jpg</path> 
 <source> 
  <database>Unknown</database> 
 </source> 
 <size> 
  <width>1920</width> 
  <height>1080</height> 
  <depth>3</depth> 
 </size> 
 <segmented>0</segmented> 
 <object> 
  <name>苏E*****-蓝-1-白,灰-大众-上海大众-桑塔纳-尚纳</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>170</leftTopx> 
   <leftTopy>704</leftTopy> 
   <rightTopx>167</rightTopx> 
   <rightTopy>729</rightTopy> 
   <rightBottomx>242</rightBottomx> 
   <rightBottomy>735</rightBottomy> 
   <leftBottomx>243</leftBottomx> 
   <leftBottomy>710</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>苏E*****-蓝-1-黄-雷克萨斯-雷克萨斯(进口)-雷克萨斯RX</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>733</leftTopx> 
   <leftTopy>721</leftTopy> 
   <rightTopx>733</rightTopx> 
   <rightTopy>759</rightTopy> 
   <rightBottomx>881</rightBottomx> 
   <rightBottomy>760</rightBottomy> 
   <leftBottomx>882</leftBottomx> 
   <leftBottomy>722</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>苏*****-蓝-1-黑-宝马-宝马(进口)-宝马7系</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>1274</leftTopx> 
 <leftTopy>657</leftTopy> 
   <rightTopx>1274</rightTopx> 
   <rightTopy>671</rightTopy> 
   <rightBottomx>1325</rightBottomx> 
   <rightBottomy>670</rightBottomy> 
   <leftBottomx>1326</leftBottomx> 
   <leftBottomy>656</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>苏*****-蓝-1-灰-标致-东风标致-标致307</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>1609</leftTopx> 
   <leftTopy>658</leftTopy> 
   <rightTopx>1611</rightTopx> 
   <rightTopy>671</rightTopy> 
   <rightBottomx>1659</rightBottomx> 
   <rightBottomy>669</rightBottomy> 
   <leftBottomx>1657</leftBottomx> 
   <leftBottomy>656</leftBottomy> 
  </bndbox> 
 </object> 
</annotation> 

在此,我们只需要图片名filename,和每个object的坐标(四个点的坐标)

Drivingrecord_001.jpg 170 704 167 729 242 735 243 710 733 721 733 759 881 760 882 722 1274 657 1274 671 1325 670 1326 656 1609 658 1611 671 1659 669 1657 656  

利用xml.dom.*模块,文件对象模块DOM在读取XML文件时,一次读取整个文件,将其所有数据保存在一个树结构中,此时,可利用DOM的各种函数来读取目标数据。在此,利用xml.dom.minidom解析XML文件。

并将目标数据写入TXT文档。

# -*- coding: utf-8 -*- 
""" 
Created on Fri Mar 2 15:36:44 2018 
 
@author: gg 
""" 
 
import xml.dom.minidom 
import os 
 
save_dir = 'D:\plate_train'  
if not os.path.exists(save_dir): 
  os.mkdir(save_dir) 
f = open(os.path.join(save_dir, 'landmark.txt'), 'w') 
 
DOMTree = xml.dom.minidom.parse('D:\plate_train\label\Drivingrecord_001.xml') 
annotation = DOMTree.documentElement 
 
filename = annotation.getElementsByTagName("filename")[0] 
imgname = filename.childNodes[0].data+'.jpg' 
print(imgname) 
   
objects = annotation.getElementsByTagName("object") 
 
loc = [imgname] #文档保存格式:文件名 坐标 
 
for object in objects: 
  bbox = object.getElementsByTagName("bndbox")[0] 
  leftTopx = bbox.getElementsByTagName("leftTopx")[0] 
  lefttopx = leftTopx.childNodes[0].data 
  print(lefttopx) 
  leftTopy = bbox.getElementsByTagName("leftTopy")[0] 
  lefttopy = leftTopy.childNodes[0].data 
  print(lefttopy) 
  rightTopx = bbox.getElementsByTagName("rightTopx")[0] 
  righttopx = rightTopx.childNodes[0].data 
  print(righttopx) 
  rightTopy = bbox.getElementsByTagName("rightTopy")[0] 
  righttopy = rightTopy.childNodes[0].data 
  print(righttopy) 
  rightBottomx = bbox.getElementsByTagName("rightBottomx")[0] 
  rightbottomx = rightBottomx.childNodes[0].data 
  print(rightbottomx) 
  rightBottomy = bbox.getElementsByTagName("rightBottomy")[0] 
  rightbottomy = rightBottomy.childNodes[0].data 
  print(rightbottomy) 
  leftBottomx = bbox.getElementsByTagName("leftBottomx")[0] 
  leftbottomx = leftBottomx.childNodes[0].data 
  print(leftbottomx) 
  leftBottomy = bbox.getElementsByTagName("leftBottomy")[0] 
  leftbottomy = leftBottomy.childNodes[0].data  
  print(leftbottomy) 
   
  loc = loc + [lefttopx, lefttopy, righttopx, righttopy, rightbottomx, rightbottomy, leftbottomx, leftbottomy] 
   
for i in range(len(loc)): 
  f.write(str(loc[i])+' ') 
f.write('\t\n')   
f.close() 
   

以上这篇python代码xml转txt实例就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持。

标签:
python,xml转txt

免责声明:本站文章均来自网站采集或用户投稿,网站不提供任何软件下载或自行开发的软件! 如有用户或公司发现本站内容信息存在侵权行为,请邮件告知! 858582#qq.com
金钱帮资源网 Copyright www.kbjia.com

评论“python代码xml转txt实例”

暂无“python代码xml转txt实例”评论...

稳了!魔兽国服回归的3条重磅消息!官宣时间再确认!

昨天有一位朋友在大神群里分享,自己亚服账号被封号之后居然弹出了国服的封号信息对话框。

这里面让他访问的是一个国服的战网网址,com.cn和后面的zh都非常明白地表明这就是国服战网。

而他在复制这个网址并且进行登录之后,确实是网易的网址,也就是我们熟悉的停服之后国服发布的暴雪游戏产品运营到期开放退款的说明。这是一件比较奇怪的事情,因为以前都没有出现这样的情况,现在突然提示跳转到国服战网的网址,是不是说明了简体中文客户端已经开始进行更新了呢?