TensorFlow: a NaN problem

While working through the CIFAR-10 example code, I ran into the following error:
Traceback (most recent call last):
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: softmax_linear/weights/gradients
	 [[Node: softmax_linear/weights/gradients = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_linear/weights/gradients/tag, gradients/AddN/_211)]]
	 [[Node: GradientDescent/update/_232 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_511_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 119, in <module>
    tf.app.run()
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 115, in main
    train()
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 107, in train
    mon_sess.run(train_op)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 567, in run
    run_metadata=run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1043, in run
    run_metadata=run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1134, in run
    raise six.reraise(*original_exc_info)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1119, in run
    return self._sess.run(*args, **kwargs)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1191, in run
    run_metadata=run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 971, in run
    return self._sess.run(*args, **kwargs)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: softmax_linear/weights/gradients
	 [[Node: softmax_linear/weights/gradients = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_linear/weights/gradients/tag, gradients/AddN/_211)]]
	 [[Node: GradientDescent/update/_232 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_511_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'softmax_linear/weights/gradients', defined at:
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 119, in <module>
    tf.app.run()
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 115, in main
    train()
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 71, in train
    train_op = cifar10.train(loss, global_step)
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10.py", line 343, in train
    tf.summary.histogram(var.op.name + '/gradients', grad)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/summary/summary.py", line 203, in histogram
    tag=tag, values=values, name=scope)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 283, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Nan in summary histogram for: softmax_linear/weights/gradients
	 [[Node: softmax_linear/weights/gradients = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_linear/weights/gradients/tag, gradients/AddN/_211)]]
	 [[Node: GradientDescent/update/_232 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_511_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

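After a long hunt, the cause turned out to be a really dumb one. I had downloaded the CIFAR-10 dataset from the web myself, and after extraction the dataset files were named data_batch_1 (a binary file, but with no extension). So I removed the .bin suffix from the file names in the code, since it would not run otherwise, and that is when the error above appeared. Later I deleted the dataset I had downloaded and let the code download and extract it on its own. This time the files were named data_batch_1.bin (with the extension), and the train script ran fine. Tracking this down took me more than 3 hours. I wanted to die...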
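
One plausible explanation (an assumption, not something verified here): the CIFAR-10 download page offers both a Python (pickled) archive, whose batch files are named data_batch_1 with no extension, and a binary archive, whose files are named data_batch_1.bin. If the hand-downloaded copy was the Python version, reading it as fixed-length binary records would produce garbage labels and pixels, which could well push the gradients to NaN. Below is a minimal sanity-check sketch, assuming the documented binary layout of 1 label byte followed by 3072 pixel bytes per record. The helper `looks_like_cifar10_binary` and its `max_records` parameter are hypothetical and not part of the tutorial code.

```python
import os

# CIFAR-10 binary layout: 1 label byte, then 32*32*3 pixel bytes per record.
LABEL_BYTES = 1
IMAGE_BYTES = 32 * 32 * 3
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073
NUM_CLASSES = 10


def looks_like_cifar10_binary(path, max_records=100):
    """Return True if `path` plausibly holds CIFAR-10 binary records.

    Hypothetical helper for illustration only. It checks that the file
    size is a whole number of records and that the first `max_records`
    label bytes are valid class ids (0..9).
    """
    size = os.path.getsize(path)
    if size == 0 or size % RECORD_BYTES != 0:
        return False
    num_records = min(size // RECORD_BYTES, max_records)
    with open(path, "rb") as f:
        for _ in range(num_records):
            record = f.read(RECORD_BYTES)
            # The first byte of each record is the label.
            if record[0] >= NUM_CLASSES:
                return False
    return True
```

Calling this on each data file before training starts (for example, `looks_like_cifar10_binary("data_batch_1")`) would flag a wrong-format file immediately, rather than surfacing much later as a NaN in a summary op.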
