
The difference between SerializeToString and SerializePartialToString in protobuf

Source: original 2024/4/28 17:13:14

Contents

Preface
proto2
- Message definition
- Extending a message
- Caveats
proto3
Serialization
- Difference between SerializeToString and SerializeAsString
- Difference between SerializeToString and SerializePartialToString
Summary

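Preface

protobuf is a serialization scheme proposed by Google. It is independent of language and platform, with implementations for C++, Go, Python and many other languages. It is widely used, has low performance overhead and a compact binary encoding, and is an excellent open-source library that is worth studying.

protobuf has two major versions, v2 and v3 (the proto2 and proto3 syntaxes), which differ considerably. Some articles about protobuf do not say which version they describe, and parts of them are confusing as a result. Before using protobuf, be clear about which version you are on and look up the features of that version.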

proto2

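In this version, every field in a .proto file takes one of three qualifiers: required, optional or repeated.

required: the field must be set. If libprotobuf is compiled in debug mode, serializing an uninitialized message (one whose required fields have not all been assigned) causes an assertion failure. In release builds the check is skipped and the message is always written, but parsing an uninitialized message returns false to indicate failure.
optional: the field may or may not be set. If an optional field is not set, its default value is used; the default can be specified with [default = value].
repeated: the field may be repeated any number of times (including zero). A repeated field can be thought of as a dynamically sized array.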

Message definition

A simple message structure is defined as follows:

message Person {
  required string name = 1;
  optional string email = 2;
  optional int32 age = 3 [default = 18];
  repeated bytes phones = 4;
}

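Each field in the definition is followed by a marker such as = 1 or = 2. These are called tags (field numbers). Every field within a message must have a unique tag. Tags 1-15 take one byte to encode and tags 16-2047 take two bytes, so 1-15 should be kept for frequently used fields.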
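The one-byte and two-byte boundaries follow from how the tag is written on the wire: the field number is shifted left by three bits, combined with a 3-bit wire type, and the result is stored as a base-128 varint. The sketch below reproduces that arithmetic. VarintSize and TagSize are names made up for this illustration; they are not libprotobuf functions.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to store `value` as a base-128 varint (7 payload bits per byte).
constexpr std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// Size of the key written before a field's value: (field_number << 3) | wire_type.
// The wire type only fills the low 3 bits, so it never changes the byte count.
constexpr std::size_t TagSize(std::uint32_t field_number) {
  return VarintSize(static_cast<std::uint64_t>(field_number) << 3);
}

static_assert(TagSize(1) == 1 && TagSize(15) == 1, "1-15 fit in one byte");
static_assert(TagSize(16) == 2 && TagSize(2047) == 2, "16-2047 need two bytes");
static_assert(TagSize(2048) == 3, "2048 and above need at least three bytes");
```

One byte holds 7 payload bits, 3 of which go to the wire type, leaving 4 bits for the field number (up to 15). Two bytes leave 11 bits (up to 2047).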

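Valid tag values lie in the range [1, 536870911], that is, up to 2^29 - 1. The numbers 19000 through 19999 cannot be used either, because they are reserved for the protobuf implementation: this is the range from FieldDescriptor::kFirstReservedNumber to FieldDescriptor::kLastReservedNumber. If one of these numbers is used, the protocol compiler reports an error when it compiles the .proto file.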
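The range rule can be written down as a small predicate. This is only a sketch: the constant and function names are hypothetical and merely mirror the values behind FieldDescriptor::kFirstReservedNumber and FieldDescriptor::kLastReservedNumber. The real check is done by the protocol compiler.

```cpp
#include <cstdint>

// Hypothetical constants for illustration; the values come from the text above.
constexpr std::int64_t kMaxFieldNumber = 536870911;  // 2^29 - 1
constexpr std::int64_t kFirstReserved = 19000;
constexpr std::int64_t kLastReserved = 19999;

// True if `n` may be used as a field number in a .proto file.
constexpr bool IsUsableFieldNumber(std::int64_t n) {
  return n >= 1 && n <= kMaxFieldNumber &&
         !(n >= kFirstReserved && n <= kLastReserved);
}

static_assert(kMaxFieldNumber == (std::int64_t{1} << 29) - 1, "29-bit limit");
static_assert(IsUsableFieldNumber(1), "smallest valid number");
static_assert(!IsUsableFieldNumber(19000), "reserved for the implementation");
```

The upper limit is 29 bits because the tag is a 32-bit value with 3 bits taken by the wire type.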

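Extending a message

Once a project that uses protobuf has been released, you will almost certainly need to extend its existing message structures, unless the project is no longer upgraded or maintained. An extension has to stay compatible with the existing code, so the following rules must be obeyed; otherwise compatibility is lost.

Do not change the tag of any existing field
Do not add or delete any required field
optional or repeated fields may be deleted
New optional or repeated fields may be added, but they must use new tags; a tag that was once used and then deleted must not be reused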

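Caveats

Always be very careful with required in proto2. If at some point you want to stop writing or sending a required field, simply changing it to optional causes trouble: older readers will treat messages that lack the field as incomplete and may reject or drop them. Some engineers have concluded that required does more harm than good and prefer to use only optional and repeated.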

proto3

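proto3 supports more languages than proto2 while being simpler; it drops some of the more complex syntax and features.

The first line that is neither blank nor a comment must be: syntax = "proto3";
The required rule is removed at the syntax level; the required qualifier no longer exists
Support for languages such as Go, Ruby and JavaNano is added
The default option is removed; a field's default value is determined solely by its type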

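Serialization

There are many functions for serializing a message object. Even for serializing to a string there are several to choose from, such as SerializeToString, SerializePartialToString, SerializeAsString and SerializePartialAsString.

Difference between SerializeToString and SerializeAsString

These two are easy to tell apart; one look at the source is enough: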

std::string MessageLite::SerializeAsString() const {
  // If the compiler implements the (Named) Return Value Optimization,
  // the local variable 'output' will not actually reside on the stack
  // of this function, but will be overlaid with the object that the
  // caller supplied for the return value to be constructed in.
  std::string output;
  if (!AppendToString(&output)) output.clear();
  return output;
}

bool MessageLite::SerializeToString(std::string* output) const {
  output->clear();
  return AppendToString(output);
}

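The source makes it clear that the two differ only in their parameter and return types; internally they call the same function. SerializePartialToString and SerializePartialAsString differ in exactly the same way, so pick whichever fits the calling code.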

bool MessageLite::SerializePartialToString(std::string* output) const {
  output->clear();
  return AppendPartialToString(output);
}

std::string MessageLite::SerializePartialAsString() const {
  std::string output;
  if (!AppendPartialToString(&output)) output.clear();
  return output;
}
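One practical consequence of the listings above: the ToString form reports failure through its bool result, while the AsString form collapses failure into an empty string, which cannot be told apart from a successfully serialized empty message. The sketch below mimics the two shapes with a stand-in type. FakeMessage, its fail flag and DemoToVersusAs are hypothetical and are not part of libprotobuf; only the bodies of the two Serialize functions are copied from the quoted source.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a message type; NOT a libprotobuf class.
struct FakeMessage {
  std::string payload;
  bool fail = false;  // when true, the append step reports failure

  bool AppendToString(std::string* output) const {
    if (fail) return false;
    output->append(payload);
    return true;
  }

  // Same shape as the quoted SerializeToString: clear, then append.
  bool SerializeToString(std::string* output) const {
    output->clear();
    return AppendToString(output);
  }

  // Same shape as the quoted SerializeAsString: failure becomes "".
  std::string SerializeAsString() const {
    std::string output;
    if (!AppendToString(&output)) output.clear();
    return output;
  }
};

inline void DemoToVersusAs() {
  FakeMessage ok;
  ok.payload = "abc";
  std::string buffer = "stale";
  assert(ok.SerializeToString(&buffer));
  assert(buffer == "abc");  // the previous contents were discarded

  FakeMessage broken;
  broken.payload = "abc";
  broken.fail = true;
  assert(!broken.SerializeToString(&buffer));  // failure is visible
  assert(broken.SerializeAsString().empty());  // failure looks like an empty message
}
```

If the caller needs to distinguish failure from an empty result, the ToString form is the safer choice. The AsString form is convenient when the result is passed straight on.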

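Difference between SerializeToString and SerializePartialToString

These two functions differ in what they call internally: one calls AppendToString, the other calls AppendPartialToString. The source of the two callees is as follows: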

bool MessageLite::AppendToString(std::string* output) const {
  GOOGLE_DCHECK(IsInitialized()) << InitializationErrorMessage("serialize", *this);
  return AppendPartialToString(output);
}

bool MessageLite::AppendPartialToString(std::string* output) const {
  size_t old_size = output->size();
  size_t byte_size = ByteSizeLong();
  if (byte_size > INT_MAX) {
    GOOGLE_LOG(ERROR) << GetTypeName()
               << " exceeded maximum protobuf size of 2GB: " << byte_size;
    return false;
  }
  // ... (the rest of the function is not shown in this excerpt)
}

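So AppendToString itself calls AppendPartialToString; it merely executes GOOGLE_DCHECK(IsInitialized()) << InitializationErrorMessage("serialize", *this); first. What does that statement do?

It is a check that is active only in debug builds, much like assert. It verifies that the message is initialized. As mentioned earlier, a field marked required must be given a value, otherwise the message is in the uninitialized state. That is the difference between the two functions: the ones with "Partial" in their names skip the required-field check. In the code quoted above this check is the only difference, and because it is a DCHECK it is compiled out of release builds.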
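Because the check disappears in release builds, code that must never emit an incomplete message may want to test IsInitialized() itself. The helper below is a minimal sketch of that idea. SerializeChecked is a hypothetical name; it assumes only that the message type offers IsInitialized() and SerializePartialToString(std::string*), both of which MessageLite provides. FakePerson is a hand-written stand-in so that the sketch compiles without generated code, and its output is not real wire format.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: refuses to serialize an uninitialized message in
// every build mode, instead of relying on a debug-only check.
template <typename Message>
bool SerializeChecked(const Message& message, std::string* output) {
  if (!message.IsInitialized()) {
    output->clear();
    return false;
  }
  return message.SerializePartialToString(output);
}

// Hypothetical stand-in for a generated class with one required field.
struct FakePerson {
  std::string name;
  bool has_name = false;

  bool IsInitialized() const { return has_name; }

  // Writes whatever is set and never looks at required fields.
  bool SerializePartialToString(std::string* output) const {
    output->clear();
    if (has_name) output->append(name);  // placeholder, not real wire format
    return true;
  }
};

inline void DemoSerializeChecked() {
  FakePerson person;
  std::string out;
  assert(person.SerializePartialToString(&out));  // partial: always succeeds
  assert(!SerializeChecked(person, &out));        // checked: name is missing

  person.name = "Alice";
  person.has_name = true;
  assert(SerializeChecked(person, &out));
  assert(out == "Alice");
}
```

With real generated classes the same template could be called as SerializeChecked(person, &out), assuming person is an instance of the generated Person class.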

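Summary

protobuf has two major versions, v2 and v3, which differ considerably; check which one you are using
proto3 removes the required rule at the syntax level and removes the default option; a field's default value is determined solely by its type
SerializeToString and SerializeAsString differ in their parameters and return values; internally they call the same function
SerializeToString and SerializePartialToString differ in that SerializePartialToString ignores the rule that required fields must be assigned
Reuse message objects wherever possible in your application, so that protobuf can reuse the memory it has already allocated internally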
