xerces-c++ Memory Management Strategy & Why It Uses So Much Memory
- Structure of this article
- 1) A strange new expression
- 2) xerces-c++ memory management strategy
- 3) Why does xerces-c++ use so much memory?
- 4) demo
Structure of this article
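xerces-c++ is an XML parser. Before discussing its memory management strategy, we first need to look at an unusual use of new. After that comes the memory management strategy itself, and finally an explanation of how it ends up consuming so much memory.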
1) A strange new expression
```cpp
DOMAttr *DOMDocumentImpl::createAttribute(const XMLCh *nam)
{
    if (!nam || !isXMLName(nam))
        throw DOMException(DOMException::INVALID_CHARACTER_ERR, 0, getMemoryManager());
    return new (this, DOMMemoryManager::ATTR_OBJECT) DOMAttrImpl(this, nam);
}
```
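Placement new is usually written as new (address) (type) initializer, where address points to a block of memory that already exists. The code above, however, passes two arguments after new. The reference documentation confirms that this form is allowed: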
new expression
Creates and initializes objects with dynamic storage duration, that is, objects whose lifetime is not necessarily limited by the scope in which they were created.
Syntax
::(optional) new ( type ) initializer(optional) (1)
::(optional) new new-type initializer(optional) (2)
::(optional) new (placement-params) ( type ) initializer(optional) (3)
::(optional) new (placement-params) new-type initializer(optional) (4)
The documentation also gives an example that closely resembles the xerces code:
```cpp
new(2,f) T; // calls operator new(sizeof(T), 2, f)
```
Both expressions amount to the following two steps:
- Call the overloaded operator new to allocate the memory:
```cpp
inline void * operator new(size_t amt, DOMDocumentImpl *doc, DOMMemoryManager::NodeObjectType type)
{
    void *p = doc->allocate(amt, type);
    return p;
}
```
- Call the constructor to initialize the memory allocated above:
```cpp
DOMAttrImpl::DOMAttrImpl(DOMDocument *ownerDoc, const XMLCh *aName)
    : fNode(ownerDoc), fParent (ownerDoc), fSchemaType(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(aName);
    fNode.isSpecified(true);
}
```
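So although it looks strange, it is essentially ordinary placement new: the arguments in parentheses after new are forwarded, after the object size, to a user-defined operator new. The familiar new (address) T is simply the special case whose operator new(size_t, void*) is provided by <new>.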
2) xerces-c++ memory management strategy
The overloaded operator new above calls DOMDocumentImpl::allocate(amt, type), which in turn (unless a recycled node of that type is available) calls:
```cpp
void* DOMDocumentImpl::allocate(XMLSize_t amount)
{
    // Align the request size so that suballocated blocks
    // beyond this one will be maintained at the same alignment.
    amount = XMLPlatformUtils::alignPointerForNewBlockAllocation(amount);

    // If the request is for a largish block, hand it off to the system
    // allocator. The block still must be linked into the list of
    // allocated blocks so that it will be deleted when the time comes.
    if (amount > kMaxSubAllocationSize)
    {
        // The size of the header we add to our raw blocks
        XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

        // Try to allocate the block
        void* newBlock;
        newBlock = fMemoryManager->allocate(sizeOfHeader + amount);

        // Link it into the list beyond current block, as current block
        // is still being subdivided. If there is no current block
        // then track that we have no bytes to further divide.
        if (fCurrentBlock)
        {
            *(void **)newBlock = *(void **)fCurrentBlock;
            *(void **)fCurrentBlock = newBlock;
        }
        else
        {
            *(void **)newBlock = 0;
            fCurrentBlock = newBlock;
            fFreePtr = 0;
            fFreeBytesRemaining = 0;
        }
        void *retPtr = (char*)newBlock + sizeOfHeader;
        return retPtr;
    }

    // It's a normal (sub-allocatable) request.
    // Are we out of room in our current block?
    if (amount > fFreeBytesRemaining)
    {
        // Request doesn't fit in the current block.
        // The size of the header we add to our raw blocks
        XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

        // Get a new block from the system allocator.
        void* newBlock;
        newBlock = fMemoryManager->allocate(fHeapAllocSize);

        *(void **)newBlock = fCurrentBlock;
        fCurrentBlock = newBlock;
        fFreePtr = (char *)newBlock + sizeOfHeader;
        fFreeBytesRemaining = fHeapAllocSize - sizeOfHeader;

        if (fHeapAllocSize < kMaxHeapAllocSize)
            fHeapAllocSize *= 2;
    }

    // Subdivide the request off current block
    void *retPtr = fFreePtr;
    fFreePtr += amount;
    fFreeBytesRemaining -= amount;
    return retPtr;
}
```
This is the core of the allocation code. The logic is not complicated and can be summarized as follows:
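- If the (aligned) request is larger than kMaxSubAllocationSize (0x0100 bytes), it gets a dedicated block straight from the underlying allocator, fMemoryManager. That block is still linked into the block list so that it is freed together with the document.
- Otherwise, if the large block currently being subdivided still has at least the requested number of bytes free, the request is carved out of that remaining space.
- If not, a new large block of fHeapAllocSize bytes is allocated, and the unused tail of the old block is abandoned. After each new block, fHeapAllocSize doubles as long as it is still below kMaxHeapAllocSize.
- These large blocks form a singly linked list: the first (aligned) pointer-sized header of each block holds the address of the next block, fCurrentBlock is the head of the list, and fFreeBytesRemaining is the number of unused bytes left in the current block.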
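DOMDocument (implemented by DOMDocumentImpl) is the public interface: nodes can only be created through its family of createXXX factory methods, such as createAttribute and createElement, and all of these end up in allocate. In other words, every Attribute/Element instance is carved out of the large blocks on this list. The strategy reminded me of similar code in Effective C++.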
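Node memory is never returned to the system individually along the way; it is all released at once when the whole Document is destroyed, as the following code shows: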
```cpp
DOMDocumentImpl::~DOMDocumentImpl()
{
    ...
    // Delete the heap for this document. This unceremoniously yanks the storage
    // out from under all of the nodes in the document. Destructors are NOT called.
    this->deleteHeap();
}

void DOMDocumentImpl::deleteHeap()
{
    while (fCurrentBlock != 0)
    {
        void *nextBlock = *(void **)fCurrentBlock;
        fMemoryManager->deallocate(fCurrentBlock);
        fCurrentBlock = nextBlock;
    }
}
```
3) Why does xerces-c++ use so much memory?
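The root cause is that the data structures describing nodes, attributes and so on are very large: picture a heavy truck (each node or attribute object) hauling a single bag of rice (the actual data).
Since every node, attribute and other DOM object is allocated through allocate, we can make it print which type it is allocating for and how many bytes, to see how much each truck weighs empty: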
```cpp
// Debug additions need these headers at the top of DOMDocumentImpl.cpp
#include <iostream>
#include <map>
#include <string>

void * DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)
{
    // Added for debugging: map each node type to a printable name
    static std::map<int, std::string> maps = {
        {DOMMemoryManager::NodeObjectType::ATTR_OBJECT , "ATTR_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ATTR_NS_OBJECT , "ATTR_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::CDATA_SECTION_OBJECT , "CDATA_SECTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::COMMENT_OBJECT , "COMMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_FRAGMENT_OBJECT , "DOCUMENT_FRAGMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_TYPE_OBJECT , "DOCUMENT_TYPE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_OBJECT , "ELEMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_NS_OBJECT , "ELEMENT_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_OBJECT , "ENTITY_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_REFERENCE_OBJECT , "ENTITY_REFERENCE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::NOTATION_OBJECT , "NOTATION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::PROCESSING_INSTRUCTION_OBJECT , "PROCESSING_INSTRUCTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::TEXT_OBJECT , "TEXT_OBJECT"}
    };
    // Print the size in hex, then restore decimal output for later code
    std::cout << "New for " << maps[type] << " size=0x" << std::hex << amount << std::dec << std::endl;

    // Original xerces logic: reuse a recycled node of the same type if one
    // exists, otherwise carve fresh memory from the document heap.
    if (!fRecycleNodePtr)
        return allocate(amount);
    DOMNodePtr* ptr = fRecycleNodePtr->operator[](type);
    if (!ptr || ptr->empty())
        return allocate(amount);
    return (void*) ptr->pop();
}
```
A simple XML file and the resulting log:
```xml
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<TopNode>
    <SectionOfDataA>
        <TestData>MEMORY COST 1 1</TestData>
        <TestData>MEMORY COST 2 1</TestData>
    </SectionOfDataA>
</TopNode>
```
```text
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
```
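So on this build a DOMElementImpl takes 0x68 (104) bytes and a DOMTextImpl 0x38 (56) bytes, before counting the characters they actually hold: each node object alone weighs as much as a sizeable string!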
4) demo
I extracted the memory management code from xerces-c++ for demo and study purposes; it can be downloaded from the link below.