
[x265] A Brief Analysis of the Prediction Module: Inter Prediction

Contents

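  • 1. Inter Prediction Overview
    • 1.1 Coding Block Structure
    • 1.2 Motion Estimation
      • 1.2.1 Motion Estimation Criteria
      • 1.2.2 Motion Search
    • 1.3 MV Prediction
      • 1.3.1 Merge Mode
      • 1.3.2 AMVP
  • 2. Inter Prediction Entry Function (compressInterCU_rd0_4)
    • 2.1 Checking Merge/Skip Mode (checkMerge2Nx2N_rd0_4)
      • 2.1.1 Getting the Merge Candidate List (getInterMergeCandidates)
      • 2.1.2 Motion Compensation (motionCompensation)
        • 2.1.2.1 Getting the Prediction Block (predInterLumaPixel)
      • 2.1.3 Cost Without Coding the Residual (encodeResAndCalcRdSkipCU)
      • 2.1.4 Cost With the Residual Coded (encodeResAndCalcRdInterCU)
    • 2.2 Regular Inter Prediction (checkInter_rd0_4)
      • 2.2.1 Inter Prediction Search (predInterSearch)
        • 2.2.1.1 Merge Evaluation for Sub-PUs (mergeEstimation)
        • 2.2.1.2 AMVP Implementation
        • 2.2.1.3 Motion Estimation (motionEstimate)
    • 2.3 Intra Mode in P Frames (checkIntraInInter)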

Related x265 posts:
[x265] x265 Encoder Parameter Configuration
[x265] A Brief Analysis of the Prediction Module: Intra Prediction

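1. Inter Prediction Overview

1.1 Coding Block Structure

Inter prediction is one of the most effective tools an encoder has for reducing the bit rate: by referencing temporally neighbouring pictures it can cut the bit rate dramatically and thus save network bandwidth. In x265, inter prediction (hereafter "Inter mode") operates on PUs: a CU can be divided into several sub-regions, each of which is predicted separately. Unlike intra prediction (hereafter "Intra mode"), Inter mode can split a CU into non-square PU shapes. There are 8 shapes in total, as shown below: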

      2Nx2N               2NxN                Nx2N                NxN
+---------------+   +---------------+   +-------+-------+   +-------+-------+
|               |   |               |   |       |       |   |       |       |
|               |   |               |   |       |       |   |       |       |
|               |   |               |   |       |       |   |       |       |
|               |   +---------------+   |       |       |   +-------+-------+
|               |   |               |   |       |       |   |       |       |
|               |   |               |   |       |       |   |       |       |
|               |   |               |   |       |       |   |       |       |
+---------------+   +---------------+   +-------+-------+   +-------+-------+

      2NxnU               2NxnD               nLx2N               nRx2N
+---------------+   +---------------+   +---+-----------+   +-----------+---+
|               |   |               |   |   |           |   |           |   |
+---------------+   |               |   |   |           |   |           |   |
|               |   |               |   |   |           |   |           |   |
|               |   |               |   |   |           |   |           |   |
|               |   |               |   |   |           |   |           |   |
|               |   +---------------+   |   |           |   |           |   |
|               |   |               |   |   |           |   |           |   |
+---------------+   +---------------+   +---+-----------+   +-----------+---+
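To make the eight shapes concrete, here is a minimal C++ sketch that lists the PU rectangles of a square CU of side 2N for each shape. The `PartMode` enum, `PuRect` struct and `puLayout` function are hypothetical illustrations (the names only echo x265's `SIZE_*` constants), and the AMP shapes are assumed to split at one quarter of the CU side.

```cpp
#include <cassert>
#include <vector>

// Hypothetical enum for the 8 inter partition shapes shown above (not x265's type).
enum class PartMode { P2Nx2N, P2NxN, PNx2N, PNxN, P2NxnU, P2NxnD, PnLx2N, PnRx2N };

struct PuRect { int x, y, w, h; };  // position and size inside the CU, in pixels

// Returns the PU rectangles of a square CU of side `cu` (= 2N).
// AMP shapes (2NxnU, 2NxnD, nLx2N, nRx2N) split at one quarter of the side.
std::vector<PuRect> puLayout(PartMode m, int cu)
{
    const int h = cu / 2, q = cu / 4;
    switch (m)
    {
    case PartMode::P2Nx2N: return { {0, 0, cu, cu} };
    case PartMode::P2NxN:  return { {0, 0, cu, h}, {0, h, cu, h} };
    case PartMode::PNx2N:  return { {0, 0, h, cu}, {h, 0, h, cu} };
    case PartMode::PNxN:   return { {0, 0, h, h}, {h, 0, h, h}, {0, h, h, h}, {h, h, h, h} };
    case PartMode::P2NxnU: return { {0, 0, cu, q}, {0, q, cu, cu - q} };
    case PartMode::P2NxnD: return { {0, 0, cu, cu - q}, {0, cu - q, cu, q} };
    case PartMode::PnLx2N: return { {0, 0, q, cu}, {q, 0, cu - q, cu} };
    case PartMode::PnRx2N: return { {0, 0, cu - q, cu}, {cu - q, 0, q, cu} };
    }
    return {};
}
```

For a 32x32 CU, for example, `puLayout(PartMode::P2NxnU, 32)` gives a 32x8 PU on top of a 32x24 PU.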

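1.2 Motion Estimation

Consecutive frames of a video are usually strongly correlated. A natural idea is therefore to find two very similar blocks in neighbouring pictures and describe their position difference with a motion offset: the block is coded once in the earlier picture, and the later picture essentially codes only this offset (plus the prediction residual), which saves bits. Finding this offset is called motion estimation (Motion Estimation, ME). To find two very similar blocks, two questions have to be answered:

  1. How to measure how different the two blocks are
  2. How to find the two blocks efficiently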

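1.2.1 Motion Estimation Criteria

In Inter mode, the difference between the reference block (refBlock below) and the current block (curBlock below) is measured mainly with SAD (sum of absolute differences) and SATD (sum of absolute transformed differences). The bit cost of signalling the corresponding MV is added on top, which gives the rate-distortion cost $J = D + \lambda \cdot R$, where D is the distortion (SAD or SATD) and R the number of bits.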
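A minimal sketch of these criteria, assuming 8-bit samples: `sad` and `satd4x4` compute the two distortion measures on a block, and `rdCost` combines a distortion with an estimated bit count as in J = D + lambda * R. These helper names are hypothetical, not x265 primitives; the SATD here applies a 4x4 Hadamard transform to the difference block and omits the normalisation (for example halving) that some encoders apply.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Sum of Absolute Differences between two w x h blocks (strides in pixels).
uint32_t sad(const uint8_t* cur, int curStride, const uint8_t* ref, int refStride, int w, int h)
{
    uint32_t sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += std::abs(cur[y * curStride + x] - ref[y * refStride + x]);
    return sum;
}

// SATD of one 4x4 block: Hadamard-transform the difference, sum |coefficients|.
// No normalisation is applied (some encoders halve this value).
uint32_t satd4x4(const uint8_t* cur, int curStride, const uint8_t* ref, int refStride)
{
    int d[4][4], t[4][4];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = cur[y * curStride + x] - ref[y * refStride + x];
    // 4-point Hadamard on each row
    for (int y = 0; y < 4; y++)
    {
        int a0 = d[y][0] + d[y][1], a1 = d[y][0] - d[y][1];
        int a2 = d[y][2] + d[y][3], a3 = d[y][2] - d[y][3];
        t[y][0] = a0 + a2; t[y][1] = a1 + a3; t[y][2] = a0 - a2; t[y][3] = a1 - a3;
    }
    // 4-point Hadamard on each column, accumulating absolute values
    uint32_t sum = 0;
    for (int x = 0; x < 4; x++)
    {
        int a0 = t[0][x] + t[1][x], a1 = t[0][x] - t[1][x];
        int a2 = t[2][x] + t[3][x], a3 = t[2][x] - t[3][x];
        sum += std::abs(a0 + a2) + std::abs(a1 + a3) + std::abs(a0 - a2) + std::abs(a1 - a3);
    }
    return sum;
}

// J = D + lambda * R, with R the estimated number of bits (e.g. for the MV).
double rdCost(uint32_t distortion, double lambda, uint32_t bits)
{
    return distortion + lambda * bits;
}
```

A single-pixel difference of 4 gives SAD 4 but SATD 64, because the transform spreads that one error over all 16 coefficients; SATD tracks the cost of the residual after transform coding better than SAD does.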

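1.2.2 Motion Search

Motion search (Motion Search, MS) in x265 proceeds in several steps:

  1. Integer-pixel search
    (1) Diamond search (X265_DIA_SEARCH)
    (2) Hexagon search (X265_HEX_SEARCH)
    Both work much like their x264 counterparts; see Lei Xiaohua's article "x264 source code analysis: macroblock analysis (Analysis), inter macroblocks (Inter)". The difference is that with HEX search, x265 runs a few extra iterations of a fast half-hexagon search to widen the search range, because CUs in x265 are larger.
  2. Half-pixel search
  3. Quarter-pixel search

PS: Integer-pixel search measures the cost with SAD, while half- and quarter-pixel search use SATD. There is no 1/8-pixel search: HEVC luma motion vectors only have quarter-sample precision, since finer precision brings no significant gain.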
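The diamond idea from step 1 can be sketched in a few lines. This is a hypothetical, simplified small-diamond search over integer positions, not x265's `motion.cpp`: the `cost` callback stands in for SAD plus MV bits, and the half- and quarter-pixel refinement would repeat a similar neighbourhood test at sub-pixel steps using SATD.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <functional>

struct MV { int x, y; };  // integer-pixel motion vector

// Small-diamond search: test the 4 neighbours at distance 1 around the current
// best position, move to the cheapest, and stop when the centre wins or after maxIter steps.
MV diamondSearch(MV start, const std::function<uint32_t(MV)>& cost, int maxIter = 16)
{
    static const int dx[4] = { 0, 1, 0, -1 };
    static const int dy[4] = { -1, 0, 1, 0 };
    MV best = start;
    uint32_t bestCost = cost(best);
    for (int it = 0; it < maxIter; it++)
    {
        MV center = best;
        for (int i = 0; i < 4; i++)
        {
            MV cand = { center.x + dx[i], center.y + dy[i] };
            uint32_t c = cost(cand);
            if (c < bestCost) { bestCost = c; best = cand; }
        }
        if (best.x == center.x && best.y == center.y)
            break;  // the centre is a local minimum
    }
    return best;
}
```

Like any pattern search, this only finds a local minimum of the cost surface, which is why the text above mentions that x265 enlarges the search with extra iterations for HEX search.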

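1.3 MV Prediction

Inter mode uses two techniques, Merge and AMVP, to code motion more efficiently. Merge can be regarded as a coding mode in its own right: x265 has a dedicated definition for it, and merge information is written to the bitstream during encoding (e.g. m_entropyCoder.codeMergeIndex(cu, 0)), with no MVD (MV Difference) transmitted. AMVP, by contrast, is an MV prediction technique: the encoder codes only the difference between the actual MV and the predicted MV, so an MVD is present.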

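1.3.1 Merge Mode

Merge mode builds an MV candidate list for the current PU with up to 5 candidate MVs (x265 may use fewer, depending on its configuration). The encoder walks through the list and picks the best candidate as the merge MV, which also provides useful guidance for the later inter-prediction steps.

The merge list is built in two parts, spatial candidates and temporal candidates: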

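  1. Building the spatial candidates
    Spatial candidates are checked in the order { A1, B1, B0, A0, B2 } and added to the list from left to right; at most 4 spatial candidates are included.
    [Figure: positions of the spatial neighbours A0, A1, B0, B1, B2]
    For the second PU (PU2) of a rectangular partition, the candidates need extra handling. In case (a) of the figure below (a vertical split such as Nx2N), PU2's list may not contain the motion information of A1: A1 lies in PU1, so if PU2 used it, PU1 and PU2 would have the same MV and the split would be no different from 2Nx2N. Likewise, in case (b) (a horizontal split such as 2NxN), PU2's list may not contain the motion information of B1.
    [Figure: (a) vertical split and (b) horizontal split, each with PU1 and PU2]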
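  2. Building the temporal candidate
    The temporal candidate uses the motion information of the PU at the corresponding position in a neighbouring, already coded picture (the co-located PU). It is not used directly but scaled according to the relative distances between the current picture and the reference pictures. In the figure, cur_PU is the current PU, col_PU is the co-located PU in the already coded picture, cur_ref is the reference picture of the current picture, and col_ref is the reference picture of the co-located picture.
    [Figure: scaling the MV of col_PU (pointing to col_ref) to cur_PU (pointing to cur_ref)]
    The temporal candidate MV of the current PU is
    $$curMV = \frac{tb}{td} \cdot colMV$$
    where tb is the POC distance between the current picture and cur_ref, and td is the POC distance between the co-located picture and col_ref.
    The co-located block is taken at the bottom-right position H; if H is not available, the centre position C3 is used instead. The temporal part contributes at most 1 candidate.
    [Figure: co-located candidate positions H and C3]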
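In the HEVC specification this scaling is done in integer arithmetic. Below is a sketch of that procedure for one MV component, where tb = POC(current picture) - POC(cur_ref) and td = POC(co-located picture) - POC(col_ref), with td assumed non-zero; `scaleMvComponent` is a hypothetical helper name. The right shift of negative values assumes arithmetic shifting, which mainstream compilers use and C++20 guarantees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

static int clip3(int lo, int hi, int v) { return std::min(hi, std::max(lo, v)); }

// Integer approximation of curMV = (tb / td) * colMV for one component.
// tb = POC(cur) - POC(cur_ref), td = POC(col) - POC(col_ref), td != 0.
int scaleMvComponent(int colMv, int tb, int td)
{
    tb = clip3(-128, 127, tb);
    td = clip3(-128, 127, td);
    int tx = (16384 + (std::abs(td) >> 1)) / td;                      // ~ 2^14 / td
    int distScaleFactor = clip3(-4096, 4095, (tb * tx + 32) >> 6);    // ~ 256 * tb / td
    int prod = distScaleFactor * colMv;
    int sign = prod < 0 ? -1 : 1;
    return clip3(-32768, 32767, sign * ((std::abs(prod) + 127) >> 8)); // round, divide by 256
}
```

For example, a co-located MV component of 10 with td = 2 scales to 5 when tb = 1 and to -5 when tb = -1 (current reference on the other side in display order).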

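PS: If the merge list still has fewer than 5 candidates after these two steps, it is filled with (0, 0). (In B slices, combined bi-predictive candidates are added before the zero candidates.)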
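Putting the pieces of the merge list together, a simplified sketch for uni-directional motion might look like this. It is an illustration under stated assumptions, not x265's `getInterMergeCandidates`: pruning here skips any exact duplicate (the standard only compares specific neighbour pairs), zero padding ignores reference indices, and the B-slice combined bi-predictive candidates are not modelled. `Mv` and `buildMergeList` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct Mv { int x, y; };
inline bool operator==(const Mv& a, const Mv& b) { return a.x == b.x && a.y == b.y; }

// spatial[] holds the neighbours in checking order A1, B1, B0, A0, B2
// (std::nullopt = unavailable or intra); temporal is the already scaled co-located MV.
std::vector<Mv> buildMergeList(const std::optional<Mv> spatial[5],
                               const std::optional<Mv>& temporal,
                               std::size_t maxCands = 5)
{
    std::vector<Mv> list;
    auto addUnique = [&list](const Mv& mv) {
        for (const Mv& m : list)
            if (m == mv)
                return;  // simplified pruning: drop exact duplicates
        list.push_back(mv);
    };
    // At most 4 spatial candidates; B2 is only reached if fewer than 4 were found.
    for (int i = 0; i < 5 && list.size() < 4; i++)
        if (spatial[i])
            addUnique(*spatial[i]);
    // At most 1 temporal candidate.
    if (temporal && list.size() < maxCands)
        list.push_back(*temporal);
    // Pad with (0, 0), then truncate in case maxCands is smaller than the list.
    while (list.size() < maxCands)
        list.push_back(Mv{0, 0});
    list.resize(maxCands);
    return list;
}
```

With neighbours A1 = (1, 1), B1 = (1, 1), B0 unavailable, A0 = (2, 0), B2 = (3, 3) and a temporal MV of (4, 4), the list becomes (1, 1), (2, 0), (3, 3), (4, 4), (0, 0).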

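1.3.2 AMVP

AMVP is similar to merge in that it also exploits the spatial and temporal correlation of motion vectors.

  1. Building the spatial candidates
    Using the same neighbour labels as merge mode, AMVP derives one candidate MV from the left and one from above. The left side is checked in the order { A0, A1, scaled A0, scaled A1 }, the top side in the order { B0, B1, B2, (scaled B0, scaled B1, scaled B2) }, where "scaled" means the same POC-distance scaling used for the co-located MV in merge mode. For the top side, scaling is only performed when both left PUs are unavailable or are both Intra. A neighbouring MV can be used directly only if it points to the same reference picture as the current PU; otherwise it has to be scaled.

    The AMVP spatial part contributes at most 2 candidates (merge mode allows up to 4).

  2. Building the temporal candidate
    Built in the same way as in merge mode; it is only needed when the spatial step yields fewer than 2 distinct candidates.

PS: If the AMVP list still has fewer than 2 candidates after these two steps, it is filled with (0, 0). When coding, AMVP codes the MV differentially, i.e. only the MVD is written, together with the index of the chosen predictor.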
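The encoder side of AMVP can be sketched as "pick the predictor whose MVD is cheapest". In this hypothetical sketch the bit count uses signed Exp-Golomb code lengths as a rough proxy; HEVC actually codes MVDs with CABAC, and `segBits`, `AmvpChoice` and `chooseMvp` are illustrative names, not x265 functions.

```cpp
#include <array>
#include <cassert>

struct Mv { int x, y; };

// Approximate bits for one signed MVD component: signed Exp-Golomb length
// (mapping 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ...). A proxy only.
static int segBits(int v)
{
    unsigned code = v > 0 ? 2u * v - 1 : 2u * (unsigned)(-v);
    int len = 0;
    for (unsigned c = code + 1; c > 1; c >>= 1)
        len++;                      // len = floor(log2(code + 1))
    return 2 * len + 1;
}

struct AmvpChoice { int mvpIdx; Mv mvd; };

// Given the 2-entry AMVP list and the MV found by motion estimation, choose the
// predictor that minimises the estimated MVD bits; only mvpIdx and the MVD are coded.
AmvpChoice chooseMvp(const std::array<Mv, 2>& amvpList, Mv mv)
{
    AmvpChoice best{0, {0, 0}};
    int bestBits = -1;
    for (int i = 0; i < 2; i++)
    {
        Mv mvd = { mv.x - amvpList[i].x, mv.y - amvpList[i].y };
        int bits = segBits(mvd.x) + segBits(mvd.y);
        if (bestBits < 0 || bits < bestBits) { bestBits = bits; best = { i, mvd }; }
    }
    return best;
}
```

For example, with candidates (0, 0) and (8, 4) and an ME result of (9, 4), the second predictor is chosen and only the MVD (1, 0) plus index 1 would be coded.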

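2. Inter Prediction Entry Function (compressInterCU_rd0_4)

x265 has more than one inter-analysis entry function (the code below, for instance, can fall back to compressInterCU_rd5_6); only compressInterCU_rd0_4() is analysed here. The "0" and "4" in its name mean that it is used when rdLevel is between 0 and 4. Since the default configuration uses rdLevel=3, this is the function that performs inter prediction by default.

The function is defined in encoder\analysis.cpp. Its main workflow is:
(1) evaluate the cost of merge and skip modes (checkMerge2Nx2N_rd0_4)
(2) evaluate the cost of splitting into 4 sub-blocks (recursive call to compressInterCU_rd0_4)
(3) evaluate the cost of each partition mode and of Intra mode at the current depth (checkInter_rd0_4, checkIntraInInter)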
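Stripped of all early-exit and reuse logic, the three steps amount to a recursive quadtree decision: compare the best cost at the current depth with the summed best costs of the 4 sub-CUs. The sketch below is a hypothetical simplification: `leafCost` stands in for the merge/skip, 2Nx2N, rect/AMP and intra checks, and `splitPenalty` for the extra signalling cost of splitting; neither is an x265 API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical cost model: best non-split cost of the CU at (x, y) with side `size`.
using LeafCost = std::function<uint64_t(int x, int y, int size)>;

// Returns the best cost of the CU, choosing between "no split" and a 4-way split.
uint64_t decideCU(int x, int y, int size, int minSize, const LeafCost& leafCost, uint64_t splitPenalty)
{
    uint64_t noSplit = leafCost(x, y, size);   // steps (1) and (3): best mode at this depth
    if (size <= minSize)
        return noSplit;
    int half = size / 2;
    uint64_t split = splitPenalty;             // step (2): recurse into the 4 sub-CUs
    for (int i = 0; i < 4; i++)
        split += decideCU(x + (i & 1) * half, y + (i >> 1) * half, half, minSize, leafCost, splitPenalty);
    return split < noSplit ? split : noSplit;
}
```

The real function evaluates merge/skip first precisely so that a cheap skip result can cancel the recursion (skipRecursion) or the remaining modes (skipModes), which this sketch does not model.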

SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
{
    if (parentCTU.m_vbvAffected && calculateQpforCuSize(parentCTU, cuGeom, 1))
        return compressInterCU_rd5_6(parentCTU, cuGeom, qp);

    uint32_t depth = cuGeom.depth;
    uint32_t cuAddr = parentCTU.m_cuAddr;
    ModeDepth& md = m_modeDepth[depth];

    // searchMethod defaults to X265_HEX_SEARCH
    if (m_param->searchMethod == X265_SEA)
    {
        int numPredDir = m_slice->isInterP() ? 1 : 2;
        int offset = (int)(m_frame->m_reconPic->m_cuOffsetY[parentCTU.m_cuAddr] + m_frame->m_reconPic->m_buOffsetY[cuGeom.absPartIdx]);
        for (int list = 0; list < numPredDir; list++)
            for (int i = 0; i < m_frame->m_encData->m_slice->m_numRefIdx[list]; i++)
                for (int planes = 0; planes < INTEGRAL_PLANE_NUM; planes++)
                    m_modeDepth[depth].fencYuv.m_integral[list][i][planes] = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_encData->m_meIntegral[planes] + offset;
    }

    PicYuv& reconPic = *m_frame->m_reconPic;
    SplitData splitCUData;

    // Whether to run HEVC block analysis (x265 appears to support reusing AVC analysis data)
    bool bHEVCBlockAnalysis = (m_param->bAnalysisType == AVC_INFO && cuGeom.numPartitions > 16);
    // Whether to refine the loaded AVC analysis
    bool bRefineAVCAnalysis = (m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1]));
    // "No offloading": true unless analysis data from an AVC encoder is reused (bAnalysisType == AVC_INFO)
    bool bNooffloading = !(m_param->bAnalysisType == AVC_INFO);

    if (bHEVCBlockAnalysis || bRefineAVCAnalysis || bNooffloading)
    {
        md.bestMode = NULL;
        bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
        bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
        uint32_t minDepth = topSkipMinDepth(parentCTU, cuGeom);
        bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
        bool skipModes = false; /* Skip any remaining mode analyses at current depth */
        bool skipRecursion = false; /* Skip recursion */
        bool splitIntra = true;
        bool skipRectAmp = false;
        bool chooseMerge = false;
        bool bCtuInfoCheck = false;
        int sameContentRef = 0;

        if (m_evaluateInter)
        {
            if (m_refineLevel == 2)
            {
                if (parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP)
                    skipModes = true;
                if (parentCTU.m_partSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
                    skipRectAmp = true;
            }
            mightSplit &= false;
            minDepth = depth;
        }

        if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
            m_maxTUDepth = loadTUDepth(cuGeom, parentCTU);

        SplitData splitData[4];
        splitData[0].initSplitCUData();
        splitData[1].initSplitCUData();
        splitData[2].initSplitCUData();
        splitData[3].initSplitCUData();

        // avoid uninitialize value in below reference
        if (m_param->limitModes)
        {
            md.pred[PRED_2Nx2N].bestME[0][0].mvCost = 0; // L0
            md.pred[PRED_2Nx2N].bestME[0][1].mvCost = 0; // L1
            md.pred[PRED_2Nx2N].sa8dCost = 0;
        }

        if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
        {
            if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
                sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
            if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
            {
                mightNotSplit &= bDecidedDepth;
                bCtuInfoCheck = skipRecursion = false;
                skipModes = true;
            }
            else if (mightNotSplit && bDecidedDepth)
            {
                if (m_additionalCtuInfo[cuGeom.absPartIdx])
                {
                    bCtuInfoCheck = skipRecursion = true;
                    md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
                    md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
                    checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
                    if (!sameContentRef)
                    {
                        if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
                        {
                            qp -= int32_t(0.04 * qp);
                            setLambdaFromQP(parentCTU, qp);
                        }
                        if (m_param->bCTUInfo & 4)
                            skipModes = false;
                    }
                    if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
                    {
                        if (m_param->rdLevel)
                            skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
                        if ((m_param->bCTUInfo & 4) && sameContentRef)
                            skipModes = md.bestMode && true;
                    }
                }
                else
                {
                    md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
                    md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
                    checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
                    if (m_param->rdLevel)
                        skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
                }
                mightSplit &= !bDecidedDepth;
            }
        }

        if ((m_param->analysisLoadReuseLevel > 1 && m_param->analysisLoadReuseLevel != 10))
        {
            if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
            {
                if (m_reuseModes[cuGeom.absPartIdx] == MODE_SKIP)
                {
                    md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
                    md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
                    checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);

                    skipRecursion = !!m_param->recursionSkipMode && md.bestMode;
                    if (m_param->rdLevel)
                        skipModes = m_param->bEnableEarlySkip && md.bestMode;
                }
                if (m_param->analysisLoadReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
                {
                    if (m_reuseModes[cuGeom.absPartIdx] != MODE_INTRA && m_reuseModes[cuGeom.absPartIdx] != 4)
                    {
                        skipRectAmp = true && !!md.bestMode;
                        chooseMerge = !!m_reuseMergeFlag[cuGeom.absPartIdx] && !!md.bestMode;
                    }
                }
            }
        }

        if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_reuseInterDataCTU)
        {
            if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
            {
                if (m_reuseModes[cuGeom.absPartIdx] == MODE_SKIP)
                {
                    md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
                    md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
                    checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);

                    skipRecursion = !!m_param->recursionSkipMode && md.bestMode;
                    if (m_param->rdLevel)
                        skipModes = m_param->bEnableEarlySkip && md.bestMode;
                }
            }
        }

        /* Step 1. Evaluate Merge/Skip candidates for likely early-outs, if skip mode was not set above */
        // 1. Evaluate the Merge/Skip candidates to decide whether later work can be skipped (only if skip was not already decided above)
        if ((mightNotSplit && depth >= minDepth && !md.bestMode && !bCtuInfoCheck) || (m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1]))) /* TODO: Re-evaluate if analysis load/save still works */
        {
            /* Compute Merge Cost */
            // Initialise the merge and skip CUs
            md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
            md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
            // Inter prediction in merge mode and skip mode
            checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
            if (m_param->rdLevel)
                skipModes = (m_param->bEnableEarlySkip || m_refineLevel == 2)
                            && md.bestMode && md.bestMode->cu.isSkipped(0); // TODO: sa8d threshold per depth
        }

        if (md.bestMode && m_param->recursionSkipMode && !bCtuInfoCheck && !(m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
        {
            skipRecursion = md.bestMode->cu.isSkipped(0);
            if (mightSplit && !skipRecursion)
            {
                if (depth >= minDepth && m_param->recursionSkipMode == RDCOST_BASED_RSKIP)
                {
                    if (depth)
                        skipRecursion = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);
                    if (m_bHD && !skipRecursion && m_param->rdLevel == 2 && md.fencYuv.m_size != MAX_CU_SIZE)
                        skipRecursion = complexityCheckCU(*md.bestMode);
                }
                else if (cuGeom.log2CUSize >= MAX_LOG2_CU_SIZE - 1 && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
                {
                    skipRecursion = complexityCheckCU(*md.bestMode);
                }
            }
        }
        // Decide whether recursive splitting can be skipped
        if (m_param->bAnalysisType == AVC_INFO && md.bestMode && cuGeom.numPartitions <= 16 && m_param->analysisLoadReuseLevel == 7)
            skipRecursion = true;

        /* Step 2. Evaluate each of the 4 split sub-blocks in series */
        // Evaluate the inter modes of the 4 sub-blocks
        if (mightSplit && !skipRecursion)
        {
            if (bCtuInfoCheck && m_param->bCTUInfo & 2)
                qp = int((1 / 0.96) * qp + 0.5);
            Mode* splitPred = &md.pred[PRED_SPLIT];
            splitPred->initCosts();
            CUData* splitCU = &splitPred->cu;
            splitCU->initSubCU(parentCTU, cuGeom, qp);

            uint32_t nextDepth = depth + 1;
            ModeDepth& nd = m_modeDepth[nextDepth];
            invalidateContexts(nextDepth);
            Entropy* nextContext = &m_rqt[depth].cur;
            int nextQP = qp;
            splitIntra = false;

            for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
            {
                const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);
                if (childGeom.flags & CUGeom::PRESENT)
                {
                    m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);
                    m_rqt[nextDepth].cur.load(*nextContext);

                    if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
                        nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));

                    // Inter prediction of each of the 4 sub-blocks (recursive call)
                    splitData[subPartIdx] = compressInterCU_rd0_4(parentCTU, childGeom, nextQP);

                    // Save best CU and pred data for this sub CU
                    splitIntra |= nd.bestMode->cu.isIntra(0);
                    splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
                    splitPred->addSubCosts(*nd.bestMode);

                    if (m_param->rdLevel)
                        nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);
                    else
                        nd.bestMode->predYuv.copyToPartYuv(splitPred->predYuv, childGeom.numPartitions * subPartIdx);
                    if (m_param->rdLevel > 1)
                        nextContext = &nd.bestMode->contexts;
                }
                else
                    splitCU->setEmptyPart(childGeom, subPartIdx);
            }
            nextContext->store(splitPred->contexts);

            if (mightNotSplit)
                addSplitFlagCost(*splitPred, cuGeom.depth);
            else if (m_param->rdLevel > 1)
                updateModeCost(*splitPred);
            else
                splitPred->sa8dCost = m_rdCost.calcRdSADCost((uint32_t)splitPred->distortion, splitPred->sa8dBits);
        }

        /* If analysis mode is simple do not Evaluate other modes */
        if (m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7)
        {
            if (m_slice->m_sliceType == P_SLICE)
            {
                if (m_checkMergeAndSkipOnly[0])
                    skipModes = true;
            }
            else
            {
                if (m_checkMergeAndSkipOnly[0] && m_checkMergeAndSkipOnly[1])
                    skipModes = true;
            }
        }

        /* Split CUs
         *   0  1
         *   2  3 */
        uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;

        /* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */
        // Evaluate ME and intra modes at the current depth
        if (mightNotSplit && (depth >= minDepth || (m_param->bCTUInfo && !md.bestMode)))
        {
            if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)
                setLambdaFromQP(parentCTU, qp);

            /* Check whether skip was chosen:
               (1) if so, skip the inter prediction at the current depth;
               (2) if not, run the inter prediction below, which checks the partition shapes in order:
                   (a) 2Nx2N
                   (b) rectangular partitions
                       (i)   2NxN, Nx2N
                       (ii)  2NxnD, 2NxnU
                       (iii) nRx2N, nLx2N */
            if (!skipModes)
            {
                uint32_t refMasks[2];
                refMasks[0] = allSplitRefs;
                md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
                // 2Nx2N inter prediction
                checkInter_rd0_4(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);

                if (m_param->limitReferences & X265_REF_LIMIT_CU)
                {
                    CUData& cu = md.pred[PRED_2Nx2N].cu;
                    uint32_t refMask = cu.getBestRefIdx(0);
                    allSplitRefs = splitData[0].splitRefs = splitData[1].splitRefs = splitData[2].splitRefs = splitData[3].splitRefs = refMask;
                }

                // Bidirectional 2Nx2N prediction for B slices (not covered in this post)
                if (m_slice->m_sliceType == B_SLICE)
                {
                    md.pred[PRED_BIDIR].cu.initSubCU(parentCTU, cuGeom, qp);
                    checkBidir2Nx2N(md.pred[PRED_2Nx2N], md.pred[PRED_BIDIR], cuGeom);
                }

                Mode *bestInter = &md.pred[PRED_2Nx2N];
                // Decide whether to evaluate the rectangular (rect) and AMP partitions
                if (!skipRectAmp)
                {
                    /* 2NxN          Nx2N
                       +---+---+     +---+---+
                       |       |     |   |   |
                       +---+---+     +   +   +
                       |       |     |   |   |
                       +---+---+     +---+---+ */
                    // Is rectangular (non-square) partitioning enabled?
                    if (m_param->bEnableRectInter)
                    {
                        // Total cost of splitting into 4 sub-blocks
                        uint64_t splitCost = splitData[0].sa8dCost + splitData[1].sa8dCost + splitData[2].sa8dCost + splitData[3].sa8dCost;
                        uint32_t threshold_2NxN, threshold_Nx2N;

                        /* (1) P slice: use the forward (L0) MV cost
                           (2) B slice: average the forward and backward MV costs */
                        if (m_slice->m_sliceType == P_SLICE)
                        {
                            threshold_2NxN = splitData[0].mvCost[0] + splitData[1].mvCost[0];
                            threshold_Nx2N = splitData[0].mvCost[0] + splitData[2].mvCost[0];
                        }
                        else
                        {
                            threshold_2NxN = (splitData[0].mvCost[0] + splitData[1].mvCost[0]
                                              + splitData[0].mvCost[1] + splitData[1].mvCost[1] + 1) >> 1;
                            threshold_Nx2N = (splitData[0].mvCost[0] + splitData[2].mvCost[0]
                                              + splitData[0].mvCost[1] + splitData[2].mvCost[1] + 1) >> 1;
                        }

                        /* Order of the checks below:
                           (1) if try_2NxN_first is true, checks 1 and 2 run (2NxN, then Nx2N)
                           (2) otherwise, checks 2 and 3 run (Nx2N, then 2NxN) */
                        int try_2NxN_first = threshold_2NxN < threshold_Nx2N;

                        /* Check 1
                           splitCost: cost of splitting into 4 sub-blocks
                           md.pred[PRED_2Nx2N].sa8dCost: cost of predicting the CU as 2Nx2N
                           threshold_2NxN: threshold for the 2NxN partition
                           If the inequality below holds, 2NxN may give a lower cost */
                        if (try_2NxN_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxN)
                        {
                            // upper half
                            refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* top */
                            // lower half
                            refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* bot */
                            md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);
                            // Evaluate the cost of 2NxN inter prediction
                            checkInter_rd0_4(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);
                            if (md.pred[PRED_2NxN].sa8dCost < bestInter->sa8dCost)
                                bestInter = &md.pred[PRED_2NxN];
                        }

                        /* Check 2
                           splitCost: cost of splitting into 4 sub-blocks
                           md.pred[PRED_2Nx2N].sa8dCost: cost of predicting the CU as 2Nx2N
                           threshold_Nx2N: threshold for the Nx2N partition
                           If the inequality below holds, Nx2N may give a lower cost */
                        if (splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_Nx2N)
                        {
                            refMasks[0] = splitData[0].splitRefs | splitData[2].splitRefs; /* left */
                            refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* right */
                            md.pred[PRED_Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
                            checkInter_rd0_4(md.pred[PRED_Nx2N], cuGeom, SIZE_Nx2N, refMasks);
                            if (md.pred[PRED_Nx2N].sa8dCost < bestInter->sa8dCost)
                                bestInter = &md.pred[PRED_Nx2N];
                        }

                        // Check 3
                        if (!try_2NxN_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxN)
                        {
                            refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* top */
                            refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* bot */
                            md.pred[PRED_2NxN].cu.initSubCU(parentCTU, cuGeom, qp);
                            checkInter_rd0_4(md.pred[PRED_2NxN], cuGeom, SIZE_2NxN, refMasks);
                            if (md.pred[PRED_2NxN].sa8dCost < bestInter->sa8dCost)
                                bestInter = &md.pred[PRED_2NxN];
                        }
                    }

                    // Check the AMP shapes (SIZE_2NxnU, SIZE_2NxnD, SIZE_nLx2N, SIZE_nRx2N)
                    if (m_slice->m_sps->maxAMPDepth > depth)
                    {
                        uint64_t splitCost = splitData[0].sa8dCost + splitData[1].sa8dCost + splitData[2].sa8dCost + splitData[3].sa8dCost;
                        uint32_t threshold_2NxnU, threshold_2NxnD, threshold_nLx2N, threshold_nRx2N;

                        // Thresholds depend on the slice type
                        if (m_slice->m_sliceType == P_SLICE)
                        {
                            threshold_2NxnU = splitData[0].mvCost[0] + splitData[1].mvCost[0];
                            threshold_2NxnD = splitData[2].mvCost[0] + splitData[3].mvCost[0];

                            threshold_nLx2N = splitData[0].mvCost[0] + splitData[2].mvCost[0];
                            threshold_nRx2N = splitData[1].mvCost[0] + splitData[3].mvCost[0];
                        }
                        else
                        {
                            threshold_2NxnU = (splitData[0].mvCost[0] + splitData[1].mvCost[0]
                                               + splitData[0].mvCost[1] + splitData[1].mvCost[1] + 1) >> 1;
                            threshold_2NxnD = (splitData[2].mvCost[0] + splitData[3].mvCost[0]
                                               + splitData[2].mvCost[1] + splitData[3].mvCost[1] + 1) >> 1;

                            threshold_nLx2N = (splitData[0].mvCost[0] + splitData[2].mvCost[0]
                                               + splitData[0].mvCost[1] + splitData[2].mvCost[1] + 1) >> 1;
                            threshold_nRx2N = (splitData[1].mvCost[0] + splitData[3].mvCost[0]
                                               + splitData[1].mvCost[1] + splitData[3].mvCost[1] + 1) >> 1;
                        }

                        /* Decide whether to try horizontal or vertical AMP:
                           (1) if bestInter's partSize is 2NxN, try the horizontal AMP shapes
                           (2) if it is Nx2N, try the vertical AMP shapes
                           (3) if it is 2Nx2N and md.bestMode has a non-zero root CBF (the residual
                               quadtree root has non-zero coefficients), try both */
                        bool bHor = false, bVer = false;
                        if (bestInter->cu.m_partSize[0] == SIZE_2NxN)
                            bHor = true;
                        else if (bestInter->cu.m_partSize[0] == SIZE_Nx2N)
                            bVer = true;
                        else if (bestInter->cu.m_partSize[0] == SIZE_2Nx2N &&
                                 md.bestMode && md.bestMode->cu.getQtRootCbf(0))
                        {
                            bHor = true;
                            bVer = true;
                        }

                        // Try the horizontal AMP shapes
                        if (bHor)
                        {
                            // Decide whether 2NxnD is checked first (sets the check order)
                            /*     2NxnD                            2NxnU
                               +--+--+--+--+                    +--+--+--+--+
                               |           |                    |           |  25% top
                               +           +                    +--+--+--+--+
                               |           |  75% top           |           |
                               +           +                    +           +
                               |           |                    |           |  75% bottom
                               +--+--+--+--+                    +           +
                               |           |  25% bottom        |           |
                               +--+--+--+--+                    +--+--+--+--+ */
                            int try_2NxnD_first = threshold_2NxnD < threshold_2NxnU;
                            if (try_2NxnD_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxnD)
                            {
                                refMasks[0] = allSplitRefs;                                    /* 75% top */
                                refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* 25% bot */
                                md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);
                                // Evaluate 2NxnD
                                checkInter_rd0_4(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);
                                if (md.pred[PRED_2NxnD].sa8dCost < bestInter->sa8dCost)
                                    bestInter = &md.pred[PRED_2NxnD];
                            }

                            if (splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxnU)
                            {
                                refMasks[0] = splitData[0].splitRefs | splitData[1].splitRefs; /* 25% top */
                                refMasks[1] = allSplitRefs;                                    /* 75% bot */
                                md.pred[PRED_2NxnU].cu.initSubCU(parentCTU, cuGeom, qp);
                                // Evaluate 2NxnU
                                checkInter_rd0_4(md.pred[PRED_2NxnU], cuGeom, SIZE_2NxnU, refMasks);
                                if (md.pred[PRED_2NxnU].sa8dCost < bestInter->sa8dCost)
                                    bestInter = &md.pred[PRED_2NxnU];
                            }

                            if (!try_2NxnD_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_2NxnD)
                            {
                                refMasks[0] = allSplitRefs;                                    /* 75% top */
                                refMasks[1] = splitData[2].splitRefs | splitData[3].splitRefs; /* 25% bot */
                                md.pred[PRED_2NxnD].cu.initSubCU(parentCTU, cuGeom, qp);
                                checkInter_rd0_4(md.pred[PRED_2NxnD], cuGeom, SIZE_2NxnD, refMasks);
                                if (md.pred[PRED_2NxnD].sa8dCost < bestInter->sa8dCost)
                                    bestInter = &md.pred[PRED_2NxnD];
                            }
                        }

                        // Try the vertical AMP shapes
                        if (bVer)
                        {
                            /*     nRx2N                    nLx2N
                               +--+--+--+--+            +--+--+--+--+
                               |        |  |            |  |        |
                               +        +  +            +  +        +
                               |        |  |            |  |        |
                               +        +  +            +  +        +
                               |        |  |            |  |        |
                               +--+--+--+--+            +--+--+--+--+
                               75% left / 25% right     25% left / 75% right */
                            int try_nRx2N_first = threshold_nRx2N < threshold_nLx2N;
                            if (try_nRx2N_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_nRx2N)
                            {
                                refMasks[0] = allSplitRefs;                                    /* 75% left  */
                                refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* 25% right */
                                md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);
                                checkInter_rd0_4(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);
                                if (md.pred[PRED_nRx2N].sa8dCost < bestInter->sa8dCost)
                                    bestInter = &md.pred[PRED_nRx2N];
                            }

                            if (splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_nLx2N)
                            {
                                refMasks[0] = splitData[0].splitRefs | splitData[2].splitRefs; /* 25% left  */
                                refMasks[1] = allSplitRefs;                                    /* 75% right */
                                md.pred[PRED_nLx2N].cu.initSubCU(parentCTU, cuGeom, qp);
                                checkInter_rd0_4(md.pred[PRED_nLx2N], cuGeom, SIZE_nLx2N, refMasks);
                                if (md.pred[PRED_nLx2N].sa8dCost < bestInter->sa8dCost)
                                    bestInter = &md.pred[PRED_nLx2N];
                            }

                            if (!try_nRx2N_first && splitCost < md.pred[PRED_2Nx2N].sa8dCost + threshold_nRx2N)
                            {
                                refMasks[0] = allSplitRefs;                                    /* 75% left  */
                                refMasks[1] = splitData[1].splitRefs | splitData[3].splitRefs; /* 25% right */
                                md.pred[PRED_nRx2N].cu.initSubCU(parentCTU, cuGeom, qp);
                                checkInter_rd0_4(md.pred[PRED_nRx2N], cuGeom, SIZE_nRx2N, refMasks);
                                if (md.pred[PRED_nRx2N].sa8dCost < bestInter->sa8dCost)
                                    bestInter = &md.pred[PRED_nRx2N];
                            }
                        }
                    }
                }

                /* Decide whether to try intra. Conditions:
                   (1) the slice is not a B slice, or intra is allowed in B frames (bIntraInBFrames)
                   (2) the CU size is not 64x64 (log2CUSize != MAX_LOG2_CU_SIZE)
                   (3) bit 2 of bCTUInfo (value 4) and bCtuInfoCheck are not both set
                       (bCtuInfoCheck enables CTU-content-based adjustments; not studied here,
                       and bCTUInfo defaults to 0) */
                bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE && !((m_param->bCTUInfo & 4) && bCtuInfoCheck);

                // rdLevel defaults to 3
                if (m_param->rdLevel >= 3)
                {
                    /* Calculate RD cost of best inter option */
                    if ((!m_bChromaSa8d && (m_csp != X265_CSP_I400)) || (m_frame->m_fencPic->m_picCsp == X265_CSP_I400 && m_csp != X265_CSP_I400)) /* When m_bChromaSa8d is enabled, chroma MC has already been done */
                    {
                        uint32_t numPU = bestInter->cu.getNumPartInter(0);
                        for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
                        {
                            PredictionUnit pu(bestInter->cu, cuGeom, puIdx);
                            motionCompensation(bestInter->cu, pu, bestInter->predYuv, false, true);
                        }
                    }

                    // Merge is not forced by reused analysis data
                    if (!chooseMerge)
                    {
                        // Encode the best inter mode found above and compute its RD cost
                        encodeResAndCalcRdInterCU(*bestInter, cuGeom);
                        checkBestMode(*bestInter, depth);

                        /* If BIDIR is available and within 17/16 of best inter option, choose by RDO */
                        // i.e. BIDIR's cost is at most 17/16 of the best inter cost (presumably an empirical factor)
                        if (m_slice->m_sliceType == B_SLICE && md.pred[PRED_BIDIR].sa8dCost != MAX_INT64 &&
                            md.pred[PRED_BIDIR].sa8dCost * 16 <= bestInter->sa8dCost * 17)
                        {
                            uint32_t numPU = md.pred[PRED_BIDIR].cu.getNumPartInter(0);
                            if (m_frame->m_fencPic->m_picCsp == X265_CSP_I400 && m_csp != X265_CSP_I400)
                                for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
                                {
                                    PredictionUnit pu(md.pred[PRED_BIDIR].cu, cuGeom, puIdx);
                                    // Motion compensation for BIDIR
                                    motionCompensation(md.pred[PRED_BIDIR].cu, pu, md.pred[PRED_BIDIR].predYuv, true, true);
                                }
                            // RD cost of BIDIR
                            encodeResAndCalcRdInterCU(md.pred[PRED_BIDIR], cuGeom);
                            checkBestMode(md.pred[PRED_BIDIR], depth);
                        }
                    }

                    // Try intra
                    if ((bTryIntra && md.bestMode->cu.getQtRootCbf(0)) ||
                        md.bestMode->sa8dCost == MAX_INT64)
                    {
                        if (!m_param->limitReferences || splitIntra)
                        {
                            ProfileCounter(parentCTU, totalIntraCU[cuGeom.depth]);
                            md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);
                            checkIntraInInter(md.pred[PRED_INTRA], cuGeom);
                            encodeIntraInInter(md.pred[PRED_INTRA], cuGeom);
                            checkBestMode(md.pred[PRED_INTRA], depth);
                        }
                        else
                        {
                            ProfileCounter(parentCTU, skippedIntraCU[cuGeom.depth]);
                        }
                    }
                }
                else
                {
                    /* SA8D choice between merge/skip, inter, bidir, and intra */
                    if (!md.bestMode || bestInter->sa8dCost < md.bestMode->sa8dCost)
                        md.bestMode = bestInter;

                    if (m_slice->m_sliceType == B_SLICE &&
                        md.pred[PRED_BIDIR].sa8dCost < md.bestMode->sa8dCost)
                        md.bestMode = &md.pred[PRED_BIDIR];

                    if (bTryIntra || md.bestMode->sa8dCost == MAX_INT64)
                    {
                        if (!m_param->limitReferences || splitIntra)
                        {
                            ProfileCounter(parentCTU, totalIntraCU[cuGeom.depth]);
                            md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);
                            checkIntraInInter(md.pred[PRED_INTRA], cuGeom);
                            if (md.pred[PRED_INTRA].sa8dCost < md.bestMode->sa8dCost)
                                md.bestMode = &md.pred[PRED_INTRA];
                        }
                        else
                        {
                            ProfileCounter(parentCTU, skippedIntraCU[cuGeom.depth]);
                        }
                    }

                    /* finally code the best mode selected by SA8D costs:
                     * RD level 2 - fully encode the best mode
                     * RD level 1 - generate recon pixels
                     * RD level 0 - generate chroma prediction */
                    if (md.bestMode->cu.m_mergeFlag[0] && md.bestMode->cu.m_partSize[0] == SIZE_2Nx2N)
                    {
                        /* prediction already generated for this CU, and if rd level
                         * is not 0, it is already fully encoded */
                    }
                    else if (md.bestMode->cu.isInter(0))
                    {
                        uint32_t numPU = md.bestMode->cu.getNumPartInter(0);
                        if (m_csp != X265_CSP_I400)
                        {
                            for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
                            {
                                PredictionUnit pu(md.bestMode->cu, cuGeom, puIdx);
                                motionCompensation(md.bestMode->cu, pu, md.bestMode->predYuv, false, true);
                            }
                        }
                        if (m_param->rdLevel == 2)
                            encodeResAndCalcRdInterCU(*md.bestMode, cuGeom);
                        else if (m_param->rdLevel == 1)
                        {
                            /* generate recon pixels with no rate distortion considerations */
                            CUData& cu = md.bestMode->cu;

                            uint32_t tuDepthRange[2];
                            cu.getInterTUQtDepthRange(tuDepthRange, 0);
                            m_rqt[cuGeom.depth].tmpResiYuv.subtract(*md.bestMode->fencYuv, md.bestMode->predYuv, cuGeom.log2CUSize, m_frame->m_fencPic->m_picCsp);
                            residualTransformQuantInter(*md.bestMode, cuGeom, 0, 0, tuDepthRange);
                            if (cu.getQtRootCbf(0))
                                md.bestMode->reconYuv.addClip(md.bestMode->predYuv, m_rqt[cuGeom.depth].tmpResiYuv, cu.m_log2CUSize[0], m_frame->m_fencPic->m_picCsp);
                            else
                            {
                                md.bestMode->reconYuv.copyFromYuv(md.bestMode->predYuv);
                                if (cu.m_mergeFlag[0] && cu.m_partSize[0] == SIZE_2Nx2N)
                                    cu.setPredModeSubParts(MODE_SKIP);
                            }
                        }
                    }
                    else
                    {
                        if (m_param->rdLevel == 2)
                            encodeIntraInInter(*md.bestMode, cuGeom);
                        else if (m_param->rdLevel == 1)
                        {
                            /* generate recon pixels with no rate distortion considerations */
                            CUData& cu = md.bestMode->cu;

                            uint32_t tuDepthRange[2];
                            cu.getIntraTUQtDepthRange(tuDepthRange, 0);

                            residualTransformQuantIntra(*md.bestMode, cuGeom, 0, 0, tuDepthRange);
                            if (m_csp != X265_CSP_I400)
                            {
                                getBestIntraModeChroma(*md.bestMode, cuGeom);
                                residualQTIntraChroma(*md.bestMode, cuGeom, 0, 0);
                            }
                            md.bestMode->reconYuv.copyFromPicYuv(reconPic, cu.m_cuAddr, cuGeom.absPartIdx); // TODO:
                        }
                    }
                }
            } // !earlyskip

            if (m_bTryLossless)
                tryLossless(cuGeom);

            if (mightSplit)
                addSplitFlagCost(*md.bestMode, cuGeom.depth);
        }

        if (mightSplit && !skipRecursion)
        {
            Mode* splitPred = &md.pred[PRED_SPLIT];
            if (!md.bestMode)
                md.bestMode = splitPred;
            else if (m_param->rdLevel > 1)
                checkBestMode(*splitPred, cuGeom.depth);
            else if (splitPred->sa8dCost < md.bestMode->sa8dCost)
                md.bestMode = splitPred;

            checkDQPForSplitPred(*md.bestMode, cuGeom);
        }

        /* determine which motion references the parent CU should search */
        splitCUData.initSplitCUData();

        if (m_param->limitReferences & X265_REF_LIMIT_DEPTH)
        {
            if (md.bestMode == &md.pred[PRED_SPLIT])
                splitCUData.splitRefs = allSplitRefs;
            else
            {
                /* use best merge/inter mode, in case of intra use 2Nx2N inter references */
                CUData& cu = md.bestMode->cu.isIntra(0) ? md.pred[PRED_2Nx2N].cu : md.bestMode->cu;
                uint32_t numPU = cu.getNumPartInter(0);
                for (uint32_t puIdx = 0, subPartIdx = 0; puIdx < numPU; puIdx++, subPartIdx += cu.getPUOffset(puIdx, 0))
                    splitCUData.splitRefs |= cu.getBestRefIdx(subPartIdx);
            }
        }

        if (m_param->limitModes)
        {
            splitCUData.mvCost[0] = md.pred[PRED_2Nx2N].bestME[0][0].mvCost; // L0
            splitCUData.mvCost[1] = md.pred[PRED_2Nx2N].bestME[0][1].mvCost; // L1
            splitCUData.sa8dCost = md.pred[PRED_2Nx2N].sa8dCost;
        }

        // If the best mode is skip, update the CU statistics
        if (mightNotSplit && md.bestMode->cu.isSkipped(0))
        {
            FrameData& curEncData = *m_frame->m_encData;
            FrameData::RCStatCU& cuStat = curEncData.m_cuStat[parentCTU.m_cuAddr];
            uint64_t temp = cuStat.avgCost[depth] * cuStat.count[depth];
            cuStat.count[depth] += 1;
            cuStat.avgCost[depth] = (temp + md.bestMode->rdCost) / cuStat.count[depth];
        }

        /* Copy best data to encData CTU and recon */
        // Copy the best mode's data into the picture and the recon buffer
        md.bestMode->cu.copyToPic(depth);
        if (m_param->rdLevel)
            md.bestMode->reconYuv.copyToPicYuv(reconPic, cuAddr, cuGeom.absPartIdx);

        if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
        {
            if (mightNotSplit)
            {
                CUData* ctu = md.bestMode->cu.m_encData->getPicCTU(parentCTU.m_cuAddr);
                int8_t maxTUDepth = -1;
                for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
                    maxTUDepth = X265_MAX(maxTUDepth, md.bestMode->cu.m_tuDepth[i]);
                ctu->m_refTuDepth[cuGeom.geomRecurId] = maxTUDepth;
            }
        }
    }
    else
    {
        // ...
    }

    return splitCUData;
}
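The pruning of rectangular and AMP shapes in the function above boils down to two small computations per shape: a threshold built from the MV costs of the two quadrants a shape's halves cover (0+1 for 2NxN and 2NxnU, 2+3 for 2NxnD, 0+2 for Nx2N and nLx2N, 1+3 for nRx2N), and the test splitCost < cost(2Nx2N) + threshold. The sketch below restates just that logic; `QuadInfo`, `partThreshold` and `worthTrying` are hypothetical names that only mirror the `sa8dCost` and `mvCost` fields of x265's SplitData.

```cpp
#include <cassert>
#include <cstdint>

// Per-quadrant results of the recursive sub-CU analysis (hypothetical mirror of SplitData).
struct QuadInfo { uint64_t sa8dCost; uint32_t mvCost[2]; };

// Threshold for a shape whose halves line up with quadrants a and b:
// P slice uses the L0 MV cost; B slice uses the rounded average of L0 and L1.
uint32_t partThreshold(const QuadInfo q[4], int a, int b, bool isPSlice)
{
    if (isPSlice)
        return q[a].mvCost[0] + q[b].mvCost[0];
    return (q[a].mvCost[0] + q[b].mvCost[0] + q[a].mvCost[1] + q[b].mvCost[1] + 1) >> 1;
}

// A shape is evaluated only when the 4-way split looks competitive with 2Nx2N.
bool worthTrying(const QuadInfo q[4], uint64_t cost2Nx2N, uint32_t threshold)
{
    uint64_t splitCost = q[0].sa8dCost + q[1].sa8dCost + q[2].sa8dCost + q[3].sa8dCost;
    return splitCost < cost2Nx2N + threshold;
}
```

The intuition appears to be that if four independent sub-CUs already beat 2Nx2N (allowing some slack for their MV cost), the motion inside the CU is not uniform, so a two-PU shape might capture it more cheaply.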

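2.1 Checking Merge/Skip Mode (checkMerge2Nx2N_rd0_4)

This function evaluates the cost of merge mode and skip mode. Its main workflow is:
(1) get the merge candidate list (getInterMergeCandidates)
(2) evaluate the merge candidates and pick the best one (using motion compensation, motionCompensation, and an SA8D-based cost)
(3) for the best merge candidate, compute the cost without coding the residual (encodeResAndCalcRdSkipCU, SSE-based)
(4) for the best merge candidate, compute the cost with the residual coded (encodeResAndCalcRdInterCU, SSE-based)

PS: Note that "Skip mode" here means taking the best Merge candidate and not coding its residual.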
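Step (2) can be sketched as a loop that adds the merge-index bits to each candidate's distortion and keeps the cheapest. The bit count assumes a truncated-unary merge index (index i costs i + 1 bins, except the last index, which needs no terminating bin), which appears to be what `getTUBits(i, numMergeCand)` estimates in the code below; `mergeIdxBits`, `bestMergeCandidate` and the floating-point lambda are illustrative assumptions, not x265's fixed-point `calcRdSADCost`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Truncated-unary length of a merge index with maximum value numCand - 1.
uint32_t mergeIdxBits(uint32_t idx, uint32_t numCand)
{
    return idx + (idx < numCand - 1 ? 1 : 0);
}

// distortion[i]: SA8D between the source block and candidate i's
// motion-compensated prediction (given as input here).
// Returns the index with the smallest distortion + lambda * bits (first one on ties).
int bestMergeCandidate(const std::vector<uint32_t>& distortion, double lambda)
{
    int best = -1;
    double bestCost = 0;
    uint32_t n = (uint32_t)distortion.size();
    for (uint32_t i = 0; i < n; i++)
    {
        double cost = distortion[i] + lambda * mergeIdxBits(i, n);
        if (best < 0 || cost < bestCost) { best = (int)i; bestCost = cost; }
    }
    return best;
}
```

A larger lambda makes later (more expensive) indices less attractive: with distortions {100, 95, 90}, lambda 4 picks index 2, while lambda 10 produces a tie that keeps index 0.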

/* sets md.bestMode if a valid merge candidate is found, else leaves it NULL */
void Analysis::checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom& cuGeom)
{
    uint32_t depth = cuGeom.depth;
    ModeDepth& md = m_modeDepth[depth];
    Yuv *fencYuv = &md.fencYuv;

    /* Note that these two Mode instances are named MERGE and SKIP but they may
     * hold the reverse when the function returns. We toggle between the two modes */
    Mode* tempPred = &merge;
    Mode* bestPred = &skip;

    X265_CHECK(m_slice->m_sliceType != I_SLICE, "Evaluating merge in I slice\n");

    tempPred->initCosts();
    tempPred->cu.setPartSizeSubParts(SIZE_2Nx2N);
    tempPred->cu.setPredModeSubParts(MODE_INTER);
    tempPred->cu.m_mergeFlag[0] = true;

    bestPred->initCosts();
    bestPred->cu.setPartSizeSubParts(SIZE_2Nx2N);
    bestPred->cu.setPredModeSubParts(MODE_INTER);
    bestPred->cu.m_mergeFlag[0] = true;

    MVField candMvField[MRG_MAX_NUM_CANDS][2]; // double length for mv of both lists (the candidate MVs)
    uint8_t candDir[MRG_MAX_NUM_CANDS];        // prediction direction of each candidate
    // 1. Build the merge candidate list. MRG_MAX_NUM_CANDS = 5, but the number actually used
    //    may be smaller (e.g. 3), depending on the configuration
    uint32_t numMergeCand = tempPred->cu.getInterMergeCandidates(0, 0, candMvField, candDir);
    PredictionUnit pu(merge.cu, cuGeom, 0);

    bestPred->sa8dCost = MAX_INT64;
    int bestSadCand = -1;
    int sizeIdx = cuGeom.log2CUSize - 2;
    int safeX, maxSafeMv;
    if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
    {
        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
        maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
    }

    // 2. Evaluate each merge candidate
    for (uint32_t i = 0; i < numMergeCand; ++i)
    {
        // Is frame-level parallel encoding enabled?
        if (m_bFrameParallel)
        {
            // Parallel slices bound check
            if (m_param->maxSlices > 1)
            {
                // NOTE: First row in slice can't negative
                if (X265_MIN(candMvField[i][0].mv.y, candMvField[i][1].mv.y) < m_sliceMinY)
                    continue;

                // Last row in slice can't reference beyond bound since it is another slice area
                // TODO: we may beyond bound in future since these area have a chance to finish because we use parallel slices. Necessary prepare research on load balance
                if (X265_MAX(candMvField[i][0].mv.y, candMvField[i][1].mv.y) > m_sliceMaxY)
                    continue;
            }

            if (candMvField[i][0].mv.y >= (m_param->searchRange + 1) * 4 ||
                candMvField[i][1].mv.y >= (m_param->searchRange + 1) * 4)
                continue;
        }

        if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
            tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
            candMvField[i][0].mv.x > maxSafeMv)
            // skip merge candidates which reference beyond safe reference area
            continue;

        // The merge candidate index is stored in the L0 MVP index
        tempPred->cu.m_mvpIdx[0][0] = (uint8_t)i; // merge candidate ID is stored in L0 MVP idx
        X265_CHECK(m_slice->m_sliceType == B_SLICE || !(candDir[i] & 0x10), " invalid merge for P slice\n");
        tempPred->cu.m_interDir[0] = candDir[i];                        // prediction direction of the candidate
        tempPred->cu.m_mv[0][0] = candMvField[i][0].mv;                 // forward (L0) MV; in candMvField[i][list], 0 = L0, 1 = L1
        tempPred->cu.m_mv[1][0] = candMvField[i][1].mv;                 // backward (L1) MV
        tempPred->cu.m_refIdx[0][0] = (int8_t)candMvField[i][0].refIdx; // forward reference index
        tempPred->cu.m_refIdx[1][0] = (int8_t)candMvField[i][1].refIdx; // backward reference index

        // Motion compensation: fetch the prediction block the MVs point to
        motionCompensation(tempPred->cu, pu, tempPred->predYuv, true, m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400));

        tempPred->sa8dBits = getTUBits(i, numMergeCand);
        // Distortion of the MC prediction, measured with SA8D (Hadamard-based)
        tempPred->distortion = primitives.cu[sizeIdx].sa8d(fencYuv->m_buf[0], fencYuv->m_size, tempPred->predYuv.m_buf[0], tempPred->predYuv.m_size);
        if (m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400))
        {
            tempPred->distortion += primitives.chroma[m_csp].cu[sizeIdx].sa8d(fencYuv->m_buf[1], fencYuv->m_csize, tempPred->predYuv.m_buf[1], tempPred->predYuv.m_csize);
            tempPred->distortion += primitives.chroma[m_csp].cu[sizeIdx].sa8d(fencYuv->m_buf[2], fencYuv->m_csize, tempPred->predYuv.m_buf[2], tempPred->predYuv.m_csize);
        }
        // Compute the SA8D-based RD cost
        tempPred->sa8dCost = m_rdCost.calcRdSADCost((uint32_t)tempPred->distortion, tempPred->sa8dBits);

        // Keep the current candidate if its cost is the best so far
        if (tempPred->sa8dCost < bestPred->sa8dCost)
        {
            bestSadCand = i;
            std::swap(tempPred, bestPred);
        }
    }

    /* force mode decision to take inter or intra */
    if (bestSadCand < 0)
        return;

    /* calculate the motion compensation for chroma for the best mode selected */
    // Chroma components
    if ((!m_bChromaSa8d && (m_csp != X265_CSP_I400)) || (m_frame->m_fencPic->m_picCsp == X265_CSP_I400 && m_csp != X265_CSP_I400)) /* Chroma MC was done above */
        motionCompensation(bestPred->cu, pu, bestPred->predYuv, false, true);

    if (m_param->rdLevel)
    {
        if (m_param->bLossless)
            bestPred->rdCost = MAX_INT64;
        else
            // 3. For the best merge candidate, compute the cost of skipping (SSE-based); skip codes no residual
            encodeResAndCalcRdSkipCU(*bestPred);

        /* Encode with residual */
        tempPred->cu.m_mvpIdx[0][0] = (uint8_t)bestSadCand;
        tempPred->cu.setPUInterDir(candDir[bestSadCand], 0, 0);
        tempPred->cu.setPUMv(0, candMvField[bestSadCand][0].mv, 0, 0);
        tempPred->cu.setPUMv(1, candMvField[bestSadCand][1].mv, 0, 0);
        tempPred->cu.setPURefIdx(0, (int8_t)candMvField[bestSadCand][0].refIdx, 0, 0);
        tempPred->cu.setPURefIdx(1, (int8_t)candMvField[bestSadCand][1].refIdx, 0, 0);
        tempPred->sa8dCost = bestPred->sa8dCost;
        tempPred->sa8dBits = bestPred->sa8dBits;
        tempPred->predYuv.copyFromYuv(bestPred->predYuv);

        // 4. Re-evaluate bestSadCand with SSE, this time actually coding the residual
        encodeResAndCalcRdInterCU(*tempPred, cuGeom);

        /* Pick the better of the two:
           (1) bestPred holds the skip mode
           (2) tempPred holds merge candidate bestSadCand, re-evaluated with SSE */
        md.bestMode = tempPred->rdCost < bestPred->rdCost ? tempPred : bestPred;
    }
    else
        md.bestMode = bestPred;

    /* broadcast sets of MV field data */
    // Store the motion data of the best mode
    md.bestMode->cu.setPUInterDir(candDir[bestSadCand], 0, 0);
    md.bestMode->cu.setPUMv(0, candMvField[bestSadCand][0].mv, 0, 0);
    md.bestMode->cu.setPUMv(1, candMvField[bestSadCand][1].mv, 0, 0);
    md.bestMode->cu.setPURefIdx(0, (int8_t)candMvField[bestSadCand][0].refIdx, 0, 0);
    md.bestMode->cu.setPURefIdx(1, (int8_t)candMvField[bestSadCand][1].refIdx, 0, 0);
    checkDQP(*md.bestMode, cuGeom);
}
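The merge loop above scores each candidate by sa8d distortion plus a lambda-weighted bit cost and keeps the minimum. Below is a minimal C++ sketch of that idea for 8-bit pixels. The names `satd8x8`, `truncatedUnaryBits` and `pickBestCandidate` are hypothetical. The SATD is left unnormalized, whereas x265's sa8d primitives apply their own scaling. Lambda is a plain double rather than x265's internal fixed-point value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hadamard-transformed SAD (SATD) over one 8x8 block of 8-bit pixels, in the spirit
// of x265's sa8d primitive. Unnormalized: x265 scales its result differently.
int satd8x8(const uint8_t* a, int strideA, const uint8_t* b, int strideB)
{
    int d[64]; // residual a - b, row-major
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            d[y * 8 + x] = int(a[y * strideA + x]) - int(b[y * strideB + x]);

    // In-place 8-point Walsh-Hadamard butterfly on v[0], v[step], ..., v[7 * step]
    auto hadamard8 = [](int* v, int step) {
        for (int len = 1; len < 8; len <<= 1)
            for (int i = 0; i < 8; i += 2 * len)
                for (int j = i; j < i + len; j++) {
                    int p = v[j * step];
                    int q = v[(j + len) * step];
                    v[j * step] = p + q;
                    v[(j + len) * step] = p - q;
                }
    };
    for (int y = 0; y < 8; y++) hadamard8(d + y * 8, 1); // rows
    for (int x = 0; x < 8; x++) hadamard8(d + x, 8);     // columns

    int sum = 0;
    for (int k = 0; k < 64; k++)
        sum += std::abs(d[k]);
    return sum;
}

// Truncated-unary length of index idx out of n values (the last index needs no terminating bit).
int truncatedUnaryBits(int idx, int n) { return idx + (idx < n - 1 ? 1 : 0); }

struct MergeCandidateCost { int distortion; int bits; };

// Index of the candidate with the lowest distortion + lambda * bits; the strict "<"
// keeps the earlier candidate on ties. Returns -1 for an empty list.
int pickBestCandidate(const std::vector<MergeCandidateCost>& cands, double lambda)
{
    int best = -1;
    double bestCost = 0.0;
    for (std::size_t i = 0; i < cands.size(); i++) {
        double cost = cands[i].distortion + lambda * cands[i].bits;
        if (best < 0 || cost < bestCost) {
            best = int(i);
            bestCost = cost;
        }
    }
    return best;
}
```

For example, with lambda = 4 and (distortion, bits) pairs (100, 1), (90, 2), (95, 3), the costs are 104, 98 and 107, so candidate 1 wins even though it needs more bits than candidate 0.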

2.1.1 Building the Merge candidate list (getInterMergeCandidates)

/* Construct list of merging candidates, returns count */
uint32_t CUData::getInterMergeCandidates(uint32_t absPartIdx, uint32_t puIdx, MVField(*candMvField)[2], uint8_t* candDir) const
{
    uint32_t absPartAddr = m_absIdxInCTU + absPartIdx;
    const bool isInterB = m_slice->isInterB();

    const uint32_t maxNumMergeCand = m_slice->m_maxNumMergeCand;

    for (uint32_t i = 0; i < maxNumMergeCand; ++i)
    {
        candMvField[i][0].mv = 0;
        candMvField[i][1].mv = 0;
        candMvField[i][0].refIdx = REF_NOT_VALID;
        candMvField[i][1].refIdx = REF_NOT_VALID;
    }

    /* calculate the location of upper-left corner pixel and size of the current PU */
    int xP, yP, nPSW, nPSH;

    int cuSize = 1 << m_log2CUSize[0];
    int partMode = m_partSize[0];

    /*
    Partition table (reproduced here for reference). It has 3 dimensions:
    (1) dimension 1: the partition mode, e.g. SIZE_2Nx2N; length 8
    (2) dimension 2: the PU index within the partition; length 4
    (3) dimension 3: length 2; entry 0 is the PU size, entry 1 is the PU offset
    Each byte holds two nibbles (X = high nibble, Y = low nibble) in units of a quarter of the CU size.
    Examples:
    (1) partTable[0][0][0] = 0x44: width = 4/4 and height = 4/4 of the CU, i.e. the whole CU;
        partTable[0][0][1] = 0x00: no offset
    (2) partTable[3][0][0] = 0x22: NxN, 4 PUs; the first PU is 2/4 x 2/4 of the CU;
        partTable[3][1][1] = 0x20: the second PU is offset horizontally by 2/4 of the CU and vertically by 0

    const uint32_t partTable[8][4][2] =
    {
        //        XY
        { { 0x44, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2Nx2N.
        { { 0x42, 0x00 }, { 0x42, 0x02 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxN.
        { { 0x24, 0x00 }, { 0x24, 0x20 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_Nx2N.
        { { 0x22, 0x00 }, { 0x22, 0x20 }, { 0x22, 0x02 }, { 0x22, 0x22 } }, // SIZE_NxN.
        { { 0x41, 0x00 }, { 0x43, 0x01 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnU.
        { { 0x43, 0x00 }, { 0x41, 0x03 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnD.
        { { 0x14, 0x00 }, { 0x34, 0x10 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_nLx2N.
        { { 0x34, 0x00 }, { 0x14, 0x30 }, { 0x00, 0x00 }, { 0x00, 0x00 } }  // SIZE_nRx2N.
    };
    */

    int tmp = partTable[partMode][puIdx][0]; // size
    nPSW = ((tmp >> 4) * cuSize) >> 2;       // width
    nPSH = ((tmp & 0xF) * cuSize) >> 2;      // height

    tmp = partTable[partMode][puIdx][1];     // offset
    xP = ((tmp >> 4) * cuSize) >> 2;         // x offset (position relative to the CU)
    yP = ((tmp & 0xF) * cuSize) >> 2;        // y offset

    uint32_t count = 0;

    // Derive the left-bottom partition index from the partition size
    uint32_t partIdxLT, partIdxRT, partIdxLB = deriveLeftBottomIdx(puIdx);
    PartSize curPS = (PartSize)m_partSize[absPartIdx];

    /*
    Building the Merge candidate list (PU sizes in the diagrams are not to scale; only the
    relative positions matter). The merge list holds at most 5 candidates.
    (1) Spatial candidates = { A1, B1, B0, A0, B2 }, checked in this order; at most 4 are taken

    +--+           +--+--+
    |B2|           |B1|B0|
    +--+--+--+--+--+--+--+
       |           |
       +           +
       |  Current  |
       +    PU     +
       |           |
    +--+           +
    |A1|           |
    +--+--+--+--+--+
    |A0|
    +--+

    Special cases for the second PU (PU 2) of a two-PU split:
    Case 1: Nx2N / nLx2N / nRx2N: A1 lies inside PU 1, so A1 is not used
    Case 2: 2NxN / 2NxnU / 2NxnD: B1 lies inside PU 1, so B1 is not used
    (Taking PU 1's motion would make the CU equivalent to a 2Nx2N partition.)

    (2) Temporal candidate = { H, or C3 if H is not available }

    Current PU
    +----+----+----+----+
    |         |         |
    +         +         +
    |         |         |
    +----+----+----+----+
    |         | C3 |    |
    +         +----+    +
    |         |         |
    +----+----+----+----+----+
                        |  H |
                        +----+
    */

    // left
    uint32_t leftPartIdx = 0;
    const CUData* cuLeft = getPULeft(leftPartIdx, partIdxLB);
    // Check whether A1 is available
    bool isAvailableA1 = cuLeft &&
        /* isDiffMER() checks that the current PU and the neighbouring PU lie in different
           merge estimation regions (MER):
           (1) neighbour: x = xP - 1, y = yP + nPSH - 1
           (2) current:   x = xP,     y = yP
           (xP - 1, yP + nPSH - 1) is a sample position inside the neighbouring region (A1),
           not the top-left corner of the neighbouring block. */
        cuLeft->isDiffMER(xP - 1, yP + nPSH - 1, xP, yP) &&
        // Case 1 above: A1 is not used
        !(puIdx == 1 && (curPS == SIZE_Nx2N || curPS == SIZE_nLx2N || curPS == SIZE_nRx2N)) &&
        cuLeft->isInter(leftPartIdx);
    // If A1 is available, take its prediction direction and MVs
    if (isAvailableA1)
    {
        // get Inter Dir
        candDir[count] = cuLeft->m_interDir[leftPartIdx];
        // get Mv from Left
        cuLeft->getMvField(cuLeft, leftPartIdx, 0, candMvField[count][0]);
        if (isInterB)
            cuLeft->getMvField(cuLeft, leftPartIdx, 1, candMvField[count][1]);

        if (++count == maxNumMergeCand)
            return maxNumMergeCand;
    }

    // Update partIdxLT and partIdxRT
    deriveLeftRightTopIdx(puIdx, partIdxLT, partIdxRT);

    // above
    uint32_t abovePartIdx = 0;
    const CUData* cuAbove = getPUAbove(abovePartIdx, partIdxRT);
    // Check whether B1 is available
    bool isAvailableB1 = cuAbove &&
        /* Check that the current PU and the neighbouring PU lie in different MERs:
           (1) neighbour: x = xP + nPSW - 1, y = yP - 1
           (2) current:   x = xP,            y = yP */
        cuAbove->isDiffMER(xP + nPSW - 1, yP - 1, xP, yP) &&
        // Case 2 above: B1 is not used
        !(puIdx == 1 && (curPS == SIZE_2NxN || curPS == SIZE_2NxnU || curPS == SIZE_2NxnD)) &&
        cuAbove->isInter(abovePartIdx);
    if (isAvailableB1 && (!isAvailableA1 || !cuLeft->hasEqualMotion(leftPartIdx, *cuAbove, abovePartIdx)))
    {
        // get Inter Dir
        candDir[count] = cuAbove->m_interDir[abovePartIdx];
        // get Mv from Left
        cuAbove->getMvField(cuAbove, abovePartIdx, 0, candMvField[count][0]);
        if (isInterB)
            cuAbove->getMvField(cuAbove, abovePartIdx, 1, candMvField[count][1]);

        if (++count == maxNumMergeCand)
            return maxNumMergeCand;
    }

    // above right
    uint32_t aboveRightPartIdx = 0;
    const CUData* cuAboveRight = getPUAboveRight(aboveRightPartIdx, partIdxRT);
    // Check whether B0 is available
    bool isAvailableB0 = cuAboveRight &&
        /* Check that the current PU and the neighbouring PU lie in different MERs:
           (1) neighbour: x = xP + nPSW, y = yP - 1
           (2) current:   x = xP,        y = yP */
        cuAboveRight->isDiffMER(xP + nPSW, yP - 1, xP, yP) &&
        cuAboveRight->isInter(aboveRightPartIdx);
    if (isAvailableB0 && (!isAvailableB1 || !cuAbove->hasEqualMotion(abovePartIdx, *cuAboveRight, aboveRightPartIdx)))
    {
        // get Inter Dir
        candDir[count] = cuAboveRight->m_interDir[aboveRightPartIdx];
        // get Mv from Left
        cuAboveRight->getMvField(cuAboveRight, aboveRightPartIdx, 0, candMvField[count][0]);
        if (isInterB)
            cuAboveRight->getMvField(cuAboveRight, aboveRightPartIdx, 1, candMvField[count][1]);

        if (++count == maxNumMergeCand)
            return maxNumMergeCand;
    }

    // left bottom
    uint32_t leftBottomPartIdx = 0;
    const CUData* cuLeftBottom = this->getPUBelowLeft(leftBottomPartIdx, partIdxLB);
    // Check whether A0 is available
    bool isAvailableA0 = cuLeftBottom &&
        /* Check that the current PU and the neighbouring PU lie in different MERs:
           (1) neighbour: x = xP - 1, y = yP + nPSH
           (2) current:   x = xP,     y = yP */
        cuLeftBottom->isDiffMER(xP - 1, yP + nPSH, xP, yP) &&
        cuLeftBottom->isInter(leftBottomPartIdx);
    if (isAvailableA0 && (!isAvailableA1 || !cuLeft->hasEqualMotion(leftPartIdx, *cuLeftBottom, leftBottomPartIdx)))
    {
        // get Inter Dir
        candDir[count] = cuLeftBottom->m_interDir[leftBottomPartIdx];
        // get Mv from Left
        cuLeftBottom->getMvField(cuLeftBottom, leftBottomPartIdx, 0, candMvField[count][0]);
        if (isInterB)
            cuLeftBottom->getMvField(cuLeftBottom, leftBottomPartIdx, 1, candMvField[count][1]);

        if (++count == maxNumMergeCand)
            return maxNumMergeCand;
    }

    // above left
    // If fewer than 4 candidates were found so far, also check the above-left block B2
    if (count < 4)
    {
        uint32_t aboveLeftPartIdx = 0;
        const CUData* cuAboveLeft = getPUAboveLeft(aboveLeftPartIdx, absPartAddr);
        // Check whether B2 is available
        bool isAvailableB2 = cuAboveLeft &&
            cuAboveLeft->isDiffMER(xP - 1, yP - 1, xP, yP) &&
            cuAboveLeft->isInter(aboveLeftPartIdx);
        if (isAvailableB2 && (!isAvailableA1 || !cuLeft->hasEqualMotion(leftPartIdx, *cuAboveLeft, aboveLeftPartIdx))
            && (!isAvailableB1 || !cuAbove->hasEqualMotion(abovePartIdx, *cuAboveLeft, aboveLeftPartIdx)))
        {
            // get Inter Dir
            candDir[count] = cuAboveLeft->m_interDir[aboveLeftPartIdx];
            // get Mv from Left
            cuAboveLeft->getMvField(cuAboveLeft, aboveLeftPartIdx, 0, candMvField[count][0]);
            if (isInterB)
                cuAboveLeft->getMvField(cuAboveLeft, aboveLeftPartIdx, 1, candMvField[count][1]);

            if (++count == maxNumMergeCand)
                return maxNumMergeCand;
        }
    }

    /* If temporal MV prediction is enabled, look up the temporal candidate */
    if (m_slice->m_sps->bTemporalMVPEnabled)
    {
        // Index of the bottom-right partition of the PU
        uint32_t partIdxRB = deriveRightBottomIdx(puIdx);
        MV colmv;
        int ctuIdx = -1;

        // image boundary check
        if (m_encData->getPicCTU(m_cuAddr)->m_cuPelX + g_zscanToPelX[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picWidthInLumaSamples &&
            m_encData->getPicCTU(m_cuAddr)->m_cuPelY + g_zscanToPelY[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picHeightInLumaSamples)
        {
            uint32_t absPartIdxRB = g_zscanToRaster[partIdxRB];
            uint32_t numUnits = s_numPartInCUSize;
            // Is absPartIdxRB in the last column or the last row of the CTU?
            bool bNotLastCol = lessThanCol(absPartIdxRB, numUnits - 1); // is not at the last column of CTU
            bool bNotLastRow = lessThanRow(absPartIdxRB, numUnits - 1); // is not at the last row    of CTU

            // Locate the colocated PU (position H) for the temporal candidate
            if (bNotLastCol && bNotLastRow)
            {
                absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE + 1];
                ctuIdx = m_cuAddr;
            }
            else if (bNotLastCol)
                absPartAddr = g_rasterToZscan[(absPartIdxRB + 1) & (numUnits - 1)];
            else if (bNotLastRow)
            {
                absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE - numUnits + 1];
                ctuIdx = m_cuAddr + 1;
            }
            else // is the right bottom corner of CTU
                absPartAddr = 0;
        }

        // B slices look up a colocated MV for both L0 and L1; P slices only for L0
        int maxList = isInterB ? 2 : 1;
        int dir = 0, refIdx = 0;
        for (int list = 0; list < maxList; list++)
        {
            // Fetch the colocated MV
            bool bExistMV = ctuIdx >= 0 && getColMVP(colmv, refIdx, list, ctuIdx, absPartAddr);
            if (!bExistMV)
            {
                // If H has no usable MV, fall back to the center position C3
                uint32_t partIdxCenter = deriveCenterIdx(puIdx);
                bExistMV = getColMVP(colmv, refIdx, list, m_cuAddr, partIdxCenter);
            }
            // If a usable MV was found, add it to the candidate
            if (bExistMV)
            {
                dir |= (1 << list);
                candMvField[count][list].mv = colmv;
                candMvField[count][list].refIdx = refIdx;
            }
        }

        if (dir != 0)
        {
            candDir[count] = (uint8_t)dir;

            if (++count == maxNumMergeCand)
                return maxNumMergeCand;
        }
    }

    // B slices: combined bi-predictive candidates (not studied here)
    if (isInterB)
    {
        const uint32_t cutoff = count * (count - 1);
        uint32_t priorityList0 = 0xEDC984; // { 0, 1, 0, 2, 1, 2, 0, 3, 1, 3, 2, 3 }
        uint32_t priorityList1 = 0xB73621; // { 1, 0, 2, 0, 2, 1, 3, 0, 3, 1, 3, 2 }

        for (uint32_t idx = 0; idx < cutoff; idx++, priorityList0 >>= 2, priorityList1 >>= 2)
        {
            int i = priorityList0 & 3;
            int j = priorityList1 & 3;

            if ((candDir[i] & 0x1) && (candDir[j] & 0x2))
            {
                // get Mv from cand[i] and cand[j]
                int refIdxL0 = candMvField[i][0].refIdx;
                int refIdxL1 = candMvField[j][1].refIdx;
                int refPOCL0 = m_slice->m_refPOCList[0][refIdxL0];
                int refPOCL1 = m_slice->m_refPOCList[1][refIdxL1];
                if (!(refPOCL0 == refPOCL1 && candMvField[i][0].mv == candMvField[j][1].mv))
                {
                    candMvField[count][0].mv = candMvField[i][0].mv;
                    candMvField[count][0].refIdx = refIdxL0;
                    candMvField[count][1].mv = candMvField[j][1].mv;
                    candMvField[count][1].refIdx = refIdxL1;
                    candDir[count] = 3;

                    if (++count == maxNumMergeCand)
                        return maxNumMergeCand;
                }
            }
        }
    }
    int numRefIdx = (isInterB) ? X265_MIN(m_slice->m_numRefIdx[0], m_slice->m_numRefIdx[1]) : m_slice->m_numRefIdx[0];
    int r = 0;
    int refcnt = 0;
    // If the list still has fewer than maxNumMergeCand (at most 5) entries, pad it with zero-MV candidates
    while (count < maxNumMergeCand)
    {
        candDir[count] = 1;
        candMvField[count][0].mv.word = 0;
        candMvField[count][0].refIdx = r;

        if (isInterB)
        {
            candDir[count] = 3;
            candMvField[count][1].mv.word = 0;
            candMvField[count][1].refIdx = r;
        }

        count++;

        if (refcnt == numRefIdx - 1)
            r = 0;
        else
        {
            ++r;
            ++refcnt;
        }
    }

    return count;
}
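The packed constants in getInterMergeCandidates() are easy to misread, so here is a small sketch that decodes them. It does nothing beyond what the code above does. `decodePartEntry` reproduces the nPSW/nPSH/xP/yP arithmetic. `unpack2bit` reads the 2-bit indices of priorityList0/1 starting from the least significant bits. `zeroCandRefIdx` replays the refIdx sequence of the zero-MV padding loop. All three function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct PuRect { int x, y, w, h; };

// Decode a (size, offset) byte pair from partTable: X in the high nibble, Y in the
// low nibble, both in quarters of the CU size, as in getInterMergeCandidates().
PuRect decodePartEntry(uint32_t sizeByte, uint32_t offsetByte, int cuSize)
{
    PuRect r;
    r.w = int(((sizeByte >> 4) * cuSize) >> 2);
    r.h = int(((sizeByte & 0xF) * cuSize) >> 2);
    r.x = int(((offsetByte >> 4) * cuSize) >> 2);
    r.y = int(((offsetByte & 0xF) * cuSize) >> 2);
    return r;
}

// Unpack n 2-bit indices stored least-significant first, like priorityList0/1
// (the loop above consumes them with "& 3" and ">>= 2").
std::vector<int> unpack2bit(uint32_t packed, int n)
{
    std::vector<int> out;
    for (int i = 0; i < n; i++, packed >>= 2)
        out.push_back(int(packed & 3));
    return out;
}

// refIdx assigned to each zero-MV padding candidate: counts up to numRefIdx - 1,
// then stays at 0 (same r/refcnt logic as the while loop above).
std::vector<int> zeroCandRefIdx(int numPadding, int numRefIdx)
{
    std::vector<int> out;
    int r = 0, refcnt = 0;
    for (int k = 0; k < numPadding; k++) {
        out.push_back(r);
        if (refcnt == numRefIdx - 1)
            r = 0;
        else {
            ++r;
            ++refcnt;
        }
    }
    return out;
}
```

For a 32x32 CU in SIZE_2NxnU, PU 1 (size 0x43, offset 0x01) decodes to a 32x24 block at (0, 8). With numRefIdx = 2, four padding candidates get refIdx 0, 1, 0, 0. The priority lists have 12 entries because cutoff = count * (count - 1) is at most 4 * 3 = 12 at that point.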

2.1.2 Motion compensation (motionCompensation)

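Motion compensation uses the MVs obtained earlier to form the prediction, that is, to fetch the corresponding block from the reference frame. In the unweighted P-slice path below, x265 calls predInterLumaPixel() for luma (and predInterChromaPixel() for chroma).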

void Predict::motionCompensation(const CUData& cu, const PredictionUnit& pu, Yuv& predYuv, bool bLuma, bool bChroma)
{
    int refIdx0 = cu.m_refIdx[0][pu.puAbsPartIdx];
    int refIdx1 = cu.m_refIdx[1][pu.puAbsPartIdx];

    // P slice?
    if (cu.m_slice->isInterP())
    {
        /* P Slice */
        WeightValues wv0[3];

        X265_CHECK(refIdx0 >= 0, "invalid P refidx\n");
        X265_CHECK(refIdx0 < cu.m_slice->m_numRefIdx[0], "P refidx out of range\n");
        const WeightParam *wp0 = cu.m_slice->m_weightPredTable[0][refIdx0]; // weighted prediction parameters (not studied here)

        MV mv0 = cu.m_mv[0][pu.puAbsPartIdx];
        cu.clipMv(mv0);

        if (cu.m_slice->m_pps->bUseWeightPred && wp0->wtPresent)
        {
            for (int plane = 0; plane < (bChroma ? 3 : 1); plane++)
            {
                wv0[plane].w      = wp0[plane].inputWeight;
                wv0[plane].offset = wp0[plane].inputOffset * (1 << (X265_DEPTH - 8));
                wv0[plane].shift  = wp0[plane].log2WeightDenom;
                wv0[plane].round  = wp0[plane].log2WeightDenom >= 1 ? 1 << (wp0[plane].log2WeightDenom - 1) : 0;
            }

            ShortYuv& shortYuv = m_predShortYuv[0];

            if (bLuma)
                predInterLumaShort(pu, shortYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
            if (bChroma)
                predInterChromaShort(pu, shortYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);

            addWeightUni(pu, predYuv, shortYuv, wv0, bLuma, bChroma);
        }
        else
        {
            // Luma motion compensation
            if (bLuma)
                predInterLumaPixel(pu, predYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
            // Chroma motion compensation
            if (bChroma)
                predInterChromaPixel(pu, predYuv, *cu.m_slice->m_refReconPicList[0][refIdx0], mv0);
        }
    }
    else // B slice (not studied here)
    {
        // ...
    }
}
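The P-slice branch above fills w, offset, shift and round for explicit weighted prediction. Here is a hedged, sample-domain sketch of how such parameters are typically applied to an 8-bit pixel. `weightUniSample` is a hypothetical helper. It skips the higher-precision intermediate samples that x265's addWeightUni() actually works on, so it is an illustration and not bit-exact with x265.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified explicit weighted uni-prediction of one 8-bit sample:
//   out = clip(((p * w + round) >> shift) + offset)
// with round = 1 << (shift - 1) when shift >= 1, else 0 (same rule as wv0[].round above).
uint8_t weightUniSample(uint8_t p, int w, int offset, int shift)
{
    int round = shift >= 1 ? 1 << (shift - 1) : 0;
    int v = ((int(p) * w + round) >> shift) + offset;
    return uint8_t(std::clamp(v, 0, 255)); // clip to the 8-bit range (C++17 std::clamp)
}
```

With shift = 6, a weight of 64 is the identity (scale 1.0), 32 halves the sample and 128 doubles it before clipping.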
2.1.2.1 Fetching the prediction block (predInterLumaPixel)

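Fetch the corresponding block from the reference frame, interpolating it when the MV points to a fractional-pel position.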

void Predict::predInterLumaPixel(const PredictionUnit& pu, Yuv& dstYuv, const PicYuv& refPic, const MV& mv) const
{
    pixel* dst = dstYuv.getLumaAddr(pu.puAbsPartIdx);
    intptr_t dstStride = dstYuv.m_size;

    intptr_t srcStride = refPic.m_stride;
    intptr_t srcOffset = (mv.x >> 2) + (mv.y >> 2) * srcStride;
    int partEnum = partitionFromSizes(pu.width, pu.height);
    const pixel* src = refPic.getLumaAddr(pu.ctuAddr, pu.cuAbsPartIdx + pu.puAbsPartIdx) + srcOffset;

    int xFrac = mv.x & 3; // horizontal fractional (quarter-pel) part
    int yFrac = mv.y & 3; // vertical fractional (quarter-pel) part

    /*
    The MV is in quarter-pel units: mv >> 2 is the integer offset (already folded into src),
    and mv & 3 is the fractional part, which selects the path:
    (1) both fractions are 0: plain block copy with copy_pp()
    Otherwise the prediction is interpolated with the 8-tap luma filter ("8tap" in the names below):
    (2) only a horizontal fraction: luma_hpp(), horizontal interpolation
    (3) only a vertical fraction: luma_vpp(), vertical interpolation
    (4) both fractions non-zero: luma_hvpp(), interpolation in both directions
    */
    if (!(yFrac | xFrac))
        /* copy functions observed while debugging (non-square sizes have their own versions too, e.g. blockcopy_pp_32x16_avx)
        p.pu[LUMA_64x64].copy_pp = PFX(blockcopy_pp_64x64_avx);
        p.pu[LUMA_32x32].copy_pp = PFX(blockcopy_pp_32x32_avx);
        p.pu[LUMA_16x16].copy_pp = x265_blockcopy_pp_16x16_sse2;
        p.pu[LUMA_8x8].copy_pp   = x265_blockcopy_pp_8x8_sse2;
        */
        primitives.pu[partEnum].copy_pp(dst, dstStride, src, srcStride);
    else if (!yFrac)
        /* hpp functions observed while debugging
        p.pu[LUMA_8x8].luma_hpp   = PFX(interp_8tap_horiz_pp_8x8_avx2);
        p.pu[LUMA_16x16].luma_hpp = PFX(interp_8tap_horiz_pp_16x16_avx2);
        p.pu[LUMA_32x32].luma_hpp = PFX(interp_8tap_horiz_pp_32x32_avx2);
        */
        primitives.pu[partEnum].luma_hpp(src, srcStride, dst, dstStride, xFrac);
    else if (!xFrac)
        /* vpp functions observed while debugging
        p.pu[LUMA_8x8].luma_vpp   = PFX(interp_8tap_vert_pp_8x8_avx2);
        p.pu[LUMA_16x16].luma_vpp = PFX(interp_8tap_vert_pp_16x16_avx2);
        p.pu[LUMA_32x32].luma_vpp = PFX(interp_8tap_vert_pp_32x32_avx2);
        */
        primitives.pu[partEnum].luma_vpp(src, srcStride, dst, dstStride, yFrac);
    else
        /* hvpp functions observed while debugging
        ALL_LUMA_PU_T(luma_hvpp, interp_8tap_hv_pp_cpu);
        interp_8tap_hv_pp_cpu<size> is a function template whose template parameter is the partition size:
        size = 1 processes an 8x8 block
        size = 2 processes a 16x16 block
        size = 3 processes a 32x32 block
        */
        primitives.pu[partEnum].luma_hvpp(src, srcStride, dst, dstStride, xFrac, yFrac);
}
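predInterLumaPixel() splits each quarter-pel MV component into an integer offset (mv >> 2) and a fraction (mv & 3). It then either copies the block or runs an 8-tap filter. The sketch below shows that split and a scalar horizontal 8-tap filter for 8-bit pixels, using the HEVC luma interpolation coefficients. `splitQpel` and `interpHorizRow` are hypothetical names. They only mimic what the copy_pp / luma_hpp primitives compute, not their SIMD implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// HEVC luma interpolation taps for fractional positions 0..3 (quarter-pel); each row sums to 64.
const int kLumaFilter[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

struct MvSplit { int intPart; int frac; };

// Split one quarter-pel MV component like predInterLumaPixel(): integer part mv >> 2
// (rounds toward minus infinity with arithmetic shift) and fraction mv & 3 (0..3).
MvSplit splitQpel(int mv) { return { mv >> 2, mv & 3 }; }

// Horizontal 8-tap interpolation of one row of 8-bit pixels (pixel in, pixel out).
// src points at the integer position; it needs 3 valid pixels before and 4 after
// each output position.
void interpHorizRow(const uint8_t* src, uint8_t* dst, int width, int frac)
{
    const int* c = kLumaFilter[frac];
    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int t = 0; t < 8; t++)
            sum += c[t] * src[x - 3 + t];
        dst[x] = uint8_t(std::clamp((sum + 32) >> 6, 0, 255)); // normalize by 64 with rounding
    }
}
```

For mv.x = -5, the integer offset is -2 and the fraction is 3, since -2 * 4 + 3 = -5. This relies on two's-complement arithmetic shift, which C++20 guarantees. On a linear ramp row[i] = 4 * i with src = row + 3, the half-pel output is 4 * (x + 3.5), that is 14, 18, 22, and so on.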

2.1.3 Cost without coding the residual (encodeResAndCalcRdSkipCU)

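At this point the best merge candidate has already been chosen by its sa8d cost. This function computes the cost of coding that candidate as SKIP, i.e. without coding any residual, to check whether SKIP is worthwhile. Here the distortion is measured with SSE rather than sa8d.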

/* Note: this function overwrites the RD cost variables of interMode, but leaves the sa8d cost unharmed */
void Search::encodeResAndCalcRdSkipCU(Mode& interMode)
{
    CUData& cu = interMode.cu;
    Yuv* reconYuv = &interMode.reconYuv;
    const Yuv* fencYuv = interMode.fencYuv;
    Yuv* predYuv = &interMode.predYuv;
    X265_CHECK(!cu.isIntra(0), "intra CU not expected\n");
    uint32_t depth = cu.m_cuDepth[0];

    // No residual coding : SKIP mode
    // With no residual, the reconstruction is simply the prediction
    cu.setPredModeSubParts(MODE_SKIP);
    cu.clearCbf();
    cu.setTUDepthSubParts(0, 0, depth);

    reconYuv->copyFromYuv(interMode.predYuv);

    // SSE-based RD cost
    // Luma
    int part = partitionFromLog2Size(cu.m_log2CUSize[0]);
    // SSE between the original (source) block and the reconstructed block
    interMode.lumaDistortion = primitives.cu[part].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
    interMode.distortion = interMode.lumaDistortion;
    // Chroma
    if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
    {
        interMode.chromaDistortion = m_rdCost.scaleChromaDist(1, primitives.chroma[m_csp].cu[part].sse_pp(fencYuv->m_buf[1], fencYuv->m_csize, reconYuv->m_buf[1], reconYuv->m_csize));
        interMode.chromaDistortion += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[part].sse_pp(fencYuv->m_buf[2], fencYuv->m_csize, reconYuv->m_buf[2], reconYuv->m_csize));
        interMode.distortion += interMode.chromaDistortion;
    }
    cu.m_distortion[0] = interMode.distortion;
    m_entropyCoder.load(m_rqt[depth].cur); // restore the entropy coder (CABAC context) state saved for this depth
    m_entropyCoder.resetBits();            // reset the written-bits counter
    if (m_slice->m_pps->bTransquantBypassEnabled)
        m_entropyCoder.codeCUTransquantBypassFlag(cu.m_tqBypass[0]);

    m_entropyCoder.codeSkipFlag(cu, 0);    // code the skip flag
    int skipFlagBits = m_entropyCoder.getNumberOfWrittenBits();
    m_entropyCoder.codeMergeIndex(cu, 0);  // code the merge index
    interMode.mvBits = m_entropyCoder.getNumberOfWrittenBits() - skipFlagBits;
    interMode.coeffBits = 0;
    interMode.totalBits = interMode.mvBits + skipFlagBits;
    if (m_rdCost.m_psyRd)
        interMode.psyEnergy = m_rdCost.psyCost(part, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
    else if (m_rdCost.m_ssimRd)
        interMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cu.m_log2CUSize[0], TEXT_LUMA, 0);

    interMode.resEnergy = primitives.cu[part].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
    // Update the cost of this mode
    updateModeCost(interMode);
    // Save the entropy coder contexts for this mode
    m_entropyCoder.store(interMode.contexts);
}
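encodeResAndCalcRdSkipCU() measures SSE between the source and the reconstruction (which for SKIP equals the prediction). It charges only the skip flag and merge index bits. Below is a minimal sketch of that cost for 8-bit pixels, assuming a plain double lambda. x265's RDCost uses its own fixed-point lambda and may add psy-rd or SSIM terms. `sse` and `skipRdCost` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sum of squared errors between two 8-bit blocks (the metric the sse_pp primitives compute).
uint64_t sse(const uint8_t* a, int strideA, const uint8_t* b, int strideB, int w, int h)
{
    uint64_t s = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int d = int(a[y * strideA + x]) - int(b[y * strideB + x]);
            s += uint64_t(d * d);
        }
    return s;
}

// RD cost J = D + lambda * R for SKIP: R is only skip-flag bits plus merge-index bits,
// because no coefficients are coded (coeffBits = 0).
double skipRdCost(uint64_t distortion, uint32_t skipFlagBits, uint32_t mergeIdxBits, double lambda)
{
    return double(distortion) + lambda * double(skipFlagBits + mergeIdxBits);
}
```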

2.1.4 Cost with the residual coded (encodeResAndCalcRdInterCU)

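This function keeps the best merge candidate from before and computes its SSE-based cost with the residual actually coded, using estimateResidualQT() to estimate the residual coding cost. It also evaluates the cbf = 0 case, i.e. coding and transmitting no residual at all, to see whether that option is cheaper.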

/* encode residual and calculate rate-distortion for a CU block.
 * Note: this function overwrites the RD cost variables of interMode, but leaves the sa8d cost unharmed */
void Search::encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom)
{
    ProfileCUScope(interMode.cu, interRDOElapsedTime[cuGeom.depth], countInterRDO[cuGeom.depth]);

    CUData& cu = interMode.cu;
    Yuv* reconYuv = &interMode.reconYuv;
    Yuv* predYuv = &interMode.predYuv;
    uint32_t depth = cuGeom.depth;
    ShortYuv* resiYuv = &m_rqt[depth].tmpResiYuv;
    const Yuv* fencYuv = interMode.fencYuv;

    X265_CHECK(!cu.isIntra(0), "intra CU not expected\n");

    uint32_t log2CUSize = cuGeom.log2CUSize;
    int sizeIdx = log2CUSize - 2;

    // Residual = source block (fenc) minus prediction block (pred)
    resiYuv->subtract(*fencYuv, *predYuv, log2CUSize, m_frame->m_fencPic->m_picCsp);

    uint32_t tuDepthRange[2];
    cu.getInterTUQtDepthRange(tuDepthRange, 0);

    m_entropyCoder.load(m_rqt[depth].cur);

    if ((m_limitTU & X265_TU_LIMIT_DFS) && !(m_limitTU & X265_TU_LIMIT_NEIGH))
        m_maxTUDepth = -1;
    else if (m_limitTU & X265_TU_LIMIT_BFS)
        memset(&m_cacheTU, 0, sizeof(TUInfoCache));

    Cost costs;
    if (m_limitTU & X265_TU_LIMIT_NEIGH)
    {
        /* Save and reload maxTUDepth to avoid changing of maxTUDepth between modes */
        int32_t tempDepth = m_maxTUDepth;
        if (m_maxTUDepth != -1)
        {
            uint32_t splitFlag = interMode.cu.m_partSize[0] != SIZE_2Nx2N;
            uint32_t minSize = tuDepthRange[0];
            uint32_t maxSize = tuDepthRange[1];
            maxSize = X265_MIN(maxSize, cuGeom.log2CUSize - splitFlag);
            m_maxTUDepth = x265_clip3(cuGeom.log2CUSize - maxSize, cuGeom.log2CUSize - minSize, (uint32_t)m_maxTUDepth);
        }
        estimateResidualQT(interMode, cuGeom, 0, 0, *resiYuv, costs, tuDepthRange);
        m_maxTUDepth = tempDepth;
    }
    else // Estimate the residual quadtree coding and its RD cost
        estimateResidualQT(interMode, cuGeom, 0, 0, *resiYuv, costs, tuDepthRange);

    /* tqBypass is the CU's transquant bypass flag (cu_transquant_bypass_flag): when it is set,
       the residual is coded losslessly without transform and quantization, so the cbf = 0
       shortcut below is not evaluated. (This is unrelated to CABAC bypass bins.) */
    uint32_t tqBypass = cu.m_tqBypass[0];
    if (!tqBypass)
    {
        // Cost of the cbf = 0 case (no residual coded or transmitted), compared with costs below
        sse_t cbf0Dist = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
        if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
        {
            cbf0Dist += m_rdCost.scaleChromaDist(1, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[1], predYuv->m_csize, predYuv->m_buf[1], predYuv->m_csize));
            cbf0Dist += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[2], predYuv->m_csize, predYuv->m_buf[2], predYuv->m_csize));
        }

        /* Consider the RD cost of not signaling any residual */
        m_entropyCoder.load(m_rqt[depth].cur);
        m_entropyCoder.resetBits();
        m_entropyCoder.codeQtRootCbfZero();
        uint32_t cbf0Bits = m_entropyCoder.getNumberOfWrittenBits();

        uint32_t cbf0Energy; uint64_t cbf0Cost;
        if (m_rdCost.m_psyRd)
        {
            cbf0Energy = m_rdCost.psyCost(log2CUSize - 2, fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
            cbf0Cost = m_rdCost.calcPsyRdCost(cbf0Dist, cbf0Bits, cbf0Energy);
        }
        else if (m_rdCost.m_ssimRd)
        {
            cbf0Energy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size, log2CUSize, TEXT_LUMA, 0);
            cbf0Cost = m_rdCost.calcSsimRdCost(cbf0Dist, cbf0Bits, cbf0Energy);
        }
        else
            cbf0Cost = m_rdCost.calcRdCost(cbf0Dist, cbf0Bits);

        // If coding no residual is cheaper than the coded residual, clear the cbfs
        if (cbf0Cost < costs.rdcost)
        {
            cu.clearCbf();
            cu.setTUDepthSubParts(0, 0, depth);
        }
    }

    if (cu.getQtRootCbf(0))
        saveResidualQTData(cu, *resiYuv, 0, 0);

    /* calculate signal bits for inter/merge/skip coded CU */
    m_entropyCoder.load(m_rqt[depth].cur);

    m_entropyCoder.resetBits();
    if (m_slice->m_pps->bTransquantBypassEnabled)
        m_entropyCoder.codeCUTransquantBypassFlag(tqBypass);

    uint32_t coeffBits, bits, mvBits;
    // merge flag set && 2Nx2N && root cbf = 0
    if (cu.m_mergeFlag[0] && cu.m_partSize[0] == SIZE_2Nx2N && !cu.getQtRootCbf(0))
    {
        // Root cbf = 0 means no residual is coded, so a 2Nx2N merge CU is exactly a SKIP CU
        cu.setPredModeSubParts(MODE_SKIP);

        /* Merge/Skip */
        coeffBits = mvBits = 0;
        m_entropyCoder.codeSkipFlag(cu, 0);   // code the skip flag
        int skipFlagBits = m_entropyCoder.getNumberOfWrittenBits();
        m_entropyCoder.codeMergeIndex(cu, 0); // code the merge index
        mvBits = m_entropyCoder.getNumberOfWrittenBits() - skipFlagBits;
        bits = mvBits + skipFlagBits;
    }
    else
    {
        m_entropyCoder.codeSkipFlag(cu, 0);
        int skipFlagBits = m_entropyCoder.getNumberOfWrittenBits();
        m_entropyCoder.codePredMode(cu.m_predMode[0]);
        m_entropyCoder.codePartSize(cu, 0, cuGeom.depth);
        m_entropyCoder.codePredInfo(cu, 0);
        mvBits = m_entropyCoder.getNumberOfWrittenBits() - skipFlagBits;

        bool bCodeDQP = m_slice->m_pps->bUseDQP;
        m_entropyCoder.codeCoeff(cu, 0, bCodeDQP, tuDepthRange);
        bits = m_entropyCoder.getNumberOfWrittenBits();

        coeffBits = bits - mvBits - skipFlagBits;
    }

    m_entropyCoder.store(interMode.contexts);

    if (cu.getQtRootCbf(0))
        reconYuv->addClip(*predYuv, *resiYuv, log2CUSize, m_frame->m_fencPic->m_picCsp);
    else
        reconYuv->copyFromYuv(*predYuv);

    // update with clipped distortion and cost (qp estimation loop uses unclipped values)
    // Recompute SSE between the source and the final reconstruction
    sse_t bestLumaDist = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
    interMode.distortion = bestLumaDist;
    // Chroma
    if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
    {
        sse_t bestChromaDist = m_rdCost.scaleChromaDist(1, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[1], fencYuv->m_csize, reconYuv->m_buf[1], reconYuv->m_csize));
        bestChromaDist += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[sizeIdx].sse_pp(fencYuv->m_buf[2], fencYuv->m_csize, reconYuv->m_buf[2], reconYuv->m_csize));
        interMode.chromaDistortion = bestChromaDist;
        interMode.distortion += bestChromaDist;
    }
    if (m_rdCost.m_psyRd) // psy-rd (psycho-visual) energy
        interMode.psyEnergy = m_rdCost.psyCost(sizeIdx, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
    else if (m_rdCost.m_ssimRd)
        interMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cu.m_log2CUSize[0], TEXT_LUMA, 0);

    interMode.resEnergy = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
    interMode.totalBits = bits;
    interMode.lumaDistortion = bestLumaDist;
    interMode.coeffBits = coeffBits;
    interMode.mvBits = mvBits;
    cu.m_distortion[0] = interMode.distortion;
    // Update the cost
    updateModeCost(interMode);
    checkDQP(interMode, cuGeom);
}
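The decisions in encodeResAndCalcRdInterCU() can be summarized as a small pure function. This is a hypothetical simplification: the enum and `decideCoding` are not x265 names. It takes the two RD costs computed above plus the CU flags. It drops the residual when the cbf = 0 cost is lower, except for transquant-bypass CUs. It reports SKIP for a 2Nx2N merge CU that ends up with no residual.

```cpp
#include <cassert>

enum class InterCoding { Skip, MergeWithResidual, MergeNoResidual, InterWithResidual, InterNoResidual };

// residualCost: RD cost with the estimated residual (costs.rdcost above)
// cbf0Cost:     RD cost of signalling no residual at all
// hasResidual:  whether the residual estimate left any non-zero cbf
InterCoding decideCoding(double residualCost, double cbf0Cost, bool tqBypass,
                         bool hasResidual, bool mergeFlag, bool is2Nx2N)
{
    bool rootCbf = hasResidual;
    // The cbf = 0 shortcut is only considered when transquant bypass is off
    if (!tqBypass && cbf0Cost < residualCost)
        rootCbf = false;
    // A 2Nx2N merge CU with no residual is signalled as SKIP
    if (mergeFlag && is2Nx2N && !rootCbf)
        return InterCoding::Skip;
    if (mergeFlag)
        return rootCbf ? InterCoding::MergeWithResidual : InterCoding::MergeNoResidual;
    return rootCbf ? InterCoding::InterWithResidual : InterCoding::InterNoResidual;
}
```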

2.2 Regular inter prediction (checkInter_rd0_4)

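This function performs regular (non-merge) inter prediction for a given partition size. It mainly calls predInterSearch() for the inter search and then scores the resulting mode with an sa8d-based cost.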

void Analysis::checkInter_rd0_4(Mode& interMode, const CUGeom& cuGeom, PartSize partSize, uint32_t refMask[2])
{
    interMode.initCosts();
    interMode.cu.setPartSizeSubParts(partSize);
    interMode.cu.setPredModeSubParts(MODE_INTER);
    int numPredDir = m_slice->isInterP() ? 1 : 2;

    // Reuse reference indices from loaded analysis data
    if (m_param->analysisLoadReuseLevel > 1 && m_param->analysisLoadReuseLevel != 10 && m_reuseInterDataCTU)
    {
        int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
        int index = 0;

        uint32_t numPU = interMode.cu.getNumPartInter(0);
        for (uint32_t part = 0; part < numPU; part++)
        {
            MotionData* bestME = interMode.bestME[part];
            for (int32_t i = 0; i < numPredDir; i++)
                bestME[i].ref = m_reuseRef[refOffset + index++];
        }
    }

    // Multi-pass analysis refinement: reuse ref, MV and MVP index from the previous pass
    if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_reuseInterDataCTU)
    {
        uint32_t numPU = interMode.cu.getNumPartInter(0);
        for (uint32_t part = 0; part < numPU; part++)
        {
            MotionData* bestME = interMode.bestME[part];
            for (int32_t i = 0; i < numPredDir; i++)
            {
                int* ref = &m_reuseRef[i * m_frame->m_analysisData.numPartitions * m_frame->m_analysisData.numCUsInFrame];
                bestME[i].ref = ref[cuGeom.absPartIdx];
                bestME[i].mv = m_reuseMv[i][cuGeom.absPartIdx].word;
                bestME[i].mvpIdx = m_reuseMvpIdx[i][cuGeom.absPartIdx];
            }
        }
    }
    // Inter search
    predInterSearch(interMode, cuGeom, m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400), refMask);

    /* predInterSearch sets interMode.sa8dBits */
    // The mode is scored with an sa8d-based cost
    const Yuv& fencYuv = *interMode.fencYuv;
    Yuv& predYuv = interMode.predYuv;
    int part = partitionFromLog2Size(cuGeom.log2CUSize);
    // sa8d distortion
    interMode.distortion = primitives.cu[part].sa8d(fencYuv.m_buf[0], fencYuv.m_size, predYuv.m_buf[0], predYuv.m_size);
    if (m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400))
    {
        interMode.distortion += primitives.chroma[m_csp].cu[part].sa8d(fencYuv.m_buf[1], fencYuv.m_csize, predYuv.m_buf[1], predYuv.m_csize);
        interMode.distortion += primitives.chroma[m_csp].cu[part].sa8d(fencYuv.m_buf[2], fencYuv.m_csize, predYuv.m_buf[2], predYuv.m_csize);
    }
    interMode.sa8dCost = m_rdCost.calcRdSADCost((uint32_t)interMode.distortion, interMode.sa8dBits);

    if (m_param->analysisSaveReuseLevel > 1 && m_reuseInterDataCTU)
    {
        int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
        int index = 0;

        uint32_t numPU = interMode.cu.getNumPartInter(0);
        for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
        {
            MotionData* bestME = interMode.bestME[puIdx];
            for (int32_t i = 0; i < numPredDir; i++)
                m_reuseRef[refOffset + index++] = bestME[i].ref;
        }
    }
}
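checkInter_rd0_4() indexes m_reuseRef with refOffset = geomRecurId * 16 * numPredDir + partSize * numPredDir * 2. It then advances `index` once per (PU, list) pair. One way to read the constants, which is an interpretation and not taken from x265 documentation, is 8 partition modes x 2 PUs x numPredDir lists per CU geometry. The sketch below computes the same index with a hypothetical `reuseRefIndex`. It can be used to check that, under that reading, no two (geometry, partition, PU, list) tuples collide.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper reproducing the m_reuseRef indexing above: refOffset as in the
// source, plus the running `index`, which the nested loops (for PU { for list })
// advance once per (PU, list) pair, i.e. index = puIdx * numPredDir + list.
int reuseRefIndex(int geomRecurId, int partSize, int puIdx, int list, int numPredDir)
{
    int refOffset = geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
    int index = puIdx * numPredDir + list;
    return refOffset + index;
}
```

With numPredDir = 2 (B slice), each CU geometry owns 32 consecutive entries, and partition mode p starts at 4 * p within them.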

2.2.1 Inter prediction search (predInterSearch)

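This function finds the best inter prediction for each PU of the current partition mode. Before motion search, the candidate MVs to start from must be determined:
(1) If the CU is split into more than one PU (partSize is not 2Nx2N), evaluate merge for each PU (mergeEstimation)
(2) Build the AMVP list and pick the better predictor from it (getPMV, selectMVP)
(3) Run motion estimation (motionEstimate)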
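The code that follows ends each PU with two small decisions that are easy to state in isolation. First, after ME, checkBestMVP() may switch to the other AMVP predictor if it needs fewer MVD bits. The sketch approximates MVD bits with signed Exp-Golomb code lengths, which is an assumption, since x265 uses its own MV bit-cost table. Second, the final choice among merge, bi-prediction, L0 and L1 uses the strict and non-strict comparisons in that code. `segBits`, `mvdBits`, `refineMvp` and `choosePu` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Mv { int x, y; };

// Signed Exp-Golomb code length of v (assumed stand-in for x265's MV bit-cost table):
// v > 0 maps to 2v - 1, v <= 0 maps to -2v, and ue(k) takes 2 * floor(log2(k + 1)) + 1 bits.
int segBits(int v)
{
    uint32_t code = v > 0 ? uint32_t(2 * v - 1) : uint32_t(-2 * v);
    int len = 0;
    for (uint32_t t = code + 1; t > 1; t >>= 1)
        len++;
    return 2 * len + 1;
}

// Approximate bits to code the MVD between mv and the predictor pred.
int mvdBits(Mv mv, Mv pred) { return segBits(mv.x - pred.x) + segBits(mv.y - pred.y); }

// After ME, switch to the other AMVP predictor only if it needs strictly fewer MVD bits.
int refineMvp(const Mv amvp[2], Mv mv, int mvpIdx)
{
    int cur = mvdBits(mv, amvp[mvpIdx]);
    int other = mvdBits(mv, amvp[1 - mvpIdx]);
    return other < cur ? 1 - mvpIdx : mvpIdx;
}

enum class PuChoice { Merge, Bi, L0, L1 };

// Same comparisons as the end of predInterSearch(): merge must beat all three strictly,
// bi-prediction must beat both uni-directional costs strictly, and L0 wins ties with L1.
PuChoice choosePu(uint32_t mrgCost, uint32_t bidirCost, uint32_t costL0, uint32_t costL1)
{
    if (mrgCost < bidirCost && mrgCost < costL0 && mrgCost < costL1)
        return PuChoice::Merge;
    if (bidirCost < costL0 && bidirCost < costL1)
        return PuChoice::Bi;
    return costL0 <= costL1 ? PuChoice::L0 : PuChoice::L1;
}
```

Note the tie behaviour: when merge and bi-prediction cost the same, bi-prediction is chosen (provided it beats both uni-directional options), because merge has to be strictly cheaper.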

/* find the best inter prediction for each PU of specified mode */
void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
{
    ProfileCUScope(interMode.cu, motionEstimationElapsedTime, countMotionEstimate);

    CUData& cu = interMode.cu;
    Yuv* predYuv = &interMode.predYuv;

    // 12 mv candidates including lowresMV
    MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 2]; // motion vector candidates

    const Slice *slice = m_slice;
    int numPart     = cu.getNumPartInter(0);
    int numPredDir  = slice->isInterP() ? 1 : 2;
    const int* numRefIdx = slice->m_numRefIdx;
    uint32_t lastMode = 0;
    int      totalmebits = 0;
    MV       mvzero(0, 0);
    Yuv&     tmpPredYuv = m_rqt[cuGeom.depth].tmpPredYuv;
    MergeData merge;
    memset(&merge, 0, sizeof(merge));
    bool useAsMVP = false;
    // Inter prediction for each PU of the partition
    for (int puIdx = 0; puIdx < numPart; puIdx++)
    {
        MotionData* bestME = interMode.bestME[puIdx];
        PredictionUnit pu(cu, cuGeom, puIdx);
        // Set up the motion estimator for this PU (source block, search method, subpel refinement level)
        m_me.setSourcePU(*interMode.fencYuv, pu.ctuAddr, pu.cuAbsPartIdx, pu.puAbsPartIdx, pu.width, pu.height, m_param->searchMethod, m_param->subpelRefine, bChromaMC);
        useAsMVP = false;
        x265_analysis_inter_data* interDataCTU = NULL;
        int cuIdx;
        cuIdx = (interMode.cu.m_cuAddr * m_param->num4x4Partitions) + cuGeom.absPartIdx;
        if (m_param->analysisLoadReuseLevel == 10 && m_param->interRefine > 1)
        {
            interDataCTU = m_frame->m_analysisData.interData;
            if ((cu.m_predMode[pu.puAbsPartIdx] == interDataCTU->modes[cuIdx + pu.puAbsPartIdx])
                && (cu.m_partSize[pu.puAbsPartIdx] == interDataCTU->partSize[cuIdx + pu.puAbsPartIdx])
                && !(interDataCTU->mergeFlag[cuIdx + puIdx])
                && (cu.m_cuDepth[0] == interDataCTU->depth[cuIdx]))
                useAsMVP = true;
        }
        /* find best cost merge candidate. note: 2Nx2N merge and bidir are handled as separate modes */
        // 1. 2Nx2N merge is evaluated separately in checkMerge2Nx2N_rd0_4; here merge is evaluated only when the CU has more than one PU
        uint32_t mrgCost = numPart == 1 ? MAX_UINT : mergeEstimation(cu, cuGeom, pu, puIdx, merge);

        bestME[0].cost = MAX_UINT;
        bestME[1].cost = MAX_UINT;

        // Fixed signalling bits for this PU, based on the partition mode, PU index and lastMode
        getBlkBits((PartSize)cu.m_partSize[0], slice->isInterP(), puIdx, lastMode, m_listSelBits);
        bool bDoUnidir = true;

        // Collect the neighbouring MVs, used below to build the AMVP list
        cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);

        /* Uni-directional prediction */
        if ((m_param->analysisLoadReuseLevel > 1 && m_param->analysisLoadReuseLevel != 10)
            || (m_param->analysisMultiPassRefine && m_param->rc.bStatRead) || (m_param->bAnalysisType == AVC_INFO) || (useAsMVP))
        {
            // Analysis-reuse path (not studied here)
            // ...
        }
        else if (m_param->bDistributeMotionEstimation) // distributed motion estimation, related to multithreading (not studied here)
        {
            // ...
        }
        if (bDoUnidir) // regular unidirectional search over each list and reference
        {
            interMode.bestME[puIdx][0].ref = interMode.bestME[puIdx][1].ref = -1;
            uint32_t refMask = refMasks[puIdx] ? refMasks[puIdx] : (uint32_t)-1;

            for (int list = 0; list < numPredDir; list++)
            {
                for (int ref = 0; ref < numRefIdx[list]; ref++)
                {
                    ProfileCounter(interMode.cu, totalMotionReferences[cuGeom.depth]);

                    if (!(refMask & (1 << ref)))
                    {
                        ProfileCounter(interMode.cu, skippedMotionReferences[cuGeom.depth]);
                        continue;
                    }

                    uint32_t bits = m_listSelBits[list] + MVP_IDX_BITS;
                    bits += getTUBits(ref, numRefIdx[list]);

                    // 2. Build the AMVP list (2 entries) from interNeighbours
                    int numMvc = cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);

                    const MV* amvp = interMode.amvpCand[list][ref];
                    // Pick the better of the 2 AMVP predictors
                    int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
                    MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx], mvp_lowres;
                    bool bLowresMVP = false;

                    if (!m_param->analysisSave && !m_param->analysisLoad) /* Prevents load/save outputs from diverging when lowresMV is not available */
                    {
                        // MV found by the lookahead on the low-resolution frame
                        MV lmv = getLowresMV(cu, pu, list, ref);
                        if (lmv.notZero())
                            mvc[numMvc++] = lmv;
                        if (m_param->bEnableHME)
                            mvp_lowres = lmv;
                    }
                    if (m_param->searchMethod == X265_SEA)
                    {
                        int puX = puIdx & 1;
                        int puY = puIdx >> 1;
                        for (int planes = 0; planes < INTEGRAL_PLANE_NUM; planes++)
                            m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
                    }
                    // Set the search window (searchRange, 57 by default)
                    setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
                    // 3. Motion estimation
                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
                        m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);

                    // HME (hierarchical motion estimation) is disabled by default
                    if (m_param->bEnableHME && mvp_lowres.notZero() && mvp_lowres != mvp)
                    {
                        MV outmv_lowres;
                        setSearchRange(cu, mvp_lowres, m_param->searchRange, mvmin, mvmax);
                        int lowresMvCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp_lowres, numMvc, mvc, m_param->searchRange, outmv_lowres, m_param->maxSlices,
                            m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
                        if (lowresMvCost < satdCost)
                        {
                            outmv = outmv_lowres;
                            satdCost = lowresMvCost;
                            bLowresMVP = true;
                        }
                    }
                    /* Get total cost of partition, but only include MV bit cost once */
                    // Total cost: SATD plus the cost of all signalling bits (list, ref, MVP index, MVD)
                    bits += m_me.bitcost(outmv);
                    uint32_t mvCost = m_me.mvcost(outmv);
                    uint32_t cost = (satdCost - mvCost) + m_rdCost.getCost(bits);

                    /* Update LowresMVP to best AMVP cand*/
                    if (bLowresMVP)
                        updateMVP(amvp[mvpIdx], outmv, bits, cost, mvp_lowres);

                    /* Refine MVP selection, updates: mvpIdx, bits, cost */
                    mvp = checkBestMVP(amvp, outmv, mvpIdx, bits, cost);

                    // Keep the best result of this list
                    if (cost < bestME[list].cost)
                    {
                        bestME[list].mv      = outmv;
                        bestME[list].mvp     = mvp;
                        bestME[list].mvpIdx  = mvpIdx;
                        bestME[list].ref     = ref;
                        bestME[list].cost    = cost;
                        bestME[list].bits    = bits;
                        bestME[list].mvCost  = mvCost;
                    }
                }
                /* the second list ref bits start at bit 16 */
                refMask >>= 16;
            }
        }

        /* Bi-directional prediction */
        MotionData bidir[2];
        uint32_t bidirCost = MAX_UINT;
        int bidirBits = 0;

        if (slice->isInterB() && !cu.isBipredRestriction() &&  /* biprediction is possible for this PU */
            cu.m_partSize[pu.puAbsPartIdx] != SIZE_2Nx2N &&    /* 2Nx2N biprediction is handled elsewhere */
            bestME[0].cost != MAX_UINT && bestME[1].cost != MAX_UINT)
        {
            // B-slice bi-directional prediction (not studied here)
            // ...
        }

        /* select best option and store into CU */
        // Choose among merge, bi-prediction, L0 and L1
        if (mrgCost < bidirCost && mrgCost < bestME[0].cost && mrgCost < bestME[1].cost)
        {
            cu.m_mergeFlag[pu.puAbsPartIdx] = true;
            cu.m_mvpIdx[0][pu.puAbsPartIdx] = merge.index; /* merge candidate ID is stored in L0 MVP idx */
            cu.setPUInterDir(merge.dir, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(0, merge.mvField[0].mv, pu.puAbsPartIdx, puIdx);
            cu.setPURefIdx(0, merge.mvField[0].refIdx, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(1, merge.mvField[1].mv, pu.puAbsPartIdx, puIdx);
            cu.setPURefIdx(1, merge.mvField[1].refIdx, pu.puAbsPartIdx, puIdx);

            totalmebits += merge.bits;
        }
        else if (bidirCost < bestME[0].cost && bidirCost < bestME[1].cost)
        {
            lastMode = 2;

            cu.m_mergeFlag[pu.puAbsPartIdx] = false;
            cu.setPUInterDir(3, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(0, bidir[0].mv, pu.puAbsPartIdx, puIdx);
            cu.setPURefIdx(0, bestME[0].ref, pu.puAbsPartIdx, puIdx);
            cu.m_mvd[0][pu.puAbsPartIdx] = bidir[0].mv - bidir[0].mvp;
            cu.m_mvpIdx[0][pu.puAbsPartIdx] = bidir[0].mvpIdx;

            cu.setPUMv(1, bidir[1].mv, pu.puAbsPartIdx, puIdx);
            cu.setPURefIdx(1, bestME[1].ref, pu.puAbsPartIdx, puIdx);
            cu.m_mvd[1][pu.puAbsPartIdx] = bidir[1].mv - bidir[1].mvp;
            cu.m_mvpIdx[1][pu.puAbsPartIdx] = bidir[1].mvpIdx;

            totalmebits += bidirBits;
        }
        else if (bestME[0].cost <= bestME[1].cost)
        {
            lastMode = 0;

            cu.m_mergeFlag[pu.puAbsPartIdx] = false;
            cu.setPUInterDir(1, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(0, bestME[0].mv, pu.puAbsPartIdx, puIdx);
            cu.setPURefIdx(0, bestME[0].ref, pu.puAbsPartIdx, puIdx);
            cu.m_mvd[0][pu.puAbsPartIdx] = bestME[0].mv - bestME[0].mvp;
            cu.m_mvpIdx[0][pu.puAbsPartIdx] = bestME[0].mvpIdx;

            cu.setPURefIdx(1, REF_NOT_VALID, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(1, mvzero, pu.puAbsPartIdx, puIdx);

            totalmebits += bestME[0].bits;
        }
        else
        {
            // L1 uni-prediction wins: store its motion data
            lastMode = 1;

            cu.m_mergeFlag[pu.puAbsPartIdx] = false;
            cu.setPUInterDir(2, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(1, bestME[1].mv, pu.puAbsPartIdx, puIdx);
            cu.setPURefIdx(1, bestME[1].ref, pu.puAbsPartIdx, puIdx);
            cu.m_mvd[1][pu.puAbsPartIdx] = bestME[1].mv - bestME[1].mvp;
            cu.m_mvpIdx[1][pu.puAbsPartIdx] = bestME[1].mvpIdx;

            cu.setPURefIdx(0, REF_NOT_VALID, pu.puAbsPartIdx, puIdx);
            cu.setPUMv(0, mvzero, pu.puAbsPartIdx, puIdx);

            totalmebits += bestME[1].bits;
        }

        // Motion-compensate with the chosen motion data to build this PU's prediction in predYuv
        motionCompensation(cu, pu, *predYuv, true, bChromaMC);
    }

    interMode.sa8dBits += totalmebits;
}
2.2.1.1 Evaluating merge mode for sub-PUs (mergeEstimation)

There is not much to comment on here. The main difference is that the cost of each candidate is computed with SATD.

```cpp
/* estimation of best merge coding of an inter PU (2Nx2N merge PUs are evaluated as their own mode) */
uint32_t Search::mergeEstimation(CUData& cu, const CUGeom& cuGeom, const PredictionUnit& pu, int puIdx, MergeData& m)
{
    // 2Nx2N blocks never go through this function
    X265_CHECK(cu.m_partSize[0] != SIZE_2Nx2N, "mergeEstimation() called for 2Nx2N\n");

    MVField  candMvField[MRG_MAX_NUM_CANDS][2];
    uint8_t  candDir[MRG_MAX_NUM_CANDS];
    uint32_t numMergeCand = cu.getInterMergeCandidates(pu.puAbsPartIdx, puIdx, candMvField, candDir);

    if (cu.isBipredRestriction())
    {
        /* do not allow bidir merge candidates if PU is smaller than 8x8, drop L1 reference */
        for (uint32_t mergeCand = 0; mergeCand < numMergeCand; ++mergeCand)
        {
            if (candDir[mergeCand] == 3)
            {
                candDir[mergeCand] = 1;
                candMvField[mergeCand][1].refIdx = REF_NOT_VALID;
            }
        }
    }

    Yuv& tempYuv = m_rqt[cuGeom.depth].tmpPredYuv;

    uint32_t outCost = MAX_UINT;
    // Walk the merge candidate list and keep the cheapest candidate
    for (uint32_t mergeCand = 0; mergeCand < numMergeCand; ++mergeCand)
    {
        /* Prevent TMVP candidates from using unavailable reference pixels */
        if (m_bFrameParallel) // frame-parallel encoding enabled
        {
            // Parallel slices bound check
            if (m_param->maxSlices > 1)
            {
                if (cu.m_bFirstRowInSlice &
                    ((candMvField[mergeCand][0].mv.y < (2 * 4)) | (candMvField[mergeCand][1].mv.y < (2 * 4))))
                    continue;

                // Last row in slice can't reference beyond bound since it is another slice area
                // TODO: we may beyond bound in future since these area have a chance to finish because we use parallel slices. Necessary prepare research on load balance
                if (cu.m_bLastRowInSlice &&
                    ((candMvField[mergeCand][0].mv.y > -3 * 4) | (candMvField[mergeCand][1].mv.y > -3 * 4)))
                    continue;
            }

            if (candMvField[mergeCand][0].mv.y >= (m_param->searchRange + 1) * 4 ||
                candMvField[mergeCand][1].mv.y >= (m_param->searchRange + 1) * 4)
                continue;
        }

        cu.m_mv[0][pu.puAbsPartIdx] = candMvField[mergeCand][0].mv;
        cu.m_refIdx[0][pu.puAbsPartIdx] = (int8_t)candMvField[mergeCand][0].refIdx;
        cu.m_mv[1][pu.puAbsPartIdx] = candMvField[mergeCand][1].mv;
        cu.m_refIdx[1][pu.puAbsPartIdx] = (int8_t)candMvField[mergeCand][1].refIdx;

        // Motion compensation to obtain the prediction block
        motionCompensation(cu, pu, tempYuv, true, m_me.bChromaSATD);

        // The distortion is measured with SATD
        uint32_t costCand = m_me.bufSATD(tempYuv.getLumaAddr(pu.puAbsPartIdx), tempYuv.m_size);
        if (m_me.bChromaSATD)
            costCand += m_me.bufChromaSATD(tempYuv, pu.puAbsPartIdx);

        uint32_t bitsCand = getTUBits(mergeCand, numMergeCand);
        costCand = costCand + m_rdCost.getCost(bitsCand);
        if (costCand < outCost)
        {
            outCost = costCand;
            m.bits = bitsCand;
            m.index = mergeCand;
        }
    }

    m.mvField[0] = candMvField[m.index][0];
    m.mvField[1] = candMvField[m.index][1];
    m.dir = candDir[m.index];

    return outCost;
}
```
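To make the per-candidate cost concrete, here is a minimal, self-contained C++ sketch of the same selection rule: the SATD of each candidate's prediction plus lambda times the truncated-unary length of the merge index, with the cheapest candidate kept. It assumes 4x4 blocks stored as plain int arrays and a single integer lambda; the helper names (truncatedUnaryBits, satd4x4, pickMergeCandidate) are hypothetical and not x265's API, and halving the Hadamard sum is a common normalisation that may not match x265's own primitives exactly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Truncated unary: index k out of n costs k + 1 bits, except the last index (k == n - 1),
// which needs no terminating zero and costs k bits.
uint32_t truncatedUnaryBits(int idx, int numIdx) {
    assert(idx >= 0 && idx < numIdx);
    return static_cast<uint32_t>(idx + (idx < numIdx - 1 ? 1 : 0));
}

// 4x4 SATD: sum of the absolute 2-D Hadamard coefficients of the residual, halved.
int satd4x4(const int* org, const int* pred) {
    int d[16], t[16];
    for (int i = 0; i < 16; i++)
        d[i] = org[i] - pred[i];
    for (int r = 0; r < 4; r++) {  // 4-point Hadamard on each row
        const int* s = d + 4 * r;
        int a0 = s[0] + s[1], a1 = s[0] - s[1], a2 = s[2] + s[3], a3 = s[2] - s[3];
        t[4 * r + 0] = a0 + a2;
        t[4 * r + 1] = a1 + a3;
        t[4 * r + 2] = a0 - a2;
        t[4 * r + 3] = a1 - a3;
    }
    int sum = 0;
    for (int c = 0; c < 4; c++) {  // 4-point Hadamard on each column, accumulating |coeff|
        int b0 = t[c] + t[4 + c], b1 = t[c] - t[4 + c];
        int b2 = t[8 + c] + t[12 + c], b3 = t[8 + c] - t[12 + c];
        sum += std::abs(b0 + b2) + std::abs(b1 + b3) + std::abs(b0 - b2) + std::abs(b1 - b3);
    }
    return sum / 2;
}

struct MergeChoice {
    int index;
    uint32_t cost;
};

// preds[k] holds the 16-sample prediction produced by merge candidate k.
MergeChoice pickMergeCandidate(const int* org, const std::vector<std::vector<int>>& preds, uint32_t lambda) {
    MergeChoice best{-1, UINT32_MAX};
    const int n = static_cast<int>(preds.size());
    for (int k = 0; k < n; k++) {
        uint32_t cost = static_cast<uint32_t>(satd4x4(org, preds[k].data())) + lambda * truncatedUnaryBits(k, n);
        if (cost < best.cost)  // strict '<': on ties the lower index wins, as in mergeEstimation
            best = {k, cost};
    }
    return best;
}
```

Because the index cost grows with k, a later candidate has to beat an earlier one by more than lambda times its extra index bits.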
2.2.1.2 AMVP implementation

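Similar to Merge mode, AMVP derives its MV candidates from the available spatial neighbouring blocks and the temporal (collocated) block. The steps are roughly:
(1) fetch the available neighbouring MVs (getNeighbourMV)
(2) build the AMVP list (getPMV)
(3) pick the best candidate from the AMVP list (selectMVP)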

```cpp
/* Constructs a list of candidates for AMVP, and a larger list of motion candidates */
void CUData::getNeighbourMV(uint32_t puIdx, uint32_t absPartIdx, InterNeighbourMV* neighbours) const
{
    // Set the temporal neighbour to unavailable by default.
    neighbours[MD_COLLOCATED].unifiedRef = -1;

    uint32_t partIdxLT, partIdxRT, partIdxLB = deriveLeftBottomIdx(puIdx);
    deriveLeftRightTopIdx(puIdx, partIdxLT, partIdxRT);

    // Load the spatial MVs (of the available neighbouring blocks).
    getInterNeighbourMV(neighbours + MD_BELOW_LEFT, partIdxLB, MD_BELOW_LEFT);
    getInterNeighbourMV(neighbours + MD_LEFT,       partIdxLB, MD_LEFT);
    getInterNeighbourMV(neighbours + MD_ABOVE_RIGHT,partIdxRT, MD_ABOVE_RIGHT);
    getInterNeighbourMV(neighbours + MD_ABOVE,      partIdxRT, MD_ABOVE);
    getInterNeighbourMV(neighbours + MD_ABOVE_LEFT, partIdxLT, MD_ABOVE_LEFT);

    // Look for an available temporal (collocated) MV
    if (m_slice->m_sps->bTemporalMVPEnabled)
    {
        uint32_t absPartAddr = m_absIdxInCTU + absPartIdx;
        uint32_t partIdxRB = deriveRightBottomIdx(puIdx);

        // co-located RightBottom temporal predictor (H)
        int ctuIdx = -1;

        // image boundary check
        if (m_encData->getPicCTU(m_cuAddr)->m_cuPelX + g_zscanToPelX[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picWidthInLumaSamples &&
            m_encData->getPicCTU(m_cuAddr)->m_cuPelY + g_zscanToPelY[partIdxRB] + UNIT_SIZE < m_slice->m_sps->picHeightInLumaSamples)
        {
            uint32_t absPartIdxRB = g_zscanToRaster[partIdxRB];
            uint32_t numUnits = s_numPartInCUSize;
            bool bNotLastCol = lessThanCol(absPartIdxRB, numUnits - 1); // is not at the last column of CTU
            bool bNotLastRow = lessThanRow(absPartIdxRB, numUnits - 1); // is not at the last row    of CTU

            if (bNotLastCol && bNotLastRow)
            {
                absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE + 1];
                ctuIdx = m_cuAddr;
            }
            else if (bNotLastCol)
                absPartAddr = g_rasterToZscan[(absPartIdxRB + 1) & (numUnits - 1)];
            else if (bNotLastRow)
            {
                absPartAddr = g_rasterToZscan[absPartIdxRB + RASTER_SIZE - numUnits + 1];
                ctuIdx = m_cuAddr + 1;
            }
            else // is the right bottom corner of CTU
                absPartAddr = 0;
        }

        if (!(ctuIdx >= 0 && getCollocatedMV(ctuIdx, absPartAddr, neighbours + MD_COLLOCATED)))
        {
            uint32_t partIdxCenter =  deriveCenterIdx(puIdx);
            uint32_t curCTUIdx = m_cuAddr;
            // Fall back to the collocated block at the centre of the PU
            getCollocatedMV(curCTUIdx, partIdxCenter, neighbours + MD_COLLOCATED);
        }
    }
}
```
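The collocated MV gathered here (like the "indirect" spatial MVs used in getPMV) usually spans a different POC distance than the current PU's reference, so getPMV rescales it with scaleMvByPOCDist. That function's body is not shown in this article; the sketch below follows the fixed-point scaling formula of the HEVC specification instead, uses hypothetical names, and should be read as an illustration rather than x265's exact code. It assumes `td != 0` and that `>>` on negative ints is an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct MV {
    int x, y;
};

static int clip3(int lo, int hi, int v) { return std::min(hi, std::max(lo, v)); }

// Scale one component: sign(f * mv) * ((|f * mv| + 127) >> 8), clipped to 16 bits.
static int scaleComponent(int mv, int factor) {
    int prod = factor * mv;
    int mag = (std::abs(prod) + 127) >> 8;
    return clip3(-32768, 32767, prod < 0 ? -mag : mag);
}

// tb: POC distance from the current picture to its reference; td: distance from the
// collocated picture to the reference of the collocated block. The MV is multiplied
// by roughly tb / td in 8-bit fixed point.
MV scaleMvByPocDistance(MV mv, int curPoc, int curRefPoc, int colPoc, int colRefPoc) {
    int td = clip3(-128, 127, colPoc - colRefPoc);
    int tb = clip3(-128, 127, curPoc - curRefPoc);
    if (td == tb)
        return mv;  // same distance: no scaling needed
    assert(td != 0);
    int tx = (16384 + std::abs(td) / 2) / td;               // ~ 2^14 / td
    int factor = clip3(-4096, 4095, (tb * tx + 32) >> 6);  // ~ 256 * tb / td
    return {scaleComponent(mv.x, factor), scaleComponent(mv.y, factor)};
}
```

For example, a collocated MV (8, -4) spanning 4 pictures becomes (4, -2) for a reference 2 pictures away, and flips sign when the current reference lies on the other side in display order.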

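The AMVP list is built as follows. In short: spatial candidates are added first, then the temporal candidate, and finally, if the list still holds fewer than 2 entries, it is padded with zero MVs.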

```cpp
// Create the PMV list. Called for each reference index.
int CUData::getPMV(InterNeighbourMV *neighbours, uint32_t picList, uint32_t refIdx, MV* amvpCand, MV* pmv) const
{
    // Direct MVP: the neighbour's MV already points to the same reference picture and is taken as-is
    MV directMV[MD_ABOVE_LEFT + 1];
    // Indirect MVP: the neighbour's MV points to a different reference picture than the current PU, so it must be scaled
    MV indirectMV[MD_ABOVE_LEFT + 1];
    bool validDirect[MD_ABOVE_LEFT + 1];
    bool validIndirect[MD_ABOVE_LEFT + 1];

    // Left candidate.
    validDirect[MD_BELOW_LEFT]  = getDirectPMV(directMV[MD_BELOW_LEFT], neighbours + MD_BELOW_LEFT, picList, refIdx);
    validDirect[MD_LEFT]        = getDirectPMV(directMV[MD_LEFT], neighbours + MD_LEFT, picList, refIdx);
    // Top candidate.
    validDirect[MD_ABOVE_RIGHT] = getDirectPMV(directMV[MD_ABOVE_RIGHT], neighbours + MD_ABOVE_RIGHT, picList, refIdx);
    validDirect[MD_ABOVE]       = getDirectPMV(directMV[MD_ABOVE], neighbours + MD_ABOVE, picList, refIdx);
    validDirect[MD_ABOVE_LEFT]  = getDirectPMV(directMV[MD_ABOVE_LEFT], neighbours + MD_ABOVE_LEFT, picList, refIdx);

    // Left candidate.
    validIndirect[MD_BELOW_LEFT]  = getIndirectPMV(indirectMV[MD_BELOW_LEFT], neighbours + MD_BELOW_LEFT, picList, refIdx);
    validIndirect[MD_LEFT]        = getIndirectPMV(indirectMV[MD_LEFT], neighbours + MD_LEFT, picList, refIdx);
    // Top candidate.
    validIndirect[MD_ABOVE_RIGHT] = getIndirectPMV(indirectMV[MD_ABOVE_RIGHT], neighbours + MD_ABOVE_RIGHT, picList, refIdx);
    validIndirect[MD_ABOVE]       = getIndirectPMV(indirectMV[MD_ABOVE], neighbours + MD_ABOVE, picList, refIdx);
    validIndirect[MD_ABOVE_LEFT]  = getIndirectPMV(indirectMV[MD_ABOVE_LEFT], neighbours + MD_ABOVE_LEFT, picList, refIdx);

    /*
    1. Add the MVs of the available spatial neighbours, read in the order A0 -> A1 -> B0 -> B1 -> B2

    +--+           +--+--+
    |B2|           |B1|B0|
    +--+--+--+--+--+--+--+
       |           |
       +           +
       |  Current  |
       +    PU     +
       |           |
    +--+           +
    |A1|           |
    +--+--+--+--+--+
    |A0|
    +--+
    */
    int num = 0;
    // Left predictor search
    if (validDirect[MD_BELOW_LEFT])
        amvpCand[num++] = directMV[MD_BELOW_LEFT];
    else if (validDirect[MD_LEFT])
        amvpCand[num++] = directMV[MD_LEFT];
    else if (validIndirect[MD_BELOW_LEFT])
        amvpCand[num++] = indirectMV[MD_BELOW_LEFT];
    else if (validIndirect[MD_LEFT])
        amvpCand[num++] = indirectMV[MD_LEFT];

    bool bAddedSmvp = num > 0;

    // Above predictor search
    if (validDirect[MD_ABOVE_RIGHT])
        amvpCand[num++] = directMV[MD_ABOVE_RIGHT];
    else if (validDirect[MD_ABOVE])
        amvpCand[num++] = directMV[MD_ABOVE];
    else if (validDirect[MD_ABOVE_LEFT])
        amvpCand[num++] = directMV[MD_ABOVE_LEFT];

    if (!bAddedSmvp)
    {
        if (validIndirect[MD_ABOVE_RIGHT])
            amvpCand[num++] = indirectMV[MD_ABOVE_RIGHT];
        else if (validIndirect[MD_ABOVE])
            amvpCand[num++] = indirectMV[MD_ABOVE];
        else if (validIndirect[MD_ABOVE_LEFT])
            amvpCand[num++] = indirectMV[MD_ABOVE_LEFT];
    }

    int numMvc = 0;
    for (int dir = MD_LEFT; dir <= MD_ABOVE_LEFT; dir++)
    {
        if (validDirect[dir] && directMV[dir].notZero())
            pmv[numMvc++] = directMV[dir];

        if (validIndirect[dir] && indirectMV[dir].notZero())
            pmv[numMvc++] = indirectMV[dir];
    }

    if (num == 2)
        num -= amvpCand[0] == amvpCand[1];

    // Get the collocated candidate. At this step, either the first candidate
    // was found or its value is 0.
    // 2. Add the MV of the available temporal (collocated) block
    if (m_slice->m_sps->bTemporalMVPEnabled && num < 2)
    {
        int tempRefIdx = neighbours[MD_COLLOCATED].refIdx[picList];
        if (tempRefIdx != -1)
        {
            uint32_t cuAddr = neighbours[MD_COLLOCATED].cuAddr[picList];
            const Frame* colPic = m_slice->m_refFrameList[m_slice->isInterB() && !m_slice->m_colFromL0Flag][m_slice->m_colRefIdx];
            const CUData* colCU = colPic->m_encData->getPicCTU(cuAddr);

            // Scale the vector
            int colRefPOC = colCU->m_slice->m_refPOCList[tempRefIdx >> 4][tempRefIdx & 0xf];
            int colPOC = colCU->m_slice->m_poc;

            int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
            int curPOC = m_slice->m_poc;
            pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
        }
    }

    // 3. If the list holds fewer than 2 entries, pad it with zero MVs
    while (num < AMVP_NUM_CANDS)
        amvpCand[num++] = 0;

    return numMvc;
}
```
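Stripped of the x265 data structures, the list-building rules above fit in a few lines. The sketch below is a simplified model under stated assumptions: each spatial neighbour has already been resolved into an optional unscaled ("direct") MV and an optional scaled ("indirect") MV, the temporal MV is already scaled, and the types and function names are hypothetical. It keeps the same order as getPMV: left group, above group (scaled above candidates only when the left group contributed nothing), removal of an identical pair, the temporal candidate, then zero padding.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <vector>

struct MV {
    int x = 0, y = 0;
    bool operator==(const MV& o) const { return x == o.x && y == o.y; }
};

struct Neighbour {
    std::optional<MV> direct;    // same reference picture: usable as-is
    std::optional<MV> indirect;  // different reference picture: already POC-scaled
};

// First available MV of a group, scanned in order (left: A0, A1; above: B0, B1, B2).
static std::optional<MV> firstOf(const std::vector<Neighbour>& group, bool direct) {
    for (const Neighbour& n : group) {
        const std::optional<MV>& v = direct ? n.direct : n.indirect;
        if (v)
            return v;
    }
    return std::nullopt;
}

std::array<MV, 2> buildAmvpList(const std::vector<Neighbour>& left,
                                const std::vector<Neighbour>& above,
                                std::optional<MV> temporal) {
    std::vector<MV> cand;
    std::optional<MV> a = firstOf(left, true);  // every direct MV is tried before any scaled one
    if (!a)
        a = firstOf(left, false);
    const bool addedLeft = a.has_value();
    if (a)
        cand.push_back(*a);

    if (std::optional<MV> b = firstOf(above, true))
        cand.push_back(*b);
    if (!addedLeft) {  // scaled above candidates only when the left group gave nothing
        if (std::optional<MV> bs = firstOf(above, false))
            cand.push_back(*bs);
    }

    if (cand.size() == 2 && cand[0] == cand[1])
        cand.pop_back();  // drop the duplicate
    if (cand.size() < 2 && temporal)
        cand.push_back(*temporal);
    while (cand.size() < 2)
        cand.push_back(MV{});  // pad with zero MVs
    return {{cand[0], cand[1]}};
}
```

Note that when the left group is empty, the above group can supply both entries (one unscaled, one scaled), which mirrors the `if (!bAddedSmvp)` branch in getPMV.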

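The next step picks the better of the two candidates in the AMVP list built above: each candidate is motion-compensated with predInterLumaPixel, and the SAD of the resulting prediction decides which one is used as the MVP for the motion search.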

```cpp
/* Pick between the two AMVP candidates which is the best one to use as
 * MVP for the motion search, based on SAD cost */
int Search::selectMVP(const CUData& cu, const PredictionUnit& pu, const MV amvp[AMVP_NUM_CANDS], int list, int ref)
{
    if (amvp[0] == amvp[1])
        return 0;

    Yuv& tmpPredYuv = m_rqt[cu.m_cuDepth[0]].tmpPredYuv;
    uint32_t costs[AMVP_NUM_CANDS];

    for (int i = 0; i < AMVP_NUM_CANDS; i++)
    {
        MV mvCand = amvp[i];

        // NOTE: skip mvCand if Y is > merange and -FN>1
        if (m_bFrameParallel)
        {
            costs[i] = m_me.COST_MAX;

            if (mvCand.y >= (m_param->searchRange + 1) * 4)
                continue;

            if ((m_param->maxSlices > 1) &
                ((mvCand.y < m_sliceMinY)
              |  (mvCand.y > m_sliceMaxY)))
                continue;
        }
        cu.clipMv(mvCand);
        // Motion-compensate with this candidate and measure the SAD of the prediction
        predInterLumaPixel(pu, tmpPredYuv, *m_slice->m_refReconPicList[list][ref], mvCand);
        costs[i] = m_me.bufSAD(tmpPredYuv.getLumaAddr(pu.puAbsPartIdx), tmpPredYuv.m_size);
    }

    return (costs[0] <= costs[1]) ? 0 : 1;
}
```
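selectMVP compares only the prediction error of the two candidates; MV bits play no role at this point. A minimal sketch of the same decision, assuming full-pel MVs and a reference plane whose out-of-picture samples are obtained by clamping coordinates (x265 instead interpolates quarter-pel positions through predInterLumaPixel and works on padded reference pictures). Plane, blockSad and pickMvp are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdlib>
#include <vector>

struct MV {
    int x, y;  // full-pel units in this sketch
};

// Luma plane; clamping the coordinates stands in for picture padding.
struct Plane {
    int width, height;
    std::vector<int> pix;
    int at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return pix[y * width + x];
    }
};

// SAD between the w x h source block at (bx, by) and the reference block displaced by mv.
int blockSad(const Plane& src, const Plane& ref, int bx, int by, int w, int h, MV mv) {
    int sad = 0;
    for (int y = by; y < by + h; y++)
        for (int x = bx; x < bx + w; x++)
            sad += std::abs(src.at(x, y) - ref.at(x + mv.x, y + mv.y));
    return sad;
}

// Returns the index (0 or 1) of the candidate with the lower SAD; ties and identical candidates give 0.
int pickMvp(const Plane& src, const Plane& ref, int bx, int by, int w, int h, const std::array<MV, 2>& amvp) {
    if (amvp[0].x == amvp[1].x && amvp[0].y == amvp[1].y)
        return 0;
    int c0 = blockSad(src, ref, bx, by, w, h, amvp[0]);
    int c1 = blockSad(src, ref, bx, by, w, h, amvp[1]);
    return c0 <= c1 ? 0 : 1;
}
```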
2.2.1.3 Motion estimation (motionEstimate)

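Motion estimation (ME below) is the core of inter prediction: it determines the best MV and its cost. The main steps are:
(1) Measure the cost of the MVP and of each available candidate MV obtained earlier; for a candidate this is a SAD computed at sub-pixel precision (subpelCompare) plus its MV bit cost.
(2) Integer-pixel ME
  The integer search mainly uses diamond or hexagon search. Once the best full-pixel MV has been found, its cost is compared with that of the best candidate MV from step (1). If the searched MV is cheaper, it is promoted to quarter-pel precision; otherwise the candidate MV is used instead.
(3) Sub-pixel ME
  Sub-pixel ME refines whichever MV was kept in step (2): a half-pixel search comes first (SATD by default, or SAD depending on the subpel refinement level, see wl.hpel_satd), followed by a quarter-pixel search that always uses SATD.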

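PS: Note that during integer-pixel ME the source block (fenc) is compared directly against the full-pixel reference plane (fref), whereas during sub-pixel ME it is compared against interpolated reference samples.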
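All the costs in this function have the form distortion + lambda * R(MVD), where mvcost() already returns the lambda-weighted bits of the MV difference. The exact bit model x265 uses is not covered in this article, so the sketch below uses the length of the signed Exp-Golomb code se(v) as a stand-in estimate. This is an assumption (HEVC actually codes an MVD with flags plus first-order Exp-Golomb), and the names are hypothetical. The point it illustrates is that a position with a slightly higher SAD but a smaller MVD can still win.

```cpp
#include <cassert>
#include <cstdint>

struct MV {
    int x, y;  // quarter-pel units
};

// Length of the signed Exp-Golomb code se(v): v maps to k = 2v - 1 (v > 0) or -2v (v <= 0),
// and ue(k) takes 2 * floor(log2(k + 1)) + 1 bits.
uint32_t seBits(int v) {
    uint32_t k = v > 0 ? 2u * static_cast<uint32_t>(v) - 1u : 2u * static_cast<uint32_t>(-v);
    uint32_t lz = 0;
    for (uint32_t t = k + 1; t > 1; t >>= 1)
        lz++;  // lz = floor(log2(k + 1))
    return 2 * lz + 1;
}

// Motion cost as used during the search: distortion + lambda * bits(mv - mvp).
uint32_t motionCost(uint32_t distortion, MV mv, MV mvp, uint32_t lambda) {
    uint32_t bits = seBits(mv.x - mvp.x) + seBits(mv.y - mvp.y);
    return distortion + lambda * bits;
}
```

With lambda = 2, a point at MVD (4, 0) with SAD 100 costs 100 + 2 * 8 = 116, while the MVP itself with SAD 104 costs 104 + 2 * 2 = 108 and is preferred.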

```cpp
int MotionEstimate::motionEstimate(ReferencePlanes *ref,
                                   const MV &       mvmin,
                                   const MV &       mvmax,
                                   const MV &       qmvp,
                                   int              numCandidates,
                                   const MV *       mvc,
                                   int              merange,
                                   MV &             outQMv,
                                   uint32_t         maxSlices,
                                   pixel *          srcReferencePlane)
{
    ALIGN_VAR_16(int, costs[16]);
    bool hme = srcReferencePlane && srcReferencePlane == ref->fpelLowerResPlane[0];
    if (ctuAddr >= 0)
        blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
    intptr_t stride = hme ? ref->lumaStride / 2 : ref->lumaStride;
    pixel* fenc = fencPUYuv.m_buf[0];
    pixel* fref = srcReferencePlane == 0 ? ref->fpelPlane[0] + blockOffset : srcReferencePlane + blockOffset;

    // qmvp is the best AMVP candidate selected earlier. setMVP() makes it the predictor
    // against which the MVD bits are costed; its clipped, full-pel-rounded version below
    // also serves as the starting point of the search
    setMVP(qmvp);

    MV qmvmin = mvmin.toQPel(); // convert to quarter-pel units
    MV qmvmax = mvmax.toQPel();

    /* The term cost used here means satd/sad values for that particular search.
     * The costs used in ME integer search only includes the SAD cost of motion
     * residual and sqrtLambda times MVD bits.  The subpel refine steps use SATD
     * cost of residual and sqrtLambda * MVD bits.  Mode decision will be based
     * on video distortion cost (SSE/PSNR) plus lambda times all signaling bits
     * (mode + MVD bits). */

    // measure SAD cost at clipped QPEL MVP
    // clip the MVP to [qmvmin, qmvmax]
    MV pmv = qmvp.clipped(qmvmin, qmvmax);
    MV bestpre = pmv;
    int bprecost;

    if (ref->isLowres)
        bprecost = ref->lowresQPelCost(fenc, blockOffset, pmv, sad, hme);
    else // measure the sub-pel SAD at the clipped AMVP predictor as the initial cost
        bprecost = subpelCompare(ref, pmv, sad);

    /* re-measure full pel rounded MVP with SAD as search start point */
    MV bmv = pmv.roundToFPel();
    int bcost = bprecost;
    if (pmv.isSubpel())
        bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);

    // measure SAD cost at MV(0) if MVP is not zero
    if (pmv.notZero())
    {
        // The MVP is non-zero, so also measure the zero MV and keep it if it beats the rounded MVP
        int cost = sad(fenc, FENC_STRIDE, fref, stride) + mvcost(MV(0, 0));
        if (cost < bcost)
        {
            bcost = cost;
            bmv = 0;
            bmv.y = X265_MAX(X265_MIN(0, mvmax.y), mvmin.y);
        }
    }

    X265_CHECK(!(ref->isLowres && numCandidates), "lowres motion candidates not allowed\n")
    // measure SAD cost at each QPEL motion vector candidate
    // 1. Walk the candidate list (mvc) and measure the cost of each MV
    for (int i = 0; i < numCandidates; i++)
    {
        MV m = mvc[i].clipped(qmvmin, qmvmax);
        if (m.notZero() & (m != pmv ? 1 : 0) & (m != bestpre ? 1 : 0)) // check already measured
        {
            // mvcost() returns the bit cost of the MVD, already multiplied by lambda
            int cost = subpelCompare(ref, m, sad) + mvcost(m);
            if (cost < bprecost)
            {
                bprecost = cost;
                bestpre = m;
            }
        }
    }

    pmv = pmv.roundToFPel();
    MV omv = bmv;  // current search origin or starting point

    // 2. Integer-pel motion search
    int search = ref->isHMELowres ? (hme ? searchMethodL0 : searchMethodL1) : searchMethod;
    switch (search)
    {
    case X265_DIA_SEARCH:
    {
        /* diamond search, radius 1 */
        /*
        Diamond search with radius 1; the points are numbered as follows (0 is the centre):
              1
            3 0 4
              2
        */
        bcost <<= 4;
        int i = merange;
        do
        {
            /*
            COST_MV_X4_DIR is defined as
            #define COST_MV_X4_DIR(m0x, m0y, m1x, m1y, m2x, m2y, m3x, m3y, costs) \
            { \
                pixel *pix_base = fref + bmv.x + bmv.y * stride; \
                sad_x4(fenc, \
                       pix_base + (m0x) + (m0y) * stride, \
                       pix_base + (m1x) + (m1y) * stride, \
                       pix_base + (m2x) + (m2y) * stride, \
                       pix_base + (m3x) + (m3y) * stride, \
                       stride, costs); \
                (costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \
                (costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \
                (costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \
                (costs)[3] += mvcost((bmv + MV(m3x, m3y)) << 2); \
            }
            With the arguments used below, costs[0..3] belong to the up MV(0, -1), down MV(0, 1),
            left MV(-1, 0) and right MV(1, 0) neighbours. Each cost is SAD + lambda * R, where
            mvcost() returns the bits of the current MVD already multiplied by lambda.
            */
            COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);

            /*
            #define COPY1_IF_LT(x, y) {if ((y) < (x)) (x) = (y);}
            The code below keeps the best cost together with the position it came from:
            (1) each cost is shifted left by 4 bits; the freed low 4 bits encode the search point
            (2) since bcost was also shifted left by 4, a plain comparison finds the best cost
            (3) once the best cost is known, the corresponding MV is decoded from the low 4 bits
            COPY1_IF_LT(bcost, (costs[0] << 4) + 1);    //  1 -> 0001, up
            COPY1_IF_LT(bcost, (costs[1] << 4) + 3);    //  3 -> 0011, down
            COPY1_IF_LT(bcost, (costs[2] << 4) + 4);    //  4 -> 0100, left
            COPY1_IF_LT(bcost, (costs[3] << 4) + 12);   // 12 -> 1100, right
            */
            if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))  // skip the upper point if it is out of range
                COPY1_IF_LT(bcost, (costs[0] << 4) + 1);
            if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))  // skip the lower point if it is out of range
                COPY1_IF_LT(bcost, (costs[1] << 4) + 3);
            COPY1_IF_LT(bcost, (costs[2] << 4) + 4);
            COPY1_IF_LT(bcost, (costs[3] << 4) + 12);
            // If the low 4 bits are 0, the current centre is still the best point, so the search stops
            if (!(bcost & 15))
                break;
            /*
            Example: if the best point is the right neighbour MV(1, 0), the low 4 bits are 1100:
            (bcost << 28) >> 30 = 11 (binary 11 means -1 here), so bmv.x -= -1 moves one unit to the right;
            (bcost << 30) >> 30 = 00, so bmv.y is unchanged.
            */
            bmv.x -= (bcost << 28) >> 30; // bcost is a 32-bit int: shifting left by 28 moves the low 4 bits to the top, the arithmetic shift right by 30 extracts bits 3..2 as the signed x offset
            bmv.y -= (bcost << 30) >> 30; // shifting left by 30 keeps the low 2 bits, the shift right by 30 extracts them as the signed y offset
            // Clear the low 4 bits, keeping the best cost (still shifted left by 4)
            bcost &= ~15;
        }
        while (--i && bmv.checkRange(mvmin, mvmax)); // stop once merange iterations are used up or bmv leaves the allowed MV range
        bcost >>= 4; // shift back by 4 bits to recover the actual best cost
        break;
    }

    case X265_HEX_SEARCH: // hexagon search, radius 2
    {
me_hex2:
        /* hexagon search, radius 2 */
        /*
        Hexagon points (0 is the starting position; image coordinates, y grows downwards):
            6   5
          1   0   4
            2   3
        */
#if 0
        for (int i = 0; i < merange / 2; i++)
        {
            omv = bmv;
            COST_MV(omv.x - 2, omv.y);
            COST_MV(omv.x - 1, omv.y + 2);
            COST_MV(omv.x + 1, omv.y + 2);
            COST_MV(omv.x + 2, omv.y);
            COST_MV(omv.x + 1, omv.y - 2);
            COST_MV(omv.x - 1, omv.y - 2);
            if (omv == bmv)
                break;
            if (!bmv.checkRange(mvmin, mvmax))
                break;
        }

#else // if 0
        /* equivalent to the above, but eliminates duplicate candidates */
        /*
        COST_MV_X3_DIR is defined as follows; it works like COST_MV_X4_DIR but evaluates 3 points at once
        #define COST_MV_X3_DIR(m0x, m0y, m1x, m1y, m2x, m2y, costs) \
        { \
            pixel *pix_base = fref + bmv.x + bmv.y * stride; \
            sad_x3(fenc, \
                   pix_base + (m0x) + (m0y) * stride, \
                   pix_base + (m1x) + (m1y) * stride, \
                   pix_base + (m2x) + (m2y) * stride, \
                   stride, costs); \
            (costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \
            (costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \
            (costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \
        }
        */
        COST_MV_X3_DIR(-2, 0, -1, 2,  1, 2, costs);
        bcost <<= 3;
        if ((bmv.y >= mvmin.y) & (bmv.y <= mvmax.y))
            COPY1_IF_LT(bcost, (costs[0] << 3) + 2);    // position 1
        if ((bmv.y + 2 >= mvmin.y) & (bmv.y + 2 <= mvmax.y))
        {
            COPY1_IF_LT(bcost, (costs[1] << 3) + 3);    // position 2
            COPY1_IF_LT(bcost, (costs[2] << 3) + 4);    // position 3
        }

        COST_MV_X3_DIR(2, 0,  1, -2, -1, -2, costs);
        if ((bmv.y >= mvmin.y) & (bmv.y <= mvmax.y))
            COPY1_IF_LT(bcost, (costs[0] << 3) + 5);    // position 4
        if ((bmv.y - 2 >= mvmin.y) & (bmv.y - 2 <= mvmax.y))
        {
            COPY1_IF_LT(bcost, (costs[1] << 3) + 6);    // position 5
            COPY1_IF_LT(bcost, (costs[2] << 3) + 7);    // position 6
        }

        // Did one of the 6 hexagon points beat the starting position?
        if (bcost & 7)
        {
            int dir = (bcost & 7) - 2; // index of the best point: 0..5 for positions 1..6
            // const MV hex2[8] = { MV(-1, -2), MV(-2, 0), MV(-1, 2), MV(1, 2), MV(2, 0), MV(1, -2), MV(-1, -2), MV(-2, 0) };
            if ((bmv.y + hex2[dir + 1].y >= mvmin.y) & (bmv.y + hex2[dir + 1].y <= mvmax.y))
            {
                bmv += hex2[dir + 1]; // move bmv to the best point

                /* half hexagon, not overlapping the previous iteration */
                // Keep going from the best direction with half hexagons: only the 3 new points facing dir are tested
                for (int i = (merange >> 1) - 1; i > 0 && bmv.checkRange(mvmin, mvmax); i--)
                {
                    /*
                    Suppose the best point found so far was position 1, i.e. dir = 0. The three offsets are then
                    (1) hex2[dir + 0] => the direction of position 6
                    (2) hex2[dir + 1] => the direction of position 1
                    (3) hex2[dir + 2] => the direction of position 2
                    */
                    COST_MV_X3_DIR(hex2[dir + 0].x, hex2[dir + 0].y,
                                   hex2[dir + 1].x, hex2[dir + 1].y,
                                   hex2[dir + 2].x, hex2[dir + 2].y,
                                   costs);
                    bcost &= ~7;

                    if ((bmv.y + hex2[dir + 0].y >= mvmin.y) & (bmv.y + hex2[dir + 0].y <= mvmax.y))
                        COPY1_IF_LT(bcost, (costs[0] << 3) + 1);

                    if ((bmv.y + hex2[dir + 1].y >= mvmin.y) & (bmv.y + hex2[dir + 1].y <= mvmax.y))
                        COPY1_IF_LT(bcost, (costs[1] << 3) + 2);

                    if ((bmv.y + hex2[dir + 2].y >= mvmin.y) & (bmv.y + hex2[dir + 2].y <= mvmax.y))
                        COPY1_IF_LT(bcost, (costs[2] << 3) + 3);

                    if (!(bcost & 7))
                        break;

                    dir += (bcost & 7) - 2;
                    dir = mod6m1[dir + 1];
                    bmv += hex2[dir + 1];
                }
            } // if ((bmv.y + hex2[dir + 1].y >= mvmin.y) & (bmv.y + hex2[dir + 1].y <= mvmax.y))
        }
        bcost >>= 3; // recover the actual best cost
#endif // if 0

        /* square refine */
        // Square refinement around the hexagon result for a finer integer MV
        /*
        Square points (0 is the centre; image coordinates, y grows downwards):
          5 1 7
          3 0 4
          6 2 8
        */
        int dir = 0;
        COST_MV_X4_DIR(0, -1,  0, 1, -1, 0, 1, 0, costs);
        if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
            COPY2_IF_LT(bcost, costs[0], dir, 1);
        if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
            COPY2_IF_LT(bcost, costs[1], dir, 2);
        COPY2_IF_LT(bcost, costs[2], dir, 3);
        COPY2_IF_LT(bcost, costs[3], dir, 4);
        COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
        if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
            COPY2_IF_LT(bcost, costs[0], dir, 5);
        if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
            COPY2_IF_LT(bcost, costs[1], dir, 6);
        if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
            COPY2_IF_LT(bcost, costs[2], dir, 7);
        if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
            COPY2_IF_LT(bcost, costs[3], dir, 8);
        // const MV square1[9] = { MV(0, 0), MV(0, -1), MV(0, 1), MV(-1, 0), MV(1, 0), MV(-1, -1), MV(-1, 1), MV(1, -1), MV(1, 1) };
        bmv += square1[dir];
        break;
    }

    case X265_UMH_SEARCH: // uneven multi-hexagon search (fairly complex, not studied)
    {
        // ...
    }

    case X265_STAR_SEARCH: // Adapted from HM ME
    {
        // star search (used by the slow preset and slower ones; not studied)
        // ...
    }

    case X265_SEA: // Successive Elimination Algorithm
    {
        // ...
    }

    case X265_FULL_SEARCH: // full search
    {
        // ...
    }

    default:
        X265_CHECK(0, "invalid motion estimate mode\n");
        break;
    }

    /*
    3. Sub-pixel search
    First compare the best candidate-MV cost with the best cost found by the integer search:
    (1) if the candidate MV is better (bprecost < bcost), the searched MV is discarded and the candidate MV is used;
    (2) otherwise the searched MV is promoted to quarter-pel precision.
    Sub-pixel refinement then starts from whichever MV was kept.
    */
    if (bprecost < bcost)
    {
        bmv = bestpre;
        bcost = bprecost;
    }
    else
        bmv = bmv.toQPel(); // promote search bmv to qpel

    const SubpelWorkload& wl = workload[this->subpelRefine];

    // check mv range for slice bound
    // Only relevant when maxSlices > 1; with the default of one slice per frame this branch is never taken
    if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
    {
        bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
    }

    if (!bcost) // zero distortion: skip sub-pel refinement; the returned cost then contains only the MV bit cost
    {
        /* if there was zero residual at the clipped MVP, we can skip subpel
         * refine, but we do need to include the mvcost in the returned cost */
        bcost = mvcost(bmv);
    }
    else if (ref->isLowres) // low-resolution (lookahead) frame
    {
        // ...
    }
    else
    {
        pixelcmp_t hpelcomp;

        // Choose SATD or SAD for the half-pel search (SATD by default)
        if (wl.hpel_satd)
        {
            bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
            hpelcomp = satd;
        }
        else
            hpelcomp = sad;

        // Half-pel search
        for (int iter = 0; iter < wl.hpel_iters; iter++)
        {
            int bdir = 0;
            for (int i = 1; i <= wl.hpel_dirs; i++)
            {
                // square pattern with a step of 2 quarter-pel units, i.e. half a pixel
                MV qmv = bmv + square1[i] * 2;

                // check mv range for slice bound
                if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
                    continue;

                // measure the cost and keep the best MV
                int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
                COPY2_IF_LT(bcost, cost, bdir, i);
            }

            if (bdir)
                bmv += square1[bdir] * 2;
            else
                break;
        }

        /* if HPEL search used SAD, remeasure with SATD before QPEL */
        // The quarter-pel search compares SATD costs, so a SAD-based half-pel cost must be re-measured with SATD first
        if (!wl.hpel_satd)
            bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);

        // Quarter-pel search
        for (int iter = 0; iter < wl.qpel_iters; iter++)
        {
            int bdir = 0;
            for (int i = 1; i <= wl.qpel_dirs; i++)
            {
                MV qmv = bmv + square1[i];

                // check mv range for slice bound
                if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
                    continue;

                int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
                COPY2_IF_LT(bcost, cost, bdir, i);
            }

            if (bdir)
                bmv += square1[bdir];
            else
                break;
        }
    }

    // check mv range for slice bound
    X265_CHECK(((bmv.y >= qmvmin.y) & (bmv.y <= qmvmax.y)), "mv beyond range!");

    x265_emms();
    outQMv = bmv;
    return bcost;
}
```
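The packed-cost trick of X265_DIA_SEARCH is easier to follow in isolation. The sketch below reproduces it outside x265: a cost callback stands in for SAD + mvcost, each neighbour cost is shifted left by 4 and tagged with the same direction codes (1 up, 3 down, 4 left, 12 right), and the winning offset is decoded from the low 4 bits. Two simplifications are assumptions of this sketch: the 2-bit fields are decoded with explicit arithmetic instead of the `(bcost << 28) >> 30` shifts, and all four neighbours are range-checked (x265 checks only the vertical range inside the loop). The names are hypothetical.

```cpp
#include <cassert>
#include <functional>

struct MV {
    int x, y;
};

struct SearchResult {
    MV mv;
    int cost;
};

// Decode a 2-bit two's-complement field (0, 1, 2, 3 -> 0, 1, -2, -1).
static int signed2(int v) { return v >= 2 ? v - 4 : v; }

// Diamond search with radius 1; cost(x, y) must return a small non-negative integer-pel cost.
SearchResult diamondSearch(MV start, int maxIters, MV mvmin, MV mvmax,
                           const std::function<int(int, int)>& cost) {
    static const int dx[4] = {0, 0, -1, 1};    // up, down, left, right
    static const int dy[4] = {-1, 1, 0, 0};
    static const int code[4] = {1, 3, 4, 12};  // bits 3..2 hold -dx, bits 1..0 hold -dy
    MV bmv = start;
    int bcost = cost(bmv.x, bmv.y) << 4;       // low 4 bits == 0 means "centre is best"
    for (int i = maxIters; i > 0; i--) {
        for (int d = 0; d < 4; d++) {
            int x = bmv.x + dx[d], y = bmv.y + dy[d];
            if (x < mvmin.x || x > mvmax.x || y < mvmin.y || y > mvmax.y)
                continue;
            int packed = (cost(x, y) << 4) + code[d];
            if (packed < bcost)                // same role as COPY1_IF_LT
                bcost = packed;
        }
        int low = bcost & 15;
        if (!low)
            break;                             // no neighbour beat the centre: converged
        bmv.x -= signed2(low >> 2);
        bmv.y -= signed2(low & 3);
        bcost &= ~15;                          // keep the shifted cost, clear the direction
    }
    return {bmv, bcost >> 4};
}
```

Packing the direction into the low bits lets a single comparison update both the best cost and its position, and on equal costs the centre (code 0) always wins, which is what terminates the search.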

2.3 Intra mode in P frames (checkIntraInInter)

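If the inter modes turn out to be expensive, some blocks in a P frame may still be coded in Intra mode. The overall flow is basically the same as in intra prediction, except that here DC, planar and the angular modes are ranked only by SA8D plus the mode bits (calcRdSADCost).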

```cpp
/* Note that this function does not save the best intra prediction, it must
 * be generated later. It records the best mode in the cu */
void Search::checkIntraInInter(Mode& intraMode, const CUGeom& cuGeom)
{
    ProfileCUScope(intraMode.cu, intraAnalysisElapsedTime, countIntraAnalysis);

    CUData& cu = intraMode.cu;
    uint32_t depth = cuGeom.depth;

    cu.setPartSizeSubParts(SIZE_2Nx2N);
    cu.setPredModeSubParts(MODE_INTRA);

    const uint32_t initTuDepth = 0;
    uint32_t log2TrSize = cuGeom.log2CUSize - initTuDepth;
    uint32_t tuSize = 1 << log2TrSize;
    const uint32_t absPartIdx = 0;

    // Reference sample smoothing
    IntraNeighbors intraNeighbors;
    initIntraNeighbors(cu, absPartIdx, initTuDepth, true, &intraNeighbors);
    initAdiPattern(cu, cuGeom, absPartIdx, intraNeighbors, ALL_IDX);

    const pixel* fenc = intraMode.fencYuv->m_buf[0];
    uint32_t stride = intraMode.fencYuv->m_size;

    int sad, bsad;
    uint32_t bits, bbits, mode, bmode;
    uint64_t cost, bcost;

    // 33 Angle modes once
    int scaleTuSize = tuSize;
    int scaleStride = stride;
    int costShift = 0;
    int sizeIdx = log2TrSize - 2;

    if (tuSize > 32) // is the CU 64x64?
    {
        // CU is 64x64, we scale to 32x32 and adjust required parameters
        primitives.scale2D_64to32(m_fencScaled, fenc, stride);
        fenc = m_fencScaled;

        pixel nScale[129];
        intraNeighbourBuf[1][0] = intraNeighbourBuf[0][0];
        primitives.scale1D_128to64[NONALIGNED](nScale + 1, intraNeighbourBuf[0] + 1);

        // we do not estimate filtering for downscaled samples
        memcpy(&intraNeighbourBuf[0][1], &nScale[1], 2 * 64 * sizeof(pixel));   // Top & Left pixels
        memcpy(&intraNeighbourBuf[1][1], &nScale[1], 2 * 64 * sizeof(pixel));

        scaleTuSize = 32;
        scaleStride = 32;
        costShift = 2;
        sizeIdx = 5 - 2; // log2(scaleTuSize) - 2
    }

    pixelcmp_t sa8d = primitives.cu[sizeIdx].sa8d;
    int predsize = scaleTuSize * scaleTuSize;

    m_entropyCoder.loadIntraDirModeLuma(m_rqt[depth].cur);

    /* there are three cost tiers for intra modes:
     *  pred[0]          - mode probable, least cost
     *  pred[1], pred[2] - less probable, slightly more cost
     *  non-mpm modes    - all cost the same (rbits) */
    // Derive the MPM list
    uint64_t mpms;
    uint32_t mpmModes[3];
    uint32_t rbits = getIntraRemModeBits(cu, absPartIdx, mpmModes, mpms);

    // DC
    // DC mode prediction
    primitives.cu[sizeIdx].intra_pred[DC_IDX](m_intraPredAngs, scaleStride, intraNeighbourBuf[0], 0, (scaleTuSize <= 16));
    bsad = sa8d(fenc, scaleStride, m_intraPredAngs, scaleStride) << costShift;
    bmode = mode = DC_IDX;
    bbits = (mpms & ((uint64_t)1 << mode)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, mode) : rbits;
    bcost = m_rdCost.calcRdSADCost(bsad, bbits);

    // PLANAR
    // Planar mode prediction
    pixel* planar = intraNeighbourBuf[0];
    if (tuSize & (8 | 16 | 32))
        planar = intraNeighbourBuf[1];

    primitives.cu[sizeIdx].intra_pred[PLANAR_IDX](m_intraPredAngs, scaleStride, planar, 0, 0);
    sad = sa8d(fenc, scaleStride, m_intraPredAngs, scaleStride) << costShift;
    mode = PLANAR_IDX;
    bits = (mpms & ((uint64_t)1 << mode)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, mode) : rbits;
    cost = m_rdCost.calcRdSADCost(sad, bits);
    COPY4_IF_LT(bcost, cost, bmode, mode, bsad, sad, bbits, bits);

    bool allangs = true;
    if (primitives.cu[sizeIdx].intra_pred_allangs)
    {
        primitives.cu[sizeIdx].transpose(m_fencTransposed, fenc, scaleStride);
        primitives.cu[sizeIdx].intra_pred_allangs(m_intraPredAngs, intraNeighbourBuf[0], intraNeighbourBuf[1], (scaleTuSize <= 16));
    }
    else
        allangs = false;

    // How each angular mode is evaluated
#define TRY_ANGLE(angle) \
    if (allangs) { \
        if (angle < 18) \
            sad = sa8d(m_fencTransposed, scaleTuSize, &m_intraPredAngs[(angle - 2) * predsize], scaleTuSize) << costShift; \
        else \
            sad = sa8d(fenc, scaleStride, &m_intraPredAngs[(angle - 2) * predsize], scaleTuSize) << costShift; \
        bits = (mpms & ((uint64_t)1 << angle)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, angle) : rbits; \
        cost = m_rdCost.calcRdSADCost(sad, bits); \
    } else { \
        int filter = !!(g_intraFilterFlags[angle] & scaleTuSize); \
        primitives.cu[sizeIdx].intra_pred[angle](m_intraPredAngs, scaleTuSize, intraNeighbourBuf[filter], angle, scaleTuSize <= 16); \
        sad = sa8d(fenc, scaleStride, m_intraPredAngs, scaleTuSize) << costShift; \
        bits = (mpms & ((uint64_t)1 << angle)) ? m_entropyCoder.bitsIntraModeMPM(mpmModes, angle) : rbits; \
        cost = m_rdCost.calcRdSADCost(sad, bits); \
    }

    // Is fast intra search enabled?
    if (m_param->bEnableFastIntra)
    {
        int asad = 0;
        uint32_t lowmode, highmode, amode = 5, abits = 0;
        uint64_t acost = MAX_INT64;

        /* pick the best angle, sampling at distance of 5 */
        for (mode = 5; mode < 35; mode += 5)
        {
            TRY_ANGLE(mode);
            COPY4_IF_LT(acost, cost, amode, mode, asad, sad, abits, bits);
        }

        /* refine best angle at distance 2, then distance 1 */
        for (uint32_t dist = 2; dist >= 1; dist--)
        {
            lowmode = amode - dist;
            highmode = amode + dist;

            X265_CHECK(lowmode >= 2 && lowmode <= 34, "low intra mode out of range\n");
            TRY_ANGLE(lowmode);
            COPY4_IF_LT(acost, cost, amode, lowmode, asad, sad, abits, bits);

            X265_CHECK(highmode >= 2 && highmode <= 34, "high intra mode out of range\n");
            TRY_ANGLE(highmode);
            COPY4_IF_LT(acost, cost, amode, highmode, asad, sad, abits, bits);
        }

        if (amode == 33)
        {
            TRY_ANGLE(34);
            COPY4_IF_LT(acost, cost, amode, 34, asad, sad, abits, bits);
        }

        COPY4_IF_LT(bcost, acost, bmode, amode, bsad, asad, bbits, abits);
    }
    else // calculate and search all intra prediction angles for lowest cost
    {
        // Try every angular mode 2..34 (DC and planar were evaluated above)
        for (mode = 2; mode < 35; mode++)
        {
            TRY_ANGLE(mode);
            COPY4_IF_LT(bcost, cost, bmode, mode, bsad, sad, bbits, bits);
        }
    }

    cu.setLumaIntraDirSubParts((uint8_t)bmode, absPartIdx, depth + initTuDepth);
    intraMode.initCosts();
    intraMode.totalBits = bbits;
    intraMode.distortion = bsad;
    intraMode.sa8dCost = bcost;
    intraMode.sa8dBits = bbits;
}
```
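With bEnableFastIntra, only a fraction of the 33 angular modes is evaluated: a coarse pass over modes 5, 10, ..., 30, a refinement at distance 2 and then 1 around the current best, and mode 34 only if the winner is 33. The sketch below extracts that schedule, with a hypothetical cost callback standing in for the SA8D + mode-bits cost computed by TRY_ANGLE; it is an illustration of the search order, not x265 code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

struct AngleResult {
    int mode;
    uint64_t cost;
};

// Coarse-to-fine search over the HEVC angular modes 2..34, following the bEnableFastIntra schedule.
AngleResult fastAngleSearch(const std::function<uint64_t(int)>& cost) {
    AngleResult best{5, UINT64_MAX};
    auto tryMode = [&](int mode) {
        uint64_t c = cost(mode);
        if (c < best.cost)
            best = {mode, c};
    };
    for (int mode = 5; mode < 35; mode += 5)  // coarse pass: 5, 10, ..., 30
        tryMode(mode);
    for (int dist = 2; dist >= 1; dist--) {
        int low = best.mode - dist, high = best.mode + dist;  // both taken from the centre before either test
        tryMode(low);
        tryMode(high);
    }
    if (best.mode == 33)  // the refinement reaches at most 33, so 34 is checked separately
        tryMode(34);
    return best;
}
```

In this schedule 10 or 11 angular modes are evaluated instead of 33, at the risk of missing a better mode that lies between the coarse samples.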

This concludes the brief analysis of inter prediction in x265.
