A Walkthrough of the Video JitterBuffer in WebRTC (Part 2) · The Road to Audio/Video Development

## VCMDecodingState

`VCMDecodingState` decides whether NALUs can be decoded continuously. The criteria differ by codec. It supports three codecs: VP8, VP9 and H264. Here are some of the member variables it defines:

~~~
uint16_t sequence_num_;
uint32_t time_stamp_;
int picture_id_;
int temporal_id_;
int tl0_pic_id_;
bool full_sync_;  // Sync flag when temporal layers are used.
~~~

picture\_id, temporal\_id and tl0\_pic\_id are carried in VP8 and VP9 streams and describe how NALUs relate to each other and whether they can be decoded continuously. H264 carries none of this information. The member function `ContinuousFrame` shows how H264 is handled, and this article only looks at the H264 path.

### Member function ContinuousFrame

~~~
bool VCMDecodingState::ContinuousFrame(const VCMFrameBuffer* frame) const {
  // Check continuity based on the following hierarchy:
  // - Temporal layers (stop here if out of sync).
  // - Picture Id when available.
  // - Sequence numbers.
  // Return true when in initial state.
  // Note that when a method is not applicable it will return false.
  assert(frame != NULL);
  // A key frame is always considered continuous as it doesn't refer to any
  // frames and therefore won't introduce any errors even if prior frames are
  // missing.
  if (frame->FrameType() == VideoFrameType::kVideoFrameKey &&
      HaveSpsAndPps(frame->GetNaluInfos())) {
    return true;
  }
  // When in the initial state we always require a key frame to start decoding.
  if (in_initial_state_)
    return false;
  if (ContinuousLayer(frame->TemporalId(), frame->Tl0PicId()))
    return true;
  // tl0picId is either not used, or should remain unchanged.
  if (frame->Tl0PicId() != tl0_pic_id_)
    return false;
  // Base layers are not continuous or temporal layers are inactive.
  // In the presence of temporal layers, check for Picture ID/sequence number
  // continuity if sync can be restored by this frame.
  if (!full_sync_ && !frame->LayerSync())
    return false;
  if (UsingPictureId(frame)) {
    if (UsingFlexibleMode(frame)) {
      return ContinuousFrameRefs(frame);
    } else {
      return ContinuousPictureId(frame->PictureId());
    }
  } else {
    return ContinuousSeqNum(static_cast<uint16_t>(frame->GetLowSeqNum())) &&
           HaveSpsAndPps(frame->GetNaluInfos());
  }
}
~~~

For an H264 NALU, pic\_id is kNoPictureId, Tl0PicId is kNoTl0PicIdx, and TemporalId is kNoTemporalIdx, so every check on the picture id or temporal id can be ignored. What actually runs for H264 is this statement:

~~~
return ContinuousSeqNum(static_cast<uint16_t>(frame->GetLowSeqNum())) &&
       HaveSpsAndPps(frame->GetNaluInfos());
~~~

Decoding continuity between frames is therefore judged by the seqnum and by whether the SPS and PPS are available. **If two NALUs are continuous, the lowest seqnum of the later NALU equals the highest seqnum of the earlier NALU plus 1. The member function ContinuousSeqNum implements exactly this check.**

### Member function HaveSpsAndPps

It does two things:

1. Checks whether the NALUs belong to the same GOP
2. Checks whether the GOP has an SPS and a PPS

~~~
bool VCMDecodingState::HaveSpsAndPps(const std::vector<NaluInfo>& nalus) const {
  std::set<int> new_sps;
  std::map<int, int> new_pps;
  for (const NaluInfo& nalu : nalus) {
    // Check if this nalu actually contains sps/pps information or dependencies.
    if (nalu.sps_id == -1 && nalu.pps_id == -1)
      continue;
    switch (nalu.type) {
      case H264::NaluType::kPps:
        if (nalu.pps_id < 0) {
          RTC_LOG(LS_WARNING) << "Received pps without pps id.";
        } else if (nalu.sps_id < 0) {
          RTC_LOG(LS_WARNING) << "Received pps without sps id.";
        } else {
          new_pps[nalu.pps_id] = nalu.sps_id;
        }
        break;
      case H264::NaluType::kSps:
        if (nalu.sps_id < 0) {
          RTC_LOG(LS_WARNING) << "Received sps without sps id.";
        } else {
          new_sps.insert(nalu.sps_id);
        }
        break;
      default: {
        int needed_sps = -1;
        auto pps_it = new_pps.find(nalu.pps_id);
        if (pps_it != new_pps.end()) {
          needed_sps = pps_it->second;
        } else {
          auto pps_it2 = received_pps_.find(nalu.pps_id);
          if (pps_it2 == received_pps_.end()) {
            return false;
          }
          needed_sps = pps_it2->second;
        }
        if (new_sps.find(needed_sps) == new_sps.end() &&
            received_sps_.find(needed_sps) == received_sps_.end()) {
          return false;
        }
        break;
      }
    }
  }
  return true;
}
~~~

Whether NALUs belong to the same GOP is judged from sps\_id and pps\_id:

1. **pps\_id is pic\_parameter\_set\_id**, the id of the current PPS. A PPS in the bitstream is referenced by the slices that use it, and a slice references a PPS by storing the PPS id in its slice header.
2. **sps\_id is seq\_parameter\_set\_id**, the id of the current SPS. It is referenced by the PPS, which carries the id of the SPS it refers to.

**So for the NALUs within one GOP, the pps id in every slice should be the same, and the sps id in the PPS matches the id in the SPS. If two NALUs have consecutive seqnums, belong to the same GOP, and the SPS and PPS are present, the frames are considered continuously decodable.**

### How VCMJitterBuffer handles NALU decoding continuity

Now that we know how H264 decides whether NALUs can be decoded continuously, let's go back to the continuity logic in VCMJitterBuffer's **InsertPacket** method. Three member functions are involved: **FindAndInsertContinuousFramesWithState, FindAndInsertContinuousFrames, IsContinuous**

* The member function **FindAndInsertContinuousFramesWithState** takes the information about the most recently decodable NALU (recorded in a VCMDecodingState) and searches the incomplete framelist for NALUs in the same GOP. Each match is removed from the incomplete framelist and inserted into the decodable framelist.

~~~
void VCMJitterBuffer::FindAndInsertContinuousFramesWithState(
    const VCMDecodingState& original_decoded_state) {
  // Find the NALUs that belong to the same GOP.
  // Copy original_decoded_state so we can move the state forward with each
  // decodable frame we find.
  VCMDecodingState decoding_state;
  decoding_state.CopyFrom(original_decoded_state);
  // When temporal layers are available, we search for a complete or decodable
  // frame until we hit one of the following:
  // 1. Continuous base or sync layer.
  // 2. The end of the list was reached.
  // For H264 the temporal-layer handling can be ignored.
  for (FrameList::iterator it = incomplete_frames_.begin();
       it != incomplete_frames_.end();) {
    VCMFrameBuffer* frame = it->second;
    if (IsNewerTimestamp(original_decoded_state.time_stamp(),
                         frame->Timestamp())) {
      ++it;
      continue;
    }
    if (IsContinuousInState(*frame, decoding_state)) {
      decodable_frames_.InsertFrame(frame);
      incomplete_frames_.erase(it++);
      decoding_state.SetState(frame);
    } else if (frame->TemporalId() <= 0) {
      break;
    } else {
      ++it;
    }
  }
}
~~~

* The member function **FindAndInsertContinuousFrames** starts from a single NALU and searches the incomplete framelist for NALUs in the same GOP.

~~~
void VCMJitterBuffer::FindAndInsertContinuousFrames(
    const VCMFrameBuffer& new_frame) {
  VCMDecodingState decoding_state;
  decoding_state.CopyFrom(last_decoded_state_);
  decoding_state.SetState(&new_frame);
  FindAndInsertContinuousFramesWithState(decoding_state);
}
~~~

* The member function **IsContinuous** decides whether a NALU can be decoded continuously.

~~~
bool VCMJitterBuffer::IsContinuous(const VCMFrameBuffer& frame) const {
  if (IsContinuousInState(frame, last_decoded_state_)) {
    // Continuous with the previous NALU represented by last_decoded_state_.
    return true;
  }
  // One more case: the frame is not continuous in seqnum with the NALU
  // represented by last_decoded_state_, but it belongs to the same GOP,
  // so walk the decodable framelist and check against each entry.
  VCMDecodingState decoding_state;
  decoding_state.CopyFrom(last_decoded_state_);
  for (FrameList::const_iterator it = decodable_frames_.begin();
       it != decodable_frames_.end(); ++it) {
    VCMFrameBuffer* decodable_frame = it->second;
    if (IsNewerTimestamp(decodable_frame->Timestamp(), frame.Timestamp())) {
      break;
    }
    decoding_state.SetState(decodable_frame);
    if (IsContinuousInState(frame, decoding_state)) {
      return true;
    }
  }
  return false;
}
~~~

Deciding whether a NALU can be decoded continuously means considering two cases:

1. The NALU is in the same GOP as the previous NALU represented by last\_decoded\_state\_, and their seqnums are consecutive.
2. The NALU is in the same GOP but the seqnums are not consecutive. In that case, walk the decodable framelist and look for a NALU in the same GOP whose seqnum is consecutive with it.

Inserting into VCMJitterBuffer thus comes down to handling RTP packets and handling NALUs and GOPs, which these two articles have covered fairly clearly. Later articles will focus on how the NALUs are processed.
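
To make the H264 rule described above concrete, here is a minimal, self-contained sketch of the two checks: a consecutive seqnum (with 16-bit wraparound) and resolvable SPS/PPS references. The names `Nalu`, `NaluKind`, `DecodeState`, `SeqNumContinuous`, `HaveParameterSets`, `FrameContinuous` and `RunExamples` are hypothetical simplifications made up for illustration. They are not the WebRTC API, and the real `VCMDecodingState` tracks more state than this.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for the WebRTC types (assumption).
enum class NaluKind { kSps, kPps, kSlice };

struct Nalu {
  NaluKind kind;
  int sps_id;  // -1 when the NALU carries no SPS id.
  int pps_id;  // -1 when the NALU carries no PPS id.
};

struct DecodeState {
  uint16_t last_seq_num = 0;       // Highest seqnum of the previous frame.
  std::set<int> received_sps;      // SPS ids seen in earlier frames.
  std::map<int, int> received_pps; // pps_id -> sps_id seen in earlier frames.
};

// True when low_seq_num directly follows the previous frame's highest seqnum.
// The cast makes 65535 + 1 wrap around to 0, as RTP sequence numbers do.
bool SeqNumContinuous(const DecodeState& state, uint16_t low_seq_num) {
  return low_seq_num == static_cast<uint16_t>(state.last_seq_num + 1);
}

// True when every slice can resolve its PPS, and that PPS can resolve its
// SPS, either from this frame or from the previously received sets.
bool HaveParameterSets(const DecodeState& state,
                       const std::vector<Nalu>& nalus) {
  std::set<int> new_sps;
  std::map<int, int> new_pps;
  for (const Nalu& nalu : nalus) {
    if (nalu.sps_id == -1 && nalu.pps_id == -1) continue;  // No dependency.
    switch (nalu.kind) {
      case NaluKind::kSps:
        if (nalu.sps_id >= 0) new_sps.insert(nalu.sps_id);
        break;
      case NaluKind::kPps:
        if (nalu.pps_id >= 0 && nalu.sps_id >= 0)
          new_pps[nalu.pps_id] = nalu.sps_id;
        break;
      case NaluKind::kSlice: {
        int needed_sps = -1;
        auto it = new_pps.find(nalu.pps_id);
        if (it != new_pps.end()) {
          needed_sps = it->second;
        } else {
          auto old_it = state.received_pps.find(nalu.pps_id);
          if (old_it == state.received_pps.end()) return false;
          needed_sps = old_it->second;
        }
        if (new_sps.count(needed_sps) == 0 &&
            state.received_sps.count(needed_sps) == 0) {
          return false;
        }
        break;
      }
    }
  }
  return true;
}

// The combined H264 rule: consecutive seqnum AND resolvable parameter sets.
bool FrameContinuous(const DecodeState& state, uint16_t low_seq_num,
                     const std::vector<Nalu>& nalus) {
  return SeqNumContinuous(state, low_seq_num) &&
         HaveParameterSets(state, nalus);
}

// Usage example: a frame that brings its own SPS/PPS, then a bare slice.
void RunExamples() {
  DecodeState state;
  state.last_seq_num = 100;
  std::vector<Nalu> with_sets = {{NaluKind::kSps, 0, -1},
                                 {NaluKind::kPps, 0, 0},
                                 {NaluKind::kSlice, -1, 0}};
  std::vector<Nalu> slice_only = {{NaluKind::kSlice, -1, 0}};
  assert(FrameContinuous(state, 101, with_sets));    // Next seqnum, sets ok.
  assert(!FrameContinuous(state, 102, with_sets));   // Gap in seqnum.
  assert(!FrameContinuous(state, 101, slice_only));  // PPS 0 never received.
}
```

The sketch keeps the per-frame `new_sps`/`new_pps` lookups separate from the remembered `received_*` sets, mirroring the two-level lookup in `HaveSpsAndPps`, so a slice may depend either on parameter sets in its own frame or on ones delivered earlier.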