This article looks at how ReceiverTracker handles the data it receives in Spark Streaming.
ReceiverTracker, running in the Driver, uses its own scheduling logic to decide on which executor each Receiver should be started. The code that starts a Receiver is wrapped in a task, and that task is the only task of its job; in effect, ReceiverTracker launches Receivers by submitting one job per Receiver. Inside that start-up code a ReceiverSupervisorImpl is created, and its start method is what actually runs the Receiver on the worker node. The Receiver then hands the data it receives to a BlockGenerator, which assembles it into blocks; ReceiverSupervisorImpl stores those blocks and reports their metadata back to ReceiverTracker.
The rest of this article walks through what ReceiverTracker does with the data once it has been received.
ReceiverSupervisorImpl stores blocks through a receivedBlockHandler.
private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    ...
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}
There are two ways to store a block: one backed by a write-ahead log (WAL), the other using only the BlockManager.
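Which handler is chosen depends on the receiver WAL configuration. The sketch below is not from the article; it only illustrates, assuming a locally runnable StreamingContext and a hypothetical checkpoint path, how spark.streaming.receiver.writeAheadLog.enable together with a checkpoint directory makes WriteAheadLogUtils.enableReceiverLog return true, so that the WAL-based handler is used.

// Minimal sketch: enabling the receiver-side WAL so ReceiverSupervisorImpl
// picks WriteAheadLogBasedBlockHandler instead of BlockManagerBasedBlockHandler.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverWalConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("receiver-wal-sketch")
      .setMaster("local[2]")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true") // checked by enableReceiverLog
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/streaming-checkpoint") // hypothetical path; the WAL files live under it
    // ... define receiver-based input streams and call ssc.start() here ...
  }
}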
/** Store block and report it to driver */
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}
The block is stored and then reported to the ReceiverTracker; what gets reported is the block's metadata.
/** Information about blocks received by the receiver */
private[streaming] case class ReceivedBlockInfo(
    streamId: Int,
    numRecords: Option[Long],
    metadataOption: Option[Any],
    blockStoreResult: ReceivedBlockStoreResult
  )
(The sealed keyword means that all subclasses of a sealed trait or class must be defined in the same source file.)
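For illustration only (these are made-up types, not the Spark ones), a sealed trait forces every subclass into the same file, which in turn lets the compiler warn about non-exhaustive pattern matches:

// Hypothetical types, only to illustrate the sealed keyword.
sealed trait StoreResultSketch
case class InBlockManager(blockId: String) extends StoreResultSketch
case class InWriteAheadLog(blockId: String, segment: String) extends StoreResultSketch

object SealedSketch {
  // Because StoreResultSketch is sealed, the compiler can check this match is exhaustive.
  def describe(r: StoreResultSketch): String = r match {
    case InBlockManager(id)       => s"block $id stored only in the BlockManager"
    case InWriteAheadLog(id, seg) => s"block $id also written to WAL segment $seg"
  }
}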
ReceiverTracker manages launching Receivers, cleaning them up, and receiving the metadata they report. A ReceiverTracker must be instantiated only after all input streams have been added and StreamingContext.start() has been called, because it needs to start one Receiver per input stream.
ReceiverTracker holds all the input data sources and their ids.
private val receiverInputStreams = ssc.graph.getReceiverInputStreams()
private val receiverInputStreamIds = receiverInputStreams.map { _.id }
The states a ReceiverTracker can be in:
/** Enumeration to identify current state of the ReceiverTracker */
object TrackerState extends Enumeration {
  type TrackerState = Value
  val Initialized, Started, Stopping, Stopped = Value
}
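As a rough illustration (hypothetical field and method names, not necessarily how the Spark source is written), such an Enumeration is typically kept in a volatile field that guards state transitions:

// Hypothetical usage sketch of the TrackerState enumeration.
object TrackerStateSketch {
  object TrackerState extends Enumeration {
    type TrackerState = Value
    val Initialized, Started, Stopping, Stopped = Value
  }
  import TrackerState._

  @volatile private var trackerState: TrackerState = Initialized

  def isTrackerStarted: Boolean = trackerState == Started

  def start(): Unit = {
    require(trackerState == Initialized, s"cannot start from state $trackerState")
    trackerState = Started
  }

  def stop(): Unit = {
    trackerState = Stopping
    // ... stop receivers, then ...
    trackerState = Stopped
  }
}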
Next, let's see how ReceiverTracker handles the AddBlock message sent by ReceiverSupervisorImpl.
case AddBlock(receivedBlockInfo) =>
  if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
    walBatchingThreadPool.execute(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        if (active) {
          context.reply(addBlock(receivedBlockInfo))
        } else {
          throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
        }
      }
    })
  } else {
    context.reply(addBlock(receivedBlockInfo))
  }
It first checks whether driver-side WAL batching is enabled. If it is, a thread from walBatchingThreadPool runs addBlock and sends the reply, because writing to the WAL is expensive; otherwise it replies with the result of addBlock directly.
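The pattern here, sketched below with hypothetical names rather than the actual Spark classes, is to push the expensive work onto a dedicated thread pool so the RPC endpoint's message loop is never blocked, and to send the reply from the worker thread:

// Hypothetical sketch of the "offload and reply later" pattern used above.
import java.util.concurrent.Executors
import scala.util.control.NonFatal

object AsyncReplySketch {
  private val pool = Executors.newFixedThreadPool(4)
  @volatile private var active = true

  // doAddBlock stands in for addBlock(receivedBlockInfo); reply stands in for context.reply.
  def handleAddBlock(doAddBlock: () => Boolean, reply: Boolean => Unit): Unit = {
    pool.execute(new Runnable {
      override def run(): Unit = {
        try {
          if (active) reply(doAddBlock())
          else throw new IllegalStateException("endpoint shut down")
        } catch {
          case NonFatal(e) => e.printStackTrace() // the real code logs non-fatal errors
        }
      }
    })
  }
}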
The work is then handed over to receivedBlockTracker:
/** Add new blocks for the given stream */
private def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  receivedBlockTracker.addBlock(receivedBlockInfo)
}
ReceivedBlockTracker manages the block metadata on the Driver side.
/** Add received block. This event will get written to the write ahead log (if enabled). */
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}
The writeToLog code is straightforward: it first checks whether the WAL is enabled; if so, it writes the block info event to the log so the data can be recovered later, otherwise it simply returns true. The block information is then put into streamIdToUnallocatedBlockQueues.
private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]
This data structure is neat: the key is the stream id and the value is a queue, so the block info received for each stream is stored separately. With it, ReceivedBlockTracker knows about every block received for every stream.
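As a rough sketch (hypothetical names and a simplified BlockInfo, not the Spark types), the per-stream queue pattern looks like this, with getOrElseUpdate creating a queue the first time a stream id is seen:

// Hypothetical sketch of the per-stream "unallocated blocks" queues.
import scala.collection.mutable

object UnallocatedQueueSketch {
  case class BlockInfo(streamId: Int, blockId: String, numRecords: Long)

  private val streamIdToQueue = new mutable.HashMap[Int, mutable.Queue[BlockInfo]]

  // Blocks from different streams never mix: each stream id owns its own queue.
  def addBlock(info: BlockInfo): Unit = synchronized {
    streamIdToQueue.getOrElseUpdate(info.streamId, new mutable.Queue[BlockInfo]) += info
  }

  // Draining a stream's queue is, roughly, what "allocating blocks to a batch" means.
  def allocateBlocks(streamId: Int): Seq[BlockInfo] = synchronized {
    streamIdToQueue.get(streamId).map(_.dequeueAll(_ => true).toList).getOrElse(Nil)
  }
}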
/** Write an update to the tracker to the write ahead log */
private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = {
  if (isWriteAheadLogEnabled) {
    logTrace(s"Writing record: $record")
    try {
      writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)),
        clock.getTimeMillis())
      true
    } catch {
      case NonFatal(e) =>
        logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e)
        false
    }
  } else {
    true
  }
}
Take a closer look at ReceivedBlockTracker's class comment. This class keeps track of all received blocks and allocates them to batches when required. If a checkpoint directory has been provided, every action taken by the class can be written to a write-ahead log, so that after a Driver failure the tracker's state (received blocks and block-to-batch allocations) can be recovered. Note that whenever an instance of this class is created with a checkpoint directory, it tries to read the previously saved events from the logs in that directory.
/**
* Class that keep track of all the received blocks, and allocate them to batches
* when required. All actions taken by this class can be saved to a write ahead log
* (if a checkpoint directory has been provided), so that the state of the tracker
* (received blocks and block-to-batch allocations) can be recovered after driver failure.
*
* Note that when any instance of this class is created with a checkpoint directory,
* it will try reading events from logs in the directory.
*/
private[streaming] class ReceivedBlockTracker(
Next, let's see how ReceiverTracker handles a CleanupOldBlocks message.
case c: CleanupOldBlocks =>
  receiverTrackingInfos.values.flatMap(_.endpoint).foreach(_.send(c))
When ReceiverTracker receives this message it forwards it to every Receiver it manages. When ReceiverSupervisorImpl receives it, it uses receivedBlockHandler to clean up the old blocks.
private def cleanupOldBlocks(cleanupThreshTime: Time): Unit = {
  logDebug(s"Cleaning up blocks older then $cleanupThreshTime")
  receivedBlockHandler.cleanupOldBlocks(cleanupThreshTime.milliseconds)
}
ReceiverTracker can also adjust, at any time, the rate at which a given stream receives data, by sending an UpdateRateLimit message to the corresponding ReceiverSupervisorImpl.
case UpdateReceiverRateLimit(streamUID, newRate) =>
  for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
    eP.send(UpdateRateLimit(newRate))
  }
When ReceiverSupervisorImpl receives the message:
case UpdateRateLimit(eps) =>
  logInfo(s"Received a new rate limit: $eps.")
  registeredBlockGenerators.foreach { bg =>
    bg.updateRate(eps)
  }
/**
 * Set the rate limit to `newRate`. The new rate will not exceed the maximum rate configured by
 * {{{spark.streaming.receiver.maxRate}}}, even if `newRate` is higher than that.
 *
 * @param newRate A new rate in events per second. It has no effect if it's 0 or negative.
 */
private[receiver] def updateRate(newRate: Long): Unit =
  if (newRate > 0) {
    if (maxRateLimit > 0) {
      rateLimiter.setRate(newRate.min(maxRateLimit))
    } else {
      rateLimiter.setRate(newRate)
    }
  }
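Underneath, the receiver-side RateLimiter wraps Guava's RateLimiter. The sketch below (a hypothetical, simplified class, not the actual Spark implementation) only illustrates the clamping behaviour described in the comment above, with the value of spark.streaming.receiver.maxRate acting as the static upper bound:

// Hypothetical sketch of the rate-clamping logic; not the actual Spark RateLimiter.
import com.google.common.util.concurrent.{RateLimiter => GuavaRateLimiter}

class ReceiverRateLimitSketch(maxRateLimit: Long) {
  // Start at the static cap, or an effectively unlimited rate when no cap is configured.
  private val rateLimiter =
    GuavaRateLimiter.create(if (maxRateLimit > 0) maxRateLimit.toDouble else Long.MaxValue.toDouble)

  // A positive dynamic rate is applied, but never above the configured maximum.
  def updateRate(newRate: Long): Unit =
    if (newRate > 0) {
      val effective = if (maxRateLimit > 0) math.min(newRate, maxRateLimit) else newRate
      rateLimiter.setRate(effective.toDouble)
    }

  // Each record acquires one permit before being pushed into a block.
  def waitToPush(): Unit = rateLimiter.acquire()
}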
ReceiverTracker is an example of the facade design pattern: it looks as if you are calling ReceiverTracker's own functionality, but the calls are actually delegated to other classes.
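A toy sketch of the facade idea (hypothetical classes, not the real ones): the caller only talks to the tracker, which forwards the real work to collaborators such as the block tracker.

// Hypothetical facade sketch.
class ReceivedBlockTrackerSketch {
  def addBlock(streamId: Int, blockId: String): Boolean = true // real bookkeeping elided
}

class ReceiverTrackerFacade {
  private val receivedBlockTracker = new ReceivedBlockTrackerSketch

  // Looks like the tracker's own functionality, but is really delegated.
  def addBlock(streamId: Int, blockId: String): Boolean =
    receivedBlockTracker.addBlock(streamId, blockId)
}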