This article walks through part of the Hadoop 2.4 source code: how HDFS NameNode high availability (HA) elects its active node and switches over when that node fails.
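ZKFailoverController is the coordinator of the whole HA mechanism. Below we analyze two practical questions.
1. How is the election coordinated, and how is the active node chosen?
2. What happens after the active node goes down, and how does the switch take place?
Let us start with the first question: how is the election coordinated, and how is the active node chosen?
Step 1: The NameNode source shows that an HA-enabled NN must start in the standby state. The only exception is a startup with the upgrade option.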
protected HAState createHAState(StartupOption startOpt) {
  if (!haEnabled || startOpt == StartupOption.UPGRADE) {
    return ACTIVE_STATE;
  } else {
    return STANDBY_STATE; // HA-enabled NNs start in standby
  }
}
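To make the rule in step 1 concrete, the following minimal sketch enumerates every combination of the two inputs. It is not the NameNode class: `InitialStateSketch`, the reduced `StartupOption` enum, and passing `haEnabled` as a parameter are assumptions made only so the example runs on its own.

```java
final class InitialStateSketch {
    enum StartupOption { REGULAR, UPGRADE }   // reduced, hypothetical option set
    enum HAState { ACTIVE, STANDBY }

    // Same branch structure as createHAState above, with haEnabled passed in explicitly.
    static HAState createHAState(boolean haEnabled, StartupOption startOpt) {
        if (!haEnabled || startOpt == StartupOption.UPGRADE) {
            return HAState.ACTIVE;
        }
        return HAState.STANDBY;
    }

    public static void main(String[] args) {
        // Print the initial state for each of the four input combinations.
        for (boolean ha : new boolean[] {false, true}) {
            for (StartupOption opt : StartupOption.values()) {
                System.out.println("haEnabled=" + ha + ", startOpt=" + opt
                        + " -> " + createHAState(ha, opt));
            }
        }
    }
}
```

Only the combination of HA enabled and a non-upgrade startup yields standby; every other combination starts active.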
Step 2: The HealthMonitor, which is watching the NN at this point, finds it healthy and executes:
if (healthy) {
  // set the state; this is what notifies the callbacks
  enterState(State.SERVICE_HEALTHY);
}
enterState notifies the registered callbacks, which handle the change. For the healthy state, the callback starts the election:
elector.joinElection(targetToData(localTarget));
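The callback flow in step 2 can be sketched as a small observer pattern. This is a simplified stand-in rather than the Hadoop class: `HealthMonitorSketch`, its reduced `State` enum, and the strings printed in place of the election calls are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class HealthMonitorSketch {
    enum State { INITIALIZING, SERVICE_HEALTHY, SERVICE_UNHEALTHY }

    /** Listener that is told about every state change. */
    interface Callback {
        void enteredState(State newState);
    }

    private final List<Callback> callbacks = new ArrayList<>();
    private State state = State.INITIALIZING;

    void addCallback(Callback cb) {
        callbacks.add(cb);
    }

    /** Records the new state and notifies listeners, but only on an actual change. */
    void enterState(State newState) {
        if (newState == state) {
            return;
        }
        state = newState;
        for (Callback cb : callbacks) {
            cb.enteredState(newState);
        }
    }

    State getState() {
        return state;
    }

    public static void main(String[] args) {
        HealthMonitorSketch monitor = new HealthMonitorSketch();
        // Hypothetical reaction: a healthy service joins the election, anything else leaves it.
        monitor.addCallback(s -> System.out.println(
                s == State.SERVICE_HEALTHY ? "join election" : "quit election"));
        monitor.enterState(State.SERVICE_HEALTHY);
        monitor.enterState(State.SERVICE_UNHEALTHY);
    }
}
```

Keeping the monitor ignorant of what the listeners do is what lets the failover controller decide, in its own callback, when to call `joinElection`.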
Each candidate then tries to create an ephemeral znode. Whoever creates it first holds the lock and becomes active:
createLockNodeAsync();
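The idea of grabbing the lock by creating an ephemeral node can be illustrated without a ZooKeeper server. The sketch below replaces ZooKeeper with an in-memory map, so it shows only the "first creator wins" rule. `LockElectionSketch`, its methods, the path `/election/lock`, and the session ids are hypothetical names, not ZooKeeper or Hadoop APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class LockElectionSketch {
    // In-memory stand-in for ZooKeeper: znode path -> id of the session that owns it.
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    /** Returns true if this session created the lock node, i.e. won the election. */
    boolean tryCreateEphemeral(String path, String sessionId) {
        return znodes.putIfAbsent(path, sessionId) == null;
    }

    /** Simulates session loss: every node owned by that session disappears. */
    void expireSession(String sessionId) {
        znodes.values().removeIf(owner -> owner.equals(sessionId));
    }

    /** Returns the owning session id, or null if the node does not exist. */
    String ownerOf(String path) {
        return znodes.get(path);
    }

    public static void main(String[] args) {
        LockElectionSketch zk = new LockElectionSketch();
        String lock = "/election/lock"; // hypothetical path
        System.out.println("nn1 wins: " + zk.tryCreateEphemeral(lock, "session-nn1"));
        System.out.println("nn2 wins: " + zk.tryCreateEphemeral(lock, "session-nn2"));
        zk.expireSession("session-nn1");
        System.out.println("nn2 wins after nn1 expires: "
                + zk.tryCreateEphemeral(lock, "session-nn2"));
    }
}
```

Because the node is tied to its creator's session, the lock is released automatically when the holder disappears, which is what makes the later failover possible.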
Creating the node triggers a ZK event.
The event is handled by the following source:
public synchronized void processResult(int rc, String path, Object ctx,
    String name) {
  if (isStaleClient(ctx)) return;
  LOG.debug("CreateNode result: " + rc + " for path: " + path
      + " connectionState: " + zkConnectionState
      + " for " + this);

  Code code = Code.get(rc); // convert the raw return code into a Code enum value for easier handling
  if (isSuccess(code)) { // success: the lock znode was created
    // we successfully created the znode. we are the leader. start monitoring
    if (becomeActive()) { // switch the NN on this node to active
      monitorActiveStatus(); // keep monitoring the lock node
    } else {
      reJoinElectionAfterFailureToBecomeActive(); // failed: rejoin the election and try again
    }
    return;
  }

  if (isNodeExists(code)) { // the node exists, so an active already exists; just wait
    if (createRetryCount == 0) {
      // znode exists and we did not retry the operation. so a different
      // instance has created it. become standby and monitor lock.
      becomeStandby();
    }
    // if we had retried then the znode could have been created by our first
    // attempt to the server (that we lost) and this node exists response is
    // for the second attempt. verify this case via ephemeral node owner. this
    // will happen on the callback for monitoring the lock.
    monitorActiveStatus(); // keep watching the lock so this node can still compete for active later
    return;
  }

  String errorMessage = "Received create error from Zookeeper. code:"
      + code.toString() + " for path " + path;
  LOG.debug(errorMessage);

  if (shouldRetry(code)) {
    if (createRetryCount < maxRetryNum) {
      LOG.debug("Retrying createNode createRetryCount: " + createRetryCount);
      ++createRetryCount;
      createLockNodeAsync();
      return;
    }
    errorMessage = errorMessage
        + ". Not retrying further znode create connection errors.";
  } else if (isSessionExpired(code)) {
    // This isn't fatal - the client Watcher will re-join the election
    LOG.warn("Lock acquisition failed because session was lost");
    return;
  }

  fatalError(errorMessage);
}
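The branches of processResult amount to a small decision table, which the following sketch reduces to a pure function. The `Code` and `Action` enums here are hypothetical simplifications: `RETRYABLE` stands for whatever codes shouldRetry accepts, and the `OK` case ignores the possibility that becoming active fails and the node rejoins the election.

```java
final class CreateResultSketch {
    // Hypothetical, reduced result codes; not ZooKeeper's own enum.
    enum Code { OK, NODE_EXISTS, RETRYABLE, SESSION_EXPIRED, OTHER }

    enum Action {
        BECOME_ACTIVE,              // we created the lock node
        BECOME_STANDBY_AND_MONITOR, // someone else holds the lock
        VERIFY_OWNER_VIA_MONITOR,   // node exists after a retry: it may be ours
        RETRY_CREATE,               // transient error with retries left
        WAIT_FOR_REJOIN,            // session lost: the watcher rejoins later
        FATAL                       // anything else
    }

    static Action decide(Code code, int createRetryCount, int maxRetryNum) {
        switch (code) {
            case OK:
                return Action.BECOME_ACTIVE;
            case NODE_EXISTS:
                return createRetryCount == 0
                        ? Action.BECOME_STANDBY_AND_MONITOR
                        : Action.VERIFY_OWNER_VIA_MONITOR;
            case RETRYABLE:
                return createRetryCount < maxRetryNum
                        ? Action.RETRY_CREATE
                        : Action.FATAL;
            case SESSION_EXPIRED:
                return Action.WAIT_FOR_REJOIN;
            default:
                return Action.FATAL;
        }
    }

    public static void main(String[] args) {
        // Show the chosen action for every code, first with no retries used, then with all used.
        for (Code code : Code.values()) {
            System.out.println(code + ", retries=0 -> " + decide(code, 0, 3));
            System.out.println(code + ", retries=3 -> " + decide(code, 3, 3));
        }
    }
}
```

The retry count matters in two places: it decides whether a transient error is retried, and it decides whether an existing node can safely be assumed to belong to another instance.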
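The machine that wins the lock then runs becomeActive(). The boolean becomeActive() called in the elector callback above eventually invokes the ZKFailoverController method of the same name: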
private synchronized void becomeActive() throws ServiceFailedException {
  LOG.info("Trying to make " + localTarget + " active...");
  try {
    HAServiceProtocolHelper.transitionToActive(localTarget.getProxy(
        conf, FailoverController.getRpcTimeoutToNewActive(conf)),
        createReqInfo());
    String msg = "Successfully transitioned " + localTarget +
        " to active state";
    LOG.info(msg);
    serviceState = HAServiceState.ACTIVE;
    recordActiveAttempt(new ActiveAttemptRecord(true, msg));
  } catch (Throwable t) {
    String msg = "Couldn't make " + localTarget + " active";
    LOG.fatal(msg, t);
    recordActiveAttempt(new ActiveAttemptRecord(false, msg + "\n" +
        StringUtils.stringifyException(t)));
    if (t instanceof ServiceFailedException) {
      throw (ServiceFailedException)t;
    } else {
      throw new ServiceFailedException("Couldn't transition to active", t);
    }
  }
}
After a chain of RPC calls, the following NameNode method is finally executed:
synchronized void transitionToActive()
    throws ServiceFailedException, AccessControlException {
  namesystem.checkSuperuserPrivilege();
  if (!haEnabled) {
    throw new ServiceFailedException("HA for namenode is not enabled");
  }
  state.setState(haContext, ACTIVE_STATE);
}
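The guard-then-switch shape of transitionToActive can be sketched in isolation. Everything here is an assumption made for illustration: `TransitionSketch`, the nested exception class, and the exit/enter log entries describe how a state switch is commonly structured, not what the NameNode's setState actually does internally. The privilege check is omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class TransitionSketch {
    enum HAState { STANDBY, ACTIVE }

    /** Local stand-in for Hadoop's exception of the same name. */
    static final class ServiceFailedException extends Exception {
        ServiceFailedException(String msg) {
            super(msg);
        }
    }

    private final boolean haEnabled;
    private HAState state = HAState.STANDBY;
    final List<String> log = new ArrayList<>();

    TransitionSketch(boolean haEnabled) {
        this.haEnabled = haEnabled;
    }

    /** Refuses the transition when HA is disabled, otherwise switches to ACTIVE. */
    synchronized void transitionToActive() throws ServiceFailedException {
        if (!haEnabled) {
            throw new ServiceFailedException("HA for namenode is not enabled");
        }
        setState(HAState.ACTIVE);
    }

    // Assumed structure: leave the old state, then enter the new one.
    private void setState(HAState next) {
        if (next == state) {
            return; // already in the requested state
        }
        log.add("exit " + state);
        state = next;
        log.add("enter " + next);
    }

    synchronized HAState getState() {
        return state;
    }

    public static void main(String[] args) throws Exception {
        TransitionSketch nn = new TransitionSketch(true);
        nn.transitionToActive();
        System.out.println(nn.getState() + " " + nn.log);
    }
}
```

Calling the method a second time is a no-op in this sketch, which keeps repeated requests from the failover controller harmless.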
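That completes the answer to the first question.
2. What happens after the active node goes down, and how does the switch take place?
When the active node crashes or misbehaves, either its ZK lock node disappears or the monitored health state becomes unhealthy. Either event starts a new round of election, which works the same way as described above.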
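Both failover triggers can be sketched with a toy model in which the lock is simply a reference to its holder. `FailoverSketch`, `Candidate`, and all method names are hypothetical, and the model is deliberately simplified: real candidates race for the lock, whereas here the first healthy candidate in registration order wins.

```java
import java.util.ArrayList;
import java.util.List;

final class FailoverSketch {
    /** A candidate controller; `healthy` models the health monitor's verdict. */
    static final class Candidate {
        final String name;
        boolean healthy = true;

        Candidate(String name) {
            this.name = name;
        }
    }

    private final List<Candidate> candidates = new ArrayList<>();
    private Candidate lockHolder; // null means the lock node does not exist

    void register(Candidate c) {
        candidates.add(c);
        elect();
    }

    /** Crash or session loss: the ephemeral lock vanishes with its owner. */
    void crash(Candidate c) {
        candidates.remove(c);
        if (lockHolder == c) {
            lockHolder = null;
            elect();
        }
    }

    /** Unhealthy service: the controller gives up the lock and stops competing. */
    void markUnhealthy(Candidate c) {
        c.healthy = false;
        if (lockHolder == c) {
            lockHolder = null;
            elect();
        }
    }

    // Simplification: the first healthy candidate takes a free lock.
    private void elect() {
        if (lockHolder != null) {
            return;
        }
        for (Candidate c : candidates) {
            if (c.healthy) {
                lockHolder = c;
                return;
            }
        }
    }

    /** Name of the current active node, or null if nobody holds the lock. */
    String active() {
        return lockHolder == null ? null : lockHolder.name;
    }

    public static void main(String[] args) {
        FailoverSketch zk = new FailoverSketch();
        Candidate nn1 = new Candidate("nn1");
        Candidate nn2 = new Candidate("nn2");
        zk.register(nn1);
        zk.register(nn2);
        System.out.println("active: " + zk.active());
        zk.crash(nn1);
        System.out.println("active after crash: " + zk.active());
    }
}
```

In both cases the remaining node needs no special failover path; it just wins an ordinary election once the lock is free.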