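This article explains how to use Kubernetes Critical Pods: which pods count as critical, how the default scheduler treats them, and how to deploy critical services so that they can still be scheduled when cluster resources are short.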
Rule 1:
Enable the ExperimentalCriticalPodAnnotation feature gate;
The pod must belong to the kube-system namespace;
The pod must carry the annotation scheduler.alpha.kubernetes.io/critical-pod="";
Rule 2:
Enable the ExperimentalCriticalPodAnnotation and PodPriority feature gates;
The pod's Priority is set and is not less than 2 * 10^9;
system-node-critical priority = 2 * 10^9 + 1000;
system-cluster-critical priority = 2 * 10^9;
A pod that satisfies either Rule 1 or Rule 2 is considered a Critical Pod.
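The two rules can be sketched in Go as follows. This is a minimal illustration, not the Kubernetes implementation: the featureGates and podInfo types and the function names are hypothetical stand-ins, and only the annotation key, the namespace, and the 2 * 10^9 threshold come from the rules above.

```go
package main

import "fmt"

const (
	criticalPodAnnotation        = "scheduler.alpha.kubernetes.io/critical-pod"
	systemNamespace              = "kube-system"
	systemCriticalPriority int32 = 2000000000 // 2 * 10^9
)

// featureGates is a hypothetical stand-in for the cluster's feature-gate settings.
type featureGates struct {
	ExperimentalCriticalPodAnnotation bool
	PodPriority                       bool
}

// podInfo is a hypothetical, trimmed-down view of a Pod.
type podInfo struct {
	Namespace   string
	Annotations map[string]string
	Priority    *int32 // nil means the priority is not set
}

// criticalByAnnotation implements Rule 1: gate enabled, kube-system
// namespace, and the critical-pod annotation present with an empty value.
func criticalByAnnotation(g featureGates, p podInfo) bool {
	if !g.ExperimentalCriticalPodAnnotation || p.Namespace != systemNamespace {
		return false
	}
	val, ok := p.Annotations[criticalPodAnnotation]
	return ok && val == ""
}

// criticalByPriority implements Rule 2: both gates enabled and a priority
// that is set and not less than 2 * 10^9.
func criticalByPriority(g featureGates, p podInfo) bool {
	if !g.ExperimentalCriticalPodAnnotation || !g.PodPriority {
		return false
	}
	return p.Priority != nil && *p.Priority >= systemCriticalPriority
}

// isCriticalPod reports whether the pod satisfies Rule 1 or Rule 2.
func isCriticalPod(g featureGates, p podInfo) bool {
	return criticalByAnnotation(g, p) || criticalByPriority(g, p)
}

func main() {
	g := featureGates{ExperimentalCriticalPodAnnotation: true, PodPriority: true}

	annotated := podInfo{
		Namespace:   "kube-system",
		Annotations: map[string]string{criticalPodAnnotation: ""},
	}
	nodeCritical := systemCriticalPriority + 1000 // system-node-critical
	byPriority := podInfo{Namespace: "default", Priority: &nodeCritical}
	plain := podInfo{Namespace: "default"}

	fmt.Println(isCriticalPod(g, annotated))  // true (Rule 1)
	fmt.Println(isCriticalPod(g, byPriority)) // true (Rule 2)
	fmt.Println(isCriticalPod(g, plain))      // false
}
```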
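During the predicate phase of pod scheduling, the default scheduler registers GeneralPredicates as one of its default predicates. It does not check whether a pod is critical and, if so, run only EssentialPredicates for it. What does this mean?
The relationship between GeneralPredicates and EssentialPredicates makes it clear. GeneralPredicates first calls noncriticalPredicates and then calls EssentialPredicates. So if you mark the pods of a Deployment, StatefulSet, or similar workload (DaemonSet excepted) as critical, the scheduler still runs the full GeneralPredicates flow, including noncriticalPredicates, even though you would expect it to go straight to EssentialPredicates.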
    // GeneralPredicates checks whether noncriticalPredicates and EssentialPredicates pass. noncriticalPredicates are the predicates
    // that only non-critical pods need and EssentialPredicates are the predicates that all pods, including critical pods, need
    func GeneralPredicates(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        var predicateFails []algorithm.PredicateFailureReason
        fit, reasons, err := noncriticalPredicates(pod, meta, nodeInfo)
        if err != nil {
            return false, predicateFails, err
        }
        if !fit {
            predicateFails = append(predicateFails, reasons...)
        }

        fit, reasons, err = EssentialPredicates(pod, meta, nodeInfo)
        if err != nil {
            return false, predicateFails, err
        }
        if !fit {
            predicateFails = append(predicateFails, reasons...)
        }

        return len(predicateFails) == 0, predicateFails, nil
    }
noncriticalPredicates was intended to hold the extra predicate logic that applies only to non-critical pods. That logic is the PodFitsResources check.
pkg/scheduler/algorithm/predicates/predicates.go:1076

    // noncriticalPredicates are the predicates that only non-critical pods need
    func noncriticalPredicates(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        var predicateFails []algorithm.PredicateFailureReason
        fit, reasons, err := PodFitsResources(pod, meta, nodeInfo)
        if err != nil {
            return false, predicateFails, err
        }
        if !fit {
            predicateFails = append(predicateFails, reasons...)
        }

        return len(predicateFails) == 0, predicateFails, nil
    }
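PodFitsResources checks whether the node has enough of the following resources:
Allowed Pod Number;
CPU;
Memory;
EphemeralStorage;
Extended Resources;
In other words, if you mark the pods of a Deployment, StatefulSet, or similar workload (DaemonSet excepted) as critical, scheduling still checks whether Allowed Pod Number, CPU, Memory, EphemeralStorage, and Extended Resources are sufficient. If they are not, the predicate fails. The Preempt phase then only performs normal preemption based on the pod's PriorityClass, with no special handling for Critical Pods. As a result, the scheduler may find no Node that satisfies the resource requirements, and the Critical Pod fails to schedule and stays in the Pending state.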
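The resource check described above can be sketched as follows. This is a simplified model rather than the scheduler's code: the resources and nodeState types, their field names, and the failure labels are assumptions made for the illustration. It only shows the shape of the comparison, which is the pod's request against what the node has left.

```go
package main

import (
	"fmt"
	"sort"
)

// resources is a hypothetical, simplified resource vector.
type resources struct {
	MilliCPU         int64
	Memory           int64
	EphemeralStorage int64
	Extended         map[string]int64
}

// nodeState is a hypothetical summary of a node's capacity and current usage.
type nodeState struct {
	Allocatable     resources
	Requested       resources // sum of the requests of pods already on the node
	AllowedPodCount int
	PodCount        int
}

// podFitsResources returns the names of the resources that are insufficient
// for the request; an empty result means the pod fits.
func podFitsResources(req resources, n nodeState) []string {
	var fails []string
	if n.PodCount+1 > n.AllowedPodCount {
		fails = append(fails, "pods")
	}
	if req.MilliCPU > n.Allocatable.MilliCPU-n.Requested.MilliCPU {
		fails = append(fails, "cpu")
	}
	if req.Memory > n.Allocatable.Memory-n.Requested.Memory {
		fails = append(fails, "memory")
	}
	if req.EphemeralStorage > n.Allocatable.EphemeralStorage-n.Requested.EphemeralStorage {
		fails = append(fails, "ephemeral-storage")
	}
	// Check extended resources in sorted order so the result is deterministic.
	names := make([]string, 0, len(req.Extended))
	for name := range req.Extended {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		free := n.Allocatable.Extended[name] - n.Requested.Extended[name]
		if req.Extended[name] > free {
			fails = append(fails, name)
		}
	}
	return fails
}

func main() {
	node := nodeState{
		Allocatable:     resources{MilliCPU: 4000, Memory: 8 << 30, EphemeralStorage: 20 << 30},
		Requested:       resources{MilliCPU: 3500, Memory: 2 << 30},
		AllowedPodCount: 110,
		PodCount:        30,
	}
	pod := resources{MilliCPU: 1000, Memory: 1 << 30}
	fmt.Println(podFitsResources(pod, node)) // [cpu]
}
```

In this model a pod marked critical gets no exemption: with only 500m of CPU left on the node, a 1000m request fails regardless of how the pod is labeled.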
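Users who mark a pod as critical, however, do not want scheduling to fail because of insufficient resources. So what if you do want to deploy a key service as Critical Pods through a Deployment, StatefulSet, or similar workload (DaemonSet excepted)? There are two options: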
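Method 1: Follow Rule 2 above and give the pod the system-cluster-critical or system-node-critical PriorityClass, so that it obtains resources through the scheduler's normal Preempt flow and gets scheduled.
Method 2: Follow Rule 1 above and modify GeneralPredicates as shown in the code that follows, so that it detects Critical Pods and skips noncriticalPredicates for them. The predicate phase then no longer checks Allowed Pod Number, CPU, Memory, EphemeralStorage, or Extended Resources.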
    func GeneralPredicates(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        var predicateFails, reasons []algorithm.PredicateFailureReason
        var fit bool
        var err error

        // **Modify**: check whether the pod is a Critical Pod; if it is, skip noncriticalPredicates.
        isCriticalPod := utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
            kubelettypes.IsCriticalPod(pod)
        if !isCriticalPod {
            fit, reasons, err = noncriticalPredicates(pod, meta, nodeInfo)
            if err != nil {
                return false, predicateFails, err
            }
            if !fit {
                predicateFails = append(predicateFails, reasons...)
            }
        }

        fit, reasons, err = EssentialPredicates(pod, meta, nodeInfo)
        if err != nil {
            return false, predicateFails, err
        }
        if !fit {
            predicateFails = append(predicateFails, reasons...)
        }

        return len(predicateFails) == 0, predicateFails, nil
    }
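To see the effect of this change in isolation, here is a small self-contained model of the control flow. The predicate type and the two stub predicates are hypothetical; they stand in for noncriticalPredicates and EssentialPredicates and are not the scheduler's real signatures.

```go
package main

import "fmt"

// predicate is a hypothetical, simplified predicate signature: it returns the
// failure reasons for a node, or nil when the node is acceptable.
type predicate func(node string) []string

// generalPredicates mirrors the control flow of the modified function:
// the non-critical checks are skipped entirely for critical pods.
func generalPredicates(isCritical bool, nonCritical, essential predicate, node string) (bool, []string) {
	var fails []string
	if !isCritical {
		fails = append(fails, nonCritical(node)...)
	}
	fails = append(fails, essential(node)...)
	return len(fails) == 0, fails
}

func main() {
	// Assumed scenario: the node is out of CPU but passes every essential check.
	outOfCPU := func(string) []string { return []string{"Insufficient cpu"} }
	passes := func(string) []string { return nil }

	fit, reasons := generalPredicates(false, outOfCPU, passes, "node-1")
	fmt.Println(fit, reasons) // false [Insufficient cpu]

	fit, reasons = generalPredicates(true, outOfCPU, passes, "node-1")
	fmt.Println(fit, reasons) // true []
}
```

With the resource check skipped, the same node that rejects a regular pod accepts the critical one, which is exactly the behavior Method 2 is after.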
As for Method 1, Kubernetes already does this for you in the Priority admission check.
    // admitPod makes sure a new pod does not set spec.Priority field. It also makes sure that the PriorityClassName exists if it is provided and resolves the pod priority from the PriorityClassName.
    func (p *priorityPlugin) admitPod(a admission.Attributes) error {
        ...
        if utilfeature.DefaultFeatureGate.Enabled(features.PodPriority) {
            var priority int32
            if len(pod.Spec.PriorityClassName) == 0 &&
                utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
                kubelettypes.IsCritical(a.GetNamespace(), pod.Annotations) {
                pod.Spec.PriorityClassName = scheduling.SystemClusterCritical
            }
            ...
        }
    }
At admission time the pod's Priority is checked. If you have:
enabled the PodPriority feature gate;
enabled the ExperimentalCriticalPodAnnotation feature gate;
added the scheduler.alpha.kubernetes.io/critical-pod annotation to the pod;
deployed the pod in the kube-system namespace;
not set a custom PriorityClass yourself;
then the Priority admission plugin automatically assigns the SystemClusterCritical (system-cluster-critical) PriorityClass to the pod.
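The defaulting behavior above can be modeled with a few lines of Go. This is a sketch of the decision only: the gates type and the resolvePriorityClassName function are hypothetical names, and the real plugin goes on to resolve the numeric priority from the class, which is omitted here.

```go
package main

import "fmt"

const (
	criticalPodAnnotation = "scheduler.alpha.kubernetes.io/critical-pod"
	systemClusterCritical = "system-cluster-critical"
)

// gates is a hypothetical stand-in for the two feature gates involved.
type gates struct {
	PodPriority                       bool
	ExperimentalCriticalPodAnnotation bool
}

// resolvePriorityClassName returns the PriorityClass name the pod ends up
// with after the defaulting step described above.
func resolvePriorityClassName(g gates, namespace string, annotations map[string]string, current string) string {
	if !g.PodPriority || !g.ExperimentalCriticalPodAnnotation {
		return current
	}
	if current != "" {
		return current // an explicitly set PriorityClass is left untouched
	}
	val, ok := annotations[criticalPodAnnotation]
	if namespace == "kube-system" && ok && val == "" {
		return systemClusterCritical
	}
	return current
}

func main() {
	g := gates{PodPriority: true, ExperimentalCriticalPodAnnotation: true}
	critical := map[string]string{criticalPodAnnotation: ""}

	// Annotated pod in kube-system without a PriorityClass: defaulted.
	fmt.Println(resolvePriorityClassName(g, "kube-system", critical, ""))
	// Same pod with a custom class (hypothetical name): the custom class wins.
	fmt.Println(resolvePriorityClassName(g, "kube-system", critical, "my-custom-class"))
}
```

The second call shows why a manually set PriorityClass matters: it disables the automatic assignment, so the pod only gets whatever priority the custom class carries.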
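Based on the analysis above, here is the recommended practice. When you deploy a key service in a Kubernetes cluster through anything other than a DaemonSet (for example a Deployment or ReplicaSet), make sure of the following so that preemption can still get the pod scheduled when cluster resources are short:
Enable the PodPriority feature gate;
Enable the ExperimentalCriticalPodAnnotation feature gate;
Add the scheduler.alpha.kubernetes.io/critical-pod annotation to the pod;
Deploy the pod in the kube-system namespace;
Do not set a custom PriorityClass manually;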