
[Cloud Native] Deploying Hadoop HA on k8s

Contents

    • I. Overview
    • II. Deployment
      • 1) Add JournalNode orchestration
        • 1. StatefulSet controller
        • 2. Service
      • 2) Modify the configuration
        • 1. Modify values.yaml
        • 2. Modify hadoop/templates/hadoop-configmap.yaml
      • 3) Install
      • 4) Test and verify
      • 5) Uninstall

I. Overview

Before Hadoop 2.0.0, a cluster had only a single NameNode, which made it a single point of failure: if the NameNode machine went down, the entire cluster was unusable until the NameNode was restarted. Planned maintenance likewise required taking the whole cluster offline, so 7x24 availability was impossible. Hadoop 2.0 and later added a NameNode high-availability (HA) mechanism; this article focuses on deploying Hadoop HA on k8s.

For a non-HA k8s deployment, see my earlier article: [Cloud Native] Hadoop on k8s Deployment
For an HA deployment outside k8s, see: Hadoop 3.3.4 HA (High Availability) Principles and Implementation (QJM)

HDFS HA architecture (diagram)
YARN HA architecture (diagram)

II. Deployment

The orchestration here is adapted from the non-HA deployment; if you are not familiar with it, read the articles linked above first.

1) Add JournalNode orchestration

1. StatefulSet controller

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-jn
  annotations:
    checksum/config: {{ include (print $.Template.BasePath "/hadoop-configmap.yaml") . | sha256sum }}
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "hadoop.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
      app.kubernetes.io/component: hdfs-jn
  serviceName: {{ include "hadoop.fullname" . }}-hdfs-jn
  replicas: {{ .Values.hdfs.journalNode.replicas }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "hadoop.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
        app.kubernetes.io/component: hdfs-jn
    spec:
      affinity:
        podAntiAffinity:
        {{- if eq .Values.antiAffinity "hard" }}
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: {{ include "hadoop.name" . }}
                app.kubernetes.io/instance: {{ .Release.Name }}
                app.kubernetes.io/component: hdfs-jn
        {{- else if eq .Values.antiAffinity "soft" }}
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 5
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: {{ include "hadoop.name" . }}
                  app.kubernetes.io/instance: {{ .Release.Name }}
                  app.kubernetes.io/component: hdfs-jn
        {{- end }}
      terminationGracePeriodSeconds: 0
      containers:
      - name: hdfs-jn
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        command:
           - "/bin/bash"
           - "/opt/apache/tmp/hadoop-config/bootstrap.sh"
           - "-d"
        resources:
{{ toYaml .Values.hdfs.journalNode.resources | indent 10 }}
        readinessProbe:
          tcpSocket:
            port: 8485
          initialDelaySeconds: 10
          timeoutSeconds: 2
        livenessProbe:
          tcpSocket:
            port: 8485
          initialDelaySeconds: 10
          timeoutSeconds: 2
        volumeMounts:
        - name: hadoop-config
          mountPath: /opt/apache/tmp/hadoop-config
        {{- range .Values.persistence.journalNode.volumes }}
        - name: {{ .name }}
          mountPath: {{ .mountPath }}
        {{- end }}
        securityContext:
          runAsUser: {{ .Values.securityContext.runAsUser }}
          privileged: {{ .Values.securityContext.privileged }}
      volumes:
      - name: hadoop-config
        configMap:
          name: {{ include "hadoop.fullname" . }}
{{- if .Values.persistence.journalNode.enabled }}
  volumeClaimTemplates:
  {{- range .Values.persistence.journalNode.volumes }}
  - metadata:
      name: {{ .name }}
      labels:
        app.kubernetes.io/name: {{ include "hadoop.name" $ }}
        helm.sh/chart: {{ include "hadoop.chart" $ }}
        app.kubernetes.io/instance: {{ $.Release.Name }}
        app.kubernetes.io/component: hdfs-jn
    spec:
      accessModes:
      - {{ $.Values.persistence.journalNode.accessMode | quote }}
      resources:
        requests:
          storage: {{ $.Values.persistence.journalNode.size | quote }}
    {{- if $.Values.persistence.journalNode.storageClass }}
    {{- if (eq "-" $.Values.persistence.journalNode.storageClass) }}
      storageClassName: ""
    {{- else }}
      storageClassName: "{{ $.Values.persistence.journalNode.storageClass }}"
    {{- end }}
    {{- end }}
      {{- else }}
      {{- range .Values.persistence.journalNode.volumes }}
      - name: {{ .name }}
        emptyDir: {}
      {{- end }}
{{- end }}
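
You can render this template on its own to sanity-check the output before installing; a quick sketch, assuming you saved it as templates/hdfs-jn-statefulset.yaml (substitute whatever file name you chose):

# Render only the JournalNode StatefulSet and validate it client-side
helm template hadoop-ha ./hadoop -s templates/hdfs-jn-statefulset.yaml \
  | kubectl apply --dry-run=client -f -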

2. Service

# A headless service to create DNS records
apiVersion: v1
kind: Service
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-jn
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
spec:
  ports:
  - name: jn
    port: {{ .Values.service.journalNode.ports.jn }}
    protocol: TCP
    {{- if and (eq .Values.service.journalNode.type "NodePort") .Values.service.journalNode.nodePorts.jn }}
    nodePort: {{ .Values.service.journalNode.nodePorts.jn }}
    {{- end }}
  type: {{ .Values.service.journalNode.type }}
  selector:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
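
The StatefulSet's serviceName points at this service, so each JournalNode pod gets a stable per-pod DNS name that the NameNodes can address individually. Note that k8s only creates per-pod DNS records for a headless service (clusterIP: None), which the template above does not set explicitly, so verify name resolution in your cluster. Assuming the release is installed as hadoop-ha with chart name hadoop in namespace hadoop-ha (so the fullname resolves to hadoop-ha-hadoop, matching the pod names later in this article), the JournalNode addresses look like:

hadoop-ha-hadoop-hdfs-jn-0.hadoop-ha-hadoop-hdfs-jn.hadoop-ha.svc.cluster.local:8485
hadoop-ha-hadoop-hdfs-jn-1.hadoop-ha-hadoop-hdfs-jn.hadoop-ha.svc.cluster.local:8485
hadoop-ha-hadoop-hdfs-jn-2.hadoop-ha-hadoop-hdfs-jn.hadoop-ha.svc.cluster.local:8485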

2) Modify the configuration

1. Modify values.yaml

image:
  repository: myharbor.com/bigdata/hadoop
  tag: 3.3.2
  pullPolicy: IfNotPresent

# The version of the hadoop libraries being used in the image.
hadoopVersion: 3.3.2
logLevel: INFO

# Select antiAffinity as either hard or soft, default is soft
antiAffinity: "soft"

hdfs:
  nameNode:
    replicas: 2
    pdbMinAvailable: 1

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

  dataNode:
    # Will be used as dfs.datanode.hostname
    # You still need to set up services + ingress for every DN
    # Datanodes will expect to
    externalHostname: example.com
    externalDataPortRangeStart: 9866
    externalHTTPPortRangeStart: 9864

    replicas: 3

    pdbMinAvailable: 1

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

  webhdfs:
    enabled: true

  journalNode:
    replicas: 3
    pdbMinAvailable: 1

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

yarn:
  resourceManager:
    pdbMinAvailable: 1
    replicas: 2

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "2000m"

  nodeManager:
    pdbMinAvailable: 1

    # The number of YARN NodeManager instances.
    replicas: 1

    # Create statefulsets in parallel (K8S 1.7+)
    parallelCreate: false

    # CPU and memory resources allocated to each node manager pod.
    # This should be tuned to fit your workload.
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

persistence:
  nameNode:
    enabled: true
    storageClass: "hadoop-ha-nn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-nn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/nn/data/data1"
    - name: hadoop-ha-nn-1
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/nn/data/data1"

  dataNode:
    enabled: true
    enabledStorageClass: false
    storageClass: "hadoop-ha-dn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-dn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-1
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-2
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    - name: hadoop-ha-dn-3
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-4
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-5
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    - name: hadoop-ha-dn-6
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-7
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-8
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    volumes:
    - name: dfs1
      mountPath: /opt/apache/hdfs/datanode1
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data1
    - name: dfs2
      mountPath: /opt/apache/hdfs/datanode2
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data2
    - name: dfs3
      mountPath: /opt/apache/hdfs/datanode3
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data3

  journalNode:
    enabled: true
    storageClass: "hadoop-ha-jn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-jn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    - name: hadoop-ha-jn-1
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    - name: hadoop-ha-jn-2
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    volumes:
    - name: jn
      mountPath: /opt/apache/hdfs/journalnode

service:
  nameNode:
    type: NodePort
    ports:
      dfs: 9000
      webhdfs: 9870
    nodePorts:
      dfs: 30900
      webhdfs: 30870
  dataNode:
    type: NodePort
    ports:
      webhdfs: 9864
    nodePorts:
      webhdfs: 30864
  resourceManager:
    type: NodePort
    ports:
      web: 8088
    nodePorts:
      web: 30088
  journalNode:
    type: ClusterIP
    ports:
      jn: 8485
    nodePorts:
      jn: ""


securityContext:
  runAsUser: 9999
  privileged: true
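
The persistence.*.local lists above describe where each replica's data lives on the hosts. If the chart in the git repo does not generate the local PersistentVolumes for you, one PV per entry is required; a minimal sketch for the first JournalNode entry (storage class, host, and path taken from the values above):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: hadoop-ha-jn-0
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: hadoop-ha-jn-local-storage
  local:
    path: /opt/bigdata/servers/hadoop-ha/jn/data/data1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - local-168-182-110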

2. Modify hadoop/templates/hadoop-configmap.yaml

The changes here are extensive, so they are not reproduced in full; the git download link at the end of this article has the complete files.
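
The heart of those changes is the HA configuration in hdfs-site.xml (plus fs.defaultFS and the ZooKeeper quorum in core-site.xml for automatic failover). As a rough sketch of what the ConfigMap amounts to — the nameservice ID myns is a placeholder, and the no-op fencing method is a common choice in containerized setups, not necessarily what the repo uses:

<!-- hdfs-site.xml: core HA settings (illustrative) -->
<property>
  <name>dfs.nameservices</name>
  <value>myns</value>
</property>
<property>
  <name>dfs.ha.namenodes.myns</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop-ha-hadoop-hdfs-jn-0.hadoop-ha-hadoop-hdfs-jn:8485;hadoop-ha-hadoop-hdfs-jn-1.hadoop-ha-hadoop-hdfs-jn:8485;hadoop-ha-hadoop-hdfs-jn-2.hadoop-ha-hadoop-hdfs-jn:8485/myns</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/apache/hdfs/journalnode</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.myns</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

Note that dfs.journalnode.edits.dir matches the JournalNode mountPath configured under persistence.journalNode.volumes above.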

3) Install

# Create the storage directories (on every node that hosts a local PV)
mkdir -p /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}
chmod -R 777 /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}

helm install hadoop-ha ./hadoop -n hadoop-ha --create-namespace

Check the result:

kubectl get pods,svc -n hadoop-ha -owide
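
You can also wait for the individual components to finish rolling out; the StatefulSet names follow the <release>-hadoop-<component> pattern visible in the pod names later in this section:

kubectl -n hadoop-ha rollout status statefulset/hadoop-ha-hadoop-hdfs-jn
kubectl -n hadoop-ha rollout status statefulset/hadoop-ha-hadoop-hdfs-nn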


HDFS WEB (nn1): http://192.168.182.110:31870/dfshealth.html#tab-overview
HDFS WEB (nn2): http://192.168.182.110:31871/dfshealth.html#tab-overview
YARN WEB (rm1): http://192.168.182.110:31088/cluster/cluster
YARN WEB (rm2): http://192.168.182.110:31089/cluster/cluster

4) Test and verify

kubectl exec -it hadoop-ha-hadoop-hdfs-nn-0 -n hadoop-ha -- bash

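Inside the pod, check which NameNode is active and run a basic smoke test. A sketch, assuming the ConfigMap registers the NameNodes as nn1/nn2 and the ResourceManagers as rm1/rm2:

# Which NameNode is active, which is standby?
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Basic read/write smoke test
hdfs dfs -mkdir -p /tmp/ha-test
hdfs dfs -put /etc/hosts /tmp/ha-test/
hdfs dfs -ls /tmp/ha-test

# ResourceManager HA state
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

To exercise failover, delete the active NameNode pod (for example kubectl delete pod hadoop-ha-hadoop-hdfs-nn-0 -n hadoop-ha) and confirm the standby transitions to active.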

5) Uninstall

helm uninstall hadoop-ha -n hadoop-ha

# Force-delete any pods that remain
kubectl delete pod -n hadoop-ha `kubectl get pod -n hadoop-ha|awk 'NR>1{print $1}'` --force
# Clear finalizers if the namespace is stuck in Terminating, then delete it
kubectl patch ns hadoop-ha -p '{"metadata":{"finalizers":null}}'
kubectl delete ns hadoop-ha --force

# Remove the local data directories
rm -fr /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}/*
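
Since the data directories exist on each of the three hosts, the cleanup has to run on all of them; a sketch assuming passwordless SSH (hostnames taken from values.yaml):

for host in local-168-182-110 local-168-182-111 local-168-182-112; do
  ssh $host 'rm -fr /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}/*'
done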

Git repository: https://gitee.com/hadoop-bigdata/hadoop-ha-on-k8s

That wraps up deploying Hadoop HA on k8s. The description here is fairly brief, so feel free to leave me a comment if anything is unclear. Some parts are still rough around the edges; I will keep improving this setup, add other services on top of it, and continue to share articles on big data and cloud native topics, so stay tuned.
