helm repo add pfisterer-hadoop https://pfisterer.github.io/apache-hadoop-helm/
helm install hadoop pfisterer-hadoop/hadoop
Wait a few minutes.
NAME: hadoop
LAST DEPLOYED: Thu Apr 4 14:45:34 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. You can check the status of HDFS by running this command:
   kubectl exec -n default -it hadoop-hadoop-hdfs-nn-0 -- /opt/hadoop/bin/hdfs dfsadmin -report

2. You can list the yarn nodes by running this command:
   kubectl exec -n default -it hadoop-hadoop-yarn-rm-0 -- /opt/hadoop/bin/yarn node -list

3. Create a port-forward to the yarn resource manager UI:
   kubectl port-forward -n default hadoop-hadoop-yarn-rm-0 8088:8088

   Then open the ui in your browser:

   open http://localhost:8088

4. You can run included hadoop tests like this:
   kubectl exec -n default -it hadoop-hadoop-yarn-nm-0 -- /opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.2-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 128MB -resFile /tmp/TestDFSIOwrite.txt

5. You can list the mapreduce jobs like this:
   kubectl exec -n default -it hadoop-hadoop-yarn-rm-0 -- /opt/hadoop/bin/mapred job -list

6. This chart can also be used with the zeppelin chart
   helm install --namespace default --set hadoop.useConfigMap=true,hadoop.configMapName=hadoop-hadoop stable/zeppelin

7. You can scale the number of yarn nodes like this:
   helm upgrade hadoop --set yarn.nodeManager.replicas=4 stable/hadoop

   Make sure to update the values.yaml if you want to make this permanent.
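Rather than waiting a fixed amount of time, one option is to poll the HDFS status command from the notes above until it succeeds. The following is a minimal sketch, not part of the chart: `wait_until` is a hypothetical helper name, and its attempt count and delay arguments are assumptions to tune for your cluster.

```shell
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper (not provided by Helm or the chart): re-runs CMD
# until it exits 0 or ATTEMPTS tries have been used, sleeping DELAY
# seconds after each failed try. Output of CMD is discarded.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0   # command succeeded
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1       # gave up after ATTEMPTS failures
}
```

For example, `wait_until 30 10 kubectl exec -n default hadoop-hadoop-hdfs-nn-0 -- /opt/hadoop/bin/hdfs dfsadmin -report` would retry the HDFS report for up to roughly five minutes; the values 30 and 10 are arbitrary. The `-it` flags are dropped here because the command runs non-interactively.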
hadoop 1.2.0 · pfisterer/apache-hadoop-helm
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
https://artifacthub.io/packages/helm/apache-hadoop-helm/hadoop

Seonglae Cho