DevOps Troubleshooting Process
Issue #1: Vagrant fails to reload when Docker is installed on CentOS
The following SSH command responded with a non-zero exit status. Vagrant assumes that this means the command failed!

chmod 0644 /etc/systemd/system/docker.service.d/http-proxy.conf

Stdout from the command:

Stderr from the command:

chmod: cannot access ‘/etc/systemd/system/docker.service.d/http-proxy.conf’: No such file or directory
Vagrant is actually starting the box here, but it cannot find a file called http-proxy.conf. My suggestion for this issue: create the file and grant it the permissions shown below.
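A minimal sketch of those steps (this assumes the file only needs to exist with 0644 permissions; if your network requires a proxy, put the proxy environment settings inside it):

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo touch /etc/systemd/system/docker.service.d/http-proxy.conf
sudo chmod 0644 /etc/systemd/system/docker.service.d/http-proxy.conf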
Now restart the Vagrant box. This is usually a blocker when you start a couple of Vagrant boxes with a single vagrant up command, because provisioning stops right after the first instance is created. You need to apply these changes to each node, one after another, as it starts.
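For reference, the restart itself is just the following (node1 below is a placeholder for your box name in a multi-machine Vagrantfile):

vagrant reload
# or, for a single machine in a multi-machine setup:
vagrant reload node1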
Issue #2: Docker daemon not running
[vagrant@mstr ~]$ docker info
Client:
 Debug Mode: false
 Plugins:
  cluster: Manage Docker clusters (Docker Inc., v1.2.0)

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Workaround
Start the Docker daemon:
sudo systemctl start docker
sudo systemctl status docker
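To keep the daemon from staying down after a reboot, you can also enable it at boot:

sudo systemctl enable docker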
Issue #3: snap unable to install Helm
error: cannot communicate with server: Post http://localhost/v2/snaps/helm: dial unix /run/snapd.socket: connect: no such file or directory

Fix:
Check whether the snapd daemon is running:
[root@mstr ~]# systemctl status snapd.service
● snapd.service - Snappy daemon
   Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

If it shows inactive (dead), start it and check again:
[root@mstr ~]# systemctl start snapd.service
[root@mstr ~]# systemctl status snapd.service
● snapd.service - Snappy daemon
   Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-11-17 05:27:28 UTC; 7s ago
 Main PID: 23376 (snapd)
    Tasks: 10
   Memory: 15.2M
   CGroup: /system.slice/snapd.service
           └─23376 /usr/libexec/snapd/snapd
Nov 17 05:27:27 mstr.devopshunter.com systemd[1]: Starting Snappy daemon...
Nov 17 05:27:27 mstr.devopshunter.com snapd[23376]: AppArmor status: apparmor not enabled
Nov 17 05:27:27 mstr.devopshunter.com snapd[23376]: daemon.go:346: started snapd/2.42.1-1.el7 (...6.
Nov 17 05:27:28 mstr.devopshunter.com snapd[23376]: daemon.go:439: adjusting startup timeout by...p)
Nov 17 05:27:28 mstr.devopshunter.com snapd[23376]: helpers.go:104: error trying to compare the...sk
Nov 17 05:27:28 mstr.devopshunter.com systemd[1]: Started Snappy daemon.

Now go on with the Helm installation:
[root@mstr ~]# snap install helm --classic
2019-11-17T05:30:10Z INFO Waiting for restart...
Download snap "core18" (1265) from channel "stable"   88% 139kB/s 50.3s
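Note from the status output above that the unit is still disabled, so it will not come back after a reboot. Enabling it avoids a repeat of this issue, and helm version is a quick post-install sanity check:

systemctl enable snapd.service
helm version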
Issue #4: Unable to list Kubernetes nodes
$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Solution:
systemctl enable kubelet
systemctl start kubelet

Then add the following two lines to /etc/sysctl.d/k8s.conf (vi /etc/sysctl.d/k8s.conf):

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

Reload the settings:

sysctl --system
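If the error persists after this, kubectl is usually just missing its kubeconfig. On a kubeadm-built cluster the standard post-init steps (printed by kubeadm init itself) are:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config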
Issue #5: Unable to proceed with kubeadm init
[root@mstr ~]# kubeadm init --pod-network-cidr=192.148.0.0/16 --apiserver-advertise-address=192.168.33.100
[init] Using Kubernetes version: v1.16.3
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.4. Latest validated version: 18.09
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:35272->[::1]:53: read: connection refused, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:40675->[::1]:53: read: connection refused, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:48699->[::1]:53: read: connection refused, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:48500->[::1]:53: read: connection refused, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:46017->[::1]:53: read: connection refused, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.3.15-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:52592->[::1]:53: read: connection refused, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.6.2: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:53803->[::1]:53: read: connection refused, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@mstr ~]#
Solution:
You need to initialize the Kubernetes master on this node. Note that the ImagePull errors above are really a DNS problem: the lookups for k8s.gcr.io are going to [::1]:53 and being refused, so make sure the node has a working nameserver before retrying. The remaining non-fatal preflight complaints can then be skipped explicitly:
kubeadm init --pod-network-cidr=192.148.0.0/16 --apiserver-advertise-address=192.168.33.100 --ignore-preflight-errors=Hostname,SystemVerification,NumCPU
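A quick sanity check before re-running init (8.8.8.8 below is only an example nameserver; use the one your network provides):

cat /etc/resolv.conf
# if it is empty or points at ::1, add a reachable nameserver, for example:
echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf
# then confirm the images can be pulled before the full init:
kubeadm config images pull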
Issue #6: Kubernetes unable to connect to the server
[root@mstr tmp]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on 10.0.2.3:53: server misbehaving
[root@mstr tmp]#
Workaround: I hit this error when I tried to run the above kubectl command on my office network; once I was at home it ran perfectly. So check your company VPN/proxy settings before running that kubectl command.
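A quick way to see whether proxy variables are set in the shell where kubectl runs (a minimal check; the proxy may also be configured at the VPN client or Docker daemon level):

env | grep -i proxy
# and test raw connectivity to the host kubectl is trying to reach:
curl -I https://raw.githubusercontent.com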
Issue #7: Docker networking: Error response from daemon
[vagrant@mydev ~]$ docker network create -d overlay \
>     --subnet=192.168.0.0/16 \
>     --subnet=192.170.0.0/16 \
>     --gateway=192.168.0.100 \
>     --gateway=192.170.0.100 \
>     --ip-range=192.168.1.0/24 \
>     --aux-address="my-router=192.168.1.5" --aux-address="my-switch=192.168.1.6" \
>     --aux-address="my-printer=192.170.1.5" --aux-address="my-nas=192.170.1.6" \
>     my-multihost-network
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
Basic analysis: check the 'Swarm' line in the output of the docker info command.
docker info
From the error line you can see that the problem is Swarm being in the inactive state on this node; it needs to be turned active.

Workaround:
docker swarm init --advertise-addr 192.168.33.200
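To confirm the node is now a swarm manager before retrying the network create (a quick check using docker's built-in formatting):

docker info --format '{{.Swarm.LocalNodeState}}'
# should print: active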
Issue #8: Kubernetes join command timeout on AWS EC2 instances
Three EC2 instances were created to provision the Kubernetes cluster. The master came up and reached the Ready state, but when we ran the join command on the other nodes, it timed out with the following error:

root@ip-172-31-xx-xx:~# kubeadm join 172.31.xx.204:6443 --token ld3ea8.jghaj4lpkwyk6b38 --discovery-token-ca-cert-hash sha256:f240647cdeacc429a3a30f6b83a3e9f54f603fbdf87fb24e4ee734d5368a21cf
W0426 14:58:03.699933   17866 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: couldn't validate the identity of the API Server: Get https://172.31.35.204:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s: dial tcp 172.31.35.204:6443: i/o timeout
To see the stack trace of this error execute with --v=5 or higher
The solution for this issue is to check the inbound rules of the AWS Security Group. The worker nodes reach the Kubernetes API server over HTTPS on TCP port 6443, so that port must be reachable from them; in a test setup it can be open to all inbound connections (0.0.0.0/0). Master-worker communication needs a few other TCP ports open as well (such as the kubelet port 10250), so open those too.
Security Group settings in AWS for Kubernetes
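For reference, opening the API server port with the AWS CLI would look like the following (the security group ID below is a placeholder; substitute your own):

# sg-0123456789abcdef0 is a hypothetical security group ID
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 6443 --cidr 0.0.0.0/0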
Issue #9: VirtualBox error VERR_VMX_NO_VMX, code E_FAIL (0x80004005)
Stop the Hyper-V service, which runs by default on Windows 8/10, since it blocks all other hypervisors' access to the VT-x hardware.
Additional explanation here: https://social.technet.microsoft.com/Forums/windows/en-US/118561b9-7155-46e3-a874-6a38b35c67fd/hyperv-disables-vtx-for-other-hypervisors?forum=w8itprogeneral
Also, if not already enabled, turn on Intel VT-x virtualization in the BIOS settings and restart the machine.
To turn the hypervisor off, run this from Command Prompt (Admin) (Windows+X):
bcdedit /set hypervisorlaunchtype off
and reboot your computer. To turn it back on again, run:
bcdedit /set hypervisorlaunchtype on
If you receive "The integer data is not valid as specified", try:
bcdedit /set hypervisorlaunchtype auto
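After the reboot you can confirm the current setting (the hypervisorlaunchtype line only appears once the value has been set):

bcdedit /enum {current} | findstr hypervisorlaunchtype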
This is the solution that worked.
Issue #10: Jenkins build failure (wget not found)
Started by user BhavaniShekhar
Running as SYSTEM
Building remotely on node2 in workspace /tmp/remote/workspace/Test-build-remotely
[Test-build-remotely] $ /bin/sh -xe /tmp/jenkins681586635812408746.sh
+ echo 'Executed from BUILD REMOTELY Option'
Executed from BUILD REMOTELY Option
+ echo 'Download JDK 17'
Download JDK 17
+ cd /opt
+ wget https://download.oracle.com/java/17/latest/jdk-17_linux-x64_bin.tar.gz
/tmp/jenkins681586635812408746.sh: line 5: wget: command not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
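The build fails simply because wget is not installed on the node2 agent. Either install it there (assuming a yum-based distro such as CentOS), or switch the build step to curl, which is more commonly present:

# on the agent:
sudo yum install -y wget
# or, in the Jenkins shell step, use curl instead of wget:
curl -L -o /opt/jdk-17_linux-x64_bin.tar.gz https://download.oracle.com/java/17/latest/jdk-17_linux-x64_bin.tar.gz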
Issue #11: Unable to send mail from the Linux system
Solution: first investigate whether mail can be sent from the command line at all. Use the following command:
echo "Test Mail" | mailx -s "test" "Pavan@gmail.com"Replace the mail id with your company mailid and run that command. Hello guys if you need any support on Docker and DevOps do let us know in comments!