Showing posts with label ucp.

Sunday, November 17, 2019

DevOps Troubleshooting Tricks & Tips

In this post I collect the challenges I run into in my daily DevOps work, along with possible workarounds, fixes, and links. I also invite you to share your own experiences with DevOps operations in the comments.

DevOps Troubleshooting process


Issue #1: Vagrant fails to reload when Docker is installed on CentOS


The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

chmod 0644 /etc/systemd/system/docker.service.d/http-proxy.conf

Stdout from the command:



Stderr from the command:

chmod: cannot access ‘/etc/systemd/system/docker.service.d/http-proxy.conf’: No such file or directory



Vagrant does start the box, but the provisioner cannot find the http-proxy.conf file. My suggestion for this issue: create the file and grant the permission as given:
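A minimal sketch of that workaround, run as root (or prefix each command with sudo). The proxy URL is a placeholder; substitute your own, or leave the file empty if you do not need a proxy:

```shell
# Create the drop-in directory and the missing http-proxy.conf,
# then set the 0644 mode that the Vagrant provisioner chmods
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
EOF
chmod 0644 /etc/systemd/system/docker.service.d/http-proxy.conf
```

Afterwards run systemctl daemon-reload so systemd picks up the drop-in.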

Now restart the Vagrant box. This issue is usually a blocker when you start several Vagrant boxes with a single vagrant up command, because provisioning stops right after the first instance is created. You need to apply this change to each node, one after the other, as they come up.


Issue #2: Docker daemon not running


[vagrant@mstr ~]$ docker info
Client:
 Debug Mode: false
 Plugins:
  cluster: Manage Docker clusters (Docker Inc., v1.2.0)

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info


Workaround

Start the Docker daemon:
sudo systemctl start docker
sudo systemctl status docker

Fix

Enable the service so it starts automatically on boot:
sudo systemctl enable docker
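As a quick diagnostic for this issue: the docker CLI talks to the daemon over a unix socket, so you can check for the socket's presence before reaching for systemctl. A minimal sketch:

```shell
# /var/run/docker.sock is the socket the docker CLI connects to by default;
# if it is missing, the daemon is not running
if [ -S /var/run/docker.sock ]; then
    echo "daemon socket present"
else
    echo "daemon socket missing - start docker via systemctl"
fi
```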

References:

  1. Control docker with systemd
  2. Post steps for Docker installation

Issue #3: Snap package unable to install helm



error: cannot communicate with server: Post http://localhost/v2/snaps/helm: dial unix /run/snapd.socket: connect: no such file or directory

Fix:
Check whether the snapd daemon is running:
[root@mstr ~]# systemctl status snapd.service
● snapd.service - Snappy daemon
   Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

If it is not running and reports inactive (dead), bring it to life by starting it, then check again:
[root@mstr ~]# systemctl start snapd.service
[root@mstr ~]# systemctl status snapd.service
● snapd.service - Snappy daemon
   Loaded: loaded (/usr/lib/systemd/system/snapd.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-11-17 05:27:28 UTC; 7s ago
 Main PID: 23376 (snapd)
    Tasks: 10
   Memory: 15.2M
   CGroup: /system.slice/snapd.service
           └─23376 /usr/libexec/snapd/snapd

Nov 17 05:27:27 mstr.devopshunter.com systemd[1]: Starting Snappy daemon...
Nov 17 05:27:27 mstr.devopshunter.com snapd[23376]: AppArmor status: apparmor not enabled
Nov 17 05:27:27 mstr.devopshunter.com snapd[23376]: daemon.go:346: started snapd/2.42.1-1.el7 (...6.
Nov 17 05:27:28 mstr.devopshunter.com snapd[23376]: daemon.go:439: adjusting startup timeout by...p)
Nov 17 05:27:28 mstr.devopshunter.com snapd[23376]: helpers.go:104: error trying to compare the...sk
Nov 17 05:27:28 mstr.devopshunter.com systemd[1]: Started Snappy daemon.

Now go ahead with the helm install:
[root@mstr ~]# snap install helm --classic
2019-11-17T05:30:10Z INFO Waiting for restart...
Download snap "core18" (1265) from channel "stable"                               88%  139kB/s 50.3s

Issue #4: kubectl unable to list K8s nodes


$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Solution:
systemctl enable kubelet
systemctl start kubelet

vi /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

sysctl --system
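The same edit can be scripted non-interactively instead of opening vi; a sketch of writing the file in one step:

```shell
# Write the bridge netfilter settings without opening an editor
mkdir -p /etc/sysctl.d
cat > /etc/sysctl.d/k8s.conf <<'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
```

Then apply it with sysctl --system as above (it may warn if the br_netfilter module is not loaded yet).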


Issue #5: kubeadm init unable to proceed

[root@mstr ~]# kubeadm init --pod-network-cidr=192.148.0.0/16 --apiserver-advertise-address=192.168.33.100
[init] Using Kubernetes version: v1.16.3
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.4. Latest validated version: 18.09
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:35272->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:40675->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:48699->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.16.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:48500->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:46017->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.3.15-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:52592->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.6.2: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:53803->[::1]:53: read: connection refused
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@mstr ~]#

Solution:
The ImagePull errors show that the node cannot resolve k8s.gcr.io (the DNS lookups against [::1]:53 are refused), so fix the node's DNS first: check that /etc/resolv.conf points at a working nameserver. Once images can be pulled, initialize the Kubernetes master, skipping the remaining non-fatal preflight checks:
kubeadm init --pod-network-cidr=192.148.0.0/16 --apiserver-advertise-address=192.168.33.100 --ignore-preflight-errors=Hostname,SystemVerification,NumCPU
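Since the ImagePull failures come from failing DNS lookups of k8s.gcr.io, it is worth confirming the node can actually resolve the registry before re-running init; a quick check:

```shell
# Show the resolver config and try resolving the image registry
cat /etc/resolv.conf
getent hosts k8s.gcr.io || echo "DNS lookup failed - point /etc/resolv.conf at a working nameserver"
```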

Issue #6: K8s unable to connect to the server

[root@mstr tmp]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on 10.0.2.3:53: server misbehaving
[root@mstr tmp]#

Workaround: When I tried to run the above kubectl command on the office network I got this error; from home it ran perfectly. So check your company VPN/proxy settings before running that kubectl command.
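To see whether a corporate proxy is configured in your current shell (kubectl, like most Go tools, honors the HTTPS_PROXY/NO_PROXY environment variables), a quick check:

```shell
# List any proxy-related environment variables; a fallback message means none are set
env | grep -i proxy || echo "no proxy variables set in this shell"
```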

Issue #7: Docker Networking : Error response from daemon

[vagrant@mydev ~]$ docker network create -d overlay \
>                 --subnet=192.168.0.0/16 \
>                 --subnet=192.170.0.0/16 \
>                 --gateway=192.168.0.100 \
>                 --gateway=192.170.0.100 \
>                 --ip-range=192.168.1.0/24 \
>                 --aux-address="my-router=192.168.1.5" --aux-address="my-switch=192.168.1.6" \
>                 --aux-address="my-printer=192.170.1.5" --aux-address="my-nas=192.170.1.6" \
>                 my-multihost-network
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.

Basic analysis: check the 'Swarm' line in the docker info command output.
docker info

The error line tells you that Swarm is in the inactive state on this node. Workaround to turn it active:
docker swarm init --advertise-addr 192.168.33.200

Issue #8: Kubernetes join command timeout on AWS ec2 instance

Three EC2 instances were created to provision the Kubernetes cluster on them. The master came up and reached the Ready state, but when we ran the join command on the other nodes, it timed out with the following error:
root@ip-172-31-xx-xx:~# kubeadm join 172.31.xx.204:6443 --token ld3ea8.jghaj4lpkwyk6b38     --discovery-token-ca-cert-hash            sha256:f240647cdeacc429a3a30f6b83a3e9f54f603fbdf87fb24e4ee734d5368a21cf
W0426 14:58:03.699933   17866 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when            control-plane flag is not set.
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd           ". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: couldn't validate the identity of the API Server: Get https://172.31.35.204:6443/api/v1/na           mespaces/kube-public/configmaps/cluster-info?timeout=10s: dial tcp 172.31.35.204:6443: i/o timeout
To see the stack trace of this error execute with --v=5 or higher

The solution for such an issue is to check the inbound rules of the AWS Security Group. The worker joins the cluster through the Kubernetes API server on TCP port 6443, so that port must be reachable from the worker nodes (in this setup it was opened to all inbound connections, 0.0.0.0/0). Master-worker communication needs other TCP ports as well, so open those too.
Security Group settings in AWS for Kubernetes

Issue #9: VirtualBox error (VERR_VMX_NO_VMX), code E_FAIL (0x80004005), in both gui and headless modes

Stop the Hyper-V service that runs by default on Windows 8/10, since it blocks all other hypervisors from accessing the VT hardware.

Additional explanation here: https://social.technet.microsoft.com/Forums/windows/en-US/118561b9-7155-46e3-a874-6a38b35c67fd/hyperv-disables-vtx-for-other-hypervisors?forum=w8itprogeneral

Also as you have mentioned, if not already enabled, turn on Intel VT virtualization in BIOS settings and restart the machine.


To turn the hypervisor off, run this from Command Prompt (Admin) (Windows+X):

bcdedit /set hypervisorlaunchtype off

and reboot your computer. To turn it back on again, run:

bcdedit /set hypervisorlaunchtype on

If you receive "The integer data is not valid as specified", try:

bcdedit /set hypervisorlaunchtype auto


This is the solution that worked for me.

Need help or support with your project issues?

Jenkins Build Failure

Problem in Console Output

Started by user BhavaniShekhar
Running as SYSTEM
Building remotely on node2 in workspace /tmp/remote/workspace/Test-build-remotely
[Test-build-remotely] $ /bin/sh -xe /tmp/jenkins681586635812408746.sh
+ echo 'Executed from BUILD REMOTELY Option'
Executed from BUILD REMOTELY Option
+ echo 'Download JDK 17'
Download JDK 17
+ cd /opt
+ wget https://download.oracle.com/java/17/latest/jdk-17_linux-x64_bin.tar.gz
/tmp/jenkins681586635812408746.sh: line 5: wget: command not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE

Solution: To fix this, install wget on node2 (for example, sudo yum install -y wget), or use curl -LO <url> as an alternative in the build step.

Issue with sending mail on the Linux System

Solution: first investigate whether mail can be sent from the command line at all. Use the following command:

 echo "Test Mail" | mailx -s "test" "Pavan@gmail.com"
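To script this check, you can test the mailx exit status; a sketch (the address below is a placeholder, use your own):

```shell
# Send a test message and report whether mailx accepted it;
# a failure usually means no local MTA is configured
if echo "Test Mail" | mailx -s "test" "user@example.com"; then
    echo "mailx accepted the message"
else
    echo "mailx failed - check that an MTA (postfix/sendmail) is installed and running"
fi
```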
 
Replace the mail id with your company mail id and run that command. If you need any support on Docker and DevOps, do let us know in the comments!

Wednesday, November 6, 2019

User Management on Universal Control Plane (UCP)

This is a quick tutorial on user management in Docker UCP. Docker UCP provides multi-user management and role-based access control, which allows us to create and manage users and teams in an organization. Let's take a look at user management in detail in this post.

First we create an organization, then we associate a couple of teams with it, and after that we add users to those teams.

Login to your UCP management console.

Create an Organization on UCP


Click on 'User Management' in the left side pane.

User Management on UCP

Now in the right pane work area, you can click on the 'Create Organization' top right button.

Enter your organization name as a single word without any spaces. Even if you enter the name in capitals, it will be converted to lower case before it is stored.

Create Organization on UCP
To complete it, click on the 'Create' button.
Once the organization is created, it will be listed in the work area. Click on the newly created organization; it will give us the option to create teams.

Create a Team on UCP


Let's prepare a list of teams commonly required in any organization, then create them. The list contains the following teams:

  • dev - Development team
  • QA - Quality Assurance team
  • prod - Production team
Create Team on UCP

Create User

There will be an 'admin' user already created by UCP. We can create new users with or without the 'admin' role. We will create one user with 'admin' access and another without it.

Let's explore this user creation process now.

Create User on docker UCP


In the same way we can create another user that has the 'Docker EE Admin' role.
After creating the users, the summary looks as follows:
Users created on UCP summary

Add Users to Team

Go to the organization that you have already created and select it. Choose the team to which you want to add the user. Here I am adding a user to the 'qa' team.

Add user to organization/team in UCP

I hope you enjoyed this post about user management on UCP for Docker EE.


Next, let us explore the Docker Trusted Registry (DTR).

Tuesday, November 5, 2019

Docker Enterprise Edition installation on CentOS 7 plus UCP Installation

Hello, dear DevOps enthusiast! In this post I would like to discuss how to install Docker Enterprise Edition on CentOS 7 and get Universal Control Plane (UCP) up to control the master and workers on three nodes (VirtualBox VMs). I was amazed by the great features incorporated into UCP: you can do a lot of things from your browser itself. In the last post I explored the Swarm cluster and executed everything on the CLI, but this time we use the UCP Web UI.

Why do we need Docker Universal Control Plane (UCP)?

To make the setup more production-ready, we will do this experiment with three CentOS 7 nodes. The following picture shows how powerful UCP in Docker Enterprise Edition is. You can manage services and multiple deployments using stacks, and view and manage Docker containers and their images. You can also add/remove nodes and see their status and category, get full control over Docker networking, and manage storage volumes from the UCP admin console.

  • Ease of use with GUI-based management
  • High Availability (HA) made simple
  • Access Control - organizations, teams, and users are manageable
  • Monitoring - the overall system can be viewed on a single page
  • Docker-native integration - network capabilities are handled
  • Swarm managed - Swarm master and worker nodes are configured
  • 3rd-party plugins - DTR connects as a plugin



Universal Control Plane running on Docker-ee with Swarm cluster


Prerequisites for Docker EE installation

Infrastructure design is a crucial part of any environment you build, whether on the Cloud or in an on-premises Docker ecosystem. First, let's consider what goes into the master node.

  • Docker EE installation (docker-ee) requires a hub.docker.com signup and downloading the license
  • Ports 80 and 443 must be exposed for the UCP containers to run.
  • Docker Trusted Registry (DTR) can only run on a node other than the one running UCP, because it requires the same reserved ports 80 and 443
  • Download Vagrant as per your system
  • Download VirtualBox
Most importantly, think about this: what you run on a machine defines how many resources it requires.
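Since UCP and DTR each claim ports 80 and 443, it helps to confirm those ports are free on a node before installing; a rough check, assuming ss or netstat is available:

```shell
# Report whether anything is already listening on port 80 or 443
(ss -lnt 2>/dev/null || netstat -lnt 2>/dev/null) | grep -E '[:.](80|443)[[:space:]]' \
  && echo "ports 80/443 already in use - pick another node" \
  || echo "ports 80 and 443 are free"
```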

How to install Docker-EE on CentOS 7?

It is a very interesting story, Docker EE installation on CentOS 7 Vagrant boxes:
1. Create three CentOS 7 machines: mstr for the master, node1 and node2 for the workers.
2. Go to hub.docker.com and log in with your credentials.
The Vagrantfile content is as follows:
 
Vagrant.configure(2) do |config|
  config.vm.box = "centos/7"
  config.vm.boot_timeout = 600
  # DNS between guests; requires: vagrant plugin install landrush
  config.landrush.enabled = true

  config.vm.define "mstr" do |mstr|
    mstr.vm.hostname = "mstr.devopshunter.com"
    mstr.vm.network "private_network", ip: "192.168.33.100"
    mstr.vm.provider "virtualbox" do |vb|
      vb.cpus = "2"
      vb.memory = "3070"   # the UCP manager needs more RAM than the workers
    end
  end

  config.vm.define "node1" do |node1|
    node1.vm.hostname = "node1.devopshunter.com"
    node1.vm.network "private_network", ip: "192.168.33.110"
    node1.vm.provider "virtualbox" do |vb|
      vb.cpus = "2"
      vb.memory = "1500"
    end
  end

  config.vm.define "node2" do |node2|
    node2.vm.hostname = "node2.devopshunter.com"
    node2.vm.network "private_network", ip: "192.168.33.120"
    node2.vm.provider "virtualbox" do |vb|
      vb.cpus = "2"
      vb.memory = "1500"
    end
  end
end

 
vagrant up
vagrant status
vagrant status for docker-ee installation on CentOS7

 
vagrant ssh-config

Use the PuTTYgen tool to convert the private_key to corresponding .ppk files. In my experiment, mstr.ppk, node1.ppk, node2.ppk files are generated in respective folders where private_key exists.

Now we are all set to connect to each VM with its assigned IP.
On each node you need to run the following commands:

1. Setup the repo for docker-ee
 
export DOCKERURL="https://storebits.docker.com/ee/centos/sub-eb111810-d6d8-4168-ac96-6e553a77381f"
sudo -E sh -c 'echo "$DOCKERURL/centos" > /etc/yum/vars/dockerurl'
cat /etc/yum/vars/dockerurl

2. Install the Docker dependencies and storage drivers:
sudo yum install -y yum-utils device-mapper-persistent-data lvm2

3. Add the repo and tell yum where it is available (i.e., the path):
 
sudo -E yum-config-manager \
    --add-repo \
    "$DOCKERURL/centos/docker-ee.repo"
yum repo update for docker-ee

4. Now we are all set to install Docker Enterprise Edition:

 
sudo yum -y install docker-ee
sudo systemctl start docker

docker-ee installation on CentOS7 completed!

Now let's confirm by running the hello-world container.
 
docker -v
sudo docker run hello-world

docker-ee installation confirmation with hello-world
If we check the docker info on any node it looks like this.
docker info for the docker-ee

Universal Control Plane (UCP) installation

 
docker container run --rm -it --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:2.2.5 install \
  --host-address 192.168.33.100 \
  --interactive

Enter username and password when it prompts.
admin
# welcome1
We detected the following hostnames/IP addresses for this system [mstr.devopshunter.com 127.0.0.1 172.17.0.1 192.168.33.100]

You may enter additional aliases (SANs) now or press enter to proceed with the above list.
Additional aliases:
INFO[0000] Initializing a new swarm at 192.168.33.100
INFO[0004] Establishing mutual Cluster Root CA with Swarm

This will automatically activate the swarm cluster master.

Login to UCP at https://192.168.33.100:443
UCP Login page
Universal Control Plane login page

After clicking on 'Sign in' we will be prompted to upload a license. It is available on your Docker Hub page, the same place you got the docker-ee installation URL from. You can request a new trial license, or choose the 'skip for now' option.

Here, I am loading the docker_subscription.lic file, which was already downloaded.

UCP Manager console

Create a Swarm Node and join

Click on 'Nodes', which shows the manager node that already exists. Click on the 'Add Node' button.
UCP Configuring Nodes joining Swarm cluster
The add-node wizard page gives us the choice of node type (Windows/Linux) and node role ('Manager' or 'Worker'). Here we go with the Linux node type and the 'worker' role.


Copy the docker swarm join command snippet highlighted at the bottom, then paste and run it on node1 and node2. Joining the swarm cluster takes some time; wait a while and refresh to check the cluster.

Added nodes to Swarm cluster
Initially the joined nodes have the status 'Pending' and 'Awaiting'; after the join completes, the Details column shows a 'Healthy UCP worker' status.
Healthy UCP nodes


I hope you enjoyed this post; keep writing your valuable comments, and share it with your techie friends!

Categories
