Before You Begin, Set Env Vars

The following are required (specific tasks may require just a subset of these):

export COMPANY=ev.younite
export KOPS_STATE_STORE="s3://<your-s3-bucket-name>"
export KOPS_CLUSTER_NAME=<your-cluster-name>
export KUBECONFIG="<path-to-kube-config-file>"
export AWS_ACCESS_KEY_ID=<cluster-aws-access-key>
export AWS_SECRET_ACCESS_KEY=<cluster-aws-secret>

Useful Commands for Troubleshooting the Cluster

Permission Denied When Running KOPS, Kubectl and/or AWS CLI Commands

Make sure you are the cluster admin:

aws iam get-user  # This requires IAM get-user permission
aws sts get-caller-identity  # This always works

Log back into the cluster:

kops export kubecfg --admin   # stays logged in for 18 hours
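To confirm the refreshed credentials work, run any kubectl command, e.g.:

kubectl get nodes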

Cluster Can’t Retrieve YOUnite Docker Images

Log the cluster back into the YOUnite ECR.

For this you will need the ECR AWS keys and the cluster’s AWS Keys:

export AWS_ACCESS_KEY_ID=<younite-ecr-access-key>
export AWS_SECRET_ACCESS_KEY=<younite-ecr-secret>
password=$(aws ecr get-login-password --region us-west-2)

Reset AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY back to the cluster key values, then run:

kubectl create secret docker-registry younite-registry --docker-server=https://160909222919.dkr.ecr.us-west-2.amazonaws.com --docker-username=AWS --docker-password=$password --docker-email=notused@younite.us
unset password

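If pods still fail to pull images, confirm the secret exists and that the deployments actually reference it. The jsonpath below assumes the secret is attached via imagePullSecrets in each deployment, which may vary per service:

kubectl get secret younite-registry
kubectl get deployment <service-name> -o jsonpath='{.spec.template.spec.imagePullSecrets}'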

Inspect Cluster Variables

kops edit cluster --name=$KOPS_CLUSTER_NAME
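To view the cluster spec without opening an editor:

kops get cluster --name=$KOPS_CLUSTER_NAME -o yaml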

Check Status of Nodes and Pods

kubectl get nodes

kubectl get nodes --show-labels

kubectl get pods
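For details on a single pod, including recent events such as probe failures:

kubectl describe pod <pod-name>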

Check What Services are Running

kubectl get services -o=custom-columns="NAME:.metadata.name,EXTERNAL-IP:.status.loadBalancer.ingress[0].hostname"

Check What Pods are Running on Which Nodes

kubectl get pods --all-namespaces -o jsonpath="{range .items[*]}{.metadata.namespace}{','}{.metadata.name}{','}{.spec.nodeName}{'\n'}{end}" | awk -F',' '{printf "%-40s %-40s %-40s\n",$1,$2,$3}' | column -t
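A simpler alternative that shows the same pod-to-node mapping, plus pod IPs:

kubectl get pods --all-namespaces -o wide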

Start/Restart a Service

Log in to the cluster and apply secrets and any necessary settings (e.g. "rbac" and "storage-class"), then run:

kubectl apply -f <service's .service file.yml>
kubectl apply -f <service's .deployment file.yml>
Note
Some services may combine the service and deployment configuration into a single service yml file.
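If the service is already deployed and only needs to be restarted, rolling the deployment also works (substitute the service's deployment name):

kubectl rollout restart deployment <service-name>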

Delete (Stop) a Service

Log in to the cluster, then run:

kubectl delete service <service-name>

Then either:

kubectl delete deployment <service-name>
#
# or, if the service runs as a stateful set:
#
kubectl delete statefulset <service-name>
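If it is unclear whether a service is backed by a deployment or a stateful set, list both first:

kubectl get deployments,statefulsets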

Get, Create, Validate, Edit and Update a Cluster

kops get cluster
kops validate cluster
kops create cluster --name=$KOPS_CLUSTER_NAME --zones=us-west-2a

An example of adding an instance group to a cluster:

kops edit ig --name $KOPS_CLUSTER_NAME nodes-us-west-2a #  e.g. add an instance group "nodes-us-west-2a"
kops update cluster --name=$KOPS_CLUSTER_NAME --yes
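Depending on the change, kops may also require a rolling update to apply it to the running nodes:

kops rolling-update cluster --name=$KOPS_CLUSTER_NAME --yes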

Restoring the Local Kubeconfig File

If the kube config file is accidentally deleted on your local IT host/control machine, it can be recreated with the following:

kops export kubecfg --name your-cluster-name

Starting All Over - Deleting a Cluster

Deleting a cluster properly is important; if it is not done properly, artifacts are left behind and inherited by future cluster invocations. Use the following to properly shut down a cluster, substituting your cluster's KOPS_STATE_STORE:

kops delete cluster --state=$KOPS_STATE_STORE --yes
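To preview what will be removed before committing, run the same command without --yes:

kops delete cluster --state=$KOPS_STATE_STORE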

Useful Network Debugging

Here are some tools to help resolve network connectivity problems. An issue that frequently arises is the inconsistent or delayed propagation of CNAME records to the cluster, test host, or test clients.

Adaptor Health Check Issue

If adaptors fail to start due to health check failures, the best place to start is the adaptor's log; find the first error:

kubectl get pods
kubectl logs <adaptor-pod-name>
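If the pod has already restarted, the log from the previous attempt usually contains the first error:

kubectl logs <adaptor-pod-name> --previous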

Adaptor DB Connectivity Issue

The DB adaptor will not register with the master node's health checks if it cannot connect to its datasource (e.g. database).

If the adaptor pod is showing:

younite-zone-producer1-oracle-db1-adaptor-d6cf44f-n26zk      0/1     Running   6 (3m44s ago)   63m

And kubectl describe pods <adaptor-pod> shows:

 Warning  Unhealthy  3s (x21 over 3m23s)  kubelet            Startup probe failed: HTTP probe failed with statuscode: 503

Look at the adaptor log; the first error usually includes:

IO Error: The Network Adapter could not establish the connection

That message is not referring to the YOUnite adaptor but to the Java TCP network adapter used for the database connection. You will also see some errors later in the log about Domain Versions, but they are not relevant; they just come from a thread that cannot do its work.

Common sources of the problem:

  • The DB is down

  • There isn't a route between the YOUnite adaptor and the DB (possibly a bad peering connection). Run the busybox pod to debug whether it is a networking/routing issue; see the AWS CLI checks below. A peering connection needs to be made between the DB VPC and the cluster VPC (the scripts that start the cluster do this), and the peering connection also needs two routes: one in the DB VPC using the cluster VPC CIDR and one in the cluster VPC using the DB VPC CIDR. This works sometimes and other times it doesn't; I have not been able to solve this riddle yet.

  • Security Groups - use the "data-virtualization" security group.

  • Network ACLs - Each VPC has one or more subnets, and typically all of the subnets in a VPC share the same network ACLs. By default they allow all traffic, so there shouldn't be any need to change anything here if the defaults are used.

  • The adaptor config has the wrong Database IP or port (look at the adaptor’s config in the YOUnite UI)

BOTTOM LINE: If the peering connection between the private default VPC and the cluster VPC doesn't work, just use the public IP of the database in the adaptor's DB URL config.
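The peering connection and its routes can also be checked from the AWS CLI (a sketch; it assumes the cluster's AWS keys are exported and it lists every peering connection and peering route in the account/region):

aws ec2 describe-vpc-peering-connections --query 'VpcPeeringConnections[].{Id:VpcPeeringConnectionId,Status:Status.Code}'
aws ec2 describe-route-tables --query 'RouteTables[].Routes[?VpcPeeringConnectionId!=`null`]'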

Curl

The Kubernetes pods do have curl loaded on them. Once you are logged into the cluster (kops export kubecfg --admin) you can exec directly into a pod and run curl, e.g. log into an Oracle adaptor and test its connection to an Oracle DB:

kubectl exec -it younite-zone-consumer1-oracle-db1-adaptor-55f859d758-rvf2m -- /bin/sh
curl 172.31.0.117:27021

Logging into a cluster node (instance) has proved unpredictable at best; the following has worked, but not always:

ssh -i test-keys.pem ubuntu@<public-ip-of-node>

Busybox

Busybox in a Kubernetes configuration has limitations: it runs as an instance, not a pod, and therefore tests node connectivity rather than pod connectivity.

Run Busybox

A busybox.yml file is in this test’s specs directory.

kubectl apply -f busybox.yml

This will run busybox for 12 hours.
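If the busybox.yml file is not at hand, the following is a minimal sketch of an equivalent spec (assumptions: host networking to match the node-level behavior described above and a 12-hour sleep; the actual file in the specs directory may differ):

cat <<'EOF' > busybox.yml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  hostNetwork: true               # use the node's network stack
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "43200"]   # keep the container alive for 12 hours
EOF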

Use Busybox

kubectl exec -it busybox -- /bin/sh
Useful commands inside busybox:

  • Busybox does not have curl, but it does have wget. It is part of the Docker network, so it can use the Docker hostnames. For example, to check the health of a service (note that sending the response payload to stdout, i.e. "-O -", does not work with this version of busybox):

wget -O response.txt  younite-api:8080/health
cat response.txt
  • Checking the health of a specific pod using the pod’s IP:

    wget -O response.txt  100.96.4.32:8080/health
  • ping <host>

  • traceroute <host>

  • nslookup <host>

To test a database connection: nc -zv <db-ip> <db-port>.

For example:

nc -zv 172.31.10.174 27021

Terminate Busybox

kubectl delete -f busybox.yml

Check the CNAME Value of the YOUnite API Service

If the CNAME for the API server is api.younite.myco.com, run the following:

nslookup api.younite.myco.com

You should get a response similar to the following:

Server:		172.31.0.2
Address:	172.31.0.2#53
Non-authoritative answer:
api.younite.myco.com	canonical name = a505346f7839d41dab018c1c9f95b0f4-1518693669.us-west-2.elb.amazonaws.com.
Name:	a505346f7839d41dab018c1c9f95b0f4-1518693669.us-west-2.elb.amazonaws.com
Address: 52.39.170.78
Name:	a505346f7839d41dab018c1c9f95b0f4-1518693669.us-west-2.elb.amazonaws.com
Address: 52.39.52.89

Slow CNAME Updates - Refresh the DNS cache

Flushing a system's DNS cache allows it to pick up refreshed DNS entries. Note that this does not guarantee that CNAME update issues will be resolved, because the DNS cache of the Internet Service Provider (ISP) may not have been updated yet.

Windows

ipconfig /flushdns

OS X

sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder

Linux

  • For systemd-based distributions:

sudo systemd-resolve --flush-caches
  • For non-systemd or older distributions:

sudo service nscd restart

Health Endpoints

Public Facing Services

Curl can be used to check any of the public facing services:

  • YOUnite API

  • YOUnite Data Virtualization Service

  • YOUnite Notification Service

  • Kibana

  • YOUnite UI

See the Kubernetes service spec file for the httpGet path and port number. The default values for each are supplied here:

YOUnite Stack Service                 CNAME          Default Port   Path
younite-api                           api            8080           /health
younite-ui                            ui             443            / (should get a redirect to IDP)
younite-notification-service         notifications  8080           /actuator/health
younite-data-virtualization-service   dvs            8080           /actuator/health
younite-kibana                        kibana         5601           /

For example, for the DNS organization name younite.myco.com, check the API service's health endpoint by running the following:
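curl api.younite.myco.com:8080/health   # default port and path from the table above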

It should respond with:

{"status":"UP","groups":["liveness","readiness"]}[user-name@ip-172-31-15-161 test-client]

Note
The default port is shown above. To get the service's actual port, see its deployment file in the Kubernetes spec directory.

Private Services

Not all services have public-facing IPs, but they do have health endpoints. To test them, you will need to:

  • Get the IP address of the instance/node that the service is running on. See above:

    • Check What Services are Running

    • Check What Pods are Running on Which Nodes

  • Start a busybox instance in the cluster and use the wget command (instead of curl) - see the example after this list.

    • See Using Busybox above.
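For example, once the busybox pod is running, a private service such as younite-elastic can be checked with wget, using either its service name or the pod IP found above (default port from the table below):

wget -O response.txt younite-elastic:9200/
cat response.txt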

YOUnite Stack Service              Default Port   Path
YOUnite Off-the-Shelf Adaptors*    8080           /health
younite-mb (message bus)           61613          /
younite-elastic                    9200           /
younite-logstash                   4560           /

If multiple adaptors are running on a single node, each will have its own port; see each adaptor's deployment file for the correct port.

* All adaptors are supposed to supply a health endpoint; however, implementations may choose not to provide one.

Note
The default port is shown above. To get the service's actual port, see its deployment file in the Kubernetes spec directory.