ERROR: release image arch amd64 does not match host arch arm64

I recently tried installing a new ARM-based OpenShift Container Platform cluster on AWS. To prepare, I created an install-config.yaml file and set both the controlPlane.architecture and the compute.architecture fields to “arm64”, then launched the installer. That did not work; the installer still complained about the architecture:

$ ./openshift-install create cluster --dir=.
INFO Credentials loaded from the "default" profile in file "/home/simon/.aws/credentials" 
INFO Consuming Install Config from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s (until 11:07AM) for the Kubernetes API at https://api.skrenger-arm.lab.example.com:6443... 
INFO Pulling VM console logs                      
INFO Pulling debug logs from the bootstrap machine 
ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.skrenger-arm.lab.example.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 3.64.25.143:6443: connect: connection refused 
ERROR Bootstrap failed to complete: Get "https://api.skrenger-arm.lab.example.com:6443/version": dial tcp 3.68.144.150:6443: connect: connection refused 
ERROR Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane. 
ERROR The bootstrap machine failed to download the release image 
INFO Pulling quay.io/openshift-release-dev/ocp-release@sha256:9ffb17b909a4fdef5324ba45ec6dd282985dd49d25b933ea401873183ef20bf8... 
INFO cfce1ab124f59e93a0f67d7e85283d524ddfd73a27d0535319d69d1dce746488 
INFO ERROR: release image arch amd64 does not match host arch arm64 
INFO Bootstrap gather logs captured here "/home/simon/Downloads/arm/log-bundle-20221124110737.tar.gz" 
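For reference, this is roughly what the relevant part of my install-config.yaml looked like (a shortened excerpt; the field names follow the standard install-config schema, everything else is omitted):

compute:
- architecture: arm64
  name: worker
  [..]
controlPlane:
  architecture: arm64
  name: master
  [..]

Judging by the error in the log above, these fields alone are not enough: the installer binary embeds a reference to a specific release image, so the arm64 (or multi-arch) build of openshift-install and its matching release image are needed as well.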
Read the rest of this entry

NVIDIA Docker “permission denied: unknown.” on Jetson Nano

I recently bought an NVIDIA Jetson Nano Developer Kit to fiddle around with things like MicroShift or TensorFlow. The board is typically used with L4T (Linux for Tegra), which is based on Ubuntu 18.04. Fedora can also be installed, although not all drivers (for example for the GPU) are available yet. After properly updating the system with the latest packages, I got the following error when starting a container using the nvidia runtime:

docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-ml:r32.6.1-py3
[..]
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown.
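The part that fails is adding a seccomp filter rule for the clone3 syscall, which typically points to a libseccomp on the L4T host that is too old to know about clone3. Updating libseccomp (or Docker) on the host is usually the cleaner fix; as a quick test, the container can also be started without the default seccomp profile (note that this disables seccomp confinement for the container entirely):

docker run -it --rm --runtime nvidia --network host \
  --security-opt seccomp=unconfined \
  nvcr.io/nvidia/l4t-ml:r32.6.1-py3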
Read the rest of this entry

Creating a sosreport on CoreOS

With OpenShift 4, Red Hat introduced Red Hat Enterprise Linux CoreOS. It is a very minimalist operating system, focused on running container workloads.

This new minimalism comes with some challenges: you cannot simply install additional RPM packages, and most of the tools we know and love are missing! Luckily, there is the Red Hat-supplied toolbox container, which contains all the necessary tools and is nicely integrated.

To start the toolbox, first use oc debug node/<nodename>. This will start a privileged container on the node you specify, mount the host file system at /host and drop you into a shell:

$ oc debug node/worker-0.lab.openshift.krenger.ch
Starting pod/worker-0labopenshiftkrengerch-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# toolbox
Container started successfully. To exit, type 'exit'.
sh-4.2#

Now we are running in the toolbox container on our CoreOS host with all the tools we know at our disposal, for example sosreport:

sh-4.2# sosreport

Running sosreport will place the generated archive in /host/var/tmp/, which means it will be accessible under /var/tmp/ on the CoreOS host itself.
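A quick way to double-check, first from inside the toolbox and then from the host again after exiting the toolbox (no output shown; the file name depends on host name and date):

sh-4.2# ls /host/var/tmp/sosreport-*
sh-4.2# exit
sh-4.4# ls /var/tmp/sosreport-*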

Linux Magic Reboot

If you have worked with remote Linux servers before, you have probably already encountered machines that just don’t want to reboot. This is typically due to hung network mounts or stuck processes that make the server hang during shutdown. But it turns out that there are other ways to reboot a server.

One of these is the “Magic SysRq key”. To reboot a server using the SysRq trigger in the kernel, use the following two commands. First, enable the trigger:

echo 1 > /proc/sys/kernel/sysrq

Then, reboot the server the magic way by typing

echo b > /proc/sysrq-trigger

Note that this will reboot the server without unmounting or syncing the filesystems! There are also other options available via the SysRq trigger; some of them are listed in the Wikipedia article mentioned above.
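For example, if the box is still somewhat responsive, a slightly gentler sequence (part of the well-known “REISUB” set of SysRq commands) syncs the filesystems and remounts them read-only before rebooting:

echo s > /proc/sysrq-trigger   # sync all mounted filesystems
echo u > /proc/sysrq-trigger   # remount all filesystems read-only
echo b > /proc/sysrq-trigger   # reboot immediately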

Ansible: “python2 yum module is needed for this module”

I am currently toying around with GlusterFS and I am using Ansible to deploy and configure my server.

Using the yum module, I wanted to install the Gluster server package like so:

- name: Install glusterfs-server package
  yum:
    name: glusterfs-server
    state: latest

But when executing the playbook, this module failed with the following error:

Read the rest of this entry
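The error from the title typically shows up when Ansible runs the yum module with a Python 3 interpreter, because the yum Python bindings only exist for Python 2. One common workaround (a sketch; the playbook name is made up, and it assumes python2 and its yum bindings are installed on the managed host) is to point Ansible at the Python 2 interpreter:

ansible-playbook site.yml -e ansible_python_interpreter=/usr/bin/python2

Alternatively, on hosts that use dnf, the task can simply use the dnf module (or the generic package module) instead of yum.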

Fedora 25: How to make “GNOME Classic” default

As I am working more and more with Linux, I am also using a virtual machine with Fedora 25 installed to play around with some things (notably Docker and Kubernetes). On Fedora 25, the default desktop environment is GNOME 3, but I personally prefer the GNOME Classic user interface.

To change the desktop environment, select “GNOME Classic” as the session on the login screen:

Read the rest of this entry

Workaround for WMI client over IPv6

Some years ago, I wrote some examples for the WMI client on Linux. I still get a lot of queries from people trying to use the WMI client to access Windows hosts, and I am happy to help if there are any problems.

One of the latest problems occurred when trying to access a Windows host over IPv6:

$ wmic -U 'user%password' //FD00:180::0:0:0:0:0 "Select Caption From Win32_OperatingSystem"
[..]
UNKNOWN - The WMI query had problems. The error text from wmic is:
[librpc/rpc/dcerpc_util.c:343:dcerpc_parse_binding()] Unknown dcerpc transport 'FD00'
[librpc/rpc/dcerpc_connect.c:337:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c0000017) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:828:dcerpc_pipe_connect_b_recv()] failed NT status (c0000017) in dcerpc_pipe_connect_b_recv
[wmi/wmic.c:196:main()] ERROR: Login to remote object. NTSTATUS: NT_STATUS_NO_MEMORY - Memory allocation error

This was quite a funny problem, because the same query worked when accessing the host over IPv4, so we quickly suspected that the WMI client does not support IPv6. By looking at the underlying Samba code (e.g. dcerpc_util.c and binding.c), I guessed that this is a parsing issue of some kind.
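To illustrate the suspicion: a Samba DCE/RPC binding string has the form “TRANSPORT:host[options]”, so everything before the first colon is treated as the transport name. With a bare IPv6 address, that first field is suddenly “FD00”, which matches the “Unknown dcerpc transport 'FD00'” message above (the addresses here are just examples):

ncacn_ip_tcp:192.168.0.10       <- what the parser expects: transport, then host
FD00:180::0:0:0:0:0             <- a bare IPv6 address: "FD00" is read as the transport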

Read the rest of this entry

OpenShift: List all pods in cluster

I recently started working with OpenShift and needed to get a list of all pods on the cluster. I glanced at the documentation but could not find what I wanted. My colleagues quickly pointed me in the right direction:

oc get pod --all-namespaces -o wide

Here is the command with some example output of what to expect:

# oc get pod --all-namespaces -o wide
NAMESPACE                                 NAME                                                       READY     STATUS               RESTARTS   AGE       IP               NODE
my-project                                my-pod-43-d9mo6                                            1/1       Running              0          1d        192.168.0.183    node3.krenger.local
yet-another-project                       another-pod-43-7g3r0                                       1/1       Running              0          2d        192.168.0.184    node4.krenger.local
[..]

If you just want to know which pods are on a certain node, use oc adm manage-node:

oc adm manage-node node3.krenger.local --list-pods
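On newer versions of oc (and with plain kubectl), the same list can also be retrieved with a field selector, without the manage-node subcommand:

oc get pods --all-namespaces -o wide --field-selector spec.nodeName=node3.krenger.local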

DBA_INDEXES and DBA_SEGMENTS mismatch

This is a fun one.

We developed a script to process certain indexes on a database. Somehow, it kept missing some of the indexes that clearly existed. We then found the problem: DBA_INDEXES had more entries than DBA_SEGMENTS. See the following example:

SQL> SELECT owner, index_name as i_name from dba_indexes WHERE owner = 'SIMON'
  2  MINUS
  3  SELECT owner, segment_name as i_name FROM dba_segments WHERE owner = 'SIMON';

OWNER                I_NAME
-------------------- --------------------------------------------------
SIMON                IDX_...
SIMON                IDX_...
SIMON                IDX_...
SIMON                PK_...
SIMON                PK_...
SIMON                UNQ_...
SIMON                UNQ_...

7 rows selected.

So here we see that there are clearly indexes for which there are no segments. We then looked at the tables these indexes belong to and noticed one thing they had in common: all of these tables were empty.

The reason for this behaviour is called “Deferred Segment Creation”. It means that when a CREATE TABLE statement is issued and no rows are inserted, no segments are created. The same applies to indexes on such tables, which is why they show up in DBA_INDEXES but not in DBA_SEGMENTS. This behaviour can be controlled with the DEFERRED_SEGMENT_CREATION parameter.

This makes sense in large schemas where not all tables are populated: instead of creating segments and allocating extents up front, only the definition of the table is stored. As soon as the table receives its first row, the segments are created automatically.
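To see how the instance is configured, and to create one specific table with its segment right away regardless of the parameter, something like this can be used in SQL*Plus (the table here is just a demo):

SQL> SHOW PARAMETER deferred_segment_creation
SQL> CREATE TABLE simon.demo_tab (id NUMBER) SEGMENT CREATION IMMEDIATE;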

Hello world

My name is Simon Krenger. I am a Technical Account Manager (TAM) at Red Hat, where I advise our customers on using Kubernetes, containers, Linux and open source software.

Elsewhere

  1. GitHub
  2. LinkedIn
  3. GitLab