Simon Krenger

Using the Gateway API on OpenShift

With the retirement of Ingress NGINX, there has been increased interest in using the Gateway API. Many of the people I deal with have also approached me to ask how to do this on OpenShift. In this blog post, I want to show how to get started with the Gateway API on OpenShift by deploying the Gateway API and then using it via an HTTPRoute.

Read the rest of this entry

Tags: Envoy, Gateway API, HTTPRoute, Kubernetes, OpenShift

Start a Pod in Kubernetes using curl

Recently, a customer needed to start a Pod without the kubectl or oc command line tooling. Since the Kubernetes API is a relatively straightforward REST API, we can come up with a nice curl command for that, which basically does the following:

To create the Pod, send a POST request
The Pod definition is sent as JSON
The endpoint we’re using is /api/v1/namespaces/mynamespace/pods
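
The three points above can be sketched in Python using only the standard library. This is a minimal sketch, not the curl command from the full post: the API server URL, the token and the Pod manifest (its name and image) are placeholder assumptions, and only the POST method, the JSON body and the endpoint path come from the list above. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values for illustration; replace with your own cluster details.
API_URL = "https://api.example.com:6443"
TOKEN = "REPLACE_ME"
NAMESPACE = "mynamespace"

# Minimal Pod definition (name and image are placeholders).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mypod"},
    "spec": {
        "containers": [
            {"name": "mycontainer", "image": "registry.example.com/myimage:latest"}
        ]
    },
}


def build_create_pod_request(api_url, token, namespace, pod_definition):
    """Build (but do not send) the POST request that creates a Pod."""
    return urllib.request.Request(
        url=f"{api_url}/api/v1/namespaces/{namespace}/pods",
        data=json.dumps(pod_definition).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_pod_request(API_URL, TOKEN, NAMESPACE, pod)
print(request.get_method(), request.full_url)
```

To actually create the Pod, the request would be passed to urllib.request.urlopen(), with TLS settings that trust the cluster's certificate authority.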

Read the rest of this entry

Tags: curl, Kubernetes, Linux, OpenShift, Pod

Example Vector configuration for Splunk Cloud

For my k3s installation, I wanted to try out the free tier of Splunk Cloud for storing logs. That means configuring Vector (my log collector of choice) to forward the logs to Splunk Cloud. That was easier said than done.

Figuring out that we need to use type: splunk_hec_logs in the Vector configuration was quite straightforward. However, figuring out the endpoint for the configuration turned out to be more of a challenge. After some time, I finally understood what my Splunk Cloud Platform instance ID is, and after encountering various 303 See Other and Unexpected status: 404 Not Found errors, I finally got a working Vector configuration:

Read the rest of this entry

Tags: k3s, Kubernetes, Logging, Splunk, Vector

Inspecting container checkpoints with checkpointctl

One of the newer features in Kubernetes (1.30 and later) is the Kubelet Checkpoint API. This new API allows users to create a stateful copy of a running container, functionality that is often used for forensics or debugging.

In Kubernetes installations where this feature is enabled, a checkpoint can be created by accessing the respective Kubelet API via curl or similar. In the following example, I am using the Kubernetes API /proxy endpoint (the same can also be done locally on the Node via localhost:10250/checkpoint/...):

$ curl -k -X POST --header "Authorization: Bearer $TOKEN" "$KUBERNETES_API_URL/api/v1/nodes/$NODE_NAME/proxy/checkpoint/$NAMESPACE_NAME/$POD_NAME/$CONTAINER_NAME"
{"items":["/var/lib/kubelet/checkpoints/checkpoint-fedora-74d79dd7f4-csrmg_skrenger-container-2024-12-12T12:56:19Z.tar"]}
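
As a rough sketch of how the URL is assembled and how the response can be consumed in a script, the following Python snippet builds the same /proxy/checkpoint path and extracts the archive path from the JSON response shown above. The helper names are made up for illustration, and any node, namespace, Pod or container names passed in are up to the caller; only the URL layout and the response body come from the example.

```python
import json
from urllib.parse import quote


def checkpoint_url(api_url, node, namespace, pod, container):
    """Build the Kubelet checkpoint URL behind the API server /proxy endpoint."""
    node, namespace, pod, container = (
        quote(part, safe="") for part in (node, namespace, pod, container)
    )
    return f"{api_url}/api/v1/nodes/{node}/proxy/checkpoint/{namespace}/{pod}/{container}"


def checkpoint_archives(response_body):
    """Return the checkpoint archive paths listed in the response."""
    return json.loads(response_body).get("items", [])


# Response body copied from the curl example above.
body = (
    '{"items":["/var/lib/kubelet/checkpoints/'
    'checkpoint-fedora-74d79dd7f4-csrmg_skrenger-container-2024-12-12T12:56:19Z.tar"]}'
)
archives = checkpoint_archives(body)
print(archives[0])
```

The returned path is a file on the Node where the container runs, so it still has to be copied off that Node before it can be inspected elsewhere.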

Read the rest of this entry

Tags: checkpointctl, CRI-O, criu, Forensics, Kubernetes, Linux, OCP, OpenShift 4

Excluding / ignoring sensors in node_exporter

I like to use the Prometheus node_exporter to get metrics about my hardware. However, some hardware (such as my X300M-STX mainboard) exposes sensors with some rather nonsensical values:

[..]
node_hwmon_temp_celsius{chip="platform_nct6775_656",sensor="temp13"} 49.75
node_hwmon_temp_celsius{chip="platform_nct6775_656",sensor="temp15"} 3.892313987e+06
node_hwmon_temp_celsius{chip="platform_nct6775_656",sensor="temp16"} 3.892313987e+06
[..]

Previously, the only way to ignore such values was to exclude complete chips / devices using --collector.hwmon.chip-exclude. However, in newer versions of node_exporter you’ll be able to exclude (or explicitly include) individual sensors using the following command line option:

--collector.hwmon.sensor-exclude="platform_nct6775_656;temp1[5,6]"

The argument is a regex that is matched against the chip name and the sensor name. Separate the chip name and the sensor name using “;”.
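
To see which sensors the pattern above would catch, here is a small Python sketch that applies the same regex to the three sensors from the metrics output. It assumes that node_exporter joins chip and sensor as "chip;sensor" and performs an unanchored regex search; that matching rule is an assumption for illustration and not taken from the node_exporter source.

```python
import re

# Pattern copied from the command line option above.
# Note: the character class [5,6] also matches a literal comma; for real
# sensor names it behaves the same as [56].
PATTERN = re.compile(r"platform_nct6775_656;temp1[5,6]")


def is_excluded(chip, sensor):
    """Assumed rule: search the pattern in the joined "chip;sensor" string."""
    return PATTERN.search(f"{chip};{sensor}") is not None


for sensor in ("temp13", "temp15", "temp16"):
    print(sensor, is_excluded("platform_nct6775_656", sensor))
```

Under these assumptions, only temp15 and temp16 are excluded, while temp13 with its plausible reading keeps being exported.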

Tags: DeskMini X300, Linux, node_exporter, Open Source, Prometheus

10GbE in the DeskMini X300

As my little home server, I have an Asrock DeskMini X300 with an AMD Ryzen 7 5700G (8 cores, 16 threads) and 64GB of memory. A nice low-power home server to play around with. Out of the box, the DeskMini comes with one 1 Gbit network interface (a Realtek chipset). Since most of my devices are connected via WiFi anyway, this was more than enough until now. But then, modernity arrived in my part of the world and we now have 10Gbit fiber internet, great!

10Gbit internet sounds awesome; however, devices connected via WiFi will only ever see a real-world maximum of around 700 Mbit/s via WiFi 6. But maybe my little DeskMini could use all that 10Gbit? Unfortunately, the DeskMini motherboard does not have any of the usual PCIe expansion slots, only SATA and M.2 slots. So I decided to try the “IOCREST M.2 to Single 10G Ethernet Network Adapter (IO-M2F107-GLAN)” adapter (AliExpress link here), to see if that would work.

Read the rest of this entry

Tags: 10Gbit, DeskMini X300, IOCREST, Linux, Networking

ERROR: release image arch amd64 does not match host arch arm64

Well, so I tried installing a new ARM-based OpenShift Container Platform cluster on AWS. To prepare, I created an install-config.yaml file, changed the controlPlane.architecture and compute.architecture fields to “arm64” and then launched the installer. That did not work; the installer still complained about the architecture:

$ ./openshift-install create cluster --dir=.
INFO Credentials loaded from the "default" profile in file "/home/simon/.aws/credentials" 
INFO Consuming Install Config from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s (until 11:07AM) for the Kubernetes API at https://api.skrenger-arm.lab.example.com:6443... 
INFO Pulling VM console logs                      
INFO Pulling debug logs from the bootstrap machine 
ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.skrenger-arm.lab.example.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 3.64.25.143:6443: connect: connection refused 
ERROR Bootstrap failed to complete: Get "https://api.skrenger-arm.lab.example.com:6443/version": dial tcp 3.68.144.150:6443: connect: connection refused 
ERROR Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane. 
ERROR The bootstrap machine failed to download the release image 
INFO Pulling quay.io/openshift-release-dev/ocp-release@sha256:9ffb17b909a4fdef5324ba45ec6dd282985dd49d25b933ea401873183ef20bf8... 
INFO cfce1ab124f59e93a0f67d7e85283d524ddfd73a27d0535319d69d1dce746488 
INFO ERROR: release image arch amd64 does not match host arch arm64 
INFO Bootstrap gather logs captured here "/home/simon/Downloads/arm/log-bundle-20221124110737.tar.gz"

Read the rest of this entry

Tags: amd64, ARM, arm64, Error message, Installer, OpenShift 4, OpenShift Container Platform, Troubleshooting

“import torch” fails on the NVIDIA Jetson Nano

NVIDIA provides the Linux4Tegra (L4T) distribution as an image for use with the NVIDIA Jetson Nano. However, once you upgrade the whole system, strange problems will pop up, one of which I have described here: NVIDIA Docker “permission denied: unknown.” on Jetson Nano.

Applying a popular solution (described here), which adds a new repository to your L4T installation, results in interesting error messages such as the following when trying to run L4T-ML containers:

docker run --rm --runtime nvidia -it nvcr.io/nvidia/l4t-ml:r32.7.1-py3 python3 -c "import torch"

[..]
libcurand.so.10: cannot open shared object file: No such file or directory

Read the rest of this entry

Tags: Jetson Nano, libcudnn.so.8, libcurand.so.10, Linux, NVIDIA, nvidia-docker2, PyTorch

NVIDIA Docker “permission denied: unknown.” on Jetson Nano

I recently bought an NVIDIA Jetson Nano Developer Kit to fiddle around with things like MicroShift or TensorFlow. The board is typically used with L4T (Linux for Tegra) based on Ubuntu 18.04. Fedora can also be installed, although not all drivers (for example for the GPU) are available yet. So after properly updating the system with the latest packages, when starting a container using the nvidia runtime, I got the following error:

docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-ml:r32.6.1-py3
[..]
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown.

Read the rest of this entry

Tags: Containers, Docker, Jetson Nano, L4T, Linux, NVIDIA, nvidia-docker2, Troubleshooting

jq: Delete an element from an array

When working with JSON data, I typically use jq to mangle the data. I keep this post as a reference for myself on how to remove an element from a JSON list or array using jq.

Given we have the following array:

$ echo '{"hello": "world", "myarray": ["a", "b", "c"]}' | jq
{
  "hello": "world",
  "myarray": [
    "a",
    "b",
    "c"
  ]
}

To remove an element from the array, use the del function with the select function to select a single element:

jq 'del(.myarray[] | select(. == "b"))'

So when applying this to the above array, we can remove “b” from the array like so:

$ echo '{"hello": "world", "myarray": ["a", "b", "c"]}' | jq 'del(.myarray[] | select(. == "b"))'
{
  "hello": "world",
  "myarray": [
    "a",
    "c"
  ]
}
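
For comparison, here is a rough Python equivalent of the filter, useful when jq is not at hand. It is only a sketch with a made-up helper name, but it mirrors one detail of the jq expression worth knowing: del combined with select removes every element that matches, not just the first one.

```python
import json


def delete_matching(document, key, value):
    """Return a copy of document with every element equal to value removed from document[key]."""
    result = dict(document)
    result[key] = [item for item in document[key] if item != value]
    return result


data = json.loads('{"hello": "world", "myarray": ["a", "b", "c"]}')
print(json.dumps(delete_matching(data, "myarray", "b"), indent=2))
```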

Tags: jq, JSON, Linux
