Question
Is it possible to have a multinode setup using the Helm charts, where the access node is on one virtual machine and the data nodes are on other virtual machines?
To Reproduce
With the existing commands, all pods are created on the same node/virtual machine.
Expected behavior
On installation, the access node should be deployed on one virtual machine/node and the data nodes on other virtual machines.
I checked in with our engineering team, and here’s their advice.
The default values for the multinode recipe have a podAntiAffinity for pods of the same release that should, in theory, avoid that situation. It is set to preferred though, not required, so the scheduler can ignore it if there are not enough eligible nodes in the cluster.
preferred as a default makes sense, to be as flexible as possible. We'd recommend ensuring the cluster has enough capacity and then changing that default to requiredDuringSchedulingIgnoredDuringExecution, which turns the anti-affinity into a hard scheduling requirement.
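A minimal sketch of what that override could look like, passed through the chart's affinityTemplate value; the release label and its value here are assumptions and must match the labels the chart actually sets on its pods:

```yaml
# values-override.yaml -- a sketch, not the chart's verbatim default.
# The labelSelector must match the labels on the chart's pods (check
# with `kubectl get pods --show-labels`); "my-release" is a
# hypothetical Helm release name.
affinityTemplate: |
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: "kubernetes.io/hostname"   # at most one matching pod per node
      labelSelector:
        matchLabels:
          release: my-release
```

With required in place, pods that cannot be placed on distinct nodes stay Pending instead of landing on the same node, which makes capacity problems visible immediately.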
Additional thoughts: that affinityTemplate could also be used to ensure data nodes are in different AZs, if the AZ is available as a node label. Or, it could be used the other way around, to ensure they are in the same AZ to reduce paid cross-zone traffic, if cost is more important than availability.
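For example, assuming the nodes carry the standard zone label topology.kubernetes.io/zone (failure-domain.beta.kubernetes.io/zone on older clusters), a sketch of spreading data nodes across zones:

```yaml
# Spread pods of the release across availability zones (a sketch; the
# release label is a hypothetical stand-in for the chart's real labels).
affinityTemplate: |
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: "topology.kubernetes.io/zone"
      labelSelector:
        matchLabels:
          release: my-release
```

Switching podAntiAffinity to podAffinity with the same topologyKey inverts the behaviour and co-locates the pods in a single zone, which is the cost-saving variant mentioned above.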
I too am struggling with this. I have installed several times on a k3s cluster with 3 worker nodes, and the installation works, but each time I see all pods being created on the same node, which is clearly undesirable. I would expect each pod to be created on a different node in the cluster, as it makes no sense to me to have a distributed database that is only running on one node.
I tried the suggestion from Lorraine, changing the podAntiAffinity to requiredDuringSchedulingIgnoredDuringExecution, but it made no difference. Maybe my change is wrong, I'm not sure.
One other thing: the post-installation instructions for obtaining the passwords for multinode are incorrect. Given a db name of "dbserver", the command it gives for retrieving the postgres password does not work.
I don't think we have examples of this in our projects, but here are some suggestions from the team that you could explore:
Scheduling a pod onto a specific node is possible in Kubernetes using node labels together with taints and tolerations: the taint keeps other workloads off the node, while a nodeSelector or node affinity pins the pod to it. If you want to prevent workloads from being scheduled onto the same node for HA purposes, topology spread constraints are another option.
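A sketch combining both ideas on a plain pod spec (topology spread constraints are stable from Kubernetes 1.19); every name here, including the node, labels, and image, is a hypothetical placeholder:

```yaml
# Reserve a node for the access node, then pin the pod to it:
#   kubectl taint nodes vm-1 dedicated=access-node:NoSchedule
#   kubectl label nodes vm-1 role=access-node
apiVersion: v1
kind: Pod
metadata:
  name: access-node-example
  labels:
    app: timescaledb
spec:
  nodeSelector:
    role: access-node              # pin to the labelled node
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "access-node"
    effect: "NoSchedule"           # allow scheduling onto the tainted node
  # For HA, spread pods sharing the app label across distinct hostnames:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule   # hard rule; ScheduleAnyway = best effort
    labelSelector:
      matchLabels:
        app: timescaledb
  containers:
  - name: timescaledb
    image: timescale/timescaledb:latest-pg12   # placeholder image
```

In a Helm-managed deployment these fields would go into the chart's pod template rather than a standalone pod, but the scheduling semantics are the same.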