Log Collector Storage Configuration for Tainted Node Scheduling (added in v6.2.1)

Problem

When topology spread constraints or node affinity rules are in effect, or when taints are applied to nodes, log collector pods may fail to start because of PersistentVolumeClaim issues. A PersistentVolume (PV) may be provisioned on a node, or in a zone, whose taint prevents the pod from being scheduled there. As a result, the pods cannot attach their volumes and remain stuck in the Pending state, typically with a FailedScheduling event reporting a volume node affinity conflict.

Expected Behavior

The PersistentVolume (PV) should be created on a node or zone where the pod can be successfully scheduled.

Root Cause

The volumeBindingMode setting of a StorageClass determines when PersistentVolumes (PVs) are provisioned:

| volumeBindingMode | Behavior | Result |
| --- | --- | --- |
| Immediate | The PV is provisioned as soon as the PersistentVolumeClaim (PVC) is created. | The PV could be provisioned on a random node or zone, preventing the pod from being scheduled. |
| WaitForFirstConsumer | The PV is provisioned only after the pod is scheduled. | The PV is provisioned on the correct node or zone, allowing the pod to be scheduled successfully. |
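The difference is a single field on the StorageClass object. A minimal sketch, assuming a GKE cluster (the class name is illustrative; substitute your cluster's CSI driver for the provisioner):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-wffc                 # illustrative name
provisioner: pd.csi.storage.gke.io   # illustrative; use your cluster's CSI driver
# Defer provisioning until a pod that uses the claim is scheduled:
volumeBindingMode: WaitForFirstConsumer
```

With Immediate (the default when volumeBindingMode is unset), the control plane provisions the volume as soon as the PVC appears, with no knowledge of where the pod will eventually land.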

Solution Options

Option 1: Use a built-in StorageClass

Most cloud providers offer built-in StorageClasses that set volumeBindingMode: WaitForFirstConsumer. Using one of these ensures that PVs are provisioned on nodes where the pods can be scheduled, preventing pods from getting stuck in the Pending state because of node taints.

| Cloud Provider | Built-in StorageClass | volumeBindingMode |
| --- | --- | --- |
| GKE | standard-rwo | WaitForFirstConsumer |
| GKE | premium-rwo | WaitForFirstConsumer |
| GKE | standard | Immediate |
| EKS | gp3 | WaitForFirstConsumer |
| AKS | managed-csi | WaitForFirstConsumer |
| AKS | managed-csi-premium | WaitForFirstConsumer |
| AKS | managed | Immediate |
| OpenShift | ocs-storagecluster-ceph-rbd | WaitForFirstConsumer |
| Longhorn | longhorn | WaitForFirstConsumer |
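For example, a claim that references one of the WaitForFirstConsumer classes above (GKE's standard-rwo is used here for illustration; the claim name is hypothetical) is not provisioned until a pod consumes it:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logcollector-claim        # hypothetical name
spec:
  storageClassName: standard-rwo  # built-in GKE class; substitute for your cloud
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```

Until a pod that mounts this claim is scheduled, `kubectl get pvc` reports it as Pending with the reason WaitForFirstConsumer; this is expected behavior, not an error.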

Option 2: Create a custom StorageClass

If no suitable StorageClass exists, create a custom StorageClass with volumeBindingMode: WaitForFirstConsumer.

Steps for Helm Deployments

Step 1: Verify your StorageClass

Before deploying with Helm, ensure that your Kubernetes cluster has the correct StorageClass configured.

Run the following command to inspect the StorageClass you plan to use:

kubectl get storageclass <name> -o yaml | grep volumeBindingMode

Expected output:

volumeBindingMode: WaitForFirstConsumer

Step 2: Update values.yaml

Next, update the chart values. Edit deploy/charts/logcollector/values.yaml:

storage:
  className: standard-rwo # Change to your StorageClass with WaitForFirstConsumer

(Optional) Step 3: Create a custom StorageClass

If none of the existing StorageClasses is suitable, create a file named logcollector-storageclass.yaml at this location:
deploy/charts/logcollector/templates/logcollector-storageclass.yaml

Next, add the following to the logcollector-storageclass.yaml file:

{{- if .Values.storage.createStorageClass }}
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: {{ .Values.storage.className }}
provisioner: {{ .Values.storage.provisioner }}
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
{{- end }}

Then, add the following to values.yaml:

storage:
  className: apim-logcollector-sc
  createStorageClass: true
  provisioner: pd.csi.storage.gke.io # Change for your cloud (see examples below)

Step 4: Deploy

Once values.yaml is configured and any custom StorageClass or chart edits are in place, deploy the application with Helm:

helm upgrade --install <release-name> ./deploy -f values.yaml

Steps for Plain Kubernetes Deployments

Step 1: Verify your StorageClass

Before deploying your application on Kubernetes, ensure that your cluster has the correct StorageClass configured. Run the following command and confirm that it reports volumeBindingMode: WaitForFirstConsumer:

kubectl get storageclass <name> -o yaml | grep volumeBindingMode

Step 2: (Optional) Create a custom StorageClass

If you need a custom StorageClass instead of the default, create a file storageclass.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: pd.csi.storage.gke.io # Change for your cloud
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true

Apply the StorageClass to the cluster using kubectl apply:

kubectl apply -f storageclass.yaml

Step 3: Update your StatefulSet

Before applying your StatefulSet, ensure that its volumeClaimTemplates section references the correct StorageClass.

volumeClaimTemplates:
  - metadata:
      name: log-storage
    spec:
      storageClassName: "standard-rwo" # Your StorageClass
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

Step 4: Apply the changes

After verifying your StorageClass and updating your StatefulSet, apply all changes to deploy your application to the cluster.

kubectl apply -f statefulset.yaml

Examples

Helm values.yaml

Sample Example

storage:
  type: dynamic
  className: standard-rwo # GKE - change for your cloud
  size: 1Gi
  accessMode: ReadWriteOnce

Custom StorageClass by Cloud Provider

GKE (Google Cloud) Example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  type: pd-balanced

EKS (AWS) Example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  type: gp3
  fsType: ext4

AKS (Azure) Example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: disk.csi.azure.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  skuName: StandardSSD_LRS

OpenShift Example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: kubernetes.io/cinder # in-tree Cinder provisioner (OpenStack-backed clusters); on OpenShift Data Foundation, use openshift-storage.rbd.csi.ceph.com instead
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true

Longhorn (Rancher) Example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: driver.longhorn.io
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"

Plain Kubernetes StatefulSet

Sample Example

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logcollector
spec:
  serviceName: logcollector-svc
  replicas: 3
  selector:
    matchLabels:
      app: logcollector
  template:
    metadata:
      labels:
        app: logcollector
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "system"
          effect: "NoSchedule"
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: logcollector
      containers:
        - name: logcollector
          image: your-image:tag
          volumeMounts:
            - name: log-storage
              mountPath: /mnt/data/access
  volumeClaimTemplates:
    - metadata:
        name: log-storage
      spec:
        storageClassName: "standard-rwo"
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi

Troubleshooting

If you encounter issues during your Kubernetes deployment, verify the following:

Step 1: Verify StorageClass binding mode

Ensure your StorageClass has the correct volumeBindingMode.

kubectl get storageclass -o custom-columns=NAME:.metadata.name,BINDING:.volumeBindingMode

Step 2: Check PersistentVolumeClaim (PVC) status

Verify that your PVCs are bound to a PersistentVolume.

kubectl get pvc -l app=logcollector
kubectl describe pvc <pvc-name>

Step 3: Check pod scheduling

If your pods are not starting as expected, check whether any pod is stuck in the Pending state and inspect its scheduling events:

kubectl describe pod <pod-name>
kubectl get events --field-selector reason=FailedScheduling