Log Collector Storage Configuration for Tainted Node Scheduling (added in v6.2.1)
Problem
When topology spread constraints or node affinity rules are in effect, or when taints are applied to nodes, log collector pods may fail to start because of PersistentVolumeClaim (PVC) issues. A PersistentVolume (PV) may be provisioned on a node, or in a zone, whose taints prevent the pod from being scheduled there. As a result, the pod cannot attach its volume and remains stuck in the Pending state.
Expected Behavior
The PersistentVolume (PV) should be created on a node, or in a zone, where the pod can be scheduled successfully.
Root Cause
The volumeBindingMode setting of a StorageClass determines when PersistentVolumes (PVs) are provisioned:
| volumeBindingMode | Behavior | Result |
|---|---|---|
| Immediate | The PV is provisioned as soon as the PersistentVolumeClaim (PVC) is created. | The PV may be provisioned on an arbitrary node or zone, preventing the pod from being scheduled. |
| WaitForFirstConsumer | The PV is provisioned only after a pod that uses the PVC is scheduled. | The PV is provisioned on the correct node or zone, allowing the pod to be scheduled successfully. |
Solution Options
Option 1: Use a built-in StorageClass (Recommended)
Most cloud providers offer built-in StorageClasses that set volumeBindingMode: WaitForFirstConsumer. Using these ensures that PVs are provisioned on nodes where pods can be scheduled, preventing pods from becoming stuck in the Pending state because of node taints. Common built-in classes are listed below; a quick way to check a specific class follows the table.
| Cloud Provider | Built-in StorageClass | volumeBindingMode |
|---|---|---|
| GKE | standard-rwo | WaitForFirstConsumer |
| GKE | premium-rwo | WaitForFirstConsumer |
| GKE | standard | Immediate |
| EKS | gp3 | WaitForFirstConsumer |
| AKS | managed-csi | WaitForFirstConsumer |
| AKS | managed-csi-premium | WaitForFirstConsumer |
| AKS | managed | Immediate |
| OpenShift | ocs-storagecluster-ceph-rbd | WaitForFirstConsumer |
| Longhorn | longhorn | WaitForFirstConsumer |
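To confirm the binding mode of a specific class before using it, you can query it directly. This is a minimal check; standard-rwo is only an example name, so substitute the class you intend to use:
kubectl get storageclass standard-rwo -o jsonpath='{.volumeBindingMode}'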
Option 2: Create a custom StorageClass
If no suitable StorageClass exists, create a custom StorageClass with volumeBindingMode: WaitForFirstConsumer. Per-cloud examples are provided in the Examples section below.
Steps for Helm Deployments
Step 1: Verify your StorageClass
Before deploying with Helm, ensure that your Kubernetes cluster has the correct StorageClass configured.
Run the following command to check the binding mode of a StorageClass:
kubectl get storageclass <name> -o yaml | grep volumeBindingMode
Expected output:
volumeBindingMode: WaitForFirstConsumer
Step 2: Update values.yaml
Next, update the storage settings in the chart's values file. Edit deploy/charts/logcollector/values.yaml:
storage:
  className: standard-rwo  # Change to your StorageClass with WaitForFirstConsumer
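If you prefer not to edit values.yaml, the same value can be supplied on the command line at install time; this assumes the chart reads storage.className as shown above:
helm upgrade --install <release-name> ./deploy --set storage.className=standard-rwo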
Step 3: (Optional) Create a custom StorageClass
If you need a custom StorageClass instead of a built-in one, create a file named logcollector-storageclass.yaml in this location:
deploy/charts/logcollector/templates/logcollector-storageclass.yaml
Next, add the following to the logcollector-storageclass.yaml file:
{{- if .Values.storage.createStorageClass }}
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: {{ .Values.storage.className }}
provisioner: {{ .Values.storage.provisioner }}
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
{{- end }}
Then, add the following to values.yaml:
storage:
  className: apim-logcollector-sc
  createStorageClass: true
  provisioner: pd.csi.storage.gke.io  # Change for your cloud (see examples below)
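Before deploying, you can confirm that the chart renders the StorageClass as expected. This is a quick sanity check; the grep pattern is only a convenience for locating the rendered object in the output:
helm template <release-name> ./deploy -f values.yaml | grep -B2 -A6 "kind: StorageClass"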
Step 4: Deploy
Once values.yaml is configured and any custom StorageClass template is in place, deploy the application with Helm.
helm upgrade --install <release-name> ./deploy -f values.yaml
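After the release is installed, verify that the PVCs bind and the pods leave the Pending state. The app=logcollector label matches the sample StatefulSet in the Examples section; adjust it to your deployment's labels:
kubectl get pvc,pods -l app=logcollector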
Steps for Plain Kubernetes Deployments
Step 1: Verify your StorageClass
Before deploying your application on Kubernetes, ensure that your cluster has the correct StorageClass configured. Run the following command to check the binding mode of a StorageClass:
kubectl get storageclass <name> -o yaml | grep volumeBindingMode
Expected output:
volumeBindingMode: WaitForFirstConsumer
Step 2: (Optional) Create a custom StorageClass
If you need a custom StorageClass instead of the default, create a file named storageclass.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: pd.csi.storage.gke.io  # Change for your cloud
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
Apply the StorageClass to the cluster using kubectl apply:
kubectl apply -f storageclass.yaml
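Confirm that the class was created; recent kubectl versions print a VOLUMEBINDINGMODE column, which should read WaitForFirstConsumer:
kubectl get storageclass apim-logcollector-sc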
Step 3: Update your StatefulSet
Before applying your StatefulSet, ensure that its volumeClaimTemplates section references the correct StorageClass. Note that volumeClaimTemplates is immutable on an existing StatefulSet: to change the StorageClass, delete and recreate the StatefulSet (by default, its PVCs are retained).
volumeClaimTemplates:
- metadata:
    name: log-storage
  spec:
    storageClassName: "standard-rwo"  # Your StorageClass
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 1Gi
Step 4: Deploy or Apply changes
After verifying your StorageClass and updating your StatefulSet, apply all changes to deploy your application to the cluster.
kubectl apply -f statefulset.yaml
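You can watch the rollout and confirm that each replica's PVC binds as its pod is scheduled. The name logcollector matches the sample StatefulSet in the Examples section; adjust as needed:
kubectl rollout status statefulset/logcollector
kubectl get pvc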
Examples
Helm values.yaml
Sample
storage:
  type: dynamic
  className: standard-rwo  # GKE - change for your cloud
  size: 1Gi
  accessMode: ReadWriteOnce
Custom StorageClass by Cloud Provider
GKE (Google Cloud) Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  type: pd-balanced
EKS (AWS) Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  type: gp3
  fsType: ext4
AKS (Azure) Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: disk.csi.azure.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  skuName: StandardSSD_LRS
OpenShift Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: cinder.csi.openstack.org  # OpenShift on OpenStack; use your platform's CSI driver (e.g. ODF's openshift-storage.rbd.csi.ceph.com)
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
Longhorn (Rancher) Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apim-logcollector-sc
provisioner: driver.longhorn.io
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
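If you are unsure which provisioner name your cluster uses, you can list the registered CSI drivers (available on clusters with CSI-based storage):
kubectl get csidriver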
Plain Kubernetes StatefulSet
Sample
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logcollector
spec:
  serviceName: logcollector-svc
  replicas: 3
  selector:
    matchLabels:
      app: logcollector
  template:
    metadata:
      labels:
        app: logcollector
    spec:
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "system"
        effect: "NoSchedule"
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: logcollector
      containers:
      - name: logcollector
        image: your-image:tag
        volumeMounts:
        - name: log-storage
          mountPath: /mnt/data/access
  volumeClaimTemplates:
  - metadata:
      name: log-storage
      labels:
        app: logcollector  # label the PVCs so they can be selected (see Troubleshooting)
    spec:
      storageClassName: "standard-rwo"
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
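After applying the sample, you can confirm that the tolerations and spread constraint were honored and that every PVC is Bound:
kubectl get pods -l app=logcollector -o wide
kubectl get pvc -l app=logcollector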
Troubleshooting
If you encounter issues during your Kubernetes deployment, verify the following:
Step 1: Verify StorageClass binding mode
Ensure your StorageClass has the correct volumeBindingMode.
kubectl get storageclass -o custom-columns=NAME:.metadata.name,BINDING:.volumeBindingMode
Step 2: Check PersistentVolumeClaim (PVC) status
Verify that your PVCs are bound to a PersistentVolume. With WaitForFirstConsumer, a PVC remains Pending until a pod that uses it is scheduled; that state is expected before the first pod starts.
kubectl get pvc -l app=logcollector
kubectl describe pvc <pvc-name>
Step 3: Check pod scheduling
If your pods are not starting as expected, check whether any pod is stuck in the Pending state and inspect the scheduling events.
kubectl describe pod <pod-name>
kubectl get events --field-selector reason=FailedScheduling
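If the events mention a volume node affinity conflict, inspect the PV's node affinity to see which node or zone it was provisioned for. The PV name appears in the PVC's Volume field:
kubectl describe pvc <pvc-name> | grep Volume
kubectl describe pv <pv-name> | grep -A5 "Node Affinity"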