Skip to main content

Operating Kubernetes

Production-grade Kubernetes cluster operations covering resource management, advanced scheduling, networking, storage, security hardening, and autoscaling patterns.

When to Use

Use when:

  • Deploying workloads to Kubernetes clusters
  • Configuring resource requests/limits and QoS classes
  • Implementing NetworkPolicies for zero-trust security
  • Setting up autoscaling (HPA, VPA, KEDA)
  • Managing persistent storage with StorageClasses
  • Troubleshooting pod issues (CrashLoopBackOff, Pending)
  • Configuring RBAC with least privilege
  • Spreading pods across availability zones

Key Features

Resource Management

  • Quality of Service (QoS) classes: Guaranteed, Burstable, BestEffort
  • Resource quotas and LimitRanges for multi-tenancy
  • Vertical Pod Autoscaler for automated rightsizing
  • Container resource requests and limits (CPU/memory)

Advanced Scheduling

  • Node affinity for targeted pod placement
  • Taints and tolerations for workload isolation
  • Topology spread constraints for high availability
  • Pod priority and preemption

Networking

  • NetworkPolicies with default-deny security
  • Ingress vs. Gateway API for traffic management
  • Service mesh integration patterns
  • DNS-based service discovery

Storage Operations

  • StorageClasses for performance tiers (SSD, HDD)
  • PersistentVolumeClaims and access modes (RWO, ROX, RWX)
  • CSI drivers for cloud-native storage
  • Volume snapshots and backups

Security

  • RBAC (Role-Based Access Control)
  • Pod Security Standards (Restricted, Baseline, Privileged)
  • Policy enforcement with Kyverno/OPA Gatekeeper
  • Secrets management integration

Autoscaling

  • Horizontal Pod Autoscaler (HPA) for replica scaling
  • Vertical Pod Autoscaler (VPA) for resource optimization
  • KEDA for event-driven autoscaling (queues, cron, metrics)
  • Cluster autoscaler for node provisioning

Quick Start

# High-availability deployment with resource management
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
spec:
replicas: 3
template:
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: api-server
topologyKey: topology.kubernetes.io/zone
containers:
- name: api
image: api-server:v1.2.0
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 30
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 10
---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-server-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70

Troubleshooting Common Issues

Pod Stuck in Pending:

kubectl describe pod <pod-name>
# Check: Insufficient resources, node selector mismatch, PVC not bound, taint intolerance

CrashLoopBackOff:

kubectl logs <pod-name>
kubectl logs <pod-name> --previous
# Check: Application crash, missing env vars, liveness probe failing, OOMKilled

Service Not Accessible:

kubectl get endpoints <service-name>
# If empty: Service selector doesn't match pod labels, pods not ready, NetworkPolicies blocking

References

  • Full Skill Documentation
  • Kubernetes Official Docs
  • Resource Management: references/resource-management.md
  • Scheduling Patterns: references/scheduling-patterns.md
  • Networking: references/networking.md
  • Security: references/security.md
  • Troubleshooting: references/troubleshooting.md