Platform Engineering
Build Internal Developer Platforms (IDPs) that provide self-service infrastructure, reduce cognitive load, and accelerate developer productivity through golden paths and platform-as-product thinking.
When to Use
Trigger this skill when:
- Building or improving an internal developer platform
- Designing a developer portal (Backstage, Port, or another commercial IDP)
- Implementing golden paths and software templates
- Establishing or restructuring a platform engineering team
- Measuring and improving developer experience (DevEx)
- Integrating IDP with infrastructure, CI/CD, observability, or security tools
- Driving platform adoption across an engineering organization
- Assessing platform maturity and identifying capability gaps
Core Concepts
Platform as Product
Treat internal platforms with the same rigor as customer-facing products:
Product Management Approach:
- Define platform vision, strategy, and roadmap
- Identify developer "customers" and their pain points
- Measure success via adoption metrics, satisfaction surveys, and business impact
- Iterate based on feedback loops and usage analytics
- Balance new capabilities with platform reliability and support
Key Differences from Traditional DevOps:
- DevOps focuses on delivery pipelines; platform engineering builds comprehensive developer experiences
- Platform teams operate as product teams (product managers, UX designers, engineers)
- Success measured by developer productivity and satisfaction, not just infrastructure metrics
- Self-service is the primary interface, not ticket queues
Internal Developer Platform (IDP) Architecture
Three-Layer Architecture:
1. Developer Portal (Frontend)
- Service catalog: Inventory of services with ownership, dependencies, health status
- Software templates: Project scaffolding with best practices baked in
- Documentation hub: Centralized, searchable, version-controlled docs
- Self-service workflows: Environment provisioning, deployments, access requests
2. Platform Orchestration (Backend)
- Infrastructure provisioning: Multi-cloud resource management
- Environment management: Dev, staging, production lifecycle
- Deployment automation: GitOps-based continuous delivery
- Configuration management: Separation of app and infrastructure concerns
3. Integration Layer (Glue)
- CI/CD integration: Pipeline visibility and triggering
- Observability: Metrics, logs, traces surfaced in portal
- Security: Vulnerability scanning, policy enforcement, secrets management
- FinOps: Cost visibility, budgets, optimization recommendations
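To make the service catalog idea concrete, here is a minimal in-memory sketch. It is not the schema of Backstage, Port, or any other portal; the class names and the fields `owner`, `depends_on`, and `healthy` are assumptions chosen to mirror the "ownership, dependencies, health status" items above.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One service in the catalog (hypothetical schema, for illustration only)."""
    name: str
    owner: str                                        # owning team
    depends_on: list[str] = field(default_factory=list)
    healthy: bool = True


class ServiceCatalog:
    """Tiny in-memory catalog keyed by service name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def owned_by(self, team: str) -> list[str]:
        """Names of services owned by `team`, sorted alphabetically."""
        return sorted(e.name for e in self._entries.values() if e.owner == team)

    def dependents_of(self, name: str) -> list[str]:
        """Services that list `name` as a direct dependency."""
        return sorted(e.name for e in self._entries.values() if name in e.depends_on)

    def unhealthy_dependencies(self, name: str) -> list[str]:
        """Direct dependencies of `name` that are registered and marked unhealthy."""
        entry = self._entries[name]
        return sorted(
            dep for dep in entry.depends_on
            if dep in self._entries and not self._entries[dep].healthy
        )
```

A real portal would populate these records from repository metadata and monitoring integrations rather than manual `register` calls.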
Golden Paths and Scaffolding
Golden Path Principle: Provide opinionated templates that handle 80% of use cases while allowing escape hatches for the remaining 20%.
Template Components:
- Repository structure and boilerplate code
- Infrastructure as code (Kubernetes manifests, Terraform)
- CI/CD pipeline configurations
- Observability instrumentation (metrics, logging, tracing)
- Security configurations (RBAC, network policies, secrets)
- Documentation templates (README, runbooks, architecture diagrams)
Constraint Mechanisms:
- Policy-as-code enforcement (OPA, Kyverno) for security and compliance
- Resource limits and quotas to prevent over-provisioning
- Required health checks and observability instrumentation
- Approved base images and dependency scanning
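The constraint mechanisms above can be illustrated with a plain-Python check. This is only a sketch of the idea behind policy-as-code; it does not use OPA or Kyverno, and the spec keys (`base_image`, `limits`, `health_check_path`) and the allow-list entry are hypothetical.

```python
# Hypothetical allow-list; a real platform would load this from policy configuration.
APPROVED_BASE_IMAGES = {"registry.example.com/base/python:3.12"}


def check_workload(spec: dict) -> list[str]:
    """Return policy violations for a simplified workload spec (empty list means compliant)."""
    violations = []
    # Approved base images only.
    if spec.get("base_image") not in APPROVED_BASE_IMAGES:
        violations.append("base image is not on the approved list")
    # Resource limits are required to prevent over-provisioning.
    limits = spec.get("limits", {})
    for resource in ("cpu", "memory"):
        if resource not in limits:
            violations.append(f"missing {resource} limit")
    # A health check endpoint is required.
    if not spec.get("health_check_path"):
        violations.append("missing health check")
    return violations
```

In practice the same rules would typically live in an admission controller or CI step so that violations block a deployment instead of only being reported.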
Platform Maturity Assessment
Assess current platform capabilities using a six-level maturity model (Levels 0-5):
- Level 0: Ad-Hoc - Manual provisioning, no standardization
- Level 1: Basic Automation - Some IaC and CI/CD, limited self-service
- Level 2: Paved Paths - Golden path templates, early portal, limited coverage
- Level 3: Self-Service Platform - Comprehensive portal, 80%+ self-service
- Level 4: Product-Driven Platform - Data-driven, product team structure, FinOps integration
- Level 5: AI-Augmented Platform - AI-assisted troubleshooting, predictive optimization
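One way to turn the levels above into a repeatable assessment is a cumulative checklist. The sketch below is an assumption-heavy illustration: only the 80% self-service threshold comes from the list above, while the signal names and the rule that each level requires all lower levels are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class PlatformSignals:
    """Hypothetical yes/no signals gathered during an assessment."""
    has_iac_and_ci: bool                # some IaC and CI/CD in place
    has_golden_paths: bool              # golden path templates exist
    self_service_rate: float            # share of actions done without tickets, 0.0-1.0
    has_product_team_and_finops: bool   # product team structure plus FinOps integration
    has_ai_assistance: bool             # AI-assisted troubleshooting or optimization


def maturity_level(signals: PlatformSignals) -> int:
    """Return the highest level (0-5) whose check, and every lower check, passes."""
    checks = [
        signals.has_iac_and_ci,               # Level 1: Basic Automation
        signals.has_golden_paths,             # Level 2: Paved Paths
        signals.self_service_rate >= 0.80,    # Level 3: Self-Service Platform
        signals.has_product_team_and_finops,  # Level 4: Product-Driven Platform
        signals.has_ai_assistance,            # Level 5: AI-Augmented Platform
    ]
    level = 0
    for passed in checks:
        if not passed:
            break
        level += 1
    return level
```

Because the checks are cumulative, a platform with AI features but low self-service still scores at the level where the chain first breaks, which highlights the capability gap to close next.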
Decision Framework: Build vs Buy IDP
Choose Open Source (Backstage) when:
- Large enterprise (1000+ engineers)
- Dedicated platform team available (5-10 engineers)
- Deep customization required
- Open-source ecosystem preferred
- Long-term investment (3+ year horizon)
Choose Commercial IDP (Port, Humanitec, Cortex) when:
- Mid-size organization (100-1000 engineers)
- Faster time-to-value needed (3-6 months vs. 6-12 months for a self-built portal)
- Prefer managed solution with vendor support
- Limited platform engineering resources (<5 engineers)
- Standard use cases (web apps, microservices, CI/CD)
Choose Hybrid Approach when:
- Large organization needing both flexibility and speed
- Complex infrastructure requiring orchestration backend
- Want best-in-class portal + orchestration components
- Willing to integrate multiple systems (e.g., Backstage + Humanitec)
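The decision framework can be condensed into a rough heuristic. The function below is a sketch, not a definitive rule: it uses the engineer and team-size thresholds from the lists above, collapses the remaining criteria into two boolean flags, and defaults to a commercial product for every case the framework does not explicitly cover (including organizations under 100 engineers), which is an assumption.

```python
def recommend_idp(
    engineers: int,
    platform_engineers: int,
    needs_deep_customization: bool,
    needs_fast_time_to_value: bool,
) -> str:
    """Return "open-source", "commercial", or "hybrid" (simplified heuristic)."""
    large_org = engineers >= 1000
    staffed_team = platform_engineers >= 5
    if large_org and staffed_team and needs_deep_customization:
        # Large, well-staffed orgs: pair the portal with a commercial
        # orchestrator only when speed also matters.
        return "hybrid" if needs_fast_time_to_value else "open-source"
    # Every other combination falls back to a managed product in this sketch.
    return "commercial"
```

For example, `recommend_idp(2000, 8, True, False)` returns `"open-source"`, while a 300-engineer organization with three platform engineers gets `"commercial"`.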
Tool Recommendations
Developer Portals
Backstage (Open Source, CNCF)
- Trust Score: 78.7/100, 8,876 code snippets
- Software catalog, scaffolder, TechDocs, plugin ecosystem
- Recommended for: Enterprises with platform teams
Port (Commercial)
- Managed platform, modern UI/UX, faster time-to-value
- Recommended for: Mid-size orgs (100-1000 engineers)
Cortex (Commercial SaaS)
- Enterprise IDP, compliance focus, engineering standards enforcement
- Recommended for: Regulated industries
Platform Orchestration
Crossplane (Open Source, CNCF)
- Trust Score: 67.4/100, universal control plane for multi-cloud
- Kubernetes-native declarative infrastructure
- Recommended for: Multi-cloud abstractions
Humanitec (Commercial)
- Platform Orchestrator backend, environment and deployment management
- Recommended for: Complex infrastructure, complements portals
Terraform Cloud (Commercial; now branded HCP Terraform)
- Mature IaC orchestration, workspace management
- Recommended for: Terraform-heavy organizations
GitOps Continuous Delivery
Argo CD (Open Source, CNCF) - RECOMMENDED
- Trust Score: 91.8/100 (highest among the tools scored here)
- Declarative GitOps for Kubernetes, multi-cluster management
- Industry-leading documentation and community
Flux (Open Source, CNCF)
- Toolkit approach, Kubernetes-native
- Good for: GitOps-native operations
Implementation Guide: Bootstrapping a Platform
Foundation Phase (Months 1-3)
- Define platform vision and form platform team (3-5 members)
- Interview developers to identify pain points
- Set up developer portal (Backstage or commercial)
- Create initial service catalog and first golden path template
Pilot Phase (Months 4-6)
- Select 2-3 pilot teams for white-glove onboarding
- Rapid iteration based on feedback
- Expand to 3-5 golden path templates
- Integrate key tools (CI/CD, monitoring, secrets)
Expansion Phase (Months 7-12)
- Scale to 20-50% of engineering teams
- Build self-service documentation and training
- Establish platform SLOs and on-call rotation
- Internal evangelization (demos, champions program)
Maturity Phase (Year 2+)
- 80%+ adoption across organization
- Platform team operates as product team
- Continuous improvement via metrics and feedback
- AI-assisted capabilities, policy-as-code expansion
Creating Golden Path Templates
Template Design Process:
- Identify most common use case (web app, API, data pipeline)
- Define opinionated choices (language, framework, deployment pattern)
- Create repository structure and infrastructure manifests
- Configure CI/CD pipeline with security scanning
- Instrument observability and document usage
- Test with pilot team before broad rollout
Template Categories:
- Full-stack web application (backend API + frontend + database)
- Data pipeline (ETL/ELT with orchestration)
- Machine learning service (model serving, monitoring)
- Event-driven microservice (message broker integration)
- Scheduled job (cron jobs, batch processing)
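As an illustration of how a golden path template turns a few inputs into a ready-made project, here is a minimal rendering sketch using only the Python standard library. It is not how Backstage's scaffolder or any other portal works internally; the template contents, file paths, and placeholder names (`service_name`, `owner`) are hypothetical.

```python
from string import Template

# Hypothetical template: relative file path -> content with $placeholders.
WEB_API_TEMPLATE = {
    "README.md": "# $service_name\n\nOwned by $owner.\n",
    "deploy/service.yaml": "name: $service_name\nowner: $owner\n",
    ".ci/pipeline.yaml": "steps: [test, security-scan, deploy]\n",
}


def render_template(template: dict[str, str], **values: str) -> dict[str, str]:
    """Render every file in `template`, returning path -> rendered content.

    Template.substitute raises KeyError when a placeholder has no value,
    so a template is never rendered with missing inputs.
    """
    return {
        path: Template(content).substitute(values)
        for path, content in template.items()
    }
```

Writing the rendered files to a new repository, and registering the result in the service catalog, would be the next steps in a real scaffolding workflow.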
Key Developer Experience Metrics
DORA Metrics
- Deployment frequency: How often code reaches production
- Lead time for changes: Time from code commit to that change running in production
- Mean time to recovery (MTTR): Time taken to restore service after an incident
- Change failure rate: Percentage of deployments causing incidents
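The four DORA metrics can be computed from deployment records. The sketch below assumes a hypothetical `Deployment` record and uses simple means; published DORA reporting often uses medians and performance bands, so treat this as a starting point, not a reference implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # when service was restored, if an incident occurred


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period of `period_days`."""
    total = len(deployments)
    if total == 0 or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.caused_incident]
    # Recovery time is measured from the failing deployment to restoration.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": total / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / total,
        "change_failure_rate": len(failures) / total,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Feeding this from CI/CD and incident tooling lets the portal show trends per team instead of one-off snapshots.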
SPACE Framework
- Satisfaction and well-being: Developer happiness via surveys and NPS
- Performance: Outcomes of the work completed (quality, reliability, impact)
- Activity: Code commits, PRs, deployments (used as context, not as raw productivity counts)
- Communication and collaboration: Collaboration quality, discoverability
- Efficiency and flow: Minimize interruptions, reduce toil
Platform-Specific Metrics
- Platform adoption rate (percentage of teams using platform)
- Self-service rate (actions completed without platform team tickets)
- Onboarding time (new developer to first production deployment)
- Template usage (which golden paths are adopted)
- Support ticket volume and resolution time
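Several of the platform-specific metrics above reduce to simple ratios. The helpers below are a sketch; the function names and the choice of a median for onboarding time are assumptions, and the inputs would come from whatever portal analytics and ticketing data the platform team collects.

```python
from statistics import median


def adoption_rate(teams_on_platform: int, total_teams: int) -> float:
    """Fraction of engineering teams using the platform."""
    if total_teams <= 0:
        raise ValueError("total_teams must be positive")
    return teams_on_platform / total_teams


def self_service_rate(self_service_actions: int, ticketed_actions: int) -> float:
    """Fraction of actions completed without a platform team ticket."""
    total = self_service_actions + ticketed_actions
    return self_service_actions / total if total else 0.0


def median_onboarding_days(days_to_first_deploy: list[float]) -> float:
    """Median days from a developer's start date to their first production deployment."""
    return median(days_to_first_deploy)
```

A median is used for onboarding time so that one unusually slow onboarding does not dominate the figure.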
Platform Engineering Checklist
Strategy and Vision
- Platform vision and charter documented
- Platform team formed with clear roles
- Developer pain points identified via interviews
- Success metrics defined (DORA, SPACE, adoption)
IDP Foundation
- Developer portal deployed (Backstage, Port, or another commercial IDP)
- Service catalog established (ownership, dependencies, health)
- First golden path template created and validated
- Documentation hub accessible to all engineers
Self-Service Capabilities
- Environment provisioning (dev, staging, production)
- Deployment automation (GitOps with Argo CD or Flux)
- CI/CD integration visible in portal
- Per-service observability dashboards
Security and Compliance
- Policy-as-code enforcement (OPA, Kyverno)
- Secrets management integrated (Vault or a cloud provider's secrets manager)
- Vulnerability scanning in pipelines
- RBAC and access controls configured
Operations and Support
- Platform SLOs defined and monitored
- Support channels established (Slack, office hours)
- Incident response playbooks documented
- Feedback loops and usage analytics in place
Common Pitfalls
Building Too Much Upfront: Start small (1 golden path, pilot team) and iterate
Ignoring Developer Feedback: Establish continuous feedback loops, not just quarterly surveys
Over-Standardization: Provide clear escape hatches for advanced use cases
Under-Measuring Success: Track DORA metrics, satisfaction surveys, self-service rates
Treating Platform as IT Project: Platform engineering is product development, not infrastructure provisioning
Related Skills
- Implementing GitOps - GitOps principles, Argo CD / Flux implementation
- Building CI Pipelines - CI/CD pipeline design integrated into platform
- Testing Strategies - Testing best practices enforced through golden paths