We've deployed 50+ applications with Docker in production. These are the lessons that saved us from outages, security breaches, and 3 AM debugging sessions.
Lesson 1: Multi-Stage Builds Save Resources and Security
Our first Docker images were 2.4GB. Now they're 87MB. Here's how:
```dockerfile
# Before: Single stage (2.4GB)
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]

# After: Multi-stage build (87MB)
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

# Production stage (node:18-alpine ships a non-root `node` user)
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
USER node
EXPOSE 3000
CMD ["npm", "start"]
```
Benefits:
- 96% smaller image size
- No build tools in production image
- Runs as non-root user
- Faster deployments and scaling
Lesson 2: Health Checks Are Critical
Without proper health checks, Kubernetes kept routing traffic to broken containers.
```dockerfile
# Dockerfile health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

# Or for Node.js apps without curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD node health-check.js
```
```javascript
// health-check.js
const http = require('http')

const options = {
  hostname: 'localhost',
  port: 3000,
  path: '/health',
  timeout: 2000
}

const req = http.request(options, (res) => {
  if (res.statusCode === 200) {
    process.exit(0)
  } else {
    process.exit(1)
  }
})

// Fail the check on connection errors or timeouts
req.on('error', () => process.exit(1))
req.on('timeout', () => req.destroy())
req.end()
```
Lesson 3: Secrets Management Done Right
Never put secrets in environment variables or Dockerfiles. Use Docker Secrets or external secret management.
```dockerfile
# Bad: Secrets in environment variables
ENV DATABASE_PASSWORD=super_secret_password
```

```shell
# Good: Use Docker Secrets
docker service create \
  --name myapp \
  --secret db_password \
  --env DATABASE_PASSWORD_FILE=/run/secrets/db_password \
  myapp:latest
```
```javascript
// Reading secrets in Node.js
const fs = require('fs')

function getSecret(secretName) {
  try {
    return fs.readFileSync(`/run/secrets/${secretName}`, 'utf8').trim()
  } catch (error) {
    // Fallback to environment variable for development
    return process.env[secretName.toUpperCase()]
  }
}

const dbPassword = getSecret('db_password')
```
Lesson 4: Resource Limits Prevent Cascading Failures
One container consumed all memory and killed our entire node. Solution: proper resource limits.
```yaml
# Docker Compose
services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'
    restart: unless-stopped
```
```yaml
# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```
Lesson 5: Logging Configuration
Default logging filled our disks. Configure log rotation and structured logging.
Docker daemon logging config (in `/etc/docker/daemon.json`):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```

Or in `docker-compose.yml`:

```yaml
services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```
```javascript
// Structured logging in Node.js
const winston = require('winston')

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  transports: [new winston.transports.Console()]
})

// Usage (user and req come from your request handler)
logger.info('User created', {
  userId: user.id,
  email: user.email,
  ip: req.ip
})
```
Lesson 6: Security Scanning in CI/CD
We caught 23 critical vulnerabilities by scanning images before deployment.
```yaml
# GitHub Actions security scan
name: Docker Security Scan
on:
  push:
    branches: [main]
  pull_request:
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: CRITICAL,HIGH
          exit-code: '1'
```
Lesson 7: Monitoring and Observability
Essential metrics to monitor in production:
```javascript
// Custom metrics endpoint
const promClient = require('prom-client')
const express = require('express')

// Create metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status']
})

const memoryUsage = new promClient.Gauge({
  name: 'nodejs_memory_usage_bytes',
  help: 'Memory usage in bytes',
  labelNames: ['type'],
  collect() {
    const memUsage = process.memoryUsage()
    this.set({ type: 'rss' }, memUsage.rss)
    this.set({ type: 'heapUsed' }, memUsage.heapUsed)
  }
})

// Expose metrics for Prometheus to scrape
const app = express()
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType)
  res.end(await promClient.register.metrics())
})
```
Lesson 8: Blue-Green Deployments
Zero-downtime deployments with Docker Swarm:
```bash
#!/bin/bash
# deploy.sh
set -e

NEW_VERSION=$1
SERVICE_NAME="myapp"

echo "Deploying $SERVICE_NAME version $NEW_VERSION"

# Update service with new image (rolls back automatically on failure)
docker service update \
  --image myapp:$NEW_VERSION \
  --update-parallelism 1 \
  --update-delay 30s \
  --update-failure-action rollback \
  --update-monitor 60s \
  "$SERVICE_NAME"

# Wait for deployment to complete
```
Production Dockerfile Template
Here's our battle-tested Dockerfile template:
```dockerfile
# Multi-stage build for Node.js app
FROM node:18-alpine AS base
RUN apk add --no-cache curl
WORKDIR /app

# Dependencies stage
FROM base AS deps
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

# Build stage
FROM base AS builder
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage (assumes the build outputs to dist/)
FROM base AS runner
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package*.json ./
USER node
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
CMD ["npm", "start"]
```
Docker Compose for Development
```yaml
# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp
    volumes:
      - .:/app
      - /app/node_modules
    depends_on:
      db:
        condition: service_healthy

  # Example database service; adjust image/credentials to match DATABASE_URL
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=myapp
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 5s
      timeout: 3s
      retries: 5
```
Key Takeaways
- Use multi-stage builds to minimize image size and attack surface
- Always implement health checks for proper orchestration
- Never put secrets in images or environment variables
- Set resource limits to prevent resource starvation
- Configure log rotation to prevent disk space issues
- Scan images for vulnerabilities in your CI/CD pipeline
- Monitor container metrics and application performance
- Use proper deployment strategies for zero-downtime updates
Docker in production requires attention to security, monitoring, and operational practices. Start with these patterns and adapt them to your specific needs.