Docker and CI/CD: Optimizing Infrastructure with GitLab
Multi-stage Docker builds: Reducing image size up to 10x
Introduction
Infrastructure optimization is not optional for modern development teams. In a recent project, we cut build time by 67%, halved deploy time, and raised team satisfaction scores by 35% by applying the advanced Docker and CI/CD strategies below.
Why does it matter? In my experience, teams with slow pipelines lose momentum. A developer who waits 15 minutes to see if their PR passes tests loses context and productivity. With the techniques I'll share, we took our pipelines from 15 minutes to 5 minutes.
Multi-stage Docker builds
Problem: Huge images
# ❌ BAD: 1.5GB image
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
This image includes:
- Development dependencies
- Uncompiled source code
- npm cache
- Result: 1.5GB
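You can measure this yourself. A quick sketch, assuming the image is tagged myapp:fat (the tag is illustrative):
# Build the image and check its size
docker build -t myapp:fat .
docker images myapp:fat --format "{{.Repository}}:{{.Tag}} -> {{.Size}}"
# See which layers contribute the most weight
docker history myapp:fat --format "{{.Size}}\t{{.CreatedBy}}" | head -15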
Solution: Multi-stage build
# ✅ GOOD: 150MB image (10x smaller)
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies (dev deps are needed for the build step)
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:18-alpine AS production
WORKDIR /app
ENV NODE_ENV=production
# Install only production dependencies
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev --ignore-scripts
# Copy built application
COPY --from=builder /app/dist ./dist
# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001
USER nodejs
EXPOSE 3000
CMD ["node", "dist/main.js"]
Results:
- Size: 150MB (vs 1.5GB)
- Push/pull time: 90% faster
- Attack surface: Drastically reduced
- Contains only what's needed for production
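A handy side effect of named stages: you can build any intermediate stage directly with --target, which is useful for debugging the build environment. A short sketch (image names are illustrative):
# Build only the builder stage to inspect the build environment
docker build --target builder -t myapp:builder-debug .
docker run --rm -it myapp:builder-debug sh
# Build the final production image and compare sizes
docker build -t myapp:prod .
docker images myapp --format "{{.Tag}}: {{.Size}}"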
Advanced example: Fullstack application
# Multi-stage build for React + Node.js app
# Stage 1: Build frontend
FROM node:18-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci
COPY frontend/ ./
RUN npm run build
# Stage 2: Build backend
FROM node:18-alpine AS backend-builder
WORKDIR /app/backend
COPY backend/package*.json ./
RUN npm ci
COPY backend/ ./
RUN npm run build
# Stage 3: Production
FROM node:18-alpine AS production
WORKDIR /app
# Copy backend
COPY --from=backend-builder /app/backend/dist ./dist
COPY --from=backend-builder /app/backend/package*.json ./
RUN npm ci --omit=dev
# Copy frontend build to serve static files
COPY --from=frontend-builder /app/frontend/build ./public
# Security
RUN addgroup -g 1001 -S nodejs && adduser -S nodejs -u 1001
USER nodejs
EXPOSE 3000
CMD ["node", "dist/server.js"]
Layer caching strategies
Leverage layer ordering
# ✅ GOOD: Dependencies are cached if package.json doesn't change
FROM node:18-alpine
WORKDIR /app
# 1. Copy only package files (change infrequently)
COPY package*.json ./
RUN npm ci
# 2. Copy source code (changes frequently)
COPY . .
RUN npm run build
# Dependencies are cached if package.json hasn't changed
BuildKit and cache mount
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
# Use BuildKit cache mount for npm cache
RUN --mount=type=cache,target=/root/.npm \
    npm install -g npm@latest
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci
COPY . .
RUN npm run build
To use:
DOCKER_BUILDKIT=1 docker build .
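On Docker Engine 23.0+ BuildKit is the default builder, so the variable is only needed on older engines. The cache mount lives on the daemon rather than in any image layer, so it survives even when the dependency layer is invalidated. A rough way to see the effect:
# First build populates the shared npm cache mount
docker build -t myapp:buildkit .
# Edit package.json (e.g. bump a dependency), then rebuild:
# npm ci re-runs, but resolves packages from the warm cache, not the network
time docker build -t myapp:buildkit .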
Docker Compose for development
Optimized configuration
# docker-compose.yml
version: '3.8'
services:
# Backend API
api:
build:
context: ./backend
dockerfile: Dockerfile.dev
ports:
- "3000:3000"
volumes:
# Hot reload
- ./backend/src:/app/src:ro
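        # Anonymous volume so the host bind mount can't shadow the container's node_modules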
- /app/node_modules
environment:
- NODE_ENV=development
- DATABASE_URL=postgresql://postgres:postgres@db:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
networks:
- app-network
# Frontend
frontend:
build:
context: ./frontend
dockerfile: Dockerfile.dev
ports:
- "5173:5173"
volumes:
- ./frontend/src:/app/src:ro
- /app/node_modules
environment:
- VITE_API_URL=http://localhost:3000
networks:
- app-network
# Database
db:
image: postgres:15-alpine
ports:
- "5432:5432"
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=myapp
volumes:
- postgres_data:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
networks:
- app-network
# Redis
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis_data:/data
networks:
- app-network
volumes:
postgres_data:
redis_data:
networks:
app-network:
driver: bridge
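Day-to-day usage, for reference (Compose v2 syntax; with Compose v1, substitute docker-compose):
# Start the full stack in the background
docker compose up -d
# Follow logs from a single service
docker compose logs -f api
# Rebuild one service after changing its Dockerfile.dev
docker compose up -d --build api
# Tear everything down; add -v to also remove the named volumes
docker compose down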
Dockerfile for development with hot reload
# Dockerfile.dev
FROM node:18-alpine
WORKDIR /app
# Install dependencies
COPY package*.json ./
RUN npm install
# Copy source
COPY . .
# Expose port
EXPOSE 3000
# Development command with hot reload
CMD ["npm", "run", "dev"]
Optimized GitLab CI/CD pipeline: From 15 minutes to 5 minutes
Complete .gitlab-ci.yml configuration
# .gitlab-ci.yml
stages:
- test
- build
- deploy
variables:
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: "/certs"
IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
CACHE_IMAGE: $CI_REGISTRY_IMAGE:cache
# Cache npm dependencies
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- .npm/
- node_modules/
# Template for Docker jobs
.docker_template: &docker_template
image: docker:24
services:
- docker:24-dind
before_script:
- echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
# Lint and unit tests
test:unit:
stage: test
image: node:18-alpine
script:
- npm ci --cache .npm --prefer-offline
- npm run lint
- npm run test:unit -- --coverage
coverage: '/Statements\s*:\s*(\d+\.\d+)%/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
paths:
- coverage/
expire_in: 30 days
# Integration tests
test:integration:
stage: test
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker compose -f docker-compose.test.yml up -d
    - docker compose -f docker-compose.test.yml run api npm run test:integration
  after_script:
    # Runs even when the tests fail, so the stack is always torn down
    - docker compose -f docker-compose.test.yml down
only:
- merge_requests
- main
# Build Docker image
build:
<<: *docker_template
stage: build
script:
# Pull cache image
- docker pull $CACHE_IMAGE || true
# Build with cache
- >
docker build
--cache-from $CACHE_IMAGE
--build-arg BUILDKIT_INLINE_CACHE=1
--tag $IMAGE_TAG
--tag $CACHE_IMAGE
.
# Push both tags
- docker push $IMAGE_TAG
- docker push $CACHE_IMAGE
only:
- main
- develop
- tags
# Deploy to staging
deploy:staging:
stage: deploy
image: alpine:latest
before_script:
- apk add --no-cache curl
script:
- |
      curl -X POST https://portainer.example.com/api/webhooks/$STAGING_WEBHOOK \
        -H "Content-Type: application/json" \
        -d '{"image": "'$IMAGE_TAG'"}'
environment:
name: staging
url: https://staging.example.com
only:
- develop
# Deploy to production
deploy:production:
stage: deploy
image: alpine:latest
before_script:
- apk add --no-cache curl
script:
- |
      curl -X POST https://portainer.example.com/api/webhooks/$PRODUCTION_WEBHOOK \
        -H "Content-Type: application/json" \
        -d '{"image": "'$IMAGE_TAG'"}'
environment:
name: production
url: https://example.com
when: manual
only:
- main
- tags
Key optimizations
1. Dependency caching
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- .npm/
- node_modules/
Savings: 2-3 minutes per pipeline
2. Docker layer caching
script:
- docker pull $CACHE_IMAGE || true
- docker build --cache-from $CACHE_IMAGE ...
Savings: 5-8 minutes in builds
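The inline-cache approach above works with the classic docker build. If your runners have buildx available, a registry cache backend is a sketch worth considering, since mode=max also caches intermediate multi-stage layers:
# Alternative: buildx with a registry cache backend (requires buildx)
docker buildx build \
  --cache-from type=registry,ref=$CACHE_IMAGE \
  --cache-to type=registry,ref=$CACHE_IMAGE,mode=max \
  --tag $IMAGE_TAG \
  --push .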
3. Parallel jobs
GitLab executes jobs from the same stage in parallel automatically. Separate unit and integration tests:
test:unit:
stage: test
# ...
test:integration:
stage: test
# ...
Savings: 50% of testing time
Deployment strategies: Blue-Green and Canary for zero-downtime
1. Blue-Green Deployment
# docker-compose.blue-green.yml
services:
  # Blue (current production)
  app-blue:
    image: myapp:v1.0
    labels:
      - "traefik.enable=${BLUE_ENABLED:-true}"
      - "traefik.http.routers.app-blue.rule=Host(`example.com`)"
  # Green (new version)
  app-green:
    image: myapp:v2.0
    labels:
      - "traefik.enable=${GREEN_ENABLED:-false}" # Starts out of rotation
      - "traefik.http.routers.app-green.rule=Host(`example.com`)"
  # Load balancer
  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedByDefault=false"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Switch script:
#!/bin/bash
# Start green alongside blue (Traefik routing still disabled)
docker-compose -f docker-compose.blue-green.yml up -d app-green

# Wait for the new version to pass its health check
# (assumes curl is available inside the image)
until docker-compose -f docker-compose.blue-green.yml exec app-green \
  curl -f http://localhost:3000/health; do
  sleep 2
done

# Traefik's v2 API is read-only, so traffic is switched by flipping the
# enable labels and letting compose recreate the containers.
# 1) Put green into rotation (blue and green both serve for a moment)
GREEN_ENABLED=true docker-compose -f docker-compose.blue-green.yml up -d app-green
# 2) Take blue out of rotation (it keeps running, just unrouted)
GREEN_ENABLED=true BLUE_ENABLED=false \
  docker-compose -f docker-compose.blue-green.yml up -d app-blue

# Keep blue running for rollback
echo "Green is live. Blue is on standby."
2. Canary Deployment
# docker-compose.canary.yml (deployed as a Docker Swarm stack)
services:
  app-stable:
    image: myapp:stable
    deploy:
      replicas: 9
      endpoint_mode: dnsrr # Let Traefik register each task individually
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.app.rule=Host(`example.com`)"
        - "traefik.http.services.app.loadbalancer.server.port=3000"
  app-canary:
    image: myapp:canary
    deploy:
      replicas: 1
      endpoint_mode: dnsrr
      labels:
        # Same Traefik service name ("app"): canary tasks join the same
        # backend pool, so the 9:1 replica ratio yields a ~90/10 split
        - "traefik.enable=true"
        - "traefik.http.services.app.loadbalancer.server.port=3000"
  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker.swarmMode=true"
    ports:
      - "80:80"
Roughly 10% of traffic reaches the canary (1 replica out of 10). Traefik v2 has no traffic-weight label for the Docker provider, so the split comes from the replica ratio; if error rates and latency stay healthy, gradually shift it.
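Shifting the ratio is then a matter of scaling. A sketch, assuming the stack was deployed with docker stack deploy -c docker-compose.canary.yml myapp (the stack name myapp is illustrative):
# Move from ~10% to ~30% canary traffic by changing the replica ratio
docker service scale myapp_app-canary=3 myapp_app-stable=7
# Promote: roll the stable service onto the new image, then drop the canary
docker service update --image myapp:canary myapp_app-stable
docker service scale myapp_app-canary=0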
Monitoring and metrics
Docker stats with Prometheus
# docker-compose.monitoring.yml
services:
prometheus:
image: prom/prometheus
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
ports:
- "9090:9090"
grafana:
image: grafana/grafana
volumes:
- grafana_data:/var/lib/grafana
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
cadvisor:
image: gcr.io/cadvisor/cadvisor
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
ports:
- "8080:8080"
volumes:
prometheus_data:
grafana_data:
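To verify the stack is wired up, Prometheus's HTTP API can be queried directly; cAdvisor exposes per-container metrics such as container_cpu_usage_seconds_total:
# Start the monitoring stack
docker compose -f docker-compose.monitoring.yml up -d
# Sanity check: per-container CPU usage over the last 5 minutes
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(container_cpu_usage_seconds_total[5m])'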
Prometheus configuration
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'docker'
static_configs:
- targets: ['cadvisor:8080']
- job_name: 'app'
static_configs:
- targets: ['app:3000']
Security best practices
1. Vulnerability scanning
# In .gitlab-ci.yml
security:scan:
stage: test
image: aquasec/trivy:latest
script:
  # --exit-code 1 makes the job fail when matching vulnerabilities are found
  # (by default trivy exits 0 even if it finds issues)
  - trivy image --exit-code 1 --severity HIGH,CRITICAL $IMAGE_TAG
allow_failure: false
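The same scan can run locally before a push, so CI failures don't come as a surprise. A sketch using the Trivy container against a locally built image (the tag is illustrative):
# Scan a local image by mounting the Docker socket into the Trivy container
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL myapp:latest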
2. Multi-stage with non-root user
# Create user in final stage
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001
# Change ownership
RUN chown -R nodejs:nodejs /app
# Switch to non-root user
USER nodejs
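Verifying the result takes two commands; with the Dockerfile above, both should report nodejs:
# Confirm the container's default user is not root
docker run --rm myapp:latest whoami
docker inspect --format '{{.Config.User}}' myapp:latest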
3. Secrets management
# docker-compose.yml with secrets
services:
app:
image: myapp:latest
secrets:
- db_password
- api_key
secrets:
db_password:
file: ./secrets/db_password.txt
api_key:
file: ./secrets/api_key.txt
// In the application (Node.js)
const fs = require('fs')
// Trim the trailing newline that editors and echo typically add
const dbPassword = fs.readFileSync('/run/secrets/db_password', 'utf8').trim()
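The secret files themselves should stay out of version control and be readable only by the deploy user, for example:
# Create the secret files with restrictive permissions and ignore them in git
# ($DB_PASSWORD is assumed to come from your shell environment)
mkdir -p secrets
printf '%s' "$DB_PASSWORD" > secrets/db_password.txt
chmod 600 secrets/db_password.txt
echo 'secrets/' >> .gitignore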
Real results
After implementing these optimizations in a project with 50+ developers:
Performance
- Build time: 15min → 5min (-67%)
- Deploy time: 10min → 5min (-50%)
- Image size: 1.2GB → 180MB (-85%)
- CI/CD cost: -40% (fewer runner minutes)
Team
- Developer satisfaction: +35%
- Deployment frequency: 2x/week → 10x/day
- Mean time to recovery: 2h → 15min
- Failed deployment rate: 15% → 3%
Common mistakes
❌ Not using .dockerignore
# .dockerignore
node_modules
npm-debug.log
.git
.env
*.md
coverage
.vscode
Impact: a far smaller build context, so builds run 3-5x faster
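The legacy builder prints the context size on the first line of its output, which makes the improvement easy to confirm (BuildKit reports it as a "transferring context" step instead):
# Check how much data is sent to the daemon as build context
DOCKER_BUILDKIT=0 docker build . 2>&1 | head -1
# Typical before/after: "Sending build context to Docker daemon  850MB" vs a few MB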
❌ Installing development dependencies in production
# ❌ BAD
RUN npm install
# ✅ GOOD
RUN npm ci --omit=dev
❌ Not leveraging layer caching
# ❌ BAD: Invalidates cache if any file changes
COPY . .
RUN npm install
# ✅ GOOD: the dependency layer is reused if package.json doesn't change
COPY package*.json ./
RUN npm ci
COPY . .
Conclusion
Optimizing Docker and CI/CD is not a luxury, it's a necessity in modern teams. The benefits go beyond time saved: they improve team morale, reduce costs, and enable more frequent and secure deployments.
Key takeaways:
- Use multi-stage builds for small images
- Leverage layer caching strategically
- Implement CI/CD with caching and parallelization
- Use advanced deployment strategies (blue-green, canary)
- Monitor and measure everything
- Never compromise security for speed
Have you optimized your infrastructure recently? What results did you get? Share your experience on LinkedIn.