DevOps

Docker and CI/CD: Infrastructure Optimization

Advanced Docker strategies and CI/CD pipelines with GitLab.


Fernando Caravaca

FullStack Developer

September 10, 2024
16 min read

Docker and CI/CD: Optimizing Infrastructure with GitLab

Multi-stage Docker builds: reducing image size up to 10x

Introduction

Infrastructure optimization is not optional in modern development teams. In a recent project, we reduced build time by 60%, deploy time by 50%, and improved team satisfaction by 35% by applying advanced Docker and CI/CD strategies.

Why does it matter? In my experience, teams with slow pipelines lose momentum. A developer who waits 15 minutes to see if their PR passes tests loses context and productivity. With the techniques I'll share, we took our pipelines from 15 minutes to 5 minutes.

Multi-stage Docker builds

Problem: Huge images

# ❌ BAD: 1.5GB image
FROM node:18

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

EXPOSE 3000
CMD ["npm", "start"]

This image includes:

  • Development dependencies
  • Uncompiled source code
  • npm cache
  • Result: 1.5GB
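
Before optimizing, it helps to see exactly where the megabytes come from. Assuming the image above was tagged `myapp`, these commands (run against a local Docker daemon) break the size down per layer:

```shell
# Total image size
docker images myapp

# Size added by each Dockerfile instruction
docker history myapp
```

The `RUN npm install` and `COPY . .` layers usually dominate, which is what the multi-stage build below eliminates.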

Solution: Multi-stage build

# ✅ GOOD: 150MB image (10x smaller)

# Stage 1: Build
FROM node:18-alpine AS builder

WORKDIR /app
COPY package*.json ./
# Dev dependencies are needed here: the next step runs the build
RUN npm ci --ignore-scripts
COPY . .
RUN npm run build

# Stage 2: Production
FROM node:18-alpine AS production

WORKDIR /app
ENV NODE_ENV=production

# Copy only production dependencies
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev --ignore-scripts

# Copy built application
COPY --from=builder /app/dist ./dist

# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001
USER nodejs

EXPOSE 3000
CMD ["node", "dist/main.js"]

Results:

  • Size: 150MB (vs 1.5GB)
  • Push/pull time: 90% faster
  • Attack surface: Drastically reduced
  • Contains only what's needed for production

Advanced example: Fullstack application

# Multi-stage build for React + Node.js app

# Stage 1: Build frontend
FROM node:18-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci
COPY frontend/ ./
RUN npm run build

# Stage 2: Build backend
FROM node:18-alpine AS backend-builder
WORKDIR /app/backend
COPY backend/package*.json ./
RUN npm ci
COPY backend/ ./
RUN npm run build

# Stage 3: Production
FROM node:18-alpine AS production
WORKDIR /app

# Copy backend
COPY --from=backend-builder /app/backend/dist ./dist
COPY --from=backend-builder /app/backend/package*.json ./
RUN npm ci --omit=dev

# Copy frontend build to serve static files
COPY --from=frontend-builder /app/frontend/build ./public

# Security
RUN addgroup -g 1001 -S nodejs && adduser -S nodejs -u 1001
USER nodejs

EXPOSE 3000
CMD ["node", "dist/server.js"]

Layer caching strategies

Leverage layer ordering

# ✅ GOOD: Dependencies are cached if package.json doesn't change
FROM node:18-alpine

WORKDIR /app

# 1. Copy only package files (change infrequently)
COPY package*.json ./
RUN npm ci

# 2. Copy source code (changes frequently)
COPY . .
RUN npm run build

# Dependencies are cached if package.json hasn't changed

BuildKit and cache mount

# syntax=docker/dockerfile:1

FROM node:18-alpine

WORKDIR /app

# Use BuildKit cache mount for npm cache
# Use a BuildKit cache mount to persist the npm cache across builds
RUN --mount=type=cache,target=/root/.npm \
    npm install -g npm@latest

COPY package*.json ./

RUN --mount=type=cache,target=/root/.npm \
    npm ci

COPY . .
RUN npm run build

To use:

DOCKER_BUILDKIT=1 docker build .

Docker Compose for development

Optimized configuration

# docker-compose.yml
version: '3.8'

services:
  # Backend API
  api:
    build:
      context: ./backend
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    volumes:
      # Hot reload
      - ./backend/src:/app/src:ro
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    networks:
      - app-network

  # Frontend
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.dev
    ports:
      - "5173:5173"
    volumes:
      - ./frontend/src:/app/src:ro
      - /app/node_modules
    environment:
      - VITE_API_URL=http://localhost:3000
    networks:
      - app-network

  # Database
  db:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - app-network

  # Redis
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    networks:
      - app-network

volumes:
  postgres_data:
  redis_data:

networks:
  app-network:
    driver: bridge
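
With this file saved as docker-compose.yml, the whole stack comes up with one command; the service names are the ones defined above:

```shell
docker compose up -d --build   # build images and start every service
docker compose ps              # check state and health of each container
docker compose logs -f api     # follow the backend logs
```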

Dockerfile for development with hot reload

# Dockerfile.dev
FROM node:18-alpine

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm install

# Copy source
COPY . .

# Expose port
EXPOSE 3000

# Development command with hot reload
CMD ["npm", "run", "dev"]

Optimized GitLab CI/CD pipeline: from 15 minutes to 5 minutes

Optimized GitLab CI/CD Pipeline

Complete .gitlab-ci.yml configuration

# .gitlab-ci.yml

stages:
  - test
  - build
  - deploy

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  CACHE_IMAGE: $CI_REGISTRY_IMAGE:cache

# Cache npm dependencies
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .npm/
    - node_modules/

# Template for Docker jobs
.docker_template: &docker_template
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY

# Lint and unit tests
test:unit:
  stage: test
  image: node:18-alpine
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run lint
    - npm run test:unit -- --coverage
  coverage: '/Statements\s*:\s*(\d+\.\d+)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
    paths:
      - coverage/
    expire_in: 30 days

# Integration tests
test:integration:
  stage: test
  image: docker/compose:latest
  services:
    - docker:24-dind
  script:
    - docker-compose -f docker-compose.test.yml up -d
    - docker-compose -f docker-compose.test.yml run api npm run test:integration
    - docker-compose -f docker-compose.test.yml down
  only:
    - merge_requests
    - main

# Build Docker image
build:
  <<: *docker_template
  stage: build
  script:
    # Pull cache image
    - docker pull $CACHE_IMAGE || true

    # Build with cache
    - >
      docker build
      --cache-from $CACHE_IMAGE
      --build-arg BUILDKIT_INLINE_CACHE=1
      --tag $IMAGE_TAG
      --tag $CACHE_IMAGE
      .

    # Push both tags
    - docker push $IMAGE_TAG
    - docker push $CACHE_IMAGE
  only:
    - main
    - develop
    - tags

# Deploy to staging
deploy:staging:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache curl
  script:
    - |
      curl -X POST https://portainer.example.com/api/webhooks/$STAGING_WEBHOOK \
        -H "Content-Type: application/json" \
        -d '{"image": "'$IMAGE_TAG'"}'
  environment:
    name: staging
    url: https://staging.example.com
  only:
    - develop

# Deploy to production
deploy:production:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache curl
  script:
    - |
      curl -X POST https://portainer.example.com/api/webhooks/$PRODUCTION_WEBHOOK \
        -H "Content-Type: application/json" \
        -d '{"image": "'$IMAGE_TAG'"}'
  environment:
    name: production
    url: https://example.com
  when: manual
  only:
    - main
    - tags

Key optimizations

1. Dependency caching

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .npm/
    - node_modules/

Savings: 2-3 minutes per pipeline

2. Docker layer caching

script:
  - docker pull $CACHE_IMAGE || true
  - docker build --cache-from $CACHE_IMAGE ...

Savings: 5-8 minutes in builds

3. Parallel jobs

GitLab executes jobs from the same stage in parallel automatically. Separate unit and integration tests:

test:unit:
  stage: test
  # ...

test:integration:
  stage: test
  # ...

Savings: 50% of testing time
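
You can go one step further with GitLab's `needs:` keyword, which turns the stage-ordered pipeline into a directed acyclic graph: a job starts as soon as the specific jobs it depends on succeed, instead of waiting for its entire previous stage. A minimal sketch for the pipeline above:

```yaml
# Sketch: build starts as soon as test:unit passes,
# without waiting for test:integration to finish
build:
  stage: build
  needs: ["test:unit"]
  script:
    - docker build -t $IMAGE_TAG .
```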

Deployment strategies: blue-green and canary for zero-downtime releases

Deployment strategies

1. Blue-Green Deployment

# docker-compose.blue-green.yml

services:
  # Blue (current production)
  app-blue:
    image: myapp:v1.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app-blue.rule=Host(`example.com`)"
      - "traefik.http.routers.app-blue.priority=1"

  # Green (new version)
  app-green:
    image: myapp:v2.0
    labels:
      - "traefik.enable=false"  # Start disabled

  # Load balancer
  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker=true"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Switch script:

#!/bin/bash
set -e

# Deploy green alongside blue
docker-compose up -d app-green

# Wait for green to pass its health check
until docker-compose exec app-green curl -f http://localhost:3000/health; do
  sleep 2
done

# Switch traffic: re-create green with its Traefik router enabled,
# e.g. via an override file that sets traefik.enable=true.
# Traefik watches the Docker socket and applies label changes automatically.
docker-compose -f docker-compose.blue-green.yml \
  -f docker-compose.green-live.yml up -d app-green

# Keep blue running for rollback
echo "Green is live. Blue is on standby."

2. Canary Deployment

# docker-compose.canary.yml

services:
  app-stable:
    image: myapp:stable
    deploy:
      replicas: 9
      labels:
        - "traefik.http.services.app.loadbalancer.weight=90"

  app-canary:
    image: myapp:canary
    deploy:
      replicas: 1
      labels:
        - "traefik.http.services.app.loadbalancer.weight=10"

  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker.swarmMode=true"
    ports:
      - "80:80"

With 9 stable replicas and 1 canary replica behind the same service, roughly 10% of traffic reaches the canary. If error rates and latency stay healthy, gradually increase the canary's share until it becomes the new stable version.
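
The promote/rollback decision can itself be automated. As a sketch (the function name and thresholds are illustrative, not from any specific tool), compare the error rates of both variants as you would scrape them from Prometheus:

```javascript
// Decide what to do with a canary given error rates from both variants.
// Rates are fractions (0.01 = 1% of requests failing); `tolerance` is the
// extra error budget the canary is allowed before we roll it back.
function canaryDecision(stableErrorRate, canaryErrorRate, tolerance = 0.005) {
  if (canaryErrorRate > stableErrorRate + tolerance) return 'rollback';
  if (canaryErrorRate <= stableErrorRate) return 'promote';
  return 'hold'; // slightly worse but within budget: keep observing
}

console.log(canaryDecision(0.01, 0.05));  // canary clearly worse -> 'rollback'
console.log(canaryDecision(0.01, 0.008)); // canary at least as good -> 'promote'
console.log(canaryDecision(0.01, 0.012)); // within tolerance -> 'hold'
```

In practice you would run a check like this on a schedule, feed it rates from a Prometheus query, and bump the canary's replica count on each 'promote'.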

Monitoring and metrics

Docker stats with Prometheus

# docker-compose.monitoring.yml

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"

volumes:
  prometheus_data:
  grafana_data:

Prometheus configuration

# prometheus.yml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'app'
    static_configs:
      - targets: ['app:3000']

Security best practices

1. Vulnerability scanning

# In .gitlab-ci.yml
security:scan:
  stage: test
  image: aquasec/trivy:latest
  script:
    - trivy image --severity HIGH,CRITICAL $IMAGE_TAG
  allow_failure: false
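
The same scan can run locally before you push, assuming Trivy is installed and the image is tagged locally; --exit-code 1 makes the command fail your shell the same way it fails the job:

```shell
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest
```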

2. Multi-stage with non-root user

# Create user in final stage
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001

# Change ownership
RUN chown -R nodejs:nodejs /app

# Switch to non-root user
USER nodejs

3. Secrets management

# docker-compose.yml with secrets

services:
  app:
    image: myapp:latest
    secrets:
      - db_password
      - api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt
Reading a secret from inside the application:

// In the application
const dbPassword = fs.readFileSync('/run/secrets/db_password', 'utf8')

Real results

After implementing these optimizations in a project with 50+ developers:

Performance

  • Build time: 15min → 5min (-67%)
  • Deploy time: 10min → 5min (-50%)
  • Image size: 1.2GB → 180MB (-85%)
  • CI/CD cost: -40% (fewer runner minutes)

Team

  • Developer satisfaction: +35%
  • Deployment frequency: 2x/week → 10x/day
  • Mean time to recovery: 2h → 15min
  • Failed deployment rate: 15% → 3%

Common mistakes

❌ Not using .dockerignore

# .dockerignore
node_modules
npm-debug.log
.git
.env
*.md
coverage
.vscode

Impact: Builds 3-5x faster

❌ Installing development dependencies in production

# ❌ BAD: installs dev dependencies too
RUN npm install

# ✅ GOOD: clean, reproducible, production-only install
RUN npm ci --omit=dev

❌ Not leveraging layer caching

# ❌ BAD: Invalidates cache if any file changes
COPY . .
RUN npm install

# ✅ GOOD: npm cache persists if package.json doesn't change
COPY package*.json ./
RUN npm ci
COPY . .

Conclusion

Optimizing Docker and CI/CD is not a luxury, it's a necessity in modern teams. The benefits go beyond time saved: they improve team morale, reduce costs, and enable more frequent and secure deployments.

Key takeaways:

  1. Use multi-stage builds for small images
  2. Leverage layer caching strategically
  3. Implement CI/CD with caching and parallelization
  4. Use advanced deployment strategies (blue-green, canary)
  5. Monitor and measure everything
  6. Never compromise security for speed

Have you optimized your infrastructure recently? What results did you get? Share your experience on LinkedIn.

#Docker #CI/CD #GitLab #DevOps #Infrastructure

Did you like this article?

Share your thoughts on LinkedIn or contact me if you want to discuss these topics.

Fernando Caravaca - FullStack Developer