
Automated GitLab CE Backups to Google Drive: A Complete Guide

Ihor Chyshkala

In today's world of continuous development and deployment, having a reliable backup strategy for your GitLab instance is crucial. This guide will walk you through creating an automated backup system that stores your GitLab CE backups securely on Google Drive, providing an off-site backup solution that's both reliable and cost-effective.

Understanding the Components

Before diving into the implementation, let's understand the key components of our backup solution:

  1. GitLab's built-in backup tool: GitLab provides the gitlab-backup command, which creates comprehensive backups of your entire instance, including repositories, issues, and metadata.
  2. rclone: A powerful command-line tool that handles cloud storage synchronization. We'll use it to securely transfer our backups to Google Drive.
  3. Automation script: A bash script that orchestrates the entire backup process, handles errors, and manages retention policies.
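
At its core, the entire pipeline reduces to two commands: one to create the backup and one to ship it off-site. Everything else in this guide is automation and hardening around this minimal sketch (which assumes the gdrive remote we configure below):

sudo gitlab-backup create STRATEGY=copy
rclone copy /var/opt/gitlab/backups/ gdrive:gitlab-backups/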

Setting Up Google Drive Access

The first crucial step is setting up secure access to Google Drive. This involves creating appropriate credentials in the Google Cloud Console and configuring rclone to use them.

Creating Google Cloud Credentials

  1. Visit the Google Cloud Console (console.cloud.google.com)
  2. Create a new project or select an existing one
  3. Enable the Google Drive API for your project
  4. Configure the OAuth consent screen:
    • Choose "External" user type
    • Fill in the required application details
    • Add your email as a test user
  5. Create OAuth credentials:
    • Select "Create Credentials" > "OAuth client ID"
    • Choose "Desktop app" as the application type (this is crucial!)
    • Note down the client ID and client secret

The "Desktop app" application type is particularly important because it automatically configures the correct redirect URIs that rclone expects for its OAuth flow. This prevents the common redirect_uri_mismatch error that occurs when using other application types.

Configuring rclone

With your Google Cloud credentials in hand, configure rclone using the following steps:

rclone config create gdrive drive

During the configuration:

  • Enter your client ID and client secret
  • Choose full drive access scope
  • Leave root_folder_id and service_account_file empty
  • Complete the OAuth authorization process by opening the provided URL in your browser
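
Once authorization completes, rclone writes the credentials to its configuration file (typically ~/.config/rclone/rclone.conf, or under /root when configured as root). The resulting section looks roughly like this, with token values shortened:

[gdrive]
type = drive
client_id = your-client-id.apps.googleusercontent.com
client_secret = your-client-secret
scope = drive
token = {"access_token":"ya29...","token_type":"Bearer","refresh_token":"1//...","expiry":"..."}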

For automated scripts, it's recommended to configure rclone without an encryption password. While encryption adds an extra security layer, it complicates automation and may be redundant given that:

  • OAuth tokens already provide secure authentication
  • Server-level security (SSH keys, firewall) protects access to the configuration
  • File system permissions restrict access to the rclone configuration file
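
To make that last layer concrete, restrict the configuration file to the root user, which is what our backup script runs as:

chmod 600 /root/.config/rclone/rclone.conf
chown root:root /root/.config/rclone/rclone.conf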

Creating the Backup Script

Our backup solution uses a comprehensive bash script that handles:

  • Creating GitLab backups
  • Backing up configuration files
  • Uploading to Google Drive
  • Managing backup retention
  • Error handling and notifications
  • Logging all operations

The script organizes backups on Google Drive by date, making it easy to locate and manage specific backups. It also implements a retention policy to automatically clean up old backups both locally and on Google Drive.

#!/bin/bash

# Configuration parameters with detailed comments
BACKUP_DIR="/var/opt/gitlab/backups"
RETENTION_DAYS=7
NOTIFICATION_EMAIL="your-email@domain.com"
GDRIVE_REMOTE_NAME="gdrive"
GDRIVE_BACKUP_DIR="gitlab-backups"

# Performance tuning for large files
MAX_TRANSFERS=4           # Number of concurrent file transfers
MAX_CHECKERS=8           # Number of concurrent check operations
BANDWIDTH_LIMIT="40M"    # Bandwidth limit (40MB/s is a good balance)
MIN_FREE_SPACE_GB=10     # Minimum required free space in GB

# Create logging directory and set up log file with timestamp
mkdir -p /var/log/gitlab-backup
LOG_FILE="/var/log/gitlab-backup/backup-$(date +%Y%m%d_%H%M%S).log"

# Enhanced notification function with support for different severity levels
send_notification() {
    local subject="$1"
    local message="$2"
    
    # Only attempt to send mail if a mail command is available
    if command -v mail &> /dev/null; then
        echo "$message" | mail -s "$subject" "$NOTIFICATION_EMAIL"
    fi
    
    # Always log notifications
    log_message "NOTIFICATION - $subject: $message"
}

# Enhanced logging function with timestamp and log levels
log_message() {
    local message="$1"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    echo "$timestamp - $message" | tee -a "$LOG_FILE"
}

# Function to check system requirements
check_requirements() {
    # Check if rclone is installed
    if ! command -v rclone &> /dev/null; then
        log_message "ERROR: rclone is not installed"
        send_notification "GitLab Backup - Error" "rclone is not installed on the server"
        exit 1
    fi
    
    # Check available disk space
    local free_space
    free_space=$(df -BG "$BACKUP_DIR" | awk 'NR==2 {gsub("G",""); print $4}')
    if [ "${free_space%.*}" -lt "$MIN_FREE_SPACE_GB" ]; then
        log_message "ERROR: Insufficient disk space (less than ${MIN_FREE_SPACE_GB}GB available)"
        send_notification "GitLab Backup - Error" "Insufficient disk space for backup"
        exit 1
    fi
}

# Function to find the most recent backup file
get_latest_backup_file() {
    find "$BACKUP_DIR" -maxdepth 1 -type f -name "*_gitlab_backup.tar" -printf '%T@ %p\n' | \
    sort -nr | \
    head -1 | \
    cut -d' ' -f2-
}

# Function to handle the backup process
perform_backup() {
    log_message "Starting GitLab backup creation..."
    
    if ! gitlab-backup create STRATEGY=copy 2>>"$LOG_FILE"; then
        log_message "ERROR: Failed to create GitLab backup"
        send_notification "GitLab Backup - Error" "Backup creation failed"
        exit 1
    fi
    
    # Allow system to finish writing backup file
    sleep 10
    
    # Find the newly created backup file
    local backup_file
    backup_file=$(get_latest_backup_file)
    
    if [ -z "$backup_file" ]; then
        log_message "ERROR: No backup file found after backup creation"
        send_notification "GitLab Backup - Error" "Backup file not found"
        exit 1
    fi
    
    local backup_size
    backup_size=$(du -h "$backup_file" | cut -f1)
    log_message "Backup created successfully: $backup_file (Size: $backup_size)"
    
    return 0
}

# Function to prepare backup files for upload
prepare_upload() {
    local timestamp=$1
    local backup_file=$2
    local temp_dir="$BACKUP_DIR/temp_$timestamp"
    
    # Create temporary directory for staging files
    mkdir -p "$temp_dir"
    
    # Copy configuration files
    cp /etc/gitlab/gitlab.rb "$temp_dir/gitlab.rb.$timestamp"
    cp /etc/gitlab/gitlab-secrets.json "$temp_dir/gitlab-secrets.json.$timestamp"
    
    # Copy the actual backup file
    cp "$backup_file" "$temp_dir/"
    
    # Set appropriate permissions
    chmod 600 "$temp_dir"/*
    chown -R git:git "$temp_dir"
    
    echo "$temp_dir"
}

# Function to upload to Google Drive with progress monitoring
upload_to_gdrive() {
    local source_dir=$1
    local timestamp=$2
    
    log_message "Starting upload to Google Drive..."
    local total_size
    total_size=$(du -sh "$source_dir" | cut -f1)
    log_message "Total upload size: $total_size"
    
    if rclone sync "$source_dir" "$GDRIVE_REMOTE_NAME:$GDRIVE_BACKUP_DIR/$timestamp" \
        --transfers "$MAX_TRANSFERS" \
        --checkers "$MAX_CHECKERS" \
        --bwlimit "$BANDWIDTH_LIMIT" \
        --progress 2>>"$LOG_FILE"; then
        
        log_message "Upload completed successfully"
        return 0
    else
        log_message "ERROR: Upload to Google Drive failed"
        return 1
    fi
}

# Function to clean up old backups
cleanup_old_backups() {
    
    # Clean local backups
    log_message "Cleaning up old local backups..."
    find "$BACKUP_DIR" -name "*_gitlab_backup.tar" -mtime +"$RETENTION_DAYS" -delete
    find "$BACKUP_DIR" -name "gitlab.rb.*" -mtime +"$RETENTION_DAYS" -delete
    find "$BACKUP_DIR" -name "gitlab-secrets.json.*" -mtime +"$RETENTION_DAYS" -delete
    
    # Clean Google Drive backups: delete old files, then prune emptied date folders
    log_message "Cleaning up old Google Drive backups..."
    rclone delete --min-age "${RETENTION_DAYS}d" "$GDRIVE_REMOTE_NAME:$GDRIVE_BACKUP_DIR" 2>>"$LOG_FILE"
    rclone rmdirs --leave-root "$GDRIVE_REMOTE_NAME:$GDRIVE_BACKUP_DIR" 2>>"$LOG_FILE"
    
    # Clean old logs
    find /var/log/gitlab-backup -name "backup-*.log" -mtime +"$RETENTION_DAYS" -delete
}

# Main execution flow
main() {
    log_message "Starting backup process..."
    
    # Check system requirements
    check_requirements
    
    # Create the backup
    perform_backup
    
    # Get the latest backup file
    BACKUP_FILE=$(get_latest_backup_file)
    TIMESTAMP=$(date +%Y%m%d)
    
    # Prepare files for upload
    TEMP_DIR=$(prepare_upload "$TIMESTAMP" "$BACKUP_FILE")
    
    # Upload to Google Drive
    if upload_to_gdrive "$TEMP_DIR" "$TIMESTAMP"; then
        cleanup_old_backups
        rm -rf "$TEMP_DIR"
        send_notification "GitLab Backup - Success" "Backup completed and uploaded successfully"
    else
        rm -rf "$TEMP_DIR"
        send_notification "GitLab Backup - Error" "Backup upload failed"
        exit 1
    fi
}

# Execute main function
main
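
To deploy the script, save it under the path the cron job below expects and keep it readable by root only, since it handles sensitive files such as gitlab-secrets.json. Then run it once manually to verify the full cycle:

sudo install -m 700 -o root -g root gitlab-backup.sh /usr/local/bin/gitlab-backup.sh
sudo /usr/local/bin/gitlab-backup.sh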

Security Considerations

The backup system's security is built on multiple layers:

  1. Google Drive Authentication: OAuth 2.0 provides secure access to Google Drive without storing permanent credentials.
  2. File System Security: The rclone configuration and backup files are protected by Unix file permissions.
  3. Server Security: The overall server security (firewall rules, SSH configuration, system updates) forms the foundation of the backup system's security.

Testing and Verification

Before implementing the automated backup system, it's crucial to verify each component:

1. Test rclone configuration:

rclone about gdrive:

2. Create a test backup directory:

rclone mkdir gdrive:gitlab-backups

3. Perform a test backup upload:

echo "Test backup file" > test_backup.txt
rclone copy test_backup.txt gdrive:gitlab-backups/test/
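
4. Confirm the file actually arrived, then remove the test directory:

rclone ls gdrive:gitlab-backups/test/
rclone purge gdrive:gitlab-backups/test/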

Exploring Alternative Backup Destinations

While our guide focuses on Google Drive as the backup destination, rclone's versatility allows you to adapt this solution for virtually any cloud storage provider. This flexibility is particularly valuable for organizations with specific compliance requirements or existing cloud infrastructure preferences.

Supported Cloud Providers

Rclone supports an impressive array of storage providers and protocols, including:

  • Amazon S3 and S3-compatible storage (MinIO, Wasabi, DigitalOcean Spaces)
  • Microsoft Azure Blob Storage
  • Backblaze B2
  • OpenStack Swift
  • FTP/SFTP servers
  • WebDAV
  • And many others

This broad compatibility means you can easily modify our backup solution to work with your preferred storage provider without changing the core backup logic.

Setting Up S3 Backup Alternative

Amazon S3 and S3-compatible storage services are particularly popular for backup solutions due to their reliability and cost-effectiveness. Here's how to adapt our solution for S3:

1. Configure rclone for S3:

rclone config create s3backup s3

During configuration, you'll need to provide:

  • Access key ID
  • Secret access key
  • Region
  • Bucket name
  • Storage class (e.g., STANDARD, STANDARD_IA, or GLACIER)
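
For unattended setups, the same values can be supplied non-interactively. Here's a sketch with placeholder credentials (note that the bucket name is not stored in the config; it becomes part of the remote path instead):

rclone config create s3backup s3 \
    provider AWS \
    access_key_id YOUR_ACCESS_KEY \
    secret_access_key YOUR_SECRET_KEY \
    region eu-central-1 \
    storage_class STANDARD_IA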

2. Modify the backup script by changing the remote name and path:

# Instead of
GDRIVE_REMOTE_NAME="gdrive"
# Use
S3_REMOTE_NAME="s3backup"
S3_BUCKET_PATH="gitlab-backups"

# And update the sync command inside upload_to_gdrive()
rclone sync "$source_dir" "$S3_REMOTE_NAME:$S3_BUCKET_PATH/$timestamp"

Cost Optimization Strategies

Different storage providers offer various storage tiers and pricing models. Here's how to optimize costs for different providers:

  1. Amazon S3:
    • Use lifecycle policies to automatically transition older backups to cheaper storage tiers (see the example policy after this list)
    • Consider STANDARD_IA for backups older than 30 days
    • Use GLACIER for long-term archival storage
  2. Google Drive:
    • Take advantage of workspace storage pooling
    • Use shared drives for better storage management
  3. S3-Compatible Alternatives:
    • Consider Wasabi or Backblaze B2 for potentially lower storage costs
    • Use MinIO for self-hosted object storage
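
For the lifecycle approach in point 1, a policy along these lines could be attached to the backup bucket. This is a sketch assuming a bucket named my-gitlab-backups; adjust the prefix and day thresholds to your retention needs:

aws s3api put-bucket-lifecycle-configuration \
    --bucket my-gitlab-backups \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "tier-gitlab-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "gitlab-backups/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"}
            ],
            "Expiration": {"Days": 365}
        }]
    }'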

Automating with Cron

Once everything is tested and working, schedule the backup script using cron:

sudo crontab -e

Add a line to run the backup during off-peak hours:

0 2 * * * /usr/local/bin/gitlab-backup.sh
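
Cron discards script output by default, so it's worth capturing it in a file alongside the script's own logs. A variant of the same entry:

0 2 * * * /usr/local/bin/gitlab-backup.sh >> /var/log/gitlab-backup/cron.log 2>&1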

Monitoring and Maintenance

To ensure your backup system remains reliable:

  1. Regularly check backup logs for errors
  2. Periodically verify that backups can be restored (see the restore drill below)
  3. Monitor Google Drive space usage
  4. Keep the GitLab instance and backup script updated
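
For point 2, a periodic restore drill on a separate staging instance might look like the following sketch. The date folder and BACKUP value are placeholders; BACKUP is the archive filename without the _gitlab_backup.tar suffix, and /etc/gitlab/gitlab.rb plus gitlab-secrets.json should be restored from the same dated folder first:

# Pull a dated backup set from Google Drive
rclone copy gdrive:gitlab-backups/20250101 /var/opt/gitlab/backups/

# Stop the services that write to the database, then restore
sudo gitlab-ctl stop puma
sudo gitlab-ctl stop sidekiq
sudo gitlab-backup restore BACKUP=1735689600_2025_01_01_17.7.0
sudo gitlab-ctl restart
sudo gitlab-rake gitlab:check SANITIZE=true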

Conclusion

This backup solution provides a robust, automated way to secure your GitLab data. By leveraging GitLab's built-in backup functionality and combining it with rclone's cloud storage capabilities, we create a reliable off-site backup system that requires minimal maintenance while ensuring your data's safety.

The implementation balances security with automation, making thoughtful trade-offs where necessary. For instance, while rclone configuration encryption is available, we chose to prioritize reliable automation given the existing security layers provided by OAuth tokens and system-level protections.

Remember to regularly test your backup restoration process and monitor the system's operation to ensure it continues to meet your data protection needs.

About the Author

Ihor Chyshkala

Code Alchemist: Transmuting Ideas into Reality with JS & PHP. DevOps Wizard: Transforming Infrastructure into Cloud Gold | Orchestrating CI/CD Magic | Crafting Automation Elixirs