Automated GitLab CE Backups to Google Drive: A Complete Guide
In today's world of continuous development and deployment, having a reliable backup strategy for your GitLab instance is crucial. This guide will walk you through creating an automated backup system that stores your GitLab CE backups securely on Google Drive, providing an off-site backup solution that's both reliable and cost-effective.
Understanding the Components
Before diving into the implementation, let's understand the key components of our backup solution:
- GitLab's built-in backup tool: GitLab provides the gitlab-backup command, which creates comprehensive backups of your entire instance, including repositories, issues, and metadata.
- rclone: A powerful command-line tool that handles cloud storage synchronization. We'll use it to securely transfer our backups to Google Drive.
- Automation script: A bash script that orchestrates the entire backup process, handles errors, and manages retention policies.
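One prerequisite worth stating up front: rclone must be installed on the GitLab server. If it isn't yet, rclone's documented install script is one convenient route (as always, review a script before piping it to a shell):

curl https://rclone.org/install.sh | sudo bash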
Setting Up Google Drive Access
The first crucial step is setting up secure access to Google Drive. This involves creating appropriate credentials in the Google Cloud Console and configuring rclone to use them.
Creating Google Cloud Credentials
1. Visit the Google Cloud Console (console.cloud.google.com)
2. Create a new project or select an existing one
3. Enable the Google Drive API for your project
4. Configure the OAuth consent screen:
   - Choose "External" user type
   - Fill in the required application details
   - Add your email as a test user
5. Create OAuth credentials:
   - Select "Create Credentials" > "OAuth client ID"
   - Choose "Desktop app" as the application type (this is crucial!)
   - Note down the client ID and client secret
The "Desktop app" application type is particularly important because it automatically configures the correct redirect URIs that rclone expects for its OAuth flow. This prevents the common redirect_uri_mismatch
error that occurs when using other application types.
Configuring rclone
With your Google Cloud credentials in hand, configure rclone using the following steps:
rclone config create gdrive drive
During the configuration:
- Enter your client ID and client secret
- Choose full drive access scope
- Leave root_folder_id and service_account_file empty
- Complete the OAuth authorization process by opening the provided URL in your browser
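If you prefer a non-interactive setup, the same configuration can be expressed in a single command. This is a sketch using rclone's documented Google Drive options; the client ID and secret values are placeholders for your own credentials, and rclone will still prompt you to complete the OAuth authorization in a browser:

rclone config create gdrive drive \
    client_id=YOUR_CLIENT_ID \
    client_secret=YOUR_CLIENT_SECRET \
    scope=drive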
For automated scripts, it's recommended to configure rclone without an encryption password on its configuration file. While encryption adds an extra security layer, it complicates automation and may be redundant given that:
- OAuth tokens already provide secure authentication
- Server-level security (SSH keys, firewall) protects access to the configuration
- File system permissions restrict access to the rclone configuration file
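A quick way to enforce that last layer, assuming the backup script runs from the root crontab (as in the cron section below), is to lock down the configuration file so only root can read it:

chmod 600 /root/.config/rclone/rclone.conf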
Creating the Backup Script
Our backup solution uses a comprehensive bash script that handles:
- Creating GitLab backups
- Backing up configuration files
- Uploading to Google Drive
- Managing backup retention
- Error handling and notifications
- Logging all operations
The script organizes backups on Google Drive by date, making it easy to locate and manage specific backups. It also implements a retention policy to automatically clean up old backups both locally and on Google Drive.
#!/bin/bash
# Configuration parameters with detailed comments
BACKUP_DIR="/var/opt/gitlab/backups"
RETENTION_DAYS=7
NOTIFICATION_EMAIL="your-email@domain.com"
GDRIVE_REMOTE_NAME="gdrive"
GDRIVE_BACKUP_DIR="gitlab-backups"
# Performance tuning for large files
MAX_TRANSFERS=4 # Number of concurrent file transfers
MAX_CHECKERS=8 # Number of concurrent check operations
BANDWIDTH_LIMIT="40M" # Bandwidth limit (40MB/s is a good balance)
MIN_FREE_SPACE_GB=10 # Minimum required free space in GB
# Create logging directory and set up log file with timestamp
mkdir -p /var/log/gitlab-backup
LOG_FILE="/var/log/gitlab-backup/backup-$(date +%Y%m%d_%H%M%S).log"
# Enhanced notification function with support for different severity levels
send_notification() {
    local subject="$1"
    local message="$2"

    # Only attempt to send mail if mailutils is installed
    if command -v mail &> /dev/null; then
        echo "$message" | mail -s "$subject" "$NOTIFICATION_EMAIL"
    fi

    # Always log notifications
    log_message "NOTIFICATION - $subject: $message"
}
# Enhanced logging function with timestamp and log levels
log_message() {
    local message="$1"
    local timestamp
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    echo "$timestamp - $message" | tee -a "$LOG_FILE"
}
# Function to check system requirements
check_requirements() {
    # Check if rclone is installed
    if ! command -v rclone &> /dev/null; then
        log_message "ERROR: rclone is not installed"
        send_notification "GitLab Backup - Error" "rclone is not installed on the server"
        exit 1
    fi

    # Check available disk space
    local free_space
    free_space=$(df -BG "$BACKUP_DIR" | awk 'NR==2 {gsub("G",""); print $4}')
    if [ "${free_space%.*}" -lt "$MIN_FREE_SPACE_GB" ]; then
        log_message "ERROR: Insufficient disk space (less than ${MIN_FREE_SPACE_GB}GB available)"
        send_notification "GitLab Backup - Error" "Insufficient disk space for backup"
        exit 1
    fi
}
# Function to find the most recent backup file
get_latest_backup_file() {
    find "$BACKUP_DIR" -maxdepth 1 -type f -name "*_gitlab_backup.tar" -printf '%T@ %p\n' | \
        sort -nr | \
        head -1 | \
        cut -d' ' -f2-
}
# Function to handle the backup process
perform_backup() {
    log_message "Starting GitLab backup creation..."
    if ! gitlab-backup create STRATEGY=copy 2>>"$LOG_FILE"; then
        log_message "ERROR: Failed to create GitLab backup"
        send_notification "GitLab Backup - Error" "Backup creation failed"
        exit 1
    fi

    # Allow system to finish writing backup file
    sleep 10

    # Find the newly created backup file
    local backup_file
    backup_file=$(get_latest_backup_file)
    if [ -z "$backup_file" ]; then
        log_message "ERROR: No backup file found after backup creation"
        send_notification "GitLab Backup - Error" "Backup file not found"
        exit 1
    fi

    local backup_size
    backup_size=$(du -h "$backup_file" | cut -f1)
    log_message "Backup created successfully: $backup_file (Size: $backup_size)"
    return 0
}
# Function to prepare backup files for upload
prepare_upload() {
    local timestamp=$1
    local backup_file=$2
    local temp_dir="$BACKUP_DIR/temp_$timestamp"

    # Create temporary directory for staging files
    mkdir -p "$temp_dir"

    # Copy configuration files
    cp /etc/gitlab/gitlab.rb "$temp_dir/gitlab.rb.$timestamp"
    cp /etc/gitlab/gitlab-secrets.json "$temp_dir/gitlab-secrets.json.$timestamp"

    # Copy the actual backup file
    cp "$backup_file" "$temp_dir/"

    # Set appropriate permissions
    chmod 600 "$temp_dir"/*
    chown -R git:git "$temp_dir"

    # Return the staging directory path on stdout (captured by the caller),
    # so this function must not write anything else to stdout
    echo "$temp_dir"
}
# Function to upload to Google Drive with progress monitoring
upload_to_gdrive() {
    local source_dir=$1
    local timestamp=$2

    log_message "Starting upload to Google Drive..."
    local total_size
    total_size=$(du -sh "$source_dir" | cut -f1)
    log_message "Total upload size: $total_size"

    # --progress writes terminal control characters that would clutter the
    # log file, so use rclone's own log flags for readable output instead
    if rclone sync "$source_dir" "$GDRIVE_REMOTE_NAME:$GDRIVE_BACKUP_DIR/$timestamp" \
        --transfers "$MAX_TRANSFERS" \
        --checkers "$MAX_CHECKERS" \
        --bwlimit "$BANDWIDTH_LIMIT" \
        --log-file "$LOG_FILE" --log-level INFO; then
        log_message "Upload completed successfully"
        return 0
    else
        log_message "ERROR: Upload to Google Drive failed"
        return 1
    fi
}
# Function to clean up old backups
cleanup_old_backups() {
    # Clean local backups
    log_message "Cleaning up old local backups..."
    find "$BACKUP_DIR" -name "*_gitlab_backup.tar" -mtime +"$RETENTION_DAYS" -delete
    find "$BACKUP_DIR" -name "gitlab.rb.*" -mtime +"$RETENTION_DAYS" -delete
    find "$BACKUP_DIR" -name "gitlab-secrets.json.*" -mtime +"$RETENTION_DAYS" -delete

    # Clean Google Drive backups; rclone delete removes only files, so prune
    # the now-empty dated directories afterwards
    log_message "Cleaning up old Google Drive backups..."
    rclone delete --min-age "${RETENTION_DAYS}d" "$GDRIVE_REMOTE_NAME:$GDRIVE_BACKUP_DIR" 2>>"$LOG_FILE"
    rclone rmdirs --leave-root "$GDRIVE_REMOTE_NAME:$GDRIVE_BACKUP_DIR" 2>>"$LOG_FILE"

    # Clean old logs
    find /var/log/gitlab-backup -name "backup-*.log" -mtime +"$RETENTION_DAYS" -delete
}
# Main execution flow
main() {
    log_message "Starting backup process..."

    # Check system requirements
    check_requirements

    # Create the backup
    perform_backup

    # Get the latest backup file
    BACKUP_FILE=$(get_latest_backup_file)
    TIMESTAMP=$(date +%Y%m%d)

    # Prepare files for upload
    TEMP_DIR=$(prepare_upload "$TIMESTAMP" "$BACKUP_FILE")

    # Upload to Google Drive
    if upload_to_gdrive "$TEMP_DIR" "$TIMESTAMP"; then
        cleanup_old_backups
        rm -rf "$TEMP_DIR"
        send_notification "GitLab Backup - Success" "Backup completed and uploaded successfully"
    else
        rm -rf "$TEMP_DIR"
        send_notification "GitLab Backup - Error" "Backup upload failed"
        exit 1
    fi
}
# Execute main function
main
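Before scheduling the script, install it somewhere stable and restrict its permissions, since it reads gitlab-secrets.json and must run as root. The destination path below matches the cron example later in this guide; the source filename is just an example:

sudo cp gitlab-backup.sh /usr/local/bin/gitlab-backup.sh
sudo chmod 700 /usr/local/bin/gitlab-backup.sh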
Security Considerations
The backup system's security is built on multiple layers:
- Google Drive Authentication: OAuth 2.0 provides secure access to Google Drive without storing permanent credentials.
- File System Security: The rclone configuration and backup files are protected by Unix file permissions.
- Server Security: The overall server security (firewall rules, SSH configuration, system updates) forms the foundation of the backup system's security.
Testing and Verification
Before implementing the automated backup system, it's crucial to verify each component:
1. Test rclone configuration:
rclone about gdrive:
2. Create a test backup directory:
rclone mkdir gdrive:gitlab-backups
3. Perform a test backup upload:
echo "Test backup file" > test_backup.txt
rclone copy test_backup.txt gdrive:gitlab-backups/test/
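4. Verify the upload and clean up (rclone ls lists the uploaded files; rclone purge removes the test folder and its contents):

rclone ls gdrive:gitlab-backups/test/
rclone purge gdrive:gitlab-backups/test/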
Exploring Alternative Backup Destinations
While our guide focuses on Google Drive as the backup destination, rclone's versatility allows you to adapt this solution for virtually any cloud storage provider. This flexibility is particularly valuable for organizations with specific compliance requirements or existing cloud infrastructure preferences.
Supported Cloud Providers
Rclone supports an impressive array of storage providers and protocols, including:
- Amazon S3 and S3-compatible storage (MinIO, Wasabi, DigitalOcean Spaces)
- Microsoft Azure Blob Storage
- Backblaze B2
- OpenStack Swift
- FTP/SFTP servers
- WebDAV
- And many others
This broad compatibility means you can easily modify our backup solution to work with your preferred storage provider without changing the core backup logic.
Setting Up S3 Backup Alternative
Amazon S3 and S3-compatible storage services are particularly popular for backup solutions due to their reliability and cost-effectiveness. Here's how to adapt our solution for S3:
1. Configure rclone for S3:
rclone config create s3backup s3
During configuration, you'll need to provide:
- Access key ID
- Secret access key
- Region
- Bucket name
- Storage class (e.g., STANDARD, STANDARD_IA, or GLACIER)
2. Modify the backup script by changing the remote name and path:
# Instead of
GDRIVE_REMOTE_NAME="gdrive"
# Use
S3_REMOTE_NAME="s3backup"
S3_BUCKET_PATH="gitlab-backups"
# And update the sync command in upload_to_gdrive
rclone sync "$TEMP_DIR" "$S3_REMOTE_NAME:$S3_BUCKET_PATH/$TIMESTAMP"
Cost Optimization Strategies
Different storage providers offer various storage tiers and pricing models. Here's how to optimize costs for different providers:
- Amazon S3:
- Use lifecycle policies to automatically transition older backups to cheaper storage tiers (an example rule follows this list)
- Consider STANDARD_IA for backups older than 30 days
- Use GLACIER for long-term archival storage
- Google Drive:
- Take advantage of workspace storage pooling
- Use shared drives for better storage management
- S3-Compatible Alternatives:
- Consider Wasabi or Backblaze B2 for potentially lower storage costs
- Use MinIO for self-hosted object storage
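As an illustration of the S3 lifecycle idea referenced above, a rule like the following, applied with the AWS CLI (the bucket name and day thresholds are placeholders to adapt), transitions backups to STANDARD_IA after 30 days and to GLACIER after 90:

aws s3api put-bucket-lifecycle-configuration \
    --bucket my-gitlab-backups \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "archive-old-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "gitlab-backups/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"}
            ]
        }]
    }'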
Automating with Cron
Once everything is tested and working, schedule the backup script using cron:
sudo crontab -e
Add a line to run the backup during off-peak hours:
0 2 * * * /usr/local/bin/gitlab-backup.sh
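Cron's own output is easy to lose; redirecting it to a file of your choosing gives you a second trail in case the script fails before its internal logging starts:

0 2 * * * /usr/local/bin/gitlab-backup.sh >> /var/log/gitlab-backup/cron.log 2>&1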
Monitoring and Maintenance
To ensure your backup system remains reliable:
- Regularly check backup logs for errors
- Periodically verify that backups can be restored (see the restore outline after this list)
- Monitor Google Drive space usage
- Keep the GitLab instance and backup script updated
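For the restore drills mentioned above, a rough outline on a disposable instance running the same GitLab version might look like this (BACKUP takes the backup ID, i.e. the tar file name without the _gitlab_backup.tar suffix; on older GitLab versions the web service is unicorn rather than puma):

# Stop services that write to the database, restore, then restart and verify
sudo gitlab-ctl stop puma
sudo gitlab-ctl stop sidekiq
sudo gitlab-backup restore BACKUP=<backup-id>
sudo gitlab-ctl restart
sudo gitlab-rake gitlab:check SANITIZE=true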
Conclusion
This backup solution provides a robust, automated way to secure your GitLab data. By leveraging GitLab's built-in backup functionality and combining it with rclone's cloud storage capabilities, we create a reliable off-site backup system that requires minimal maintenance while ensuring your data's safety.
The implementation balances security with automation, making thoughtful trade-offs where necessary. For instance, while rclone configuration encryption is available, we chose to prioritize reliable automation given the existing security layers provided by OAuth tokens and system-level protections.
Remember to regularly test your backup restoration process and monitor the system's operation to ensure it continues to meet your data protection needs.