12 Jan 2011

Cron tip: prevent running multiple copies of the same job in an HA environment

All cron jobs should prevent themselves from being started in multiple copies!

This can be done using file locks. Do not just create a marker file: a normal file left behind after a script crash can prevent the job from ever starting again. File locks can also serialize jobs across many servers, or even limit concurrency between machines.
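To see why a real lock beats a marker file, here is a small sketch that simulates a crash. It assumes the flock utility from util-linux and GNU sleep; the temporary file and the 30 second sleep are made up for the demo.

```shell
#!/usr/bin/env bash
lockfile="$(mktemp)"      # throwaway lock file for the demo

# Holder: lock descriptor 9, then become "sleep" so that $! is the lock owner.
( flock -x 9; exec sleep 30 ) 9>>"$lockfile" &
holder=$!
sleep 0.5                 # give the holder time to take the lock

kill -9 "$holder"         # simulate a crash
wait "$holder" 2>/dev/null || true

# The file is still on disk, but the lock died with its owner.
[ -e "$lockfile" ] && echo "lock file still exists"
if flock -xn "$lockfile" -c true; then
    reacquired=yes
else
    reacquired=no
fi
echo "lock re-acquired after crash: $reacquired"
rm -f "$lockfile"
```

With a plain marker file, the leftover file would have to be cleaned up by hand before the job could start again.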

In High Availability environments with many servers, the same job is usually scheduled on every server in the cluster, so that it still runs at the expected time when one machine is down. But there is a catch: configured this way, every server starts its own copy of the script at the same moment, and that is bad. To prevent this, all servers should share one filesystem where the lock files are stored and where each process acquires its lock.
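A minimal sketch of that setup could be a small wrapper installed on every node. The name run_exclusive, the wrapper path, and the /mnt/shared/locks mount point are assumptions for illustration. Whether flock is honored between hosts depends on the shared filesystem and how it is mounted, so test it on your cluster first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; cron calls it instead of calling the job directly.
# LOCK_DIR is assumed to be on the filesystem shared by all servers.
LOCK_DIR="${LOCK_DIR:-/mnt/shared/locks}"

run_exclusive() {
    local job_name="$1"; shift
    # -n: give up at once if another server (or another local copy) holds the lock.
    flock -xn "$LOCK_DIR/$job_name.lock" "$@"
}

# Example crontab line, identical on all servers (paths are placeholders):
#   */5 * * * * /usr/local/bin/cron-wrapper.sh report /usr/local/bin/report.sh
if [ "$#" -gt 0 ]; then
    run_exclusive "$@"
fi
```

The server that gets the lock runs the job; the others exit with status 1 and simply try again at the next scheduled time.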

File locks from Bash

  • flock -xn /var/lock/run.sh.lock -c run.sh
  • flock -x /var/lock/run.sh.lock -c run.sh
  • flock -x -w 30 /var/lock/run.sh.lock -c run.sh
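The commands above wrap the script from the outside. The lock can also be taken inside the script, using the file descriptor form of flock. This is a sketch; the lock path is an assumption and should be adjusted.

```shell
#!/usr/bin/env bash
# Assumed lock location for this sketch.
lockfile="${TMPDIR:-/tmp}/run.sh.lock"

exec 9>>"$lockfile"        # open (or create) the lock file on descriptor 9
if ! flock -xn 9; then
    echo "another copy is already running, exiting" >&2
    exit 1
fi

# ... the real work goes here; the lock is held until the script exits ...
echo "lock acquired by PID $$"
```

There is no explicit unlock: the lock goes away when the script ends and descriptor 9 is closed.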

man flock

-x, -e, --exclusive
Obtain an exclusive lock, sometimes called a write lock. This is the default.
-n, --nb, --nonblock
Fail (with an exit code of 1) rather than wait if the lock cannot be immediately acquired.
-w, --wait, --timeout seconds
Fail (with an exit code of 1) if the lock cannot be acquired within the given number of seconds. Decimal fractional values are allowed.
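The exit codes described above can be checked with a short experiment. This is only a sketch: the timings (2 s holder, 0.5 s head start, 0.2 s and 5 s timeouts) are arbitrary values chosen for the demo.

```shell
#!/usr/bin/env bash
lockfile="$(mktemp)"

# Background holder: keeps the lock for about two seconds.
flock -x "$lockfile" -c 'sleep 2' &
sleep 0.5                          # let the holder acquire the lock first

# -n: fails immediately because the lock is taken.
nonblock_status=0
flock -xn "$lockfile" -c true || nonblock_status=$?

# -w 0.2: waits 0.2 s, then gives up.
timeout_status=0
flock -x -w 0.2 "$lockfile" -c true || timeout_status=$?

# -w 5: long enough for the holder to finish.
wait_status=0
flock -x -w 5 "$lockfile" -c true || wait_status=$?

echo "nonblock=$nonblock_status timeout=$timeout_status wait=$wait_status"
wait
rm -f "$lockfile"
```

For cron jobs the -n form is usually the one you want, so that skipped runs do not pile up waiting for the lock.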

The hard way - playing with some internals (Perl)

#!/usr/bin/perl

# Author: Tomasz Gawęda
# Date:   2011-01-12

use strict;
use warnings;

use Fcntl qw(:flock);

# http://perldoc.perl.org/functions/flock.html
sub lock {
   my ($fh) = @_;
   flock($fh, LOCK_EX | LOCK_NB) or die "Cannot lock - $!\n";
}


sub unlock {
   my ($fh) = @_;
   flock($fh, LOCK_UN) or die "Cannot unlock - $!\n";
}

sub sysrun {
   my $r = join( ' ', @_ );
   system($r);
   if ( $? != 0 ) {
    die 'Command "'.$r.'" '.(($? < 0)? "not found ($? - $!)" : 'returned '.($?>>8)) ;
   }
}


# main
my ( $progName ) = ( $0 =~ m/\/?([^\/]*)$/ ) ;

die "Usage: $progName lockFile command with arguments\n" if @ARGV < 2;
open(my $lock, ">>", $ARGV[0]) or die "Open failed $ARGV[0] => $!\n";
lock($lock);
sysrun(@ARGV[1 .. $#ARGV]);
unlock($lock);

Bonus: Allow program to be run N times in parallel using ps command

The solution below is not perfect and shouldn't be used in production code.
my ( $progName ) = ( $0 =~ m/\/?([^\/]*)$/ ) ;
sub maxRuns {
   my ($maxRuns) = @_;
   # \Q...\E keeps characters such as "." in the program name from acting as regex syntax
   my @running = grep(/perl.*?\Q$progName\E/i, split(/\n/,`ps -o pid,cmd -u $<`) );
   die "Process $progName is already running\n pid  cmd\n".join("  \n", @running)."\n" if scalar @running > $maxRuns;
}

# usage
maxRuns(2);
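A different technique from the ps approach above is to stay with file locks and use one lock file per slot. The sketch below assumes bash 4.1 or newer (for the {fd} redirection); acquire_slot, ACQUIRED_SLOT, and the slot file names are hypothetical.

```shell
#!/usr/bin/env bash
# Try slot.1.lock .. slot.N.lock in turn and keep the first one that is free.
# On success the slot number is stored in ACQUIRED_SLOT and the descriptor
# stays open, so the lock is held until this shell exits.
acquire_slot() {
    local lock_dir="$1" max_runs="$2" slot fd
    for (( slot = 1; slot <= max_runs; slot++ )); do
        exec {fd}>>"$lock_dir/slot.$slot.lock"   # bash picks a free descriptor
        if flock -xn "$fd"; then
            ACQUIRED_SLOT="$slot"
            return 0
        fi
        exec {fd}>&-                             # slot busy: close and try the next
    done
    return 1
}

# usage: allow at most 2 copies (temporary directory used only for the demo)
lock_dir="$(mktemp -d)"
if acquire_slot "$lock_dir" 2; then
    echo "running in slot $ACQUIRED_SLOT"
else
    echo "too many copies already running" >&2
fi
```

Call the function directly and not inside $( ), otherwise the descriptor is opened in a subshell and the lock is lost as soon as the subshell ends.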
