28 Sep 2009

How to execute system commands in Perl and the possible dangers

There are various ways to run a system subprocess in Perl. I will mention only 7 - a few native (exec(), system(), qx{}/``) and a few which use additional modules (open('|'), IPC::Open2, IPC::Open3, IPC::Cmd) which are in fact part of the standard Perl distribution, so they can be used without worries.

Introduction

Most people think that running a system command from Perl can only be done with system() or exec(), but there are many ways to achieve this task - some better, some worse. Each of them has different performance characteristics; even the specific usage of a function can increase or decrease performance. This post is written to help a programmer choose the right solution for the task (a solution that is secure, flexible, and has the best performance).

Note: I use in this article quite a lot of text copied from PerlDoc - it will be marked with the <cite> tag.

Executing system command - possible ways

  1. exec() - PerlDoc Page
  2. system() - PerlDoc Page
  3. qx{}/`` - PerlDoc Page
  4. Open(' |') - PerlDoc Page
  5. IPC::Open2 - PerlDoc Page
  6. IPC::Open3 - PerlDoc Page
  7. IPC::Cmd - PerlDoc Page
  8. IPC::Run - PerlDoc Page - not covered in this article (it's not part of the standard Perl distribution on Unix/Linux - AFAIK)
If you don't want to scroll, use the hyperlinks to jump straight to the summary or conclusion.

Running sub-process via exec()

exec LIST
exec PROGRAM LIST
The "exec" function executes a system command and never returns to the script which called exec! It fails and returns false only if the command does not exist and it is executed directly instead of via a system command shell.

#!/usr/bin/perl

use strict;
use warnings;

my @args = ( "echo", "Hello world" );

# Example 1.
exec join(" ", @args);                     # Very insecure! A single string is
                                           # checked for shell metacharacters;
                                           # if any are found it is run via
                                           # "sh -c". PLEASE DON'T USE THIS!!!

# Example 2
exec @args or die "No echo";               # More secure - escape to shell is only 
                                           # done when scalar @args == 1
                                           # Better NOT to use this!

# Example 3 - The most secure!
exec { $args[0] } @args or die "No echo";  # The most secure example
                                           # It is safe even with one-arg list
                                           # I recommend using this!

# Example 4 - Fail!
my @secTest = ( join(" ", @args) );
exec { $secTest[0] } @secTest;             # The system will try to run a program
                                           # literally named "echo Hello world";
                                           # exec returns false and sets $!

# This will not be reached
print "After exec";
If there is more than one argument in LIST, or if LIST is an array with more than one value, exec calls execvp(3) with the arguments in LIST. If there is only one scalar argument or an array with one element in it, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is "/bin/sh -c" on Unix platforms, but varies on other platforms). This means that if you are not using shell redirects (>&, >>, <, >, |), it is better to pass a LIST to exec(); passing a single string can make your application vulnerable to a shell metacharacter attack. Using an indirect object (like this: exec {'/bin/csh'} '-sh';) with "exec" or "system" is also more secure. This usage also works fine with system() - it forces interpretation of the arguments as a multivalued list.
Notes:
  • Perl will attempt to flush all files opened for output before the exec, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the "autoflush()" method of "IO::Handle" on any open handles in order to avoid lost output.
  • Note that "exec" will NOT call your "END" blocks, nor will it call any "DESTROY" methods in your objects.
  • Users should be very careful with "exec()": when there is code after the exec call, Perl (under use warnings) will print this message:
    Statement unlikely to be reached at script.pl line XX.
    (Maybe you meant system() when you said exec()?)
    For information on how to get rid of the warning, read perldoc -f exec
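As perldoc -f exec suggests, putting the exec call in a block by itself suppresses this warning. A minimal sketch (the command path here is hypothetical, chosen so the fallback code actually runs):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, nonexistent command path - used only for illustration.
my @cmd = ('/no/such/command');

# Wrapping exec in its own block suppresses the
# "Statement unlikely to be reached" warning, so the
# fallback line below no longer triggers it.
{ exec { $cmd[0] } @cmd };
print STDERR "couldn't exec $cmd[0]: $!\n";
```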
How the execution of exec() will be seen by other users
  • Before execution of exec() in Perl script
      |-gnome-terminal,7000
      |   |-bash,7002
      |   |   `-perl,7588 test.pl
      |   |-bash,7378
      |   |   `-pstree,7626 -a -c -p
      |   |-gnome-pty-helpe,7001
      |   `-{gnome-terminal},7003
  • After execution of exec() in Perl script
      |-gnome-terminal,7000
      |   |-bash,7002
      |   |   `-bash,7588
      |   |-bash,7378
      |   |   `-pstree,7650 -a -c -p
      |   |-gnome-pty-helpe,7001
      |   `-{gnome-terminal},7003

Running sub-process via system()

system LIST
system PROGRAM LIST
Does exactly the same thing as "exec LIST", except that a fork is done first, and the parent process waits for the child process to complete (it is blocked for the execution time of the command run in system()). Note that argument processing varies depending on the number of arguments. If there is more than one argument in LIST, or if LIST is an array with more than one value, it starts the program given by the first element of the list with the arguments given by the rest of the list. If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is "/bin/sh -c" on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to "execvp", which is more efficient.
  • The return value is the exit status of the program as returned by the "wait" call
  • To get the actual exit value, shift right by eight
  • Return value of -1 indicates a failure to start the program or an error of the wait(2) system call (inspect $! for the reason)
How to run external command using system()
my @args = ("command", "arg1", "arg2");

# Simple check:
# system(@args) == 0 or die "system @args failed: $?";

# Detailed check of the exit status:
system(@args);
if ($? == -1) {
    print "failed to execute: $!\n";
} elsif ($? & 127) {
    printf "child died with signal %d, %s coredump\n",
        ($? & 127), ($? & 128) ? 'with' : 'without';
} else {
    printf "child exited with value %d\n", $? >> 8;
}
Notes:
  • Perl will attempt to flush all files opened for output before the exec, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the "autoflush()" method of "IO::Handle" on any open handles in order to avoid lost output.
  • "SIGINT" and "SIGQUIT" are ignored during the execution of "system"

Running sub-process via qx{}/``

qx/STRING/
`STRING`
qx{} is a string which is (possibly) interpolated and then executed as a system command with "/bin/sh" or its equivalent. Shell wildcards, pipes, and redirections will be honored (so be very careful). The collected standard output of the command is returned; standard error is unaffected. In scalar context, it comes back as a single (potentially multi-line) string, or undef if the command failed. In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
  • capture a command’s STDERR and STDOUT together: $output = `cmd 2>&1`;
  • capture only a command’s STDOUT (discard STDERR): $output = `cmd 2>/dev/null`;
  • capture only a command’s STDERR (discard STDOUT): $output = `cmd 2>&1 1>/dev/null`;
  • read both a command’s STDOUT and its STDERR separately:
    system("program args 1>program.stdout 2>program.stderr");
    open(CMD_STDOUT, '<', 'program.stdout') or die("...");
    open(CMD_STDERR, '<', 'program.stderr') or die("...");
    
    # do something with the streams - for example slurp them
    
    close(CMD_STDOUT);
    close(CMD_STDERR);
  • Using single-quote as a delimiter protects the command from Perl’s double-quote interpolation, passing it on to the shell instead:
    $perl_info  = qx(ps $$);            # that's Perl's $$
    $shell_info = qx'ps $$';            # that's the new shell's $$
Notes:
  • On most platforms, you will have to protect shell metacharacters if you want them treated literally
  • On some platforms the shell may not be capable of dealing with multiline commands
  • There is a way to evaluate many commands on a single line (';' on Unix, '&' on Windows CMD) - this can potentially be harmful
  • Perl will attempt to flush all files opened for output before the exec, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the "autoflush()" method of "IO::Handle" on any open handles in order to avoid lost output.
  • Beware that some command shells may place restrictions on the length of the command line (with no warning)
  • Using this operator can lead to programs that are difficult to port, because the shell commands called vary between systems
  • For more information please refer to perldoc perlop
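A small sketch of the two contexts described above (assumes a Unix system with /bin/sh and echo; the ';' metacharacter forces the command through sh -c):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar context: the whole output comes back as one string.
my $all = qx{echo one; echo two};   # ';' is a shell metacharacter, so sh -c runs this
die "command failed: $?" if $? != 0;

# List context: one element per line (lines are defined by $/).
my @lines = qx{echo one; echo two};

print "scalar: $all";
print "list has ", scalar(@lines), " elements\n";
```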

Running sub-process via open(' |')

open FILEHANDLE,EXPR
open FILEHANDLE,MODE,EXPR
open FILEHANDLE,MODE,EXPR,LIST
open FILEHANDLE,MODE,REFERENCE
open FILEHANDLE
If the filename begins with '|', it is interpreted as a command to which output is to be piped (writes to this filehandle are passed to the standard input of the command), and if the filename ends with a '|', the filename is interpreted as a command which pipes output to us. For three or more arguments, if MODE is '|-', the filename is interpreted as a command to which output is to be piped, and if MODE is '-|', the filename is interpreted as a command which pipes output to us. In the two-argument (and one-argument) form one should replace the dash ('-') with the command. In the three-or-more-argument form of pipe opens, if LIST is specified (extra arguments after the command name), then LIST becomes the arguments to the invoked command if the platform supports it.

If you open a pipe on the command '-', i.e., either '|-' or '-|' with the two-argument (or one-argument) form of open(), then there is an implicit fork done, and the return value of open is the pid of the child within the parent process, and 0 within the child process. (Use "defined($pid)" to determine whether the open was successful.) The filehandle behaves normally for the parent, but I/O to that filehandle is piped from/to the STDOUT/STDIN of the child process. In the child process the filehandle isn't opened; I/O happens from/to the new STDOUT or STDIN. Typically this is used like the normal piped open when you want to exercise more control over just how the pipe command gets executed, such as when you are running setuid, and don't want to have to scan shell commands for metacharacters. The following triples are more or less equivalent:
# Writing
open(SPOOLER, "| cat -v | lpr -h 2>/dev/null") || die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";

# more writing
open(FOO, "|tr '[a-z]' '[A-Z]'");
open(FOO, '|-', "tr '[a-z]' '[A-Z]'");
open(FOO, '|-') || exec 'tr', '[a-z]', '[A-Z]';
open(FOO, '|-', "tr", '[a-z]', '[A-Z]');

# Reading
open(STATUS, "netstat -an 2>&1 |") || die "can't fork: $!";
while (<STATUS>) { print; }
close STATUS || die "bad netstat: $! $?";

# More reading
open(FOO, "cat -n '$file'|");
open(FOO, '-|', "cat -n '$file'");
open(FOO, '-|') || exec 'cat', '-n', $file;
open(FOO, '-|', "cat", '-n', $file);
Notes:
  • On most platforms, you will have to protect shell metacharacters if you want them treated literally. Think about this example:
    $filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
    open(FH, $filename) or die "Can't open $filename: $!";
  • On some platforms shell may not be capable of dealing with multiline commands
  • There is a way to evaluate many commands on a single line (';' on Unix, '&' on Windows CMD) - this can potentially be harmful
  • Perl will attempt to flush all files opened for output before the exec, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the "autoflush()" method of "IO::Handle" on any open handles in order to avoid lost output.
  • Beware that some command shells may place restrictions on the length of the command line (with no warning)
  • Using this type of open can lead to programs that are difficult to port, because the shell commands called vary between systems
  • On systems that support a close-on-exec flag on files, the flag will be set for the newly opened file descriptor as determined by the value of $^F. See "$^F" in perlvar
  • Closing any piped filehandle causes the parent process to wait for the child to finish, and returns the status value in $? and "${^CHILD_ERROR_NATIVE}"
  • Be careful to check both the open() and the close() return values
  • If you're writing to a pipe, you should also trap SIGPIPE; otherwise, think of what happens when you start up a pipe to a command that doesn't exist - your program will fail! Perl can't know whether the command worked, because your command is actually running in a separate process whose exec() might have failed. Therefore, while readers of bogus commands return just a quick end of file, writers to bogus commands will trigger a signal they'd better be prepared to handle.
  • For more examples please refer to perldoc -f open
PS: Remember that using the three-argument form of open() is more secure, due to the interpretation of metacharacters (AFAIR).
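A minimal sketch of the list form of the three-argument pipe open, which bypasses the shell entirely, so metacharacters in the arguments stay literal (this form may not be supported on every platform - see perlport; the example assumes a Unix echo):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The command and its arguments are passed as a list, so they go
# straight to execvp() - no shell, no metacharacter expansion.
open(my $fh, '-|', 'echo', 'metachars like | and $HOME stay literal')
    or die "can't fork: $!";
my $line = <$fh>;
close($fh) or die "command failed: $! $?";
print $line;
```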

Running sub-process via IPC::Open2

use IPC::Open2;
Warning: The open2() and open3() functions are unlikely to work anywhere except on a Unix system or some other one purporting to be POSIX compliant.
IPC::Open2 is a module which allows you to open a process for both reading and writing. The open2() function runs the given $cmd and connects $chld_out for reading and $chld_in for writing. open2() is really just a wrapper around open3(), so read the information about open3(), which can handle stderr as well. It's what you think should work when you try $pid = open(HANDLE, "|cmd args|");. Usage of the open2 function is quite easy.
use IPC::Open2;
# Using sub shell - be careful for shell extensions
$pid = open2(\*CHLD_OUT, \*CHLD_IN, 'some cmd and args');
# or without using the shell
$pid = open2(\*CHLD_OUT, \*CHLD_IN, 'some', 'cmd', 'and', 'args');

# or with handle autovivification
my($chld_out, $chld_in);

# Using sub shell - be careful for shell extensions
$pid = open2($chld_out, $chld_in, 'some cmd and args');
# or without using the shell
$pid = open2($chld_out, $chld_in, 'some', 'cmd', 'and', 'args');
Notes:
  • The write filehandle will have autoflush turned on
  • If $chld_out is a string (that is, a bareword filehandle rather than a glob or a reference) and it begins with ">&", then the child will send output directly to that file handle
  • If $chld_in is a string that begins with "<&", then $chld_in will be closed in the parent, and the child will read from it directly. In both cases, there will be a dup(2) instead of a pipe(2) made
  • If either reader or writer is the null string, this will be replaced by an auto generated filehandle. If so, you must pass a valid lvalue in the parameter slot so it can be overwritten in the caller, or an exception will be raised
  • open2() returns the process ID of the child process. It doesn’t return on failure: it just raises an exception matching "/^open2:/". However, "exec" failures in the child are not detected. You’ll have to trap SIGPIPE yourself
  • open2() does not wait for and reap the child process after it exits. Except for short programs where it’s acceptable to let the operating system take care of this, you need to do this yourself. This is normally as simple as calling "waitpid $pid, 0" when you’re done with the process. Failing to do this can result in an accumulation of defunct or "zombie" processes
  • Using open2() can be dangerous and cause a deadlock if the program you run has to read its whole input at once! Check the manual for more information! Use the Comm library and two other modules from CPAN, IO::Pty and IO::Stty, to fix it (more in the manual)
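A minimal open2() sketch that avoids both the deadlock (by closing the child's input before reading) and the zombie (by calling waitpid); it pipes a few lines through sort(1), which is assumed to be present:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# Start sort(1) with pipes to its STDIN and from its STDOUT.
my $pid = open2(my $chld_out, my $chld_in, 'sort');

print $chld_in "banana\napple\ncherry\n";
close $chld_in;              # EOF lets the child produce its output

my @sorted = <$chld_out>;
close $chld_out;

waitpid($pid, 0);            # reap the child - no zombie left behind
print @sorted;
```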

Running sub-process via IPC::Open3

use IPC::Open3
IPC::Open3's open3() is designed to open a process for reading, writing, and error handling. The effect of invoking open3() is extremely similar to open2(): it spawns the given $cmd and connects CHLD_OUT for reading from the child, CHLD_IN for writing to the child, and CHLD_ERR for errors. If CHLD_ERR is false, or the same file descriptor as CHLD_OUT, then STDOUT and STDERR of the child are on the same filehandle.
use IPC::Open3;

$pid = open3(\*CHLD_IN, \*CHLD_OUT, \*CHLD_ERR, 'some cmd and args', 'optarg', ...);

my($wtr, $rdr, $err);
$pid = open3($wtr, $rdr, $err, 'some cmd and args', 'optarg', ...);
Notes:
  • The CHLD_IN will have autoflush turned on
  • If CHLD_IN begins with "<&", then CHLD_IN will be closed in the parent, and the child will read from it directly
  • If CHLD_OUT or CHLD_ERR begins with ">&", then the child will send output directly to that filehandle. In both cases, there will be a dup(2) instead of a pipe(2) made.
  • If either reader or writer is the null string, this will be replaced by an autogenerated filehandle. If so, you must pass a valid lvalue in the parameter slot so it can be overwritten in the caller, or an exception will be raised
  • The filehandles may also be integers, in which case they are understood as file descriptors
  • open3() returns the process ID of the child process. It doesn’t return on failure: it just raises an exception matching "/^open3:/". However, "exec" failures in the child (such as no such file or permission denied), are just reported to CHLD_ERR, as it is not possible to trap them
  • If the child process dies for any reason, the next write to CHLD_IN is likely to generate a SIGPIPE in the parent, which is fatal by default. So you may wish to handle this signal
  • open3() does not wait for and reap the child process after it exits. Except for short programs where it’s acceptable to let the operating system take care of this, you need to do this yourself. This is normally as simple as calling "waitpid $pid, 0" when you’re done with the process. Failing to do this can result in an accumulation of defunct or "zombie" processes
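A minimal open3() sketch that keeps STDOUT and STDERR separate. Note that the error handle must be created beforehand (e.g. with Symbol::gensym), otherwise open3() merges the child's STDERR into its STDOUT. Running $^X (the current perl) keeps the example self-contained; reading each stream in full like this is only deadlock-safe because the output is small:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open3;
use Symbol qw(gensym);

my $err = gensym;   # pre-created handle, or STDERR is merged into STDOUT
my $pid = open3(my $wtr, my $rdr, $err,
                $^X, '-e', 'print "to stdout\n"; print STDERR "to stderr\n"');
close $wtr;                        # we send no input

my @out  = <$rdr>;                 # child's STDOUT
my @errs = <$err>;                 # child's STDERR, captured separately
waitpid($pid, 0);                  # reap the child

print "OUT: @out";
print "ERR: @errs";
```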

Running sub-process via IPC::Cmd

use IPC::Cmd qw[can_run run];
IPC::Cmd is a module which helps to find and run system commands, even interactively if desired, and it is almost platform independent (as far as using system commands can be platform independent ;)).
The "can_run" function can tell you whether a certain binary is installed and, if so, where - exactly like which in Bash. The "run" function can execute any of the commands you give it and return a clear return value, as well as adhere to your verbosity settings.
use IPC::Cmd qw[can_run run];

my $full_path = can_run('wget') or warn 'wget is not installed!';

### commands can be arrayrefs or strings ###
my $cmd = "$full_path -b theregister.co.uk";
# or, equivalently, as an array reference:
# my $cmd = [$full_path, '-b', 'theregister.co.uk'];

### in scalar context ###
my $buffer;
if( scalar run( command => $cmd,
 verbose => 0,
 buffer  => \$buffer )
   ) {
 print "fetched webpage successfully: $buffer\n";
}


### in list context ###
my( $success, $error_code, $full_buf, $stdout_buf, $stderr_buf ) =
 run( command => $cmd, verbose => 0 );

if( $success ) {
 print "this is what the command printed:\n";
 print join "", @$full_buf;
}

### check for features
print "IPC::Open3 available: "  . IPC::Cmd->can_use_ipc_open3;
# can_use_ipc_run will probably be false (IPC::Run is not in the standard distribution)
print "IPC::Run available: "    . IPC::Cmd->can_use_ipc_run;
print "Can capture buffer: "    . IPC::Cmd->can_capture_buffer;
Notes (How It Works in perldoc):
  • If "IPC::Run" is available and the variable $IPC::Cmd::USE_IPC_RUN is set to true, IPC::Run will be used to run the command (full output is available in buffers, interactive commands are sure to work, and you are guaranteed to have your verbosity settings honored cleanly)
  • Otherwise, if the variable $IPC::Cmd::USE_IPC_OPEN3 is set to true, the command will be executed using "IPC::Open3". Buffers will be available on all platforms except "Win32", interactive commands will still execute cleanly, and your verbosity settings will also be adhered to nicely
  • Otherwise, if the verbose argument is set to true, the module will fall back to a simple system() call; buffers can't be captured, but interactive commands should still work
  • Otherwise IPC::Cmd will try to temporarily redirect STDERR and STDOUT, do a system() call with your command and then re-open STDERR and STDOUT. This is the method of last resort and will still allow you to execute your commands cleanly. However, no buffers will be available
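A short sketch of steering that backend selection: forcing the IPC::Open3 path (when available) and checking run()'s list-context result. The command runs the current perl ($^X), so no external binary is assumed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Cmd qw(run);

# Prefer the IPC::Open3 backend so buffers get captured on most
# platforms; IPC::Cmd falls back automatically if it is unavailable.
$IPC::Cmd::USE_IPC_OPEN3 = 1;

my ($ok, $err, $full_buf, $stdout_buf, $stderr_buf) =
    run( command => [ $^X, '-e', 'print "hello\n"' ], verbose => 0 );

die "run failed: $err" unless $ok;
print @$stdout_buf;
```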
Warnings:
  • Whitespace - When you provide a string as this argument, the string will be split on whitespace to determine the individual elements of your command. Although this will usually just Do What You Mean, it may break if you have files or commands with whitespace in them, so be careful. If you do not wish this to happen, you should provide an array reference, where all parts of your command are already separated out. Note however, if there's extra or spurious whitespace in these parts, the parser or underlying code may not interpret it correctly, and cause an error.
    # Bash command: gzip -cdf foo.tar.gz | tar -xf -
    # should be passed as
    
    my $cmd = "gzip -cdf foo.tar.gz | tar -xf -";
    
    # or as
    $cmd = ['gzip', '-cdf', 'foo.tar.gz', '|', 'tar', '-xf', '-'];
    
    # but not as:
    # $cmd = ['gzip -cdf foo.tar.gz', '|', 'tar -xf -']; # WRONG!

Summary

Action                                  | exec()   | system() | qx{}/``  | open('|') | IPC::Open2 | IPC::Open3 | IPC::Cmd
----------------------------------------+----------+----------+----------+-----------+------------+------------+-----------
Capture STDOUT                          | LIMITED* | LIMITED* | YES      | YES       | YES        | YES        | YES
Capture STDERR                          | LIMITED* | LIMITED* | LIMITED* | NO        | NO         | YES        | YES
Run interactive commands                | LIMITED* | LIMITED* | LIMITED* | LIMITED** | YES        | YES        | NOT TESTED
Get return value                        | NO       | YES      | NO       | NO        | NO         | NO         | YES
Run a sub-shell (sh -c) able to         |          |          |          |           |            |            |
expand shell metacharacters             | YES      | YES      | YES      | NO        | YES        | YES        | YES
Present in standard Perl distribution   | YES      | YES      | YES      | YES       | YES        | YES        | YES
  • LIMITED* - via shell redirect
  • LIMITED** - only writing/reading to/from the command

Conclusion

In this article I have presented many ways of running a subprocess from Perl. In my opinion the best is IPC::Cmd, because it is the most portable - if running system commands can be portable at all. IPC::Cmd provides an easy-to-understand interface for interacting with the host system, and if you would like to capture STDOUT and STDERR it is the right choice, though it is probably not as fast as IPC::Open3. If you don't like this approach you can use the system() function with additional shell redirects - but it will be much slower.
For interesting information about how Perl interacts with your system you should read PerlFaq.

1 comment:

  1. Nice description of what is really done when forking and running system is here:

    http://alumnus.caltech.edu/~svhwan/prodScript/avoidSystemBackticks.html