R/parallel API : runParallel()          R Documentation

Accessible Parallel Computing for Desktop Computers in R

Description

The package R/parallel enables the parallel execution of loops without data dependencies by adding a single function call: runParallel.

When this call and the target loop are enclosed within an if/else control structure, R/parallel is able to locate the loop and automate its parallelization.

Usage

runParallel( resultVar, resultOp, cpuLimit, nWorkers, verbose, tempDir )

Arguments

resultVar Vector of character strings naming the variables whose values must be collected from the loop iterations and reduced to their final value (i.e. to obtain the same value as if the loop had run sequentially).
resultOp Vector of character strings with the operations or functions, such as +, average, max or rbind, to be applied at each iteration to the corresponding resultVar[] variable.
cpuLimit (optional) numeric indicating the maximum percentage of processor usage that R/parallel is allowed to consume. Each worker checks this value before the beginning of each iteration. During the calculation the processor can still reach 100%; because R/parallel is not preemptive, processor usage is only regulated at the beginning of the next iteration. Valid values are numerics from 1 to 100.
Default value is 100 (i.e. no processor control).
nWorkers (optional) numeric indicating the number of new R instances (i.e. additional processes) that will run the workers. The problem size (i.e. the number of iterations) is split among all the workers. Setting more workers than processor units (e.g. cores) usually does not improve performance. However, when iterating through long vectors that will be split (e.g. thousands of elements), setting more workers can help to speed up the calculation.
Default value is 2.
verbose (optional) character string indicating the verbosity level of R/parallel. Valid values are:
"silent" : No output,
"normal" : Minimum information,
"info" : Additional information, including elapsed time, and,
"debug" : Highest amount of information, only for internal developers.
Default value is "normal".
tempDir (optional) character string indicating the "PATH" to a folder with enough free space to store temporary files used during calculation. If not provided, the R temporary directory, as returned by tempdir(), will be used.
Default value is NULL.
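
The documented ranges and defaults can be checked before calling runParallel. The following is a minimal sketch in base R; chooseRunParallelArgs is a hypothetical helper that is not part of R/parallel, and using detectCores() from the base parallel package to pick nWorkers is an assumption based on the advice above about processor units.

```r
library(parallel)  # ships with R; used here only for detectCores()

# Hypothetical helper, NOT part of R/parallel: collects argument values that
# respect the ranges and defaults documented above.
chooseRunParallelArgs <- function(cpuLimit = 100, verbose = "normal") {
  # cpuLimit must be a numeric between 1 and 100.
  stopifnot(is.numeric(cpuLimit), cpuLimit >= 1, cpuLimit <= 100)
  # verbose must be one of the four documented levels.
  verbose <- match.arg(verbose, c("silent", "normal", "info", "debug"))

  # One worker per detected core; fall back to the documented default of 2.
  cores <- detectCores()
  if (is.na(cores)) cores <- 2

  list(cpuLimit = cpuLimit, nWorkers = cores, verbose = verbose,
       tempDir = tempdir())
}

parArgs <- chooseRunParallelArgs(cpuLimit = 80, verbose = "info")
```

Each element of parArgs could then be passed to the runParallel argument of the same name.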

Details

runParallel asks the R/parallel internal utilities to retrieve the expression enclosed in the else section that follows it (i.e. the 'for' loop to be parallelized). Combining the information retrieved from the current execution with the information provided by the user, R/parallel automates the parallel execution of the loop. Once the parallel execution has finished in the workers, it updates the variables indicated by the user.
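
The idea of splitting the iterations among workers and then reducing their partial results can be illustrated sequentially in base R. This is a conceptual sketch only and does not show how R/parallel is implemented; runChunk, chunks and partials are illustrative names, and "+" plays the role of resultOp.

```r
# Conceptual sketch: emulates "split, run, reduce" sequentially in base R.
iterations <- 1:10
nWorkers   <- 2   # same meaning as the nWorkers argument

# Split the iteration space into one contiguous chunk per worker.
chunks <- split(iterations,
                cut(seq_along(iterations), nWorkers, labels = FALSE))

# Each emulated worker runs its share of the loop and keeps a partial result.
runChunk <- function(indices) {
  partial <- 0
  for (index in indices) {
    partial <- partial + index^2   # the reduction operation is "+"
  }
  partial
}
partials <- lapply(chunks, runChunk)

# Reducing the partial results with the same operation gives the final value.
total <- Reduce(`+`, partials)

# The plain sequential loop produces the same answer.
sequential <- 0
for (index in iterations) sequential <- sequential + index^2
```

The two totals agree because "+" is associative and the iterations do not depend on each other.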

Value

It returns no value. However, it updates the variables named in the argument vector resultVar.

Application

The following pseudo-R-code shows the typical layout of a function with a central loop that iterates through one of its arguments to perform a calculation:

  myFunction <- function( arg1, arg2, ... )
  {
    # Initial sequence of statements
    # Initializing variables and checking arguments
    variable1 <- constant1
    variable2 <- otherFunction( arg1 )
    ...

    # loop
    for( index in FirstValue:LastValue )
    {
      (more statements/loops/expressions/function calls/etc)
      ...
      tempVar1 <- functionA( arg1[ index ], variable1, ... )
      tempVar2 <- functionB( arg2[ index ], variable1, ... )
      resultVar1 <- resultOP1( resultVar1, tempVar1)
      resultVar2 <- resultOP2( resultVar2, tempVar2)
    }

    # Finalizing calculation. Final sequence of statements
    (more statements/expressions/function calls/etc)
    ...

    return( anyCalculatedValue )
  }

To use R/parallel, an if{} else{} control structure has to be added, as the following example shows:

  myFunction <- function( arg1, arg2, ... )
  {
    # Initial sequence of statements
    # Initializing variables and checking arguments
    variable1 <- constant1
    variable2 <- otherFunction( arg1 )
    ...
    if( "rparallel" %in% names( getLoadedDLLs()) )
    {
      runParallel( resultVar=c("resultVar1", "resultVar2" ),
                   resultOp= c("resultOP1", "resultOP2" ) )
    }
    else
    {
      # loop
      for( index in FirstValue:LastValue )
      {
        (more statements/loops/expressions/function calls/etc)
        ...
        tempVar1 <- functionA( arg1[ index ], variable1, ... )
        tempVar2 <- functionB( arg2[ index ], variable1, ... )
        resultVar1 <- resultOP1( resultVar1, tempVar1)
        resultVar2 <- resultOP2( resultVar2, tempVar2)
      }
    }

    # Finalizing calculation. Final sequence of statements
    (more statements/expressions/function calls/etc)
    ...

    return( anyCalculatedValue )
  }
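
A concrete instance of this layout, with two result variables, might look like the sketch below. The function squareStats and its variables are hypothetical. The runParallel branch is only evaluated when the rparallel library has been loaded, so without it the code simply runs the sequential loop.

```r
# Hypothetical function following the layout above; all names are illustrative.
squareStats <- function(values) {
  total   <- 0
  largest <- -Inf

  if ("rparallel" %in% names(getLoadedDLLs())) {
    # Only reached after library(rparallel) has been called.
    runParallel(resultVar = c("total", "largest"),
                resultOp  = c("+", "max"))
  } else {
    for (index in seq_along(values)) {
      tempVar <- values[index]^2
      total   <- total + tempVar          # reduced with "+"
      largest <- max(largest, tempVar)    # reduced with "max"
    }
  }

  c(total = total, largest = largest)
}

stats <- squareStats(c(3, 1, 2))
```

The result is built after the if/else structure, so the function returns the same kind of value whichever branch was taken.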

Note

A data dependency occurs when the calculation of a value depends on the result of a previous iteration (e.g. a[n] <- a[n-1] + 1 ). Results when running loops with data dependencies are unpredictable.
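
The effect of a data dependency can be reproduced without any parallel machinery. The sketch below emulates two workers that each run only their own chunk of a dependent loop, starting from a fresh copy of the data; runDependentChunk is an illustrative name, and the way the chunks are emulated is an assumption, not a description of R/parallel internals.

```r
n <- 6

# Dependent loop: each element needs the value from the previous iteration.
dependent <- numeric(n)
dependent[1] <- 1
for (i in 2:n) dependent[i] <- dependent[i - 1] + 1

# Emulated worker: starts from a fresh vector and runs only its own indices,
# so it never sees values computed by another worker.
runDependentChunk <- function(indices) {
  a <- numeric(n)
  a[1] <- 1
  for (i in indices) a[i] <- a[i - 1] + 1
  a[indices]
}

# Two chunks, 2:3 and 4:6, combined in order.
splitResult <- c(1, runDependentChunk(2:3), runDependentChunk(4:6))

# dependent   is 1 2 3 4 5 6
# splitResult is 1 2 3 1 2 3, because the second chunk never saw a[3]
```

A loop of this kind has to be rewritten so that each iteration depends only on its own index before it is handed to runParallel.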

Author(s)

Gonzalo Vera Rodríguez <gonzalo.vera[AT]rparallel.org>

Examples


  ## Not run:
  # 1. Adapt your function, for example:
  qtlMapping <- function( map, genotypes, traits )
  {
    result <- NULL
    if( "rparallel" %in% names( getLoadedDLLs()) )
    {
      runParallel( resultVar="result", resultOp="rbind", nWorkers=4 )
    }
    else
    {
      for( idx in 1:nrow( traits ) )
      {
        tmpResult             <- MQM(map, genotypes, traits[idx,])
        rownames( tmpResult ) <- rownames(traits)[idx]
        result                <- rbind( result, tmpResult )
      }
    }

    # Return after the if/else so the result is returned in both branches
    return( result )
  }

  # 2. Load the library
  library(rparallel)

  # 3. Run your function as usual!
  myQTLmap <- qtlMapping( map, genotypes, traits )

  ## End(Not run)
