1.1. Quick tutorial
Using the R add-on package rparallel is quite straightforward.
Once you have finished programming and testing your algorithm, you only need to add
a few lines to run it in parallel, as the following example shows:
Original Function:
yourFunctionName <- function( argument1, argument2=NULL )
{
  # 1. Initializing Values
  internalVar1 <- 0
  reduceVar <- NULL
  # 2. Start of loop
  for(index in 1:nrow(argument1))
  {
    # Make some calculations
    internalVar1 <- someCalculations( argument2 )
    tempResult <- someOperations( argument1[ index, ], internalVar1 )
    reduceVar <- reduceOperation( tempResult, reduceVar )
  }
  # 3. Finalizing the function
  return( reduceVar )
}
Parallelized Function:
yourFunctionName <- function( argument1, argument2=NULL )
{
  # 1. Initializing Values
  internalVar1 <- 0
  reduceVar <- NULL
  if( "rparallel" %in% names( getLoadedDLLs()) )
  {
    runParallel( resultVar="reduceVar", resultOp="reduceOperation" )
  }
  else
  {
    # 2. Start of loop
    for(index in 1:nrow(argument1))
    {
      # Make some calculations
      internalVar1 <- someCalculations( argument2 )
      tempResult <- someOperations( argument1[ index, ], internalVar1 )
      reduceVar <- reduceOperation( tempResult, reduceVar )
    }
  }
  # 3. Finalizing the function
  return( reduceVar )
}
If you have loaded the rparallel library, your function will run in parallel
every time you call it!
That's it!
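As a concrete illustration, here is the template filled in for a small, hypothetical task: finding the largest row sum of a numeric matrix, with max as the reduce operation. The function name maxRowSum is made up for this sketch, and the runParallel() call simply mirrors the template above, assuming resultOp accepts the name of a function such as "max" (one of the reduce operations listed in section 1.2). When rparallel is not loaded, the else branch runs the plain loop.

```r
maxRowSum <- function( m )
{
  # 1. Initializing values: NULL is neutral for max(), since max(x, NULL) == max(x)
  reduceVar <- NULL
  if( "rparallel" %in% names( getLoadedDLLs() ) )
  {
    # Only reached when rparallel is loaded; arguments follow the template above
    runParallel( resultVar="reduceVar", resultOp="max" )
  }
  else
  {
    # 2. Sequential loop: one matrix row per iteration
    for( index in 1:nrow( m ) )
    {
      tempResult <- sum( m[ index, ] )
      reduceVar <- max( tempResult, reduceVar )
    }
  }
  # 3. Finalizing the function
  return( reduceVar )
}

maxRowSum( matrix( 1:6, nrow = 3 ) )  # row sums are 5, 7 and 9, so this returns 9
```

Note that the initial value of reduceVar has to be neutral for the chosen reduce operation: NULL works for max, whereas '+' would need 0.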
1.2. The introduced lines, explained.
The added lines implement the following statement:
"if rparallel has been loaded and there is an enclosed loop, run it in parallel;
otherwise, run it unchanged"
To implement it, we obviously need an if / else control structure that encloses
our loop. Therefore, the first line added, just before the loop starts, is an
if statement with the following content:
if( "rparallel" %in% names( getLoadedDLLs()) )
The line above checks whether the rparallel library has been loaded into R.
If so, the loop is meant to run in parallel.
It is important to note that using rparallel is optional and must not be
forced on anyone.
Therefore, the original code is kept intact in the else branch.
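The guard can be tried out on its own. The helper below, isDllLoaded(), is a hypothetical name used only for this sketch; it wraps the same getLoadedDLLs() check. For any package whose shared library has not been loaded it returns FALSE, which is exactly the case in which the function falls back to the untouched sequential loop.

```r
# Hypothetical helper wrapping the guard used in the parallelized function
isDllLoaded <- function( pkg )
{
  # getLoadedDLLs() lists the shared libraries currently loaded into R, by name
  pkg %in% names( getLoadedDLLs() )
}

isDllLoaded( "base" )       # TRUE: R always registers its base library
isDllLoaded( "rparallel" )  # FALSE unless rparallel has been loaded
```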
If the library is loaded, R reaches the next added line:
runParallel( resultVar="reduceVar", resultOp="reduceOperation" )
The function runParallel() is the whole application programming interface (API)
provided by R/parallel.
The only required arguments of this function
are resultVar and resultOp. The first indicates which
variables within the enclosed loop store the calculation results after each iteration,
and the second how these variables have to be 'operated' or 'reduced'. Provided that
each iteration is independent of the others, each iteration generates an independent set of result variables.
All these variable sets have to be reduced to a single one holding the values that would have been obtained
had the loop been run sequentially. Examples of reduce operations (i.e. values of resultOp) are:
max, '+' or rbind.
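The reduce step can be checked without rparallel by simulating it sequentially. The sketch below illustrates the general idea under stated assumptions and is not rparallel's internal code: it splits the iterations into contiguous chunks (one per hypothetical worker), reduces each chunk with the same operation as the loop, then reduces the partial results and compares them with a plain sequential run for max, '+' and rbind. The names loopBody, reduceLoop and chunkedReduce are hypothetical. Two details matter. First, the initial value must be neutral for the operation (0 for '+', NULL for max and rbind; in R, NULL + 1 gives numeric(0)). Second, chunks must be combined in iteration order for order-sensitive operations such as rbind; for the same reason the sketch accumulates as reduceOp(acc, newResult), since the template's order, reduceOperation( tempResult, reduceVar ), would put the newest row first.

```r
# Hypothetical loop body: iteration i produces one row of results
loopBody <- function( i ) c( i, i^2 )

# Run a set of iterations, reducing as the sequential loop does
reduceLoop <- function( indices, reduceOp, init )
{
  acc <- init
  for( i in indices ) acc <- reduceOp( acc, loopBody( i ) )
  acc
}

# Simulate splitting iterations 1..n into contiguous chunks, one per worker
chunkedReduce <- function( n, nWorkers, reduceOp, init )
{
  chunks <- split( seq_len( n ), sort( rep_len( seq_len( nWorkers ), n ) ) )
  partials <- lapply( chunks, reduceLoop, reduceOp = reduceOp, init = init )
  # Combine the partial results in chunk order with the same reduce operation
  Reduce( reduceOp, partials, init )
}

# Each pair is ( reduce operation, neutral initial value )
ops <- list( list( max, NULL ), list( `+`, 0 ), list( rbind, NULL ) )
for( op in ops )
{
  sequential <- reduceLoop( seq_len( 10 ), op[[1]], op[[2]] )
  simulated  <- chunkedReduce( 10, 3, op[[1]], op[[2]] )
  stopifnot( identical( sequential, simulated ) )
}
```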
There are optional arguments that enable additional features, such as CPU usage control or setting the number of workers.
As the R/parallel project develops, more arguments will be added.
Details about these arguments are documented in the R help files, as well as in their (online) PDF version.
Once runParallel has been informed about the result variables, the next step is to look for the loop to parallelize.
  else
  {
    # 2. Start of loop
    for(index in 1:nrow(argument1))
    {
      # Make some calculations
      internalVar1 <- someCalculations( argument2 )
      tempResult <- someOperations( argument1[ index, ], internalVar1 )
      reduceVar <- reduceOperation( tempResult, reduceVar )
    }
  }
Because the loop is enclosed in an if/else structure, the function runParallel can locate it easily.
Internally, the R package rparallel takes care of splitting the iterations of the loop,
creating and running as many worker processes (i.e. workers) as needed or requested, and collecting
all the partial results once the workers have finished their jobs.
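For readers who want to see the split / run / collect cycle with real processes, the following sketch reproduces it with mclapply() from R's bundled parallel package. This is a different tool swapped in for illustration, not how rparallel is implemented, and loopBody is a hypothetical stand-in for the loop body. mclapply() forks worker processes, which Windows does not support, so the sketch falls back to a single worker there.

```r
library( parallel )

# Hypothetical loop body standing in for someOperations()
loopBody <- function( index ) sqrt( index )

nIter    <- 20
# Forking is unavailable on Windows, where mclapply() needs mc.cores = 1
nWorkers <- if( .Platform$OS.type == "windows" ) 1L else 2L

# 1. Split the iterations into one contiguous chunk per worker
chunks <- split( seq_len( nIter ), sort( rep_len( seq_len( nWorkers ), nIter ) ) )

# 2. Each worker runs the original loop over its chunk and returns a partial result
partials <- mclapply( chunks, function( idx )
{
  acc <- 0
  for( index in idx ) acc <- acc + loopBody( index )
  acc
}, mc.cores = nWorkers )

# 3. Collect the partial results and reduce them with the same operation ('+')
result <- Reduce( `+`, partials )
```

Because floating-point additions are grouped differently than in a single sequential loop, the result should be compared with all.equal() rather than identical().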
  # 3. Finalizing the function
  return( reduceVar )
}
Using the information provided through its arguments, runParallel obtains the values
of the final result variables (named in the string vector resultVar) and
updates them in the corresponding R environment.
Therefore, the values obtained are the same whether the loop runs in parallel or sequentially,
and any of the reduced variables can be returned by the outer function, as in this example.
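The last step, writing the reduced values back so that the enclosing function sees them, can be imitated in plain R with assign() and parent.frame(). The helper updateResultVars() below is hypothetical and only illustrates the general mechanism; rparallel's actual internals may differ.

```r
# Hypothetical stand-in for the last step of runParallel(): write the reduced
# values back into the environment of the function that called it
updateResultVars <- function( results, envir = parent.frame() )
{
  for( name in names( results ) )
  {
    assign( name, results[[ name ]], envir = envir )
  }
  invisible( NULL )
}

outerFunction <- function()
{
  reduceVar <- NULL
  # Pretend the workers produced this reduced value
  updateResultVars( list( reduceVar = 42 ) )
  return( reduceVar )
}

outerFunction()  # returns 42
```

Because the default envir = parent.frame() is evaluated inside updateResultVars(), it refers to the frame of outerFunction(), so the local reduceVar is overwritten and can be returned as usual.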