Parallelism challenge

In a recent project, after refactoring some improperly implemented functionality (don’t worry, that functionality was implemented by previous consultants :)), we realized that we needed to delete about 50 million sysobjects. The bad news is that Documentum performs deletes very slowly – a single-threaded session is capable of deleting about 50 sysobjects per second, so in our case the whole job would take about 12 days – not a pleasant prospect 😦 The obvious solution was to use multithreading, but I prefer not to code in Java when it comes to administration routines. So, I implemented a very basic STDIN multiplexer using bourne again shell:
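(The 12-day figure is straightforward arithmetic; a quick sanity check in the shell, using the numbers from above:

```shell
# 50 million objects at roughly 50 deletes per second, single-threaded:
objects=50000000
rate=50
total=$(( objects / rate ))        # 1000000 seconds of work
echo "$(( total / 86400 )) days"   # integer days; really ~11.6, i.e. about 12
```

Spawning N workers divides that estimate by roughly N, as long as the database keeps up.)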

#!/bin/bash

# amount of processes to spawn
PROCS=$1

# command to spawn
shift
CMD=$@

# opened descriptors
declare -a DESCRIPTORS
# spawned pids
declare -a PROCESSES

# closes all descriptors opened previously
close() {
 for d in ${DESCRIPTORS[*]}; do
  eval "exec $d>&-"
 done
}

# waits for all spawned processes;
# a plain wait call does not work because
# the processes are spawned via process substitution
waitall() {
 local pid
 while :; do
  for pid in $@; do
   shift
   kill -0 $pid 2>/dev/null
   if [ "x0" = x$? ]; then
    set -- $@ $pid
   fi
  done
  (("$#" > 0)) || break
  sleep 5
 done
}

# find process's children
findchild() {
 local pid=$1
 while read p pp; do
  if [ "x$pid" = "x$pp" ]; then
   echo $p
  fi
 done < <(ps -eo pid,ppid)
}


# terminates process's tree
killtree() {
 local pid=$1
 local sig=${2-TERM}
 kill -STOP $pid 2>/dev/null
 if [ "x0" = x$? ]; then
  for child in `findchild $pid`; do
   killtree $child $sig
  done
  kill -$sig $pid 2>/dev/null
  kill -CONT $pid 2>/dev/null
 fi
}

# closes all descriptors and
# terminates all spawned processes
abort() {
 local pid
 close
 for pid in ${PROCESSES[*]}; do
  killtree $pid TERM
  killtree $pid KILL
 done
}

# emergency exit: close all descriptors
# and terminate spawned processes
trap "abort" 0

for (( i=0; i<=$PROCS-1; i+=1 )); do
 # spawning command
 exec {FD}> >($CMD) || exit $?
 # storing pid of spawned process
 PROCESSES[$i]=$!
 # storing opened descriptor
 DESCRIPTORS[$i]=$FD
done

while read -r line; do
 i=$(((i + 1) % $PROCS))
 echo "$line" >&${DESCRIPTORS[i]}
done

# normal exit: close all descriptors
# and wait spawned processes
close
waitall ${PROCESSES[*]}

trap - 0
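The waitall helper above polls with kill -0 because the workers are started through process substitution, so a plain wait on their pids may fail in older bash versions. The polling idea in isolation – a minimal sketch, with sleep standing in for a worker:

```shell
#!/bin/bash
# Sketch of the waitall idea: kill -0 delivers no signal, it only
# checks that the pid still exists (and that we may signal it), so
# it works even for pids the wait builtin does not know about.
sleep 1 &
pid=$!
while kill -0 "$pid" 2>/dev/null; do
 sleep 0.2
done
echo "process $pid has finished"
```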

Now I’m able to perform deletes in parallel:

 ~$] parallel.sh 40 iapi docbase -Udmadmin -Ppassword -X \
> < delete_objects.api > delete_objects.log
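The script leans on two bash features: {FD} automatic descriptor allocation (bash 4.1 or newer picks a free descriptor and stores its number in FD) and $! holding the pid of the last process substitution. A minimal sketch of that trick in isolation, using a wc -l worker and a temp file of my own choosing instead of iapi:

```shell
#!/bin/bash
# Sketch: fan lines into one worker via process substitution,
# then close our write end so the worker sees EOF and can finish.
out=$(mktemp)

exec {FD}> >(wc -l > "$out")   # bash stores a free descriptor in FD
pid=$!                         # pid of the process substitution

for i in 1 2 3; do
 echo "line $i" >&$FD
done

exec {FD}>&-                   # close the write end: worker sees EOF
while kill -0 "$pid" 2>/dev/null; do sleep 0.1; done

cat "$out"                     # the worker counted 3 lines
rm -f "$out"
```

parallel.sh does exactly this, just with PROCS descriptors in an array and round-robin dispatch over them.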

4 thoughts on “Parallelism challenge”

  1. Good afternoon, Andrey.
    Please take a look, I think something got lost:

    exec ‘{FD}’
    ++ iapi dmprod -Udmadmin -P -X
    ./parallel.sh: line 82: exec: {FD}: not found


  2. Pingback: Workflow throughput | Documentum in a (nuts)HELL
