algorithm - Quick and Merge sort for multiple CPUs -

March 15, 2013

both merge sort , quick sort can work in parallel. each time split problem in 2 sub-problems can run sub-problems in parallel. looks sub-optimal.

suppose have 4 cpus. on 1st iteration split problem in 2 sub-problems , 2 cpus idle. on 2nd iteration cpus busy on 3d iteration not have enough cpus. so, should adapt algorithm case when cpus << log(n).

does make sense? how adapt sorting algorithms these cases?

first off, best parallel implementation depend highly on environment. factors consider:

shared memory (a 4-core computer) vs. not shared (4 single-core computers)
size of data sort
speed of comparing 2 elements
speed of swapping/moving 2 elements
memory available
is each computer/core identical or there differences in speeds, network latency communicate between parts, cache effects, etc.
fault tolerance: if 1 computer/core broke down in middle of operation.

etc.

now moving theoretical:

suppose have 1024 cards, , 7 other people me sort them.

merge sort

i split stack 8 sections of equal size. won't equal since going fast. since friends can start sorting part section, should give first friend stack bigger rest , smaller towards end.

each person sorts part sequentially. (radix sort, quick sort, merge sort, etc.)

now hard part ... merging.

in real life have first 2 people ready form pair , start merging decks together. perhaps work together, 1 person merging front , other back. perhaps both work front while calling numbers out.

soon enough other people done individual sorting, , can start merging. have them form pairs find convenient , keep going until cards merged.

quick sort

the real trick here try parallelize partitioning, since rest pretty easy do.

i start breaking stack 8 parts, , hand 1 part out each friend. while doing this, choose 1 of cards looks might end towards middle of sorted deck. call out number.

each of friends partition smaller stack 3 piles, less called out number, equal called out number, , greater called out number. if 1 friend faster others, he/she can steal cards neighboring friend.

when finished that, collect less thans 1 pile , give friends 0 through 3, set aside equal to's, , give greater's friends 4 through 7.

friends 0 through 3, divide stack 4 equal parts, choose card partition around, , repeat process amongst themselves.

this repeats until each friend has own stack.

(note if partitioning card wasn't chosen well, rather dividing work 50-50, maybe assign 2 friends work on less thans, , let other 6 work on greater thans.)

at end, collect of stacks in right order, along partition cards.

conclusion

while true approaches faster on computer in real life, think preceding start. different computers or cores or threads perform work @ different speeds, unless implementing sort in hardware. (if are, might want "sorting networks" , or "optimal sorting networks").

if sorting numbers, need large dataset helped paralellizing it.

however, if sorting images comparing sum manhattan distance between corresponding pixel red green blue values. find less difficult speed-up of less k times k cpu's.

lastly, want time sequential version(s), , compare go along, since, cache effects, memory usage, network costs, etc, might might make difference.

Search This Blog

Error