c++ - Fast way to "down-scale" a three-dimensional tensor index -

August 15, 2014

this bit twiddling question c or c++. running gcc 4.6.3 under ubuntu 12.04.2.

i have memory access index p three-dimensional tensor has form:

p = (i<<(2*n)) + (j<<n) + k

here 0 <= i,j,k < (1<<n) , n positive integer.

now want compute "down-scaled" memory access index i>>s, j>>s, k>>s 0 < s < n, be:

q = ((i>>s)<<(2*(n-s))) + ((j>>s)<<(n-s)) + (k>>s)

what fastest way compute q p (without knowing i,j,k beforehand)? can assume 0 < n <= 10 (i.e. p 32 bit integer). interested in fast approach n=8 (i.e. i,j,k 8 bit integers). n , s both compile time constants.

an example n=8 , s=4:

unsigned int p = 240407; // (3<<16) + (171<<8) + 23; unsigned int q = 161; // (0<<8) + (10<<4) + 1

straightforward way, 8 operations (others operations on constants):

m = (1<<(n-s)) - 1;                     // mask s lowest bits. q = (  ((p & (m<<(2*n+s))) >> (3*s))    // mask 'i', shift new position.      + ((p & (m<<(  n+s))) >> (2*s))    // likewise 'j'.      + ((p & (m<<     s))  >>    s));   // likewise 'k'.

looks complicated, isn't, not easy (to me @ least) constants correct.

to create formula less operations, observe shifting numbers u bits left same multiplying 1<<u. thus, due multiplication distributivity, multiplying ((1<<u1) + (1<<u2) + ...) same shifting left u1, u2, ... , adding together.

so, try mask needed portions of i, j , k, "shift" them correct positions relative each other 1 multiplication , shift result right, final destination. gives 3 operations compute q p.

unfortunately, there limitations, case try 3 @ once. when add numbers (indirectly, adding several multipliers), have make sure bits can set in 1 number, else we'll wrong result. if try add (indirectly) 3 shifted numbers @ once, have this:

iiiii...........jjjjj...........kkkkk.......  n-s      s      n-s      s      n-s .....jjjjj...........kkkkk................  n-s  n-s      s      n-s ..........kkkkk...............  n-s  n-s  n-s

note farther left in second , third numbers bits of i , j, ignore them. this, assume multiplication works on x86: multiplying 2 types t gives number of type t, lowest bits of actual result (equal result if there no overflow).

so, make sure k bits third number not overlap j bits first, need 3*(n-s) <= n, i.e. s >= 2*n/3 n = 8 limits s >= 6 (just 1 or 2 bits per component after shifting; don't know if ever use low precision).

however, if s >= 2*n/3, can use 3 operations:

// constant multiplier perform 3 shifts @ once. f = (1<<(32-3*n)) + (1<<(32-3*n+s)) + (1<<(32-3*n+2*s)); // mask, shift/combine multipler, right shift destination. q = (((p & ((m<<(2*n+s)) + (m<<(n+s)) + (m<<s))) * f)      >> (32-3*(n-s)));

if constraint s strict (which is), can combine first , second formula: compute i , k second approach, add j first formula. here need bits don't overlap in following numbers:

iiiii...............kkkkk.......  n-s   s   n-s   s   n-s ..........kkkkk...............  n-s  n-s  n-s

i.e. 3*(n-s) <= 2*n, gives s >= n / 3, or, n = 8 less strict s >= 3. formula follows:

// constant multiplier perform 2 shifts @ once. f = (1<<(32-3*n)) + (1<<(32-3*n+2*s)); // mask, shift/combine multipler, right shift destination // , add 'j' straightforward formula. q = ((((p & ((m<<(2*n+s)) + (m<<s))) * f) >> (32-3*(n-s)))      + ((p & (m<<(n+s))) >> (2*s)));

this formula works example s = 4.

whether faster straightforward approach depends on architecture. also, have no idea if c++ guarantees assumed multiplication overflow behavior. finally, need make sure values unsigned , exactly 32 bit formulas work.

Search This Blog

Error

c++ - Fast way to "down-scale" a three-dimensional tensor index -

Comments

Post a Comment

Popular posts from this blog

basic authentication with http post params android -

c++ - End of file on pipe magic during open -

vb.net - Virtual Keyboard commands -