Problems about accelerating the speed of inference stage #23

haopo2005 · 2018-03-01T04:41:02Z

Hi,
I'd like to run the inference model on an embedded device. Because computing resources are limited, I have some questions, as follows:

I've tried to convert the floating-point computation to integer computation. That is, after parsing the cascade binary file, the luts and thresholds arrays are converted to integers.

`int i,j;
FILE* file;
file = fopen("jst_headcascade", "rb");
if(!file)
return 0;

fread(&version, sizeof(int32_t), 1, file);
fread(&bbox[0], sizeof(int8_t), 4, file);
fread(&tdepth, sizeof(int), 1, file);
fread(&ntrees, sizeof(int), 1, file);

for(i=0; i<ntrees; ++i)
{
	fread(&tcodes[i][0], sizeof(int32_t), (1<<tdepth)-1, file);
	fread(&luts[i][0], sizeof(float), 1<<tdepth, file);
	fread(&thresholds[i], sizeof(float), 1, file);
}
fclose(file);

//convert lut and thr to int data
for(i=0;i<ntrees;i++)
{
	int_thresholds[i] = (int)(thresholds[i]*PERCISON);
	for(j=0;j<(1<<tdepth);j++)
	{
		int_luts[i][j] = (int)(*(&luts[0][0]+i*1024+j)*PERCISON);
	}
}`

However, I've run into a problem in the "run_cascade" function: the index into the vppixels array goes out of range.
So I'd like to understand the structure of the cascade file. The arrays declared below are not fully used during the inference stage.

`int32_t version = 3;

int tdepth;
int ntrees=0;

int8_t bbox[4]; // (r_min, r_max, c_min, c_max)

int32_t tcodes[4096][1024];
float luts[4096][1024];

float thresholds[4096];`

I can't understand how the following code parses the cascade, especially tcodes and lut:
`offset = ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float) + 1*sizeof(float);
ptree = (int8_t*)cascade + 2*sizeof(float) + 2*sizeof(int);

*o = 0.0f;

for(i=0; i<ntrees; ++i)
{
	//
	tcodes = ptree - 4;
	lut = (float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t));
	thr = *(float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float));

	//
	idx = 1;

	for(j=0; j<tdepth; ++j)
		idx = 2*idx + (pixels[(r+tcodes[4*idx+0]*s)/256*ldim+(c+tcodes[4*idx+1]*s)/256]<=pixels[(r+tcodes[4*idx+2]*s)/256*ldim+(c+tcodes[4*idx+3]*s)/256]);

	*o = *o + lut[idx-(1<<tdepth)];

	//
	if(*o<=thr)
		return -1;
	else
		ptree = ptree + offset;
}

//
*o = *o - thr;`

Any response is helpful.
Thanks

nenadmarkus · 2018-03-05T12:15:07Z

I think you have a problem in the following line: `int_luts[i][j] = (int)(*(&luts[0][0]+i*1024+j)*PERCISON);`. Why not simply use `int_luts[i][j] = (int)(luts[i][j]*PRECISION);`? (I didn't put too much thought into this, so I might be wrong.)
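
To illustrate the simpler indexing, here is a minimal sketch of the conversion. It is not code from the repository: the value of `PRECISION`, the helper names `to_fixed` and `convert_luts`, and the round-to-nearest step (the line quoted above truncates instead) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale factor; the thread does not say which value is used. */
#define PRECISION 1024

/* Convert one float to fixed point, rounding to the nearest integer
   (an assumption: the original line simply truncates with a cast). */
int32_t to_fixed(float x)
{
	float scaled = x * PRECISION;
	return (int32_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Convert the first nleaves entries of each of the ntrees rows,
   using plain two-dimensional indexing. */
void convert_luts(int ntrees, int nleaves, float luts[][1024], int32_t int_luts[][1024])
{
	int i, j;

	assert(nleaves <= 1024);

	for (i = 0; i < ntrees; ++i)
		for (j = 0; j < nleaves; ++j)
			int_luts[i][j] = to_fixed(luts[i][j]);
}
```

A larger `PRECISION` keeps more fractional bits, but the running sum of lut values then needs more headroom before it overflows a 32-bit integer.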

The offset variable tells us how much memory we have to skip in order to get to the next tree (ptree is a pointer to its beginning).
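
As a rough sketch of that layout, the per-tree size can be computed as below. The function names `tree_size_bytes` and `tree_start` and the `header_bytes` parameter are made up for this example; only the size formula comes from the code quoted in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one tree: (1<<tdepth)-1 binary tests of 4 bytes each,
   1<<tdepth leaf outputs, and one threshold. */
size_t tree_size_bytes(int tdepth)
{
	size_t ninternal, nleaves;

	assert(tdepth >= 0 && tdepth < 16);

	nleaves = (size_t)1 << tdepth;
	ninternal = nleaves - 1;

	return ninternal*sizeof(int32_t) + nleaves*sizeof(float) + sizeof(float);
}

/* Pointer to the first byte of tree i. header_bytes is whatever precedes
   the first tree in the buffer (a hypothetical parameter for this sketch). */
const int8_t *tree_start(const void *cascade, size_t header_bytes, int tdepth, int i)
{
	return (const int8_t *)cascade + header_bytes + (size_t)i*tree_size_bytes(tdepth);
}
```

For example, with `tdepth` equal to 6 and 4-byte `int32_t` and `float`, each tree takes 63*4 + 64*4 + 4 = 512 bytes.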

We use the line `tcodes = ptree - 4;` because `idx` starts from 1, as you can see from the code. This simplifies the `j`-based for loop that iterates over the individual binary tests contained in the tree.
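
Here is a small sketch of that indexing trick in isolation. The `node_test` function is a hypothetical stand-in for the pixel comparison, and `leaf_index` is a name invented for this example. The point is only that with `idx` starting at 1, node `idx` lives at `tcodes[4*idx]`, so the first node maps to `ptree[0..3]`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pixel comparison: the test "passes"
   (returns 1) when the first of the node's four code bytes is nonzero. */
int node_test(const int8_t *code)
{
	return code[0] != 0;
}

/* Walk one tree stored as (1<<tdepth)-1 nodes of 4 bytes each and return
   the index of the leaf that is reached. ptree must point at least 4 bytes
   into its buffer (in the quoted code it sits after the cascade header),
   so that ptree - 4 is still a valid address. */
int leaf_index(const int8_t *ptree, int tdepth)
{
	const int8_t *tcodes = ptree - 4; /* node idx=1 is then tcodes[4..7] == ptree[0..3] */
	int idx = 1;
	int j;

	assert(tdepth > 0);

	/* children of node idx are 2*idx and 2*idx+1 */
	for (j = 0; j < tdepth; ++j)
		idx = 2*idx + node_test(&tcodes[4*idx]);

	/* after tdepth steps idx lies in [1<<tdepth, (2<<tdepth)-1] */
	return idx - (1 << tdepth);
}
```

The returned value is what the quoted code uses as `lut[idx-(1<<tdepth)]`.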

haopo2005 · 2018-03-08T14:17:50Z

Thanks.
I'd like to know more about the meaning of the arrays tcodes, lut and thr, and their spatial relationship with the pixel matrix.
What will happen if the pixel matrix I feed into the function is smaller than the one used during training?
