Problems about accelerating the speed of inference stage #23

Open
haopo2005 opened this issue Mar 1, 2018 · 2 comments

haopo2005 commented Mar 1, 2018

Hi,
I'd like to run inference on an embedded device. Because computing resources are limited, I have some questions.

I've tried to convert the floating-point computation into integer computation. That is, after parsing the cascade binary file, the luts and thresholds arrays are converted to integers.

```c
int i, j;
FILE* file;

file = fopen("jst_headcascade", "rb");
if(!file)
	return 0;

fread(&version, sizeof(int32_t), 1, file);
fread(&bbox[0], sizeof(int8_t), 4, file);
fread(&tdepth, sizeof(int), 1, file);
fread(&ntrees, sizeof(int), 1, file);

for(i=0; i<ntrees; ++i)
{
	fread(&tcodes[i][0], sizeof(int32_t), (1<<tdepth)-1, file);
	fread(&luts[i][0], sizeof(float), 1<<tdepth, file);
	fread(&thresholds[i], sizeof(float), 1, file);
}
fclose(file);

// convert lut and thr to int data
for(i=0; i<ntrees; i++)
{
	int_thresholds[i] = (int)(thresholds[i]*PERCISON);
	for(j=0; j<(1<<tdepth); j++)
	{
		int_luts[i][j] = (int)(*(&luts[0][0]+i*1024+j)*PERCISON);
	}
}
```
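For reference, the float-to-integer conversion described above could be sketched as follows. This is only an illustration under assumptions: the scale is a power of two (16 fractional bits), values are rounded to nearest rather than truncated, and `FIXED_SHIFT`, `to_fixed`, and `quantize_tree` are hypothetical names that do not come from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale: 16 fractional bits. Not part of the library. */
#define FIXED_SHIFT 16
#define FIXED_ONE   (1 << FIXED_SHIFT)

/* Convert a float to fixed point, rounding to nearest instead of truncating. */
static int32_t to_fixed(float x)
{
	/* the scaled value must fit in a signed 32-bit integer */
	assert(x > -32768.0f && x < 32768.0f);
	return (int32_t)(x * (float)FIXED_ONE + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Quantize one tree's leaf table (n entries) and its threshold
   with the same scale, so sums of leaves stay comparable to the threshold. */
static void quantize_tree(const float *lut, int n, float thr,
                          int32_t *int_lut, int32_t *int_thr)
{
	int j;

	for (j = 0; j < n; ++j)
		int_lut[j] = to_fixed(lut[j]);
	*int_thr = to_fixed(thr);
}
```

Because the cascade compares a running sum of leaf values against each threshold, the leaves and thresholds need the same scale, and the rounding error of the sum grows with the number of trees evaluated.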

However, I ran into a problem in the `run_cascade` function: the index into the vppixels array goes out of range.
So I'd like to know the structure of the cascade file. The arrays declared below are not fully used during the inference stage.

```c
int32_t version = 3;

int tdepth;
int ntrees=0;

int8_t bbox[4]; // (r_min, r_max, c_min, c_max)

int32_t tcodes[4096][1024];
float luts[4096][1024];

float thresholds[4096];
```

I can't understand how the following code parses the cascade, especially tcodes and lut:
```c
offset = ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float) + 1*sizeof(float);
ptree = (int8_t*)cascade + 2*sizeof(float) + 2*sizeof(int);

*o = 0.0f;

for(i=0; i<ntrees; ++i)
{
	//
	tcodes = ptree - 4;
	lut = (float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t));
	thr = *(float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float));

	//
	idx = 1;

	for(j=0; j<tdepth; ++j)
		idx = 2*idx + (pixels[(r+tcodes[4*idx+0]*s)/256*ldim+(c+tcodes[4*idx+1]*s)/256]<=pixels[(r+tcodes[4*idx+2]*s)/256*ldim+(c+tcodes[4*idx+3]*s)/256]);

	*o = *o + lut[idx-(1<<tdepth)];

	//
	if(*o<=thr)
		return -1;
	else
		ptree = ptree + offset;
}

//
*o = *o - thr;
```

Any response is helpful.
Thanks

@nenadmarkus
Owner

I think you have a problem in the following line: `int_luts[i][j] = (int)(*(&luts[0][0]+i*1024+j)*PERCISON);`. Why not simply use `int_luts[i][j] = (int)(luts[i][j]*PRECISION);`? (I didn't put too much thought into this so I might be wrong.)

The `offset` variable tells us how much memory we have to skip in order to get to the next tree (`ptree` is a pointer to its beginning).

We use the line `tcodes = ptree - 4;` because `idx` starts from 1, as you can see from the code. This simplifies the `j`-based for loop that iterates over the individual binary tests contained in the tree.
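The per-tree layout implied by `offset` and by `tcodes = ptree - 4` can be sketched as below. This is read off the loop quoted in the question rather than taken from the library; `tree_stride`, `test_offset`, and `leaf_index` are hypothetical names, and the sketch assumes a 4-byte `float`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one tree: (2^d - 1) binary tests of 4 int8_t offsets each,
   then 2^d float leaf values, then one float threshold. */
static size_t tree_stride(int tdepth)
{
	assert(tdepth > 0 && tdepth < 16);
	return ((size_t)(1 << tdepth) - 1) * sizeof(int32_t)
	     + (size_t)(1 << tdepth) * sizeof(float)
	     + sizeof(float);
}

/* Byte position, relative to the start of the tree, of the test for node idx.
   Nodes are numbered from 1 (the root), so node idx sits at 4*(idx-1),
   which is the same byte as (ptree - 4)[4*idx]. */
static size_t test_offset(int idx)
{
	assert(idx >= 1);
	return 4 * (size_t)(idx - 1);
}

/* Follow the outcome (0 or 1) of each binary test along the path
   and return the index into the leaf table. */
static int leaf_index(int tdepth, const int *bits)
{
	int j, idx = 1;

	for (j = 0; j < tdepth; ++j)
		idx = 2 * idx + bits[j];
	return idx - (1 << tdepth);
}
```

For example, with a depth of 6 one tree takes 63*4 + 64*4 + 4 = 512 bytes, and a depth-3 path with outcomes 1, 0, 1 visits nodes 1, 3, 6 and ends at leaf 5.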

@haopo2005
Author

Thanks.
I'd like to know more about the meaning of the `tcodes`, `lut`, and `thr` arrays and their spatial relationship to the pixel matrix.
What happens if the pixel matrix I feed into the function is smaller than the one used during training?
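The thread does not answer this, but the loop quoted above gives a hint: each `tcodes` entry is an `int8_t` offset that is multiplied by `s` before being added to `r` or `c`, so the sampled positions are relative to the region size rather than to a fixed training resolution. A caller could check that a region fits inside the image before evaluating it. The sketch below is a hypothetical helper, not library code, and it assumes `r` and `c` are the region centre in pixels, scaled by 256 before indexing (as the `/256` in the loop suggests).

```c
#include <assert.h>

/* Hypothetical helper, not taken from the library.
   r, c: centre of the region in pixels; s: region size in pixels.
   Each int8_t offset t lies in [-128, 127] and selects row (256*r + t*s)/256,
   so the extreme rows and columns are checked against the image bounds. */
static int region_fits(int r, int c, int s, int nrows, int ncols)
{
	int r256 = 256 * r;
	int c256 = 256 * c;

	assert(s > 0 && nrows > 0 && ncols > 0);

	/* conservative: reject any negative numerator, even though C division
	   would truncate small negative values to index 0 */
	if (r256 - 128 * s < 0 || (r256 + 127 * s) / 256 >= nrows)
		return 0;
	if (c256 - 128 * s < 0 || (c256 + 127 * s) / 256 >= ncols)
		return 0;
	return 1;
}
```

With a 100x100 image, a region of size 100 centred at (50, 50) passes this check, while the same size centred at (10, 10) does not.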
