/g/ - Technology


Thread archived.
You cannot reply anymore.




File: image.png (134 KB, 1009x571)
I need someone who has benchmarked a production-ready 'eight-point algorithm for the fundamental matrix with nonlinear refinement'. My implementation takes around 150 ms to converge to an optimal fundamental matrix, which is unviable for real-time execution. I tried to compare mine against OpenCV's implementation, but theirs doesn't do any NL optimization after the first approximation. I want to know whether the performance issue is a skill issue or whether I have to take another approach
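For context, each row of the 8x9 system matrix in the eight-point algorithm is the Kronecker product of a correspondence pair in homogeneous coordinates. A minimal sketch of that construction (illustrative code, not the implementation under discussion):

```c
/* Build one row of the 8x9 'chi' matrix from a correspondence
   x1 <-> x2 (homogeneous 3-vectors), so that row . vec(F) = x2^T F x1
   with F flattened row-major. Illustrative helper. */
void chi_row(const double x1[3], const double x2[3], double row[9]) {
    for (int i = 0; i < 3; i++)          /* index into x2 */
        for (int j = 0; j < 3; j++)      /* index into x1 */
            row[3 * i + j] = x2[i] * x1[j];
}
```

Stacking eight such rows gives the matrix whose smallest right singular vector is the linear estimate of F.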
>>
>>106745297
>Ubuntu
>Screenshots his code instead of using proper tags

I bet you use they/them pronouns
>>
I'm following the steps of the book 'An Invitation to 3-D Vision: From Images to Geometric Models', and it says the best way of getting the optimal fundamental matrix out of a given set of point correspondences is to compute this eight-point algorithm with refinement and then take the fundamental matrix that has the most inliers under RANSAC. But I'm starting to fear that the most practical way of doing it is just to keep the first approximation and then obtain the fundamental matrix with the most inliers via RANSAC. Can someone who has already done that weigh in?
>>
Also, after obtaining the fundamental matrix, there are some compute-intensive operations I have to perform in order to obtain an approximation of the Euclidean reconstruction of the points. Is it really necessary to perform point reconstruction on every frame in order to obtain a stable and reliable estimation of the 3D geometry of the observable space, or could some sort of Kalman prediction system be used to approximate future movement from prior motion estimates for the temporally close frames that follow?
>>
>>106745418
the maths are way beyond my paygrade
but if you post your code i might have an idea or two on how to optimize it
>>
>>106745331
you sound like an Arch troon who didn't understand anything about OP's question because you don't know shit besides ricing your shitty WM
>>
>>106745331
i can only hope that you understand how retarded you've made yourself look with this reply
>>
>>106745927
I've been doing some tests on the internal code of the function and its main bottlenecks are the SVD of the 'χ' matrix (second point of the book) (around 5 ms) and the SVD found in the linear-equation solution of the linear optimization. That SVD code doesn't have many points of optimization left, besides increasing the convergence speed with a 'Wilkinson shift'; to be honest I think it's more of a mathematical design problem than a code problem. Also, if I wanted to show you the code I'd have to give you the entire file, and I don't know how to share files here, sorry, I'm a newfag
>>
^
>>106745927
PS: Thanks anyway
>>
Seeing that the NL implementation for the computation of the fundamental matrix might be a dead end, I'll try to skip it and get the best estimation entirely out of the RANSAC 'criterion'; I'll tell you if I find something relevant
>>
>>106746555
>also if I wanted to show you the code I'd have to give you the entire file, and I don't know how to share files here, sorry, I'm a newfag
you could dump it into https://catbox.moe/ and share the link

about optimization: i was specifically thinking about using intrinsics, or even rewriting your code into a kernel using opencl or cuda
maybe adjusting the memory layout could improve things. maybe theres allocations that could be avoided
you never know what a second set of eyes can find in your code

that is, if you dont mind sharing it with us
the maths part is completely abstract to me, but code i can work with
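On the intrinsics idea: a sketch of what a vectorized fill could look like, with a guarded SSE2 path so it still builds on other architectures (illustrative only, not from the shared files):

```c
/* Vectorized fill sketch: SSE2 path on x86 (baseline on x86-64),
   plain scalar loop elsewhere. Illustrative only. */
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

void fill_f32(float *dst, float value, unsigned int n) {
    unsigned int i = 0;
#if defined(__SSE2__)
    /* broadcast the value into all 4 lanes, store 4 floats per step */
    __m128 v = _mm_set1_ps(value);
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, v);
#endif
    /* scalar tail for sizes not a multiple of 4 (or no SSE2) */
    for (; i < n; i++)
        dst[i] = value;
}
```

For a simple fill like this the compiler usually auto-vectorizes anyway; hand intrinsics pay off more in the less regular kernels.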
>>
>>106746671
if you decide not to share the code, keep us posted nonetheless
you improve the quality of discourse on this board
>>
>>106746711
Appreciate it!
>>106746698
Thanks for the link, I've heard about it in the past but didn't pay too much attention to it.

This is my math library, feel free to use if you want: files.catbox.moe/mh9du1.c

PS: the function I'm referring to in my comment above is 'householderSVD'. I don't have much documentation on the functions (and the little documentation I have is in Spanish), so feel free to ask whatever you want about it
>>
>>106746832
i got the file
im on it. hopefully i can find stuff that can be improved upon

whats your compiler flags?
>>
Also, here is the code for the eight-point algorithm with refinement (EPAWR) function (CUDA code due to project requirements): https://files.catbox.moe/l856mj.cu

PS: how can I make it so the link is directly clickable?
>>
>>106746923
heh, we dont have this on this board afaik
on another note- whats your compiler + compiling script?
i dont recognize this: "[prep.proto_to_header($hpath)]"
>>
>>106746923
and what does cgpf.h relate to?
is it your header or is it a library?
>>
>>106746890
It's hard to compile due to the use of 'prepiler', a pre-compiler I made some time ago to handle and automate header creation and other stuff. I can share the source, but the program uses a lot of custom libraries. If you want to get it working fast, just replace each [prep.x] with the action the text after the dot describes; after prepiler precompilation, the library is just compiled as a standard C library: " gcc -c -o "${BASE_NAME}.o" "prep_${NAME}" ", and then:
ar rcs "${BASE_NAME}.a" "${BASE_NAME}.o"
>>
>>106746948
cgpf.h stands for 'c general purpose functions'; it's basically a little dumpster I have where I put the functions I don't quite know which library to put in: https://files.catbox.moe/tmx4rk.c
>>
>>106745297
>ubuntu
>float
>useless return
>retarded function naming
>EPAWR
I don't even want to read your code more
>>
>>106746939
The compilation command is this ugly thing, dont worry, you dont need most of the libraries here: nvcc -O3 -Xcompiler "-Wall,-Wextra" systemtc.cu -o test -I/usr/include -L/usr/lib -l:istructs.a -lopencv_core -lopencv_imgproc -lopencv_imgcodecs -lopencv_highgui -lX11 -lstdc++ -l:counter.a -l:cgpf.a -l:cudagpf.a -l:k_debug.a -l:mathlab.a -l:kvc_edged.a -l:kvc_matrixGlobalAlgorithms.a -l:kvc_matrixIndexAlgorithms.a -l:kvc_motion.a -l:kvc_cornerd.a -ljpeg -l:mathlib.a -l:capapi.a -l:displaymanager.a -l:x11manager.a -l:threadcomm.a -lavformat -lavcodec -lavutil -lswscale -lavdevice
>>
>>106747060
>float
float?
>useless return
where
>>
>>106745297
I will take a look at this later. Keep the thread bumped.
>>
>>106747229
Alright, I'll continue with the RANSAC algorithm meanwhile
>>
>>106747077
he might be just a troll
but i noticed with matrixSetValue for instance you check if rows or cols are less than or equal to 0
if you used unsigned ints for your rows and cols you could rule out negative values entirely. you could also check for that somewhere else in your code to avoid doing it repeatedly, each time you call matrixSetValue.
or just state in your documentation that the values given have to be positive.

then, if your rows and cols are equal to zero, you wont even enter the for loops
and so you could eliminate these two conditionals from your code entirely
also i dont see anywhere in your code where you use the return value of that function
so we could turn it into a void one
something like this:

void matrixSetValue(float *output,
                    float value,
                    unsigned int rows, unsigned int cols){

    unsigned int offset;
    for(unsigned int i = 0; i < rows; i++){
        for(unsigned int j = 0; j < cols; j++){

            offset = j + (i * cols);
            output[offset] = value;
        }
    }
}

also since youre propagating just one value throughout your whole output buffer
you could turn the function into something as simple as this:
void matrixSetValue(float *output,
                    float value,
                    unsigned int rows, unsigned int cols){

    unsigned int size = rows * cols;

    for(unsigned int i = 0; i < size; i++)
        output[i] = value;
}
>>
>>106745297
>I need someone
>>>/g/sqt/
>>
>>106747377
go back
>>
>>106747077
>>106747341 cont
you also should declare that function as inline methinks
function calls *do* add up
>>
>>106746555
Two thoughts
1. Looks like you only want the last singular vector from the first SVD. I don't know offhand if there's a way to calculate that without performing the full SVD, but that's something to look into.
2. SVD for solving linear systems is often slow and is typically favored for numerical stability; consider whether QR would work instead.

Also, it's unclear which implementation of SVD you're using; for small problems, general-purpose solvers like those in LAPACK can sometimes be slow
>>
>>106747341
You are right, the reason I do this is old and obsolete conventions I used to have; I always tell myself to refactor but I never find the time to do it
void matrixSetValue(float *output,
                    float value,
                    size_t rows, size_t cols){

    size_t size = rows * cols;

    for(size_t i = 0; i < size; i++)
        output[i] = value;
}


>>106747462
I would but, as far as I know, gcc and g++ can do the inlining on their own; the 'inline' keyword is just a sort of recommendation. I prefer to keep it like this because a function not being inlined costs practically nothing, and it just keeps the code cleaner
>>
>>106747476
1. You are right; at first I tried to do that but I couldn't find anything with a quick internet search and by asking ChatGPT, so I did it the standard way. But now that the algorithm is finished I can take a deeper look to see what I find

2. I'll look into this too, but I'm not very confident I'll find something, because the reference I'm following says to explicitly compute the SVD. But since the last row of Vt is the one associated with the smallest singular value of the matrix, maybe it has something to do with the epipolar constraint of the fundamental matrix, and I can use some property of it that allows me to simplify the equation

3. I'm using one of the (theoretically) best methods of obtaining the SVD in terms of stability and performance: given an mxn matrix, its SVD can be obtained via bidiagonalization using Householder reflections, then diagonalizing the result using Givens rotations. The only thing that can be improved here is the use of 'Wilkinson shifts' for faster convergence
>>
File: 1751656798904997.png (261 KB, 630x574)
>>106745297
Okay, so, basically, you are:
1. Estimating the solution using linear methods
2. Improving this solution with non-linear methods

In general, I suggest limiting the number of iterations (reducing precision) during the non-linear phase.

>>106746555
>I've been doing some tests on the internal code of the function and its main bottlenecks are the SVD of the 'χ' matrix (second point of the book) (around 5 ms) and the SVD found in the
What algorithm are you using? You can solve SVD using either exact or iterative methods. This shouldn't be your main bottleneck.

By the way:
>686: int maxGivensIters = 10000;
I think this is too much but I'm not familiar with the SVD algorithm you are using (Householder QR?).
>>
>>106747537
>You are right, the reason I do this is old and obsolete conventions I used to have; I always tell myself to refactor but I never find the time to do it
its a thing to look into
theres certain functions like in transposeMatrix where you check the rows and cols count
only to check it again in copyMatrix

and conditionals can be quite expensive if mispredicted. which shouldnt be the case here but its still instructions you can shave off your code

>inlines
youre right with it, its a recommendation to the compiler but that one has a mind of its own, so it might be optional.
might be. i use force inlines so i never actually bothered to explore the behaviour of the compiler with a non-forced inline

id say, by the looks of it, inlines + removing redundant conditionals could improve your runtime by, idk, something like 10-20% depending on the data youre giving to it, but if i were you id test removing redundant checks, redundant returns, and use inline in functions you use very often, to see if theres a difference

so yeah. youre probably right in that the solution to your problem is to find more efficient maths
rewriting elements of your code into discrete kernels wont make much sense because launching a kernel also comes with overhead
>>
>>106747537
>and use inline in functions you use very often, to see if theres a difference
and simplify your loops where you can
im not sure how the compiler optimizes that, i tend to be as explicit as possible in my code so i didnt check these things
but if i were you id simplify my loops by hand to make sure the compiler will produce something as fast as possible
>>
>>106747878
>What algorithm are you using? You can solve SVD using either exact or iterative methods. This shouldn't be your main bottleneck.

The third point of >>106747691 superficially explains the algorithm that I use to get the SVD. The major bottleneck lies in the Givens diagonalization: it has to do small local rotations to get rid of the super/sub-diagonal of the bidiagonal matrix created via Householder reflections, and it's a bottleneck because it requires a remarkable number of iterations to converge to a diagonal matrix (around 20-50). Also, as far as I know, exact methods are only suitable for small matrices, and the first matrix (χ) is 8x9, which exceeds the max size allowed by the exact methods I know

>I think this is too much but I'm not familiar with the SVD algorithm you are using (Householder QR?).
10000 iterations is just the maximum the algorithm is allowed to perform; if it considers that the bidiagonal matrix has converged to a diagonal one, it breaks out of the iteration loop (around 20-50 should be enough, as I said)
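The usual convergence test that justifies breaking early zeroes a superdiagonal entry once it is negligible relative to its diagonal neighbours, along the lines of this illustrative sketch:

```c
#include <math.h>

/* Deflation test for one superdiagonal entry b of a bidiagonal matrix:
   treat it as zero once it is negligible relative to the two adjacent
   diagonal entries d0 and d1. tol is typically a small multiple of
   machine epsilon. */
int should_deflate(double b, double d0, double d1, double tol) {
    return fabs(b) <= tol * (fabs(d0) + fabs(d1));
}
```

Once an entry deflates, the problem splits into two smaller bidiagonal blocks that can be iterated independently, which is where most of the saved iterations come from.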
>>
>>106747950
>conditionals can be quite expensive if mispredicted. which shouldnt be the case here but its still instructions you can shave off your code
I'll look into whether there is a substantial performance degradation risk in the conditionals, but I can tell from experience that these kinds of details don't impact performance in a way that can be reliably measured

>so yeah. youre probably right in that the solution to your problem is to find more efficient maths
rewriting elements of your code into discrete kernels wont make much sense because launching a kernel also comes with overhead
The little kernels are for code readability, and their performance cost should be nothing compared to the processing time of the SVD.
I think the best way of facing this is to remove the non-linear optimization (keeping the function as plain EPA) and use RANSAC both to improve the estimation quality and to get a better picture of the general motion happening in a given image patch. Joining this with two-view projective reconstruction and multiple-view structure estimation, I could get a reasonable first approximation of the multiple-view structure and motion of the scene, which in turn could make future estimations faster and more precise/robust
>>
>>106748311
My response to the last quote begins at >The little kernels are for code readability, and their performance cost
Sorry for the bad formatting
>>
>>106748037
I really have a bad time balancing optimal code for performance against optimal code for readability; one wrong step too deep on either side could mean hours of refactoring and/or micro-optimizing the code... im tired boss
>>
>>106748311
(1/2)
>I'll look into whether there is a substantial performance degradation risk in the conditionals
there can be
>but I can tell from experience that these kinds of details don't impact performance in a way that can be reliably measured
because branch prediction is based on heuristics, controlled by the processor. which most often is dependent on the data.
but its more or less based on common sense
if you have branches that never get taken, like when error checking, i think its safe to assume they will be predicted correctly.
but still, its instructions, even when predicted correctly these come at a cost, so its best to avoid them...

>The little kernels are for code readability, and their performance cost should be nothing compared to the processing time of the SVD
if you meant function, not kernel, then you still have to pay attention how often you call them compared to the body of your code, especially when you use functions as a measure to organize your code
thats why i use force inlines btw. but force inlines are not standard to c (total fucking crime) so... yeah. that would ruin your portability if thats your goal. but its not that important. although these things do add up
a sum of small inefficiencies could increase your runtime quite significantly.
>>
>>106748311
(2/2)
but if you mean kernel as in cuda kernel, then you definitely have to look into that. because the overhead from launching a kernel can exceed the cuda runtime by 10x, so 90% of your runtime would be launching kernels

what you can do, although that heavily depends on the structure of your program, is to compose kernels
you write discrete kernels, ofc, but then you compose them into a bigger kernel, and then compile them. and only then run it, to make the overhead irrelevant.
thats what i do in my opencl. this method allows you to replace variables with literals, and coalesce reads into local or even private variables.
but then yeah.i have no clue what SVD EPA or RanSaC does, so in the end, you know better how that stacks up
im just saying that if you call a multitude of mini kernels, then a good portion of your runtime can go into their overhead
and with cuda that would be messy because you would need a helper program that would compose the kernel, and THEN compile it
opencl is better on that front because you compile the kernel during runtime, as opposed to compile time as with cuda. afaik (im not exactly fluent in cuda, even if many things are the same as with opencl, just named differently)
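A sketch of what that composition looks like in CUDA terms, with illustrative names (two tiny kernels vs. one fused launch):

```cuda
// Two tiny kernels: each launch costs several microseconds of overhead,
// so for short-running kernels the launches can dominate the runtime.
__global__ void scaleKernel(float *v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

__global__ void biasKernel(float *v, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += b;
}

// Fused: same work, one launch, and v[i] stays in a register between
// the two operations instead of a round trip through global memory.
__global__ void scaleBiasKernel(float *v, float s, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = v[i] * s + b;
}
```

The fusion wins twice: one launch overhead instead of two, and one global-memory read/write pair instead of two.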
>>
>>106748402
this might be a trap you set for yourself
im not gonna lie, your writing style is very hard to read
your function and variable names should be exhaustively explicit
you should be consistent with your indentations
you should use macros only when really needed, because they make everything harder to read
i recommend keeping one function per file, so that the subdivision works like bookmarks for your code
and i recommend using the allman style for your code blocks

heres an example of what my code looks like
its a parser for a config file
pretty much everything is contained in a function because i use force inlines so the abstractions come at zero cost
and everything is explicitly named to lessen cognitive overhead
>>
>>106748402
>>106749083
oh, and comment everything that looks even vaguely ambiguous
i have done my share of rewrites because after leaving a project for a while i come back and cant comprehend what, why, and how i wrote my program
>>
>>106748982
>especially when you use functions as a measure to organize your code
>thats why i use force inlines btw
I meant as functions.
I believe the compiler almost never inlines functions that are relatively huge, which would mean inlining the sub-functions inside the main one would do little to nothing in terms of performance, and unpacking them is not an option
>>
>>106748997
>but if you mean kernel as in cuda kernel, then you definitely have to look into that. because the overhead from launching a kernel can exceed the cuda runtime by 10x, so 90% of your runtime would be launching kernels
That's true if the sub-functions (__device__ functions) you are calling are from outside your '__global__' function's file/library; that's because of the loss of optimization opportunities available to the compiler.

This is fixed by my pre-compiler 'prepiler' by injecting the '__device__' functions into the file that uses them, along with 'static' on the function declarations, so the compiler gets all the optimization opportunities available.

I've tested this method and there is almost no performance loss
>>
>>106745297
I know long descriptive variables names are all the rage these days, but 2bh I think we need to go back to using short symbols like what you'd use if you were writing out an equation by hand. Long ass lines of gibberish text are so tiresome.
Besides which, with modern tooling you could just hover over a symbol you don't remember and get a description of what it is.
>>
>>106749258
>I believe the compiler almost never inlines functions that are relatively huge
'depends on the settings of the compiler. it can be parametrized. in gcc at least
and if you force inline, then the compiler throws an error if for some reason it is unable to inline a function marked this way. again, with gcc

>that would mean inlining the sub-functions inside the main one would do little to nothing in terms of improving performance, and unpack them is not an option
thats not true
especially if you have a lot of very short functions
if theyre not inlined, you do a jump to the parent function, then another one into each child, possibly even loading a page, most likely with losses because its a given your function's boundaries wont coincide with those of your page.
and then theres the function call overhead. iirc its something on the order of 5-15 cycles. thats 5-15 additions.
if you use functions to organize your code you end up with tons of short tidbits of code where the function call might take longer than your instructions proper

>>106749396
a-hah
didnt know you can do that with cuda.
thats interesting.
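For reference, the force inline being discussed maps to a gcc/clang attribute (non-standard, as noted; MSVC spells it __forceinline). A tiny illustrative example:

```c
/* gcc/clang force inline: with always_inline, a failure to inline the
   function is a hard compile error instead of a silent fallback. */
static inline __attribute__((always_inline))
float dot3(const float *a, const float *b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

Functions this small are exactly where the call overhead being discussed can rival the body of the function itself.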
>>
>>106749412
>we need to go back to using short symbols like what you'd use if you were writing out an equation by hand

Say no more my brother
>>
>>106749495
>if theyre not inlined, you do a jump to the parent function, then another one into each child, possibly even loading a page, most likely with losses because its a given your function's boundaries wont coincide with those of your page.
>and then theres the function call overhead. iirc its something on the order of 5-15 cycles. thats 5-15 additions.
>if you use functions to organize your code you end up with tons of short tidbits of code where the function call might take longer than your instructions proper
Is it really something to worry about with current hardware?
I have a fairly powerful cpu/gpu and a fast and big RAM
>>
>>106749592
in performance-critical applications, yeah
its not about the absolute time it takes
its about the proportion the function call takes vs the rest of the function

its 5-15 cycles.
if your function takes 150 of them thats 10% of the runtime in just the function call
if your function takes 4 cycles
thats 60% of the runtime
its unrealistically high bc your program wont consist exclusively of micro functions, but added up, these things can realistically account for 10% of the runtime
add to it another 10% from another source
and another 10-20 from another, and you end up with your program running 2x slower than it could for the equivalent code
if your program sits idle for 90% of the runtime like with a gaym or a idk, text editor, it doesnt matter
but if youre crunching numbers and you constantly saturate the compute capabilities of your machine, then it does


