Thanks for the feedback :) Indeed, these points are at the core of what makes this problem…

1 min readDec 31, 2020

Thanks for the feedback :) Indeed, these points are at the core of what makes this problem difficult. However, I would add that they tacke the issue in very different directions.

Not knowing how many objects there are a priori is not a solvable problem, we can’t design algorithms that know the future, it is a nuisance, something we ought to elegantly handle. Yet, there is a hidden problem: how many objects are relevant? See, if you show an image to several people and ask them to enumerate its objects, you will get a different answer every time, as people tend to consider some objects more relevant than others and to unknowingly skip large background objects. Some will say only the most striking objects while others might more thoroughly list what is in there.

The second issue is more technical: how to effectively train dynamic networks. While many architectures already do this to some extent, the main issue, as far as I know, is getting the “end-to-end” part right. Although I believe this can be solved by an elegant design, I believe the current tech stack behind neural networks today is too focused on rigid, predictable models to really tackle this properly.

Written by Ygor Serpa

No responses yet