Abstract: Video-text retrieval is a crucial task in numerous computer vision applications. In this paper, we focus on video-text retrieval involving complex action compositions, where a single video ...
Abstract: This study addresses the problem of generalized category discovery (GCD), an advanced and challenging semi-supervised learning scenario that deals with unlabeled data from both known and ...
Learning image representations that respect the intrinsic geometry of data is crucial for capturing hierarchical semantic structure, yet generative transport is typically performed in Euclidean spaces ...