Cambridge, MA — Researchers from the Massachusetts Institute of Technology (MIT) and Harvard University have released Follow Anything (FAn), an open-source framework for real-time object detection, tracking, and following. The team's paper, titled "Follow Anything: Open-set detection, tracking, and following in real-time," describes a system designed to overcome a key limitation of existing robotic object-following systems: their restriction to a fixed set of object categories.
MIT breaks barriers with FAn: an open-set approach to object following
The core challenge FAn addresses is how robotic systems adapt to new objects. Conventional systems use a closed-set design and can handle only a predefined list of object categories. FAn instead takes an open-set approach: it can detect, segment, track, and follow arbitrary objects in real time, and users can specify a new target on the fly with a text description, an example image, or a click on the object.
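Conceptually, open-set detection can be framed as similarity search rather than classification: instead of choosing among a fixed list of labels, the system compares a descriptor of each candidate region against a descriptor of the user's query. The following is a minimal NumPy sketch of that idea, not FAn's code. It assumes the query and the candidate regions have already been embedded in the same feature space; the function names and the 0.5 similarity threshold are hypothetical.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector (D,) and N candidate vectors (N, D)."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return c @ q

def select_target(query: np.ndarray, candidates: np.ndarray, threshold: float = 0.5):
    """Return the index of the candidate region most similar to the query,
    or None if no candidate clears the (hypothetical) similarity threshold.
    Nothing here depends on a fixed label list, which is what makes it open-set."""
    if len(candidates) == 0:
        return None
    scores = cosine_similarity(query, candidates)
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None

query = np.array([1.0, 0.0, 0.0])
regions = np.array([[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
print(select_target(query, regions))  # region 1 is the closest match
```

Swapping in a different query vector retargets the system without retraining, which is the practical difference from a closed-set classifier.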
A multimodal marvel: FAn’s fusion of ViT models and real-time processing
One of FAn's central features is multimodal querying: the target can be specified by a text description, an image, or a click query. To make this possible, the researchers combined several pretrained ViT (Vision Transformer) models into a single pipeline that runs in real time. The framework is aimed at robotic platforms, particularly micro aerial vehicles, with practical deployment in mind.
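The text and image query paths depend on pretrained encoders, but the click path is easy to illustrate. Assuming the frame has already been split into candidate masks (for example by a segmenter such as SAM) and that a dense per-pixel feature map is available, a click can be reduced to a query descriptor by averaging the features inside the mask under the cursor. The function name, the choice of the smallest enclosing mask, and the averaging step are illustrative assumptions, not FAn's published implementation.

```python
import numpy as np

def click_query_descriptor(features: np.ndarray, masks: np.ndarray, click: tuple[int, int]):
    """Turn a click at (row, col) into a query descriptor (hypothetical helper).

    features: (H, W, D) per-pixel feature map, e.g. upsampled ViT features (assumed)
    masks:    (N, H, W) boolean candidate masks
    Returns the mean feature inside the smallest mask containing the click, or None.
    """
    row, col = click
    hits = [i for i in range(masks.shape[0]) if masks[i, row, col]]
    if not hits:
        return None  # the click landed outside every candidate mask
    # Assumption: the smallest enclosing mask best isolates the clicked object.
    best = min(hits, key=lambda i: masks[i].sum())
    # Boolean indexing selects the (K, D) features of the K pixels in the mask.
    return features[masks[best]].mean(axis=0)
```

The resulting vector can then be compared against candidate regions in later frames exactly like a text or image query embedding.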
Critical to FAn's robustness is its re-detection mechanism, which recovers the target when it becomes occluded or tracking is otherwise interrupted. Many existing systems struggle in exactly these situations, so this mechanism makes the framework more dependable in real-world use.
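The article does not spell out how re-detection works, so what follows is only a minimal sketch of the general idea under our own assumptions: while the tracker reports high confidence, the system stores normalized descriptors of the target; when confidence drops (for example, during an occlusion), it compares fresh candidate regions against that memory and re-acquires the best match. The class name, both thresholds, and the memory size are hypothetical.

```python
from collections import deque

import numpy as np

class ReDetector:
    """Hypothetical re-detection helper: remembers what the target looked like
    while tracking succeeded and uses that memory to re-acquire it after loss."""

    def __init__(self, memory_size: int = 10, track_threshold: float = 0.6,
                 match_threshold: float = 0.5):
        self.memory = deque(maxlen=memory_size)  # most recent target descriptors
        self.track_threshold = track_threshold   # below this, treat the track as lost
        self.match_threshold = match_threshold   # minimum similarity to re-acquire

    def update(self, descriptor: np.ndarray, tracker_confidence: float) -> bool:
        """Store the descriptor if tracking looks healthy; return True if the track is lost."""
        if tracker_confidence >= self.track_threshold:
            self.memory.append(descriptor / np.linalg.norm(descriptor))
            return False
        return True

    def redetect(self, candidates: np.ndarray):
        """Return the index of the candidate most similar to any remembered view, or None."""
        if not self.memory or len(candidates) == 0:
            return None
        mem = np.stack(list(self.memory))                                      # (M, D)
        cand = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)  # (N, D)
        scores = (cand @ mem.T).max(axis=1)  # best cosine match per candidate
        best = int(np.argmax(scores))
        return best if scores[best] >= self.match_threshold else None
```

Keeping several past views, rather than only the initial query, is one way to tolerate appearance changes as the object turns or the lighting shifts.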
The research team defines FAn's primary objective as keeping an object of interest within a robotic system's field of view. Achieving this requires orchestrating several models. The Segment Anything Model (SAM) performs segmentation, while DINO supplies rich self-supervised visual features and CLIP links images to natural-language descriptions, which makes text queries possible. The researchers combine these models in a lightweight scheme for detection and semantic segmentation. For real-time tracking, FAn uses models such as (Seg)AOT and SiamMask. Finally, a lightweight visual servoing controller converts the tracked object's position in the image into motion commands so that the robot follows the target.
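A visual servoing controller closes the loop between perception and motion by turning where the object appears in the image into velocity commands. The article does not give FAn's control law, so this sketch assumes the simplest common choice: a saturated proportional controller on the normalized offset of the mask centroid from the image center. The gains, the saturation limit, and the yaw and vertical sign conventions are all hypothetical.

```python
import numpy as np

def servo_command(mask: np.ndarray, k_yaw: float = 1.0, k_z: float = 1.0,
                  max_rate: float = 0.5) -> tuple[float, float]:
    """Proportional visual-servoing sketch (hypothetical gains and sign conventions).

    mask: (H, W) boolean segmentation of the tracked object.
    Returns (yaw_rate, vertical_velocity) that push the object's centroid
    toward the image center, or (0.0, 0.0) if the mask is empty.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0.0, 0.0  # nothing to follow: hold still until the target is re-acquired
    h, w = mask.shape
    # Normalized pixel error, roughly in [-1, 1]; positive means right of / below center.
    err_x = (cols.mean() - (w - 1) / 2) / (w / 2)
    err_y = (rows.mean() - (h - 1) / 2) / (h / 2)
    # Assumed convention: positive yaw_rate turns right, positive vertical velocity climbs.
    yaw_rate = float(np.clip(k_yaw * err_x, -max_rate, max_rate))
    vertical = float(np.clip(-k_z * err_y, -max_rate, max_rate))
    return yaw_rate, vertical
```

Clipping the commands keeps a large image-space error from producing an aggressive maneuver; regulating distance to the target, which a follower would also need, is omitted for brevity.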
Unveiling FAn’s potential: zero-shot detection, tracking, and real-time performance
In the team's experiments, FAn performed zero-shot detection, tracking, and following on a variety of objects, meaning objects it had not been specifically trained on, while running in real time.
The implications are significant. FAn offers an end-to-end solution for object following, and its open-set design lets it handle a broad range of object categories. Combined with multimodal inputs and real-time processing, this makes FAn a versatile tool that can adapt to new environments. The research team has also released the FAn framework as open-source software, a move that could foster innovation and collaboration across many real-world applications.
Those interested in exploring the FAn framework can access its code on the project’s GitHub repository. The comprehensive insights into FAn’s design, implementation, and results can be found in the research paper “Follow Anything: Open-set detection, tracking, and following in real-time,” available on arXiv.
MIT and Harvard's FAn system is a notable step forward for robotic object tracking and following. Its open-set design, multimodal queries, real-time performance, and adaptability to new environments set it apart from conventional closed-set systems, and its open-source release invites other researchers and developers to build on it.