Aperture is an independent lab studying visual agents. We teach machines to see, reason, and act, converging perception and decision into a single aperture.
Founded in 2024, Aperture Lab is a small research team devoted to visual agents. We focus on closing the loop between perception and decision: how a model can understand the world from pixels and take meaningful action based on what it sees. We're committed to open research: open code, transparent thinking.
Help models see scenes the way people do: open-vocabulary detection, temporal video understanding, and stable long-context visual memory that retains what the model has already seen.
Wire perception to decision: agent frameworks and evaluation benchmarks for agents that plan multi-step tasks, call tools, and reflect when they fail.
Let agents rehearse the future in their heads: interactive video world models for imagination, planning, and risk-free policy trial and error.
Put agents into real bodies: vision-language-action (VLA) models that drive robot arms and desktop GUIs alike.
Detection, segmentation, video understanding, and generation: making machines truly see from pixels. This is the starting point of everything.
Planning, tool use, memory, and self-reflection: wiring perception to action so models become agents that get things done.
High-throughput, low-latency inference and serving for large models, so research results actually run and stay affordable.
Open the black box: understand why a model decides the way it does, so agents are not only effective but also worthy of trust.
A state-of-the-art memory framework for agents. It gives agents long-term memory that is retrievable, forgettable, and evolving: remembering you across sessions, no longer going blank in the middle of long tasks, and making "remembering" a first-class citizen.
A vertical algorithm framework dedicated to passerby removal. Built for travel shots, street scenes, and footage cleanup: erase passersby and clutter in one tap and automatically inpaint a clean background. Works on photos and video.
We're not accepting applications at the moment, but the lab is growing fast. Follow along; new openings will appear right here.