Aperture* APERTURE LAB

Aperture is an independent lab studying visual agents. We teach machines to see, reason, and act — converging perception and decision into a single aperture.

Explore Research
PERCEPTION · REASONING · ACTION · WORLD MODELS · EMBODIED AI · MULTIMODAL · AUTONOMOUS AGENTS · INTERPRETABILITY
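Our Belief

We believe true intelligence begins with seeing and is completed through action. Agents should do more than answer: they should be able to see, think about, and change the real world.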

Founded in 2024, Aperture Lab is a small research team devoted to visual agents. We focus on closing the loop between perception and decision: how a model can understand the world from pixels and then take meaningful action. We're committed to open research: open code, transparent thinking.

RESEARCH · 04 directions

What We Research

From low-level perception to high-level autonomy, we work along one thread: see → understand → act.
01
👁️

Visual Perception

VISUAL PERCEPTION

Help models see scenes the way people do: open-vocabulary detection, temporal video understanding, and stable visual memory over long contexts.

  • Open-Vocabulary
  • Video Understanding
  • 3D Grounding
02
🧠

Autonomous Agents

AUTONOMOUS AGENTS

Wire perception to decision: frameworks and evaluation benchmarks for agents that plan multi-step tasks, call tools, and reflect when they fail.

  • Planning
  • Tool Use
  • Self-Reflection
03
🌐

World Models

WORLD MODELS

Let agents rehearse the future in their heads: interactive video world models for imagination, planning, and risk-free policy trial and error.

  • Interactive Video
  • Latent Dynamics
  • Imagination
04
🤖

Embodied Interaction

EMBODIED INTERACTION

Put agents into real bodies: vision-language-action (VLA) models that drive robot arms and desktop GUIs alike.

  • VLA Models
  • GUI Agents
  • Manipulation
RESEARCH MAP · 04 fields that meet

Research Map

Our work doesn't sit on a single point, but where several fields meet: vision, agents, inference engines and interpretability, each pulling on the others.
[Diagram nodes: Aperture · Vision · Agent · vLLM · Interpretability]

Computer Vision

COMPUTER VISION

Detection, segmentation, video understanding and generation: making machines truly see from pixels. This is the starting point of everything.

Agent

AUTONOMOUS AGENTS

Planning, tool use, memory and self-reflection: wiring perception to action so models become agents that get things done.

vLLM · Serving

EFFICIENT SERVING

High-throughput, low-latency LLM inference and serving, so that research results actually run, quickly and affordably.

Interpretability

INTERPRETABILITY

Open the black box: understand why a model decides the way it does, so agents are not only effective but worthy of trust.

3 Global Branches
4 Research Areas
2+ Products Coming
2024 Founded
LOCATIONS · Three cities, one timezone-agnostic lab

Global Locations

We are a distributed research team that works in relay, following the sun: headquartered in Los Angeles, with branches reaching into Europe and Asia-Pacific.
Global HQ
🇺🇸

Los Angeles

LOS ANGELES · USA
Global HQ: the hub of research, engineering and coordination. Most of our core work starts here.
34.05°N, 118.24°W · PST
🇬🇧

London

LONDON · UK
Europe Branch: interpretability and foundational research, linked into Europe's academic network.
51.51°N, 0.13°W · GMT
🇸🇬

Singapore

SINGAPORE · SG
APAC Branch: embodied intelligence and efficient inference, connected to Asia-Pacific industry and compute.
1.35°N, 103.82°E · SGT
WHAT WE'RE BUILDING · Coming soon

What We're Building

Two projects are on the way. One gives agents real long-term memory; the other pushes a single vertical use case as far as it can go.
SOTA · COMING SOON
🧠

Engram

AGENT MEMORY FRAMEWORK

A SOTA-level memory framework for agents. It provides long-term memory that is retrievable, forgettable, and able to evolve: it remembers you across sessions, does not go blank in the middle of long tasks, and treats "remembering" as a first-class citizen.

Long-term Memory · Retrieval-Augmented · Cross-session · SOTA
Progress: 88% · Internal beta
NEW · IN DEVELOPMENT
🫥

Vanish

PASSERBY REMOVAL · VERTICAL

A domain-specific algorithm framework for passerby removal. Built for travel shots, street scenes and footage cleanup: erase passersby and clutter in one tap, and the background is inpainted automatically. Works on photos and video.

Video Inpainting · Instance Segmentation · Background Inpainting · Vertical
Progress: 32% · Prototype
VANISH · LIVE PREVIEW
[Interactive before/after slider: the same shot before and after passerby removal.]
JOIN APERTURE

Come teach machines to see, with us.

We're not accepting applications at the moment, but the lab is growing fast. Follow along, and new openings will appear right here.

Explore Research · Applications closed for now