Microsoft Copilot Vision is here, letting AI see what you do online

Microsoft Copilot is getting smarter every day. The Satya Nadella-led company just announced that its AI assistant now has “vision” capabilities that allow it to surf the web with users.

While the feature was first announced in October this year, the company is now rolling it out to a select group of Pro subscribers. According to Microsoft, these users will be able to trigger Copilot Vision on web pages opened in their Edge browser and interact with it regarding the content visible on the screen.

The feature is still in early development and is fairly limited, but once fully developed, it could prove to be a game-changer for Microsoft’s enterprise customers, helping them analyze information and make decisions when they interact with the products the company offers across its ecosystem (OneDrive, Excel, SharePoint, etc.).

In the long term, it will also be interesting to see how Copilot Vision compares to more open and powerful agent offerings, such as those from Anthropic and Emergence AI, which allow developers to integrate agents that see, reason about, and take actions across applications from different providers.

What to expect with Copilot Vision

When a user opens a website, they may or may not have a specific goal in mind. When they do, for example when researching an academic paper, completing the task means going through the website, reading all of its content, and then assessing it (e.g., deciding whether the website’s content should be used as a reference for the paper or not). The same goes for other everyday web tasks like shopping.

With the new Copilot Vision experience, Microsoft aims to simplify this process. Essentially, the user now has an assistant that sits at the bottom of the browser and can be called on at any time to read the website’s content, covering all text and images, and help with decision making.

It can instantly scan, analyze and provide all the necessary information while taking into account the user’s intended goal – just like a second pair of eyes.

The feature has far-reaching benefits, since it can speed up workflows, but it also has significant implications, as the agent reads and evaluates everything you’re browsing. However, Microsoft has assured users that all context and information they share will be deleted once the Vision session is closed. The company also noted that website data is not collected or stored for training the underlying models.

“In short, we prioritize copyright, creators, and our users’ privacy and security (sic) – and put them all first,” the Copilot team wrote in a blog post announcing the feature’s preview.

Expansion based on feedback

Currently, a select group of Copilot Pro subscribers in the US who are enrolled in the early access Copilot Labs program can use Vision features in their Edge browser. The feature will be optional, meaning they won’t have to worry about the AI constantly reading their screens.

Additionally, the feature currently only works with select websites. Microsoft says it will gather feedback from these early users and gradually improve functionality while expanding support to more Pro users and other sites.

In the long term, the company could even expand these features to other products in its ecosystem, like OneDrive and Excel, allowing business users to work and make decisions more easily. However, there is no official confirmation yet, and given the cautious approach signaled here, it may take some time to become a reality.

Microsoft’s move to release the preview of Copilot Vision comes at a time when competitors are raising the bar in the agentic AI space. Salesforce has already introduced Agentforce in its Customer 360 offerings to automate workflows in areas such as sales, marketing, and service.

Meanwhile, Anthropic has introduced Computer Use, a feature that allows developers to integrate Claude to interact with a computer desktop environment and perform tasks previously done only by human employees, such as opening applications, interacting with interfaces, and filling out forms.
