Can AI agents replace office workers? We put them to the test with some routine tasks

Apps from OpenAI, Anthropic, Google and more were asked to write emails, book travel and summarise the news. Here’s how they fared

Using AI tools to book restaurants or travel took longer than doing it yourself and one didn't seem to understand what Eurostar was. Photograph: Tolga Akmen/AFP via Getty Images

Technology companies are in a race to develop so-called artificial intelligence agents that will improve our productivity at work by helping with some of the more mundane or routine tasks.

The Financial Times put some of the most popular free and subscription-based applications to the test, using products from AI companies including OpenAI, Anthropic and Perplexity, and Big Tech groups Google, Microsoft and Apple.

These tools are broadly available, already offering workers assistance with common day-to-day tasks, although many of the more advanced agents are still being worked on and improved. Users can pay for subscriptions for better performance or more access to special features.

I compared how the tools completed some typical tasks an office worker might conduct. Here are the results.

Digesting daily news

Using: Google’s Gemini, Perplexity, Anthropic’s Claude and OpenAI’s ChatGPT on March 14th

I asked each of the tools: “What is happening in the news today?” without prompting them further.

Google’s Gemini gave very broad summaries of daily events rather than actual specific headlines. Its “key stories” included “ongoing developments in the Russia-Ukraine war” and “reports of diplomatic activity in the Middle East”. There was no detail and the benefit felt limited.

Perplexity chose five specific stories, including the potential US government shutdown and a reversal of climate initiatives by US president Donald Trump. It gave a decent summary of them, linking to external news sources, and provided a handy bullet-point list of headlines for additional stories.

Anthropic’s Claude gave four vague headlines, a weather report and a slightly random list of upcoming dates including the St Patrick’s Day celebration, a basketball tournament and a forthcoming tax filing deadline.

OpenAI’s ChatGPT split its summaries into regional and national news, as well as sports, weather and a local event happening in San Francisco, where I am based.

The latter three tools gave more specific information about the stories and were a more useful briefing.

None of the responses were tailored to my interests. However, with a more detailed prompt, I could have received more personalised and specific results, such as tech news.

Writing emails

Using: Gemini, Microsoft 365 and Apple Intelligence (which uses ChatGPT)

I used a string of notes from an interview and asked the AI chatbots to draft an email to a colleague laying out the main points for us to work from. The emails they crafted were clear, relevant and written in a suitable tone – although they would need editing.

Microsoft 365 set out the interesting points from the interview in paragraph form and used an open-ended question to invite collaboration on how to structure the story. This read more like a chain of thought than an actionable plan and gave the angles as if they were my opinions, rather than the result of reporting.

Gemini summarised our reporting better, breaking it down into bullet points, although these were fairly vague and broad.

Apple – which uses ChatGPT – felt more cohesive and streamlined, breaking the interview notes down into four main points and drawing on the key themes from previous communications.

Summarising a meeting

Using: Gemini and Microsoft 365

I activated the meeting summary feature in both tools, which recorded two catch-up calls with colleagues and wrote an overview of what was discussed.

Gemini provided a useful summary but mistook the chief product officer of one company for one from Meta. It also confused the term “videogen”, which is used as an abbreviation for video generation software, with a company.

Otherwise it summarised the key points of the conversation well. It highlighted clear tasks to be completed by the attendees with check boxes, which were useful to refer back to.

Microsoft 365 summarised the meeting well, breaking it down into subheadings, each with a helpful time code of when subjects were discussed on the call. When prompted, it came up with suggested next steps but these really just repeated the summary in slightly more concise form. It also misidentified the name of a conference I attended as “Human Acts”, instead of HumanX.

Booking a restaurant and planning travel

Using: OpenAI’s Operator and Claude’s Computer Use

Operator and Computer Use are examples of the most advanced AI agents available. They autonomously open a web browser and search online much like a human would. They use search engines to select from the top results to answer queries.

I instructed the agents to book Noe Valley Dumpling Kitchen for six people on Sunday at 7pm. I then gave a more detailed request to arrange and compare the costs of a flight from Barcelona to London, a flight or Eurostar from London to Amsterdam and a flight from Amsterdam to San Francisco.

Operator completed the restaurant booking relatively quickly, although it could not find the right website on its first attempt. On travel, it did not manage to identify that I was looking for Eurostar information as well as flights. However, it provided a useful list of cost options and offered to complete the booking. Overall it was faster than Computer Use and easier to guide.

The Computer Use test was constructed by Anthropic as a demo as it is currently only available to developers, whereas Operator is available to all US users on its Pro tier.

Computer Use took several attempts to identify the right website for a restaurant reservation. On travel, it was more effective at sourcing flights through Kayak, navigating to the Eurostar website, and searching for trains directly.

Both took longer than it would have taken me to do the tasks myself – but this was a good example of a task I could delegate to run in the background while I did something else.

Social media posts

Using: Synthesia, Pika and Meta

Visual tools are now widely available to consumers, whether through video generation software or image creation, each with its own unique angle.

Synthesia is focused on creating realistic avatars for corporate settings. I created my avatar by uploading a video of myself and entering a text script for it to read out. It did this quickly but, at some points, people could recognise the avatar was not true to life. It was the most labour-intensive of the tools tested but I see how it would be useful for corporate videos or translating messages into different languages.

Pika is aimed at social media applications and can use a base video or image to start from. I uploaded a photo of me in a hotel room in Las Vegas and then asked Pika to adapt it to a casino setting. Again, the video is clearly AI-generated but the visual impact is fun and surreal.

Meta AI also has a feature that “imagines” you in different settings. I asked it to “imagine me as a poker player with a deck of cards instead of my phone” then had to tweak the prompt twice to put me in a casino setting. It is not that realistic and I look a little unimpressed with my hand. It is a creative tool, but perhaps not that suitable for work.

AI tools are improving every month, as companies battle it out for better features. Agents that act for us autonomously are still a way off but you can start to see how useful they will be, especially for more mundane or repetitive work tasks. That, combined with increased personalisation, where these agents begin to know us as users or have access to our company’s data, means they will be powerful.

– Copyright The Financial Times Limited 2025