82: The Agentic Web

[00:00:08] Welcome to Examining a Technology Focused Podcast that dives deep. [00:00:14] I'm Eric Christiansen. [00:00:16] And I'm Chris Hans. [00:00:24] And welcome to another episode of Examining. [00:00:27] This is the technology focused podcast that dives deep. [00:00:32] If you're new to Examining, what we do on this podcast is that we try to stick to one topic, go in depth, explain it, examine it, and leave our listeners with a tip or something that they can accomplish after the fact. At least that's our goal. [00:00:51] This is episode 82 and we're going to go with a little bit of a different format today. [00:00:58] My colleague Chris Hans is away, so hopefully he'll be back soon since this podcast isn't quite the same without him. But for those folks listening, it is just Eric Christensen recording today. But I think the topic lends itself well to kind of a solo episode. One of the things that Chris and I have been following for some time is AI, particularly its impact on education, productivity as well as industry applications. [00:01:32] But as AI and these chatbots have expanded, we're seeing AI embedded in more and more tools. [00:01:43] So we've talked about this in the past. We see AI embedded in notes apps. So for instance, Notion, which is a terrific notes app that I highly recommend, has its own AI which you can pay to subscribe to, and it uses a couple of different models to help you with note taking, organization related things. [00:02:04] Of course, Microsoft is one of the leaders in AI since they partner heavily with OpenAI and you can of course have copilot integrated in things like OneNote, Microsoft Word, etc. So we get these AIs kind of everywhere and one of the places that we're seeing AI more is in the web browser. [00:02:27] So web browsers are kind of ripe for disruption. They haven't changed that much since the the first multi tab browsers with with Firefox. [00:02:38] But AI, particularly agentic AI, meaning AI agents. So things that can autonomously go and accomplish tasks on our beh. [00:02:49] It makes sense that companies would of course want to embed this into their various browser products. [00:02:55] So I thought what we would do today is that we talk a little bit about what AI or sorry, agentic browsers are and I'm going to highlight a couple of the more common agentic browsers. [00:03:07] Then I want to go through an article, a really great article that was written by Consumer Reports where I'm going to do some direct quoting where they tested some of these AI on particular tasks and kind of what their takeaway was with the AI tools. And then last I'm going to provide a couple of tips of where to get AI browsers, how to turn on some of the features. [00:03:35] All right, so to get started, I want to talk about what these agentic browsers are. [00:03:42] So essentially, agentic browsers are. You can think of them as a new class of web browsers that incorporate autonomous AI agents that can do things like navigating web pages for us, maybe summarizing web pages and performing other useful tasks on behalf of the user. [00:04:07] They can analyze pages, so web pages, so they can read the HTML and all of the UI elements. They can plan a sequence of actions to accomplish tasks, and to some degree, they can adopt or adapt on the fly if something unexpected occurs. And this kind of depends on the agent and the browser and how much permissions the user has given this particular AI. [00:04:35] So the AI operates the browser much like a human would, clicking buttons, scrolling through pages, entering the required data, such as attempting a search, but without requiring continuous kind of user guidance. It's supposed to be able to make some logical leaps on your behalf. [00:04:56] So the AI agents have exploded, and now agentic browsers have exploded. So I'm just going to run through some of the common agentic browsers, most of which are available to users for free. [00:05:09] So the first, in my opinion, really that directly integrated AI into the browser, perhaps not in agentic form, but the company that was most aggressive off the ground was Microsoft. [00:05:24] Their AI copilot uses many of the ChatGPT features that uses similar models in addition to their own. [00:05:33] So Microsoft Edge now has Copilot built right into the browser as a sidebar using its AI mode, which we'll talk a little bit about later. [00:05:46] OpenAI has their own ChatGPT Atlas browser, so that is also based on Chrome, the Chromium platform similar to Microsoft Edge. And so it allows you to have memories, it remembers your pages, it remembers kind of your journey history through your search. [00:06:10] So the conversation is kind of rooted in your browsing history. [00:06:15] Google through Chrome has something called Project Mariner. And so that's where you can have Gemini AI directly into Chrome as a part of the effort there. [00:06:29] And the goal with that project is to make Chrome an AI first kind of productivity platform, as it integrates not only all the web stuff, but also all of Google's other Workspace suite tools. [00:06:44] Then we have Opera Aria and Opera Neon. [00:06:49] So Opera integrates its Aria Chatbot. [00:06:54] I'm not sure which model Aria uses, but it's their own interface and it's been introduced in a browser called Opera Neon. And that's their experimental agentic Browser. So if you are an Opera user, they still have their standard browser. You can download this as a separate product. [00:07:12] So Neon adds a chat function for search as well as page summaries. And it has a do mode that's kind of the agent mode that can locally automate web tasks like form filling, shopping, create a shopping list for me, all sorts of stuff. [00:07:32] Their differentiator is that they really emphasize privacy. So if you're concerned about privacy, Opera might be an option. Perplexity, which is more of a search company that uses AI, they have the Comet browser and that's agentic browser for web navigation. So it integrates again conversational AI, their engine kind of replacing in the traditional search since their focus is search. [00:07:59] And so it maintains kind of a cross site context again can help fill forms and can operate and authenticate things on the user's behalf. [00:08:12] There's some others that I'll mention. There's a much longer list. The browser company which made kind of a very different browser before agentic browsers called Arc, they came out with an AI based browser called Diagram. [00:08:25] There's also a company called Sigma AI browser and Sigma is very, very privacy focused. So it bundles an assistant for chatting just like the others. [00:08:36] But it really tries to separate itself as a private AI browser if there's such a thing. So it includes features like end to end encryption, a built in VPN, no user tracking, compliance, etc. So there's others, there's fellow Genspark, the BrowserOS. I'm sure there'll be more. There's tons of agentic browsers that are cropping up all over the place. [00:09:02] So the next question though is why would all these companies rush to agentic browsers? What's their angle? [00:09:10] Well, you can think of this as kind of a next generation user experiences. Companies like Microsoft and Google and to lesser extent OpenAI are really trying to take one step beyond the traditional open attack tab and click through menus paradigm. They're trying to be more proactive. [00:09:29] Their vision I suppose is kind of having a digital assistant built into the browser that you tell it to do actions and it'll do them in the background where you're doing something else. [00:09:40] However, I think more importantly something to consider is that these companies are highly motivated because they want to own all the user platforms. They want deeper data integration and engagement from the browser to kind of fortify and perhaps lengthen the moat of their own ecosystems. So it's not charity work. That data integration kind of creates a feedback loop. So we have rich Behavioral data, how people navigate the web that helps them train the AI on their own workflows. [00:10:14] But then there's also they're trying to provide aspects of integration for convenience. [00:10:20] So if they can position these AI browsers as something that's convenient for users to use, they'll have more uptake. And so what are some of the steps or sorry, some of the key features and I've listed some of them, but we'll go through some more. So autonomous task execution. So they could do, theoretically these browsers could do like a multi step navigation form field or add to cart. [00:10:45] They provide kind of a conversational interface to the browser rather than a textual one. So you can actually use voice to issue many of these agent commands. It'll understand if you have multiple tabs open, especially if those tabs are related. And so it can understand the context to help to learn over time, to help you do things and suggest things that you might want to do. [00:11:10] There's an in page kind of contextual understanding. So they can summarize pages, see what you see, suggest other sources based on what you're looking at, and all that comes with a certain kind of memory and personalization. [00:11:25] And then these agentic browsers with the use of AI in theory can connect into APIs. [00:11:33] So connect to other tools, other services that you subscribe to, and then hook into things that you've allowed the AI to do. [00:11:42] So some common use cases potentially. [00:11:45] So an easy one that we can already do with chatbots would be research and information gathering. So you could tell the AI, go and do research on something complex, synthesize, go find all the sources, synthesize the information and write a report. Booking and reservations so you could get it to look for Italian restaurants in your area, automatically book those using say your Google account. [00:12:13] Same thing goes for booking a flight, going to an event, getting tickets to an event, etc. [00:12:19] It could be used for shopping price comparison. So again, we can already do this with many of the tools shopping and price comparison. [00:12:28] But it could I'll go and make those purchases for you and have them shipped without any intervention, at least in the future, filling up forms and automating particular workflows. It could do email management through summary triage, drafting replies. With some oversight, you could get monitoring for particular changes to a website and alerts. My UX and UI background kind of goes into alert here. Maybe agents will have a use case for user research testing in the future. [00:12:59] They can do content creation, draft blog posts. If you're working in WordPress, creating pages, inserting code, things like that, as well as other kind of personal tasks. Right. So that's kind of an overview of what these AI agents do. Now I want to shift gears a little bit and talk about a great article from Consumer Reports by Nicholas De Leon. And I'll link to this in the show Notes and it's titled should you use an AI browser like ChatGPT Atlas? [00:13:30] So this is a fairly lengthy article, but I want to read some of it to you because this gentleman does some really interesting testing of AI browsers to find out kind of where they work and where they don't. [00:13:44] And Nicholas's goal here was really to answer the question if these AI browsers actually work better than what you're already doing by hand. [00:13:55] So the first section of the article is called the rise of the Agentic Browser. And I'll quote what he says here. [00:14:03] He says the idea is that you can simply tell your agentic browser to book flights and a moderately priced hotel for two people in Phoenix next month, or find ingredients for a recipe and add them to your grocery cart. [00:14:17] So what he's outlined here is a task, but more importantly a task that requires two or more steps. [00:14:25] So this is going beyond typing in something manually into a chatbot and then evaluating its response. It's doing something, taking that information, then taking another action. Okay, so that's a big difference from the current web enhanced chatbots that we already have. [00:14:45] And so I've already listed some of the common agentic browsers and Nicholas says, you know, first glance they all might be similar, but there's some key differences in terms of their capability and, and competencies. [00:14:59] And he says this is not a coincidence. [00:15:01] These new browsers aren't built entirely from scratch. [00:15:05] Instead they are based on Chromium, the same open source project developed by Google that powers the popular Chrome browsers. So these browsers are Chromium based, so they have a similar look and feel, but they're operated by different companies. [00:15:22] And instead of kind of putting Google Search front and center like Google Chrome does, or maybe Firefox does, they put the AI interface front and center. [00:15:33] So if you open up ChatGPT Atlas, new tabs directly open in ChatGPT in the interface, and it encourages to type in questions there rather than type in a web address. [00:15:47] So this is a pretty different paradigm. Rather than typing in web addresses, search, going through search to find web addresses, which is kind of a two step version of the same thing, we're interacting with an AI and we're using that as the retrieval device and the summary device for the content that we want to find. [00:16:08] So as I mentioned, Nicholas did a variety of tests. The test that he first did was booking a restaurant, so his instructions were book a table for two at 7pm this Saturday in Tucson for a Mexican restaurant. [00:16:21] So he started with ChatGPT Atlas, and I'll quote what it did. [00:16:26] It displayed a warning about its own experimental nature and then proceeded to visibly navigate OpenTable's website, mouse cursor moving automatically. It selected a highly rated local restaurant, Blanco, and completed all steps up to final confirmation in about 90 seconds without needing further input beyond my contact info at the very end. So it did a reasonably competent job. [00:16:52] Then he tested perplexity. [00:16:57] He said that Comet initially made a strange error searching for restaurants in the Caribbean before correcting itself, but it took about three minutes to do the task. [00:17:08] And then he did the same task in Edge's Copilot mode, which required him to turn on some settings first that might confuse users. On its first attempt, the AI failed entirely after getting stuck for more than two minutes. On a second try, it succeeded, also choosing Blanco, but it took more than two minutes. [00:17:27] So so far OpenAI's Atlas has a lead. Now, the second test was shopping for a laptop. [00:17:35] So again he says open tabs for the latest 14 inch versions of the Samsung Galaxy Book 5 Pro and Apple MacBook Pro. So he's chosen two very specific models here. So keep in mind that he is using known products, right? And then he says then compare a table, then create a table comparing their prices, battery life and ports. [00:17:59] Again, Atlas performed well. It rapidly typed out its reasoning process, searching websites and compiling data. It successfully pulled up the correct product pages, verified its data with additional sources like Laptop Meg it, even though that publication recently shut down and generated an accurate comparison table with its chat interface. The process took 2 minutes and 5 seconds and felt quite polished. [00:18:24] Then he went on to perplexity, but this was a little bit less consistent. On the first try, it generated a table in about two minutes, but it said it couldn't access the manufacturer's official product pages, relying instead on recent and verified sources, which left me doubting the accuracy of its data data. [00:18:43] And then I had a second attempt where it took about 90 seconds, which seemed a little bit more effective. But again the AI expressed some uncertainty as I asked it to open pages for potentially accurate details. And then Edge was the slowest, he says, of all of the browsers, and the AI explicitly asked for permission just to browse samsung.com so there was Some security thought put into Edge, but of course by not doing everything automatically and requiring the user's interaction, that kind of defeats the purpose of having an automated agent. [00:19:23] The third test was summarizing emails. So his instructions were go to my Gmail inbox and summarize my last five emails. And he said ChatGPT Atlas handled this impressively from a technical standpoint after I logged into the test account. So he used a test account here. When prompted, the agent mode visibly opened the inbox, clicked through each of the five emails, took screenshots, and then presented a detailed and accurate summary table including the sender's name, subject timestamp and clickable screenshots, taking about two and a half minutes. [00:19:57] Perplexity asked to connect to Gmail via OAuth. [00:20:02] So there was an extra step involved in there. [00:20:05] Then it got similar summaries and then Edge kind of was a middle ground and it required Nicholas to already be logged into Gmail in another tab, then asked for explicit one time permission and then proceeded to do the same thing. And Nicholas notes that took about one minute, but it felt like a reasonable balance with Edge with regard to balancing capability and user control. [00:20:32] But what's interesting is that is this actually useful? [00:20:36] So there's a section of the article called Rube Goldberg Machines with Security Risks. [00:20:42] He says booking simple reservation or comparing laptops using the AI browser frequently took longer than it would have taken to do so manually, involving confusing setup steps and in some cases produced errors. [00:20:55] At times the whole process felt like fiddling with a complicated Rube Goldberg machine for tasks that are not difficult to begin with. [00:21:03] So these aren't things that he could have done, not done faster. [00:21:09] Plus the AI required considerable access to his data. I mean, think about going to your email account, taking screenshots those are saved somewhere, then summarizing the information. [00:21:20] And so his takeaway is that are these features fully baked? Are they useful? Do you need them? And not yet is kind of, kind of his summary. Another note on security. [00:21:32] There are other potential hacks that could happen on the web. Is the first time Nicholas says that we've allowed kind of our computers to have some autonomy without having constant direct interaction with the user. So I'm going to quote another paragraph here. [00:21:50] This means we don't just have to worry about protecting ourselves from malicious code like a virus, and now we have to worry about protecting ourselves from a gullible AI assistant that gets tricked into doing something harmful. [00:22:04] An example of this is what's called indirect prompt injection. [00:22:08] And so it's been discovered by researchers. [00:22:11] So the company Brave, which is also a browser company that has its own browser chatbot. But the difference is that Brave, like Firefox is, its entire business revolves around privacy. [00:22:25] They have done tests where they asked Perplexity Comet browser to summarize a Reddit page. But that Reddit page had malicious commands in hidden text in the thread, which led the AI to navigate to a window where the user was logged into Gmail and therefore it would steal information and send it back to the person. So you can kind of think of this like a Word document where you have hidden text in white. An AI browser will just read websites blindly and it may see commands that are invisible to you but are available to the browser, probably in the DOM or the HTML of the site that could hijack it. So we can get these code injections. So these AI browsers are really cool. They have a lot of promise, but they're not quite there yet, just like any other new product. [00:23:23] Now, I don't want to leave everybody here thinking that AI agentic based browsers are something to be avoided. I think it's interesting to experiment with and of course you don't have to give it full autonomy to do things, so you have the power to not let the AI go rogue. So as for a tip, I have two today. [00:23:45] One is where do we get ChatGPT Atlas? So that's their free agentic based browser. And I also have some tips on how to turn on Copilot Mode for Microsoft Edge upfront. What I would suggest is that if you're a little bit leery about these agentic browsers, I would suggest Microsoft Edge and Copilot Mode. Microsoft has taken a little bit more of a balanced approach to privacy and security, which I think is a positive. [00:24:13] Also, some of the agentic modes, at least for me, require to go on a wait list. [00:24:18] So for now Copilot is really just looking at your browser history, helping you summarize information, bringing in other web results in a sidebar. So it's a little bit lighter of a use case. But that being said, we'll start with the first. So if you are interested in trying out ChatGPT Atlas, the website to download it is chatgpt.com atlas. I believe it's only available for macOS currently, which is is kind of ironic given that ChatGPT has such a strong tie in with Microsoft and Windows is using all their models and they've decided to make a Mac browser. But perhaps that's because the Mac community is more likely to adopt it quicker, I don't know. [00:25:00] So if you have ChatGPT Atlas open, you can download it and essentially what you get is you get a very chrome like looking browser, right? So you have a sidebar where you can have ChatGPT. [00:25:14] You can actually have ChatGPT in either sidebar, which is kind of interesting. You can have it on the side on the right hand side. You can also have it kind of built into the left hand side where that's where you get kind of a history of all the different chats and things like that. [00:25:30] So that's kind of cool. [00:25:33] One of the things that you can do with ChatGPT Atlas is have it take on a little bit more of an agent role. [00:25:42] So for instance, I asked ChatGPT Atlas to go to EB Games, which is a video game distributor, recently kind of resurrected video game distributor and go to the Canadian website, find Metroid Prime 4 for the original Switch and add the pre order to to my cart. And it took a long time. I didn't really watch it closely and looking how long it took, but it was able to successfully search for the product, find the version for the console that I had and then add it to cart. And then it didn't purchase it on my behalf as I it doesn't do that, but it did automate that task. [00:26:25] Now if you do want to try something a little bit more real, I would strongly suggest checking out Microsoft Edge and that. And then this is also something I've chosen because I think it works best for Windows users. It's a little bit more refined. [00:26:39] So if you have Microsoft Windows open, Edge is built in as the browser. And if you want to turn on Copilot mode, there's two ways that you can do that. [00:26:54] The first way is to go to a website, so you can go to AKA Ms. [00:27:02] All right. And that will take you to a webpage where you can turn on Copilot mode. You can also go to the Microsoft Edge settings and if you scroll down past extensions, there's a section in the settings called AI Innovations. And there you can turn on Copilot mode. You also get some additional options that you can select. [00:27:27] So for instance, in addition to Copilot mode, you can below that there's also an option to select Copilot new tab page. So search explore the web and chat with Copilot from one search box. [00:27:43] There's also two options that require you to sign in. So they are or sorry, not sign in, go on a wait list because they are not fully available. One is called Journeys. So that kind of allows Copilot to look at your history, visit your past work, and kind of make suggestions on what you're looking for, kind of looking at your web history journey. Then there's Actions in Edge Preview, and that's really the ability to ask Copilot to complete tasks on your behalf. So that's the more agentic feature again for me. I had to join a waitlist, so I don't have it yet, but that's where you would turn it on. [00:28:19] There's also a section called Copilot Mode Preferences, and it's here where you can change the theme. [00:28:26] Focus on new tab page. [00:28:29] And what that will do is that when operating a new tab, keyboard focus is applied to that page instead of the address bar. [00:28:37] So these features just allow you to turn on Copilot in Microsoft Edge and kind of explore it. And essentially you'll get Copilot in the top right corner as a summary or as a chatbot. And if you go to a webpage, you can summarize web pages, you can ask Copilot to summarize them, you can ask it things about that page, et cetera. So that's kind of a nice getting started with Agentic AI. [00:29:07] So that's kind of a wrap for today's episode. [00:29:10] To find out more about the Examining Podcast or subscribe, please go to Examining C. [00:29:16] You can also follow us on xaminingpod. [00:29:22] You can also follow us on LinkedIn. And it's in those places that we advertise and talk about our latest episodes. [00:29:30] You can download the Examining Podcast from all your favorite podcasts such as Apple Spotify, Amazon music, pocket casts, etc. So please give us a like, please subscribe and we look forward to our conversation next time.

82: The Agentic Web

Show Notes

Episode Transcript

Other Episodes

58: AI Rocks, Big Tech Sucks

26: AI Apples

61: Big 4 Hardware Round Up