Adam K Dean

The Self-Reflecting AI: How GPT-4 Helped Uncover Its Own Web Browsing Functionality

Published on 15 May 2023 at 18:30 by Adam

In the realm of artificial intelligence, progress seems to be happening at a breakneck pace. Each day, new tools are being built, new speculative startups are being formed, and new research is being released.

For some of us, it's all about the machine learning and building the models that are driving this recent storm; for others, it's all about using these new models to build tools, products, and services that, until now, couldn't have non-deterministic reasoning as a core component. And for some, the excitement is all about the fact that we now have non-deterministic computers that can reason.

Yesterday, I spent many hours at Bletchley Park, and what struck me were the similarities between the 1940s and the 2020s: the reversible nature of the ciphers and public key cryptography, the scaling of Bombe machines and the scaling of virtual services to deal with load, and, most presciently, how Alan Turing wanted to create a Universal machine that could reason. A machine brain.

We now have this, and with OpenAI releasing ChatGPT with browsing & plugins last week, I wanted to poke around and see how it worked. Without documentation, though, there was only one avenue left: asking GPT-4 what it knew.

Later in this article, I show you how I had two GPT-4 sessions communicate via an external key-value storage endpoint, using nothing but the browsing tool:

GPT-4 passes message to another GPT-4 via the internet

(Note: I mistakenly wrote click_url instead of open_url, but of course GPT-4 knew what I meant and it wasn't an issue. More on the browsing methods below.)

Examining the browsing tool

The browsing tool seems quite simple: when you ask GPT-4 for something it knows is external, it searches for it. It takes time to navigate sites before giving you the best answer it can. It fails often, but it works just as often. You can also give it a URL and make requests related to it, which got me thinking: how exactly does it work, and what are its limitations?

Through conversation, I determined the interface for the browsing tool:

search(query: str, recency_days: int): This function issues a query to a search engine and displays the results. The query parameter is the search term you want to use, and the recency_days parameter (which is optional) can be used to limit the search to results from the specified number of days in the past.

click(id: str): This function opens the webpage with the given ID and displays its content. Each ID in the displayed search results maps to a URL.

quote(start: str, end: str): This function stores a text span from the open webpage. The span is specified by a starting substring, start, and an ending substring, end.

back(): This function returns to the previous page and displays it.

scroll(amt: int): This function scrolls up or down in the open webpage by the given amount. The amt parameter specifies how much to scroll: positive values scroll down, and negative values scroll up.

open_url(url: str): This function opens the given URL and displays it.
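To make the six commands easier to reason about, here is a toy, in-memory mock of the interface GPT-4 described. It is a sketch under stated assumptions, not OpenAI's implementation. Pages are plain strings keyed by URL, a "screen" is a fixed number of characters (page_size is my invention), search is a naive substring match, and recency_days is accepted but ignored. Only the command names and parameters come from the conversation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchResult:
    id: str       # index shown in the results; click() resolves it to a URL
    url: str
    title: str
    summary: str


@dataclass
class MockBrowser:
    """Hypothetical in-memory stand-in for the browsing tool's commands.

    Pages are plain strings keyed by URL; nothing touches the network.
    """
    pages: dict[str, str]
    page_size: int = 200  # characters per "screen" (an assumption)
    history: list[str] = field(default_factory=list)
    results: dict[str, SearchResult] = field(default_factory=dict)
    offset: int = 0
    quotes: list[str] = field(default_factory=list)

    def search(self, query: str, recency_days: Optional[int] = None) -> list[SearchResult]:
        # recency_days is accepted but ignored: mock pages carry no dates.
        hits = [url for url, text in self.pages.items() if query.lower() in text.lower()]
        self.results = {
            str(i): SearchResult(str(i), url, title=url, summary=self.pages[url][:60])
            for i, url in enumerate(hits)
        }
        return list(self.results.values())

    def open_url(self, url: str) -> str:
        self.history.append(url)
        self.offset = 0
        return self._view()

    def click(self, id: str) -> str:
        # The ID from the last search maps to a URL.
        return self.open_url(self.results[id].url)

    def back(self) -> str:
        # Return to the previous page (stay put if there is none).
        if len(self.history) > 1:
            self.history.pop()
        self.offset = 0
        return self._view()

    def scroll(self, amt: int) -> str:
        # Positive amt scrolls down, negative scrolls up, one screen per unit.
        self.offset = max(0, self.offset + amt * self.page_size)
        return self._view()

    def quote(self, start: str, end: str) -> str:
        # Store the span from the first `start` through the next `end`
        # (raises ValueError if either substring is missing).
        text = self.pages[self.history[-1]]
        i = text.index(start)
        j = text.index(end, i) + len(end)
        self.quotes.append(text[i:j])
        return text[i:j]

    def _view(self) -> str:
        text = self.pages[self.history[-1]]
        return text[self.offset:self.offset + self.page_size]


browser = MockBrowser(pages={
    "https://example.com/bombe": "Bletchley Park housed the Bombe machines.",
    "https://example.com/turing": "Alan Turing described a universal machine.",
})
hits = browser.search("turing")
browser.click(hits[0].id)
print(browser.quote("Alan", "machine"))  # Alan Turing described a universal machine
```

The mock keeps the state the real tool appears to track: the last search results for click, a history stack for back, and a scroll offset. That state is what lets the model navigate with short commands instead of full URLs.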

I also determined there are some possible reasons for failure, which GPT-4 & I dubbed E001 to E004:

E001 Failure due to network or server issues: If the server doesn't respond, the response takes too long, or there are other network-related issues, the function will return an error message indicating that it failed to fetch the page.

E002 Failure due to the browsing tool's limitations: If the page relies on JavaScript for rendering content or requires cookie storage or user interaction, the function may fail to fetch or correctly display the page. The browsing tool does not execute JavaScript, store cookies, or support user interaction beyond following links and scrolling.

E003 Failure due to blocked or restricted content: Some websites block or restrict access from certain types of automated tools, including this browsing tool. If the function attempts to access such a site, it may return an error message.

E004 Failure due to privacy and confidentiality restrictions: The browsing tool respects user privacy and confidentiality, and it does not send or receive personally identifiable information. If a page requires such information, the function will not be able to fetch it.
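The E001 to E004 labels were coined in conversation, so there is no official mapping from a failed fetch to a code. The sketch below is purely hypothetical and shows one way the taxonomy could be applied when logging results from your own test pages. The HTTP status choices (5xx, 401, 403, 429) and the flag names are my assumptions, not anything GPT-4 reported about the real tool.

```python
from enum import Enum
from typing import Optional


class BrowseError(Enum):
    """Failure labels GPT-4 and I coined in conversation (not official OpenAI codes)."""
    E001 = "network or server issue"
    E002 = "tool limitation: no JavaScript, cookies, or user interaction"
    E003 = "blocked or restricted content"
    E004 = "privacy or confidentiality restriction"


def classify_failure(
    status: Optional[int],
    timed_out: bool = False,
    needs_javascript: bool = False,
    needs_personal_info: bool = False,
) -> Optional[BrowseError]:
    """Hypothetical heuristic mapping a fetch outcome to E001-E004.

    `status` is the HTTP status code, or None if no response arrived.
    Returns None when nothing looks wrong.
    """
    if timed_out or status is None or status >= 500:
        return BrowseError.E001  # no response, or the server itself failed
    if needs_personal_info or status == 401:
        return BrowseError.E004  # page wants credentials or personal data
    if status in (403, 429):
        return BrowseError.E003  # site refuses or throttles automated clients
    if needs_javascript:
        return BrowseError.E002  # content only appears after client-side rendering
    return None
```

The checks run in order, so a timeout is always reported as E001 even if the page would also have needed JavaScript.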

I discussed with GPT-4 how search and open_url work, and it reported that search returns an indexed list of URLs, titles, and summaries, from which GPT-4 can navigate and/or cite. Curiously, while trying to output quotes and a table of results, GPT-4 had trouble displaying the content because of unescaped characters (pipes in page titles broke the table, and other characters broke inline quoting), but it noticed the problems as it was producing them. I found that very interesting: it had some awareness of its mistakes while it was generating the output.
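The table breakage is a classic escaping bug. The following is a minimal sketch of the fix, assuming results are rendered as GitHub-Flavored Markdown tables. In GFM, a literal pipe inside a cell must be written as \| and a newline ends the row. The function names and the (id, title, url) row shape are hypothetical, and the inline-quoting problems are not handled here.

```python
def escape_cell(text: str) -> str:
    """Make text safe inside a Markdown (GFM) table cell."""
    # An unescaped "|" is read as a column separator, and a newline ends the row.
    return text.replace("|", "\\|").replace("\n", " ")


def render_results_table(rows: list[tuple[str, str, str]]) -> str:
    """Render (id, title, url) search results as a Markdown table."""
    lines = ["| ID | Title | URL |", "| --- | --- | --- |"]
    for row in rows:
        lines.append("| " + " | ".join(escape_cell(cell) for cell in row) + " |")
    return "\n".join(lines)


print(render_results_table([("0", "Bombe | Bletchley Park", "https://example.com/bombe")]))
# | ID | Title | URL |
# | --- | --- | --- |
# | 0 | Bombe \| Bletchley Park | https://example.com/bombe |
```

Without escape_cell, the title above would split into two cells and push the URL into a fourth column the header doesn't have.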

Communicating with requests

The browsing tool didn't give GPT-4 access to the underlying HTTP requests themselves, so my idea of asking it to perform POST requests was out. But, I thought, that doesn't necessarily mean it has to act in a read-only manner.

I put together a quick web service that exposed three GET endpoints: /get/:key, /set/:key/:value, and /keys. I then asked GPT-4 to perform some tests. By now, it was happily helping me red-team itself, and it produced some prompts (as seen above) to be run in two separate sessions.
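I haven't reproduced the original service here, so treat the following as a hypothetical stand-in. It is a minimal Python standard-library sketch that exposes the same three GET endpoints. The response bodies ("OK", "Not found"), the JSON list returned by /keys, the default port, and the fetch helper are all my assumptions. The key point is that /set performs a write even though it is a GET, which is exactly what lets a read-only browsing tool store data.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit

STORE: dict[str, str] = {}
LOCK = threading.Lock()  # ThreadingHTTPServer handles requests concurrently


class KVHandler(BaseHTTPRequestHandler):
    """GET-only key-value store: /set/:key/:value, /get/:key and /keys."""

    def do_GET(self) -> None:
        # Split the raw path first, then percent-decode each segment,
        # so an encoded "/" (%2F) inside a value does not create a new segment.
        segments = [unquote(s) for s in urlsplit(self.path).path.strip("/").split("/")]
        with LOCK:
            status, body = self.route(segments)
        self.reply(status, body)

    def route(self, segments: list[str]) -> tuple[int, str]:
        if len(segments) == 3 and segments[0] == "set":
            STORE[segments[1]] = segments[2]  # a write, despite being a GET
            return 200, "OK"
        if len(segments) == 2 and segments[0] == "get" and segments[1] in STORE:
            return 200, STORE[segments[1]]
        if segments == ["keys"]:
            return 200, json.dumps(sorted(STORE))
        return 404, "Not found"

    def reply(self, status: int, body: str) -> None:
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def start_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Serve the store from a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), KVHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url: str) -> tuple[int, str]:
    """Plain GET, standing in for the browsing tool opening a URL."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")
```

For example, after start_server(), fetching http://127.0.0.1:8000/set/msg/hello%20there stores "hello there" under msg, and fetching /get/msg returns it. Bear in mind that HTTP defines GET as a safe method, so link prefetchers or crawlers that find these URLs could trigger writes. With no authentication either, this is fine for an experiment but not for anything public.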

Network diagram showing key-value messages shared by GPT-4 instances

(Again, s/click_url/open_url etc)

In the writer session, GPT-4 made a request to store a message under a key. It did not output the message to me. In the reader session, GPT-4 retrieved the message and displayed it.
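The following sketch shows what each session has to open, assuming the /set/:key/:value and /get/:key layout described above. The host kv.example.com and the key name inbox are placeholders. The detail that matters is percent-encoding: because the message travels inside a path segment, spaces and especially slashes must be encoded, or the server will see extra segments. Long messages may also run into URL length limits, which vary by server.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "https://kv.example.com"  # hypothetical host standing in for my service


def set_url(key: str, message: str, base: str = BASE) -> str:
    """URL the writer session opens to store `message` under `key`."""
    # safe="" also percent-encodes "/", so the message stays a single path segment.
    return f"{base}/set/{quote(key, safe='')}/{quote(message, safe='')}"


def get_url(key: str, base: str = BASE) -> str:
    """URL the reader session opens to retrieve the message."""
    return f"{base}/get/{quote(key, safe='')}"


def decode_set(url: str) -> tuple[str, str]:
    """Server-side view: recover (key, message) from a /set URL."""
    segments = urlsplit(url).path.split("/")  # ["", "set", key, message]
    if len(segments) != 4 or segments[1] != "set":
        raise ValueError(f"not a /set URL: {url}")
    return unquote(segments[2]), unquote(segments[3])


writer_url = set_url("inbox", "Hello from session A/1")
print(writer_url)              # https://kv.example.com/set/inbox/Hello%20from%20session%20A%2F1
print(decode_set(writer_url))  # ('inbox', 'Hello from session A/1')
print(get_url("inbox"))        # https://kv.example.com/get/inbox
```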

In another experiment, I had GPT-4 record its inner thoughts while responding to me as normal. In itself, this is no different from the usual trick of prefacing responses with front matter containing assumptions, reasoning, and thoughts, but having those thoughts stored off-chat struck me as quite novel and interesting.

Conclusion

This short journey of exploration into GPT-4 and its novel browsing tool has been a fascinating one. The ability to make two instances of GPT-4 interact via a simple web service showcases the power and potential of this technology. The fact that it can use nothing but GET requests to create a full-duplex communication channel hints at the myriad possibilities that will open up when GPT-4 is equipped with even more capabilities, such as code interpretation and network access.

While using plugins might have simplified this process, what makes this experiment intriguing is the workaround built using only the existing functionality of the browsing tool. This exploration has also brought to light some of GPT-4's limitations and potential areas for improvement, which only serve to highlight the immense strides already made in this field.

Although GPT-4 had suggested that I delve deeper into the future implications of AI and ML, as well as their ethical considerations, the focus of this article was to present this proof of concept of inter-GPT-4 communication. While these topics are indeed important, so too is the exploration and enjoyment of this amazing new technology. Just for the sake of it.



This post was first published on 15 May 2023 at 18:30. It was last updated on 15 May 2023 at 22:42. It was filed under artificial-intelligence with tags gpt-4, openai, ai-experiments, research, machine-learning, language-models.