The researchers increased success rates by as much as 27% with simple prompt engineering.
Getting ChatGPT to operate autonomously within an operating system has proven difficult for numerous reasons, but a team of researchers from Microsoft Research and Peking University may have figured out the secret sauce.
The team conducted a study to determine why artificial intelligence (AI) large language models (LLMs) such as GPT-4 fail at tasks that require manipulating an operating system.
State-of-the-art systems such as ChatGPT running on GPT-4 set the benchmark for generative tasks like drafting an email or writing a poem. But getting them to act as agents within a general environment poses a significant challenge.