In recent months, we have seen large language models do many impressive things, such as summarizing and writing text. Yet they still struggle to accurately interpret and answer questions about SEC filings, and that includes the best of them, ChatGPT.
AI Models Struggle to Answer Questions from SEC Filings
In a test carried out by researchers from the startup Patronus AI, the models often either refused to answer questions based on SEC filings or hallucinated figures and facts.
Even the most advanced ChatGPT model, GPT-4-Turbo, answered only 79% of the questions correctly, according to the startup.
“That type of performance rate is just absolutely unacceptable. It has to be much, much higher for it to really work in an automated and production-ready way,” said Patronus AI co-founder Anand Kannappan.
Not everyone has the legal and financial expertise to grasp the nuances of SEC filings, and sifting through complex filings manually is time-consuming and laborious.
AI models that could accurately summarize key points or answer specific questions about SEC filings and other important documents would therefore be highly valuable. They could free people to spend their time on deeper analysis and interpretation, particularly in the competitive financial industry.
However, existing AI models do not yet appear reliable or accurate enough to justify using them in real products.
“There just is no margin for error that’s acceptable because, especially in regulated industries, even if the model gets the answer wrong 1 out of 20 times, that’s still not high enough accuracy,” said Patronus AI co-founder Rebecca Qian.
AI Models Are “Nondeterministic”
According to Kannappan, LLMs are nondeterministic, meaning they are not guaranteed to produce the same output every time they receive the same input. Companies looking to incorporate AI models therefore still need to test them rigorously to ensure they give accurate, reliable answers and stay on topic.
Until LLMs become more trusted and reliable, adoption may be slow in industries where data quality matters most, especially finance.
“We definitely think that the results can be pretty promising,” said Kannappan. “Models will continue to get better over time. We’re very hopeful that in the long term, a lot of this can be automated. But today, you will definitely need to have at least a human in the loop to help support and guide whatever workflow you have.”