Evidence points to major flaws in how machine learning is applied in scientific research. The problem has been identified in research papers across many fields, and a team of 19 researchers from different disciplines, led by Arvind Narayanan and Sayash Kapoor, both computer scientists at Princeton University, has now published guidelines on the responsible use of machine learning in science.
AI guidelines for scientific research
The authors describe their work in the report as an effort to address a credibility problem that could spread across the entire research ecosystem. Narayanan argues that because there are no universal standards protecting the integrity of research methods, and machine learning is now applied in all scientific fields, the situation could become more serious than the replication crisis observed in social psychology a decade ago. He calls the current situation a reproducibility crisis. As Narayanan put it:
“When we graduate from traditional statistical methods to machine learning methods, there are a vastly greater number of ways to shoot oneself in the foot.”
Source: AzoAI.
But the authors, drawn from health research, computer science, social science, and mathematics, also have some positive news: a set of best practices can help solve the problem. Kapoor, a computer science graduate student working with Narayanan who organized the effort to produce the checklist, said that since the problem is systematic, the solution should be systematic as well.
Publishing may slow down, but accuracy will increase
The new consensus-based checklist focuses on ensuring the authenticity of research that uses machine learning. Science advances through reproducible results and claims that can be validated independently; without that, new research cannot build reliably on previous work, and the entire system loses credibility.
The new checklist requires researchers to report detailed information about their use of machine learning models: the data sets used to train the model, the code, the computing hardware, the experimental design, the research goals, and any limitations on the study's findings. The focus is on transparency.
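As a rough illustration of how such disclosure requirements could be operationalized, here is a minimal sketch that models the checklist items named above as a record and flags any that were left blank. The field names and the `MLReport` type are hypothetical, chosen for this example; they are not the checklist's actual wording.

```python
from dataclasses import dataclass, fields

@dataclass
class MLReport:
    """Hypothetical record of the disclosures a submission should include."""
    training_datasets: str = ""    # data sets used to train the model
    code_repository: str = ""      # link to the model and analysis code
    hardware: str = ""             # compute used for training/evaluation
    experimental_design: str = ""  # how the experiment was set up
    research_goals: str = ""       # what the study aims to show
    limitations: str = ""          # known constraints on the findings

def missing_items(report: MLReport) -> list[str]:
    """Return the names of checklist items left blank."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]

# A partially completed report: four items remain undisclosed.
report = MLReport(training_datasets="public census extract",
                  code_repository="https://example.org/repo")
print(missing_items(report))
```

A journal or reviewer tool could run a check like this before a paper enters review, making incomplete disclosure visible early rather than after publication.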
While these stricter standards may slow the publication of new research studies, the researchers behind the initiative believe that adopting them will ultimately increase the overall rate of discovery and innovation.
Emily Cantrell, one of the study's authors and a Ph.D. student at Princeton University, said the team does care about the pace of scientific research, but ensuring that published papers meet a quality bar means future papers can rely on them for further research. Kapoor added that errors are harmful in their collective impact: they waste time and, in turn, money, hindering the scientific research that receives funding and investment.