A recently published research paper caught my attention. Its title is “Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports”.
The paper compares proprietary AI models such as GPT-3.5 and GPT-4 with open-source AI models such as LLaMA, Mistral and QWEN1 on the task of labelling chest X-ray reports. The goal was to find out whether the open-source models are as good as the proprietary ones.
Results:
I'll leave you with some key findings from the paper:
In our study, we show that open-source generalist LLMs are able to consistently outperform the CheXbert model, which was specifically tuned for the task of radiology report classification. Furthermore, they come very close to the performance of GPT-4, which is a much larger model.
These results show that open source LLMs can serve as a viable alternative to GPT-4, as they are close in performance and offer several other significant advantages.
Overall, our results demonstrate that open-source LLMs are a viable and valuable alternative to proprietary models for medical tasks such as radiology report classification.
The full paper is published on arXiv and can be accessed as a PDF at this link: https://arxiv.org/pdf/2402.12298.pdf
Interesting findings overall! 🙂