LLM Safety and Benchmarking

Overview

Multimodal Large Language Models (LLMs) and Agentic AI are emerging as powerful computational tools capable of transforming clinical workflows and patient care. Our laboratory’s primary goal is to explore and validate the safest, most effective methodologies for integrating these AI-driven solutions into modern medical practice. A key element of our work involves Retrieval-Augmented Generation (RAG), a technique that grounds LLM outputs in reliable, up-to-date medical literature. By rooting each response in authoritative sources, we aim to reduce the risk of misinformation, bias, and hallucinations, thereby fostering greater confidence among clinicians and patients alike.

Focus

To realize the full potential of Agentic AI, RAG, model fine-tuning, embeddings, and knowledge graphs in diverse clinical settings, our team actively develops and tests a range of advanced techniques. These efforts revolve around optimizing model performance while upholding the highest standards of transparency and ethical responsibility. We subject our systems to extensive validation protocols across medical specialties, including surgery, oncology, and internal medicine, so that our frameworks generalize to distinct domains of patient care. This comprehensive approach helps us pinpoint system vulnerabilities (technical, ethical, or otherwise) and refine our protocols to safeguard the welfare of both practitioners and the public.

Testing

Security is at the forefront of our research agenda. Through structured red-teaming exercises, we simulate adversarial conditions that reveal potential threats, such as data breaches or malicious inputs intended to exploit model weaknesses. These scenarios allow us to anticipate real-world challenges, harden our systems against attacks, and reinforce protective measures before the systems are deployed in clinical environments. Our work also incorporates strong principles of algorithmic fairness and responsible data stewardship, reducing the likelihood of biases that could undermine patient trust or compromise clinical decision-making.

Looking beyond immediate performance metrics, we uphold a shared commitment to collaboration and transparency. By releasing detailed benchmarking results, we encourage the broader healthcare and AI communities to adopt best practices, refine existing techniques, and ensure consistent safety standards in LLM deployment. In doing so, we help create an ecosystem where physicians can trust these tools as dependable complements to their expertise, ultimately enhancing diagnostic accuracy, streamlining workflows, and improving patient outcomes. Our vision is a future in which AI technologies integrate seamlessly into clinical practice, setting new standards for quality, safety, and innovation in global healthcare research.
