Project 4: SPARQL Query Generation for Efficient Scientific Data Access of ELIXIR resources

Abstract

The Swiss Institute of Bioinformatics (SIB/ELIXIR-CH), the Database Center for Life Science (DBCLS-Japan), and RIKEN-Japan are joining forces to develop an open-source, artificial intelligence (AI)-driven system for intuitive querying of scientific datasets, with the aim of accelerating scientific innovation. We invite contributions to this effort, which aligns with the BioHackathon's goal of fostering an open-source infrastructure for data integration and addresses the urgent need for effective data retrieval methods.

Our goal is to make databases easier for life scientists to use by converting their natural-language questions into SPARQL queries with large language models (LLMs). Researchers face two main obstacles: the complexity of SPARQL itself and the need to understand each knowledge base's schema. We therefore propose a user interface that combines LLMs with knowledge bases, allowing users to interact with the data directly in natural language and simplifying the research process. Because the LLM-generated SPARQL queries are grounded in validated scientific data, the approach is intended to support data discovery and retrieval with the accuracy that scientific research requires.
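The question-to-query step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `build_prompt` and `question_to_sparql`, the prompt wording, and the `fake_llm` stand-in are all hypothetical, and the UniProt core prefix is used only as an example vocabulary.

```python
from typing import Callable


def build_prompt(question: str, prefixes: dict[str, str], known_classes: list[str]) -> str:
    """Assemble a prompt that gives the model schema hints next to the question."""
    prefix_lines = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())
    )
    return (
        "Translate the question into a SPARQL query.\n"
        f"Use only these prefixes:\n{prefix_lines}\n"
        f"Known classes: {', '.join(known_classes)}\n"
        f"Question: {question}\n"
        "SPARQL:"
    )


def question_to_sparql(
    question: str,
    llm: Callable[[str], str],
    prefixes: dict[str, str],
    known_classes: list[str],
) -> str:
    """Send the schema-aware prompt to any text-in, text-out model callable."""
    return llm(build_prompt(question, prefixes, known_classes)).strip()


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; it returns a canned query
    # so the sketch runs without network access or an API key.
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "SELECT ?protein WHERE { ?protein a up:Protein } LIMIT 5\n"
    )


if __name__ == "__main__":
    query = question_to_sparql(
        "List five proteins",
        fake_llm,
        {"up": "http://purl.uniprot.org/core/"},
        ["up:Protein"],
    )
    print(query)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider; a real system would replace `fake_llm` with an actual model client and derive the schema hints from the target knowledge base.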

Although LLMs perform well in areas such as code generation, they often struggle to produce semantically accurate SPARQL queries. Our project focuses on addressing this limitation so that conversational AI can accurately interpret research questions and translate them into precise queries. The project aligns with the objectives of the ELIXIR 2024-26 Programme and lays the groundwork for future research collaborations, offering a practical solution for data-driven discovery in the life sciences.
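One simple way to catch a class of semantic errors is to check a generated query against the vocabulary the knowledge base actually uses. The sketch below is an assumption about how such a check might look, not the project's method: the function `unknown_terms` and the allowed-term set are hypothetical, and the regular expressions are deliberately naive (they do not handle colons inside string literals, for example).

```python
import re

# Matches declarations such as: PREFIX up: <http://purl.uniprot.org/core/>
PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*):\s*<[^>]*>", re.IGNORECASE)
# Matches full IRIs in angle brackets so they can be blanked out before scanning.
IRI = re.compile(r"<[^<>\s]*>")
# Matches prefixed names such as up:Protein (naive; ignores string literals).
PREFIXED_NAME = re.compile(r"(?<![\w<:/])([A-Za-z][\w-]*):([A-Za-z_][\w-]*)")


def unknown_terms(query: str, allowed_terms: set[str]) -> list[str]:
    """Return prefixed names that use an undeclared prefix or are not in the allowed vocabulary."""
    declared = {m.group(1) for m in PREFIX_DECL.finditer(query)}
    body = IRI.sub(" ", query)  # drop IRIs so "http:" is never read as a prefix
    problems: list[str] = []
    for match in PREFIXED_NAME.finditer(body):
        prefix, local = match.group(1), match.group(2)
        term = f"{prefix}:{local}"
        if (prefix not in declared or term not in allowed_terms) and term not in problems:
            problems.append(term)
    return problems


if __name__ == "__main__":
    # Illustrative vocabulary; a real system would load it from the knowledge base schema.
    allowed = {"up:Protein", "up:organism"}
    generated = (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "SELECT ?p WHERE { ?p a up:Protein ; up:organsim ?o }"
    )
    print(unknown_terms(generated, allowed))
```

A check like this only confirms that the terms exist; it cannot tell whether the query answers the question that was asked, so it would complement rather than replace evaluation against reference queries.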

Lead(s)

Tarcisio Mendes de Farias, Julio Rangel