COMPUTATIONAL AND INFORMATICS RESOURCES AND TOOLS FOR GLYCOSCIENCE RESEARCH

Development of an integrated, extendable, and cross-disciplinary resource providing tools and data that facilitate the integration of glycoscience knowledge and data with biological knowledge from genetics, proteomics, pathology, and other disciplines.

U01-GM125267: 9/2017-5/2022

Abstract: Although ongoing technical advances are accelerating the pace and sophistication of data acquisition in glycoscience, the transformation of these data into glycobiology knowledge, insight, and understanding is slowed by the limited number of tools that facilitate their integration with biological knowledge from genetics, proteomics, pathology, and other disciplines. Our grant application describes the development of an integrated, extendable, and cross-disciplinary resource providing tools and data to address specific scientific questions that can currently be answered only by extensive literature-based research and manual collection of data from disparate databases and websites. Using insight gained during our planning grant activities, including a workshop focused on evaluating existing resources and community needs, we propose to develop a broadly relevant and sustainable glycoinformatics resource to connect glycoscience with the explosion of data that is revolutionizing biology. We identified critical gaps that need to be filled and challenges that must be overcome to create an enduring and sustainable glycoinformatics resource that goes beyond mapping glycan data to genes and proteins to identify and integrate diverse multidisciplinary knowledge from EMBL-EBI, NCBI, UniProt, UniCarbKB, CAZy, Gene Ontology, and other sources. To maximize synergy among these resources, we propose a new glycan array data repository and enhanced ontologies to facilitate integration of glycan and glycoconjugate expression and interaction data with other information. Evaluating these data in the context of knowledge about genetic mutations, gene expression, protein function, and other phenomena will provide new opportunities for systems-level understanding of the roles of glycosylation in disease and development. This comprehensive data integration framework will provide unprecedented support for complex queries spanning diverse data types relevant to glycobiology.
Technical advances required to implement this framework include evidence tagging of data, ontology and standards development, and new interfaces that enable data mining, sharing, and dissemination. Community engagement, especially with scientists who do not specialize in glycobiology, will be emphasized to maximize the relevance of our resource. We will develop a portal to make all this information publicly available in standard formats supported by NCBI and EMBL-EBI and in new formats we develop, promoting sharing of data and their ultimate integration into these widely used informatics resources.