scHilda: LLM-Knowledge Graph Integration Achieves State-of-the-Art Single-Cell Annotation
Tags AI · OSS
Researchers from Chinese universities have published scHilda, a framework that deeply integrates a biological knowledge graph into LLM reasoning for single-cell RNA-seq annotation, in PLOS Computational Biology. The framework uses a hierarchical arbitration strategy: it first identifies major cell lineages with a global knowledge base, then resolves subtypes with focused subgraph retrieval. With this approach, scHilda achieves state-of-the-art performance on benchmark datasets. It also enables lightweight LLMs such as DeepSeek-V3.2 to match top-tier model performance, reducing the cost of accurate cell annotation. In tests on eight public datasets, scHilda outperforms existing methods across tissues and species.
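The two-stage strategy described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny hand-written KNOWLEDGE_GRAPH, the function names, and the overlap_chooser function, which stands in for the LLM call with a simple marker-overlap count. It is not scHilda's actual code, graph, or prompts; it only shows the coarse-to-fine control flow.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy, hand-written knowledge graph: lineage -> subtype -> marker genes.
# Illustrative placeholder only; this is NOT scHilda's actual graph.
KNOWLEDGE_GRAPH: Dict[str, Dict[str, Set[str]]] = {
    "T cell": {
        "CD4 T cell": {"CD3D", "CD3E", "CD4", "IL7R"},
        "CD8 T cell": {"CD3D", "CD3E", "CD8A", "CD8B"},
    },
    "B cell": {
        "Naive B cell": {"CD79A", "MS4A1", "IGHD"},
        "Plasma cell": {"CD79A", "JCHAIN", "MZB1"},
    },
}

# A chooser takes the cluster's genes plus candidate labels with their
# markers and returns one label. In a real system this would be an LLM call.
Chooser = Callable[[List[str], Dict[str, Set[str]]], str]


def overlap_chooser(cluster_genes: List[str], candidates: Dict[str, Set[str]]) -> str:
    """Hypothetical stand-in for the LLM: pick the label with most marker overlap."""
    genes = set(cluster_genes)
    return max(candidates, key=lambda name: len(genes & candidates[name]))


def lineage_markers(graph: Dict[str, Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Global view: pool every subtype's markers under its parent lineage."""
    return {
        lineage: set().union(*subtypes.values())
        for lineage, subtypes in graph.items()
    }


def annotate(
    cluster_genes: List[str],
    graph: Dict[str, Dict[str, Set[str]]] = KNOWLEDGE_GRAPH,
    choose: Chooser = overlap_chooser,
) -> Tuple[str, str]:
    # Stage 1: decide the major lineage against the global knowledge base.
    lineage = choose(cluster_genes, lineage_markers(graph))
    # Stage 2: retrieve only that lineage's subgraph and resolve the subtype.
    subtype = choose(cluster_genes, graph[lineage])
    return lineage, subtype


if __name__ == "__main__":
    print(annotate(["CD3D", "CD8A", "CD8B", "NKG7"]))
    print(annotate(["CD79A", "JCHAIN", "MZB1"]))
```

The point of the design is that the second decision only ever sees candidates from the lineage chosen in the first, so the set of possible answers is narrowed by the graph rather than left open to the model.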
Technical significance
scHilda's hierarchical approach, which constrains LLM reasoning with structured biological knowledge, addresses the hallucination problem that has limited LLM adoption in scientific domains. That lightweight LLMs can match top-tier performance when properly constrained by a knowledge graph suggests that domain-specific AI systems may not require frontier-scale models. If that holds, accurate annotation could become considerably cheaper to run in computational biology workflows.