BioMatrix Club

Why diagnosing rare diseases is so difficult

Diagnosing rare diseases in children is especially challenging. Symptoms are often vague, overlap with more common conditions, or take years to fully manifest. As a result, many young patients face long and uncertain diagnostic journeys before receiving the care they need.

To help address this pressing issue, the Hauner project was launched as a challenge during the TUM AI Hackathon 2024 and has since become the first major effort within BioMatrix.

How Graph Neural Networks help

The project explores how Graph Neural Networks (GNNs) can be used to model the complex relationships between patients, their biological attributes, and associated diseases. By identifying patterns that resemble those of previously diagnosed cases, this approach can help clinicians detect rare diseases earlier and more precisely.

By embedding patients and diseases into a shared feature space, the Hauner GNN project creates a map of biological similarity that can reveal hidden risks before symptoms fully manifest.
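To make the idea of a shared feature space concrete, the sketch below ranks diseases by how close their vectors lie to a patient's vector. It is only an illustration: the two-dimensional vectors, the disease names, and the choice of cosine similarity are assumptions made for this example, not details of the Hauner model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_diseases(patient_vec, disease_vecs):
    """Return disease names ordered from most to least similar to the patient."""
    return sorted(
        disease_vecs,
        key=lambda name: cosine_similarity(patient_vec, disease_vecs[name]),
        reverse=True,
    )

# Hypothetical embeddings, chosen by hand for illustration.
patient_embedding = [0.9, 0.1]
disease_embeddings = {"disease_a": [1.0, 0.0], "disease_b": [0.0, 1.0]}

ranking = rank_diseases(patient_embedding, disease_embeddings)
```

Here the patient vector points almost in the same direction as `disease_a`, so that disease is ranked first.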

The underlying graph

At the heart of the project is a large network composed of patients, their biological features, and disease diagnoses. This network is based on real-world clinical data sourced through our collaboration with the Dr. von Hauner Children's Hospital.
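As a rough picture of what such a network looks like in code, the following sketch stores patients, features, and diseases as nodes and builds an adjacency map from typed edges. All node names, relation names, and the `build_adjacency` helper are hypothetical; the real graph schema is not described here.

```python
from collections import defaultdict

# Hypothetical toy edges: (source node, relation, target node).
edges = [
    ("patient:p1", "has_feature", "feature:f1"),
    ("patient:p1", "diagnosed_with", "disease:d1"),
    ("patient:p2", "has_feature", "feature:f1"),
    ("patient:p2", "has_feature", "feature:f2"),
]

def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighboring nodes."""
    adjacency = defaultdict(set)
    for source, _relation, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)

graph = build_adjacency(edges)
```

In this toy graph, `feature:f1` connects both patients, which is the kind of shared structure a GNN can exploit.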

Training process of the GNN

Message passing: Each patient node shares information with its connected neighbors, such as diseases and biological features. The model aggregates this information into a summary vector, providing a clearer view of each patient's context.
Learning patterns: The model then adjusts its internal parameters, like turning the dials on a mixing board, to better capture which patterns are most useful for detecting diseases. These dials help the model focus on the most relevant connections.
Making predictions: After training, the model predicts which patients are likely linked to which diseases, even if the disease has not yet been diagnosed in that patient. If the predicted connection is strong enough (above a defined threshold), it is kept; otherwise, it is discarded.
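The steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the project's implementation: the graph, the feature vectors, the mean aggregation, the sigmoid link score, and the threshold values are all made up for the example, and the weights (the "dials") are fixed by hand rather than learned.

```python
import math

# Toy graph, purely illustrative: p1/p2 are patients, f1 a feature, d1 a disease.
adjacency = {
    "p1": {"f1", "d1"},  # p1 has feature f1 and a known diagnosis d1
    "p2": {"f1"},        # p2 shares feature f1 but has no diagnosis yet
    "f1": {"p1", "p2"},
    "d1": {"p1"},
}
features = {"p1": [1.0, 0.0], "p2": [1.0, 0.0], "f1": [0.0, 1.0], "d1": [0.0, 2.0]}

def mean_aggregate(node):
    """Message passing, one round: average a node's vector with its neighbors' vectors."""
    vectors = [features[node]] + [features[n] for n in sorted(adjacency[node])]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

embeddings = {node: mean_aggregate(node) for node in adjacency}

# Stand-in for the learned parameters; training would adjust these values.
weights = [1.0, 1.0]

def link_score(patient, disease):
    """Weighted dot product of two embeddings, squashed into (0, 1) by a sigmoid."""
    z = sum(w * p * d for w, p, d in zip(weights, embeddings[patient], embeddings[disease]))
    return 1.0 / (1.0 + math.exp(-z))

def predict_links(patients, diseases, threshold):
    """Keep only patient-disease pairs whose score is above the threshold."""
    return [(p, d) for p in patients for d in diseases if link_score(p, d) > threshold]

links_loose = predict_links(["p1", "p2"], ["d1"], threshold=0.6)
links_strict = predict_links(["p1", "p2"], ["d1"], threshold=0.7)
```

In this toy setup, `p2` scores roughly 0.68 against `d1` and `p1` roughly 0.76, so a threshold of 0.6 keeps both links (including the undiagnosed patient), while 0.7 keeps only the link for `p1`. Where the threshold sits therefore controls how many candidate links are surfaced.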

Data privacy and synthetic data

Working with sensitive patient data requires strict privacy protections. The Hauner project therefore follows a carefully designed workflow that allows us to develop and test our models without directly accessing real patient data.

We use synthetic data that resembles real medical data in structure and complexity but does not belong to any real individual. Once the model performs well on synthetic data, the trained model is sent to the Dr. von Hauner Children's Hospital, where it is evaluated and fine-tuned on real clinical cases.
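As a minimal sketch of what "synthetic" means in practice, the generator below produces reproducible patient records that share a fixed structure but correspond to no real person. The field names, feature names, and distributions are hypothetical placeholders; a realistic generator would need to mirror the statistical structure of clinical data far more closely than this.

```python
import random

# Hypothetical schema, invented for this example.
FEATURE_NAMES = ["feature_a", "feature_b", "feature_c"]
DIAGNOSES = ["disease_x", "disease_y", None]  # None means "no diagnosis yet"

def make_synthetic_patients(count, seed=0):
    """Generate `count` fake patient records; the same seed gives the same records."""
    rng = random.Random(seed)
    patients = []
    for index in range(count):
        patients.append({
            "patient_id": f"synthetic-{index}",
            "features": {name: round(rng.gauss(0.0, 1.0), 3) for name in FEATURE_NAMES},
            "diagnosis": rng.choice(DIAGNOSES),
        })
    return patients

synthetic_patients = make_synthetic_patients(5, seed=42)
```

Seeding the random generator keeps experiments repeatable, which helps when comparing model versions on the same synthetic cohort.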

After evaluation and fine-tuning, the hospital returns only aggregated results, which we then use to further improve the model.
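One way to picture aggregated results is a summary that contains counts and ratios but no patient-level rows. The sketch below assumes precision and recall as the reported metrics; which metrics are actually returned is not specified here, so treat the function name and output fields as illustrative.

```python
def aggregate_metrics(predicted_links, actual_links):
    """Summarize link predictions as counts and ratios, with no patient identifiers."""
    true_positives = len(predicted_links & actual_links)
    false_positives = len(predicted_links - actual_links)
    false_negatives = len(actual_links - predicted_links)
    predicted_total = true_positives + false_positives
    actual_total = true_positives + false_negatives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": true_positives / predicted_total if predicted_total else 0.0,
        "recall": true_positives / actual_total if actual_total else 0.0,
    }

# Hypothetical (patient, disease) pairs; these stay on the hospital side.
predicted = {("patient_1", "disease_x"), ("patient_2", "disease_x")}
actual = {("patient_1", "disease_x"), ("patient_3", "disease_y")}

summary = aggregate_metrics(predicted, actual)
```

Only `summary` would leave the hospital in this picture; the sets of patient-level pairs never do.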