First you cluster by disease, then you sub-cluster by Genotype, then cross-cluster by symptom.
This sounds easy, because you hope to filter in only those patients who will get better when a drug blocks a MetabolicPathway at a specific point. But nature usually provides more than one pathway, and other drugs may change the pathways available. Redundancy enables adaptation, but interferes with intervention. The best block is one that works early in a cascade, before the potential pathways fan out.
So perfectly narrow categorization is an elusive and complex goal. For example, a study to learn how to cluster may involve 60,000 data points per subject and cover 1,000 families. Such studies are difficult to repeat, and their results are difficult to reproduce.
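The disease, genotype, symptom ordering described above can be pictured as nested grouping. The following is only a minimal sketch, assuming each patient record already carries clean labels; the field names (`id`, `disease`, `genotype`, `symptoms`), the function `cluster_patients`, and the sample records are all hypothetical and invented for illustration. A real study would have to derive these labels from the raw data points, which is the hard part.

```python
def cluster_patients(patients):
    """Group patient ids by disease, then genotype, then symptom.

    A patient with several symptoms appears under each of them,
    which is what makes the last level a cross-cluster.
    """
    clusters = {}
    for p in patients:
        by_genotype = clusters.setdefault(p["disease"], {})
        by_symptom = by_genotype.setdefault(p["genotype"], {})
        for symptom in p["symptoms"]:
            by_symptom.setdefault(symptom, []).append(p["id"])
    return clusters


# Hypothetical sample records, not taken from any study.
patients = [
    {"id": 1, "disease": "D1", "genotype": "G-A", "symptoms": ["fatigue", "rash"]},
    {"id": 2, "disease": "D1", "genotype": "G-A", "symptoms": ["fatigue"]},
    {"id": 3, "disease": "D1", "genotype": "G-B", "symptoms": ["rash"]},
    {"id": 4, "disease": "D2", "genotype": "G-A", "symptoms": ["fatigue"]},
]

clusters = cluster_patients(patients)
for disease, by_genotype in clusters.items():
    for genotype, by_symptom in by_genotype.items():
        for symptom, ids in by_symptom.items():
            print(disease, genotype, symptom, ids)
```

Even in this toy form the cells shrink quickly: four patients already spread across five cells, which hints at why a narrow categorization needs so many subjects.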
See: GailRiskModel.
See: StatisticalProcessControl.
See (Korean Alert) Clustering, ClusteringExample.