Abstract
We consider the problem of causal structure learning in the setting of heterogeneous populations,
i.e., populations in which a single causal structure does not adequately represent all population
members, as is common in biological and social sciences. To this end, we introduce a distance
covariance-based kernel designed specifically to measure the similarity between the underlying
nonlinear causal structures of different samples. Indeed, we prove that the corresponding feature
map is a statistically consistent estimator of nonlinear independence structure, rendering the kernel
itself a statistical test for the hypothesis that sets of samples come from different generating causal
structures. Even stronger, we prove that the kernel space is isometric to the space of causal ancestral
graphs, so that distance between samples in the kernel space is guaranteed to correspond to distance
between their generating causal structures. This kernel thus enables us to perform clustering to
identify the homogeneous subpopulations, for which we can then learn causal structures using
existing methods. Though we focus on the theoretical aspects of the kernel, we also evaluate its
performance on synthetic data and demonstrate its use on a real gene expression data set.
Keywords: graphical causal models; distance covariance; whole-graph embeddings; clustering.
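To make the idea in the abstract concrete, here is a minimal sketch of clustering by dependence structure. It is not the kernel defined in the paper, whose exact construction is not given in this record. As a stand-in, it assumes a hypothetical feature map (`dcor_feature_map`) that sends a set of samples to its matrix of pairwise sample distance correlations, and a toy kernel defined as the Frobenius inner product of two such matrices. All function names, the synthetic data, and the sample size are illustrative assumptions.

```python
import numpy as np


def centered_distances(x):
    """Double-centered pairwise distance matrix of a 1-D sample."""
    d = np.abs(x[:, None] - x[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcor_feature_map(sample):
    """Hypothetical feature map: matrix of pairwise sample distance correlations.

    `sample` has shape (n_observations, n_variables).
    """
    centered = [centered_distances(col) for col in sample.T]
    p = len(centered)
    dcov2 = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            # Squared sample distance covariance between variables i and j.
            dcov2[i, j] = (centered[i] * centered[j]).mean()
    dvar2 = np.diag(dcov2)  # squared sample distance variances
    dcor2 = dcov2 / np.sqrt(np.outer(dvar2, dvar2))
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.clip(dcor2, 0.0, None))


def toy_kernel(sample_a, sample_b):
    """Illustrative kernel: Frobenius inner product of the two feature maps."""
    return float(np.sum(dcor_feature_map(sample_a) * dcor_feature_map(sample_b)))


def make_sample(rng, n, dependent):
    """Synthetic data: y depends nonlinearly on x only if `dependent` is True."""
    x = rng.normal(size=n)
    noise = rng.normal(size=n)
    y = x**2 + 0.1 * noise if dependent else noise
    z = rng.normal(size=n)
    return np.column_stack([x, y, z])


rng = np.random.default_rng(0)
samples = [make_sample(rng, 200, dep) for dep in (True, True, False, False)]
features = [dcor_feature_map(s) for s in samples]

# Gram matrix of the toy kernel and the squared distances it induces.
gram = np.array([[np.sum(f * g) for g in features] for f in features])
diag = np.diag(gram)
dist2 = diag[:, None] + diag[None, :] - 2.0 * gram
print(np.round(dist2, 3))
```

In this toy setting, sample sets generated with the same dependence structure should end up closer together in the induced distance than sets generated with different structures, so the matrix `dist2` could be passed to any standard clustering routine. Distance correlation is used because it also detects nonlinear dependence, such as the quadratic relation between `x` and `y` here, which a linear correlation would miss.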
Original language | English |
---|---|
Pages (from - to) | 542-558 |
Journal | Proceedings of Machine Learning Research (PMLR) |
Volume | 177 |
Publication status | Published - 2022 |
ÖFOS 2012
- 102019 Machine Learning
Publication title: A Distance Covariance-based Kernel for Nonlinear Causal Clustering in Heterogeneous Populations
Activities
- 1 Participation in ...
- Conference on Causal Learning and Reasoning
  Alex Markham (Participant) & Richeek Das (Participant)
  11 Apr. 2022 → 13 Apr. 2022
  Activity: Scientific events › Participation in ...