The Paper Feed
A feed of Bayesian network-related papers, articles, books and research that we happen across and find of interest
An extension to the noisy-OR function to resolve the ‘explaining away’ deficiency for practical Bayesian network problems
The “leaky noisy-OR” is a common method used to simplify the elicitation of complex conditional probability tables in Bayesian networks involving Boolean variables. It has proven useful for approximating the required relationship in many real-world situations where two or more variables are potential causes of a single effect variable. However, one of the properties of the leaky noisy-OR is conditional inter-causal independence (CII). This property means that ‘explaining away’ behaviour, one of the most powerful benefits of BN inference, is not present when the effect variable is observed as false. Yet, for many real-world problems where the leaky noisy-OR has been considered, this behaviour would be expected, meaning that leaky noisy-OR is deficient as an approximation of the required relationship in such cases. There have been previous attempts to adapt noisy-OR to resolve this problem; however, they require too many additional parameters to be elicited. We describe a simple but powerful extension to leaky noisy-OR that requires only a single additional parameter. While it does not solve the CII problem in all cases, it resolves most of the explaining-away deficiencies that occur in practice. The problem and solution are illustrated using an example from intelligence analysis.
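The CII deficiency the abstract describes is easy to see numerically. The sketch below (with made-up parameter values, not from the paper) computes the posterior on one cause given the effect is observed false, with and without observing the other cause; under leaky noisy-OR the two posteriors coincide, i.e. no explaining away occurs.

```python
def leaky_noisy_or_false(leak, probs, states):
    """P(E=false | parent states) under a leaky noisy-OR.

    probs[i] is the probability that cause i alone produces the effect;
    leak is the probability the effect occurs with no active cause.
    """
    p = 1.0 - leak
    for pi, xi in zip(probs, states):
        if xi:
            p *= 1.0 - pi
    return p

def posterior_x1(prior_x1, leak, p1, p2, x2):
    """P(X1=true | E=false, X2=x2) by direct enumeration over X1."""
    num = prior_x1 * leaky_noisy_or_false(leak, [p1, p2], [1, x2])
    den = num + (1.0 - prior_x1) * leaky_noisy_or_false(leak, [p1, p2], [0, x2])
    return num / den

# Illustrative parameters: the posterior on X1 given E=false is identical
# whether the other candidate cause X2 is observed false or true.
a = posterior_x1(0.3, 0.05, 0.8, 0.6, x2=0)
b = posterior_x1(0.3, 0.05, 0.8, 0.6, x2=1)
```

The factor contributed by X2 cancels in the normalisation, which is exactly the conditional inter-causal independence that the paper's single extra parameter is designed to break.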
Bayesian networks with a logistic regression model for the conditional probabilities
Logistic regression techniques can be used to restrict the conditional probabilities of a Bayesian network for discrete variables. More specifically, each variable of the network can be modeled through a logistic regression model, in which the parents of the variable define the covariates. When all main effects and interactions between the parent variables are incorporated as covariates, the conditional probabilities are estimated without restrictions, as in a traditional Bayesian network. By incorporating interaction terms up to a specific order only, the number of parameters can be drastically reduced. Furthermore, ordered logistic regression can be used when the categories of a variable are ordered, resulting in even more parsimonious models. Parameters are estimated by a modified junction tree algorithm. The approach is illustrated with the Alarm network.
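The parameter saving is visible even in the smallest case. This hedged sketch (hypothetical coefficients, not from the paper) parameterises P(C=1 | A, B) for two binary parents with main effects only: three parameters instead of four free CPT entries, at the cost of forcing additivity on the log-odds scale.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical main-effects-only model for P(C=1 | A, B):
# three parameters rather than four unrestricted CPT entries.
beta0, beta_a, beta_b = -1.0, 2.0, 0.5

cpt = {(a, b): sigmoid(beta0 + beta_a * a + beta_b * b)
       for a in (0, 1) for b in (0, 1)}

# Omitting the A*B interaction term means A shifts the log-odds of C
# by the same amount regardless of B's state:
effect_of_a_when_b0 = logit(cpt[(1, 0)]) - logit(cpt[(0, 0)])
effect_of_a_when_b1 = logit(cpt[(1, 1)]) - logit(cpt[(0, 1)])
```

Adding the interaction term back recovers the unrestricted CPT, which is the continuum between parsimony and flexibility the abstract describes.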
Bayesian networks for static and temporal data fusion
Prediction and inference on temporal data is very frequently performed using time series data alone. We believe that these tasks could benefit from leveraging the contextual metadata associated with time series, such as location, type, etc. Conversely, tasks involving prediction and inference on metadata could benefit from information held within time series. However, there exists no standard way of jointly modeling both time series data and descriptive metadata. Moreover, metadata frequently contains highly correlated or redundant information, and may contain errors and missing values. We first consider the problem of learning the inherent probabilistic graphical structure of metadata as a Bayesian network. This has two main benefits: (i) once structured as a graphical model, metadata is easier to use in order to improve tasks on temporal data and (ii) the learned model enables inference tasks on metadata alone, such as missing data imputation. However, Bayesian network structure learning is a tremendous mathematical challenge involving an NP-hard optimization problem. We present a tailor-made structure learning algorithm, inspired by novel theoretical results, that exploits the (quasi-)deterministic dependencies that are typically present in descriptive metadata. This algorithm is tested on numerous benchmark datasets and some industrial metadata sets containing deterministic relationships. In both cases it proved to be significantly faster than the state of the art, and even found better-performing structures on industrial data. Moreover, the learned Bayesian networks are consistently sparser and therefore more readable. We then focus on designing a model that includes both static (meta)data and dynamic data.
Taking inspiration from state-of-the-art probabilistic graphical models for temporal data (dynamic Bayesian networks) and from our previously described approach to metadata modeling, we present a general methodology to jointly model metadata and temporal data as a hybrid static-dynamic Bayesian network. We propose two main algorithms associated with this representation: (i) a learning algorithm which, while optimized for industrial data, still generalizes to any task of static and dynamic data fusion, and (ii) an inference algorithm enabling both the usual tasks on temporal or static data alone and tasks using the two types of data together. Finally, we discuss some of the notions introduced during the thesis, including ways to measure the generalization performance of a Bayesian network by a score inspired by the cross-validation procedure of supervised machine learning. We also propose various extensions to the algorithms and theoretical results presented in the previous chapters, and formulate some research perspectives.
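One simple way to surface the (quasi-)deterministic dependencies such a structure learner could exploit, offered here as an illustration rather than as the thesis's actual algorithm, is to measure the conditional entropy of one metadata column given another: a value of (near) zero means the first column (almost) determines the second.

```python
import math
from collections import Counter

def conditional_entropy(xs, ys):
    """H(Y | X) in bits from paired samples; 0 means X determines Y."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marg_x = Counter(xs)
    h = 0.0
    for (x, _), count in joint.items():
        p_xy = count / n                 # joint probability P(x, y)
        p_y_given_x = count / marg_x[x]  # conditional P(y | x)
        h -= p_xy * math.log2(p_y_given_x)
    return h

# Toy metadata: "city" deterministically fixes "country", but not "sensor".
city    = ["Paris", "Paris", "Lyon", "Berlin", "Berlin", "Lyon"]
country = ["FR",    "FR",    "FR",   "DE",     "DE",     "FR"]
sensor  = ["a",     "b",     "a",    "a",      "b",      "b"]

h_country = conditional_entropy(city, country)  # deterministic: entropy 0
h_sensor  = conditional_entropy(city, sensor)   # not deterministic: entropy > 0
```

Thresholding this quantity slightly above zero gives the "quasi-deterministic" relaxation that noisy real-world metadata requires.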
Hidden Node Detection between Observable Nodes Based on Bayesian Clustering
Structure learning is one of the main concerns in studies of Bayesian networks. In the present paper, we consider networks consisting of both observable and hidden nodes, and propose a method to investigate the existence of a hidden node between observable nodes, where all nodes are discrete. This corresponds to the model selection problem between the networks with and without the middle hidden node. When the network includes a hidden node, it is known that there are singularities in the parameter space and the Fisher information matrix is not positive definite. Consequently, many of the conventional criteria for structure learning based on the Laplace approximation do not work. The proposed method is based on Bayesian clustering, and its asymptotic properties justify the result: redundant labels are eliminated and the simplest structure is detected even when singularities are present.
Parsimonious graphical dependence models constructed from vines
Multivariate models with parsimonious dependence have been used for large numbers of variables, and have mainly been developed for the multivariate Gaussian distribution. Graphical dependence model representations include Bayesian networks, conditional independence graphs, and truncated vines. The class of Gaussian truncated vines is a subset of Gaussian Bayesian networks and Gaussian conditional independence graphs, but has an extension to non-Gaussian dependence with (i) combinations of continuous and discrete random variables with arbitrary univariate margins, and (ii) accommodation of latent variables. To illustrate the importance of graphical models with latent variables that do not rely on the Gaussian assumption, the combined factor-vine structure is presented and applied to a data set of stock returns.
Visual Analysis of Bayesian Networks for Electronic Health Records
Worldwide the amount of data generated by the medical community is staggering, and increasing dramatically. Using this data to improve patient care using analytics and machine learning is a huge and largely untapped opportunity. The most important medical data captured exist in patients' electronic health records (EHRs), which are maintained and utilized by health care providers. EHRs consist of rich and comprehensive patient-specific information from a large number of sources in different formats with heterogeneous data types. There are numerous challenges in attempting to apply existing analytic tools and methodologies to this data. Many features extracted from EHRs have dependent relationships - for example, “flu” and “high body temperature”. Bayesian networks, as one of the few modeling methodologies which capture feature dependence rather than assuming independence, provide a flexible foundation for modeling EHRs. However, existing Bayesian network learning methodologies produce models whose complexity makes them difficult for clinicians to utilize or even interpret. Therefore, better model visualization methodologies, as well as learning methods which produce models more amenable to simplification and summarization, are critical to making them interpretable and useful to clinicians, and therefore to improving patient care. In this dissertation, I present a framework for predictive analysis of patient clinical data, from feature extraction to model analysis. I first study straightforward machine learning approaches on extracted EHR features and find that incorporating diagnosis features improves area under the ROC curve (AUC) by 10% compared to a baseline. Because of the many dependencies between features extracted from EHRs, I next investigate Bayesian network models, in which my clinician collaborators have identified known and suspected risk factors for pressure ulcers.
The models also substantially increase the sensitivity of the prediction - nearly three times higher compared to logistic regression models - without sacrificing overall accuracy. However, interpreting these models involves a significant cognitive burden, motivating my investigation of visual analytic techniques. To this end, I develop an interactive tool for visualizing Bayesian networks to improve clinicians' insight and interpretation of models. I perform a user study to assess the impact of the tool and its features. The results show quantitatively that users complete tasks more efficiently when using the tool, and qualitatively that they found it useful. Bayesian networks containing natural groupings or “clusters” are better suited to visualization and summarization. Since existing Bayesian network learning methods do not naturally yield such groupings, I alter the Bayesian network learning process to learn structures which optimize not just for representing dependency relationships, but additionally and simultaneously, for clusterability measures. My results show that the augmented Bayesian network process can find structures with much larger clusterability measures, with only a small decrease in their standard scoring measure. Visualizations of learned clustered Bayesian networks show that the algorithm cohesively groups related features, making the networks easier to interpret.
Using Bayesian networks to guide the assessment of new evidence in an appeal case
When new forensic evidence becomes available after a conviction, there is no systematic framework to help lawyers determine whether it raises sufficient questions about the verdict to launch an appeal. This paper presents such a framework, driven by a recent case in which a defendant was convicted primarily on the basis of audio evidence, but where subsequent analysis of the evidence revealed additional sounds that were not considered during the trial. The framework is intended to overcome the gap between what is generally known from scientific analyses and what is hypothesized in a legal setting. It is based on Bayesian networks (BNs), which have the potential to be a structured and understandable way to evaluate the evidence in a specific case context. However, BN methods suffered a setback with regard to their use in court due to the confusing way they were used in some legal cases in the past. To address this concern, we show the extent to which the reasoning and decisions within the particular case can be made explicit and transparent. The BN approach enables us to clearly define the relevant propositions and evidence, and uses sensitivity analysis to assess the impact of the evidence under different assumptions. The results show that such a framework is suitable for identifying information that is currently missing, yet clearly crucial for a valid and complete reasoning process. Furthermore, a method is provided whereby BNs can serve as a guide not only to reason with incomplete evidence in forensic cases, but also to identify very specific research questions that should be addressed to extend the evidence base and solve similar issues in the future.
Evaluating the Weighted Sum Algorithm for Estimating Conditional Probabilities in Bayesian Networks
The primary challenge in constructing a Bayesian network (BN) is acquiring its conditional probability tables (CPTs). CPTs can be elicited from domain experts; however, they scale exponentially in size, making their elicitation very time consuming and costly. Das proposed a solution to this problem using the weighted sum algorithm (WSA). In this paper we present two empirical studies that evaluate the WSA's efficiency and accuracy; we also describe an extension to the algorithm that addresses one of its shortcomings. Our results show that the estimates obtained using the WSA were highly accurate and that it delivers significant reductions in elicitation effort.
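The combination step at the heart of the WSA can be sketched as follows. This is our hedged reading of Das's idea, with illustrative names and numbers that are not from the paper: the expert elicits one distribution over the child per "compatible" parental configuration, plus a relative weight per parent, and every remaining CPT row is estimated as the weighted sum of the relevant elicited distributions.

```python
def weighted_sum_estimate(parent_states, elicited, weights):
    """Estimate P(Y | x1..xn) as sum_i weights[i] * elicited[i][x_i].

    elicited[i][x] is the expert's distribution over Y for the compatible
    parental configuration in which parent i takes state x; weights are
    the parents' relative weights and sum to 1.
    """
    k = len(elicited[0][parent_states[0]])
    est = [0.0] * k
    for i, x in enumerate(parent_states):
        for j in range(k):
            est[j] += weights[i] * elicited[i][x][j]
    return est

# Hypothetical elicitation for a binary child Y with two binary parents.
elicited = [
    {0: [0.9, 0.1], 1: [0.3, 0.7]},   # distributions tied to parent 1
    {0: [0.8, 0.2], 1: [0.4, 0.6]},   # distributions tied to parent 2
]
weights = [0.7, 0.3]

row = weighted_sum_estimate((1, 0), elicited, weights)
```

Because the estimate is a convex combination of probability distributions, each reconstructed row is itself a valid distribution; the expert elicits a number of distributions linear, rather than exponential, in the number of parents.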
Improving the analysis of dependable systems by mapping fault trees into Bayesian networks
Bayesian networks (BNs) provide a robust probabilistic method of reasoning under uncertainty. They have been successfully applied in a variety of real-world tasks but have received little attention in the area of dependability. The present paper is aimed at exploring the capabilities of the BN formalism in the analysis of dependable systems. To this end, the paper compares BNs with one of the most popular techniques for dependability analysis of large, safety-critical systems, namely Fault Trees (FTs). The paper shows that any FT can be directly mapped into a BN and that basic inference techniques on the latter may be used to obtain classical parameters computed from the former (e.g. the reliability of the top event or of any sub-system, the criticality of components, etc.). Moreover, by using BNs, some additional power can be obtained, both at the modeling and at the analysis level. At the modeling level, several restrictive assumptions implicit in the FT methodology can be removed and various kinds of dependencies among components can be accommodated. At the analysis level, a general diagnostic analysis can be performed. The comparison of the two methodologies is carried out by means of a running example, taken from the literature, that consists of a redundant multiprocessor system.
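The FT-to-BN mapping rests on a simple observation: each AND/OR gate becomes a node with a deterministic CPT, and each basic event becomes a root node carrying its failure probability. The sketch below (a toy fault tree with made-up failure probabilities, not the paper's multiprocessor example) recovers the top-event unreliability by enumeration over the mapped BN's roots.

```python
from itertools import product

# Basic-event failure probabilities (illustrative values).
p = {"A": 0.10, "B": 0.20, "C": 0.05}

# Gates become deterministic CPTs in the mapped BN:
def and_gate(*xs): return int(all(xs))
def or_gate(*xs):  return int(any(xs))

def top_event_unreliability():
    """P(TE=1) for the fault tree TE = (A AND B) OR C, computed by
    summing over all joint states of the BN's root nodes."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = 1.0
        for name, val in (("A", a), ("B", b), ("C", c)):
            weight *= p[name] if val else 1.0 - p[name]
        total += weight * or_gate(and_gate(a, b), c)
    return total
```

The same network also supports the diagnostic direction the paper highlights: conditioning on TE=1 and asking for the posterior of each basic event, something a fault tree alone cannot do.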
An explication of uncertain evidence in Bayesian networks: likelihood evidence and probabilistic evidence
This paper proposes a systematized presentation and a terminology for observations in a Bayesian network. It focuses on the three main concepts of uncertain evidence, namely likelihood evidence and fixed and not-fixed probabilistic evidence, using a review of previous literature. A probabilistic finding on a variable is specified by a local probability distribution and replaces any former belief in that variable. It is said to be fixed or not fixed depending on whether it must be kept unchanged after observations arrive on other variables. Fixed probabilistic evidence is defined by Valtorta et al. (Int J Approx Reason 29(1):71–106, 2002) under the name soft evidence, whereas the concept of not-fixed probabilistic evidence has been discussed by Chan and Darwiche (Artif Intell 163(1):67–90, 2005). Both concepts have to be clearly distinguished from likelihood evidence defined by Pearl (1988), also called virtual evidence, for which evidence is specified as a likelihood ratio that often represents the unreliability of the evidence. Since these three concepts of uncertain evidence are not widely understood, and the terms used to describe them are not well established, most Bayesian network engines do not offer well-defined propagation functions to handle them. First, we present a review of uncertain evidence and the proposed terminology, definitions and concepts related to the use of uncertain evidence in Bayesian networks. Then we describe updating algorithms for the propagation of uncertain evidence. Finally, we propose several results where the use of fixed or not-fixed probabilistic evidence is required.
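The distinction the abstract draws can be made concrete on a two-node network X → Y (illustrative numbers, not from the paper). Likelihood (virtual) evidence supplies only a likelihood ratio, so the prior on X is multiplied by it and renormalized; probabilistic evidence specifies the resulting marginal on X directly. Either new marginal is then propagated to Y by Jeffrey's rule.

```python
def normalize(dist):
    s = sum(dist)
    return [v / s for v in dist]

# A tiny network X -> Y with binary variables (illustrative numbers).
prior_x = [0.7, 0.3]
p_y_given_x = [[0.9, 0.1],   # X = 0
               [0.2, 0.8]]   # X = 1

# Likelihood (virtual) evidence: a likelihood ratio is given, so the
# prior is reweighted and renormalized (Pearl's virtual evidence).
likelihood = [0.5, 1.0]      # the observation is twice as likely if X = 1
post_x = normalize([prior_x[i] * likelihood[i] for i in range(2)])

# Probabilistic evidence: the new marginal on X is specified directly.
q_x = [0.4, 0.6]

# Jeffrey's rule propagates any new marginal on X to Y:
def jeffrey_y(q):
    return [sum(q[i] * p_y_given_x[i][j] for i in range(2)) for j in range(2)]

y_after_virtual = jeffrey_y(post_x)
y_after_soft = jeffrey_y(q_x)
```

The fixed/not-fixed distinction then concerns what happens next: fixed probabilistic evidence pins q_x against any later observations, while not-fixed probabilistic evidence lets subsequent findings revise it.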