The Paper Feed
A feed of Bayesian network related papers, articles, books and research that we happen across and find of interest
Parsimonious graphical dependence models constructed from vines
Multivariate models with parsimonious dependence have been used for a large number of variables, and have mainly been developed for multivariate Gaussian. Graphical dependence model representations include Bayesian networks, conditional independence graphs, and truncated vines. The class of Gaussian truncated vines is a subset of Gaussian Bayesian networks and Gaussian conditional independence graphs, but has an extension to non‐Gaussian dependence with (i) combinations of continuous and discrete random variables with arbitrary univariate margins, and (ii) accommodation of latent variables. To illustrate the importance of graphical models with latent variables that do not rely on the Gaussian assumption, the combined factor‐vine structure is presented and applied to a data set of stock returns.
A Regional Application of Bayesian Modeling for Coastal Erosion and Sand Nourishment Management
This paper presents an application of the Bayesian belief network for coastal erosion management at the regional scale. A “Bayesian ERosion Management Network” (BERM-N) is developed and trained based on yearly cross-shore profile data available along the Holland coast. Profiles collected for over 50 years and at 604 locations were combined with information on different sand nourishment types (i.e., beach, dune, and shoreface) and volumes implemented during the analyzed time period. The network was used to assess the effectiveness of nourishments in mitigating coastal erosion. The effectiveness of nourishments was verified using two coastal state indicators, namely the momentary coastline position and the dune foot position. The network shows how the current nourishment policy is effective in mitigating the past erosive trends. While the effect of beach nourishment was immediately visible after implementation, the effect of shoreface nourishment reached its maximum only 5–10 years after implementation of the nourishments. The network can also be used as a predictive tool to estimate the required nourishment volume in order to achieve a predefined coastal erosion management objective. The network is interactive and flexible and can be trained with any data type derived from measurements as well as numerical models.
Visual Analysis of Bayesian Networks for Electronic Health Records
Worldwide the amount of data generated by the medical community is staggering, and increasing dramatically. Using this data to improve patient care using analytics and machine learning is a huge and largely untapped opportunity. The most important medical data captured exist in patients' electronic health records (EHRs) which are maintained and utilized by health care providers. EHRs consist of rich and comprehensive patient-specific information from a large number of sources in different formats with heterogeneous data types. There are numerous challenges in attempting to apply existing analytic tools and methodologies to this data. Many features extracted from EHRs have dependent relationships - for example, “flu” and “high body temperature”. Bayesian networks, as one of the few modeling methodologies which capture feature dependence rather than assuming independence, provide a flexible foundation for modeling EHRs. However, existing Bayesian network learning methodologies produce models whose complexity makes them difficult for clinicians to utilize or even interpret. Therefore, better model visualization methodologies, as well as learning methods which produce models more amenable to simplification and summarization, are critical to making them interpretable and useful to clinicians, and therefore to improving patient care. In this dissertation, I present a framework for predictive analysis of patient clinical data, from feature extraction to model analysis. I first study straightforward machine learning approaches on extracted EHR features and find that incorporating diagnosis features improves area under ROC curve (AUC) by 10% compared to a baseline. Because of the many dependencies between features extracted from EHRs, I next investigate Bayesian network models, in which my clinician collaborators have identified known and suspected high pressure ulcer risk factors. 
The models also substantially increase sensitivity of the prediction - nearly three times higher than logistic regression models - without sacrificing overall accuracy. However, interpreting these models involves a significant cognitive burden, motivating my investigation of visual analytic techniques. To this end, I develop an interactive tool for visualizing Bayesian networks to improve clinicians’ insight and interpretation of models. I perform a user study to assess the impact of the tool and its features. The results show quantitatively that users complete tasks more efficiently when using the tool, and qualitatively that they found it useful. Bayesian networks containing natural groupings or “clusters” are better suited to visualization and summarization. Since existing Bayesian network learning methods do not naturally yield such groupings, I alter the Bayesian network learning process to learn structures which optimize not just for representing dependency relationships, but additionally and simultaneously, for clusterability measures. My results show that the augmented Bayesian network process can find structures with much larger clusterability measures, with only a small decrease in their standard scoring measure. Visualizations of learned clustered Bayesian networks show that the algorithm cohesively groups related features, making the networks easier to interpret.
Using Bayesian networks to guide the assessment of new evidence in an appeal case
When new forensic evidence becomes available after a conviction there is no systematic framework to help lawyers to determine whether it raises sufficient questions about the verdict in order to launch an appeal. This paper presents such a framework driven by a recent case, in which a defendant was convicted primarily on the basis of audio evidence, but where subsequent analysis of the evidence revealed additional sounds that were not considered during the trial. The framework is intended to overcome the gap between what is generally known from scientific analyses and what is hypothesized in a legal setting. It is based on Bayesian networks (BNs) which have the potential to be a structured and understandable way to evaluate the evidence in a specific case context. However, BN methods suffered a setback with regard to their use in court due to the confusing way they have been used in some legal cases in the past. To address this concern, we show the extent to which the reasoning and decisions within the particular case can be made explicit and transparent. The BN approach enables us to clearly define the relevant propositions and evidence, and uses sensitivity analysis to assess the impact of the evidence under different assumptions. The results show that such a framework is suitable to identify information that is currently missing, yet clearly crucial for a valid and complete reasoning process. Furthermore, a method is provided whereby BNs can serve as a guide to not only reason with incomplete evidence in forensic cases, but also identify very specific research questions that should be addressed to extend the evidence base and solve similar issues in the future.
Sensitivity Analysis in a Bayesian Network for Modeling an Agent
Agent-based social simulation (ABSS) has become a popular method for simulating and visualizing a phenomenon while making it possible to decipher the system’s dynamism. When a large amount of data is used for an agent’s behavior, such as a questionnaire survey, a Bayesian network is often the preferred method for modeling an agent. Based on the data, a Bayesian network is used in ABSS. However, it is very difficult to learn the accurate structure of a Bayesian network from the raw data because there exist many variables and the search space is too wide. This study proposes a new method for obtaining an appropriate structure for a Bayesian network by using sensitivity analysis in a stepwise fashion. This method enables us to find a feature subset that explains the objective variables well without reducing accuracy. A simple Bayesian network structure that maintains accuracy while indicating an agent’s behavior provides ABSS users with an intuitive understanding of the behavioral principle of an agent. To illustrate the effectiveness of the proposed method, data from a questionnaire survey about healthcare electronics was used.
Evaluating the Weighted Sum Algorithm for Estimating Conditional Probabilities in Bayesian Networks
The primary challenge in constructing a Bayesian Network (BN) is acquiring its Conditional Probability Tables (CPTs). CPTs can be elicited from domain experts; however, they scale exponentially in size, thus making their elicitation very time consuming and costly. Das proposed a solution to this problem using the weighted sum algorithm (WSA). In this paper we present two empirical studies that evaluate the WSA's efficiency and accuracy. We also describe an extension to the algorithm that deals with one of its shortcomings. Our results show that the estimates obtained using the WSA were highly accurate and yield significant reductions in elicitation effort.
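To make the scaling argument in the abstract above concrete, here is a minimal sketch of a weighted-sum style CPT fill. It is a simplified illustration, not the algorithm as specified by Das or as evaluated in the paper: the function names, the weights, and the elicited distributions are all hypothetical, and each per-parent distribution is treated as independently elicited.

```python
from itertools import product
from math import prod

def elicitation_counts(parent_states):
    """Return (full_cpt, weighted_sum) counts of child distributions to elicit.

    parent_states[j] is the number of states of parent j. The weighted-sum
    count excludes the one relative weight per parent that is also elicited.
    """
    full = prod(parent_states)     # one distribution per parent configuration
    reduced = sum(parent_states)   # one distribution per (parent, state) pair
    return full, reduced

def weighted_sum_cpt(weights, compat):
    """Fill a whole CPT from per-parent distributions and relative weights.

    weights[j]   : relative weight of parent j (assumed to sum to 1)
    compat[j][s] : child distribution elicited with parent j in state s
                   (hypothetical values, simplified from the WSA idea)
    """
    n_child = len(compat[0][0])
    cpt = {}
    for config in product(*(range(len(c)) for c in compat)):
        # Each CPT row is a convex combination of the elicited distributions.
        cpt[config] = [
            sum(w * compat[j][s][k]
                for j, (w, s) in enumerate(zip(weights, config)))
            for k in range(n_child)
        ]
    return cpt

# Hypothetical example: two binary parents, one binary child.
weights = [0.6, 0.4]
compat = [
    [[0.9, 0.1], [0.2, 0.8]],  # parent 0 in state 0, then state 1
    [[0.8, 0.2], [0.3, 0.7]],  # parent 1 in state 0, then state 1
]
cpt = weighted_sum_cpt(weights, compat)
```

With five three-state parents, `elicitation_counts([3] * 5)` contrasts 243 distributions for the full table with 15 for the weighted-sum approach, which is the kind of reduction the abstract refers to.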
A Bayesian network based learning system for modelling faults in large-scale manufacturing
Manufacturing companies can benefit from the early prediction and detection of failures to improve their product yield and reduce system faults through advanced data analytics. Whilst an abundance of data on their processing systems exists, they face difficulties in using it to gain insights to improve their systems. Bayesian networks (BNs) are considered here for diagnosing and predicting faults in a large manufacturing dataset from Bosch. Whilst BN structure learning has been performed traditionally on smaller sized data, this work demonstrates the ability to learn an appropriate BN structure for a large dataset with little information on the variables, for the first time. This paper also demonstrates a new framework for creating an appropriate probabilistic model for the Bosch dataset through the selection of statistically important variables on the response; this is then used to create a BN which can be used to answer probabilistic queries and classify products based on changes in the sensor values in the production process.
Partial Least Squares Discriminant Analysis and Bayesian Networks for Metabolomic Prediction of Childhood Asthma
To explore novel methods for the analysis of metabolomics data, we compared the ability of Partial Least Squares Discriminant Analysis (PLS-DA) and Bayesian networks (BN) to build predictive plasma metabolite models of age three asthma status in 411 three year olds (n = 59 cases and 352 controls) from the Vitamin D Antenatal Asthma Reduction Trial (VDAART) study. The standard PLS-DA approach had impressive accuracy for the prediction of age three asthma with an Area Under the Curve Convex Hull (AUCCH) of 81%. However, a permutation test indicated the possibility of overfitting. In contrast, a predictive Bayesian network including 42 metabolites had a significantly higher AUCCH of 92.1% (p for difference < 0.001), with no evidence that this accuracy was due to overfitting. Both models provided biologically informative insights into asthma; in particular, a role for dysregulated arginine metabolism and several exogenous metabolites that deserve further investigation as potential causative agents. As the BN model outperformed the PLS-DA model in both accuracy and decreased risk of overfitting, it may therefore represent a viable alternative to typical analytical approaches for the investigation of metabolomics data.
Risk Assessment of Underground Subway Stations to Fire Disasters Using Bayesian Network
Subway station fires often have serious consequences because of the high density of people and limited number of exits in a relatively enclosed space. In this study, a comprehensive model based on Bayesian network (BN) and the Delphi method is established for the rapid and dynamic assessment of the fire evolution process, and consequences, in underground subway stations. Based on the case studies of typical subway station fire accidents, 28 BN nodes are proposed to represent the evolution process of subway station fires, from causes to consequences. Based on expert knowledge and consistency processing by the Delphi method, the conditional probabilities of child BN nodes are determined. The BN model can quantitatively evaluate the factors influencing fire causes, fire proof/intervention measures, and fire consequences. The results show that the framework, combined with Bayesian network and the Delphi method, is a reliable tool for dynamic assessment of subway station fires. This study could offer insights to a more realistic analysis for emergency decision-making on fire disaster reduction, since the proposed approach could take into account the conditional dependency in the fire propagation process and incorporate fire proof/intervention measures, which is helpful for resilience and sustainability promotion of underground facilities.
Improving the analysis of dependable systems by mapping fault trees into Bayesian networks
Bayesian Networks (BN) provide a robust probabilistic method of reasoning under uncertainty. They have been successfully applied in a variety of real-world tasks but they have received little attention in the area of dependability. The present paper is aimed at exploring the capabilities of the BN formalism in the analysis of dependable systems. To this end, the paper compares BN with one of the most popular techniques for dependability analysis of large, safety critical systems, namely Fault Trees (FT). The paper shows that any FT can be directly mapped into a BN and that basic inference techniques on the latter may be used to obtain classical parameters computed from the former (i.e. reliability of the Top Event or of any sub-system, criticality of components, etc). Moreover, by using BN, some additional power can be obtained, both at the modeling and at the analysis level. At the modeling level, several restrictive assumptions implicit in the FT methodology can be removed and various kinds of dependencies among components can be accommodated. At the analysis level, a general diagnostic analysis can be performed. The comparison of the two methodologies is carried out by means of a running example, taken from the literature, that consists of a redundant multiprocessor system.
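As a rough illustration of the mapping described in the abstract above, the sketch below encodes a two-input fault-tree gate as a deterministic conditional probability table and queries it by enumeration. The component names and failure probabilities are invented for illustration and are not taken from the paper's multiprocessor example.

```python
from itertools import product

# Hypothetical failure probabilities of two basic events (not from the paper).
p_fail = {"A": 0.1, "B": 0.2}

def gate_cpt(kind):
    """Deterministic CPT of a two-input gate: P(top=1 | a, b) is 0 or 1."""
    if kind == "AND":
        return {(a, b): float(a and b) for a, b in product((0, 1), repeat=2)}
    if kind == "OR":
        return {(a, b): float(a or b) for a, b in product((0, 1), repeat=2)}
    raise ValueError(f"unknown gate: {kind}")

def joint(a, b, top, cpt):
    """Joint probability P(A=a, B=b, Top=top) from the BN factorization."""
    pa = p_fail["A"] if a else 1 - p_fail["A"]
    pb = p_fail["B"] if b else 1 - p_fail["B"]
    pt = cpt[(a, b)] if top else 1 - cpt[(a, b)]
    return pa * pb * pt

def top_event_probability(kind):
    """Forward query: probability that the top event occurs."""
    cpt = gate_cpt(kind)
    return sum(joint(a, b, 1, cpt) for a, b in product((0, 1), repeat=2))

def posterior_component(kind, component):
    """Diagnostic query: P(component failed | top event occurred)."""
    cpt = gate_cpt(kind)
    idx = 0 if component == "A" else 1
    numerator = sum(
        joint(a, b, 1, cpt)
        for a, b in product((0, 1), repeat=2)
        if (a, b)[idx] == 1
    )
    return numerator / top_event_probability(kind)
```

The forward query reproduces what a fault tree would compute, while `posterior_component` is an example of the diagnostic analysis that the BN form makes available. Replacing the 0/1 entries in the CPT with intermediate values is one way to relax the deterministic-gate assumption.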
Modeling interrelationships between health behaviors in overweight breast cancer survivors: Applying Bayesian networks
Obesity and its impact on health is a multifaceted phenomenon encompassing many factors, including demographics, environment, lifestyle, and psychosocial functioning. A systems science approach, investigating these many influences, is needed to capture the complexity and multidimensionality of obesity prevention to improve health. Leveraging baseline data from a unique clinical cohort comprising 333 postmenopausal overweight or obese breast cancer survivors participating in a weight-loss trial, we applied Bayesian networks, a machine learning approach, to infer interrelationships between lifestyle factors (e.g., sleep, physical activity), body mass index (BMI), and health outcomes (biomarkers and self-reported quality of life metrics). We used bootstrap resampling to assess network stability and accuracy, and Bayesian information criteria (BIC) to compare networks. Our results identified important behavioral subnetworks. BMI was the primary pathway linking behavioral factors to glucose regulation and inflammatory markers; the BMI-biomarker link was reproduced in 100% of resampled networks. Sleep quality was a hub impacting mental quality of life and physical health with > 95% resampling reproducibility. Omission of the BMI or sleep links significantly degraded the fit of the networks. Our findings suggest potential mechanistic pathways and useful intervention targets for future trials. Using our models, we can make quantitative predictions about health impacts that would result from targeted, weight loss and/or sleep improvement interventions. Importantly, this work highlights the utility of Bayesian networks in health behaviors research.
An explication of uncertain evidence in Bayesian networks: likelihood evidence and probabilistic evidence
This paper proposes a systematized presentation and a terminology for observations in a Bayesian network. It focuses on the three main concepts of uncertain evidence, namely likelihood evidence and fixed and not-fixed probabilistic evidence, using a review of previous literature. A probabilistic finding on a variable is specified by a local probability distribution and replaces any former belief in that variable. It is said to be fixed or not fixed regarding whether it has to be kept unchanged or not after the arrival of observation on other variables. Fixed probabilistic evidence is defined by Valtorta et al. (J Approx Reason 29(1):71–106 2002) under the name soft evidence, whereas the concept of not-fixed probabilistic evidence has been discussed by Chan and Darwiche (Artif Intell 163(1):67–90 2005). Both concepts have to be clearly distinguished from likelihood evidence defined by Pearl (1988), also called virtual evidence, for which evidence is specified as a likelihood ratio, that often represents the unreliability of the evidence. Since these three concepts of uncertain evidence are not widely understood, and the terms used to describe these concepts are not well established, most Bayesian networks engines do not offer well defined propagation functions to handle them. Firstly, we present a review of uncertain evidence and the proposed terminology, definitions and concepts related to the use of uncertain evidence in Bayesian networks. Then we describe updating algorithms for the propagation of uncertain evidence. Finally, we propose several results where the use of fixed or not-fixed probabilistic evidence is required.
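The difference between likelihood evidence and probabilistic evidence can be seen on a two-node network. The sketch below is a minimal illustration under assumed numbers: a hypothetical cause D with a hypothetical indicator T, with evidence arriving on T. Likelihood evidence multiplies the current belief by a likelihood vector, whereas probabilistic evidence replaces the belief in T with a given distribution, implemented here with Jeffrey's rule. The sketch does not model the fixed versus not-fixed distinction, which only matters once further observations arrive.

```python
STATES = (1, 0)

# Hypothetical two-node network D -> T (values chosen for illustration only).
prior_d = {1: 0.2, 0: 0.8}
p_t_given_d = {1: {1: 0.9, 0: 0.1},   # p_t_given_d[d][t]
               0: {1: 0.1, 0: 0.9}}

def joint(d, t):
    """P(D=d, T=t) from the network factorization."""
    return prior_d[d] * p_t_given_d[d][t]

def marginal_t():
    """Current belief P(T) before any evidence."""
    return {t: sum(joint(d, t) for d in STATES) for t in STATES}

def likelihood_evidence(lik):
    """Posterior over D given a likelihood vector lik[t] on T.

    lik[t] is proportional to P(observation | T=t); only ratios matter.
    """
    w = {d: sum(joint(d, t) * lik[t] for t in STATES) for d in STATES}
    z = sum(w.values())
    return {d: w[d] / z for d in STATES}

def probabilistic_evidence(r):
    """Posterior over D after the belief in T is replaced by r (Jeffrey's rule).

    P'(d) = sum_t P(d | t) * r[t]
    """
    pt = marginal_t()
    return {d: sum(joint(d, t) / pt[t] * r[t] for t in STATES) for d in STATES}

same_numbers = {1: 0.8, 0: 0.2}
as_likelihood = likelihood_evidence(same_numbers)
as_distribution = probabilistic_evidence(same_numbers)
```

Feeding the same pair of numbers to both functions gives different posteriors for D, because the likelihood vector is combined with the prior belief in T while the probabilistic finding overrides it.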
Impact of drivers of change, including climatic factors, on the occurrence of chemical food safety hazards in fruits and vegetables: a Bayesian Network approach
The presence and development of many food safety risks are driven by factors within and outside the food supply chain, such as climate, economy and human behaviour. The interactions between these factors and the supply chain are complex and a system or holistic approach is needed to reveal cause-effect relationships and to be able to perform effective mitigation actions to minimise food safety risks. In this study, we demonstrate the potential of the Bayesian Network (BN) approach to identify and quantify the strength of relationships and interactions between the presence of food safety hazards as reported in Rapid Alert System for Food and Feed (RASFF) for fruits and vegetables on one hand, and climatic factors, economic and agronomic data on the other. To this end, all food safety notifications in RASFF (i.e. 3,781 notifications) on fruits and vegetables originating from India, Turkey and the Netherlands were collected for the period 2005-2015. In addition, climatic factors (e.g. temperature, precipitation), agricultural factors (e.g. pesticide use, fertilizer use) and economic factors (e.g. price, production volumes) were collected for the countries of origin of the product concurrent with the period of food safety notification in RASFF. A BN was constructed with 80% of the collected data using a machine-learning algorithm and optimised for each specific hazard category. The performance of the developed BN was determined in terms of accuracy of prediction of the hazard category in the evaluation set comprising 20% of the total data. The accuracy was high (95%) and the following factors contributed most: product category, notifying country, yearly production, number of notifications, maximal residue level (MRL) ratio, country of origin, and the annual agricultural budget of a country.
The assessment of the impact of interactions within the BN showed a significant interaction between the presence and level of a hazard as reported in RASFF and several drivers of change but at present, no definite conclusions can be drawn regarding the climatic factors and food safety hazards.
Modelling Electronic Trust Using Bayesian Networks
This paper discusses the importance of trust in the context of the digital economy. Even though electronic commerce continues to grow worldwide due to many of its advantages, it has not been fully adopted yet. The reason for some barriers in adopting e-commerce lies in potential customers who still perceive the online setting as quite risky. Customers who have concerns related to sellers’ IT infrastructure resilience, and secured and safe personal data, will hardly ever engage in e-transactions. The nature of trust is very subjective, complex and multi-faceted. Trust issues are not present only between buyers and sellers, but also between suppliers and sellers, trust in recommendations and references on certain products, etc. In this paper the authors propose modelling trust using Bayesian networks and provide an illustrative example which is typical in online transactions.
Probabilistic Age Classification with Bayesian Networks
In the past few decades, the rise of criminal, civil and asylum cases involving young people lacking valid identification documents has generated an increase in the demand for age estimation. The chronological age or the probability that an individual is older or younger than a given age threshold are generally estimated by means of some statistical methods based on observations performed on specific physical attributes. Among these statistical methods, those developed in the Bayesian framework allow the user to provide coherent and transparent assignments which fulfill forensic and medico-legal purposes. The application of the Bayesian approach is facilitated by using probabilistic graphical tools, such as Bayesian networks. The aim of this work is to test the performance of the Bayesian network for age estimation recently presented in scientific literature in classifying individuals as older or younger than 18 years of age. For these exploratory analyses, a sample related to the ossification status of the medial clavicular epiphysis available in scientific literature was used. Results obtained in the classification are extremely promising: in the criminal context, the Bayesian network achieved, on average, a rate of correct classifications of approximately 97%, whilst in the civil context, the rate is, on average, close to 88%. These results encourage the continuation of the development and the testing of the method in order to support its practical application in casework.
Reducing COPD Readmissions: A Causal Bayesian Network Model
This paper introduces a causal Bayesian network model to study readmissions reduction for chronic obstructive pulmonary disease (COPD) patients. The model employs a Bayesian network learning method and adopts domain knowledge. Using this model, we analyze the impacts of critical variables on a patient's readmission risk by manipulation of such variables. Through this analysis, effective intervention options to reduce readmission can be identified, which can provide a quantitative tool for designing personalized interventions to reduce COPD readmissions.
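The idea of analyzing a variable's impact "by manipulation" can be sketched on a three-node causal network, contrasting ordinary conditioning with an intervention. Everything below is a hypothetical toy example and not the paper's model: Z stands for an assumed confounder (say, disease severity), X for an assumed intervention, Y for readmission, and all probabilities are invented.

```python
# Hypothetical causal network: Z -> X, Z -> Y, X -> Y (all variables binary).
p_z = {1: 0.5, 0: 0.5}                      # P(Z=z)
p_x1_given_z = {1: 0.8, 0: 0.2}             # P(X=1 | Z=z)
p_y1_given_xz = {(1, 1): 0.4, (0, 1): 0.6,  # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (1, 0): 0.1, (0, 0): 0.2}

def p_x_given_z(x, z):
    """P(X=x | Z=z)."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

def observational(x):
    """P(Y=1 | X=x): conditioning, so X also carries information about Z."""
    num = sum(p_z[z] * p_x_given_z(x, z) * p_y1_given_xz[(x, z)] for z in (0, 1))
    den = sum(p_z[z] * p_x_given_z(x, z) for z in (0, 1))
    return num / den

def interventional(x):
    """P(Y=1 | do(X=x)): the arrow Z -> X is cut, so Z keeps its prior."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in (0, 1))

# Estimated effect of the hypothetical intervention on readmission risk.
effect = interventional(1) - interventional(0)
```

In this toy setup the observational and interventional quantities differ because the intervention is applied more often to severe cases, which is why a causal model, rather than a purely associational one, is needed to compare intervention options.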