The Paper Feed
A feed of papers, articles, books and research related to Bayesian networks that we come across and find of interest.
An Object-Oriented Bayesian Framework for the Detection of Market Drivers
We use Object-Oriented Bayesian Networks (OOBNs) to analyze complex ties in the equity market and to detect drivers of the Standard & Poor’s 500 (S&P 500) index. To this end, we consider a large number of indicators drawn from various investment areas (Value, Growth, Sentiment, Momentum, and Technical Analysis), and, with the aid of OOBNs, we study the role they played over time in influencing the dynamics of the S&P 500. Our results highlight that the centrality of the indicators varies over time, and they offer a starting point for further inquiries into combining OOBNs with trading platforms.
Revealing the structure of the associations between housing system, facilities, management and welfare of commercial laying hens using Additive Bayesian Networks
After battery cages were banned in 1988, a welfare control programme for laying hens was developed in Sweden. Its goal was to monitor and ensure that animal welfare was not negatively affected by the new housing systems. The present observational study provides an overview of the current welfare status of commercial layer flocks in Sweden and explores the complexity of welfare aspects by investigating and interpreting the inter-relationships between housing system, production type (i.e. organic or conventional), facilities, management and animal welfare indicators. For this purpose, a machine learning procedure referred to as structure discovery was applied to data collected through the welfare programme during 2010–2014 in 397 flocks housed in 193 different farms. Seventeen variables were fitted to an Additive Bayesian Network model. The optimal model was identified by an exhaustive search of the data iterated across incremental parent limits, accounting for prior knowledge about causality, potential over-dispersion and clustering. The resulting Directed Acyclic Graph shows the inter-relationships among the variables. The animal-based welfare indicators included in this study (flock mortality, feather condition and mite infestation) were indirectly associated with each other. Of these, severe mite infestations were rare (4% of inspected flocks) and mortality was below the acceptable threshold (< 0.6%). Feather condition scored unsatisfactory in 21% of the inspected flocks; however, it appeared to be associated only with the age of the flock, ruling out any direct connection with managerial and housing variables. The environment-based welfare indicators, lighting and air quality, were an issue in 5% and 8% of the flocks, respectively, and showed a complex inter-relationship with several managerial and housing variables, leaving room for several intervention options.
Additive Bayesian Network modelling graphically outlined the underlying process that generated the observed data. In contrast to ordinary regression, it accounts for conditional independence among variables, facilitating causal interpretation.
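The exhaustive search "iterated across incremental parent limits" can be illustrated with a small sketch. It is not the authors' code: it assumes discrete (here binary) variables and a generic BIC node score, whereas Additive Bayesian Networks score each node with a generalized linear model. Because the score decomposes over nodes, the best DAG can be found exactly by trying every variable ordering and letting each node pick its best-scoring parent set, of size at most the limit, among the variables that precede it. The data are synthetic and hypothetical.

```python
import itertools
import math
from collections import Counter


def bic_local(data, child, parents):
    """BIC score of one discrete node given a parent set (higher is better)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    r = len({row[child] for row in data})  # number of child states
    q = 1                                  # number of parent configurations
    for p in parents:
        q *= len({row[p] for row in data})
    return loglik - 0.5 * math.log(n) * (r - 1) * q


def exhaustive_search(data, variables, max_parents):
    """Exact search: best DAG over all orderings under a parent limit."""
    cache = {}

    def best_parents(child, candidates):
        best = None
        for k in range(min(max_parents, len(candidates)) + 1):
            for ps in itertools.combinations(candidates, k):
                if (child, ps) not in cache:
                    cache[(child, ps)] = bic_local(data, child, ps)
                if best is None or cache[(child, ps)] > best[0]:
                    best = (cache[(child, ps)], ps)
        return best

    best_total, best_dag = -math.inf, None
    for order in itertools.permutations(variables):
        total, dag = 0.0, {}
        for i, v in enumerate(order):
            score, ps = best_parents(v, tuple(sorted(order[:i])))
            total += score
            dag[v] = ps
        if total > best_total:
            best_total, best_dag = total, dag
    return best_total, best_dag


# Hypothetical data: B copies A except in every tenth row; C is independent.
data = []
for i in range(40):
    a = i % 2
    b = a if i % 10 != 0 else 1 - a
    c = (i // 2) % 2
    data.append({"A": a, "B": b, "C": c})

for limit in (0, 1, 2):
    score, dag = exhaustive_search(data, ["A", "B", "C"], limit)
    print(limit, round(score, 2), dag)
```

Raising the limit until the best score stops improving mirrors the incremental parent-limit idea. Enumerating orderings costs n! and is only feasible for a handful of variables; exact searches over more variables typically rely on dynamic programming over subsets instead.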
Bayesian networks for static and temporal data fusion
Prediction and inference on temporal data are very frequently performed using time series data alone. We believe that these tasks could benefit from leveraging the contextual metadata associated with time series, such as location, type, etc. Conversely, tasks involving prediction and inference on metadata could benefit from information held within time series. However, there is no standard way of jointly modeling both time series data and descriptive metadata. Moreover, metadata frequently contains highly correlated or redundant information, and may contain errors and missing values. We first consider the problem of learning the inherent probabilistic graphical structure of metadata as a Bayesian network. This has two main benefits: (i) once structured as a graphical model, metadata is easier to use in order to improve tasks on temporal data and (ii) the learned model enables inference tasks on metadata alone, such as missing data imputation. However, Bayesian network structure learning is a tremendous mathematical challenge that involves an NP-hard optimization problem. We present a tailor-made structure learning algorithm, inspired by novel theoretical results, that exploits the (quasi-)deterministic dependencies typically present in descriptive metadata. This algorithm is tested on numerous benchmark datasets and on some industrial metadatasets containing deterministic relationships. In both cases it proved to be significantly faster than the state of the art, and it even found better-performing structures on industrial data. Moreover, the learned Bayesian networks are consistently sparser and therefore more readable. We then focus on designing a model that includes both static (meta)data and dynamic data.
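One simple way to flag the kind of (quasi-)deterministic dependency that such an algorithm exploits is the conditional entropy H(B | A) between two metadata columns: it is 0 bits when A fully determines B and stays small when the dependency is almost deterministic. The sketch below is only that check, not the thesis's structure learning algorithm; the column names, rows and the `tol` threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def conditional_entropy(rows, target, given):
    """H(target | given) in bits, estimated from empirical frequencies."""
    n = len(rows)
    joint = Counter((r[given], r[target]) for r in rows)
    marg = Counter(r[given] for r in rows)
    return -sum(c / n * math.log2(c / marg[g]) for (g, _), c in joint.items())


def quasi_deterministic_pairs(rows, columns, tol=0.1):
    """Ordered pairs (a, b) where a (quasi-)determines b: H(b | a) <= tol bits."""
    return [(a, b) for a, b in permutations(columns, 2)
            if conditional_entropy(rows, b, a) <= tol]


# Hypothetical sensor metadata: site -> city -> country is a deterministic chain.
rows = [
    {"site": "S1", "city": "Lyon", "country": "FR", "type": "temp"},
    {"site": "S2", "city": "Lyon", "country": "FR", "type": "flow"},
    {"site": "S3", "city": "Paris", "country": "FR", "type": "temp"},
    {"site": "S4", "city": "Turin", "country": "IT", "type": "flow"},
    {"site": "S1", "city": "Lyon", "country": "FR", "type": "temp"},
]
print(quasi_deterministic_pairs(rows, ["site", "city", "country", "type"]))
```

A structure learner could, for instance, fix such edges up front or prune parent sets that add nothing beyond a determining column; how the thesis actually uses these dependencies is described in the work itself.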
Taking inspiration from state-of-the-art probabilistic graphical models for temporal data (Dynamic Bayesian Networks) and from our previously described approach for metadata modeling, we present a general methodology to jointly model metadata and temporal data as a hybrid static-dynamic Bayesian network. We propose two main algorithms associated with this representation: (i) a learning algorithm which, while optimized for industrial data, still generalizes to any task of static and dynamic data fusion, and (ii) an inference algorithm, enabling both the usual tasks on temporal or static data alone and tasks using both types of data. Finally, we discuss some of the notions introduced during the thesis, including ways to measure the generalization performance of a Bayesian network by a score inspired by the cross-validation procedure from supervised machine learning. We also propose various extensions to the algorithms and theoretical results presented in the previous chapters, and formulate some research perspectives.
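A cross-validation-inspired generalization score for a Bayesian network can be sketched as a held-out log-likelihood: estimate the conditional probability tables of a fixed structure on k-1 folds, score the remaining fold, and average. This is a generic sketch under stated assumptions (discrete variables, Dirichlet smoothing with a hypothetical `alpha`, round-robin folds, made-up data), not the score defined in the thesis.

```python
import math
from collections import Counter


def cv_loglik(rows, dag, k=5, alpha=1.0):
    """Average held-out log-likelihood per row of a discrete BN structure.

    dag maps each variable to a tuple of its parents. CPTs are re-estimated
    on each training split with Dirichlet(alpha) smoothing.
    """
    states = {v: {r[v] for r in rows} for v in dag}  # state spaces from all rows
    folds = [rows[i::k] for i in range(k)]           # round-robin split
    total = 0.0
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        family, parent_cfg = Counter(), Counter()
        for r in train:
            for v, ps in dag.items():
                pa = tuple(r[p] for p in ps)
                family[(v, pa, r[v])] += 1
                parent_cfg[(v, pa)] += 1
        for r in test:
            for v, ps in dag.items():
                pa = tuple(r[p] for p in ps)
                num = family[(v, pa, r[v])] + alpha
                den = parent_cfg[(v, pa)] + alpha * len(states[v])
                total += math.log(num / den)
    return total / len(rows)


# Hypothetical data: Y copies X except in every eighth row.
records = [{"X": i % 2, "Y": i % 2 if i % 8 else 1 - i % 2} for i in range(40)]
print(cv_loglik(records, {"X": (), "Y": ("X",)}))  # structure with X -> Y
print(cv_loglik(records, {"X": (), "Y": ()}))      # empty structure
```

Unlike the in-sample likelihood, which never decreases when edges are added, the held-out score rewards an edge only if it helps predict unseen rows, which is what makes it usable as a generalization measure.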
Hidden Node Detection between Observable Nodes Based on Bayesian Clustering
Structure learning is one of the main concerns in studies of Bayesian networks. In the present paper, we consider networks consisting of both observable and hidden nodes, and propose a method to investigate the existence of a hidden node between observable nodes, where all nodes are discrete. This corresponds to the model selection problem between the networks with and without the middle hidden node. When the network includes a hidden node, it is known that there are singularities in the parameter space and that the Fisher information matrix is not positive definite. Consequently, many conventional criteria for structure learning that are based on the Laplace approximation do not work. The proposed method is based on Bayesian clustering, and its asymptotic properties justify the result: the redundant labels are eliminated and the simplest structure is detected even in the presence of singularities.
Sensitivity Analysis in a Bayesian Network for Modeling an Agent
Agent-based social simulation (ABSS) has become a popular method for simulating and visualizing a phenomenon while making it possible to decipher the system’s dynamics. When a large amount of data, such as a questionnaire survey, is used to describe an agent’s behavior, a Bayesian network is often the preferred method for modeling the agent, and the network learned from these data is then used in ABSS. However, it is very difficult to learn an accurate Bayesian network structure from the raw data because there are many variables and the search space is too large. This study proposes a new method for obtaining an appropriate structure for a Bayesian network by applying sensitivity analysis in a stepwise fashion. This method finds a feature subset that explains the objective variables well without reducing accuracy. A simple Bayesian network structure that maintains accuracy while capturing an agent’s behavior gives ABSS users an intuitive understanding of the agent’s behavioral principles. To illustrate the effectiveness of the proposed method, data from a questionnaire survey about healthcare electronics were used.
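The stepwise use of sensitivity analysis can be sketched as backward elimination: repeatedly drop the feature to which the objective variable is least sensitive, and stop once accuracy would fall more than a tolerance below the full-feature baseline. Here `sensitivity` and `accuracy` are hypothetical callbacks (in the paper's setting they would be derived from the learned Bayesian network and evaluated on survey data), so the sketch shows the control flow only, not the authors' exact procedure.

```python
from typing import Callable, List, Sequence


def stepwise_select(features: Sequence[str],
                    sensitivity: Callable[[str, List[str]], float],
                    accuracy: Callable[[List[str]], float],
                    tol: float = 0.01) -> List[str]:
    """Backward elimination guided by a sensitivity measure."""
    selected = list(features)
    baseline = accuracy(selected)  # accuracy with every feature kept
    while len(selected) > 1:
        # The feature whose variation moves the objective variable least.
        weakest = min(selected, key=lambda f: sensitivity(f, selected))
        trial = [f for f in selected if f != weakest]
        if accuracy(trial) < baseline - tol:
            break  # removing it costs too much accuracy
        selected = trial
    return selected


# Hypothetical toy callbacks: each feature adds a fixed amount of accuracy,
# and its sensitivity is that same amount.
WEIGHTS = {"age": 0.20, "income": 0.15, "hobby": 0.0, "region": 0.002}


def toy_accuracy(feats):
    return 0.5 + sum(WEIGHTS[f] for f in feats)


def toy_sensitivity(feature, feats):
    return WEIGHTS[feature]


print(stepwise_select(list(WEIGHTS), toy_sensitivity, toy_accuracy))
```

Measuring the drop against the full-feature baseline, rather than the previous step, keeps small losses from accumulating unnoticed over many removals.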
A Bayesian network based learning system for modelling faults in large-scale manufacturing
Manufacturing companies can benefit from the early prediction and detection of failures to improve their product yield and reduce system faults through advanced data analytics. Whilst an abundance of data on their processing systems exists, they face difficulties in using it to gain insights that would improve their systems. Bayesian networks (BNs) are considered here for diagnosing and predicting faults in a large manufacturing dataset from Bosch. Whilst BN structure learning has traditionally been performed on smaller datasets, this work demonstrates, for the first time, the ability to learn an appropriate BN structure for a large dataset with little information on the variables. This paper also demonstrates a new framework for creating an appropriate probabilistic model for the Bosch dataset by selecting the variables that are statistically important for the response; these are then used to create a BN that can answer probabilistic queries and classify products based on changes in the sensor values in the production process.
Impact of drivers of change, including climatic factors, on the occurrence of chemical food safety hazards in fruits and vegetables: a Bayesian Network approach
The presence and development of many food safety risks are driven by factors within and outside the food supply chain, such as climate, economy and human behaviour. The interactions between these factors and the supply chain are complex, and a system or holistic approach is needed to reveal cause-effect relationships and to perform effective mitigation actions that minimise food safety risks. In this study, we demonstrate the potential of the Bayesian Network (BN) approach to identify and quantify the strength of relationships and interactions between the presence of food safety hazards as reported in the Rapid Alert System for Food and Feed (RASFF) for fruits and vegetables on the one hand, and climatic factors, economic and agronomic data on the other. To this end, all food safety notifications in RASFF (i.e. 3,781 notifications) on fruits and vegetables originating from India, Turkey and the Netherlands were collected for the period 2005-2015. In addition, climatic factors (e.g. temperature, precipitation), agricultural factors (e.g. pesticide use, fertilizer use) and economic factors (e.g. price, production volumes) were collected for the countries of origin of the product, concurrent with the period of food safety notification in RASFF. A BN was constructed with 80% of the collected data using a machine-learning algorithm and optimised for each specific hazard category. The performance of the developed BN was determined in terms of accuracy of prediction of the hazard category in the evaluation set comprising 20% of the total data. The accuracy was high (95%) and the following factors contributed most: product category, notifying country, yearly production, number of notifications, maximum residue level (MRL) ratio, country of origin, and the annual agricultural budget of a country.
The assessment of the impact of interactions within the BN showed a significant interaction between the presence and level of a hazard as reported in RASFF and several drivers of change, but at present no definite conclusions can be drawn regarding climatic factors and food safety hazards.
Reducing COPD Readmissions: A Causal Bayesian Network Model - IEEE Journals
This paper introduces a causal Bayesian network model to study readmission reduction for chronic obstructive pulmonary disease (COPD) patients. The model combines a Bayesian network learning method with domain knowledge. Using this model, we analyze the impact of critical variables on a patient's readmission risk by manipulating these variables. This analysis can identify effective intervention options for reducing readmissions, providing a quantitative tool for designing personalized interventions to reduce COPD readmissions.
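Analyzing a variable "by manipulation" is what separates a causal query from an ordinary conditional one. The sketch below uses a hypothetical three-node network (severity S → program X, S → readmission R, X → R) with made-up probabilities, not the paper's model, to contrast P(R=1 | X=x) with P(R=1 | do(X=x)). The interventional query uses the truncated factorization: the S → X dependence is cut while P(S) is kept.

```python
# Hypothetical CPTs (made-up numbers, not from the paper).
P_S1 = 0.3                                   # P(severe = 1)
P_X1_GIVEN_S = {0: 0.2, 1: 0.8}              # P(program = 1 | severity)
P_R1_GIVEN_XS = {(0, 0): 0.10, (1, 0): 0.05,  # P(readmit = 1 | program, severity)
                 (0, 1): 0.40, (1, 1): 0.25}


def p_s(s):
    return P_S1 if s == 1 else 1 - P_S1


def p_x_given_s(x, s):
    return P_X1_GIVEN_S[s] if x == 1 else 1 - P_X1_GIVEN_S[s]


def p_readmit_observed(x):
    """P(R=1 | X=x): observing X also shifts belief about severity."""
    joint = {s: p_s(s) * p_x_given_s(x, s) for s in (0, 1)}
    z = sum(joint.values())
    return sum(joint[s] / z * P_R1_GIVEN_XS[(x, s)] for s in (0, 1))


def p_readmit_do(x):
    """P(R=1 | do(X=x)): cut S -> X and average over the prior P(S)."""
    return sum(p_s(s) * P_R1_GIVEN_XS[(x, s)] for s in (0, 1))


for x in (0, 1):
    print(x, round(p_readmit_observed(x), 3), round(p_readmit_do(x), 3))
```

With these made-up numbers the program looks harmful observationally (about 0.176 versus 0.129) because severe patients are more likely to receive it, while the interventional query gives 0.11 versus 0.19, the kind of reversal a causal model is meant to expose before choosing an intervention.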