The Paper Feed
A feed of Bayesian network-related papers, articles, books and research that we come across and find interesting.
Revealing the structure of the associations between housing system, facilities, management and welfare of commercial laying hens using Additive Bayesian Networks
After the ban of battery cages in 1988, a welfare control programme for laying hens was developed in Sweden. Its goal was to monitor and ensure that animal welfare was not negatively affected by the new housing systems. The present observational study provides an overview of the current welfare status of commercial layer flocks in Sweden and explores the complexity of welfare aspects by investigating and interpreting the inter-relationships between housing system, production type (i.e. organic or conventional), facilities, management and animal welfare indicators. For this purpose, a machine learning procedure referred to as structure discovery was applied to data collected through the welfare programme during 2010–2014 in 397 flocks housed on 193 different farms. Seventeen variables were fitted to an Additive Bayesian Network model. The optimal model was identified by an exhaustive search of the data iterated across incremental parent limits, accounting for prior knowledge about causality, potential over-dispersion and clustering. The resulting Directed Acyclic Graph shows the inter-relationships among the variables. The animal-based welfare indicators included in this study – flock mortality, feather condition and mite infestation – were indirectly associated with each other. Of these, severe mite infestations were rare (4% of inspected flocks) and mortality was below the acceptable threshold (< 0.6%). Feather condition scored unsatisfactory in 21% of the inspected flocks; however, it seemed to be associated only with the age of the flock, ruling out any direct connection with managerial and housing variables. The environment-based welfare indicators – lighting and air quality – were an issue in 5% and 8% of the flocks, respectively, and showed a complex inter-relationship with several managerial and housing variables, leaving room for several options for intervention.
Additive Bayesian Network modelling graphically outlined the underlying process that generated the observed data. In contrast to ordinary regression, it aims to account for conditional independence among variables, facilitating causal interpretation.
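As a brief illustration of what "accounting for conditional independence" means for a Directed Acyclic Graph, the standard Bayesian network factorization is written out below. The symbols $X_i$ (the model variables) and $\mathrm{Pa}(X_i)$ (the parents of $X_i$ in the graph) are notation introduced here, not taken from the paper.

```latex
% Joint distribution of n variables factorized along a DAG:
% each variable depends directly only on its parents in the graph.
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

In an additive Bayesian network, each factor is typically modelled as a generalized linear regression of a variable on its parents, so the graph can be read as a set of linked regressions rather than a single one with a fixed response.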
Bayesian networks for static and temporal data fusion
Prediction and inference on temporal data are very frequently performed using time series data alone. We believe that these tasks could benefit from leveraging the contextual metadata associated with time series, such as location, type, etc. Conversely, tasks involving prediction and inference on metadata could benefit from information held within time series. However, there exists no standard way of jointly modeling both time series data and descriptive metadata. Moreover, metadata frequently contains highly correlated or redundant information, and may contain errors and missing values. We first consider the problem of learning the inherent probabilistic graphical structure of metadata as a Bayesian Network. This has two main benefits: (i) once structured as a graphical model, metadata is easier to use in order to improve tasks on temporal data and (ii) the learned model enables inference tasks on metadata alone, such as missing data imputation. However, Bayesian network structure learning is a tremendous mathematical challenge that involves an NP-hard optimization problem. We present a tailor-made structure learning algorithm, inspired by novel theoretical results, that exploits (quasi-)deterministic dependencies that are typically present in descriptive metadata. This algorithm is tested on numerous benchmark datasets and some industrial metadatasets containing deterministic relationships. In both cases it proved to be significantly faster than the state of the art, and even found better-performing structures on industrial data. Moreover, learned Bayesian networks are consistently sparser and therefore more readable. We then focus on designing a model that includes both static (meta)data and dynamic data.
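To make the idea of a (quasi-)deterministic dependency concrete, here is a minimal sketch that flags one using the empirical conditional entropy H(Y | X): a value of zero means X fully determines Y in the sample, and a small positive value means it nearly does. This is an illustrative assumption about how such dependencies can be measured, not the thesis's algorithm; the function names, the threshold `eps` and the toy metadata columns are all hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(xs, ys):
    """Empirical H(Y | X) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    marginal = Counter(xs)         # counts of x alone
    return -sum(c / n * math.log2(c / marginal[x]) for (x, _), c in joint.items())


def is_quasi_deterministic(xs, ys, eps=0.05):
    """True when X determines Y up to `eps` bits of residual uncertainty (eps is an assumed threshold)."""
    return conditional_entropy(xs, ys) <= eps


# Hypothetical metadata columns: the sensor type fixes the unit, but not the reverse.
sensor_type = ["temp_in", "temp_out", "pressure", "temp_in", "pressure", "temp_out"]
unit = ["C", "C", "bar", "C", "bar", "C"]

print(is_quasi_deterministic(sensor_type, unit))  # type -> unit
print(is_quasi_deterministic(unit, sensor_type))  # unit -> type
```

In this toy sample the sensor type determines the unit exactly, while the unit leaves two thirds of a bit of uncertainty about the sensor type, so only the first direction is flagged.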
Taking inspiration from state-of-the-art probabilistic graphical models for temporal data (Dynamic Bayesian Networks) and from our previously described approach for metadata modeling, we present a general methodology to jointly model metadata and temporal data as a hybrid static-dynamic Bayesian network. We propose two main algorithms associated with this representation: (i) a learning algorithm which, while optimized for industrial data, still generalizes to any task of static and dynamic data fusion, and (ii) an inference algorithm, enabling both the usual tasks on temporal or static data alone and tasks using the two types of data. Finally, we discuss some of the notions introduced during the thesis, including ways to measure the generalization performance of a Bayesian network with a score inspired by the cross-validation procedure from supervised machine learning. We also propose various extensions to the algorithms and theoretical results presented in the previous chapters, and formulate some research perspectives.
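One plausible reading of a cross-validation-inspired score is the average held-out log-likelihood of a fixed network structure over k folds. The sketch below implements that reading for discrete data with Laplace-smoothed conditional tables. It is an assumption made for illustration and not the score defined in the thesis; the function names, the smoothing constant `alpha` and the toy data are hypothetical.

```python
import math
from collections import Counter


def fit_counts(rows, parents):
    """Count (parent configuration, value) pairs for every node of a fixed DAG."""
    joint = {v: Counter() for v in parents}
    marg = {v: Counter() for v in parents}
    for r in rows:
        for v, pa in parents.items():
            cfg = tuple(r[p] for p in pa)
            joint[v][(cfg, r[v])] += 1
            marg[v][cfg] += 1
    return joint, marg


def held_out_loglik(test, joint, marg, parents, n_levels, alpha):
    """Log-likelihood of unseen rows under Laplace-smoothed conditional tables."""
    total = 0.0
    for r in test:
        for v, pa in parents.items():
            cfg = tuple(r[p] for p in pa)
            num = joint[v][(cfg, r[v])] + alpha
            den = marg[v][cfg] + alpha * n_levels[v]
            total += math.log(num / den)
    return total


def cv_score(rows, parents, k=5, alpha=1.0):
    """Average held-out log-likelihood per row over k interleaved folds."""
    n_levels = {v: len({r[v] for r in rows}) for v in parents}
    total = 0.0
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        joint, marg = fit_counts(train, parents)
        total += held_out_loglik(test, joint, marg, parents, n_levels, alpha)
    return total / len(rows)


# Toy data in which b is a copy of a; `parents` maps each node to its parent tuple.
rows = [{"a": i % 2, "b": i % 2} for i in range(40)]
with_edge = cv_score(rows, {"a": (), "b": ("a",)})
without_edge = cv_score(rows, {"a": (), "b": ()})
print(with_edge > without_edge)
```

On this toy data the structure containing the edge a → b scores higher than the empty structure, because it predicts held-out values of b far better. Unlike a training-set likelihood, a held-out score of this kind does not automatically reward adding more edges.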
Hidden Node Detection between Observable Nodes Based on Bayesian Clustering
Structure learning is one of the main concerns in studies of Bayesian networks. In the present paper, we consider networks consisting of both observable and hidden nodes, and propose a method to investigate the existence of a hidden node between observable nodes, where all nodes are discrete. This corresponds to the model selection problem between the networks with and without the middle hidden node. When the network includes a hidden node, it is known that there are singularities in the parameter space and that the Fisher information matrix is not positive definite. Consequently, many conventional criteria for structure learning based on the Laplace approximation do not work. The proposed method is based on Bayesian clustering, and its asymptotic property justifies the result: the redundant labels are eliminated and the simplest structure is detected even if there are singularities.
A Bayesian network based learning system for modelling faults in large-scale manufacturing
Manufacturing companies can benefit from the early prediction and detection of failures to improve their product yield and reduce system faults through advanced data analytics. Whilst an abundance of data on their processing systems exists, they face difficulties in using it to gain insights to improve their systems. Bayesian networks (BNs) are considered here for diagnosing and predicting faults in a large manufacturing dataset from Bosch. Whilst BN structure learning has traditionally been performed on smaller datasets, this work demonstrates, for the first time, the ability to learn an appropriate BN structure for a large dataset with little information on the variables. This paper also demonstrates a new framework for creating an appropriate probabilistic model for the Bosch dataset through the selection of variables that are statistically important for the response; this is then used to create a BN which can be used to answer probabilistic queries and classify products based on changes in the sensor values in the production process.