Margherita Benzi (benzi@nous.unige.it)
University of Torino at Vercelli

The Trade-Off Between Science, Philosophy and Common Sense in Statistical Causality

Philosophical discussion about causality often bases its arguments on intuition: advocates of a given theory try to show that some alternative account leads to counterintuitive conclusions. How should we evaluate this appeal to intuition, especially with regard to theories that aim to account not only for commonsensical uses of causal notions but also for scientific ones? Some suggestions come from the theory of probabilistic causal inference (henceforth PCI), a research program pursued in artificial intelligence, with particular relevance to the fields of learning and automated discovery. In what follows I will concentrate mostly on contributions presented in the last decade by P. Spirtes, C. Glymour and R. Scheines (henceforth SGS), but I will also take into account works by J. Pearl.

Previous work by SGS gave rise to a strand of criticism from philosophers (in particular N. Cartwright); more recently the statistician D.A. Freedman objected that approaches like SGS's risk "trying to magically pull causal rabbits out of a statistical hat". Here I will not discuss these criticisms directly; instead I will give a different and more positive account of the philosophical implications of this research program. I will try to show that the theory in question clarifies the respective roles of science, philosophy and common sense in causal inference.

Graphs and probability

As R. Scheines recently wrote, the theory of probabilistic causal inference "unites two pieces of mathematics and one piece of philosophy. The two mathematical pieces are directed acyclic graphs (DAGs) and probability theory (with the focus on conditional independence) and the philosophy involves causation among variables." [Scheines 1997, 185-6].
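The link between a DAG and conditional independence mentioned in the quotation above can be illustrated with a minimal sketch. It assumes a hypothetical three-variable chain A -> B -> C over binary variables; the graph, the variable names and all the probabilities are made up for illustration and come from none of the cited works. The joint distribution is built as the product of each variable's distribution given its parents, and the code then checks numerically that A and C are independent given B, although they are dependent when B is ignored.

```python
from itertools import product

# Hypothetical chain A -> B -> C; all numbers below are illustrative assumptions.
p_a = {0: 0.7, 1: 0.3}                                     # P(A = a)
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # [a][b] -> P(B = b | A = a)
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # [b][c] -> P(C = c | B = b)


def joint(a, b, c):
    """Joint probability obtained by multiplying each variable's
    distribution given its parents in the chain."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(a=1, c=0),
    computed by summing the joint over the unspecified variables."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


def a_c_independent_given_b(tol=1e-9):
    """Check P(a, c | b) == P(a | b) * P(c | b) for every assignment."""
    for a, b, c in product((0, 1), repeat=3):
        p_b = prob(b=b)
        lhs = prob(a=a, b=b, c=c) / p_b
        rhs = (prob(a=a, b=b) / p_b) * (prob(b=b, c=c) / p_b)
        if abs(lhs - rhs) > tol:
            return False
    return True


def a_c_marginally_independent(tol=1e-9):
    """Check P(a, c) == P(a) * P(c) for every assignment."""
    for a, c in product((0, 1), repeat=2):
        if abs(prob(a=a, c=c) - prob(a=a) * prob(c=c)) > tol:
            return False
    return True


if __name__ == "__main__":
    print("A independent of C given B:", a_c_independent_given_b())
    print("A independent of C marginally:", a_c_marginally_independent())
```

Nothing in this sketch is causal: it only shows how a graph encodes a pattern of probabilistic independence.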
DAG theory (also known as "Bayesian network theory") gives a probabilistic interpretation to DAGs, so that a graph represents a set of probability distributions. Each variable is associated with a node of the graph; the edges between nodes represent relations of probabilistic dependence between the variables. The set of probability distributions over the variables that a graph represents is picked out by the Markov condition ("in a graph, each variable is independent of its non-descendant non-parents, given its parents"), which in turn is equivalent to a graphical relation called d-separation [Pearl 1988]. Thanks to d-separation, the relations of probabilistic independence among the variables in a domain can be "read off" by observing graphical relations between nodes in the DAG. This correspondence enables reasoning systems to reduce the number of calculations needed to perform sound probabilistic inferences, and DAG theory has become a theory of reasoning under uncertainty in artificial intelligence.

At this stage of the theory, however, no concept of causation needs to be invoked. Applying DAG theory to causal reasoning, and turning DAGs (or Bayesian networks) into causal networks, requires a further step. This step consists in the so-called Causal Markov Assumption, which states that "when DAGs are interpreted causally the Markov condition and d-separation are in fact the correct connection between causal structure and probabilistic independence".

Graphs and causality

In early developments of Bayesian network theory [Pearl 1988], the causal interpretation was meant principally as an aid in building the graphs. Appealing to one's intuitions about causal relations helped in building "sensible" graphs, but it was not well specified what these intuitions were. How can this mysterious "causal intuition" be specified?
At present, a view shared by many PCI theorists is that an event (represented by a variable) causes another if, by manipulating the first, you alter the second. This idea of manipulation can therefore be used to elicit our intuitions about causal relations. SGS accept a definition of causal influence in terms of manipulation or intervention, but avoid putting much emphasis on causal intuition. Instead, they seem to suggest starting from statistical data, i.e. from probability distributions, and using them to derive a DAG that can be interpreted as the true causal structure (i.e. the true set of causal relations in the domain).

The whole enterprise seems to be undermined by the fact that, in the large majority of cases, several causal structures correspond to the same probability distribution. The problem is then that of narrowing the class of statistically indistinguishable structures licensed by the data; the solution is pursued by stating new assumptions and exploring their consequences. However, there will be cases in which no general principle suffices to eliminate every alternative except one; in some of these cases, all the background knowledge possessed by the epistemic agent, or the agent's intuitions about 'what causes what', are allowed to influence the choice of a single causal structure.

Discussion

PCI theorists try to devise a set of axioms suited to building a 'calculus' for probabilistic causality, or an algorithm that, given probabilities as input, automatically produces causal relations (as distinguished from spurious correlations) as output. From a philosophical point of view, this research program has two consequences that seem worth mentioning here:

i) the enterprise of representing causality as a calculus simply aims to avoid the problems connected with the question of the interpretation of causality, analogously to what happened with the axiomatization of probability [as in SGS 1993];
ii) this research program pursues a further aim: if it is possible to devise algorithms that, given a certain amount of empirical data as input, produce causal structures as output, then it should be possible to mechanize the discovery of causal models, or to "teach" computers to make scientific discoveries.

As D. Gillies has recently stressed, if this mechanization were achieved, the possibility and the utility of inductive logic would be established, contra Popper, who maintained that the appeal to intuition is the ultimate factor in scientific discovery, but also contra Carnap, who mainly confined himself to the restrictive view of inductive logic as a calculus of degrees of confirmation.

At present these aims have been only partially achieved. PCI theory neither banishes intuition from the domain of causal inference nor succeeds in eliminating every need for a philosophical definition of causality; rather, it circumscribes these two aspects, leading to a trade-off between common sense, philosophy and the mathematical apparatus. This cautious use of intuition can be contrasted with the current practice of many scientific disciplines, which deny causal considerations at the theoretical level but use them extensively in everyday routine work. By contrast, PCI entails a conscious and controlled use of causal intuition, which amounts to a radical change of habit: one that is more self-conscious and therefore less prone to the pitfalls of intuition.

Pearl [1988] - Pearl, J. Probabilistic Reasoning in Intelligent Systems, San Mateo, Calif.: Morgan Kaufmann.
Scheines [1997] - Scheines, R. "An introduction to causal inference", in McKim and Turner (eds.), Causality in Crisis?, University of Notre Dame Press, 185-200.
SGS [1993] - Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. Lecture Notes in Statistics 81, New York: Springer-Verlag.
www.vc.unipmn.it/~sifa/sifa.htm