Encyclopedia of Complexity and Systems Science

Living Edition
| Editors: Robert A. Meyers

Approximations to Algorithmic Probability

Hector Zenil
Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-27737-5_700-1

Keywords

Probabilistic algorithm; Small Turing machines; Universal distribution; Algorithmic content; Block Decomposition Method (BDM)

A Computational Theory of Everything

Physicists have long been searching for a so-called Theory of Everything (ToE). Just as quantum mechanics explains the smallest phenomena, from microwaves to light, and general relativity explains the classical world, from gravity to space and time, a ToE would explain everything in the universe, from the smallest to the largest phenomena, in a single formulation.

Science has operated under the assumption, and in light of strong evidence, that the world is highly, if not completely, algorithmic in nature. If the world were not structured, our attempts to construct a body of theories from observations, to build what we consider ordered scientific models of the world, would have failed. That they have not is a spectacular vindication of the world’s orderly character. We started out believing that the world was ruled by magic, and by irrational and emotional gods. However, thinkers from ancient civilizations such as China and India and, notably, Greece made the first recorded attempts to introduce reason and logic into scholarly discourse, advancing rules that may be seen as the first steps towards a computational theory of causality, in which every observation was assumed to have a cause other than a metaphysical explanation.

Over the last few centuries, science has made a great deal of progress in explaining more with less, effectively suggesting the algorithmic nature of the universe. For a long time, the Ptolemaic geocentric model was thought to explain the strange motion of the planets in the sky. When it came to planetary movement, the geocentric theory had the same predictive power as the later heliocentric model. The geocentric model, however, was unable to explain why the outer planets seemed to have a period related to the Sun’s period if they bore no special relation to one another. The heliocentric model was better not only because it required, overall, fewer constraints or features, such as the many epicycles needed in the geocentric model, but because it did not need to be supplemented by additional explanations. Some anomalies remained to be accounted for (having to do with the elliptical orbits, as discovered by Kepler), but it did not need to explain odd regularities that were apparent yet unexplained by the geocentric model. We have since replaced the heliocentric model with much more sophisticated models, starting with Kepler’s and Newton’s, which not only endorsed and embraced the heliocentric model but placed at the center of the solar system a center of gravity common to the Sun and the planets, a center of gravity that turns out to lie inside the Sun but not to be exactly the Sun or its center. In other words, under the updated models, the truth value of the claim that the planets revolve around the Sun, or that the Sun revolves around the planets, is relative. With this update to the heliocentric model, Newton was able to link the movement of the planets to the movement of everyday objects on Earth, and to supply a reason as to why objects fall down and why we appear to be stuck to the Earth’s surface: an amazing feat whereby a minor increase in the sophistication of a model yielded explanations of a tremendous number of hitherto unexplained phenomena.
But it was not until Einstein’s general relativity that we could explain phenomena that none of the previous classical models could, phenomena that were considered strange anomalies, as if they were the result of the caprices of ancient gods, such as the precession of Mercury’s perihelion, the slow rotation of Mercury’s elongated orbit around the Sun that Newtonian gravity could not fully account for. The only indication that Mercury’s movement was not subject to the mood of an angry god was perhaps its regularity. So both regularity in phenomena such as the movement of stars and planets, and the achievement of superior models able to explain a much greater number of unexplained phenomena with little to no increase in sophistication compared to their precursors, are at the center of science.

That we are able to explain more without the need of many more assumptions, devising ever more encompassing scientific models, suggests that our complex world tends toward algorithmic simplicity. Indeed, the simpler physics finds the world to be, the more algorithmic this suggests it is, since the universe then seems to be generated or explained by simple principles. The chances of uncovering a short computer program producing the universe are thus greater if physical phenomena can be explained by increasingly simple and shorter descriptions and models. This should not come as a surprise: if current cosmological theories are correct, the universe has a common and single origin and is thus deterministically connected from beginning to end.

It is true that laws are only models of natural phenomena, and that some later models may become slightly more complex before becoming simpler, but ultimately the models of science have become simpler and simpler with time and further research, so that today we can explain a greater number of phenomena, encompassing almost every aspect of the natural world, using only two major models, general relativity and the standard model. So both the direction of science and the assumption that the world is algorithmic in nature are solid and definite. We can scientifically explain much more today than we could 50 years ago, and 50 years ago we could explain much more than we could 200 years ago, and so on, using simpler and shorter scientific models of incredible accuracy.

If the world is causal and deterministic, as science empirically shows, general relativity suggests and quantum mechanics does not rule out, such a theory would have a computational form (Wolfram 2002), and even riding on quantum fluctuations it would appear algorithmic at the macroscopic scale. A computational theory of everything of this type, proposed in (Schmidhuber 2002) and based on the works of Solomonoff (1964) and Levin (1974), has also been described as miraculous (Kirchherr et al. 1997), and more recently the merits of approximating it were underscored by Marvin Minsky. According to him, the underlying theory would be second in importance only (while actually being deeply related) to the results of Kurt Gödel, who found that proof and truth are often disconnected, given that one cannot prove certain truths, and that being able to prove every possible truth leads to contradictions. At the Panel on The Limits of Understanding, Minsky pointed out (source: https://youtu.be/DfY-DRsE86s?t=1h30m2s):

It seems to me that the most important discovery since Gödel was the discovery by Chaitin, Solomonoff and Kolmogorov of the concept called, Algorithmic Probability, which is a fundamental new theory of how to make predictions given a collection of experiences, and this is a beautiful theory. Everybody should learn it, but it’s got one problem, that is, that you cannot actually calculate what this theory predicts because it is too hard, it requires an infinite amount of work. However, it should be possible to make practical approximations to the Chaitin, Kolmogorov, Solomonoff theory that would make better predictions than anything we have today. Everybody should learn all about that and spend the rest of their lives working on it.

World Science Festival in New York City, Dec 14, 2014.

The Power of Mechanistic Models

The first mechanistic model of the solar system did not originate with Claudius Ptolemy, Copernicus, or even Kepler, but with Newton and his theory of gravitation. A mechanistic model is one that posits the causes of certain observables rather than merely describing phenomena. None of the models before Newton accounted for what underlay the motion of the planets; they only offered hypotheses about their apparent movements. It was not until Newton that we knew the planets were somehow “falling,” just as apples fall from a tree. The Sun and the planets in our solar system are attracted to each other and pull toward each other, making them move in the way they do.

So what is a causal and mechanistic model in more general terms? Newton’s model is causal and mechanistic because it provides the means to, in principle, reproduce the conditions of the solar system and generate exactly the same outcome. This means that Newton’s model can be emulated mechanistically to reproduce the movement of the planets and the Sun. Indeed, the only difference between Kepler’s and Newton’s models, given that they are equally accurate in predicting planetary motion, is that Newton’s is causal and can explain many more phenomena outside the realm of planetary motion. Not all models are of the same type. For example, in a mechanical model of the sky, the Sun’s cycle may require two coupled gears of a size 20 × 20 relative to the gear driving the Moon’s cycle; the mechanical model would then suggest such a ratio between the sizes and distances of the Moon and the Sun. This would have been impossible to deduce from the epicycle model. An example of a mechanistic model is a Rube Goldberg machine (see Fig. 5), because such a machine implements itself; that is, every action of the machine corresponds to the actual performance of a task. In other words, the best model of something is the thing itself. But we rarely have the chance to design or control a system so as to include an accurate representation of itself; instead we produce hypotheses for models that we test.

Let us take a very abstract example: an irrational but computable number such as π. Because π is computable, it can be generated mechanistically. A mechanical model of π is the one suggested by the relationship between the diameter and the circumference of a circle. In practice this is quite difficult and is not undertaken physically, but from its abstract calculation mathematicians have learned many properties beyond the original realm of π itself. For example, π is of low algorithmic (Kolmogorov-Chaitin) complexity because it has many short descriptions, derived from the many relationships in mathematics in which it figures.

Algorithmic Probability

What connects all mechanistic model-driven approaches in science is none other than the computational theory of Algorithmic Probability (Levin 1974; Solomonoff 1964), a theory that elegantly connects computation with classical probability through a theorem called the algorithmic Coding theorem (cf. below), which establishes that the most frequent outcome of a causal/deterministic system will also be the simplest in the sense of algorithmic complexity, that is, in terms of the length of the shortest computer program that produces that output.

This measure defines the probability of data being produced by a random program running on a universal Turing machine (Levin 1974; Solomonoff 1964). The measure is key to understanding data in the best possible scientific way, that is, by exploring hypotheses in search of the most likely genesis of the data, so as to be able to make predictions. Formally,
$$m(x)=\sum_{p:U(p)=x} 2^{-\mid p\mid }<1$$
(1)
i.e., it is the sum over all the programs that produce x running on a universal prefix-free Turing machine U. (The set of valid programs must be prefix-free: no valid program is a prefix of any other, a property necessary to keep 0 < m(x) < 1. For example, if program 2 starts with program 1, then at most one of the two can be a valid program.) The probability measure m(x) is traditionally called Levin’s semi-measure, the Solomonoff-Levin semi-measure, or algorithmic probability, and it defines what is known as the Universal Distribution (the “semi” reflecting that m(x) is only semi-computable and, unlike a probability measure, does not add up to 1). m(x) formalizes the concept of Occam’s razor (favoring simple – or shorter – hypotheses over complicated ones) (Fig. 1).
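The prefix-free requirement has a concrete arithmetical consequence, the Kraft inequality, which is what keeps m(x) below 1. A minimal sketch in Python (the four codewords are hypothetical, chosen only for illustration):

```python
# Kraft inequality: for any prefix-free set of programs, sum(2^-|p|) <= 1,
# the property that keeps m(x) a (semi-)probability measure.
def is_prefix_free(codes):
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

codes = ["0", "10", "110", "1110"]      # a toy prefix-free program set
assert is_prefix_free(codes)
assert not is_prefix_free(["0", "01"])  # "0" is a prefix of "01": invalid set

kraft_sum = sum(2 ** -len(c) for c in codes)  # 1/2 + 1/4 + 1/8 + 1/16 = 0.9375
```

Any attempt to add a codeword that violates prefix-freeness would allow the sum to exceed 1, which is why the valid programs of U must form such a set.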
In Zenil et al. (2017), we showed that the use and application of computable measures, in particular Shannon Entropy (Shannon 1948), can lead to weak and contradictory results when evaluating the information content or random typicality of objects, in particular graphs and networks, given that such objects, like any others, can be described in different ways without loss of information, and that, unlike universal measures such as algorithmic complexity, computable measures are not robust to changes in such descriptions (Fig. 2).
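The non-robustness of computable measures to changes of description can be illustrated with a minimal Python sketch (block_entropy is an illustrative helper, not the procedure used in Zenil et al. (2017)): the very same string is assigned maximal or minimal Shannon entropy depending on the granularity at which it is described:

```python
import math
from collections import Counter

def block_entropy(s, k):
    """Shannon entropy in bits per block over non-overlapping blocks of length k."""
    blocks = [s[i:i + k] for i in range(0, len(s) - k + 1, k)]
    freqs = Counter(blocks)
    n = len(blocks)
    return -sum(c / n * math.log2(c / n) for c in freqs.values())

s = "01" * 50                      # a trivially regular string
assert block_entropy(s, 1) == 1.0  # bit by bit it looks maximally random
assert block_entropy(s, 2) == 0.0  # at block size 2 it is perfectly regular
```

The string itself has not changed, only its description (the block granularity), yet the entropy estimate jumps between the two extremes; a universal measure would assign the string low complexity under either description.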

An Alternative Method to Lossless Compression

There is a trend in statistical machine learning, fueled by big data and computational power, that involves packing sets of regression analysis methods into what are called deep neural networks. These approaches offer black boxes with incredible capabilities but could not be further from mechanistic models. In some ways, they are sophisticated epicyclic models of data. They can be extremely useful, and can even be as precise as the corresponding mechanistic models of the phenomena they try to capture, but they fall short at scaling and at abstracting general properties into models one can build upon without having to add yet another layer of black boxes. The challenge is how to combine the enormous capability of deep learning to synthesize data into a combinatorial set of feature variables with the abstraction capabilities of truly model-based approaches.

Some approaches are unconventional, but they are not meant to remain so forever. Alternatives to lossless compression for measuring and applying algorithmic complexity, translating highly abstract theory into highly applied areas, have started to produce surprising results that would not have been possible had the new numerical methods been unable to go beyond traditional schemes. The theory introduced in detail in (Delahaye and Zenil 2012; Soler-Toscano et al. 2014) as a novel alternative to compression (Delahaye and Zenil 2012) has found applications in the study of graphs and networks (Zenil et al. 2014), in molecular biology (Zenil et al. 2016), and in animal and human cognition (Gauvrit et al. 2014a, b, 2016, 2017a, b).

The alternative to compression is based upon the algorithmic Coding theorem, which describes the connection between the complexity of an object K(x) and its algorithmic probability m(x) of being produced by a random computer program (Chaitin 1966; Levin 1974), that is, a program whose executable code digits are the result of tossing a coin. The theorem establishes that:
$$K(x)=-{\mathrm{log}}_2\left(m(x)\right)+c$$
(2)
where c is a constant independent of x.

The intuition behind the Coding theorem (2) is a beautiful relation between program-size complexity and frequency of production. If you wished to produce the digits of a mathematical constant like π by throwing digits at random, you would have to produce every digit of its infinite decimal expansion. If you seated a monkey at a typewriter (with, say, 50 keys), the probability of the monkey typing an initial segment of 2400 digits of π by chance is (1/50)^2400.

If, instead, the hypothetical monkey (simply a source of randomness) is seated at a computer, the chances of its producing a program generating the digits of π are on the order of (1/50)^158, because it would take the monkey only 158 characters to produce the first 2400 digits of π using, for example, the C language.
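The two probabilities can be compared on a logarithmic scale with a quick back-of-the-envelope calculation in Python (using the 2400-digit and 158-character figures quoted above):

```python
import math

KEYS = 50
typing_pi   = 2400 * math.log10(KEYS)  # -log10 of (1/50)^2400, about 4078
typing_prog = 158 * math.log10(KEYS)   # -log10 of (1/50)^158, about 268

# The program route is roughly 10^3809 times more probable than typing
# the 2400 digits of pi directly.
advantage = typing_pi - typing_prog
```

The monkey at the computer is astronomically more likely to succeed, which is precisely the asymmetry the Coding theorem quantifies.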

The less random an object, the more likely it is to be produced by a short program, because there is a greater probability of randomly hitting a program that produces the object when that program is short. Therefore, if an object is compressible, the chances of producing a compressed program that generates it are greater than they would be for a random object, i.e., there exists a program p with ∣p∣ ≪ ∣x∣ such that U(p) = x.

An Unconventional Approach to Computing the Uncomputable

It is one thing to propose that if the world is computational then it has to be generated by some computational rule (Schmidhuber 2002), and a different thing to explore the space of such possible computational rules (Wolfram 2002). However, very little to nothing had been done actually to approximate something like Levin’s universal distribution, to plug it back into a possible algorithmic theory of everything and find applications in the sense of Minsky (cf. the quotation above). Moreover, through the algorithmic Coding theorem one can approximate algorithmic (Kolmogorov-Chaitin) complexity directly, rather than, for example, only by statistical means, i.e., by using common lossless compression algorithms that implement measures of entropy rate based on statistical regularities – repetitions – captured by a sliding window traversing a string.

Let (n, 2) be the set of all Turing machines with n states and 2 symbols, and let D(n) be the function that assigns to every finite binary string x the quotient:
$$D(n)(x)=\frac{\#\ \text{of times that a machine in}\ (n,2)\ \text{produces}\ x}{\#\ \text{of machines in}\ (n,2)}$$
(3)
The function D(n) is an algorithmic measure based on Levin’s Universal Distribution m(s), here approximated by finite means (Soler-Toscano et al. 2014); it can be used to approximate K(s), the algorithmic complexity of a string s, via the algorithmic Coding theorem as follows (Fig. 3):
$$\mathrm{CTM}(s)\sim -{\log}_2 D(n)(s)$$

where CTM stands for Coding Theorem Method. The greater the value of n used to calculate D(n)(s), the better the approximation to K(s).
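As a rough illustration, a toy CTM can be sketched in a few dozen lines of Python. The sketch below exhaustively runs every (n, 2) Turing machine on a blank tape under a fixed step budget, counts output frequencies, and applies the Coding theorem. It is a drastic simplification of the procedure of Soler-Toscano et al. (2014): the output convention (here, the tape segment visited by the head) and the treatment of machines exceeding the step budget are assumptions of this toy version:

```python
import itertools
import math
from collections import Counter

def run_tm(rules, max_steps):
    """Run a binary Turing machine from a blank (all-0) tape.
    Returns the visited tape segment as a string if it halts, else None."""
    tape, pos, state = {}, 0, 1          # states 1..n; state 0 means halt
    visited = {0}
    for _ in range(max_steps):
        write, move, nxt = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        visited.add(pos)
        state = nxt
        if state == 0:                   # halted: read off the visited cells
            lo, hi = min(visited), max(visited)
            return "".join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
    return None                          # treated as non-halting in this toy

def ctm(n=2, max_steps=100):
    """Approximate CTM(s) = -log2 D(n)(s) by brute-force enumeration of (n, 2)."""
    keys = [(s, b) for s in range(1, n + 1) for b in (0, 1)]
    actions = [(w, m, q) for w in (0, 1) for m in (-1, 1) for q in range(n + 1)]
    counts, total = Counter(), 0
    for combo in itertools.product(actions, repeat=len(keys)):
        out = run_tm(dict(zip(keys, combo)), max_steps)
        total += 1
        if out is not None:
            counts[out] += 1
    return {s: -math.log2(c / total) for s, c in counts.items()}
```

For n = 2 this enumerates 12^4 = 20,736 machines in seconds; strings produced by many machines receive low CTM values (low estimated complexity), while rarely produced strings receive high values. The published distributions were instead built from trillions of machines with larger n, with step budgets informed by Busy Beaver values.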

D(n) can also be seen as averaging K over a large set of possible computer languages, each represented by a small Turing machine, in order to reduce the possible impact of the constant from the invariance theorem.

The CTM not only allows us to build an empirical distribution of computer program outputs from smaller to larger sizes; once a string is generated for the first time among all the computer programs of the smallest size, we also know which Turing machine is the smallest one producing that string, as well as the runtime of that Turing machine. We take this runtime as an estimation of another seminal measure of complexity, Bennett’s Logical Depth (LD), as approximated by CTM. LD is a measure of sophistication that assigns low complexity to both simple and random objects, and is defined as the computation time it takes to decompress an object from its (near-)greatest compressed version.

Once the tools and methods are to hand, one can ask to what extent the world really runs on a universal mechanical computer, in terms of algorithmicity (Zenil 2011; Zenil and Delahaye 2010): how much the outcome of a process resembles the way in which outcomes would be sorted according to the universal distribution; and of programmability (Zenil 2013, 2014a, b, 2015): how much such a process can be reprogrammed at will. The more reprogrammable, the more causal, given that a random object cannot be reprogrammed in any practical way other than by changing every possible bit.

A Hybrid Divide-and-Conquer Method

The Block Decomposition Method (BDM) allows us to extend the power of the Coding theorem by using a combination of Shannon entropy and algorithmic probability. The idea is to divide a piece of data into segments for which we have algorithmic probability approximations, and then count the number of times that a local regularity occurs (Fig. 5). The power of LD can also be extended by a BDM-like formula, a multiplicative variation of the BDM formula.
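In a form close to its published definition, BDM sums the CTM value of each distinct block plus log2 of the block’s multiplicity, the latter being the Shannon-style contribution. A minimal sketch in Python, where the CTM lookup table is made up purely for illustration (real tables come from precomputed CTM runs):

```python
import math
from collections import Counter

def bdm(s, ctm_table, k=4):
    """Block Decomposition Method:
    BDM(s) = sum over distinct blocks b of CTM(b) + log2(multiplicity of b)."""
    blocks = [s[i:i + k] for i in range(0, len(s) - k + 1, k)]
    freqs = Counter(blocks)
    return sum(ctm_table[b] + math.log2(n) for b, n in freqs.items())

# Hypothetical CTM values: a regular block is cheaper than a random-looking one.
toy_table = {"0000": 4.0, "0110": 10.0}

regular = bdm("0000" * 8, toy_table)  # one cheap block repeated: 4 + log2(8)
mixed   = bdm("0110" * 8, toy_table)  # one costly block repeated: 10 + log2(8)
assert regular < mixed
```

Repetitions of a block add only logarithmically to the total, so local algorithmic regularities found by CTM dominate the estimate while global redundancy is accounted for in entropy fashion.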

What would these CTM and BDM methods do with, for example, the binary digital expansion of a mathematical constant such as π, which we know to be of low complexity but which is also random-looking? We know statistical approaches would fail to characterize it. But we need to make a distinction between principle and practice. The Coding Theorem method (CTM) can in principle, and to some extent in practice, find the algorithmic nature of π. CTM would eventually find that subsequences of π have lower complexity than that estimated by, e.g., Shannon Entropy by an uninformed observer (one with no access to knowledge of the deterministic nature of π). Such an observer would assign maximal entropy to the digits of π because they look random (there are no apparent repetitions at any level of granularity), whereas CTM (and thus BDM) would asymptotically find π to be of low algorithmic complexity, because it can be generated by short mathematical formulas implemented by short computer programs.

A Sequence of Causal Models Is Itself a Causal Model

Trillions of small computer programs were run to build an empirical probability distribution (Delahaye and Zenil 2012; Soler-Toscano et al. 2014) that approximates what in the theory of algorithmic probability is called the universal distribution (Levin 1974), used as the prior in all sorts of Bayesian calculations that require framing hypotheses about the generating models or mechanisms that produce an object. Take a binary string of a million 1s. The theory tells us that its most likely mechanistic generator is not a program that prints the million-character string literally, but a short recursive program that iterates print(1) a million times, in this case essentially the shortest possible program. For this sequence the most likely causal generating program can easily be guessed, but for most sequences this is very difficult. However, the harder it is to guess, the more random the sequence, which itself gives us a clue as to the causal content of the object.
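The point can be made concrete by comparing the lengths of the two descriptions themselves. A toy comparison in Python (program-text length is a crude stand-in for program-size complexity):

```python
# Two programs with the same output: a literal versus a loop.
literal_program = "print('" + "1" * 10**6 + "')"  # grows with the output
loop_program    = "print('1' * 10**6)"            # stays a fixed, short size

assert len(literal_program) > 10**6  # about a million characters
assert len(loop_program) < 25        # a constant-size description
```

As the output grows, the loop’s description length stays essentially constant while the literal’s grows linearly, which is why algorithmic probability concentrates on the short generative program.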

The combination of the CTM and BDM algorithms can also be seen as learning the parameters of small Turing machines. The size of the reference machine will always be a concern, given that the invariance theorem does not guarantee fast convergence. However, we have also shown (Soler-Toscano et al. 2014) that the method is well-behaved and appears to converge: as greater rule spaces are explored with increasing Turing machine size, the output distribution converges (Fig. 4).

BDM shows that the best approach to approximating K is to make the best use of both methods, using each where relevant and/or practical, thus combining Entropy and algorithmic complexity (Fig. 5). One can decide to capture more algorithmic content with CTM alone, but this becomes infeasible (it is ultimately uncomputable), at which point one switches to Entropy, especially if one has already found that an object is of low Entropy, and thus of low K. There is no need to waste resources only to rediscover that it is of low K.

For example, while compression algorithms may fail to find the algorithmic content of π, our method may give a small clue that π is not truly random (of maximal K), as Entropy would otherwise suggest. Applying these more powerful measures across several strings, and not only testing against π, provides a handle that one would never obtain with statistical tools. If a segment of π is too long, BDM will behave asymptotically like Entropy alone, unless one updates the precomputed data from CTM. BDM is optimal for short strings, but for medium-length strings, from 100 to 1,000 bits, it carries the CTM signal and finds small signs of algorithmic content, thereby complementing lossless compression algorithms, covering the full range of lengths and different information-content approximations, and showing that the best approaches are hybrid algorithms.

Given that the model machine we use is universal, we know that some machines will implement algorithms generating subsequences of π, because π is of low K and thus there are very short programs that can generate it (in full, let alone in pieces). To properly compare Entropy and K approximated by CTM/BDM, one has to normalize maximal Entropy against maximal K and look at the distributions (see Fig. 4, bottom). One then sees that Entropy approximates the classical Bernoulli distribution, while K estimated with BDM approximates an exponential, just as the theory predicts (Levin’s universal distribution or semi-measure (Levin 1974)). The distributions do not yet yield the same ordering of strings, but one can see the strings that have maximal Entropy yet are not of maximal K – these are the ones gained by applying our methods versus Entropy or compression. Running a test on very long sequences of π without updating CTM eventually leads BDM to lose the algorithmic edge given by CTM (Zenil et al. 2016), unless one reruns CTM for (exponentially) longer periods.

Each small machine constitutes a mechanistic model, because the Turing machine is mechanistic in nature and explains the small piece of data that it outputs upon halting. By putting all the small Turing machines together, each explaining a small piece of the larger system, one can take the whole sequence of models and use it as a model approximation for the whole. If the small patches are causal and explained by small programs (Turing machines), the sequence will be shorter than a maximal-K sequence (one similar in size to the object itself), thus providing not only a local approximation of algorithmic complexity but also insight into the causal nature (or otherwise) of the larger system.

Conclusion

We have explained how computability and algorithmic complexity are now at the center of the challenge of causality in science, and how, although algorithmic complexity is more difficult to evaluate numerically, new methods have been advanced to estimate it. Only universal measures of complexity can characterize any (computable) feature, remain invariant to object description, and encode, in principle, every possible computable function, and therefore, potentially, every deterministic system following classical mechanics. Such a measure is extremely powerful, tantamount to a computational theory of everything, and numerical methods to approximate it give us an edge in finding key indications of causal content in a system, which is germane to understanding aspects of evolving causal systems and to developing algorithmic calculi with which to manipulate them.

Bibliography

1. Delahaye J-P, Zenil H (2012) Numerical evaluation of algorithmic complexity for short strings: a glance into the innermost structure of randomness. Appl Math Comput 219(1):63–77
2. Gauvrit N, Singmann H, Soler-Toscano F, Zenil H (2016) Algorithmic complexity for psychology: a user-friendly implementation of the coding theorem method. Behav Res Methods 48(1):314–329
3. Gauvrit N, Soler-Toscano F, Zenil H (2014) Natural scene statistics mediate the perception of image complexity. Vis Cogn 22(8):1084–1091
4. Gauvrit N, Zenil H, Delahaye J-P, Soler-Toscano F (2014) Algorithmic complexity for short binary strings applied to psychology: a primer. Behav Res Methods 46(3):732–744
5. Gauvrit N, Zenil H, Soler-Toscano F, Delahaye J-P, Brugger P (2017) Human behavioral complexity peaks at age 25. PLoS Comput Biol 13(4):e1005408
6. Gauvrit N, Zenil H, Tegnér J (2017) The information-theoretic and algorithmic approach to human, animal and artificial cognition. In: Dodig-Crnkovic G, Giovagnoli R (eds) Representation and reality: humans, animals and machines. Springer, New York
7. Chaitin GJ (1966) On the length of programs for computing finite binary sequences. J ACM 13(4):547–569
8. Kirchherr W, Li M, Vitányi P (1997) The miraculous universal distribution. Math Intell 19:7–15
9. Levin LA (1974) Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Probl Peredachi Inf 10(3):30–35
10. Schmidhuber J (2002) Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit. Int J Found Comput Sci 13(4):587–612
11. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656
12. Soler-Toscano F, Zenil H, Delahaye J-P, Gauvrit N (2014) Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PLoS One 9(5):e96223
13. Solomonoff RJ (1964) A formal theory of inductive inference. Part i. Inf Control 7(1):1–22
14. Tegnér J, Zenil H, Kiani NA, Ball G, Gomez-Cabrero D (2016) A perspective on bridging scales and design of models using low-dimensional manifolds and data-driven model inference. Phil Trans R Soc A 374(2080):20160144
15. Wolfram S (2002) A new kind of science. Wolfram Media, Champaign
16. Zenil H (2011) The world is either algorithmic or mostly random. Winning 3rd place in the international essay contest of the FQXi
17. Zenil H (2013) A behavioural foundation for natural computing and a programmability test. In: Dodig-Crnkovic G, Giovagnoli R (eds) Computing nature. Springer, New York, pp 87–113
18. Zenil H (2014a) Programmability: a Turing test approach to computation. In: De Mol L, Primiero G (eds) Turing in context, Contactforum. Koninklijke Vlaamse Academie van België voor Wetenschappen en Kunsten (Belgian Academy of Sciences and Arts)
19. Zenil H (2014b) What is nature-like computation? A behavioural approach and a notion of programmability. Philos Technol 27(3):399–421
20. Zenil H (2015) Algorithmicity and programmability in natural computing with the game of life as in silico case study. J Exp Theor Artif Intell 27(1):109–121
21. Zenil H, Delahaye J-P (2010) On the algorithmic nature of the world. In: Dodig-Crnkovic G, Burgin M (eds) Information and computation. World Scientific, London
22. Zenil H, Kiani NA, Tegnér J (2016) Methods of information theory and algorithmic complexity for network biology. Semin Cell Dev Biol 51:32–43
23. Zenil H, Kiani NA, Tegnér J (2017) Low algorithmic complexity entropy-deceiving graphs. Phys Rev E 96(1):012308
24. Zenil H, Soler-Toscano F, Dingle K, Louis AA (2014) Correlation of automorphism group size and topological properties with program-size complexity evaluations of graphs and complex networks. Physica A Stat Mech Appl 404:341–358
25. Zenil H, Soler-Toscano F, Kiani NA, Hernández-Orozco S, Rueda-Toicen A (2016) A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity. arXiv preprint arXiv:1609.00110
26. Zenil H (2017) Algorithmic data analytics, small data matters and correlation versus causation. In: Ott M, Pietsch W, Wernecke J (eds) Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data (Computability of the world? Philosophy and science in the age of Big Data). Springer, pp 453–475