6 days ago
AF - Probabilistic Payor Lemma? by Abram Demski
Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Probabilistic Payor Lemma?, published by Abram Demski on March 19, 2023 on The AI Alignment Forum. Epistemic status: too good to be true? Please check my math. We've known for a while that Löb's theorem fails when proof is relaxed to probabilistic belief. This has pros and cons. On the pro side, it means there's no Löbian Obstacle to probabilistic self-trust. On the con side, it means that some Löb-derived insights for proof-based decision theory don't translate to probabilistic decision theory, at least not as directly as one might hope. In particular, it appeared to dash hopes for probabilistic generalizations of the "Löbian handshake" for cooperation. Recently, Andrew Critch wrote about the Payor Lemma, which allows for a very similar "modal handshake" without Löb's Theorem. The lemma was proved using the same modal assumptions as Löb's, so on the surface it may appear to be just a different method to achieve similar results, whose main advantage is that it is much easier to prove (and therefore explain and understand) than Löb's Theorem. But, a natural question arises: does Payor's Lemma have a suitable probabilistic version? I'll give an affirmative proof; but I haven't confirmed that the assumptions are reasonable to my satisfaction. Setup Let L be a language in first-order logic, expressive enough to represent its sentences s∈L as quoted terms ┌s┐, eg, through Gödel numbering; and with a probability function symbol on these terms, p(┌s┐), which can be equated with (some representation of) rational numbers, e.g. p(┌⊤┐)=1, p(┌s┐)=12, etc. I also assume the system can reason about these rational numbers in the basic ways you'd expect. For all a,b∈L and all r∈Q, we have: If ⊢a, then ⊢p(┌a┐)=1. If ⊢ab, then ⊢p(┌a┐)≤p(┌b┐). (These assumptions might look pretty minimal, but they aren't going to be true for every theory of self-referential truth; more on this later.) Let B(s) abbreviate the sentence p(┌s┐)>c for any s and some globally fixed constant c strictly between 0 and 1. This is our modal operator. Some important properties of B: Necessitation. If ⊢s, then ⊢B(s), for any s. Proof: Since ⊢s implies ⊢p(s)=1, and c∈(0,1), we have ⊢p(┌s┐)>c,, which is to say, ⊢B(s). [End proof.] Weak distrubitivity. If ⊢xy, then ⊢B(x)B(y). Proof: When ⊢xy, we have ⊢p(y)≥p(x), so ⊢p(x)>cp(y)>c. [End proof.] (Regular distributivity would say B(xy) implies B(x)B(y). The assumption ⊢xy is stronger than B(xy), so the above is a weaker form of distributivity.) Theorem Statement If ⊢B(B(x)x)x, then ⊢x. Proof ⊢x(B(x)x), by tautology (a(ba)). So ⊢B(x)B(B(x)x), from 1 by weak distributivity. Suppose ⊢B(B(x)x)x. ⊢B(x)x from 2 and 3. ⊢B(B(x)x) from 4 by necessitation. ⊢x from 4 and 1.[End proof.] Discussion Comparison to Original Proof The proof steps mirror Critch's treatment very closely. The key difference is step 2, IE, how I obtain a statement like ⊢□x□(□xx). Critch uses distributivity, which is not available to me: B(ab)(B(a)B(b))? Suppose B(ab), ie, p(┌ab┐)>c. Rewrite p(┌b∨¬a┐)>c. Now suppose B(a), that is, p(┌a┐)>c. p(┌¬a┐)<1−c. p(┌b∨¬a┐)≤p(┌b┐)+p(┌¬a┐)
p(┌b∨¬a┐)−1+c>c−1+c. p(┌b┐)>2c−1. So we only get: Bc(ab)(Bc(a)Bd(b)), where Br(s) abbreviates p(┌s┐)>r and we have d=2c−1. So in general, attempted applications of distributivity create weakened belief operators, which would get in the way of the proof (very similar to how probabilistic Löb fails). However, the specific application we want happens to go through, due to a logical relationship between a and b; namely, that b is a weaker statement than a. This reveals a way in which the assumptions for Payor's Lemma are importantly weaker than those required for Löb to go through. So, the key observation I'm making is that weak distributility is all that's needed for Payor, and seems much more plaus...