Discover and read the best of Twitter Threads about #Bookofwhy

Most recents (24)

I agree. I don't know of any causal quantity that remains undefined in SCM, assuming of course that we take the relation "listens to" as a primitive for defining SCM. As for RCT, the proof in #Bookofwhy is the only defensible proof I know that RCT delivers (asymptotically)
1/3
the Average Causal Effect (ACE) of treatment on outcome. It's strange indeed that ACE isn't treated in the RCT literature as the ultimate goal of the RCT exercise. Taking "comparison" as the goal of RCTs hardly justifies their "gold standard" stature, not to mention expense.
2/3
This is even less justified when the goal is cautiously declared to be "comparison between the two groups participating in the trial" with no reference to a policy decision, nor to the target population eventually affected by the conclusion of the RCT study.
3/3
Read 3 tweets
You hit the nail on the head: Statistics 101 book from the future. This is precisely what I had in mind in writing #Bookofwhy. So, here is an even simpler, more compelling example of why combining observational and experimental data should inform us about individuals' behavior:
1/3
It's called "The Curse of Free-will and the Paradox of Inevitable Regret" ucla.in/2N6x36Q.
Here, the reason that free choice turns out disastrous is familiar to all of us: Laziness. We also understand why the consequence of this tempting quality of human nature
2/3
can only be revealed by comparing observational and experimental data. The dictatorial nature of the latter proves its virtue by comparing the average performance of the oppressed to that of the un-oppressed (or un-disciplined). Any one study, alone, cannot do it.
3/3
Read 3 tweets
1/ It might be useful to look carefully at the logic of proving an "impossibility theorem" for NN, and why it differs fundamentally from Minsky/Papert perceptrons. The way we show that a task is impossible in Rung-1 is to present two different data-generating models (DGM) that
2/ generate the SAME probability distribution (P) but assign two different answers to the research question Q. Thus, the limitation is not in a particular structure of the NN but in ANY method, however sophisticated, that gets its input from a distribution, lacking interventions.
3/ This is demonstrated so convincingly through Simpson's paradox ucla.in/2Jfl2VS,
even pictorially, w/o equations, in #Bookofwhy
Thus, it matters not if you call your method NN or "hierarchical nets" or "representation learning", passive observations => wrong answers
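To make the two-model argument concrete, here is a minimal sketch in Python. The two toy models (`model_a`, `model_b`) and all helper names are illustrative assumptions, not taken from the thread or from #Bookofwhy. They generate the same observational distribution P(X, Y), yet assign different answers to the query Q = P(Y=1 | do(X=1)).

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)

def model_a(u, do_x=None):
    """Hypothetical DGM A: X causes Y (X := U, Y := X)."""
    x = u if do_x is None else do_x
    y = x
    return x, y

def model_b(u, do_x=None):
    """Hypothetical DGM B: common cause only (X := U, Y := U)."""
    x = u if do_x is None else do_x
    y = u  # Y ignores X entirely
    return x, y

def distribution(model, do_x=None):
    """Exact joint distribution of (X, Y), with U ~ Bernoulli(1/2)."""
    dist = Counter()
    for u in (0, 1):
        dist[model(u, do_x)] += HALF
    return dict(dist)

def prob_y1(dist):
    """P(Y = 1) under the given joint distribution."""
    return sum(p for (_, y), p in dist.items() if y == 1)

# Rung 1: both models yield the same observational distribution.
obs_a = distribution(model_a)
obs_b = distribution(model_b)

# Rung 2: the same models disagree once we intervene with do(X = 1).
q_a = prob_y1(distribution(model_a, do_x=1))
q_b = prob_y1(distribution(model_b, do_x=1))
```

Since `obs_a == obs_b`, any method fed only samples from P must return the same answer for both models, and so must be wrong about Q on at least one of them.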
Read 4 tweets
1/10 When I saw the thread below, about the high probability of bias in data collection/interpretation during the #COVID19 pandemic, retweeted by @yudapearl, who received the Turing Award (known as the Nobel of computing) for his work on causality, it dawned
2/10 on me that I had discussed an interesting statistical phenomenon in a lecture I gave at @uofl, at the invitation of @DibsonG, and then forgot to mention it anywhere else. I even considered writing a post about it on my blog, but it slipped by.
3/10 As those who follow me might imagine, part of the lecture's introduction revolved around the difficulty we usually have in accepting that correlation does not imply causation, and the danger of conditioning on every variable in the study to try to obtain that implication. At the beginning of the
Read 10 tweets
1/ I'm glad, Sean, that our brief exchange has resulted in your great clarification of the issues, from which I have learned a lot. Two thoughts come immediately to mind: (1) It is a blessing that we can enjoy a division of labor between CI and statistics, the former generates
2/ causal estimands, the latter estimates them. Note though that the former is not totally oblivious to the type of data available. Different types of data will result in different estimands, e.g., experimental vs. observational, corrupted by missingness or by proxies or by
3/ differential selection etc. (2) I don't buy the mystification of "collecting adequate data". I am in the business of automating a scientist, so, if there is human judgement involved in the data collection task, we do not stop here and surrender it to humans. We see it as an
Read 4 tweets
1/ I beg to differ. For readers interested in my unbiased opinion, this review is ill-informed, ill-motivated and misleading. Bent on showing that "there is more to it than #Bookofwhy", sweating hard and unsuccessfully to find any such "more to it", the reviewers neglect to tell
2/ readers what they would miss not reading the book. The quote: "causal inquiry cannot be reduced to a mathematical exercise nor automatized." is truly indicative of the occult "more to it" agenda of the review, and its failure to appreciate how much has been automated already.
3/ Another unbiased opinion I have regarding "the particular value of randomized experiments". I challenge anyone to show me a clearer/deeper exposition of RCTs and their "particular value" than that given in #Bookofwhy ch. 4, The Skillful Interrogation of Nature, Why RCTs Work?
Read 3 tweets
1/ Legality aside, I would assess how likely I am to encounter a complication never seen by the AI surgeon, which a human surgeon can handle through her knowledge of anatomy. It actually happened to me, when a famous surgeon quoted us 90% success record but, when asked
2/ how many cases like THAT have you seen? He said: "None. I have never seen a lung as bad as this." We cancelled the surgery, and my wife is still alive. Remarkably, counterfactual logic permits us to bound the probability that a given patient will benefit from a given
3/ treatment, once we have a combination of both experimental and observational data. See tight bounds on PNS (prob. of necessary and sufficient causation) in ucla.in/2Nfphba. Thus, smart ML algorithms will be able to tell a patient how trustworthy they are. #Bookofwhy
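A sketch of how such bounds can be computed, assuming the bounds meant here are the Tian-Pearl bounds on PNS for a binary treatment X and a binary outcome Y. The function names and the numbers are hypothetical, chosen only to be mutually consistent, and are not taken from the linked paper.

```python
from fractions import Fraction as F

def pns_bounds_experimental_only(p_y_do_x, p_y_do_xp):
    """Bounds on PNS from experimental data alone.

    p_y_do_x  : P(y | do(x)),  recovery rate under treatment
    p_y_do_xp : P(y | do(x')), recovery rate without treatment
    """
    lower = max(0, p_y_do_x - p_y_do_xp)
    upper = min(p_y_do_x, 1 - p_y_do_xp)
    return lower, upper

def pns_bounds(p_y_do_x, p_y_do_xp, p_xy, p_xyp, p_xpy, p_xpyp):
    """Bounds on PNS combining experimental and observational data.

    p_xy, p_xyp, p_xpy, p_xpyp are the observational joint probabilities
    P(x, y), P(x, y'), P(x', y), P(x', y').
    """
    p_y = p_xy + p_xpy  # observational P(y)
    lower = max(0,
                p_y_do_x - p_y_do_xp,
                p_y - p_y_do_xp,
                p_y_do_x - p_y)
    upper = min(p_y_do_x,
                1 - p_y_do_xp,
                p_xy + p_xpyp,
                p_y_do_x - p_y_do_xp + p_xyp + p_xpy)
    return lower, upper

# Hypothetical study results (exact fractions avoid rounding noise).
exp_only = pns_bounds_experimental_only(F(1, 2), F(3, 10))
combined = pns_bounds(F(1, 2), F(3, 10),
                      p_xy=F(1, 10), p_xyp=F(4, 10),
                      p_xpy=F(1, 10), p_xpyp=F(4, 10))
```

With these made-up numbers the experimental data alone place PNS between 1/5 and 1/2, while adding the observational data raises the lower bound to 3/10, which is the sense in which the combination is more informative than either study alone.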
Read 3 tweets
1/13 Today I will talk to you about two concepts that are less independent than they seem to some: causality and prediction. I confess it is even strange to treat them as two separate, or different, things, and I hope to convince you by the... #IA #AI #ML #causality #bookofwhy
2/13 end of this thread that this view is well founded. In the 1950s, Jacob Yerushalmy conducted a study following 15 thousand children in the San Francisco Bay Area. To Yerushalmy's surprise, and contrary to what was already becoming well established at the time (that smoking
3/13 was bad for one's health), his results indicated that low-birth-weight babies of smoking mothers were more likely to survive than low-birth-weight babies of non-smoking mothers. It was not a causal inference study, it was just prediction, some might say.
Read 14 tweets
I summarize this discussion thus: Progress in our
century amounts to building computational models of mental processes that have escaped scrutiny under slippery terms such as "design" "discovery" "background knowledge"
"familiarity" "fieldwork", etc. Saying "DAGs are not enough"
2/ produces no progress until one is prepared to propose a mathematical model for what IS enough. The IV criterion I presented is necessary for justification, so keep it in mind and teach it to natural experimentalists who should be jubilant seeing their IV's justified. 2/3
3/3 We should always be open to extensions and refinements, but mystification holds us back. We had enough of that in the 20th century. #Bookofwhy
Read 3 tweets
"How to extend causal inferences from an RCT to
a target population?" I am retweeting this thread since many readers ask this question, and many know of the comprehensive solution described in ucla.in/2wbH488 and in ucla.in/2Jc1kdD
using graphical models. 1/2
I was surprised therefore to read Miguel's proposal to recast the whole question in the opaque language of PO, where assumptions are so far removed from scientific understanding. A face-to-face comparison of the two approaches is given here: ucla.in/2L6yTzE
#Bookofwhy.
The thread itself, including Miguel's proposal, can be found here
#Bookofwhy
Read 3 tweets
1/ I am retweeting, because the question: "Compare the PO vs. DAG approaches on applied, 'real-life' data" comes up once a week. Here it is: Take any applied paper labeled "PO" and re-label it "DAG" after adding to it three tiny ingredients: (1) Show how the modeling assumptions
2/ emerge organically from our scientific understanding of the problem, (2) Show if there is anything we can do if it doesn't thus emerge, and (3) test whether those assumptions are compatible with the available data. Once added, Bingo! We have an "applied DAG" approach,
3/3 demonstrated on "real life" data, that can be compared to the "applied PO approach" or "applied quasi-experimental approach" on all counts, especially on: (1) transparency, (2) Power, and (3) Testability. #Bookofwhy
Read 3 tweets
1/3 This is an excellent paper, that every regression analyst should read. Primarily, to appreciate how problems that have lingered in decades of confusion can be untangled today using CI tools. What I learned from it was that the "suppressor surprise" is surprising even when
2/3 cast in a purely predictive context: "How can adding a second lousy predictor make the first a better predictor?" Evidently, what people expect from predictors clashes with the logic of regression slopes. The explanation I offered here ucla.in/2N8mBMg (Section 3)
3/3 shows how the phenomenon comes about, but the reason for the clash is still puzzling: What exactly do people expect from predictors, and why? #Bookofwhy
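The suppressor surprise can be reproduced with a few lines of arithmetic. The sketch below is an illustrative assumption, not the construction used in the linked paper: take Y = S, X1 = S + N and X2 = N, with S and N independent and of unit variance. X2 alone is a useless predictor of Y (zero correlation), yet adding it turns X1 from a mediocre predictor into a perfect one.

```python
import numpy as np

# Population moments for the hypothetical model Y = S, X1 = S + N, X2 = N.
cov_xx = np.array([[2.0, 1.0],   # Var(X1),     Cov(X1, X2)
                   [1.0, 1.0]])  # Cov(X2, X1), Var(X2)
cov_xy = np.array([1.0, 0.0])    # Cov(X1, Y),  Cov(X2, Y)
var_y = 1.0

def population_ols(cov_xx, cov_xy, var_y):
    """Regression slopes and R^2 computed from population covariances."""
    beta = np.linalg.solve(cov_xx, cov_xy)
    r_squared = float(beta @ cov_xy) / var_y
    return beta, r_squared

# Y regressed on X1 alone, then on X1 and X2 together.
beta_x1, r2_x1 = population_ols(cov_xx[:1, :1], cov_xy[:1], var_y)
beta_both, r2_both = population_ols(cov_xx, cov_xy, var_y)
```

In this toy model the slope of X1 doubles (from 0.5 to 1) and R^2 rises from 0.5 to 1 once X2 enters, because X2 subtracts out exactly the noise that contaminates X1.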
Read 6 tweets
1/ Continuing our exploration of "Reduced Form Equations" (RFE) and what they mean to economists, let me address some hard questions that CI analysts frequently ask. Q1: Isn't a RFE just a regression equation? A1. Absolutely Not! A RFE carries causal information, a regression
2/ equation does not. Q2: Isn't a RFE just a structural equation? A2. No! Although a RFE carries causal information (much like a structural equation) the RFE may not appear as such in the structural model; it is derived from many such equations through functional composition.
3/ (The output-instrument in the IV setting is a typical example). Q3: One may derive many equations from a structural model; what makes a RFE so special to deserve its own name? A3: It is exceptional because it comes with a license of identification by OLS. This is not usually
Read 5 tweets
1/3 Econ. readers asked if they can get hold of those magical night-vision goggles that tell us which causal effects in an econ. model are identifiable by OLS (and how). The answer is embarrassingly simple: Consider
Model 2 in p. 163 of ucla.in/2mhxKdO
2/3 Take any two variables, say Z_3 and Y. If you can
find a set S of observed variables (e.g., S={W_1,W_2}),
non-descendants of Z_3, that block all back-door paths from Z_3 to Y, you are done; the coefficient a in the regression Y = a*Z_3 + b1*W_1 + b2*W_2
gives you the right
3/3 answer. A similar goggle works for a single
structural parameter (See page 84 of Primer ucla.in/2KYYviP). Duck soup. This re-begs the question whether the restricted notion of "reduced form" is still needed in 2019. @EconBookClub @lewbel #Bookofwhy.
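The goggle can be tried out on simulated data. The sketch below does not reproduce Model 2 of the cited paper; it assumes a hypothetical linear model in which W_1 and W_2 are the only common causes of Z_3 and Y, so S = {W_1, W_2} blocks every back-door path. The coefficients and the helper `ols_coefficients` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
TRUE_A = 1.5  # hypothetical causal effect of Z_3 on Y

# Hypothetical structural equations (not Model 2 from the paper).
w1 = rng.normal(size=n)
w2 = rng.normal(size=n)
z3 = w1 + w2 + rng.normal(size=n)
y = TRUE_A * z3 + 2.0 * w1 - 1.0 * w2 + rng.normal(size=n)

def ols_coefficients(y, *regressors):
    """Least-squares slopes of y on the regressors (intercept dropped)."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

# Regressing Y on Z_3 alone leaves the back-door paths open.
a_naive = ols_coefficients(y, z3)[0]

# Adding S = {W_1, W_2} blocks them; the Z_3 slope estimates a.
a_adjusted = ols_coefficients(y, z3, w1, w2)[0]
```

In this setup `a_adjusted` lands close to the true 1.5, while `a_naive` is pulled away from it by the confounding through W_1 and W_2 (the population value of the naive slope here is a + 1/3).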
Read 3 tweets
1/4 I would like to welcome the 500 new followers who
have joined us on Twitter since @SamHarrisOrg posted our conversation on his podcast. Welcome to the wonderful land of WHY and, please, be aware of what you are getting yourself into. Our main theme is the #Bookofwhy and
2/4 the way it attempts to democratize the science of cause and effect and apply it in artificial intelligence, philosophy, and the health and social sciences, see ucla.in/2KYvzau. We alert each other to new advances in causal reasoning and new methods of answering causal
3/4 questions when all we have are data, assumptions and the logic of causation. We also debate detractors and nitpickers who mistrust fire descending from Mt. Olympus for use by ordinary mortals. I spend time on such debates knowing that for every nitpicker there are dozens of
Read 4 tweets
I have just posted my 3rd comment on Lord's Paradox errorstatistics.com/2019/08/02/s-s…. Here, I returned to my original goal of empowering readers with an understanding of the origin of the paradox and what we can learn from it. Red herrings have been taking too much of our time. #Bookofwhy
It is important to add that the huge literature on Simpson's and Lord's paradoxes attests to a century of scientific frustration with a simple causal problem, deeply entrenched in our intuition, yet helplessly begging for a formal language to get resolved. I can count dozens of
red herrings thrown at this stubborn problem, to deflect attention from its obvious resolution. Why? Because the latter requires the acceptance of a new language - the hardest transition for adults. I am grateful for the opportunity to talk to thousands of young followers here
Read 5 tweets
Any discussion of "paradoxes" is really an exercise in psychology. Yet we, quantitative analysts, are trying to avoid psychology at all cost. We can't. We must explicate why two strong intuitions seem to clash, and the conditions under which our intuitions fail. See #Bookofwhy
That is why I am begging folks: "Please, do not tell me 'I am not entirely satisfied' before you tell me why you are surprised (by the paradox) ". I am proud that #Bookofwhy addresses this question (of "surprise") head on, before offering "a resolution".
Now, speaking specifically about Lord's paradox, the paradox was introduced to us in "asymptotic" terms (i.e., using distributions, not samples) and we were surprised. Is it likely that we can resolve our surprise by going to finite samples? Or to "block design"? #Bookofwhy
Read 3 tweets
1/ I am recommending this paper to every data analyst educated by traditional textbooks, which start with
regression equations, add and delete regressors, estimate and compare coefficients before and after
deletion, and then ask which coefficient has "causal interpretation"
2/ I was shocked to realize that the majority of data analysts today are products of this culture, trapped in endless confusion, with little chance to snap out of it, since journal editors, reviewers and hiring committees
are trapped in the same culture. The new PO framework does
3/ offer a theoretical escape route from this culture, through the assumption of "conditional ignorability" but, since it is cognitively formidable, practicing analysts must rely on regression arguments.
Keele et al examine a causal model (Fig.2) and ask: suppose we regress Y
Read 5 tweets
1/ I just read arxiv.org/pdf/1905.11374… and I agree with @suchisaria. Anyone concerned with stability (or invariance) should start with this paper to get a definition of what we are looking for and why. The Arjovsky et al. paper should be read with this perspective in mind, as an attempt to
2/ secure this sort of stability w/o having a model, but having a collection of varying datasets instead. The former informs us when the latter's attempts will succeed or fail. It is appropriate here to repeat my old slogan: "It is only by taking models seriously that we learn
3/ when they are not needed". I wish I could quote from Aristotle but, somehow, the Greeks did not argue with their Babylonian rivals, the curve-fitters. They just went ahead and measured the radius of the earth AS IF their model was correct, and the earth was round. #Bookofwhy
Read 3 tweets
1/3 While struggling to find a ladder or demarcation lines between rungs, Hand's paper called my attention to a comprehensive article by Shmueli (2010) "To Explain or to Predict?", rich with references and quotes, which states: "Although not explicitly stated in the methodology
2/3 literature, applied statisticians instinctively sense that predicting and explaining are different." projecteuclid.org/download/pdfvi…
Going through the quotes and references in Hand's paper, I believe they confirm Shmueli's verdict: "the statistical literature lacks a thorough
3/3 discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal." One may also note that, adding to a "thorough" discussion, the Ladder of Causation provides "formal distinctions" between those differences. #Bookofwhy
Read 3 tweets
1/4 Happy Anniversary! About a year ago, I started tweeting and posted this:
6.27.18 - Hi everybody, the intense discussion over The Book of Why
drove me to add my two cents. I will not be able to comment on every
tweet, but I will try to squeak where it makes a difference....
2/4 A year later, I can hardly believe the 2,300 Tweets behind me, 19.1K followers, and a pleasant sense of comradeship with the many inquisitive minds that have been helping me demystify the science of cause and effect. I have benefited immensely from seeing how causality is
3/4 bouncing from your angle and how it should be presented to improve on past faults. To celebrate our anniversary, I am providing a link to a searchable file with all our conversations (my voice) sorted chronologically web.cs.ucla.edu/~kaoru/jp-twee… Today I re-read selected sections
Read 4 tweets
1/3 Economists' "statistical discrimination," as it turns out, is both (1) the use of statistical associations to discriminate and (2) an attempt to define discrimination using statistical vocabulary alone. According to Phelps (1972), you discriminate whether you hire people by
2/3 education, race or zip code, as long as you base decisions on PREDICTED performance, rather than performance itself. So, all learning is
"discriminatory" for it is based on past experience which PREDICTS, yet is not equal to, the situation at hand. Conclusion: I was right
3/3 to suspect criteria entitled "statistical discrimination" and their ability to capture notions such as "fairness," in which causal relations play a major role. @JaapAbbring @steventberry #Bookofwhy
Read 3 tweets
1/3 If by "statistical discrimination" we mean the use of statistical associations in the decision, then it is perfectly harmonious with the causal definition of "discrimination" and I see nothing wrong with it. What makes me suspicious are attempts to DEFINE discrimination by
2/3 statistical criteria. Note that in your example, causal considerations are unavoidable, for if the observed characteristics causally produce/prevent necessary skills, our notion of discrimination would change. This sensitivity to causal relations is absent from the fairness
3/3 literature I have sampled thus far, and which seems to dominate recent discussions in ML. Note also that we now have the tools to define and manage criteria based on combined statistical+causal relations. BTW, the link you gave us is blocked. #Bookofwhy
Read 3 tweets
1/3 This paper identifies causal effects by proxies, a task shown feasible here ucla.in/2N5GNOK and here ucla.in/2N9icIX. Blocking back doors is a sufficient condition for identification, not necessary; it turns out that, under certain circumstances, a proxy
2/3 can replace a blocker Z. Going from ATE to ITE ("individual" treatment effect) is not really hard (theoretically) if by "individual" we mean "c-specific effects" where c is a set of characteristics marking the individual, namely X=x.
The nice thing about the paper you cited,
3/3 is that everything is spelled out in a language CI folks can understand, so that mysteries can be de-mystified. The authors should be commended. Effect-restoration was a breathtaking mystery to me in 2010, and it shows in the writing; I called it "far from obvious" #Bookofwhy
Read 3 tweets
