The likely implied thought here is that it would be linked to the Covid vaccines. However, I believe the most likely explanation is provided by this citation:
> To find out how many deaths actually occurred during the last two decades among FIFA players (2001-2020), we used Wikipedia - "List of association footballers who died while playing". To know how many cases occurred in 2021, we used the list collected by us in "Real-Time News" (which includes the cases noted in Wikipedia for 2021).
There is an obvious sampling bias, and they likely collected many more cases for 2021 than for other years. Looking at mortality statistics by cause at the national level would be much more rigorous.
My understanding after a quick read of the paper: you want to make a suction cup. The usual way is a rigid cup (think plastic) with a softer rubber-like ring around it. Near the ring you have atmospheric pressure (high) outside and vacuum (low) inside, so if the ring doesn't make perfect contact with the surface, air will leak in and ruin your vacuum. What they do instead is rotate a bit of water inside the suction cup; centrifugal force pushes it toward the cup's rim in a ring-like shape. This water ring will -- thanks to fluid-mechanics black magic -- have different pressures at its inner and outer surfaces. The inner pressure is necessarily the same as the vacuum, and you can spin it so that the outer pressure matches atmospheric. Hence, according to this paper, even if the rubber ring fails to make hermetic contact, air won't come in: at the rim of the cup, the pressure is the same outside (atmosphere) and inside (the outer edge of the water ring).
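Back-of-the-envelope, the pressure jump across such a spinning ring follows from rotating-frame statics (dp/dr = rho * omega^2 * r). A quick sketch, with entirely hypothetical cup dimensions (4-5 cm ring radii; none of these numbers are from the paper):

```python
import math

# Treat the water as being in rigid-body rotation, so dp/dr = rho * omega^2 * r.
def ring_pressure_difference(omega, r_inner, r_outer, rho=1000.0):
    """Pressure rise from the inner to the outer surface of the ring (Pa)."""
    return 0.5 * rho * omega**2 * (r_outer**2 - r_inner**2)

def omega_for_pressure(dp, r_inner, r_outer, rho=1000.0):
    """Spin rate (rad/s) needed for the ring to bridge a pressure gap dp."""
    return math.sqrt(2.0 * dp / (rho * (r_outer**2 - r_inner**2)))

# Hypothetical cup: water ring between 4 cm and 5 cm radius, bridging the
# 80 kPa gap between atmosphere outside and the vacuum inside.
omega = omega_for_pressure(80e3, 0.04, 0.05)
rpm = omega * 60 / (2 * math.pi)
```

With these made-up dimensions the required spin comes out to a few thousand rpm, which at least sounds mechanically plausible.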
Does the water on the innermost edge of the rotating ring, which is exposed to the vacuum, NOT vaporize because the absolute pressure is still above 20 kPa (100 kPa atmospheric - 80 kPa vacuum)? [0]
Will the seal slowly evaporate away or absorb into a porous surface like concrete?
Pretty much anyone over 40 doing a home video in their garage is going to come across like that. It's really unfortunate, but appearances do matter, a lot.
In this case, besides the appearances, what the guy is demonstrating in the videos seems legit.
Here's an earlier video where he explains what he believes is the effect at work, with a simple demonstration: https://youtu.be/I3g0CcLzC6I
It would be awesome if a youtuber like Steve Mould or Dustin from SmarterEveryDay did a video on this.
Thanks for your comments! I don't see how what is discussed here conflicts with the notation I introduced into the post, do you still believe there is a soundness issue in what I have written?
I'm just saying that the notation of say A * X|Y * B seemed unfamiliar to me. I only know conditional notation within a P(...). Or an expectation, etc. Apparently your way of writing is used by others as well, but it may be good to know that it is not fully rigorous.
Again, there are different people preferring different presentations. I as a student was often frustrated by abused notations and was often confused by such things when trying to understand something in detail. For a more cursory and "practical" understanding it could be good enough.
> it may be good to know that it is not fully rigorous
What is the problem with A|B=b being a random variable? (Apart from your unfamiliarity with the concept, I mean.)
Edit: I'm not saying there are no problems; I'm asking what you think the problem is. There is no problem in the discrete case. In the continuous setting things are indeed more complicated (but if the limiting process is well defined there are no issues).
Note that the same lack of rigour that you find in conditional random variables affects conditional probabilities. If you can accept the latter there is no reason to reject the former.
A random variable is a different concept from a distribution. For me personally it is helpful to keep them separate, but I can see that others may not care about the complete conceptual picture.
In the PDF file linked above I can see conditional probabilities, conditional distributions and conditional expectation etc, which are all valid and rigorous. I can see that the author thinks it's a good idea to merge these into a single concept of conditional random variable for didactic reasons, but that's not a rigorous concept.
Practically, if you have two random variables then you can take their joint distribution. What would be the joint distribution of (A|B) and (C|D)? For actual random variables it's simple: you can take intersections in event space, but a "conditional random variable" does not correspond to any subset of the event space.
Very simply speaking (this is my working model, not the exact precise math definition which involves a lot of measure theory): in probability theory we have an event space containing atomic events that cover all possible outcomes for the whole experiment/observation. A random variable is a function that maps each such potential (atomic) event to a number. That's right: the random variable is the function itself, not to be confused with the mass function, which maps a number to a probability.
Conditional probability P(A|B) is an expression defined to mean P(A,B)/P(B). That's a clear definition. I have yet to see an actual definition of a conditional random variable.
Again, disclaimer 1: I can see the practicality of disregarding formality. Still I argue this is best done only when you do know better but it would be tedious to be technically correct all the time. But as a beginner I find it more useful to keep track of the correct concepts. For example not distinguishing random variables and distributions can be very confusing when considering more advanced things, like mutual information and KL-divergence. The former operates on random variables, the latter on distributions. I remember this was a difficult realization for me because the material we used didn't emphasize the difference enough, probably in the name of practicality.
> Practically, if you have two random variables then you can take their joint distribution.
If they are defined in the same sample space.
> a "conditional random variable" does not correspond to any subset of the event space
I would say it's exactly the other way around, the domain of a "conditional random variable" is a subset of the domain of the "unconditioned" random variable (the subset where the conditioning holds).
I think it will help if you think in terms of conditioning on (for example, a coarser sigma algebra). You would get another random variable that is measurable on the sigma algebra you conditioned on. If that is coarser so would be the new function you obtained by conditioning.
Let's talk about a fair dice roll to make it concrete: let the rolled number be X and let E be the event that we rolled an even number. P(X=6|E) = 1/3. P(X|E) is a distribution where 1,3,5 have 0 probability mass and 2,4,6 have 1/3 each.
If we consider X|E as a random variable, what is its value if we roll an odd number? Undefined? What does that mean? Random variables always have some value.
Sure you can build a new event space (sigma algebra) but then you can't use random variables over the original one.
Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces. Note that this is not the same as P(X,Y | E). The latter is simply a conditional probability, without any concept of "conditional random variables".
Again, this is totally obvious to people who have experience with probabilities, but could be confusing to students. Such cases are where students who try to understand the details may be left more confused than students who just want to get the main idea.
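For what it's worth, the dice example can be checked directly from the definition P(A|B) = P(A,B)/P(B); a minimal sketch:

```python
from fractions import Fraction

# Fair die: sample space {1,...,6}, each atomic event has mass 1/6.
space = {s: Fraction(1, 6) for s in range(1, 7)}

def P(event):
    """Probability of an event, i.e. a subset of the sample space."""
    return sum(space[s] for s in event)

E = {2, 4, 6}                       # "rolled an even number"
p_6_given_E = P({6} & E) / P(E)     # P(X=6|E) = P(X=6, E) / P(E) = 1/3

# The conditional distribution P(X|E): zero mass on odd outcomes, 1/3 on even.
cond = {s: (space[s] / P(E) if s in E else Fraction(0)) for s in space}
```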
Sure you can. The TLDR would be "piecewise constant projection"
I think picking up a standard graduate probability book will clear this up better than any long comment trail. There are no problems defining a coarser sigma algebra using an original one and then defining a function measurable on the new sigma algebra. Note this continues to be an r.v. in the original space as measurability is preserved. A consistent definition of the values of the conditioned r.v. would be the piecewise constant approximation of the original r.v. over the indivisible elements of the coarser sigma algebra.
Let me try another route.
You seem to be accepting of a conditional expectation. Now what is a conditional expectation if not a function? All we need is for that function to be measurable with respect to the new sigma algebra, and that's ensured by construction. Hope it helps some.
> I think picking up a standard graduate probability book will clear this up better than any long comment trail.
Can you recommend one? I just picked up Probability and Measure by Billingsley and it does not mention "conditional random variable" a single time in over 600 pages. It does have a lot of "conditional probability", "conditional distribution", "conditional expectation" etc.
> You seem to be accepting of a conditional expectation.
Conditional expectation is defined in terms of conditional probabilities, and those are in turn explicitly defined as P(A|B)=P(A,B)/P(B), so there's nothing not to accept.
Billingsley is pretty darn good. It might have left the connection as a dotted line, given that the notion is no different from conditional expectation. The only connection you have to make is that a conditional expectation is a function and a random variable. You must have seen an expectation taken of a conditional expectation. That should convince you that conditional expectation is indeed a random variable. Since that r.v. was obtained by conditioning, it's not a stretch to call it a conditioned r.v.
Any book that explains conditioning over a sigma algebra should suffice. You could try Loève, Dudley or Neveu, but I don't remember if it's mentioned explicitly.
BTW conditional expectation is really more fundamental than conditional probability. It's the former that yields the latter in measure-theoretic probability. If you want to drink from the source, that would be Kolmogorov.
Finally if you are reading Billingsley you are adequately qualified to call yourself a mathematician.
It's getting a little tedious. Please show me a concrete citation of a serious textbook (not a tutorial/handout by a grad student or a paper by a random researcher) that puts the three words "conditional random variable" next to each other (consistently, not simply as a one-off potential mistake). Google doesn't show serious sources for it.
While I agree with isolated points of your comment I think it doesn't add up to a useful/coherent concept of conditional random variable.
That's a little too much to ask; perhaps if they were greppable I could have obliged, but unfortunately I don't have a photographic memory.
More concretely, it's just another name for conditional expectation. I am assuming you are aware that conditional expectation is a random variable obtained via conditioning (equivalently, as a piecewise approximation in L_2). If you aren't familiar with that viewpoint, that would be the place to start. Kolmogorov, Neveu, Dudley and Billingsley all cover that viewpoint.
> I am assuming you are aware that conditional expectation is a random variable
That's not what we're considering here, but things of the form X|Y=y for a concrete y. Even as E[X|Y=y], that's not a function, y is specified. Do you agree we shouldn't call X|Y=y a conditional random variable?
The expectation E[X|Y=y] is a fixed value. (Edit: it’s the expectation of the random variable “X|Y=y”, while E[X|Y] is a random variable because it’s a function of the random variable Y: for each element in the sample space there is a corresponding value of “y” and in turn there is a value of the expectation E[X|Y=y].)
X|Y=y (as used in the blog post being discussed) is a random variable: it’s a function from a subset of the original sample space (the elements for which the value of the random variable Y is y) to real values (or whatever the image of the random variable X is).
> If we consider X|E as a random variable, what is its value if we roll an odd number? Undefined? What does that mean? Random variables always have some value.
Random variables have some value on their domain, and for the random variable X | E=1 the sample space is restricted to the elementary events {2,4,6} which make up the composite event E=1. The original sample space is partitioned into the subspaces {1,3,5} and {2,4,6} when we condition on the values of the random variable E (0: odd, 1: even).
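A minimal sketch of this restricted-sample-space view, using the same dice example (restrict to {2,4,6}, renormalize, then treat X as an ordinary random variable on the subspace):

```python
from fractions import Fraction

# Fair die: uniform mass on {1,...,6}; X is the rolled number,
# E its parity indicator (0: odd, 1: even).
space = {s: Fraction(1, 6) for s in range(1, 7)}
X = lambda s: s
E = lambda s: 1 if s % 2 == 0 else 0

def condition(space, predicate):
    """Restrict the sample space to {predicate} and renormalize the measure:
    the 'X|E=1 as a random variable on a subspace' view."""
    total = sum(m for s, m in space.items() if predicate(s))
    return {s: m / total for s, m in space.items() if predicate(s)}

sub = condition(space, lambda s: E(s) == 1)
# On the subspace, X is a plain random variable; e.g. E[X|E=1] = (2+4+6)/3.
expectation = sum(X(s) * m for s, m in sub.items())
```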
> Sure you can build a new event space (sigma algebra) but then you can't use random variables over the original one.
I guess we all agree then.
> Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces.
The variables X and Y describing independent rolls are also defined over different spaces and to have a joint distribution you have to define a "common" sample space of the form {x=1,y=1},{x=2,y=1},..,{x=6,y=6}.
You could do the same for a roll of a dice and the toss of a coin. Or do you think that computing the joint distribution of a coin toss and a dice roll doesn't make sense because they are defined over different spaces?
> You could do the same for a roll of a dice and the toss of a coin. Or do you think that computing the joint distribution of a coin toss and a dice roll doesn't make sense because they are defined over different spaces?
Of course it doesn't! You first have to define them on a common space (the Cartesian product), and for that you have to specify their joint probabilities. One example might be that you model them as independent. Otherwise we wouldn't know how the coin and the dice relate. Sure independence is usually a good default assumption, but it's still a necessary step.
What did you mean with the following paragraph then?
> Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces.
Do you agree that you cannot compute the joint distribution P(Y,X) either because the two variables are defined over different spaces?
If you mean that the space for this single experiment composed of two rolls (random variables X and Y) is the cartesian product of {x=1,x=2,x=3,x=4,x=5,x=6} and {y=1,y=2,y=3,y=4,y=5,y=6}, then I agree.
But the fact that each variable alone is defined on the "same" sample space {1,2,3,4,5,6} is irrelevant.
The situation is no different from the joint probability for random variables X and Z corresponding to a single experiment consisting of a dice roll and a coin toss, where the relevant space is the cartesian product of {x=1,x=2,x=3,x=4,x=5,x=6} and {z=1,z=2}.
And it is also similar for the situation you asked about, with a random variable Y and a "conditional" random variable X|Even. The relevant space is the cartesian product of {y=1,y=2,y=3,y=4,y=5,y=6} and {x=2,x=4,x=6}.
Let's consider something with less independence, because it makes things harder to notice. Temperature indoors T1, temperature outdoors T2, IsOvercast O.
Let's say T2|O=1 is a "conditional random variable". Let's consider the average temperature indoors and outdoors. What would ((T1|O=1) + T2)/2 even mean? How could you use the two "variables" in the same expression? What is even their joint distribution? They are defined over different spaces!
This means we must always carefully condition all the variables used together on exactly the same things. So ((T1|O=1) + (T2|O=1))/2 is valid. But then why repeat this on every variable instance we use? It would be very tedious. At some point we want to get to a distribution (or some function of a distribution, like the expectation or variance), so it's much simpler to say, for example, P((T1 + T2)/2 | O=1), which is just a good old conditional distribution. Conditioning is an operation on a distribution, and in my mind the bar (|) is really a slot in the P() notation, short for P(A,B)/P(B). A bar popping up elsewhere (like in expectations) must be directly determined by the distribution (which a random variable is not).
Overall, since you cannot mix differently conditioned "conditional random variables" in a single expression, you may just as well put your conditioning on the side of the whole expression in the P().
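A small illustration of "conditioning the whole expression", with a made-up toy joint distribution over (T1, T2, O) (all numbers hypothetical):

```python
from fractions import Fraction

# Toy joint distribution: keys are (indoor temp, outdoor temp, overcast flag).
joint = {
    (20, 15, 1): Fraction(1, 4),
    (20, 25, 0): Fraction(1, 4),
    (22, 16, 1): Fraction(1, 4),
    (22, 28, 0): Fraction(1, 4),
}

def P(pred):
    """Probability of the event described by a predicate on (t1, t2, o)."""
    return sum(m for outcome, m in joint.items() if pred(*outcome))

# P((T1+T2)/2 <= 18 | O=1): condition the whole expression at once,
# via the definition P(A|B) = P(A,B)/P(B).
p = P(lambda t1, t2, o: o == 1 and (t1 + t2) / 2 <= 18) / P(lambda t1, t2, o: o == 1)
```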
> How could you use the two "variables" in the same expression?
Do you expect to be able to use every random variable which can be conceived in the same expression?
If you object to the name “conditional random variable” [+] that’s debatable, but if you say that the resulting thing is not a random variable I think you are wrong.
Another thing that is a random variable, even though I suspect you may not like it, is the probability distribution of a random variable.
[+] which I don’t think was actually used by the OP, by the way.
Thanks for your comment (and thanks to everyone who answered below). I have been way too many times in the same situation where someone says something is "trivial" or "easy" so I completely get your point and will try not to make the same mistake for the next posts.
Regarding your question, I think the answers you have had are on point!
I am neither a lawyer nor a US citizen, but I think the ruling was not on "whether or not discrimination took place", but rather on "if such discrimination had taken place, would the 1st Amendment of the constitution have allowed it as 'editorial freedom'".
I didn't read the article, but that's usually how things work. Also a lawyer I know that does patent stuff says law firms will file motions they know have no hope of success because it's free money.
More like it's a lot easier to argue at the beginning that the plaintiff has no standing to sue than it is to go through a whole discovery process and trial to hope for a judgement in your favor. Let alone the risks of the discovery process itself.
You have to flat-out bilk your clients or do something illegal to get the bar pissed at you.
Sometimes it's not the lawyers but the clients with deep pockets. My friend spent ten years litigating one case over MOSFET patents for an unhinged client. And they lost 95% of the time against another well funded company.
Two other researchers and I published a paper[1] using Valiant's Probably Approximately Correct learning to learn gene regulatory networks, which may be interesting if you want to dig deeper!
If anyone has questions on the topic, feel free to ask, I'll keep an eye on the thread.
[1]A. Carcano, F. Fages, and S. Soliman, “Probably Approximately Correct Learning of Regulatory Networks from Time-Series Data,” presented at the CMSB’17 - 15th International Conference on Computational Methods for Systems Biology, 2017, vol. Lecture Notes in Computer Science, pp. 74–90.
https://hal.archives-ouvertes.fr/hal-01519826v2
Thanks for the paper link! I've been digging into PAC learning so I'm taking anything I can get.
On the paper, I was hoping you could help me understand a few things:
- It seems that the main finding is that any k-CNF form, like Thomas' Boolean Regulatory Network, can be expressed by PAC learning bounds. In section 4 of the abstract [1], you mentioned that "when the dimension increases... the PAC learning algorithm can leverage available prior knowledge...". Are you referring to the time dimension adding more clauses to the k-CNF?
- I'm having trouble reconciling the PAC term "h" with "model confidence" in section 5.2. Is this allowed because the PAC learning "delta" (probability) [2] parameter is dropped for the k-CNF adaptation?
- In this concrete case, is the learning portion just mapping the stochastic traces to outputs (i.e. lookups)? I'm missing some understanding of how such a mapping handles stochasticity.
You'll have to forgive me, as I'm still trying to understand the paper. It's incredibly interesting to me, so thanks for writing it!
Unfortunately the final version of the paper is not the one that is on hal -- which is an older one -- and I must concede that, being new to this whole publishing world, I don't know exactly how you can get access to it. I'll try to sort it out and get back to you. In the meantime, you can refer to the hal version.
> In section 4 of the abstract [1], you mentioned that "when the dimension increases... the PAC learning algorithm can leverage available prior knowledge...". Are you referring to the time dimension adding more clauses to the k-CNF?
I think this sentence is actually referring to what is presented in section 5.3 of the final (and hal version) paper.
> I'm having trouble reconciling the PAC term "h" with "model confidence" in section 5.2. Is this allowed because the PAC learning "delta" (probability) [2] parameter is dropped for the k-CNF adaptation?
I think there are two things here.
First the "delta" you are referring to is indeed taken to be equal to the "epsilon" and both are what we call "h" in section 2.1.
Second, the idea of the discussion in section 5.2 is the following: let's say you fix a number A of initial states and simulate B steps for each of these states. You will have a given number of (de)activation samples for each (de)activation function. Then, across all 2n (de)activation functions, take the minimum number of samples you got and call it Lmin. Then, using results from section 2.2 with S=(2n)^(k+1), you can find h so that Lmin=2h(S+log(h)). This h will be your "model confidence".
However, if we didn't have one "h" but one "delta" and one "epsilon" (wikipedia notations), I guess we would not have a given value but only a relation between the two (i.e. defining one would define the other).
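For what it's worth, the stated relation Lmin = 2h(S + log(h)) is monotone in h, so it can be inverted numerically; a sketch with made-up toy numbers (and assuming the natural log):

```python
import math

def samples_needed(h, S):
    """Valiant-style sample bound L(h, S) = 2*h*(S + ln h), as stated above:
    enough examples for error <= 1/h with probability >= 1 - 1/h."""
    return 2.0 * h * (S + math.log(h))

def confidence_from_samples(L_min, S, lo=1.0, hi=1e9):
    """Invert the bound by bisection: the largest h with L(h, S) <= L_min."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if samples_needed(mid, S) <= L_min:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical toy numbers: n = 5 genes, k = 2, so S = (2n)^(k+1) = 1000,
# and suppose the scarcest (de)activation function got Lmin = 50,000 samples.
S = (2 * 5) ** (2 + 1)
h = confidence_from_samples(50_000, S)
```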
I'm afraid I don't get your last point.
I'm new to HN so I hope this answer will be somewhat correctly formatted.
Thanks, these are great responses! Also, I apologize for not fixing my own formatting sooner, which makes the questions almost impossible to read. You answered my last question about stochastic traces by explaining the de/activation function relationships with Lmin.
Interesting work! We definitely need far better inference models in computational biology than what we currently have. I agree with your paper that modeling is unfortunately more of an "art" than a science nowadays... and with huge societal consequences.
Having said that, on a cursory read I think you may be misapplying Valiant's algorithm...
In particular, the original (union bound) PAC guarantee relies crucially on IID samples, so you cannot straightforwardly apply it to time series data and expect the guarantee to hold unchanged. Instead, you should use block bootstrap methods to sample consecutive segments of your time series of a certain size --in which case a (possibly weaker) PAC-like guarantee might hold, provided the dependence across time decays sufficiently fast [1].
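For concreteness, a minimal sketch of the moving-block bootstrap idea (function name and parameters are mine, not from the paper):

```python
import random

def block_bootstrap(series, block_len, n_blocks, rng=random):
    """Moving-block bootstrap: resample consecutive segments of the series,
    so short-range time dependence is preserved within each block."""
    out = []
    for _ in range(n_blocks):
        start = rng.randrange(len(series) - block_len + 1)
        out.extend(series[start:start + block_len])
    return out

# Example: resample 4 blocks of length 5 from a 100-step series.
resampled = block_bootstrap(list(range(100)), block_len=5, n_blocks=4,
                            rng=random.Random(0))
```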
I'm also a bit concerned about the semantics of your approach, since I thought gene regulatory inference was/is notoriously intractable, and Valiant's model is very stringent and conservative... So IMO somewhere along the line you are getting a massive free lunch simply by reducing to k-CNF!
Not saying it's wrong per se of course; but I couldn't easily tell exactly where the 'trick' is... So if I were you I would try to communicate more clearly (to dumb non-experts like me) how exactly this particular reduction captures something highly non-trivial in gene regulatory networks to achieve such a (seemingly) drastic speedup..
OK, I think I narrowed the 'trick' down. It's actually an interesting existential question.
Correct me if I'm wrong but I think the key step is the use of "positive Boolean semantics"; which, as your Ref. 9 proves, are substantially weaker --and hence, unsurprisingly, far more tractable-- than more conventional "stochastic" or "differential" semantics...
But then Ref. 9 [1] goes on to make, I think, a frankly astonishing, Church-Turing like existential claim in Biology (Sec 3.2, infra):
> [...] if a behavior is not possible in the boolean semantics, it is surely not possible in the stochastic semantics whatever the influence forces are.
If that is the case, that would IMO have huge consequences! It would mean, then, that some of the underlying machinery of Biology may turn out to be far simpler than we think: no more pesky self-loops or bistable, mutually inhibitory modules to deal with! Tractable network inference, at last! It would potentially revolutionize computational biology, if true.
But, is it true? I think I see the intuition, but I don't think the case is as clear-cut, with that single "surely" carrying way too much of the rhetorical work... Indeed, the claim hinges on what I think is a rather interesting, non-trivial existential question: informally, if 'something' (of a given type) cannot be denoted in a certain weaker type, does that mean that 'something' cannot exist?
Anyway, not your paper per se; but I think it's an interesting debate nonetheless.