Hey Ryan, great piece. One thing that isn't clear to me: what is the advantage of using the prior? I can see that it is more mathematically sound, but in the context of Machine Learning, will I see higher accuracy from this technique, or need less tuning to get good results (or something else)?
Also, is this a relatively recent finding, or can I expect it to be available in common packages such as SkLearn?
Finally, how does this compare to just using the Rule of Succession? In the wolf hunt example, I would rather add two fake datapoints (one fake success and one fake failure) than go through all this math to reach an apparently similar result.
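To make concrete what I mean by the two fake datapoints, here's a minimal sketch (my own made-up numbers, not from your post): the Rule of Succession estimate is, as far as I understand, the same as the posterior mean under a uniform Beta(1, 1) prior.

```python
def rule_of_succession(successes, trials):
    """Estimate P(success) after adding one fake success and one fake failure."""
    return (successes + 1) / (trials + 2)

def beta_posterior_mean(successes, trials, alpha=1.0, beta=1.0):
    """Posterior mean of a Beta(alpha, beta) prior updated with the observed data."""
    return (successes + alpha) / (trials + alpha + beta)

# Hypothetical wolf-hunt tally: 3 successful hunts out of 10 attempts.
print(rule_of_succession(3, 10))    # (3 + 1) / (10 + 2) = 1/3
print(beta_posterior_mean(3, 10))   # same value: Beta(1, 1) is the uniform prior
```

So my question is really whether the full prior machinery buys anything beyond picking different pseudo-counts (i.e. a non-uniform alpha and beta).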
Thanks :)