<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="Data Science for Political and Social Phenomena">
<meta name="author" content="Guillaume Redoulès">
<link rel="icon" href="../favicon.ico">
<title>Day 2 - Probability, Compound Event Probability - Blog</title>
<!-- JQuery -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
window.jQuery || document.write('<script src="../theme/js/jquery.min.js"><\/script>')
</script>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="../theme/css/bootstrap.css" />
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link rel="stylesheet" type="text/css" href="../theme/css/ie10-viewport-bug-workaround.css" />
<!-- Custom styles for this template -->
<link rel="stylesheet" type="text/css" href="../theme/css/style.css" />
<link rel="stylesheet" type="text/css" href="../theme/css/notebooks.css" />
<link href='https://fonts.googleapis.com/css?family=PT+Serif:400,700|Roboto:400,500,700' rel='stylesheet' type='text/css'>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<meta name="tags" content="Basics" />
</head>
<body>
<div class="navbar navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="..">Guillaume Redoulès</a>
</div>
<div class="navbar-collapse collapse" id="searchbar">
<ul class="nav navbar-nav navbar-right">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">About<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../pages/about.html">About Guillaume</a></li>
<li><a href="https://github.com/redoules">GitHub</a></li>
<li><a href="https://www.linkedin.com/in/guillaume-redoul%C3%A8s-33923860/">LinkedIn</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Data Science<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="..#Blog">Blog</a></li>
<li><a href="..#Python">Python</a></li>
<li><a href="..#Bash">Bash</a></li>
<li><a href="..#SQL">SQL</a></li>
<li><a href="..#Mathematics">Mathematics</a></li>
<li><a href="..#Machine_Learning">Machine Learning</a></li>
<li><a href="..#Projects">Projects</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Projects<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="https://github.com/redoules/redoules.github.io">Notes (Github)</a></li>
</ul>
</li>
<!--<li class="dropdown">
<a href="../feeds/blog.rss.xml">Blog RSS</a>
</li>-->
</ul>
<form class="navbar-form" action="../search.html" onsubmit="return validateForm(this.elements['q'].value);">
<div class="form-group" style="display:inline;">
<div class="input-group" style="display:table;">
<span class="input-group-addon" style="width:1%;"><span class="glyphicon glyphicon-search"></span></span>
<input class="form-control search-query" name="q" id="tipue_search_input" placeholder="e.g. scikit KNN, pandas merge" required autocomplete="off" type="text">
</div>
</div>
</form>
</div>
<!--/.nav-collapse -->
</div>
</div>
<!-- end of header section -->
<div class="container">
<!-- <div class="alert alert-warning" role="alert">
Did you find this page useful? Please do me a quick favor and <a href="#" class="alert-link">endorse me for data science on LinkedIn</a>.
</div> -->
<section id="content" class="body">
<header>
<h1>
Day 2 - Probability, Compound Event Probability
</h1>
<ol class="breadcrumb">
<li>
<time class="published" datetime="2018-11-09T20:01:00+01:00">
09 November 2018
</time>
</li>
<li>Blog</li>
<li>Basics</li>
</ol>
</header>
<div class='article_content'>
<h2>Basic probability with dice</h2>
<h3>Problem</h3>
<p>In this challenge, we practice calculating probability. In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that their sum will be at most 9.</p>
<h3>Mathematical explanation</h3>
<p>A nice way to think about sums-of-two-dice problems is to lay out the sums in a 6-by-6 grid in the obvious manner.</p>
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<th>2</th>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<th>3</th>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
<tr>
<th>4</th>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
</tr>
<tr>
<th>5</th>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
</tr>
<tr>
<th>6</th>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
</tr>
</tbody>
</table>
</div>
<p>We see that identical sums lie on the same diagonal. The number of cells on a diagonal grows from 1 to 6 and then shrinks back to 1.</p>
<p>Let's call A the sum of the two dice, so that <span class="math">\(A \leq x\)</span> is the event "the sum of the two tosses is at most x".
</p>
<div class="math">$$P(A\leq9)=\sum_{i=2}^{9} P(A = i)$$</div>
<div class="math">$$P(A\leq9)=1-P(A\gt9)$$</div>
<div class="math">$$P(A\leq9)=1-\sum_{i=10}^{12} P(A = i)$$</div>
<p>The value of <span class="math">\(P(A = i)\)</span> is <span class="math">\(\frac{i-1}{36}\)</span> if <span class="math">\(i \leq 7\)</span> and <span class="math">\(\frac{13-i}{36}\)</span> if <span class="math">\(i \gt 7\)</span>,</p>
<p>hence
</p>
<div class="math">$$P(A\leq9)=1-\sum_{i=10}^{12} \frac{13-i}{36}$$</div>
<div class="math">$$P(A\leq9)= 1-\frac{6}{36}$$</div>
<div class="math">$$P(A\leq9)= \frac{5}{6}$$</div>
<h3>Let's program it</h3>
<div class="highlight"><pre><span></span><span class="nb">sum</span><span class="p">([</span><span class="mi">1</span> <span class="k">for</span> <span class="n">d1</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span> <span class="k">for</span> <span class="n">d2</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span> <span class="k">if</span> <span class="n">d1</span><span class="o">+</span><span class="n">d2</span><span class="o">&lt;=</span><span class="mi">9</span><span class="p">])</span> <span class="o">/</span> <span class="mi">36</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">0.8333333333333334</span>
</pre></div>
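<p>The closed form for P(A = i) used in the derivation can also be checked by enumeration. The snippet below is a minimal sketch: the names <code>counts</code>, <code>dist</code> and <code>closed_form</code> are illustrative and not part of the original solution, and exact fractions are used so that the result can be compared with 5/6 without rounding.</p>

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# Exact distribution of the sum, P(A = i) for i = 2..12.
dist = {total: Fraction(n, 36) for total, n in counts.items()}


def closed_form(i):
    """Closed form quoted in the text: (i-1)/36 up to 7, (13-i)/36 above."""
    return Fraction(i - 1, 36) if i <= 7 else Fraction(13 - i, 36)


# The enumeration and the closed form agree for every possible sum.
formula_matches = all(dist[i] == closed_form(i) for i in range(2, 13))

# P(A <= 9) computed directly and through the complement.
p_direct = sum(dist[i] for i in range(2, 10))
p_complement = 1 - sum(dist[i] for i in range(10, 13))

print(formula_matches, p_direct, p_complement)  # True 5/6 5/6
```

<p>Both routes give the same value, which is why the derivation can sum only the three terms from 10 to 12 instead of the eight terms from 2 to 9.</p>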
<h2>More dice</h2>
<h3>Problem</h3>
<p>In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that the values rolled by each die will be different and the two dice have a sum of 6. </p>
<h3>Mathematical explanation</h3>
<p>Let's consider 2 events: A and B. A compound event is a combination of 2 or more simple events. If A and B are simple events, then A∪B denotes the occurrence of either A or B (or both). A∩B denotes the occurrence of A and B together.</p>
<p>We denote by A the event "the two dice show different values". The opposite event is A': "the two dice show the same value".
</p>
<div class="math">$$P(A) = 1-P(A')$$</div>
<div class="math">$$P(A)=1-\frac{6}{36}$$</div>
<div class="math">$$P(A)=\frac{5}{6}$$</div>
<p>We denote by B the event "the two dice have a sum of 6". Its probability follows from the formula in the first part of the article, applied to a sum of 6:
</p>
<div class="math">$$P(B)=\frac{5}{36}$$</div>
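<p>Given that the sum is 6, the probability that the two dice show different values is 4/5: of the 5 outcomes (1,5), (2,4), (3,3), (4,2) and (5,1), only (3,3) is a double.</p>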
<div class="math">$$P(A|B) = 4/5$$</div>
<p>The probability that both A and B occur is equal to P(A∩B).</p>
<p>Since <span class="math">\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)</span>, we have:</p>
<div class="math">$$P(A \cap B)=P(B) \times P(A|B)$$</div>
<div class="math">$$P(A \cap B)=\frac{5}{36} \times \frac{4}{5}$$</div>
<div class="math">$$P(A \cap B)=\frac{1}{9}$$</div>
<h3>Let's program it</h3>
<div class="highlight"><pre><span></span><span class="nb">sum</span><span class="p">([</span><span class="mi">1</span> <span class="k">for</span> <span class="n">d1</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span> <span class="k">for</span> <span class="n">d2</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span> <span class="k">if</span> <span class="p">(</span><span class="n">d1</span><span class="o">+</span><span class="n">d2</span><span class="o">==</span><span class="mi">6</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="n">d1</span><span class="o">!=</span><span class="n">d2</span><span class="p">)])</span> <span class="o">/</span> <span class="mi">36</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">0.1111111111111111</span>
</pre></div>
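<p>The conditional-probability step can be made explicit as well. The following is a sketch under the same equally-likely-outcomes assumption; the variable names (<code>b</code>, <code>a_and_b</code>, <code>p_a_given_b</code>, ...) are illustrative only.</p>

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two dice.
outcomes = list(product(range(1, 7), repeat=2))

# B: the two dice sum to 6.  A and B: they also show different values.
b = [(d1, d2) for d1, d2 in outcomes if d1 + d2 == 6]
a_and_b = [(d1, d2) for d1, d2 in b if d1 != d2]

p_a = Fraction(sum(1 for d1, d2 in outcomes if d1 != d2), len(outcomes))
p_b = Fraction(len(b), len(outcomes))
p_a_given_b = Fraction(len(a_and_b), len(b))
p_a_and_b = Fraction(len(a_and_b), len(outcomes))

# P(A and B) = P(B) * P(A|B)
print(p_b, p_a_given_b, p_a_and_b)       # 5/36 4/5 1/9
print(p_a_and_b == p_b * p_a_given_b)    # True
print(p_a_and_b == p_a * p_b)            # False: A and B are not independent
```

<p>The last line shows why the conditional probability is needed here: multiplying P(A) by P(B) directly gives 25/216, not 1/9, because A and B are not independent.</p>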
<h2>Compound Event Probability</h2>
<h3>Problem</h3>
<p>There are 3 urns labeled X, Y, and Z.</p>
<ul>
<li>Urn X contains 4 red balls and 3 black balls.</li>
<li>Urn Y contains 5 red balls and 4 black balls.</li>
<li>Urn Z contains 4 red balls and 4 black balls. </li>
</ul>
<p>One ball is drawn from each of the urns. What is the probability that, of the 3 balls drawn, 2 are red and 1 is black?</p>
<h3>Mathematical explanation</h3>
<p>Let's write the different probabilities:</p>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Red ball</th>
<th>Black ball</th>
</tr>
</thead>
<tbody>
<tr>
<th>Urn X</th>
<td>$$\frac{4}{7}$$</td>
<td>$$\frac{3}{7}$$</td>
</tr>
<tr>
<th>Urn Y</th>
<td>$$\frac{5}{9}$$</td>
<td>$$\frac{4}{9}$$</td>
</tr>
<tr>
<th>Urn Z</th>
<td>$$\frac{1}{2}$$</td>
<td>$$\frac{1}{2}$$</td>
</tr>
</tbody>
</table>
</div>
<h4>Addition rule</h4>
<p>A and B are said to be mutually exclusive or disjoint if they have no outcomes in common (i.e., A∩B=∅ and P(A∩B)=0). The event that any of 2 or more events occurs is the union (∪) of those events. Because disjoint events have no outcomes in common, the probability of their union is the sum of their individual probabilities. A and B are said to be collectively exhaustive if their union covers all outcomes in the sample space (i.e., A∪B=S and P(A∪B)=1). This brings us to our next fundamental rule of probability: if 2 events, A and B, are disjoint, then the probability of either event is the sum of the probabilities of the 2 events (i.e., P(A or B) = P(A)+P(B)).</p>
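<p>As a small illustration of the addition rule, the sketch below uses a single fair die. The events <code>low</code>, <code>high</code> and <code>even</code> are hypothetical examples chosen for this illustration; they do not appear in the urn problem.</p>

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one fair six-sided die


def prob(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


low = {1, 2}       # hypothetical event: roll at most 2
high = {5, 6}      # hypothetical event: roll at least 5
even = {2, 4, 6}   # hypothetical event: roll an even number

# low and high are disjoint, so the addition rule applies.
disjoint_ok = prob(low | high) == prob(low) + prob(high)

# low and even share the outcome 2, so the plain sum counts it twice.
overlap_union = prob(low | even)
overlap_sum = prob(low) + prob(even)

print(disjoint_ok, overlap_union, overlap_sum)  # True 2/3 5/6
```

<p>The rule only holds for disjoint events: for the overlapping pair, the sum 5/6 exceeds the true probability 2/3 of the union.</p>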
<h4>Multiplication rule</h4>
<p>If the outcome of the first event (A) has no impact on the second event (B), then they are considered to be independent (e.g., successive tosses of a fair coin). This brings us to the next fundamental rule of probability: the multiplication rule. It states that if two events, A and B, are independent, then the probability of both events is the product of the probabilities of each event (i.e., P(A and B) = P(A) × P(B)). The event that all events in a sequence occur is called the intersection (∩) of those events.</p>
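<p>The multiplication rule can be checked in the same way on two dice. This is only a sketch: the two events used ("first die is even", "second die is at least 5") are hypothetical examples, picked because each depends on a different die.</p>

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (first die, second die)


def prob(predicate):
    """Fraction of the 36 equally likely outcomes that satisfy predicate."""
    hits = sum(1 for d1, d2 in outcomes if predicate(d1, d2))
    return Fraction(hits, len(outcomes))


p_first_even = prob(lambda d1, d2: d1 % 2 == 0)
p_second_high = prob(lambda d1, d2: d2 >= 5)
p_both = prob(lambda d1, d2: d1 % 2 == 0 and d2 >= 5)

print(p_first_even, p_second_high, p_both)      # 1/2 1/3 1/6
print(p_both == p_first_even * p_second_high)   # True
```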
<p>The draws from the three urns are independent, hence:</p>
<p>p = P(2 red (R) and 1 black (B))
</p>
<div class="math">$$p = P(RRB) + P(RBR) + P(BRR)$$</div>
<p>Each of those 3 probabilities is equal to the product of the probabilities of drawing each ball, for example
<span class="math">\(P(RRB) = P(R|X) \times P(R|Y) \times P(B|Z) = \frac{4}{7} \times \frac{5}{9} \times \frac{1}{2}\)</span>:</p>
<ul>
<li>
<p><span class="math">\(P(RRB) = 20/126\)</span></p>
</li>
<li>
<p><span class="math">\(P(RBR) = 16/126\)</span></p>
</li>
<li>
<p><span class="math">\(P(BRR) = 15/126\)</span></p>
</li>
</ul>
<p>this leads to </p>
<ul>
<li><span class="math">\(p = 51/126\)</span></li>
</ul>
<p>and finally
</p>
<div class="math">$$p = \frac{17}{42}$$</div>
<h3>Let's program it</h3>
<div class="highlight"><pre><span></span><span class="n">X</span> <span class="o">=</span> <span class="mi">3</span><span class="o">*</span><span class="p">[</span><span class="s2">&quot;B&quot;</span><span class="p">]</span><span class="o">+</span><span class="mi">4</span><span class="o">*</span><span class="p">[</span><span class="s2">&quot;R&quot;</span><span class="p">]</span>
<span class="n">Y</span> <span class="o">=</span> <span class="mi">4</span><span class="o">*</span><span class="p">[</span><span class="s2">&quot;B&quot;</span><span class="p">]</span><span class="o">+</span><span class="mi">5</span><span class="o">*</span><span class="p">[</span><span class="s2">&quot;R&quot;</span><span class="p">]</span>
<span class="n">Z</span> <span class="o">=</span> <span class="mi">4</span><span class="o">*</span><span class="p">[</span><span class="s2">&quot;B&quot;</span><span class="p">]</span><span class="o">+</span><span class="mi">4</span><span class="o">*</span><span class="p">[</span><span class="s2">&quot;R&quot;</span><span class="p">]</span>
<span class="n">target</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;BRR&quot;</span><span class="p">,</span> <span class="s2">&quot;RRB&quot;</span><span class="p">,</span> <span class="s2">&quot;RBR&quot;</span><span class="p">]</span>
<span class="nb">sum</span><span class="p">([</span><span class="mi">1</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">X</span> <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">Y</span> <span class="k">for</span> <span class="n">z</span> <span class="ow">in</span> <span class="n">Z</span> <span class="k">if</span> <span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="o">+</span><span class="n">z</span> <span class="ow">in</span> <span class="n">target</span><span class="p">])</span><span class="o">/</span><span class="nb">sum</span><span class="p">([</span><span class="mi">1</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">X</span> <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">Y</span> <span class="k">for</span> <span class="n">z</span> <span class="ow">in</span> <span class="n">Z</span><span class="p">])</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">0.40476190476190477</span>
</pre></div>
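<p>The same result can be obtained by applying the multiplication and addition rules directly to the per-urn probabilities, instead of enumerating individual balls. This is a sketch with exact fractions; the dictionary <code>p_red</code> and the loop structure are illustrative choices, not part of the original solution.</p>

```python
from fractions import Fraction
from itertools import product

# Probability of drawing a red ball from each urn (from the problem statement).
p_red = {"X": Fraction(4, 7), "Y": Fraction(5, 9), "Z": Fraction(4, 8)}

p = Fraction(0)
for colours in product("RB", repeat=3):
    # Keep only the sequences with exactly 2 red balls: RRB, RBR, BRR.
    if colours.count("R") != 2:
        continue
    # Multiplication rule: the draws from the three urns are independent.
    term = Fraction(1)
    for urn, colour in zip("XYZ", colours):
        term *= p_red[urn] if colour == "R" else 1 - p_red[urn]
    # Addition rule: the three sequences are mutually exclusive.
    p += term

print(p)  # 17/42
```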
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}</script>
</div>
<aside>
<div class="bug-reporting__panel">
<h3>Find an error or bug? Have a suggestion?</h3>
<p>Everything on this site is available on GitHub. Head on over and <a href='https://github.com/redoules/redoules.github.io/issues/new'>submit an issue.</a> You can also message me directly by <a href='mailto:guillaume.redoules@gadz.org'>email</a>.</p>
</div>
</aside>
</section>
</div>
<!-- start of footer section -->
<footer class="footer">
<div class="container">
<p class="text-muted">
<center>This project contains 119 pages and is available on <a href="https://github.com/redoules/redoules.github.io">GitHub</a>.
<br/>
Copyright &copy; Guillaume Redoulès,
<time datetime="2018">2018</time>.
</center>
</p>
</div>
</footer>
<!-- This jQuery line finds any span that contains code highlighting classes and then selects the parent <pre> tag and adds a border. This is done as a workaround to visually distinguish the code inputs and outputs -->
<script>
$( ".hll, .n, .c, .err, .k, .o, .cm, .cp, .c1, .cs, .gd, .ge, .gr, .gh, .gi, .go, .gp, .gs, .gu, .gt, .kc, .kd, .kn, .kp, .kr, .kt, .m, .s, .na, .nb, .nc, .no, .nd, .ni, .ne, .nf, .nl, .nn, .nt, .nv, .ow, .w, .mf, .mh, .mi, .mo, .sb, .sc, .sd, .s2, .se, .sh, .si, .sx, .sr, .s1, .ss, .bp, .vc, .vg, .vi, .il" ).parent( "pre" ).css( "border", "1px solid #DEDEDE" );
</script>
<!-- Load Google Analytics -->
<script>
/*
(function(i, s, o, g, r, a, m) {
i['GoogleAnalyticsObject'] = r;
i[r] = i[r] || function() {
(i[r].q = i[r].q || []).push(arguments)
}, i[r].l = 1 * new Date();
a = s.createElement(o),
m = s.getElementsByTagName(o)[0];
a.async = 1;
a.src = g;
m.parentNode.insertBefore(a, m)
})(window, document, 'script', '//www.google-analytics.com/analytics.js', 'ga');
ga('create', 'UA-66582-32', 'auto');
ga('send', 'pageview');
*/
</script>
<!-- End of Google Analytics -->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="../theme/js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="../theme/js/ie10-viewport-bug-workaround.js"></script>
</body>
</html>