<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="Data Science for Political and Social Phenomena">
<meta name="author" content="Guillaume Redoulès">
<link rel="icon" href="../favicon.ico">
<title>Day 4 - Binomial and geometric distributions - Blog</title>
<!-- JQuery -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
window.jQuery || document.write('<script src="../theme/js/jquery.min.js"><\/script>')
</script>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="../theme/css/bootstrap.css" />
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link rel="stylesheet" type="text/css" href="../theme/css/ie10-viewport-bug-workaround.css" />
<!-- Custom styles for this template -->
<link rel="stylesheet" type="text/css" href="../theme/css/style.css" />
<link rel="stylesheet" type="text/css" href="../theme/css/notebooks.css" />
<link href='https://fonts.googleapis.com/css?family=PT+Serif:400,700|Roboto:400,500,700' rel='stylesheet' type='text/css'>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<meta name="tags" content="Basics" />
</head>
<body>
<div class="navbar navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="..">Guillaume Redoulès</a>
</div>
<div class="navbar-collapse collapse" id="searchbar">
<ul class="nav navbar-nav navbar-right">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">About<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../pages/about.html">About Guillaume</a></li>
<li><a href="https://github.com/redoules">GitHub</a></li>
<li><a href="https://www.linkedin.com/in/guillaume-redoul%C3%A8s-33923860/">LinkedIn</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Data Science<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="..#Blog">Blog</a></li>
<li><a href="..#Python">Python</a></li>
<li><a href="..#Bash">Bash</a></li>
<li><a href="..#SQL">SQL</a></li>
<li><a href="..#Mathematics">Mathematics</a></li>
<li><a href="..#Machine_Learning">Machine Learning</a></li>
<li><a href="..#Projects">Projects</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Projects<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="https://github.com/redoules/redoules.github.io">Notes (Github)</a></li>
</ul>
</li>
<!--<li class="dropdown">
<a href="../feeds/blog.rss.xml">Blog RSS</a>
</li>-->
</ul>
<form class="navbar-form" action="../search.html" onsubmit="return validateForm(this.elements['q'].value);">
<div class="form-group" style="display:inline;">
<div class="input-group" style="display:table;">
<span class="input-group-addon" style="width:1%;"><span class="glyphicon glyphicon-search"></span></span>
<input class="form-control search-query" name="q" id="tipue_search_input" placeholder="e.g. scikit KNN, pandas merge" required autocomplete="off" type="text">
</div>
</div>
</form>
</div>
<!--/.nav-collapse -->
</div>
</div>
<!-- end of header section -->
<div class="container">
<!-- <div class="alert alert-warning" role="alert">
Did you find this page useful? Please do me a quick favor and <a href="#" class="alert-link">endorse me for data science on LinkedIn</a>.
</div> -->
<section id="content" class="body">
<header>
<h1>
Day 4 - Binomial and geometric distributions
</h1>
<ol class="breadcrumb">
<li>
<time class="published" datetime="2018-11-11T10:19:00+01:00">
11 November 2018
</time>
</li>
<li>Blog</li>
<li>Basics</li>
</ol>
</header>
<div class='article_content'>
<h2>Binomial distribution</h2>
<h3>Problem 1</h3>
<p>The ratio of boys to girls for babies born in Russia is <span class="math">\(r=\frac{N_b}{N_g}=1.09\)</span>. If there is 1 child born per birth, what proportion of Russian families with exactly 6 children will have at least 3 boys?</p>
<h3>Mathematical explanation</h3>
<p>Let's first compute the probability that a newborn is a boy:</p>
<div class="math">$$p_b=\frac{N_b}{N_b+N_g}$$</div>
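<p>where:</p>
<ul>
<li><span class="math">\(N_b\)</span> is the number of boys</li>
<li><span class="math">\(N_g\)</span> is the number of girls</li>
<li><span class="math">\(r=\frac{N_b}{N_g}\)</span> is the ratio of boys to girls</li>
</ul>
<p>Dividing the numerator and the denominator by <span class="math">\(N_b\)</span> gives:</p>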
<div class="math">$$p_b=\frac{1}{1+\frac{1}{r}}$$</div>
<div class="math">$$p_b=\frac{r}{r+1}$$</div>
<div class="highlight"><pre><span></span><span class="n">r</span> <span class="o">=</span> <span class="mf">1.09</span>
<span class="n">p_b</span><span class="o">=</span><span class="n">r</span><span class="o">/</span><span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;The probability of having a boy is p=</span><span class="si">{p_b:3f}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">The probability of having a boy is p=0.521531</span>
</pre></div>
<p>The probability of getting exactly 3 boys in 6 children is given by:
</p>
<div class="math">$$b(x=3, n=6, p=p_b)$$</div>
<p>To compute the proportion of Russian families with exactly 6 children that have at least 3 boys, we need to sum the binomial probabilities for 3, 4, 5 and 6 boys:</p>
<div class="math">$$b(x\geq 3, n=6, p=p_b) = \sum_{i=3}^{6} b(x=i, n=6, p=p_b)$$</div>
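<p>For reference, the binomial probability used here is the standard binomial mass function. It is stated as background; the post itself only gives it in code form. The symbol <span class="math">\(k\)</span> is introduced here for the number of successes:</p>
<div class="math">$$b(x=k, n, p) = \binom{n}{k}\, p^{k} (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$$</div>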
<h3>Let's code it!</h3>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">math</span>
<span class="k">def</span> <span class="nf">bi_dist</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span>
    <span class="c1"># Binomial mass function: C(n, x) * p**x * (1-p)**(n-x)</span>
    <span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="n">x</span><span class="p">)))</span><span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">**</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="p">((</span><span class="mi">1</span><span class="o">-</span><span class="n">p</span><span class="p">)</span><span class="o">**</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="n">x</span><span class="p">))</span>
    <span class="k">return</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
<span class="n">b</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">p_b</span><span class="p">,</span> <span class="mi">6</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">):</span>
    <span class="n">b</span> <span class="o">+=</span> <span class="n">bi_dist</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;probability of getting at least 3 boys in a family with exactly 6 children : </span><span class="si">{b:.3f}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">probability of getting at least 3 boys in a family with exactly 6 children : 0.696</span>
</pre></div>
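<p>As a sanity check, the same probability can be obtained through the complement, since "at least 3 boys" is the opposite of "at most 2 boys". The sketch below is only illustrative: the helper name <code>binom_pmf</code> is an assumption of this example, not part of the original code, and it relies on <code>math.comb</code>, which requires Python 3.8 or later.</p>

```python
import math

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials (hypothetical helper)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

r = 1.09            # ratio of boys to girls given in the problem
p_b = r / (r + 1)   # probability that a newborn is a boy
n = 6               # number of children in the family

# Direct sum over 3, 4, 5 and 6 boys
direct = sum(binom_pmf(i, n, p_b) for i in range(3, n + 1))
# Complement: 1 - P(0, 1 or 2 boys)
complement = 1 - sum(binom_pmf(i, n, p_b) for i in range(0, 3))

print(f"direct sum: {direct:.3f}, complement: {complement:.3f}")
```

<p>Both routes should agree up to floating-point rounding, which is a cheap way to catch an off-by-one error in the summation bounds.</p>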
<h3>Problem 2</h3>
<p>A manufacturer of metal pistons finds that, on average, 12% of the pistons they manufacture are rejected because they are incorrectly sized. What is the probability that a batch of 10 pistons will contain:</p>
<ul>
<li>No more than 2 rejects?</li>
<li>At least 2 rejects?</li>
</ul>
<h3>Mathematical explanation</h3>
<p>On average, 12% of the pistons are rejected, so each piston has a probability <span class="math">\(p_{rejected}=0.12\)</span> of being rejected.</p>
<p>The probability of getting no more than 2 faulty pistons in a batch is:
</p>
<div class="math">$$p(rejects \leq 2) = b(x\leq 2, n= 10, p=p_{rejected})$$</div>
<div class="math">$$p(rejects \leq 2) = \sum_{i=0}^{2} b(x=i, n=10, p=p_{rejected})$$</div>
<div class="highlight"><pre><span></span><span class="n">b</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">12</span><span class="o">/</span><span class="mi">100</span><span class="p">,</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">3</span><span class="p">):</span>
    <span class="n">b</span> <span class="o">+=</span> <span class="n">bi_dist</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;The probability of getting no more than 2 faulty pistons in a batch is : </span><span class="si">{b:.3f}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">The probability of getting no more than 2 faulty pistons in a batch is : 0.891</span>
</pre></div>
<p>The probability that a batch of 10 pistons will contain at least 2 rejects is:
</p>
<div class="math">$$p(rejects \geq 2) = b(x\geq 2, n= 10, p=p_{rejected})$$</div>
<div class="math">$$p(rejects \geq 2) = \sum_{i=2}^{10} b(x=i, n=10, p=p_{rejected})$$</div>
<div class="highlight"><pre><span></span><span class="n">b</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">12</span><span class="o">/</span><span class="mi">100</span><span class="p">,</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">11</span><span class="p">):</span>
    <span class="n">b</span> <span class="o">+=</span> <span class="n">bi_dist</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;The probability of getting at least 2 faulty pistons in a batch is : </span><span class="si">{b:.3f}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">The probability of getting at least 2 faulty pistons in a batch is : 0.342</span>
</pre></div>
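<p>The "at least 2 rejects" case can also be computed with far fewer terms through the complement, <span class="math">\(1 - b(x=0) - b(x=1)\)</span>. The following is a minimal sketch under the same assumptions as the problem (rejection probability 0.12, batch of 10); <code>binom_pmf</code> is a hypothetical helper name and <code>math.comb</code> needs Python 3.8 or later.</p>

```python
import math

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials (hypothetical helper)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

p, n = 0.12, 10  # rejection probability and batch size from the problem

# No more than 2 rejects: x = 0, 1 or 2
at_most_2 = sum(binom_pmf(i, n, p) for i in range(0, 3))
# At least 2 rejects, via the complement of "0 or 1 reject"
at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

print(f"no more than 2 rejects: {at_most_2:.3f}")
print(f"at least 2 rejects:     {at_least_2:.3f}")
```

<p>Note that the two answers do not sum to 1: the outcome "exactly 2 rejects" belongs to both events, so it is counted in each probability.</p>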
<h2>Geometric distribution</h2>
<h3>Problem 1</h3>
<p>The probability that a machine produces a defective product is <span class="math">\(\frac{1}{3}\)</span>. What is the probability that the first defect is found during the fifth inspection?</p>
<h3>Mathematical explanation</h3>
<p>In this case, we use a geometric distribution, which gives the probability that the first defect occurs on inspection <span class="math">\(n\)</span>, with:</p>
<ul>
<li><span class="math">\(n=5\)</span></li>
<li><span class="math">\(p=\frac{1}{3}\)</span></li>
</ul>
<p>Hence, the probability that the first defect is found during the fifth inspection is <span class="math">\(g(n=5,p=1/3) = (1-p)^{n-1}p\)</span>.</p>
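<div class="highlight"><pre><span></span><span class="n">p</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span>  <span class="c1"># redefine p and n: they still hold the piston values (0.12, 10)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;The probability that the first defect is found during the fifth inspection is {round(((1-p)**(n-1)) * p, 3)}&quot;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">The probability that the first defect is found during the fifth inspection is 0.066</span>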
</pre></div>
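<p>Because the geometric probability <span class="math">\((1-p)^{n-1}p\)</span> is a product of simple fractions here, it can be checked exactly. The sketch below uses Python's <code>fractions.Fraction</code> for that purpose; the function name <code>geom_pmf</code> is an assumption of this example.</p>

```python
from fractions import Fraction

def geom_pmf(n, p):
    """Probability that the first success occurs on trial n (hypothetical helper)."""
    return (1 - p) ** (n - 1) * p

# Four non-defective inspections followed by one defective one:
# (2/3)**4 * (1/3) = 16/243
exact = geom_pmf(5, Fraction(1, 3))
print(f"exact: {exact}, rounded: {float(exact):.3f}")
```

<p>With <span class="math">\(p=1/3\)</span> and <span class="math">\(n=5\)</span> the exact value is <span class="math">\(16/243 \approx 0.066\)</span>.</p>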
<h3>Problem 2</h3>
<p>The probability that a machine produces a defective product is <span class="math">\(\frac{1}{3}\)</span>. What is the probability that the first defect is found during the first 5 inspections?</p>
<h3>Mathematical explanation</h3>
<p>In this problem, we need to compute the cumulative distribution function
</p>
<div class="math">$$p(x \leq5) = \sum_{i=1}^{5} g(n=i,p=1/3)$$</div>
<div class="highlight"><pre><span></span><span class="n">p_x5</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">p</span><span class="o">=</span><span class="mi">1</span><span class="o">/</span><span class="mi">3</span>
<span class="n">n</span><span class="o">=</span><span class="mi">5</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
    <span class="n">p_x5</span><span class="o">+=</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">p</span><span class="p">)</span><span class="o">**</span><span class="p">(</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">p</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;The probability that the first defect is found during the first 5 inspections is {round(p_x5, 3)}&quot;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">The probability that the first defect is found during the first 5 inspections is 0.868</span>
</pre></div>
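<p>The sum above also has a closed form: the first defect falls within the first <span class="math">\(n\)</span> inspections unless all <span class="math">\(n\)</span> inspections are defect-free, so <span class="math">\(p(x \leq n) = 1-(1-p)^{n}\)</span>. The following sketch compares the two with exact fractions; <code>geom_cdf</code> is a hypothetical helper name used only for this illustration.</p>

```python
from fractions import Fraction

def geom_cdf(n, p):
    """Probability that the first success occurs within the first n trials (hypothetical helper)."""
    return 1 - (1 - p) ** n

p = Fraction(1, 3)  # probability of a defective product
n = 5               # number of inspections

# Term-by-term sum of the geometric probabilities for trials 1..n
summed = sum((1 - p) ** (i - 1) * p for i in range(1, n + 1))
# Closed form: 1 - (2/3)**5
closed = geom_cdf(n, p)

print(f"summed: {summed}, closed form: {closed}, rounded: {float(closed):.3f}")
```

<p>Both expressions give <span class="math">\(211/243 \approx 0.868\)</span>, and the closed form avoids the loop entirely.</p>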
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}</script>
</div>
<aside>
<div class="bug-reporting__panel">
<h3>Find an error or bug? Have a suggestion?</h3>
<p>Everything on this site is available on GitHub. Head on over and <a href='https://github.com/redoules/redoules.github.io/issues/new'>submit an issue.</a> You can also message me directly by <a href='mailto:guillaume.redoules@gadz.org'>email</a>.</p>
</div>
</aside>
</section>
</div>
<!-- start of footer section -->
<footer class="footer">
<div class="container">
<p class="text-muted">
<center>This project contains 115 pages and is available on <a href="https://github.com/redoules/redoules.github.io">GitHub</a>.
<br/>
Copyright &copy; Guillaume Redoulès,
<time datetime="2018">2018</time>.
</center>
</p>
</div>
</footer>
<!-- This jQuery line finds any span that contains code highlighting classes and then selects the parent <pre> tag and adds a border. This is done as a workaround to visually distinguish the code inputs and outputs -->
<script>
$( ".hll, .n, .c, .err, .k, .o, .cm, .cp, .c1, .cs, .gd, .ge, .gr, .gh, .gi, .go, .gp, .gs, .gu, .gt, .kc, .kd, .kn, .kp, .kr, .kt, .m, .s, .na, .nb, .nc, .no, .nd, .ni, .ne, .nf, .nl, .nn, .nt, .nv, .ow, .w, .mf, .mh, .mi, .mo, .sb, .sc, .sd, .s2, .se, .sh, .si, .sx, .sr, .s1, .ss, .bp, .vc, .vg, .vi, .il" ).parent( "pre" ).css( "border", "1px solid #DEDEDE" );
</script>
<!-- Load Google Analytics -->
<script>
/*
(function(i, s, o, g, r, a, m) {
i['GoogleAnalyticsObject'] = r;
i[r] = i[r] || function() {
(i[r].q = i[r].q || []).push(arguments)
}, i[r].l = 1 * new Date();
a = s.createElement(o),
m = s.getElementsByTagName(o)[0];
a.async = 1;
a.src = g;
m.parentNode.insertBefore(a, m)
})(window, document, 'script', '//www.google-analytics.com/analytics.js', 'ga');
ga('create', 'UA-66582-32', 'auto');
ga('send', 'pageview');
*/
</script>
<!-- End of Google Analytics -->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="../theme/js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="../theme/js/ie10-viewport-bug-workaround.js"></script>
</body>
</html>