<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="Data Science for Political and Social Phenomena">
<meta name="author" content="Guillaume Redoulès">
<link rel="icon" href="../favicon.ico">
<title>Day 5 - Poisson and Normal distributions - Blog</title>
<!-- JQuery -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
window.jQuery || document.write('<script src="../theme/js/jquery.min.js"><\/script>')
</script>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="../theme/css/bootstrap.css" />
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link rel="stylesheet" type="text/css" href="../theme/css/ie10-viewport-bug-workaround.css" />
<!-- Custom styles for this template -->
<link rel="stylesheet" type="text/css" href="../theme/css/style.css" />
<link rel="stylesheet" type="text/css" href="../theme/css/notebooks.css" />
<link href='https://fonts.googleapis.com/css?family=PT+Serif:400,700|Roboto:400,500,700' rel='stylesheet' type='text/css'>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<meta name="tags" content="Basics" />
</head>
<body>
<div class="navbar navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="..">Guillaume Redoulès</a>
</div>
<div class="navbar-collapse collapse" id="searchbar">
<ul class="nav navbar-nav navbar-right">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">About<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../pages/about.html">About Guillaume</a></li>
<li><a href="https://github.com/redoules">GitHub</a></li>
<li><a href="https://www.linkedin.com/in/guillaume-redoul%C3%A8s-33923860/">LinkedIn</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Data Science<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="..#Blog">Blog</a></li>
<li><a href="..#Python">Python</a></li>
<li><a href="..#Bash">Bash</a></li>
<li><a href="..#SQL">SQL</a></li>
<li><a href="..#Mathematics">Mathematics</a></li>
<li><a href="..#Machine_Learning">Machine Learning</a></li>
<li><a href="..#Projects">Projects</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Projects<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="https://github.com/redoules/redoules.github.io">Notes (Github)</a></li>
</ul>
</li>
<!--<li class="dropdown">
<a href="../feeds/blog.rss.xml">Blog RSS</a>
</li>-->
</ul>
<form class="navbar-form" action="../search.html" onsubmit="return validateForm(this.elements['q'].value);">
<div class="form-group" style="display:inline;">
<div class="input-group" style="display:table;">
<span class="input-group-addon" style="width:1%;"><span class="glyphicon glyphicon-search"></span></span>
<input class="form-control search-query" name="q" id="tipue_search_input" placeholder="e.g. scikit KNN, pandas merge" required autocomplete="off" type="text">
</div>
</div>
</form>
</div>
<!--/.nav-collapse -->
</div>
</div>
<!-- end of header section -->
<div class="container">
<!-- <div class="alert alert-warning" role="alert">
Did you find this page useful? Please do me a quick favor and <a href="#" class="alert-link">endorse me for data science on LinkedIn</a>.
</div> -->
<section id="content" class="body">
<header>
<h1>
Day 5 - Poisson and Normal distributions
</h1>
<ol class="breadcrumb">
<li>
<time class="published" datetime="2018-11-12T13:07:00+01:00">
12 November 2018
</time>
</li>
<li>Blog</li>
<li>Basics</li>
</ol>
</header>
<div class='article_content'>
<h2>Poisson Distribution</h2>
<h3>Problem 1</h3>
<p>A random variable, <span class="math">\(X\)</span>, follows a Poisson distribution with a mean of 2.5. Find the probability that <span class="math">\(X\)</span> is equal to 5.</p>
<h3>Mathematical explanation</h3>
<p>The answer follows directly from the Poisson probability mass function with mean <span class="math">\(\lambda = 2.5\)</span>, evaluated at <span class="math">\(k = 5\)</span>:</p>
<div class="math">$$P(X=k)=\frac{\lambda^ke^{-\lambda}}{k!}$$</div>
<div class="math">$$P(X=5)=\frac{2.5^5e^{-2.5}}{5!}$$</div>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">exp</span>

<span class="k">def</span> <span class="nf">factorial</span><span class="p">(</span><span class="n">k</span><span class="p">):</span>
    <span class="c1"># 0! and 1! are both 1, so the recursion stops for any k &lt;= 1</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">k</span> <span class="o">&lt;=</span> <span class="mi">1</span> <span class="k">else</span> <span class="n">k</span> <span class="o">*</span> <span class="n">factorial</span><span class="p">(</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">poisson</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="n">k</span><span class="p">):</span>
    <span class="c1"># Poisson probability mass function of mean l evaluated at k</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">l</span><span class="o">**</span><span class="n">k</span> <span class="o">*</span> <span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">l</span><span class="p">))</span> <span class="o">/</span> <span class="n">factorial</span><span class="p">(</span><span class="n">k</span><span class="p">)</span>

<span class="n">l</span> <span class="o">=</span> <span class="mf">2.5</span>
<span class="n">k</span> <span class="o">=</span> <span class="mi">5</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Probability that a random variable X following a Poisson distribution of mean </span><span class="si">{</span><span class="n">l</span><span class="si">}</span><span class="s1"> equals </span><span class="si">{</span><span class="n">k</span><span class="si">}</span><span class="s1"> : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">poisson</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="n">k</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Probability that a random variable X following a Poisson distribution of mean 2.5 equals 5 : 0.067</span>
</pre></div>
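<p>As a quick sanity check, here is a minimal sketch (not part of the original solution) that recomputes the same value with the standard library's <code>math.factorial</code> and verifies that the probabilities sum to approximately 1. The name <code>poisson_pmf</code> and the truncation at 50 terms are illustrative assumptions.</p>

```python
from math import exp, factorial


def poisson_pmf(lam, k):
    """Probability that a Poisson variable of mean lam equals k."""
    return lam**k * exp(-lam) / factorial(k)


lam = 2.5
p5 = poisson_pmf(lam, 5)
# Summing the mass function over k = 0..49 should be very close to 1,
# because the remaining tail is negligible for such a small mean.
total = sum(poisson_pmf(lam, k) for k in range(50))
print(f'P(X = 5) = {round(p5, 3)}')
print(f'Sum of probabilities for k in 0..49 = {round(total, 6)}')
```

<p>The first printed value should agree with the 0.067 obtained above.</p>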
<h3>Problem 2</h3>
<p>The manager of an industrial plant is planning to buy a machine of either type <span class="math">\(A\)</span> or type <span class="math">\(B\)</span>. For each day's operation:</p>
<ul>
<li>The number of repairs, <span class="math">\(X\)</span>, that machine <span class="math">\(A\)</span> needs is a Poisson random variable with mean 0.88. The daily cost of operating <span class="math">\(A\)</span> is <span class="math">\(C_A=160+40X^2\)</span>.</li>
<li>The number of repairs, <span class="math">\(Y\)</span>, that machine <span class="math">\(B\)</span> needs is a Poisson random variable with mean 1.55. The daily cost of operating <span class="math">\(B\)</span> is <span class="math">\(C_B=128+40Y^2\)</span>.</li>
</ul>
<p>Assume that the repairs take a negligible amount of time and the machines are maintained nightly to ensure that they operate like new at the start of each day. What is the expected daily cost for each machine?</p>
<h3>Mathematical explanation</h3>
<p>The daily cost of each machine is an affine function of the square of a Poisson random variable <span class="math">\(Z\)</span>:</p>
<div class="math">$$C_Z = a + bZ^2$$</div>
<p>
Since the expectation is a linear operator:
</p>
<div class="math">$$E[C_Z] = a + bE[Z^2]$$</div>
<p>For a Poisson distribution of mean <span class="math">\(\lambda\)</span>, the variance is also <span class="math">\(\lambda\)</span>, so <span class="math">\(E[Z^2] = \operatorname{Var}(Z) + E[Z]^2 = \lambda + \lambda^2\)</span> and we have:
</p>
<div class="math">$$E[C_Z] = a+ b(\lambda + \lambda^2)$$</div>
<div class="highlight"><pre><span></span><span class="n">averageX</span> <span class="o">=</span> <span class="mf">0.88</span>
<span class="n">averageY</span> <span class="o">=</span> <span class="mf">1.55</span>
<span class="n">CostX</span> <span class="o">=</span> <span class="mi">160</span> <span class="o">+</span> <span class="mi">40</span><span class="o">*</span><span class="p">(</span><span class="n">averageX</span> <span class="o">+</span> <span class="n">averageX</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="n">CostY</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">+</span> <span class="mi">40</span><span class="o">*</span><span class="p">(</span><span class="n">averageY</span> <span class="o">+</span> <span class="n">averageY</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Expected cost to run machine A : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">CostX</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Expected cost to run machine B : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">CostY</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Expected cost to run machine A : 226.176</span>
<span class="err">Expected cost to run machine B : 286.1</span>
</pre></div>
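<p>The identity <span class="math">\(E[Z^2] = \lambda + \lambda^2\)</span> used in this problem can be checked numerically. The sketch below approximates the second moment by truncating the series <span class="math">\(\sum_k k^2 P(Z=k)\)</span>. The function name <code>second_moment</code> and the cutoff of 100 terms are assumptions made for illustration.</p>

```python
from math import exp, factorial


def second_moment(lam, terms=100):
    """Approximate E[Z^2] for Z ~ Poisson(lam) by truncating the series."""
    return sum(k**2 * lam**k * exp(-lam) / factorial(k) for k in range(terms))


for lam in (0.88, 1.55):
    series = second_moment(lam)
    closed_form = lam + lam**2
    print(f'lambda = {lam}: series = {round(series, 6)}, closed form = {round(closed_form, 6)}')
```

<p>For both means the truncated series and the closed form should agree to many decimal places, which supports using the closed form directly in the cost computation.</p>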
<h2>Normal Distribution</h2>
<h3>Problem 1</h3>
<p>In a certain plant, the time taken to assemble a car is a random variable, <span class="math">\(X\)</span>, having a normal distribution with a mean of 20 hours and a standard deviation of 2 hours. What is the probability that a car can be assembled at this plant in:</p>
<ul>
<li>Less than 19.5 hours?</li>
<li>Between 20 and 22 hours?</li>
</ul>
<h3>Mathematical explanation</h3>
<p><span class="math">\(X\)</span> is a real-valued random variable following a normal distribution, so the probability of assembling a car in less than 19.5 hours is the cumulative distribution function of <span class="math">\(X\)</span> evaluated at 19.5:</p>
<div class="math">$$P(X\leq 19.5)=F_X(19.5)$$</div>
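<p>For a normal distribution of mean <span class="math">\(\mu\)</span> and standard deviation <span class="math">\(\sigma\)</span>, the cumulative distribution function is:
</p>
<div class="math">$$F_X(x)=\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$$</div>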
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">math</span>
<span class="k">def</span> <span class="nf">cumulative</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">):</span>
    <span class="k">return</span> <span class="mf">0.5</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">+</span><span class="n">math</span><span class="o">.</span><span class="n">erf</span><span class="p">((</span><span class="n">x</span><span class="o">-</span><span class="n">mean</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">sd</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="p">))))</span>
<span class="n">mean</span> <span class="o">=</span> <span class="mi">20</span>
<span class="n">sd</span> <span class="o">=</span> <span class="mi">2</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Probability that the car is built in less than 19.5 hours : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">cumulative</span><span class="p">(</span><span class="mf">19.5</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Probability that the car is built in less than 19.5 hours : 0.401</span>
</pre></div>
<p>Similarly, the probability that a car is built in between 20 and 22 hours can be computed from the cumulative distribution function:</p>
<div class="math">$$P(20\leq X\leq 22) = F_X(22)-F_X(20)$$</div>
<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Probability that the car is built between 20 and 22 hours : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">cumulative</span><span class="p">(</span><span class="mi">22</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">)</span><span class="o">-</span><span class="n">cumulative</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Probability that the car is built between 20 and 22 hours : 0.341</span>
</pre></div>
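<p>Both results can be cross-checked against the standard library. The sketch below assumes Python 3.8 or later, where <code>statistics.NormalDist</code> is available. The variable names are illustrative and not part of the original solution.</p>

```python
from statistics import NormalDist

# Assembly time modelled as a normal distribution: mean 20 h, sd 2 h.
assembly_time = NormalDist(mu=20, sigma=2)

p_less_than_19_5 = assembly_time.cdf(19.5)
p_between_20_and_22 = assembly_time.cdf(22) - assembly_time.cdf(20)

print(f'P(X less than 19.5) = {round(p_less_than_19_5, 3)}')
print(f'P(X between 20 and 22) = {round(p_between_20_and_22, 3)}')
```

<p>The two printed values should match the 0.401 and 0.341 computed with the hand-written <code>cumulative</code> function.</p>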
<h3>Problem 2</h3>
<p>The final grades for a Physics exam taken by a large group of students have a mean of <span class="math">\(\mu=70\)</span> and a standard deviation of <span class="math">\(\sigma=10\)</span>. If we can approximate the distribution of these grades by a normal distribution, what percentage of the students:</p>
<ul>
<li>Scored higher than 80 (i.e., have a <span class="math">\(grade \gt 80\)</span>)?</li>
<li>Passed the test (i.e., have a <span class="math">\(grade \gt 60\)</span>)?</li>
<li>Failed the test (i.e., have a <span class="math">\(grade \lt 60\)</span>)?</li>
</ul>
<h3>Mathematical explanation</h3>
<p>Here again, we need to apply the cumulative distribution function to get the probabilities:</p>
<p>Probability that they scored higher than 80 :
</p>
<div class="math">$$P(X\gt80) = 1- P(X\lt80)$$</div>
<div class="math">$$P(X\gt80) = 1- F_X(80)$$</div>
<div class="highlight"><pre><span></span><span class="n">mean</span> <span class="o">=</span> <span class="mi">70</span>
<span class="n">sd</span> <span class="o">=</span> <span class="mi">10</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Probability that the student scored higher than 80 : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span> <span class="n">cumulative</span><span class="p">(</span><span class="mi">80</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Probability that the student scored higher than 80 : 0.159</span>
</pre></div>
<p>Probability that they passed the test :
</p>
<div class="math">$$P(X\gt60) = 1- P(X\lt60)$$</div>
<div class="math">$$P(X\gt60) = 1- F_X(60)$$</div>
<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Probability that the student passed the test : </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span> <span class="n">cumulative</span><span class="p">(</span><span class="mi">60</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Probability that the student passed the test : 0.841</span>
</pre></div>
<p>Probability that they failed : </p>
<div class="math">$$P(X\lt60) = F_X(60)$$</div>
<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Probability that the student failed the test: </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">cumulative</span><span class="p">(</span><span class="mi">60</span><span class="p">,</span><span class="n">mean</span><span class="p">,</span><span class="n">sd</span><span class="p">),</span><span class="mi">3</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">Probability that the student failed the test: 0.159</span>
</pre></div>
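<p>The first and last answers coincide, and a short standardisation argument suggests why. Writing <span class="math">\(Z=(X-\mu)/\sigma\)</span> and letting <span class="math">\(\Phi\)</span> denote the cumulative distribution function of the standard normal distribution (notation introduced here for illustration), the thresholds 80 and 60 sit exactly one standard deviation above and below the mean:</p>
<div class="math">$$P(X\gt80) = P\left(Z\gt\frac{80-70}{10}\right) = P(Z\gt1) = 1-\Phi(1)$$</div>
<div class="math">$$P(X\lt60) = P\left(Z\lt\frac{60-70}{10}\right) = P(Z\lt-1) = \Phi(-1) = 1-\Phi(1)$$</div>
<p>The last equality uses the symmetry of the standard normal distribution around 0, so both probabilities are approximately 0.159, and the pass probability is their complement, approximately 0.841.</p>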
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}</script>
</div>
<aside>
<div class="bug-reporting__panel">
<h3>Find an error or bug? Have a suggestion?</h3>
<p>Everything on this site is available on GitHub. Head on over and <a href='https://github.com/redoules/redoules.github.io/issues/new'>submit an issue.</a> You can also message me directly by <a href='mailto:guillaume.redoules@gadz.org'>email</a>.</p>
</div>
</aside>
</section>
</div>
<!-- start of footer section -->
<footer class="footer">
<div class="container">
<p class="text-muted">
<center>This project contains 119 pages and is available on <a href="https://github.com/redoules/redoules.github.io">GitHub</a>.
<br/>
Copyright &copy; Guillaume Redoulès,
<time datetime="2018">2018</time>.
</center>
</p>
</div>
</footer>
<!-- This jQuery line finds any span that contains code highlighting classes and then selects the parent <pre> tag and adds a border. This is done as a workaround to visually distinguish the code inputs and outputs -->
<script>
$( ".hll, .n, .c, .err, .k, .o, .cm, .cp, .c1, .cs, .gd, .ge, .gr, .gh, .gi, .go, .gp, .gs, .gu, .gt, .kc, .kd, .kn, .kp, .kr, .kt, .m, .s, .na, .nb, .nc, .no, .nd, .ni, .ne, .nf, .nl, .nn, .nt, .nv, .ow, .w, .mf, .mh, .mi, .mo, .sb, .sc, .sd, .s2, .se, .sh, .si, .sx, .sr, .s1, .ss, .bp, .vc, .vg, .vi, .il" ).parent( "pre" ).css( "border", "1px solid #DEDEDE" );
</script>
<!-- Load Google Analytics -->
<script>
/*
(function(i, s, o, g, r, a, m) {
i['GoogleAnalyticsObject'] = r;
i[r] = i[r] || function() {
(i[r].q = i[r].q || []).push(arguments)
}, i[r].l = 1 * new Date();
a = s.createElement(o),
m = s.getElementsByTagName(o)[0];
a.async = 1;
a.src = g;
m.parentNode.insertBefore(a, m)
})(window, document, 'script', '//www.google-analytics.com/analytics.js', 'ga');
ga('create', 'UA-66582-32', 'auto');
ga('send', 'pageview');
*/
</script>
<!-- End of Google Analytics -->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="../theme/js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="../theme/js/ie10-viewport-bug-workaround.js"></script>
</body>
</html>