{"pages":[{"title":"About Guillaume Redoulès","text":"I am a data scientist and a mechanical engineer working on numerical methods for stress computations in the field of rocket propulsion. Prior to that, I've got a MSc in Computational Fluid Dynamics and aerodynamics from Imperial College London. Email: guillaume.redoules@gadz.org Linkedin: Guillaume Redoulès Curriculum Vitae Experience Thermomecanical method and tools engineer , Ariane Group , 2015 - Present In charge of tools and methods related to thermomecanical computations. Focal point for machine learning. Education MSc Advanced Computational Methods for Aeronautics, Flow Management and Fluid-Structure Interaction , Imperial College London, London. 2013 Dissertation: \"Estimator design for fluid flows\" Fields: Aeronautics, aerodynamics, computational fluid dynamics, numerical methods Arts et Métiers Paristech , France, 2011 Generalist engineering degree Fields: Mechanics, electrical engineering, casting, machining, project management, finance, IT, etc.","tags":"pages","url":"redoules.github.io/pages/about.html","loc":"redoules.github.io/pages/about.html"},{"title":"Find the file owner","text":"You can find the owner of a file by running the following command. The command will return the owner of the file and the domain import win32api import win32con import win32security def owner ( file ): sd = win32security . GetFileSecurity ( file , win32security . OWNER_SECURITY_INFORMATION ) owner_sid = sd . GetSecurityDescriptorOwner () name , domain , type = win32security . LookupAccountSid ( None , owner_sid ) return ( name , domain ) filename = \"my.file\" print ( f \"The owner of the file {filename} is {owner(filename)[0]}\" ) The owner of the file my . file is my . 
user","tags":"Python","url":"redoules.github.io/python/file_owner.html","loc":"redoules.github.io/python/file_owner.html"},{"title":"File creation date in Windows","text":"You can find the creation date of a file by running the following command import os import time def creation_date ( path_to_file ): return time . strftime ( '%Y-%m-%d %H-%M-%S' , time . localtime ( os . path . getctime ( path_to_file ))) creation_date ( \"my.file\" ) '2019-11-04 14-35-54'","tags":"Python","url":"redoules.github.io/python/file_creation_date.html","loc":"redoules.github.io/python/file_creation_date.html"},{"title":"Get min and max distance within a point cloud","text":"Here we will learn how to find the maximal or minimal distance between two points in a cloud of points. To do so, we will use the pdist function available in the scipy.spatial.distance package. This function computes the pairwise distances between observations in n-dimensional space; in order to find the longest or shortest distance, just take the max or min. import numpy as np from scipy.spatial.distance import pdist points = np . random . random (( 10 , 3 )) # generate 10 points in 3 dimensions pairwise = pdist ( points ) # compute the pairwise distance between those points #compute the maximal and minimal distance print ( f \"maximal distance : {np.max(pairwise)}\" ) print ( f \"minimal distance : {np.min(pairwise)}\" ) maximal distance : 1 . 1393617436726384 minimal distance : 0 . 2382615513731064 The pdist function can take a different metric for the distance computation. The default metric is euclidean; the built-in metrics are : * braycurtis * canberra * chebyshev * cityblock * correlation * cosine * dice * euclidean * hamming * jaccard * jensenshannon * kulsinski * mahalanobis * matching * minkowski * rogerstanimoto * russellrao * seuclidean * sokalmichener * sokalsneath * sqeuclidean * yule You can also define your own distances with a lambda function np . max ( pdist ( points , lambda u , v : np . sqrt ((( u - v ) ** 2 ) . sum ()))) 1 . 
1393617436726384 or with a regular function def dfun ( u , v ): return np . sqrt ((( u - v ) ** 2 ) . sum ()) np . max ( pdist ( points , dfun )) 1 . 1393617436726384","tags":"Python","url":"redoules.github.io/python/point_cloud_distance.html","loc":"redoules.github.io/python/point_cloud_distance.html"},{"title":"Natural sort of list","text":"Natural sort order is an ordering of strings in alphabetical order, except that multi-digit numbers are ordered as a single character. Natural sort order has been promoted as being more human-friendly (\"natural\") than the machine-oriented pure alphabetical order. For example, in alphabetical sorting \"z11\" would be sorted before \"z2\" because \"1\" is sorted as smaller than \"2\", while in natural sorting \"z2\" is sorted before \"z11\" because \"2\" is sorted as smaller than \"11\". import re def natural_sort ( l ): \"\"\" return the list l in a natural sort order \"\"\" convert = lambda text : int ( text ) if text . isdigit () else text . lower () alphanum_key = lambda key : [ convert ( c ) for c in re . split ( \"([0-9]+)\" , key )] return sorted ( l , key = alphanum_key )","tags":"Python","url":"redoules.github.io/python/natural_sort.html","loc":"redoules.github.io/python/natural_sort.html"},{"title":"Save a numpy array to disk","text":"In this article we will learn how to save a numpy array to the disk. We will then see how to load it back from the disk into memory. First, let's import numpy. # Import modules import numpy as np We will generate an array to demonstrate saving and loading. myarray = np . arange ( 10 ) Numpy arrays can be saved to the disk in the binary .npy format by using the save function. np . save ( \"C: \\\\ temp \\\\ arr.npy\" , myarray ) Once saved, it can be retrieved from the disk by using the load function. my_other_array = np . 
load ( \"C: \\\\ temp \\\\ arr.npy\" ) my_other_array array ([ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ])","tags":"Python","url":"redoules.github.io/python/numpy_save.html","loc":"redoules.github.io/python/numpy_save.html"},{"title":"Setting up MariaDB for Remote Client Access","text":"Some MariaDB packages bind MariaDB to 127.0.0.1 (the loopback IP address) by default as a security measure using the bind-address configuration directive. In that case, one can't connect to the MariaDB server from other hosts or from the same host over TCP/IP on a different interface than the loopback (127.0.0.1). The list of existing remote users can be accessed with the following SQL statement on the mysql.user table: SELECT User, Host FROM mysql.user WHERE Host <> 'localhost'; +-----------+-----------+ | User | Host | +-----------+-----------+ | Guillaume | % | | root | 127.0.0.1 | | root | ::1 | +-----------+-----------+ 4 rows in set (0.00 sec) We will create a \"root\" user that can connect from anywhere within the local area network (LAN), which has addresses in the subnet 192.168.1.0/24. This is an improvement because opening a MariaDB server up to the Internet and granting access to all hosts is bad practice. GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.1.%' IDENTIFIED BY 'my-new-password' WITH GRANT OPTION; (% is a wildcard)","tags":"SQL","url":"redoules.github.io/sql/remote_access.html","loc":"redoules.github.io/sql/remote_access.html"},{"title":"Log experiments","text":"Machine learning is a very iterative process: algorithms have multiple hyperparameters to keep track of, and the performance of the models evolves as you get more data. In order to manage the model lifecycle, we will use mlflow. First, import mlflow import mlflow Mlflow can be run on the local computer in order to try it out but I recommend deploying it on a server. In our case, the server is located on the local network at 192.168.1.5:4444. 
The mlflow client can connect to it via the set_tracking_uri function mlflow . set_tracking_uri ( \"http://192.168.1.5:4444\" ) Mlflow can be used to record and query experiments : code, data, config, results... Let's specify that we are working on my-experiment with the function set_experiment . If the experiment does not exist, it will be created. Parameters are then logged with the log_param function. mlflow . set_experiment ( \"my-experiment\" ) mlflow . log_param ( \"num_dimensions\" , 8 ) mlflow . log_param ( \"regularization\" , 0.1 ) Metrics can be logged as well in mlflow, just use the log_metric function. mlflow . log_metric ( \"accuracy\" , 0.1 ) mlflow . log_metric ( \"accuracy\" , 0.45 ) Metrics can be updated at a later time: logging the same metric again records a new value, and mlflow keeps the history of the values. You can use MLflow Tracking in any environment (for example, a standalone script or a notebook) to log results to local files or to a server, then compare multiple runs. Using the web UI, you can view and compare the output of multiple runs. Teams can also use the tools to compare results from different users.","tags":"Machine Learning","url":"redoules.github.io/machine-learning/Log_experiments_mlflow.html","loc":"redoules.github.io/machine-learning/Log_experiments_mlflow.html"},{"title":"Filter or select lines of a DataFrame containing values in a list","text":"In this article we will learn to filter the lines of a dataframe based on the values contained in a column of that dataframe. This is similar to the \"Filter\" functionality of Excel. Let's first create our dataframe : # Import modules import pandas as pd # Example dataframe raw_data = { 'fruit' : [ 'Banana' , 'Orange' , 'Apple' , 'lemon' , \"lime\" , \"plum\" ], 'color' : [ 'yellow' , 'orange' , 'red' , 'yellow' , \"green\" , \"purple\" ], 'kcal' : [ 89 , 47 , 52 , 15 , 30 , 28 ] } df = pd . 
DataFrame ( raw_data , columns = [ 'fruit' , 'color' , 'kcal' ]) df .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } fruit color kcal 0 Banana yellow 89 1 Orange orange 47 2 Apple red 52 3 lemon yellow 15 4 lime green 30 5 plum purple 28 If we want to extract all the lines where the value of the color column is yellow, we would proceed like so : df [ df [ \"color\" ] == \"yellow\" ] .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } fruit color kcal 0 Banana yellow 89 3 lemon yellow 15 Now, if we want to filter the DataFrame by a list of values we would rather use the isin method like this : df [ df [ \"color\" ] . isin ([ \"yellow\" , \"red\" ])] .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } fruit color kcal 0 Banana yellow 89 2 Apple red 52 3 lemon yellow 15","tags":"Python","url":"redoules.github.io/python/select_lines_values_list.html","loc":"redoules.github.io/python/select_lines_values_list.html"},{"title":"List all sections in a config file","text":"A config file is partionned in sections.Here is an examples of a config file named config.ini : [section1] var_a:hello var_b:world [section2] myvariable: 42 There are two sections in this config file, you can access to them in python by calling the sections method of the ConfigParser class import configparser config = configparser . ConfigParser () config . read ( \"config.ini\" ) config . 
sections () [ 'section1' , 'section2' ]","tags":"Python","url":"redoules.github.io/python/config_list.html","loc":"redoules.github.io/python/config_list.html"},{"title":"Getting traffic data from google maps","text":"Goal of the project We will scrape google maps in order to find the travel time from a grid of points to a couple of destinations. This way, we will find the best points to minimize both journeys. This code can be used to pinpoint the best locations to pick a home when two people are working at different locations. By scraping google maps, we can take into account how the traffic impacts the travel time. You can download the project by going to the GitHub repository Scraping google maps Since google maps is a dynamic website, we cannot use simple tools such as wget or curl. Even web scraping frameworks such as Scrapy don't render the DOM hence cannot work in this situation. The easiest way to scrape data from such websites is to take control of a browser by using an automation tool. In this case we will use selenium to take control of Google Chrome with the chromedriver. You have to install selenium with conda install -c conda-forge selenium or pip install selenium you also need to have the chromedriver.exe downloaded. BeautifulSoup is a package we will use to parse the html of the webpage opened in chrome. In order to extract the estimated travel time, we need to inspect the source code of the page to find the element we are interested in. In our case it is section-directions-trip-numbers . In this
B G B BB BG G GB GG We know that at least one of the children is a boy, so only \"GG\" is not possible. The event where both children are boys is then \"BB\". Hence the probability is : $$\\frac{BB}{BB+GB+BG}=\\frac{1}{3}$$ Draw 2 cards from a deck Problem You draw 2 cards from a standard 52-card deck without replacing them. What is the probability that both cards are of the same suit? Mathematical explanation There are 13 cards of each suit. Draw one card. It can be anything with probability of 1. Now there are 51 cards left and 12 of them are the same suit as the first card you drew. So the chance the second card matches the 1st is \\(\\frac{12}{51}=\\frac{4}{17}\\) . Drawing marbles Problem A bag contains 3 red marbles and 4 blue marbles. Then, 2 marbles are drawn from the bag, at random, without replacement. If the first marble drawn is red, what is the probability that the second marble is blue? Mathematical explanation On the first draw, the probabilities are the following : we call B the event \"a blue marble is drawn\" and R the event \"a red marble is drawn\" * \\(P(B)=\\frac{4}{7}\\) * \\(P(R)=\\frac{3}{7}\\) On the second draw, if a red marble has been drawn at first, 6 marbles remain (2 red and 4 blue) and the probabilities are : * \\(P(B|R)=\\frac{4}{6}\\) * \\(P(R|R)=\\frac{2}{6}\\) Hence, the probability of drawing a blue marble if the first marble drawn was red is \\(\\frac{4}{6}=\\frac{2}{3}\\)","tags":"Blog","url":"redoules.github.io/blog/Statistics_10days-day3.html","loc":"redoules.github.io/blog/Statistics_10days-day3.html"},{"title":"Day 2 - Probability, Compound Event Probability","text":"Basic probability with dice Problem In this challenge, we practice calculating probability. In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that their sum will be at most 9. Mathematical explanation A nice way to think about sums-of-two-dice problems is to lay out the sums in a 6-by-6 grid in the obvious manner.
1 2 3 4 5 6 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12 We see that identical values lie on the same diagonal. The number of elements on a diagonal varies from 1 to 6 and then back to 1. Let's call A the sum of the 2 dice, so that \\(A \\leq x\\) is the event : the sum of the 2 tosses is at most x. $$P(A\\leq9)=\\sum_{i=2}^{9} P(A = i)$$ $$P(A\\leq9)=1-P(A\\gt9)$$ $$P(A\\leq9)=1-\\sum_{i=10}^{12} P(A = i)$$ The value of \\(P(A = i) = \\frac{i-1}{36}\\) if \\(i \\leq 7\\) and \\(P(A = i) = \\frac{13-i}{36}\\) if \\(i \\gt 7\\) hence $$P(A\\leq9)=1-\\sum_{i=10}^{12} \\frac{13-i}{36}$$ $$P(A\\leq9)= 1-\\frac{6}{36}$$ $$P(A\\leq9)= \\frac{5}{6}$$ Let's program it sum ([ 1 for d1 in range ( 1 , 7 ) for d2 in range ( 1 , 7 ) if d1 + d2 <= 9 ]) / 36 0 . 8333333333333334 More dice Problem In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that the values rolled by each die will be different and the two dice have a sum of 6. Mathematical explanation Let's consider 2 events : A and B. A compound event is a combination of 2 or more simple events. If A and B are simple events, then A∪B denotes the occurrence of either A or B. A∩B denotes the occurrence of A and B together. We denote A the event \"the values of the two dice are different\". The opposite event is A' \"the values of the two dice are the same\". $$P(A) = 1-P(A')$$ $$P(A)=1-\\frac{6}{36}$$ $$P(A)=\\frac{5}{6}$$ We denote B the event \"the two dice have a sum of 6\"; its probability follows from the formula given in the first part of the article : $$P(B)=\\frac{5}{36}$$ Among the 5 outcomes with a sum of 6, only (3, 3) has identical values, so the probability that the dice are different given a sum of 6 is : $$P(A|B) = 4/5$$ The probability that both A and B occur is equal to P(A∩B). Since \\(P(A|B)=\\frac{P(A∩B)}{P(B)}\\) $$P(A∩B)=P(B)*P(A|B)$$ $$P(A∩B)=5/36*4/5$$ $$P(A∩B)=1/9$$ Let's program it sum ([ 1 for d1 in range ( 1 , 7 ) for d2 in range ( 1 , 7 ) if ( d1 + d2 == 6 ) and ( d1 != d2 )]) / 36 0 . 
1111111111111111 Compound Event Probability Problem There are 3 urns labeled X, Y, and Z. Urn X contains 4 red balls and 3 black balls. Urn Y contains 5 red balls and 4 black balls. Urn Z contains 4 red balls and 4 black balls. One ball is drawn from each of the urns. What is the probability that, of the 3 balls drawn, 2 are red and 1 is black? Mathematical explanation Let's write the different probabilities: Red ball Black ball Urn X $$\\frac{4}{7}$$ $$\\frac{3}{7}$$ Urn Y $$\\frac{5}{9}$$ $$\\frac{4}{9}$$ Urn Z $$\\frac{1}{2}$$ $$\\frac{1}{2}$$ Addition rule A and B are said to be mutually exclusive or disjoint if they have no outcomes in common (i.e., A∩B=∅ and P(A∩B)=0). The probability of any of 2 or more events occurring is the probability of the union (∪) of the events. Because disjoint events have no common outcomes, the probability of the union of disjoint events is the sum of the events' individual probabilities. A and B are said to be collectively exhaustive if their union covers all events in the sample space (i.e., A∪B=S and P(A∪B)=1). This brings us to our next fundamental rule of probability: if 2 events, A and B, are disjoint, then the probability of either event is the sum of the probabilities of the 2 events (i.e., P(A or B) = P(A)+P(B)) Multiplication rule If the outcome of the first event (A) has no impact on the second event (B), then they are considered to be independent (e.g., tossing a fair coin). This brings us to the next fundamental rule of probability: the multiplication rule. It states that if two events, A and B, are independent, then the probability of both events is the product of the probabilities for each event (i.e., P(A and B)= P(A)xP(B)). The chance of all events occurring in a sequence of events is called the intersection (∩) of those events. 
The balls drawn from the urns are independent hence : p = P(2 red (R) and 1 black (B)) $$p = P(RRB) + P(RBR) + P(BRR)$$ Each of those 3 probabilities is equal to the product of the probabilities of drawing each ball \\(P(RRB) = P(R|X) * P(R|Y) * P(B|Z) = 4/7*5/9*1/2\\) \\(P(RRB) = 20/126\\) \\(P(RBR) = 16/126\\) \\(P(BRR) = 15/126\\) this leads to \\(p = 51/126\\) and finally $$p = \\frac{17}{42}$$ Let's program it X = 3 * [ \"B\" ] + 4 * [ \"R\" ] Y = 4 * [ \"B\" ] + 5 * [ \"R\" ] Z = 4 * [ \"B\" ] + 4 * [ \"R\" ] target = [ \"BRR\" , \"RRB\" , \"RBR\" ] sum ([ 1 for x in X for y in Y for z in Z if x + y + z in target ]) / sum ([ 1 for x in X for y in Y for z in Z ]) 0 . 40476190476190477","tags":"Blog","url":"redoules.github.io/blog/Statistics_10days-day2.html","loc":"redoules.github.io/blog/Statistics_10days-day2.html"},{"title":"Day 1 - Quartiles, 
Interquartile Range and standard deviation","text":"Quartile Definition A quartile is a type of quantile. The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set. Implementation in python without using the scientific libraries def median ( l ): l = sorted ( l ) if len ( l ) % 2 == 0 : return ( l [ len ( l ) // 2 ] + l [( len ( l ) // 2 - 1 )]) / 2 else : return l [ len ( l ) // 2 ] def quartiles ( l ): # check the input is not empty if not l : raise ValueError ( 'no data points passed' ) # 1. order the data set l = sorted ( l ) # 2. divide the data set in two halves mid = len ( l ) // 2 Q2 = median ( l ) if ( len ( l ) % 2 == 0 ): # even Q1 = median ( l [: mid ]) Q3 = median ( l [ mid :]) else : # odd, the median itself is excluded from both halves Q1 = median ( l [: mid ]) # same as even Q3 = median ( l [ mid + 1 :]) return ( Q1 , Q2 , Q3 ) L = [ 3 , 7 , 8 , 5 , 12 , 14 , 21 , 13 , 18 ] Q1 , Q2 , Q3 = quartiles ( L ) print ( f \"Sample : {L} \\n Q1 : {Q1}, Q2 : {Q2}, Q3 : {Q3}\" ) Sample : [ 3 , 7 , 8 , 5 , 12 , 14 , 21 , 13 , 18 ] Q1 : 6 . 0 , Q2 : 12 , Q3 : 16 . 0 Interquartile Range Definition The interquartile range of an array is the difference between its third (Q3) and first (Q1) quartiles. Hence the interquartile range is Q3-Q1 Implementation in python without using the scientific libraries print ( f \"Interquartile range : {Q3-Q1}\" ) Interquartile range : 10 . 0 Standard deviation Definition The standard deviation (σ) is a measure that is used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values. 
The standard deviation can be computed with the formula: $$\\sigma = \\sqrt{\\frac{\\sum_{i=1}^{N} (x_i - \\mu)^2}{N}}$$ where µ is the mean : $$\\mu = \\frac{\\sum_{i=1}^{N} x_i}{N}$$ Implementation in python without using the scientific libraries import math X = [ 10 , 40 , 30 , 50 , 20 ] mean = sum ( X ) / len ( X ) squared_deviations = [( x - mean ) ** 2 for x in X ] # keep X untouched so that the original sample is printed std = math . sqrt ( sum ( squared_deviations ) / len ( X ) ) print ( f \"The distribution {X} has a standard deviation of {std}\" ) The distribution [ 10 , 40 , 30 , 50 , 20 ] has a standard deviation of 14 . 142135623730951","tags":"Blog","url":"redoules.github.io/blog/Statistics_10days-day1.html","loc":"redoules.github.io/blog/Statistics_10days-day1.html"},{"title":"Counting values in an array","text":"Using lists If you want to count the number of occurrences of an element in a list you can use the .count() method of the list object arr = [ 1 , 2 , 3 , 3 , 4 , 5 , 3 , 6 , 7 , 7 ] print ( f 'Array : {arr} \\n ' ) print ( f 'The number 3 appears {arr.count(3)} times in the list' ) print ( f 'The number 7 appears {arr.count(7)} times in the list' ) print ( f 'The number 4 appears {arr.count(4)} times in the list' ) Array : [ 1 , 2 , 3 , 3 , 4 , 5 , 3 , 6 , 7 , 7 ] The number 3 appears 3 times in the list The number 7 appears 2 times in the list The number 4 appears 1 times in the list Using collections You can get a dictionary of the number of occurrences of each element in a list thanks to the Counter class of the collections module like this import collections collections . Counter ( arr ) Counter ( { 1 : 1 , 2 : 1 , 3 : 3 , 4 : 1 , 5 : 1 , 6 : 1 , 7 : 2 } ) Using numpy You can get a similar result with numpy by using the return_counts option of the unique function import numpy as np arr = np . array ( arr ) unique , counts = np . 
unique ( arr , return_counts = True ) dict ( zip ( unique , counts )) { 1 : 1 , 2 : 1 , 3 : 3 , 4 : 1 , 5 : 1 , 6 : 1 , 7 : 2 }","tags":"Python","url":"redoules.github.io/python/counting.html","loc":"redoules.github.io/python/counting.html"},{"title":"Building a dictionary using comprehension","text":"An easy way to create a dictionary in python is to use the comprehension syntax. It can be more expressive hence easier to read. d = { key : value for ( key , value ) in iterable } In the example below we use the dictionary comprehension to build a dictionary from a source list. iterable = list ( range ( 10 )) d = { str ( value ): value ** 2 for value in iterable } # create a dictionary linking the string value of a number with the square value of this number print ( d ) { '0' : 0 , '1' : 1 , '2' : 4 , '3' : 9 , '4' : 16 , '5' : 25 , '6' : 36 , '7' : 49 , '8' : 64 , '9' : 81 } Of course, you can use another iterable and repack it with the comprehension syntax. In the following example, we convert a list of tuples into a dictionary. iterable = [( \"France\" , 67.12e6 ), ( \"UK\" , 66.02e6 ), ( \"USA\" , 325.7e6 ), ( \"China\" , 1386e6 ), ( \"Germany\" , 82.79e6 )] population = { key : value for ( key , value ) in iterable } print ( population ) { 'France' : 67120000 . 0 , 'UK' : 66020000 . 0 , 'USA' : 325700000 . 0 , 'China' : 1386000000 . 0 , 'Germany' : 82790000 . 0 }","tags":"Python","url":"redoules.github.io/python/dict_comprehension.html","loc":"redoules.github.io/python/dict_comprehension.html"},{"title":"Extracting unique values from a list or an array","text":"Using lists An easy way to extract the unique values of a list in python is to convert the list to a set. A set is an unordered collection of items. Every element is unique (no duplicates) and must be hashable (in practice, immutable). 
my_list = [ 10 , 20 , 30 , 40 , 20 , 50 , 60 , 40 ] print ( f \"Original List : {my_list}\" ) my_set = set ( my_list ) my_new_list = list ( my_set ) # the set is converted back to a list with the list() function print ( f \"List of unique numbers : {my_new_list}\" ) Original List : [ 10 , 20 , 30 , 40 , 20 , 50 , 60 , 40 ] List of unique numbers : [ 40 , 10 , 50 , 20 , 60 , 30 ] Using numpy If you are using numpy you can extract the unique values of an array with the unique function built into numpy, which returns them sorted: import numpy as np arr = np . array ( my_list ) print ( f 'Initial numpy array : {arr} \\n ' ) unique_arr = np . unique ( arr ) print ( f 'Numpy array with unique values : {unique_arr}' ) Initial numpy array : [ 10 20 30 40 20 50 60 40 ] Numpy array with unique values : [ 10 20 30 40 50 60 ]","tags":"Python","url":"redoules.github.io/python/unique.html","loc":"redoules.github.io/python/unique.html"},{"title":"Sorting an array","text":"Using lists Python provides the built-in function sorted() , which returns a new sorted list. You can use it this way : import random # Random list of integers in the [0, 1000] interval (randint includes both bounds) arr = [ random . randint ( 0 , 1000 ) for r in range ( 10 )] print ( f 'Initial random list : {arr} \\n ' ) sorted_arr = sorted ( arr ) print ( f 'Sorted list : {sorted_arr}' ) Initial random list : [ 277 , 347 , 976 , 367 , 604 , 878 , 148 , 670 , 229 , 432 ] Sorted list : [ 148 , 229 , 277 , 347 , 367 , 432 , 604 , 670 , 878 , 976 ] It is also possible to use the sort method of the list object, which sorts the list in place # Random list of integers in the [0, 1000] interval (randint includes both bounds) arr = [ random . randint ( 0 , 1000 ) for r in range ( 10 )] print ( f 'Initial random list : {arr} \\n ' ) arr . sort () print ( f 'Sorted list : {arr}' ) Initial random list : [ 727 , 759 , 68 , 103 , 23 , 90 , 258 , 737 , 791 , 567 ] Sorted list : [ 23 , 68 , 90 , 103 , 258 , 567 , 727 , 737 , 759 , 791 ] Using numpy If you are using numpy you can sort an array with the sort function, which returns a sorted copy of the array: import numpy as np arr = np . random . 
random ( 5 ) print ( f 'Initial random array : {arr} \\n ' ) sorted_arr = np . sort ( arr ) print ( f 'Sorted array : {sorted_arr}' ) Initial random array : [ 0 . 40021786 0 . 13876208 0 . 19939047 0 . 46015169 0 . 43734158 ] Sorted array : [ 0 . 13876208 0 . 19939047 0 . 40021786 0 . 43734158 0 . 46015169 ]","tags":"Python","url":"redoules.github.io/python/sorting.html","loc":"redoules.github.io/python/sorting.html"},{"title":"Day 0 - Median, mean, mode and weighted mean","text":"A reminder The median The median is the value separating the higher half from the lower half of a data sample. For a data set, it may be thought of as the middle value. For a continuous probability distribution, the median is the value such that a number is equally likely to fall above or below it. The mean The arithmetic mean (or simply mean) of a sample is the sum of the sampled values divided by the number of items. The mode The mode of a set of data values is the value that appears most often. It is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled. Implementation in python without using the scientific libraries def median ( l ): l = sorted ( l ) if len ( l ) % 2 == 0 : return ( l [ len ( l ) // 2 ] + l [( len ( l ) // 2 - 1 )]) / 2 else : return l [ len ( l ) // 2 ] def mean ( l ): return sum ( l ) / len ( l ) def mode ( data ): dico = { x : data . count ( x ) for x in list ( set ( data ))} return sorted ( sorted ( dico . items ()), key = lambda x : x [ 1 ], reverse = True )[ 0 ][ 0 ] L = [ 64630 , 11735 , 14216 , 99233 , 14470 , 4978 , 73429 , 38120 , 51135 , 67060 , 4978 , 73429 ] print ( f \"Sample : {L} \\n Mean : {mean(L)}, Median : {median(L)}, Mode : {mode(L)}\" ) Sample : [ 64630 , 11735 , 14216 , 99233 , 14470 , 4978 , 73429 , 38120 , 51135 , 67060 , 4978 , 73429 ] Mean : 43117 . 75 , Median : 44627 . 
5 , Mode : 4978 The weighted average The weighted arithmetic mean is similar to an ordinary arithmetic mean (the most common type of average), except that instead of each of the data points contributing equally to the final average, some data points contribute more than others. data = [ 10 , 40 , 30 , 50 , 20 ] weights = [ 1 , 2 , 3 , 4 , 5 ] sum_X = sum ([ x * w for x , w in zip ( data , weights )]) print ( round (( sum_X / sum ( weights )), 1 )) 32 . 0","tags":"Blog","url":"redoules.github.io/blog/Statistics_10days-day0.html","loc":"redoules.github.io/blog/Statistics_10days-day0.html"},{"title":"Create a simple bash function","text":"A basic function The syntax to define a function is : #!/bin/bash # Basic function my_function () { echo Text displayed by my_function } #once defined, you can use it like so : my_function and it should display user@bash : ./my_function.sh Text displayed by my_function Function with arguments When used, the arguments are specified directly after the function name. Within the function they are accessible with the $ symbol followed by the position of the argument. Hence $1 will take the value of the first argument, $2 will take the value of the second argument and so on. #!/bin/bash # Passing arguments to a function say_hello () { echo Hello $1 } say_hello Guillaume and it should display user@bash : ./function_arguements.sh Hello Guillaume Overriding Commands Using the previous example, let's override the echo command in order to make it say hello. To do so, you just need to give the function the same name as the command you want to replace. When you call the original command from inside the function, make sure you use the builtin keyword #!/bin/bash # Overriding a function echo () { builtin echo Hello $1 } echo Guillaume user@bash : ./function_arguements.sh Hello Guillaume Returning values Use the keyword return to send back an exit status (an integer between 0 and 255) to the main program. The returned value will be stored in the $? 
variable #!/bin/bash # Retruning a value secret_number () { return 126 } secret_number echo The secret number is $? This code should return user@bash : ./retrun_value.sh The secret number is 126","tags":"Linux","url":"redoules.github.io/linux/simple_bash_function.html","loc":"redoules.github.io/linux/simple_bash_function.html"},{"title":"Number of edges in a Complete graph","text":"A complete graph contains \\(\\frac{n(n-1)}{2}\\) edges where \\(n\\) is the number of vertices (or nodes).","tags":"Mathematics","url":"redoules.github.io/mathematics/Number_edges_Complete_graph.html","loc":"redoules.github.io/mathematics/Number_edges_Complete_graph.html"},{"title":"Reverse an array","text":"Using lists Python provides an iterator to reverse an array reversed() you can use it this way : arr = list ( range ( 5 )) print ( f 'Initial array : {arr} \\n ' ) reversed_arr = list ( reversed ( arr )) print ( f 'Reversed array : {reversed_arr}' ) Initial array : [ 0 , 1 , 2 , 3 , 4 ] Reversed array : [ 4 , 3 , 2 , 1 , 0 ] Using numpy If you are using numpy you can reverse an array by creating a view on the array: import numpy as np arr = np . 
arange ( 5 ) print ( f 'Initial array : {arr} \\n ' ) reversed_arr = arr [:: - 1 ] print ( f 'Reversed array : {reversed_arr}' ) Initial array : [ 0 1 2 3 4 ] Reversed array : [ 4 3 2 1 0 ]","tags":"Python","url":"redoules.github.io/python/reverse.html","loc":"redoules.github.io/python/reverse.html"},{"title":"Advice for designing your own libraries","text":"Advice for designing your own libraries When designing your own library make sure to think of the following things. I will add new paragraphs to this article as I discover new good practices. Use standard python objects Try to use standard python objects as much as possible. That way, your library becomes compatible with all the other python libraries. For instance, when I created SAMpy : a library for reading and writing SAMCEF results, it returned dictionaries, lists and pandas dataframes. Hence the results extracted from SAMCEF were compatible with the whole scientific stack of python. Limit the number of functionalities Following the same logic as before, the objects should do only one thing but do it well. Indeed, having a simple interface will reduce the complexity of your code and make it easier to use your library. Again, with SAMpy, I decided to strictly limit the functionalities to reading and writing SAMCEF files. Define an exception class for your library You should define your own exceptions in order to make it easier for your users to debug their code thanks to clearer messages that convey more meaning. That way, the user will know if the error comes from your library or something else. Bonus if you group similar exceptions in a hierarchy of inherited Exception classes. Example : let's create an exception related to the age of a person : def check_age ( age ): if age < 0 or age > 130 : raise ValueError If the user enters an invalid age, the ValueError exception is raised. That's fine but imagine you want to provide more feedback to your users who don't know the internals of your library. 
Let's now create a self-explanatory Exception class AgeInvalidError ( ValueError ): pass def check_age ( age ): if age < 0 or age > 130 : raise AgeInvalidError ( age ) You can also add some helpful text to guide your users along the way: class AgeInvalidError ( ValueError ): def __init__ ( self , age ): super () . __init__ ( f \"Age invalid ({age}), must be between 0 and 130\" ) def check_age ( age ): if age < 0 or age > 130 : raise AgeInvalidError ( age ) If you want to group all the logically linked exceptions, you can create a base class and inherit from it : class BaseAgeInvalidError ( ValueError ): pass class TooYoungError ( BaseAgeInvalidError ): pass class TooOldError ( BaseAgeInvalidError ): pass def check_age ( age ): if age < 0 : raise TooYoungError ( age ) elif age > 130 : raise TooOldError ( age ) Structure your repository You should have a file structure in your repository. It will help other contributors, especially future contributors. A nice directory structure for your project should look like this: README.md LICENSE setup.py requirements.txt ./MyPackage ./docs ./tests Some prefer to use reStructuredText for the README, I personally prefer Markdown. choosealicense.com will help you pick the license to use for your project. For package and distribution management, create a setup.py file at the root of the directory. The dependencies required to test, build and generate the doc are listed in a pip requirement file placed at the root of the directory and named requirements.txt. Put the documentation of your library in the docs directory. Put your tests in the tests directory. Since your tests will need to import your library, I recommend modifying the path to resolve your package properly. In order to do so, you can create a context.py file located in the tests directory : import os import sys sys . path . insert ( 0 , os . path . abspath ( os . path . join ( os . path . dirname ( __file__ ), '..' 
))) import MyPackage Then within your individual test files you can import your package like so : from .context import MyPackage Finally, your code will go into the MyPackage directory Test your code Once your library is in production, you have to guarantee some level of backward compatibility. Once your interface is defined, write some tests. In the future, when your code is modified, having those tests will make sure that the behaviour of your functions and objects won't be altered. Document your code Of course, you should have documentation to go along with your library. Make sure to add a lot of common examples as most users tend to learn from examples. I recommend writing your documentation using Sphinx.","tags":"Python","url":"redoules.github.io/python/design_own_libs.html","loc":"redoules.github.io/python/design_own_libs.html"},{"title":"Safely creating a folder if it doesn't exist","text":"Safely creating a folder if it doesn't exist When you are writing to files in python, if the file doesn't exist it will be created. However, if you are trying to write a file in a directory that doesn't exist, an exception will be raised FileNotFoundError : [ Errno 2 ] No such file or directory : \"directory\" This article will teach you how to make sure the target directory exists. If it doesn't, the function will create that directory. First, let's import os and make sure that the \"test_directory\" doesn't exist import os os . path . exists ( \". \\\\ test_directory\" ) False Copy the ensure_dir function into your code. This function takes the path of a file and creates the directory that will contain it if that directory is missing. Credit goes to Parand, who posted it on StackOverflow def ensure_dir ( file_path ): directory = os . path . dirname ( file_path ) if not os . path . exists ( directory ): os . makedirs ( directory ) Let's now use the function to create a folder named \"test_directory\" by passing the path of a file (here a hypothetical myfile.txt) that will live in it ensure_dir ( \". \\\\ test_directory \\\\ myfile.txt\" ) If we test for the existence of the directory, the exists function will now return True os . 
path . exists ( \". \\\\ test_directory\" ) True","tags":"Python","url":"redoules.github.io/python/ensure_dir.html","loc":"redoules.github.io/python/ensure_dir.html"},{"title":"List all files in a directory","text":"Listing all the files in a directory Let's start with the basics, the most straightforward way to list all the files in a directory is to use a combination of the listdir function and isfile from os.path. You can use a list comprehension to store all the results in a list. mypath = \"./test_directory/\" from os import listdir from os.path import isfile , join [ f for f in listdir ( mypath ) if isfile ( join ( mypath , f ))] [ 'logfile.log' , 'myfile.txt' , 'super_music.mp3' , 'textfile.txt' ] Listing all the files of a certain type in a directory Similarly, if you want to keep only a certain kind of file based on its extension you can use the endswith method. In the following example, we will keep all the \"txt\" files contained in the directory [ f for f in listdir ( mypath ) if f . endswith ( '.' + \"txt\" )] [ 'myfile.txt' , 'textfile.txt' ] Listing all the files matching a pattern in a directory The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. You can use the * and ? wildcards, and character ranges expressed with [] import glob glob . glob ( \"*.txt\" ) [ 'myfile.txt' ] Listing files recursively If you want to list all files recursively you can select all the sub-directories using the \"**\" wildcard import glob glob . glob ( mypath + '/**/*.txt' , recursive = True ) [ './test_directory\\\\myfile.txt' , './test_directory\\\\textfile.txt' , './test_directory\\\\subdir1\\\\file_hidden_in_a_sub_direcotry.txt' ] Using a regular expression If you'd rather use a regular-expression-like pattern to select the files, the pathlib library provides the rglob function, which searches recursively. Strictly speaking, rglob takes a glob pattern with character classes such as [tT] rather than a full regular expression. from pathlib import Path list ( Path ( \"./test_directory/\" ) . 
rglob ( \"*.[tT][xX][tT]\" )) [ WindowsPath ( 'test_directory/myfile.txt' ), WindowsPath ( 'test_directory/textfile.txt' ), WindowsPath ( 'test_directory/subdir1/file_hidden_in_a_sub_direcotry.txt' )] Using character classes in the pattern, you can for example select multiple types of files. In the following example, we list all the files whose extension is either \"txt\" or \"log\". Note that this pattern would also match other combinations of those letters, such as \"tog\" or \"lxt\". list ( Path ( \"./test_directory/\" ) . rglob ( \"*.[tl][xo][tg]\" )) [ WindowsPath ( 'test_directory/logfile.log' ), WindowsPath ( 'test_directory/myfile.txt' ), WindowsPath ( 'test_directory/textfile.txt' ), WindowsPath ( 'test_directory/subdir1/file_hidden_in_a_sub_direcotry.txt' )]","tags":"Python","url":"redoules.github.io/python/list_files_directory.html","loc":"redoules.github.io/python/list_files_directory.html"},{"title":"Using Dask on infiniband","text":"InfiniBand (abbreviated IB) is a computer-networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. (source Wikipedia). If you want to leverage this high speed network instead of the regular ethernet network, you have to tell the scheduler that you want to use InfiniBand as your interface. 
Assuming that your InfiniBand interface is ib0 , you would call the scheduler like this : dask-scheduler --interface ib0 --scheduler-file ./cluster.yaml You then have to start the workers using the same interface : dask-worker --interface ib0 --scheduler-file ./cluster.yaml","tags":"Python","url":"redoules.github.io/python/dask_infiniband.html","loc":"redoules.github.io/python/dask_infiniband.html"},{"title":"Clearing the current cell in the notebook","text":"In python, you can clear the output of a cell by importing the IPython.display module and using the clear_output function from IPython.display import clear_output print ( \"text to be cleared\" ) clear_output () As you can see, the text \"text to be cleared\" is not displayed because the function clear_output has been called afterward","tags":"Jupyter","url":"redoules.github.io/jupyter/clear_cell.html","loc":"redoules.github.io/jupyter/clear_cell.html"},{"title":"What's inside my .bashrc ?","text":"############ # Anaconda # ############ export PATH = \"/station/guillaume/anaconda3/bin: $PATH \" alias python = '/station/guillaume/anaconda3/bin/python' ######### # Alias # ######### alias ding = 'echo -e \"\\a\"' alias calc = 'python -ic \"from __future__ import division; from math import *\"' alias h = \"history|grep \" alias f = \"find . 
|grep \" alias p = \"ps aux |grep \" alias cdl = \"cd /data/guillaume\" alias cp = \"rsync -avz --progress\" alias grep = \"grep --color=auto\" alias ls = \"ls -hN --color=auto --group-directories-first\" alias ll = \"ls -hal\" alias sv = \"ssh compute_cluster\" alias ms = \"ls\" alias jl = \"jupyter lab\" alias lst = \"jupyter notebook list\" ########################## # bashrc personnalisation# ########################## force_color_prompt = yes export EDITOR = nano export BROWSER = \"firefox '%s' &\" if [ -n \" $force_color_prompt \" ] ; then if [ -x /usr/bin/tput ] && tput setaf 1 > & /dev/null ; then # We have color support; assume it's compliant with Ecma-48 # (ISO/IEC-6429). (Lack of such support is extremely rare, and such # a case would tend to support setf rather than setaf.) color_prompt = yes else color_prompt = fi fi if [ \" $color_prompt \" = yes ] ; then #\\h : hostname #\\u : user #\\w : current working directory #\\d : date #\\t : time yellow = 226 green = 83 pink = 198 blue = 34 PS1 = \"\\[\\033[38;5;22m\\]\\u\\[ $( tput sgr0 ) \\]\\[\\033[38;5;163m\\]@\\[ $( tput sgr0 ) \\]\\[\\033[38;5;22m\\]\\h\\[ $( tput sgr0 ) \\]\\[\\033[38;5;162m\\]:\\[ $( tput sgr0 ) \\]\\[\\033[38;5;172m\\]{\\[ $( tput sgr0 ) \\]\\[\\033[38;5;39m\\]\\w\\[ $( tput sgr0 ) \\]\\[\\033[38;5;172m\\]}\\[ $( tput sgr0 ) \\]\\[\\033[38;5;162m\\]>\\[ $( tput sgr0 ) \\]\"","tags":"Linux","url":"redoules.github.io/linux/bashrc.html","loc":"redoules.github.io/linux/bashrc.html"},{"title":"Efficient extraction of eigenvalues from a list of tensors","text":"When you manipulate FEM results you generally have either a: * scalar field, * vector field, * tensor field. With tensorial results, it is often useful to extract the eigenvalues in order to find the principal values. 
I have found that it is easier to store the components of the tensors in a 6-column pandas dataframe (because of the symmetric property of stress and strain tensors) import pandas as pd node = [ 1001 , 1002 , 1003 , 1004 ] #when dealing with FEM results you should remember at which element/node the result is computed (in the example, let's assume that we look at node from 1001 to 1004) tensor1 = [ 1 , 1 , 1 , 0 , 0 , 0 ] #eigen : 1 tensor2 = [ 4 , - 1 , 0 , 2 , 2 , 1 ] #eigen : 5.58443, -1.77931, -0.805118 tensor3 = [ 1 , 6 , 5 , 3 , 3 , 1 ] #eigen : 8.85036, 4.46542, -1.31577 tensor4 = [ 1 , 2 , 3 , 0 , 0 , 0 ] #eigen : 1, 2, 3 df = pd . DataFrame ([ tensor1 , tensor2 , tensor3 , tensor4 ], columns = [ \"XX\" , \"YY\" , \"ZZ\" , \"XY\" , \"XZ\" , \"YZ\" ]) df . index = node df XX YY ZZ XY XZ YZ 1001 1 1 1 0 0 0 1002 4 -1 0 2 2 1 1003 1 6 5 3 3 1 1004 1 2 3 0 0 0 If you want to extract the eigenvalues of a tensor with numpy you have to pass an n by n ndarray to the eigvals function. In order to avoid having to loop over each node, this vectorized one-liner computes the eigenvalues of a large number of tensors in a single call. The steps are basically, create a list of n by n values (here n=3) in the right order => reshape it to a list of tensors => pass it to the eigvals function import numpy as np from numpy import linalg as LA eigenvals = LA . eigvals ( df [[ \"XX\" , \"XY\" , \"XZ\" , \"XY\" , \"YY\" , \"YZ\" , \"XZ\" , \"YZ\" , \"ZZ\" ]] . values . reshape ( len ( df ), 3 , 3 )) eigenvals array ([[ 1 . , 1 . , 1 . ], [ 5 . 58442834 , - 0 . 80511809 , - 1 . 77931025 ], [ - 1 . 31577211 , 8 . 85035616 , 4 . 46541595 ], [ 1 . , 2 . , 3 . 
]])","tags":"Python","url":"redoules.github.io/python/Efficient_extraction_of_eigenvalues_from_a_list_of_tensors.html","loc":"redoules.github.io/python/Efficient_extraction_of_eigenvalues_from_a_list_of_tensors.html"},{"title":"Optimized numpy random number generation on Intel CPU","text":"Python Intel distribution Make sure you have a python intel distribution. When you start up python you should see something like : Python 3.6.2 |Intel Corporation| (default, Aug 15 2017, 11:34:02) [MSC v.1900 64 bit (AMD64)] Type 'copyright', 'credits' or 'license' for more information IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. If not, you can force the installation of the intel optimized python with : conda update --all conda config --add channels intel conda install numpy --channel intel --override-channels Oh and by the way, make sure you are running an Intel CPU ;) Comparing numpy.random with numpy.random_intel Let's now compare the rand function with and without the Intel optimization import numpy as np from numpy import random , random_intel % timeit np . random . rand ( 10 ** 5 ) 1.06 ms ± 91.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) % timeit np . random_intel . rand ( 10 ** 5 ) 225 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)","tags":"Python","url":"redoules.github.io/python/Optimized_numpy_random_intel.html","loc":"redoules.github.io/python/Optimized_numpy_random_intel.html"},{"title":"How to check Linux process information?","text":"How to check Linux process information (CPU usage, memory, user information, etc.)? You need to use the ps command combined with the grep command. In the example, we want to check the information on the nginx process : ps aux | grep nginx It would return the output : root 9976 0.0 0.0 12272 108 ? S This domain is established to be used for illustrative examples in documents. 
You may use this \\n domain in examples without prior coordination or asking for permission. mtu 1500 group default qlen 1 link/ether 5c:51:4f:41:7a:b1 inet 169.254.33.33/16 brd 169.254.255.255 scope global dynamic valid_lft forever preferred_lft forever inet6 fe80::390a:f69e:1ba2:2121/64 scope global dynamic valid_lft forever preferred_lft forever 3: eth1: Example Domain
\\n