{"pages":[{"title":"About Guillaume Redoulès","text":"I am a data scientist and a mechanical engineer working on numerical methods for stress computations in the field of rocket propulsion. Prior to that, I've got a MSc in Computational Fluid Dynamics and aerodynamics from Imperial College London. Email: guillaume.redoules@gadz.org Linkedin: Guillaume Redoulès Curriculum Vitae Experience Thermomecanical method and tools engineer , Ariane Group , 2015 - Present In charge of tools and methods related to thermomecanical computations. Focal point for machine learning. Education MSc Advanced Computational Methods for Aeronautics, Flow Management and Fluid-Structure Interaction , Imperial College London, London. 2013 Dissertation: \"Estimator design for fluid flows\" Fields: Aeronautics, aerodynamics, computational fluid dynamics, numerical methods Arts et Métiers Paristech , France, 2011 Generalist engineering degree Fields: Mechanics, electrical engineering, casting, machining, project management, finance, IT, etc.","tags":"pages","url":"redoules.github.io/pages/about.html","loc":"redoules.github.io/pages/about.html"},{"title":"Running multiple calls to a function un parallel with Dask distributed","text":"Running multiple calls to a function un parallel with Dask distributed Dask.distributed is a lightweight library for distributed computing in Python. It allows to create a compute graph. Dask distributed is architectured around 3 parts : the dask-scheduler the dask-worker(s) the dask client Dask architecture The Dask scheduler is a centrally managed, distributed, dynamic task scheduler. It recieves tasks from a/multiple client(s) and spread them across one or multiple dask-worker(s). Dask-scheduler is an event based asynchronous dynamic scheduler, meaning that mutliple clients can submit a list of task to be executed on multiple workers. Internally, the task are represented as a directed acyclic graph. Both new clients and new workers can be connected or disconnected during the execution of the task graph. Tasks can be submited with the function client . submit ( function , * args , ** kwargs ) or by using objects from the dask library such as dask.dataframe, dask.bag or dask.array Setup In this example, we will use a distributed scheduler on a single machine with multiple workers and a single client. We will use the client to submit some tasks to the scheduler. The scheduler will then dispatch those tasks to the workers. The process can be monitored in real time through a web application. For this example, all the computations will be run on a local computer. However dask can scale to a large HPC cluster. First we have to launch the dask-scheduler; from the command line, input dask-scheduler Next, you can load the web dashboard. In order to do so, the scheduler returns the number of the port you have to connect to in the line starting with \"bokeh at :\". The default port is 8787. Since we are running all the programs on the same computer, we just have to login to http://127.0.0.1:8787/status Finally, we have to launch the dask-worker(s). If you want to run the worker(s) on the same computer as the scheduler the type : dask-worker 127 .0.0.1:8786 otherwise, make sure you are inputing the ip address of the computer hosting the dask-scheduler. You can launch as many workers as you want. In this example, we will run 3 workers on the local machine. Use the dask workers within your python code We will now see how to submit multiple calls to a fucntion in parallel on the dask-workers. 
Use the dask workers within your Python code

We will now see how to submit multiple calls to a function in parallel on the dask-workers. Import the required libraries and define the function to be executed:

import numpy as np
import pandas as pd
from distributed import Client

# function used to do parallel computing on
def compute_pi_MonteCarlo(Nb_Data):
    \"\"\"computes the value of pi using the Monte Carlo method\"\"\"
    Radius = 1
    Nb_Data = int(round(Nb_Data))
    # draw Nb_Data points uniformly in the square [-Radius, Radius]^2
    x = np.random.uniform(-Radius, Radius, Nb_Data)
    y = np.random.uniform(-Radius, Radius, Nb_Data)
    # a point lands inside the inscribed circle with probability pi/4,
    # so four times the observed fraction converges to pi
    pi_mc = 4 * np.sum(np.power(x, 2) + np.power(y, 2) < Radius ** 2) / Nb_Data
    err = 100 * np.abs(pi_mc - np.pi) / np.pi
    return [Nb_Data, pi_mc, err]

The estimator works because a point drawn uniformly in the square falls inside the inscribed circle with probability pi * Radius^2 / (2 * Radius)^2 = pi / 4, hence the factor 4 in front of the observed fraction.

In order to connect to the scheduler, we create a client:

client = Client('127.0.0.1:8786')
client

Client
  Scheduler: tcp://127.0.0.1:8786
  Dashboard: http://127.0.0.1:8787/status
Cluster
  Workers: 3
  Cores: 12
  Memory: 25.48 GB

We submit tasks using the submit method:

data = [client.submit(compute_pi_MonteCarlo, Nb_Data) for Nb_Data in np.logspace(3, 7, num=1200, dtype=int)]

If you look at http://127.0.0.1:8787/status you will see the tasks being completed. Once completed, gather the data:

data = client.gather(data)
df = pd.DataFrame(data)
df.columns = [\"number of points for MonteCarlo\", \"value of pi\", \"error (%)\"]
df.tail()

      number of points for MonteCarlo  value of pi  error (%)
1195                          9697405     3.141296   0.009454
1196                          9772184     3.141058   0.017008
1197                          9847540     3.141616   0.000739
1198                          9923477     3.141009   0.018574
1199                         10000000     3.141032   0.017833

There, we have completed a simple example of how to use Dask to run multiple calls to a function in parallel.
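Note that the same fan-out can also be written with client.map, which submits one task per element of an iterable. A minimal sketch, equivalent to the list comprehension above:

# client.map submits compute_pi_MonteCarlo once per element of the iterable
futures = client.map(compute_pi_MonteCarlo, np.logspace(3, 7, num=1200, dtype=int))
results = client.gather(futures)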
Full source code:

import numpy as np
import pandas as pd
from distributed import Client

# function used to do parallel computing on
def compute_pi_MonteCarlo(Nb_Data):
    \"\"\"computes the value of pi using the Monte Carlo method\"\"\"
    Radius = 1
    Nb_Data = int(round(Nb_Data))
    x = np.random.uniform(-Radius, Radius, Nb_Data)
    y = np.random.uniform(-Radius, Radius, Nb_Data)
    pi_mc = 4 * np.sum(np.power(x, 2) + np.power(y, 2) < Radius ** 2) / Nb_Data
    err = 100 * np.abs(pi_mc - np.pi) / np.pi
    return [Nb_Data, pi_mc, err]

# connect to the scheduler
client = Client('127.0.0.1:8786')

# submit tasks
data = [client.submit(compute_pi_MonteCarlo, Nb_Data) for Nb_Data in np.logspace(3, 7, num=1200, dtype=int)]

# gather the results
data = client.gather(data)
df = pd.DataFrame(data)
df.columns = [\"number of points for MonteCarlo\", \"value of pi\", \"error (%)\"]
df.tail()

A word on the environment variables

On Windows, to make sure that you can run dask-scheduler and dask-worker from the command line, you have to add the location of the executables to your PATH. On Linux, you can append their location to the PATH variable with the command:

export PATH=$PATH:/path/to/dask","tags":"Python","url":"redoules.github.io/python/dask_distributed_parallelism.html","loc":"redoules.github.io/python/dask_distributed_parallelism.html"},{"title":"Plotting data using log axis","text":"Plotting in log axis with matplotlib

import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

x = np.linspace(0.1, 20)
y = 20 * np.exp(-x / 10.0)

Plotting using the standard function then specifying the axis scale

One of the easiest ways to get a log plot is to build the plot normally, then declare which axis should use a log scale. This is done with set_xscale or set_yscale:

# Normal plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y)
ax.grid()
plt.show()

# Log x axis plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y)
ax.set_xscale('log')
ax.grid()
plt.show()

# Log y axis plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y)
ax.set_yscale('log')
ax.grid()
plt.show()

# Log x and log y axes plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y)
ax.set_xscale('log')
ax.set_yscale('log')
ax.grid()
plt.show()

Plotting using the matplotlib defined functions

Matplotlib has the functions semilogx, semilogy and loglog that save you from specifying the axis scale by hand:

# Plot using semilogx
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.semilogx(x, y)
ax.grid()
plt.show()

# Plot using semilogy
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.semilogy(x, y)
ax.grid()
plt.show()

# Plot using loglog
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.loglog(x, y)
ax.grid()
plt.show()
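For completeness, the same scales can also be set through the implicit pyplot interface with plt.xscale and plt.yscale, without handling an Axes object explicitly. A minimal sketch reusing the x and y arrays defined above:

# Log x axis through the pyplot state machine
plt.plot(x, y)
plt.xscale('log')
plt.grid(True)
plt.show()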
","tags":"Python","url":"redoules.github.io/python/logplot.html","loc":"redoules.github.io/python/logplot.html"},{"title":"Downloading a static webpage with python","text":"Downloading a static webpage with python

If you are using legacy Python (a.k.a. Python 2), first of all, stop! Furthermore, this method won't work in legacy Python.

# Import modules
from urllib.request import urlopen

The webpage source code can be downloaded with urlopen:

url = \"http://example.com/\"
# create an HTTP request in order to read the page
page = urlopen(url).read()

The source code is stored in the variable page as a bytes object:

print(page)

b'...This domain is established to be used for illustrative examples in documents. You may use this\\n domain in examples without prior coordination or asking for permission....'
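Since read() returns bytes rather than text, you will usually want to decode them before further processing. A minimal sketch, assuming the page is UTF-8 encoded (in general the charset comes from the HTTP headers):

from urllib.request import urlopen

url = \"http://example.com/\"
# read() returns raw bytes; decode them to get a str
html = urlopen(url).read().decode(\"utf-8\")
print(html[:100])  # print the first 100 characters of the source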