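tag:blogger.com,1999:blog-82306928776209382042024-03-05T23:27:59.670-08:00MachineLearningStoriesUnknownnoreply@blogger.comBlogger32125tag:blogger.com,1999:blog-8230692877620938204.post-49722871759610480112021-04-04T21:20:00.003-07:002021-04-04T21:20:37.810-07:00 Python: Audio to text conversion <p> This is a simple program that converts audio into text. I have used the SpeechRecognition library (imported as speech_recognition) to do it.</p><br /><i><span style="color: blue;">import speech_recognition as sr</span></i><br /><i><span style="color: blue;"><br /></span></i><i><span style="color: blue;">r = sr.Recognizer()</span></i><br /><i><span style="color: blue;">with sr.Microphone() as source:</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;print("Speak Anything :")</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;audio = r.listen(source)</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;try:</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# recognize_google sends the audio to the Google API, so it needs internet access</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;text = r.recognize_google(audio)</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print("You said : {}".format(text))</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;except sr.UnknownValueError:</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print("Sorry could not recognize what you said")</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;except sr.RequestError:</span></i><br /><i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print("Could not reach the Google API; check your internet connection")</span></i><br /><br />Output:<br /><span style="font-family: monospace; font-size: 14px; line-height: 16.94px; overflow-wrap: break-word; white-space: pre-wrap; word-break: break-all;">Speak Anything :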
You said : internet is required to access the Google API if there is no internet you will not be able to convert voice into text</span><br /><span style="font-family: monospace; font-size: 14px; line-height: 16.94px; overflow-wrap: break-word; white-space: pre-wrap; word-break: break-all;"><br /></span><span style="font-family: monospace; font-size: 14px; line-height: 16.94px; overflow-wrap: break-word; white-space: pre-wrap; word-break: break-all;"><br /></span>To install SpeechRecognition:<br /><a href="https://pypi.org/project/SpeechRecognition/">https://pypi.org/project/SpeechRecognition/</a><br /><br />One of its dependencies is PyAudio, which at the time of writing is not supported on Python versions after 3.6.<br />For Python versions later than 3.6, one needs to download a wheel (<a href="https://python101.pythonlibrary.org/chapter39_wheels.html">https://python101.pythonlibrary.org/chapter39_wheels.html</a>) and install PyAudio separately.<br /><br /><a href="https://stackoverflow.com/questions/54998028/how-do-i-install-pyaudio-on-python-3-7">https://stackoverflow.com/questions/54998028/how-do-i-install-pyaudio-on-python-3-7</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-12466910173641992182020-06-03T00:50:00.004-07:002020-06-03T02:12:33.018-07:00Exhaustive Literature Study on XAI<font size="6"><b><u>All Frameworks of Explainable AI-</u></b></font><div><br /></div><div><br /><table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; width: 384px;"><colgroup><col style="width: 145pt;" width="193"></col><col style="width: 143pt;" width="191"></col></colgroup><tbody><tr height="33" style="height: 25pt;"><td class="xl65" height="33" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1pt 1pt 1.5pt; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; height: 25pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 145pt;" 
width="193">Framework</td><td class="xl65" style="background: rgb(231, 230, 230); border-bottom: 1.5pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">Algorithms used</td></tr><tr height="71" style="height: 53pt;"><td class="xl66" dir="LTR" height="71" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 53pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">Aix360</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">Protodash, DIP-VAE, CEM, CEM-MAF, TED, BRCG, GLRM</td></tr><tr height="92" style="height: 69pt;"><td class="xl68" dir="LTR" height="92" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 69pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">Alibi</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; 
vertical-align: middle; width: 143pt;" width="191">CEM, counterfactual explanations, anchors, counterfactual explanations (prototype-guided)</td></tr><tr height="32" style="height: 24pt;"><td class="xl70" dir="LTR" height="32" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">Dalex</td><td class="xl71" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191"> </td></tr><tr height="24" style="height: 18pt;"><td class="xl68" dir="LTR" height="24" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">Eli5</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">LIME, Grad-CAM</td></tr><tr height="69" style="height: 52pt;"><td class="xl68" dir="LTR" height="69" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; 
border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 52pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">H2O</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">Shapley values, K-LIME, PDP, LOCO, SDT, disparate impact analysis</td></tr><tr height="47" style="height: 35pt;"><td class="xl70" dir="LTR" height="47" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; height: 35pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">Google Explainable AI</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">Integrated gradients, Shapley values</td></tr><tr height="24" style="height: 18pt;"><td class="xl68" dir="LTR" height="24" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">MS Azure explainability</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); 
border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">SHAP, mimic, HAN</td></tr><tr height="47" style="height: 35pt;"><td class="xl69" dir="LTR" height="47" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 35pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 145pt;" width="193">Captum</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">IG, DeepLift (for PyTorch)</td></tr><tr height="24" style="height: 18pt;"><td class="xl68" dir="LTR" height="24" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 145pt;" width="193">Skater</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">LIME, PDP</td></tr><tr height="47" style="height: 35pt;"><td class="xl69" 
dir="LTR" height="47" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 35pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 145pt;" width="193">Lucid (TensorFlow)</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">Set of tools (NN explainability)</td></tr><tr height="24" style="height: 18pt;"><td class="xl69" dir="LTR" height="24" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; height: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 145pt;" width="193">InterpretML</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: none; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 143pt;" width="191">Surrogate model building</td></tr></tbody></table></div><div><h1 style="text-align: left;"><br /></h1><h1 style="text-align: left;"><u>All Algorithms of Explainable AI-</u></h1><div><br /></div></div><div><table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; width: 589px;"><colgroup><col style="width: 100pt;" width="133"></col><col style="width: 57pt;" width="76"></col><col style="width: 77pt;" 
width="103"></col><col style="width: 69pt;" width="92"></col><col style="width: 139pt;" width="185"></col></colgroup><tbody><tr height="55" style="height: 41pt;"><td class="xl67" dir="LTR" height="55" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; font-weight: 700; height: 41pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">Algorithm</td><td class="xl72" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1pt 1pt 1.5pt; font-family: calibri, sans-serif; font-size: 14pt; font-weight: 700; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">Explainability</td><td class="xl72" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1pt 1pt 1.5pt; font-family: calibri, sans-serif; font-size: 14pt; font-weight: 700; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Type of Data</td><td class="xl72" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1pt 1pt 1.5pt; font-family: calibri, sans-serif; font-size: 14pt; font-weight: 700; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Mechanism </td><td class="xl73" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1pt 1pt 1.5pt; font-family: calibri, sans-serif; font-size: 14pt; font-weight: 700; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185">Links</td></tr><tr height="71" style="height: 53pt;"><td class="xl75" dir="LTR" height="71" style="background: rgb(231, 230, 230); 
border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 12pt; height: 53pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">ACE (Automatic Concept-based Explanations)</td><td class="xl75" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 57pt;" width="76">Global (G)</td><td class="xl65" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Any</td><td class="xl66" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl74" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1902.03129.pdf">https://arxiv.org/pdf/1902.03129.pdf</a></td></tr><tr height="116" style="height: 87pt;"><td class="xl75" dir="LTR" height="116" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, 
sans-serif; font-size: 12pt; height: 87pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">Anchor</td><td class="xl76" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">Local (L)</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Structured*</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Optimization</td><td class="xl77" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="https://homes.cs.washington.edu/~marcotcr/aaai18.pdf">https://homes.cs.washington.edu/~marcotcr/aaai18.pdf </a></td></tr><tr height="32" style="height: 24pt;"><td class="xl78" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">Autoencoder</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: 
middle; width: 57pt;" width="76">L</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Any</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl80" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="32" style="height: 24pt;"><td class="xl81" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">CAM/</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 57pt;" width="76">L</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: none; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Image</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: none; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: 1pt solid white; font-family: 
calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Sensitivity </td><td class="xl82" style="background: rgb(231, 230, 230); border-bottom: none; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="32" style="height: 24pt;"><td class="xl78" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">GradCAM/</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76"> </td><td class="xl70" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: none; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103"> </td><td class="xl70" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: none; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92"> </td><td class="xl83" style="background: rgb(231, 230, 230); border-bottom: none; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; 
color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="32" style="height: 24pt;"><td class="xl78" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">GradCAM++</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76"> </td><td class="xl71" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103"> </td><td class="xl71" dir="LTR" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92"> </td><td class="xl84" style="background: rgb(231, 230, 230); border-bottom: 1pt solid white; border-image: initial; border-left: 1pt solid white; border-right: 1pt solid white; border-top: none; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="115" style="height: 86pt;"><td class="xl81" dir="LTR" height="115" style="background: rgb(231, 230, 230); border: 1pt 
solid white; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; height: 86pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">Permutation Importance</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">G</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Structured</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl77" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="https://scikit-learn.org/dev/modules/permutation_importance.html">https://scikit-learn.org/dev/modules/permutation_importance.html</a></td></tr><tr height="32" style="height: 24pt;"><td class="xl78" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">Decision Trees</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; 
vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Structured</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl80" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="108" style="height: 81pt;"><td class="xl78" dir="LTR" height="108" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 81pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">DeepLift</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Any</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Decomposition</td><td class="xl67" dir="LTR" 
style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1704.02685.pdf"><span style="color: black; font-size: 14pt; text-decoration-line: none;">https://arxiv.org/pdf/1704.02685.pdf </span></a></td></tr><tr height="32" style="height: 24pt;"><td class="xl78" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 100pt;" width="133">GAM/GA2M</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Structured</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl80" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="115" style="height: 86pt;"><td class="xl79" dir="LTR" height="115" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 86pt; padding-left: 1px; 
padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">GEF (Generative Explanation Framework)</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Text</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl77" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1811.00196.pdf">https://arxiv.org/pdf/1811.00196.pdf (no code available)</a></td></tr><tr height="137" style="height: 103pt;"><td class="xl79" dir="LTR" height="137" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 103pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">ICE</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; 
font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Structured</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Visualization</td><td class="xl77" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="http://savvastjortjoglou.com/intrepretable-machine-learning-nfl-combine.html">http://savvastjortjoglou.com/intrepretable-machine-learning-nfl-combine.html</a></td></tr><tr height="69" style="height: 52pt;"><td class="xl79" dir="LTR" height="69" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; height: 52pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">Integrated Gradients</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Any</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 
69pt;" width="92">Invariance</td><td class="xl77" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1703.01365.pdf">https://arxiv.org/pdf/1703.01365.pdf </a></td></tr><tr height="71" style="height: 53pt;"><td class="xl85" dir="LTR" height="71" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 14pt; height: 53pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">LIME</td><td class="xl65" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L</td><td class="xl75" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Any</td><td class="xl75" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92">Optimization</td><td class="xl76" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; 
font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1602.04938.pdf"><span style="color: black; text-decoration-line: none;">https://arxiv.org/pdf/1602.04938.pdf </span></a></td></tr><tr height="69" style="height: 52pt;"><td class="xl86" dir="LTR" height="69" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 52pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">LRP</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Any</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92">Decomposition </td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1604.00825.pdf"><span style="color: black; text-decoration-line: none;">https://arxiv.org/pdf/1604.00825.pdf </span></a></td></tr><tr height="109" style="height: 82pt;"><td class="xl86" dir="LTR" height="109" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; 
height: 82pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">LOCO</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L</td><td class="xl81" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Structured*</td><td class="xl81" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92"> </td><td class="xl65" dir="LTR" style="background: rgb(231, 230, 230); border-color: white; border-image: initial; border-style: solid; border-width: 1.5pt 1pt 1pt; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1604.04173.pdf"><span style="color: black; font-size: 14pt; text-decoration-line: none;">https://arxiv.org/pdf/1604.04173.pdf </span></a></td></tr><tr height="69" style="height: 52pt;"><td class="xl86" dir="LTR" height="69" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 52pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">LSTMVis</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" 
width="76">G</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Text</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92"> </td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1811.00196.pdf"><span style="color: black; text-decoration-line: none;">https://arxiv.org/pdf/1811.00196.pdf </span></a></td></tr><tr height="115" style="height: 86pt;"><td class="xl86" dir="LTR" height="115" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 86pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">MMD-critic</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 57pt;" width="76"> </td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103"> </td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: 
bottom; width: 69pt;" width="92">Prototypes and Criticisms</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://people.csail.mit.edu/beenkim/papers/KIM2016NIPS_MMD.pdf"><span style="color: black; text-decoration-line: none;">https://people.csail.mit.edu/beenkim/papers/KIM2016NIPS_MMD.pdf </span></a></td></tr><tr height="47" style="height: 35pt;"><td class="xl86" dir="LTR" height="47" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 35pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">PDP</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">G</td><td class="xl81" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Structured</td><td class="xl81" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #24292e; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92">Visualization</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"> </td></tr><tr height="32" style="height: 24pt;"><td class="xl86" dir="LTR" 
height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">PCA</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 57pt;" width="76"> </td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Any</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92">Correlation</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"> </td></tr><tr height="69" style="height: 52pt;"><td class="xl86" dir="LTR" height="69" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 52pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">SHAP</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, 
sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Any</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92"> </td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1705.07874.pdf"><span style="color: black; text-decoration-line: none;">https://arxiv.org/pdf/1705.07874.pdf </span></a></td></tr><tr height="69" style="height: 52pt;"><td class="xl86" dir="LTR" height="69" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 52pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">TCAV</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 57pt;" width="76"> </td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 77pt;" width="103">Any</td><td class="xl78" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: bottom; width: 69pt;" width="92">Sensitivity</td><td class="xl79" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid 
white; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1711.11279.pdf"><span style="color: black; text-decoration-line: none;">https://arxiv.org/pdf/1711.11279.pdf </span></a></td></tr><tr height="55" style="height: 41pt;"><td class="xl86" dir="LTR" height="55" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 41pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">treeinterpreter</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L / G</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Structured</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Optimization</td><td class="xl80" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="32" style="height: 24pt;"><td class="xl86" dir="LTR" height="32" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 24pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; 
width: 100pt;" width="133">T-SNE</td><td class="xl68" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 57pt;" width="76"> </td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Any</td><td class="xl67" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 69pt;" width="92">Clustering</td><td class="xl80" style="background: rgb(231, 230, 230); border: 1pt solid white; color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 139pt;" width="185"> </td></tr><tr height="68" style="height: 51pt;"><td class="xl87" dir="LTR" height="68" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; height: 51pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 100pt;" width="133">XRAI</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 57pt;" width="76">L</td><td class="xl69" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; font-family: calibri, sans-serif; font-size: 14pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: middle; width: 77pt;" width="103">Image</td><td class="xl88" style="background: rgb(231, 230, 230); border: 1pt solid white; 
color: windowtext; font-family: arial, sans-serif; font-size: 18pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; vertical-align: top; width: 69pt;" width="92"> </td><td class="xl89" dir="LTR" style="background: rgb(231, 230, 230); border: 1pt solid white; color: #0563c1; font-family: calibri, sans-serif; font-size: 12pt; padding-left: 1px; padding-right: 1px; padding-top: 1px; text-decoration-line: underline; vertical-align: middle; width: 139pt;" width="185"><a href="https://arxiv.org/pdf/1906.02825.pdf">https://arxiv.org/pdf/1906.02825.pdf </a></td></tr></tbody></table><br /></div><div>Here is the link if you want to see how to use some of the above algorithms on the Iris dataset: <a href="http://machinelearningstories.blogspot.com/2020/05/desmontaje-del-conjunto-de-datos-de.html" target="_blank">XAI on iris dataset</a></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-32101122092836587622020-05-27T21:35:00.010-07:002020-05-28T21:41:44.637-07:00Stripping iris dataset with 6 explainability Algorithms./ Desmontaje del conjunto de datos de iris con 6 algoritmos explicables.<div><b style="background-color: white; color: #202122; font-family: sans-serif;"><font size="4"><br /></font></b></div><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4">Explainable AI (XAI) refers to methods and techniques in the application of AI such that the results of the solution can be understood by human experts. It contrasts with the concept of the 'black box' in machine learning, where even a model's designers cannot explain why the AI arrived at a specific decision. 
XAI is an implementation of the <i>social right to explanation.</i><o:p></o:p></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><b><font size="4"><br /></font></b></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font size="4"><span style="font-family: calibri, sans-serif;">Here I use the Iris dataset to build a Random Forest. My focus is on providing data explainability, model </span><font face="Calibri, sans-serif">explainability (also called global explainability),</font><span style="font-family: calibri, sans-serif;"> and prediction explainability (also called local </span><font face="Calibri, sans-serif">explainability)</font><span style="font-family: calibri, sans-serif;">. </span></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;">Here is the Iris dataset: </p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>## loading data set</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>import pandas as pd</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>from sklearn import datasets</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>iris = datasets.load_iris()</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>X = iris.data</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>Y = iris.target</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i>X_df = pd.DataFrame(X, columns=iris.feature_names)</i></font></p><p 
class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiF91VVQGYn36SVxgL5fwtYo0fbKQw8r6kqhHECQN_TiVRxOlVB7d9jKucCLDPVxtlnPKWhpnsp8s20QCUjQvYVdsioA22FhUspwCPpLt9t3aNLXUAYGQ7XHdueneCx7ssy5XMbgkXiEBY/" style="font-family: Times; font-size: medium; margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="720" data-original-width="1012" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiF91VVQGYn36SVxgL5fwtYo0fbKQw8r6kqhHECQN_TiVRxOlVB7d9jKucCLDPVxtlnPKWhpnsp8s20QCUjQvYVdsioA22FhUspwCPpLt9t3aNLXUAYGQ7XHdueneCx7ssy5XMbgkXiEBY/w400-h285/Screenshot+2020-05-27+at+9.13.11+PM.png" width="400" /></a></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><b><font size="4">1)<u> <i>Data explainability</i></u> through IBM AIX 360's ProtoDash (https://arxiv.org/abs/1707.01212)</font></b></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><b><font size="4"><br /></font></b></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4">This algorithm selects prototypes (representative samples) from the original dataset. 
So one can look at only 10 data points that represent the entire dataset (even one with millions of observations).</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4">Here I ask for only 10 data points to represent the entire Iris dataset.</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><b><font size="4"><br /></font></b></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" size="4"><br /></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>import numpy as np</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>from aix360.algorithms.protodash import ProtodashExplainer</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>explainer = ProtodashExplainer()</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i># Select m=10 prototypes from X that summarize X; W holds their weights, S their row indices</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>(W, S, _) = explainer.explain(X, X, m=10)</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i><br /></i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i># Display the prototypes along with their computed weights</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>inc_prototypes = X_df.iloc[S, :].copy()</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i># Compute normalized importance weights for prototypes</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>inc_prototypes["Weights of Prototypes"] = np.around(W/np.sum(W), 2) </i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 
0.0001pt;"></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i>inc_prototypes</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"><i><br /></i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif"></font></p><div class="separator" style="clear: both; text-align: center;"><font color="#3367d6" face="Calibri, sans-serif"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7v-VyQqQNnMwq8owWKB8xl9KfDk35KL2PYEbKNhumMn4jTMV1J_-RL0NiE0rb13D0gwS6mg7Ty8WKVJJKJRgv5lpQv7tQWetZte_qf_e3Loe-v9bVRRl9b0QH-KaU3005ySpFkgqxhpo/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="642" data-original-width="1264" height="204" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7v-VyQqQNnMwq8owWKB8xl9KfDk35KL2PYEbKNhumMn4jTMV1J_-RL0NiE0rb13D0gwS6mg7Ty8WKVJJKJRgv5lpQv7tQWetZte_qf_e3Loe-v9bVRRl9b0QH-KaU3005ySpFkgqxhpo/w400-h204/Screenshot+2020-05-27+at+9.25.45+PM.png" width="400" /></a></font></div><div class="separator" style="clear: both; text-align: center;"><font color="#3367d6" face="Calibri, sans-serif"><br /></font></div><font color="#3367d6" face="Calibri, sans-serif"><i><font size="2"><br /></font></i></font><p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"></p><ul style="text-align: left;"><li><font size="4">The data is represented using 10 observations.</font></li><li><font size="4">These 10 data points come from the input data itself; see the row index.</font></li><li><font size="4">Each prototype's weight represents the share of similar data points in the entire dataset. 
The weights should sum to 1.</font></li></ul><div><font size="4">-----------------------------------------------------------------------------------------------</font></div><p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><b>2) <i><u>Global Explainability</u></i> through SHAP's TreeExplainer (</b></font><span style="background-color: #e9ebf5; font-family: "segoe ui", "segoe ui web", arial, verdana, sans-serif; font-size: 12px;">https://arxiv.org/pdf/1705.07874.pdf)</span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><b><br /></b></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="font-size: 13.5pt;">SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. 
It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.<o:p></o:p></span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="font-size: 13.5pt;">Global explanation refers to explaining overall feature importance in classification. It identifies the most important variables in the model.<o:p></o:p></span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><b style="background-color: white;"><br /></b></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;">import shap</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;">from sklearn.ensemble import RandomForestClassifier</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;"># Assumption: the post does not show the Random Forest being trained, so default hyperparameters are used here</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;">model = RandomForestClassifier().fit(X_df, Y)</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;">explainer = shap.TreeExplainer(model, data=X_df)</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;">shap_values = explainer.shap_values(X_df, check_additivity=False)</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;">shap.summary_plot(shap_values, X_df, plot_type="bar")</i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><i style="background-color: white;"><br /></i></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"></font></p><div class="separator" style="clear: both; text-align: center;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyGIAPXsKNXdOZvLB-0cciDP7Cr1QthvIBJ-o4y-vB8kwoUW2YjB2RR-1VIaU0xSsLoFhwUTjcgIip5Ee8OOf4KdLWprx84N2vPkulS3wpllIxJxjG5cJ50Ro2o3TyRpMFXJHW_9OeOv0/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="448" data-original-width="1206" height="149" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyGIAPXsKNXdOZvLB-0cciDP7Cr1QthvIBJ-o4y-vB8kwoUW2YjB2RR-1VIaU0xSsLoFhwUTjcgIip5Ee8OOf4KdLWprx84N2vPkulS3wpllIxJxjG5cJ50Ro2o3TyRpMFXJHW_9OeOv0/w400-h149/Screenshot+2020-05-27+at+9.39.42+PM.png" width="400" /></a></font></div><div class="separator" style="clear: both; text-align: center;"><br /></div><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><ul style="text-align: left;"><li><span style="font-size: 12pt;">colors in the bar represents relative contribution in classifying three classes. </span></li><li>Over all petal length and width are more important that sepal length and width. 
</li><li>This is the kind of variable importance that many algorithms provide.</li></ul><div><span style="font-size: large;">-----------------------------------------------------------------------------------------------</span></div><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4" style="background-color: white;"><b>3) Global explainability through IBM AIX360 LRR (Logistic Rule Regression)- ( </b></font><a href="http://proceedings.mlr.press/v97/wei19a.html" rel="nofollow" style="background-color: white; box-shadow: none; box-sizing: border-box; color: #0366d6; font-family: -apple-system, system-ui, "Segoe UI", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji"; font-size: 16px; outline: none;">Wei et al., 2019</a>)</p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4" style="background-color: white;"><b><br /></b></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4" style="background-color: white;"></font></p><pre style="background-color: #f7f7f7; border-radius: 2px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; caret-color: rgb(0, 0, 0); font-size: 14px; line-height: 1.21429em; margin-bottom: 9px; margin-top: 0px; overflow-wrap: break-word; overflow: auto; padding: 0.4em; text-size-adjust: auto; white-space: pre-wrap; word-break: break-all;">Logistic Rule Regression is a directly interpretable supervised learning
method that performs logistic regression on rule-based features</pre><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4" style="background-color: white;"><b><br /></b></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"># Generalized Linear Rule Models</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">from aix360.algorithms.rbm import FeatureBinarizer</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">from aix360.algorithms.rbm import LogisticRuleRegression</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">from sklearn.metrics import accuracy_score</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">fb = FeatureBinarizer(negations=True, returnOrd=True)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">dfTrain, dfTrainStd = fb.fit_transform(X_df)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><br /></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">lrr = LogisticRuleRegression(lambda0=0.005, lambda1=0.001, useOrd=True, maxSolverIter=10000)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">lrr.fit(dfTrain, Y, dfTrainStd)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">print('Training accuracy:', accuracy_score(Y, lrr.predict(dfTrain, dfTrainStd)))</font></p><p class="MsoNormal" style="margin: 0cm 0cm 
0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">print('where z is a linear combination of the following rules/numerical features:')</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">lrr.fit(dfTrain, Y, dfTrainStd)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2" style="background-color: white;"></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">lrr.explain()</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><br /></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk-0rmHvvxDwYthYkEjmuSh2mLCI0m7twKXgNtZgr5gKwFry6q9JPVlcPVrLQf8gU-V0ZrSbmmJlZcJXOrcim79xCT-J1JCS5ZuEotKwkalvOp16lkjY9zeI_vAmybem5KGrNzzpKY274/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="670" data-original-width="748" height="359" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk-0rmHvvxDwYthYkEjmuSh2mLCI0m7twKXgNtZgr5gKwFry6q9JPVlcPVrLQf8gU-V0ZrSbmmJlZcJXOrcim79xCT-J1JCS5ZuEotKwkalvOp16lkjY9zeI_vAmybem5KGrNzzpKY274/w400-h359/Screenshot+2020-05-28+at+9.09.47+AM.png" width="400" /></a></div><font color="#3367d6" face="Calibri, sans-serif" size="2"><br /></font><p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><ul style="text-align: left;"><li>LRR builds a directly interpretable model and reports the importance of the rules created from the features.</li><li>This matters because, beyond the overall importance of a variable, the importance can differ across value ranges of that same variable.</li><li>In the above example sepal width 
<=3 is better classifier than sepal width <=3.2.</li></ul><p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3Zz2zCTR7-GqjZs5yOXovugU6Sd5NITmRfzYdmJ9iig2Th0QjYA6GFnhQIbLo2ag0sR_X2LGMDhWiYHmK2t_vetkOqwBbAyXTMOQ82vs6vh6GSq3pVVRHbuLKwU-MQve1cwmbJBBWEDA/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="556" data-original-width="886" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3Zz2zCTR7-GqjZs5yOXovugU6Sd5NITmRfzYdmJ9iig2Th0QjYA6GFnhQIbLo2ag0sR_X2LGMDhWiYHmK2t_vetkOqwBbAyXTMOQ82vs6vh6GSq3pVVRHbuLKwU-MQve1cwmbJBBWEDA/w400-h251/Screenshot+2020-05-28+at+9.17.46+AM.png" width="400" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div class="separator" style="clear: both; text-align: center;"><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><div class="separator" style="clear: both; text-align: center;"><br /></div></blockquote></div></blockquote><div class="separator" style="clear: both; text-align: center;"><span style="font-size: large; text-align: left;">-----------------------------------------------------------------------------------------------</span></div><div><br /></div><div> </div><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><b><font size="4">4) <i><u>Local explanation</u></i> through LIME Tabular- </font></b><span style="background-color: #cfd5ea; font-family: "segoe ui", "segoe ui web", arial, verdana, sans-serif; font-size: 12px;">https://arxiv.org/pdf/1602.04938.pdf</span></p><p 
class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><b><font size="4"><br /></font></b></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font size="4">As we saw with LRR, a variable may be important for classifying most instances but not for all of them (instance = data point).</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font size="4">To see the feature importance for a particular prediction, we need local explainability.</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font size="4">A lot of work has already been done in this area. I am presenting a few approaches-</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><b><font size="4"><br /></font></b></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">import lime.lime_tabular</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">explainer = lime.lime_tabular.LimeTabularExplainer(X,</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">    feature_names=X_df.columns, class_names=iris.target_names, discretize_continuous=True)</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2"><br /></font></p><p 
class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">exp = explainer.explain_instance(X_df.iloc[1,:], model.predict_proba, num_features=10, top_labels=1)</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">exp.show_in_notebook(show_table=True, show_all=False)</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigM5y1ahZsp4POfqHJZTQ9PTIXSnnsPrzEp82xNCZfInQ0WObiOBZkpDS_Kf4hWcYzjKGvw62MI0tY7GBd5ruNG4XOU9FlWSnHV5FZ9od4eROHTafo8rKFjbSsblCCLlXxwB9672mp_1o/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="328" data-original-width="1650" height="80" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigM5y1ahZsp4POfqHJZTQ9PTIXSnnsPrzEp82xNCZfInQ0WObiOBZkpDS_Kf4hWcYzjKGvw62MI0tY7GBd5ruNG4XOU9FlWSnHV5FZ9od4eROHTafo8rKFjbSsblCCLlXxwB9672mp_1o/w400-h80/Screenshot+2020-05-28+at+9.19.43+AM.png" width="400" /></a></div><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"></p><ul style="text-align: left;"><li><font size="4">Predicted probabilities are output class probabilities.</font></li><li><font size="4">Horizontal bar plot shows features contributing to output class 
which is setosa in the above example. The coefficients 0.49 and 0.41 represent the relative importance of the features.</font></li><li><font size="4">The feature values for that instance are also given in the third table (the feature-value table).</font></li><li><font size="4">The explanation is for the data point </font><span style="color: #3367d6; font-family: calibri, sans-serif; font-size: small;">X_df.iloc[1,:]. </span></li></ul><div><span style="font-size: large;">-----------------------------------------------------------------------------------------------</span></div><div><span style="font-size: large;"><br /></span></div><div><br /></div><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><b><br /></b></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><b>5) Local explanation through treeinterpreter- (</b></font><a href="https://pypi.org/project/treeinterpreter/">https://pypi.org/project/treeinterpreter/</a>)</p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="font-size: 13.5pt;">TreeInterpreter decomposes each prediction into a bias term (which is just the training-set mean) and individual feature contributions, so one can see which features contributed to the difference and by how much.</span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="font-size: 13.5pt;">[line 2 in the code below]<o:p></o:p></span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="font-size: 13.5pt;"><br /></span></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">from treeinterpreter import treeinterpreter as ti</font></p><p 
class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">prediction, bias, contributions = ti.predict(model, X_test)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"># converting 3 d to 2 d, 1 instance at a time-</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">contributions= contributions[0]</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">pd_contribution= pd.DataFrame(contributions)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><br /></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">pd_contribution.columns= iris.target_names</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">pd_contribution.index= iris.feature_names</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">pd_contribution['Overall Importance']=abs(pd_contribution['setosa'])+abs(pd_contribution['versicolor'])+abs(pd_contribution['virginica'])</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">pd_contribution.sort_values('Overall Importance', ascending=False, inplace=True)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2">print(pd_contribution)</font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="Calibri, sans-serif" size="2"><br /></font></p><p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"></p><div 
class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkkUnL7eTH0oNggVj90kYIdcjkMNmIUZsaOYn3XiRwNJOBKSP4XnIoKL6WRMB7Pb446oy9nI6vA0-PHfeDQpGlRsakiSZ041PiEm0eKWtx8YM-_c5Nljx1vKQ4J5XPb0tLrxS8dlEtoEg/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="214" data-original-width="1220" height="70" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkkUnL7eTH0oNggVj90kYIdcjkMNmIUZsaOYn3XiRwNJOBKSP4XnIoKL6WRMB7Pb446oy9nI6vA0-PHfeDQpGlRsakiSZ041PiEm0eKWtx8YM-_c5Nljx1vKQ4J5XPb0tLrxS8dlEtoEg/w400-h70/Screenshot+2020-05-28+at+9.27.59+AM.png" width="400" /></a></div><font color="#3367d6" face="Calibri, sans-serif" size="2"><br /></font><p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><ul style="text-align: left;"><li>table shows the importance of features in classifying all the classes for a datapoint(first data point here as I have taken <span style="color: #3367d6; font-family: calibri, sans-serif; font-size: small;">contributions= contributions[0]. )</span></li><li>Overall importance shows how well a feature is doing in classifying all the classes. 
</li></ul><div><span style="font-size: large;">-----------------------------------------------------------------------------------------------</span></div><div><span style="font-size: large;"><br /></span></div><p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="color: #3367d6; font-family: calibri, sans-serif; font-size: small;"><br /></span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><span style="color: #3367d6; font-family: calibri, sans-serif; font-size: small;"><br /></span></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><b><font size="4">6) Local explanation through SHAP Kernel Explainer-</font></b></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><a href="https://github.com/slundberg/shap/blob/master/README.md">https://github.com/slundberg/shap/blob/master/README.md</a></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font size="4">SHAP has 7 different explainability algorithms. Kernel SHAP is one of them. </font><span style="font-family: calibri, sans-serif; font-size: 13.5pt;">It uses a specially-weighted local linear regression to estimate SHAP values for any model. 
So it is model agnostic also.[ works for any blackbox model]</span></p><p class="MsoNormal" style="font-family: "times new roman", serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><o:p></o:p></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">import shap</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">explainer = shap.KernelExplainer(model.predict_proba, X, link="logit")</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">x_test_instance= X[149,:]</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">shap_values = explainer.shap_values(X, nsamples=100) </font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">shap_values[2][149,:]</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2">shap.force_plot(explainer.expected_value[2], shap_values[2][149,:], x_test_instance, iris.feature_names,</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" size="2"> link="logit")</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><font color="#3367d6" face="calibri, sans-serif" 
size="2"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9bsivo96uq_3esLOu5ZcWzDEjIeLDEsi0cFkIxnsOlK7cDayMqtCso5fXYHCxBToI6Fryld4_1XnYgIW39-Mbq6w6-KW4YspDUyIAroP5FgcjIDTJdv5WxqOUgWcY1Mx5R85nDAkdzFM/" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="286" data-original-width="1154" height="99" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9bsivo96uq_3esLOu5ZcWzDEjIeLDEsi0cFkIxnsOlK7cDayMqtCso5fXYHCxBToI6Fryld4_1XnYgIW39-Mbq6w6-KW4YspDUyIAroP5FgcjIDTJdv5WxqOUgWcY1Mx5R85nDAkdzFM/w400-h99/Screenshot+2020-05-28+at+9.47.22+AM.png" width="400" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><p></p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"></p><div class="separator" style="clear: both; text-align: center;"><span style="font-family: calibri, sans-serif; font-size: large; text-align: left;"> </span></div><p></p></blockquote></blockquote><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"></p><ul style="text-align: left;"><li><font size="4">The above explanation shows three features each contributing to push the model output from the base value(.333) (the average model output over the training dataset we passed) towards zero.</font></li><li><font size="4">Features pushing class label higher are shown in red. 
</font></li></ul><div><font size="4"><br /></font></div><div><span style="font-size: large;">-----------------------------------------------------------------------------------------------</span></div><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4">There are many frameworks available for explainability. like- </font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"></font></p><table border="1" bordercolor="#888" cellspacing="0" style="border-collapse: collapse; border-color: rgb(136, 136, 136); border-width: 1px;"><tbody><tr><td style="min-width: 60px;"> <span class="NormalTextRun SCXP267895668 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">Aix360</span></td><td style="min-width: 60px;"> <span class="NormalTextRun SCXP61382789 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">Alibi</span></td><td style="min-width: 60px;"> <span class="SpellingError SCXP242174858 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">Dalex</span></td><td style="min-width: 60px;"> <span class="NormalTextRun SCXP5243947 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">Eli5</span></td></tr><tr><td style="min-width: 60px;"> <span class="NormalTextRun SCXP268203759 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">H20</span></td><td style="min-width: 60px;"> <span class="NormalTextRun SCXP45096740 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">Google explainable AI</span></td><td style="min-width: 60px;"> <span class="NormalTextRun SCXP191904399 BCX7" style="line-height: 0px; 
position: relative; vertical-align: -0.269094px;">Skater</span></td><td style="min-width: 60px;"> <span class="NormalTextRun BCX7 SCXP10018919" style="-webkit-tap-highlight-color: transparent; -webkit-user-drag: none; line-height: 0px; margin: 0px; padding: 0px; position: relative; touch-action: pan-x pan-y; user-select: text; vertical-align: -0.269094px;">Lucid- </span><span class="TextRun BCX7 SCXP10018919" data-contrast="none" data-scheme-color="@000000,," data-usefontface="true" lang="EN-IN" style="-webkit-tap-highlight-color: transparent; -webkit-user-drag: none; background-color: #e9ebf5; font-family: calibri, calibri_msfontservice, calibri_msfontservice, sans-serif; font-size: 16.1px; font-variant-ligatures: none !important; line-height: 19px; margin: 0px; padding: 0px 0px 0.269094px; touch-action: pan-x pan-y; user-select: text; vertical-align: 0.269094px;" xml:lang="EN-IN"><span class="SpellingError BCX7 SCXP10018919" style="-webkit-tap-highlight-color: transparent; -webkit-user-drag: none; background-image: url("data:image/gif;base64,R0lGODlhBQAEAJECAP////8AAAAAAAAAACH5BAEAAAIALAAAAAAFAAQAAAIIlGAXCCHrTCgAOw=="); background-position: 0% 100%; background-repeat: repeat-x; border-bottom: 1px solid transparent; line-height: 0px; margin: 0px; padding: 0px; position: relative; touch-action: pan-x pan-y; user-select: text; vertical-align: -0.269094px;">tensorflow</span></span></td></tr><tr><td> <span class="SpellingError SCXP149434299 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">Captum</span></td><td> <span class="NormalTextRun BCX7 SCXP82421165" style="-webkit-tap-highlight-color: transparent; -webkit-user-drag: none; line-height: 0px; margin: 0px; padding: 0px; position: relative; touch-action: pan-x pan-y; user-select: text; vertical-align: -0.269094px;">MS Azure </span><span class="TextRun BCX7 SCXP82421165" data-contrast="none" data-scheme-color="@000000,," data-usefontface="true" lang="EN-IN" 
style="-webkit-tap-highlight-color: transparent; -webkit-user-drag: none; background-color: #cfd5ea; font-family: calibri, calibri_msfontservice, calibri_msfontservice, sans-serif; font-size: 16.1px; font-variant-ligatures: none !important; line-height: 19px; margin: 0px; padding: 0px 0px 0.269094px; touch-action: pan-x pan-y; user-select: text; vertical-align: 0.269094px;" xml:lang="EN-IN"><span class="SpellingError BCX7 SCXP82421165" style="-webkit-tap-highlight-color: transparent; -webkit-user-drag: none; background-image: url("data:image/gif;base64,R0lGODlhBQAEAJECAP////8AAAAAAAAAACH5BAEAAAIALAAAAAAFAAQAAAIIlGAXCCHrTCgAOw=="); background-position: 0% 100%; background-repeat: repeat-x; border-bottom: 1px solid transparent; line-height: 0px; margin: 0px; padding: 0px; position: relative; touch-action: pan-x pan-y; user-select: text; vertical-align: -0.269094px;">explainability</span></span></td><td> <span class="SpellingError SCXP31745639 BCX7" style="line-height: 0px; position: relative; vertical-align: -0.269094px;">InterpretML</span></td><td> LIME/SHAP</td></tr></tbody></table><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4">do try these and kill the dataset next time.</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4">another article on what to include from all the above crap in any ML model-</font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; margin: 0cm 0cm 0.0001pt;"><font size="4"><br /></font></p><p class="MsoNormal" style="font-family: calibri, sans-serif; 
margin: 0cm 0cm 0.0001pt;"><a href="http://machinelearningstories.blogspot.com/2019/12/explainability-in-data-science-data.html" target="_blank">Explainability in Data Science:- Data, Model & Prediction</a></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"> </p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p><p class="MsoNormal" style="font-family: calibri, sans-serif; font-size: 12pt; margin: 0cm 0cm 0.0001pt;"><br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-24400250148703266012020-04-04T02:46:00.003-07:002020-04-04T02:58:01.365-07:00Forecasting total deaths from Covonavirus<div dir="ltr" style="text-align: left;" trbidi="on">
Almost 60,000 people have died of coronavirus, and we have not even reached the peak of the expected distribution of deaths. The expected distribution is a roughly bell-shaped curve, like the plot of China's coronavirus deaths-<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgktor83f-l5G03yfTrxPIWYntrBo-i7jaXGclfX4rwSyPXIBP3E9c3iz_9qO7Nc0AFir7gj-FBMFETRsp6f52qOgmdT6Yh7-7-g_Twe-YgYv_ZQF3Ivg82V3uys9eW_aVmueaBmb-tB4/s1600/Screenshot+2020-04-04+at+1.33.34+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="926" data-original-width="1482" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgktor83f-l5G03yfTrxPIWYntrBo-i7jaXGclfX4rwSyPXIBP3E9c3iz_9qO7Nc0AFir7gj-FBMFETRsp6f52qOgmdT6Yh7-7-g_Twe-YgYv_ZQF3Ivg82V3uys9eW_aVmueaBmb-tB4/s400/Screenshot+2020-04-04+at+1.33.34+PM.png" width="400" /></a></div>
Here I have taken a machine-learning-based approach to forecasting total deaths from coronavirus.<br />
<br />
Python (3.6) code for the analysis-<br />
<br />
Step 1 - loading the required dataset (Johns Hopkins dataset from GitHub)<br />
<i><span style="color: blue;">def load_data():</span></i><br />
<i><span style="color: blue;"> </span></i><br />
<i><span style="color: blue;">    import pandas as pd</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;">    # raw CSV of cumulative global deaths, one column per date</span></i><br />
<i><span style="color: blue;">    url='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'</span></i><br />
<br />
<i><span style="color: blue;">    corona_data = pd.read_csv(url, sep=',')</span></i><br />
<i><span style="color: blue;">    corona_data.head()</span></i><br />
<i><span style="color: blue;">    return corona_data</span></i><br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3FFSFG8M0DJ2As1l-cuC0QUoXjT6a3f5xELpIxPDS2yzmkiCjazAbvtHihgIewOojaZ1lsDk5YTVJ1D3ib0TyIBPPrv4wzuaXxP97jTL-0q0pQB43hqT9jdbUAMw26zn4qy_rD-Ob4zg/s1600/Screenshot+2020-04-04+at+1.50.49+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="428" data-original-width="1406" height="120" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3FFSFG8M0DJ2As1l-cuC0QUoXjT6a3f5xELpIxPDS2yzmkiCjazAbvtHihgIewOojaZ1lsDk5YTVJ1D3ib0TyIBPPrv4wzuaXxP97jTL-0q0pQB43hqT9jdbUAMw26zn4qy_rD-Ob4zg/s400/Screenshot+2020-04-04+at+1.50.49+PM.png" width="400" /></a></div>
The data is updated daily, so each run loads the latest death counts.<br />
<br />
Step 2: data preprocessing. See the GitHub page:<br />
<br />
<a href="https://github.com/ArpitSisodia/expected-deaths-from-Covid-19/blob/master/Covid%2019%20-%20ExperimentArpit.ipynb" target="_blank">Pre Processing code</a><br />
<br />
<br />
Step 3: forecasting deaths<br />
<br />
Multiplicative ETS model:<br />
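<i><span style="color: blue;">from statsmodels.tsa.holtwinters import ExponentialSmoothing</span></i><br />
<i><span style="color: blue;">fit2_mul = ExponentialSmoothing(plot_dealts1, seasonal_periods=None, trend='mul', seasonal=None).fit(use_boxcox=True)</span></i><br />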
<i><span style="color: blue;"><br /></span></i>
Additive ETS model:<br />
<i><span style="color: blue;">fit2_add = ExponentialSmoothing(plot_dealts1, seasonal_periods=None, trend='add', seasonal=None).fit(use_boxcox=True)</span></i><br />
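To make the difference between the two trend types concrete, here is a minimal pure-Python sketch of Holt's two-parameter smoothing. It is an illustration only, not the statsmodels implementation: the function name, the hand-picked alpha and beta, and the simple initialisation are my assumptions, and it skips the Box-Cox transform and the parameter optimisation that the fit above performs.

```python
def holt_forecast(series, alpha, beta, horizon, trend="add"):
    """Holt's exponential smoothing with an additive or multiplicative trend.

    Illustrative sketch: alpha and beta are supplied by hand, not optimised.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if trend not in ("add", "mul"):
        raise ValueError("trend must be 'add' or 'mul'")

    level = series[0]
    # Initial trend: first difference (additive) or first ratio (multiplicative).
    slope = series[1] - series[0] if trend == "add" else series[1] / series[0]

    for y in series[1:]:
        previous_level = level
        if trend == "add":
            level = alpha * y + (1 - alpha) * (previous_level + slope)
            slope = beta * (level - previous_level) + (1 - beta) * slope
        else:
            level = alpha * y + (1 - alpha) * (previous_level * slope)
            slope = beta * (level / previous_level) + (1 - beta) * slope

    # Additive trend extrapolates a straight line, multiplicative a growth rate.
    if trend == "add":
        return [level + h * slope for h in range(1, horizon + 1)]
    return [level * slope ** h for h in range(1, horizon + 1)]


# Made-up toy series, not the Johns Hopkins data.
additive = holt_forecast([10, 20, 30, 40], alpha=0.8, beta=0.2, horizon=2, trend="add")
multiplicative = holt_forecast([1, 2, 4, 8], alpha=0.8, beta=0.2, horizon=2, trend="mul")
print(additive, multiplicative)
```

The additive version continues a constant daily increase, while the multiplicative version continues a constant growth ratio, which is why the multiplicative forecasts grow much faster as the assumed peak moves further out.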
<i><span style="color: blue;"><br /></span></i>
Step 4: This is not a pure time-series analysis. Because the actual distribution of daily deaths is roughly bell-shaped, we need to assume that the peak will arrive in x days. We then forecast the number of deaths up to the peak and multiply it by 2 to get total deaths, since a symmetric curve has as many deaths after the peak as before it.<br />
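The doubling rule can be sketched in a few lines. This is a hedged illustration: the function name and the forecast numbers are made up for the example and are not taken from the notebook.

```python
def estimate_total_deaths(cumulative_forecast, days_to_peak):
    """Double the forecast cumulative deaths on the assumed peak day.

    cumulative_forecast[i] is the forecast cumulative death count i + 1 days ahead.
    """
    if not 1 <= days_to_peak <= len(cumulative_forecast):
        raise ValueError("days_to_peak must fall inside the forecast horizon")
    deaths_until_peak = cumulative_forecast[days_to_peak - 1]
    # Symmetric bell curve: as many deaths after the peak as before it.
    return 2 * deaths_until_peak


# Hypothetical cumulative forecast for the next 5 days (made-up numbers).
forecast = [65_000, 71_000, 78_000, 86_000, 95_000]
print(estimate_total_deaths(forecast, days_to_peak=5))
```

The estimate is very sensitive to the assumed peak day, which is why the results further down are reported for several values of x.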
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuDfs-vbUfuLSGcMt4iyrPpWYqaZUf13N0_iP-LxPIa62AQ9h6W3MGJ1ztzsklfxornwb-EzKxpTsn820e8sBpze5Z_RcSoVVVBrlwM_PWTVKPFwymvMr9n3QDPK3jEc9tM-qgVVmAecY/s1600/Screenshot+2020-04-04+at+2.57.13+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="524" data-original-width="878" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuDfs-vbUfuLSGcMt4iyrPpWYqaZUf13N0_iP-LxPIa62AQ9h6W3MGJ1ztzsklfxornwb-EzKxpTsn820e8sBpze5Z_RcSoVVVBrlwM_PWTVKPFwymvMr9n3QDPK3jEc9tM-qgVVmAecY/s400/Screenshot+2020-04-04+at+2.57.13+PM.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
The plot above shows the expected number of deaths if coronavirus takes 15 more days to reach its peak (additive ETS model).<br />
<br />
Total deaths with the additive ETS model, assuming 10 days to reach the peak: <span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>162996</b></span><br />
<br />
Total deaths with the additive ETS model, assuming 15 days to reach the peak: <span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>280068</b></span><br />
<br />
Total deaths with the additive ETS model, assuming 20 days to reach the peak: <span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>404939</b></span><br />
<br />
Total deaths with the multiplicative ETS model, assuming 10 days to reach the peak: <span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>185368</b></span><br />
<br />
Total deaths with the multiplicative ETS model, assuming 15 days to reach the peak: <span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>335326</b></span><br />
<br />
Total deaths with the multiplicative ETS model, assuming 20 days to reach the peak: <span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>614004</b></span><br />
<span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b><br /></b></span>
<br />
<div style="text-align: left;">
<span style="font-size: 14px; white-space: pre-wrap;">In an optimistic scenario, where the peak arrives in just 5 days, total deaths would be <b>99408</b> with the additive model and </span><span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><b>103785</b> with the multiplicative model. The total death trend would be:</span></div>
<div style="text-align: left;">
<span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijqp9zDS3bieqP7wG6LbhOAaKi-TX1FItabVKcAVUJAG6HaA7cpaWqCsa1ZZ-EU2UOXRgB-XX2I2SfxLbw6kIg0DnlLH0o9DN291BQyUAdmxREnXJvu3yv58BM2IndbSBxloIf3liECuQ/s1600/Screenshot+2020-04-04+at+2.57.13+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="524" data-original-width="878" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijqp9zDS3bieqP7wG6LbhOAaKi-TX1FItabVKcAVUJAG6HaA7cpaWqCsa1ZZ-EU2UOXRgB-XX2I2SfxLbw6kIg0DnlLH0o9DN291BQyUAdmxREnXJvu3yv58BM2IndbSBxloIf3liECuQ/s400/Screenshot+2020-04-04+at+2.57.13+PM.png" width="400" /></a></div>
<div style="text-align: left;">
<span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><br /></span></div>
<span style="caret-color: rgb(0, 0, 0); font-size: 14px; white-space: pre-wrap;"><br /></span>
<span style="font-size: 14px; white-space: pre-wrap;">The complete code is available at <a href="https://github.com/ArpitSisodia/expected-deaths-from-Covid-19/blob/master/Covid%2019%20-%20ExperimentArpit.ipynb" target="_blank">Python Code Corona Virus Deaths</a></span><br />
<br />
To learn about Explainable AI, see <a href="http://machinelearningstories.blogspot.com/2019/12/explainability-in-data-science-data.html" target="_blank">X_AI</a><br />
<br />
<br /></div>
Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-8230692877620938204.post-76887310402751973032019-12-04T19:47:00.000-08:002019-12-04T19:48:27.069-08:00Explainability in Data Science:- Data, Model & Prediction <div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
XAI (Explainable AI) is in the limelight in machine learning. How can we be sure that an image-classification algorithm is learning faces and not the background? A customer wants to know why a loan was rejected. A globally important variable might not be responsible for, or important to, an individual prediction. This is where XAI comes to the rescue.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
We have taken data from <a href="https://github.com/ArpitSisodia/Classfication_explainability/blob/master/Dummy_KPI_data.csv">classification_data</a></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
This has some sensor values and an output class. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<b>A) Data Explainability</b>- what basic understanding is required from a data perspective? </div>
<div style="text-align: left;">
<br /></div>
<br />
<div class="MsoListParagraphCxSpFirst" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">1)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">Identify missing values, collinear features, feature interactions, zero-importance features, low-importance features, and single-value features; then handle the missing values and remove or handle the features accordingly.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">2)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">Missing values: the data description shows none.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">3)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">No strong correlation between variables, as the correlation plots show.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">4)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">Feature interaction: tree-based models approximate interactions; see </span><a href="https://stats.stackexchange.com/questions/147594/do-cart-trees-capture-interactions-among-predictors"><span style="line-height: 17.12px;">interaction in CART</span></a><span style="line-height: 17.12px;"><o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">5)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">Zero-importance, low-importance, and single-value features: handled through RFE and by the models (RF, XGBoost) themselves.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">6)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">The distribution and sampling of both the class and the features are also examined, since model selection depends on the data distribution. Data with many categorical variables is likely to suit tree-based models better.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: 0.75in; text-indent: -0.25in;">
<span style="line-height: 17.12px;"><span style="mso-list: Ignore;">7)<span style="font-family: "times new roman"; font-stretch: normal; line-height: normal;">    </span></span></span><span style="line-height: 17.12px;">A box plot alone can identify features that are important for classification. Sensors 3, 8, and 6 look important, whereas 5 and 7 may have little predictive power.<span style="font-size: 9pt;"><o:p></o:p></span></span></div>
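The first few checks in the list above can be sketched with pandas. This is a minimal example on a tiny made-up frame; the column names, the values, and the 0.9 correlation threshold are my assumptions, not the actual sensor data or the notebook's settings.

```python
import pandas as pd

# Hypothetical stand-in for the sensor data (values are made up).
df = pd.DataFrame({
    "sensor0": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2],
    "sensor1": [0.2, 0.8, 0.7, 1.6, 1.8, 0.4],     # exactly 2 * sensor0, so collinear
    "sensor2": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],     # single-valued
    "sensor3": [1.0, None, 3.0, 2.0, None, 4.0],   # has missing values
    "class": [0, 0, 0, 1, 1, 0],
})
features = df.drop(columns="class")

# Missing values per feature.
missing_counts = features.isna().sum()

# Features that take only one distinct value carry no information.
single_valued = [c for c in features.columns if features[c].nunique(dropna=True) <= 1]

# Strongly correlated pairs among the remaining features.
corr = features.drop(columns=single_valued).corr().abs()
collinear_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > 0.9
]

print(missing_counts)
print(single_valued)
print(collinear_pairs)
```

On real data the threshold and the handling (drop, impute, or combine) are judgment calls that depend on the model that follows.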
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu8j4fMeuwjnEfgAFRetqw8h1HzVUwgMVF9FRolQCAnDy-1ZSqgOKe5QBCQZ_Wwo5AHu4BmDjyfeViJmAdN7yd9Jy9JZ7A_J8oTgfgYeOToo__0W8-MJ6oP9qrB2k5aQ3AvpijsS1xeD8/s1600/box_plot.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="590" data-original-width="610" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu8j4fMeuwjnEfgAFRetqw8h1HzVUwgMVF9FRolQCAnDy-1ZSqgOKe5QBCQZ_Wwo5AHu4BmDjyfeViJmAdN7yd9Jy9JZ7A_J8oTgfgYeOToo__0W8-MJ6oP9qrB2k5aQ3AvpijsS1xeD8/s400/box_plot.PNG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="separator" style="clear: both; text-align: justify;">
<b style="text-align: left;">B) Other Approaches- Feature selection/engineering-</b></div>
<div class="separator" style="clear: both; text-align: center;">
<b style="text-align: left;"><br /></b></div>
<div class="MsoListParagraphCxSpFirst" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;">1)<span style="font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">    </span></span><span style="line-height: 17.12px;">Univariate feature selection using the chi-square test (</span><a href="http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest"><span style="line-height: 17.12px;">SelectKBest</span></a><span style="line-height: 17.12px;">).<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;">2)<span style="font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">    </span></span><span style="line-height: 17.12px;">Recursive Feature Elimination (</span><a href="http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE"><span style="line-height: 17.12px;">RFE</span></a><span style="line-height: 17.12px;">): selects n features based on the underlying model (used here).<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;">3)<span style="font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">    </span></span><span style="line-height: 17.12px;">PCA (</span><a href="http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html"><span style="line-height: 17.12px;">PCA</span></a><span style="line-height: 17.12px;">): reduces correlated features through a linear transformation (not needed here).<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
4)<span style="font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">    </span><span style="line-height: 17.12px;">Autoencoders: a non-linear transformation of features if needed</span> <span style="line-height: 17.12px;">(overkill here).</span><o:p></o:p></div>
<div class="separator" style="clear: both;">
</div>
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;">5)<span style="font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">    </span></span><span style="line-height: 17.12px;">Feature importance from Random Forest, decision trees (in terms of rules), and other tree-ensemble models such as CatBoost and XGBoost (used in our scenario).<o:p></o:p></span></div>
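Three of the approaches above (SelectKBest with chi-square, RFE, and Random Forest importance) can be tried side by side with scikit-learn. The sketch below runs on synthetic data from make_classification as a stand-in for the sensor data; the sample size, the number of features kept, and the choice of logistic regression inside RFE are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the sensor data (an assumption, not the post's dataset).
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=4, n_redundant=2, random_state=0
)

# chi2 requires non-negative inputs, so rescale every feature to [0, 1] first.
X_scaled = MinMaxScaler().fit_transform(X)

# 1) Univariate selection: keep the 4 features with the highest chi-square score.
kbest = SelectKBest(score_func=chi2, k=4).fit(X_scaled, y)
kbest_features = kbest.get_support(indices=True)

# 2) Recursive Feature Elimination: repeatedly drop the weakest feature.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe_features = rfe.fit(X_scaled, y).get_support(indices=True)

# 5) Impurity-based importance from a Random Forest, most important first.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
forest_ranking = forest.feature_importances_.argsort()[::-1]

print("SelectKBest:", kbest_features)
print("RFE:", rfe_features)
print("Random Forest ranking:", forest_ranking)
```

The three methods will not always agree, which is one reason to compare them before dropping a feature.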
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<b>C) Feature Importance on sensor data ( Global)- </b> <span style="line-height: 17.12px;">In practice I take feature importance from the domain or business experts. In our scenario, sensor 7 (one of the least important features) might be the electric current in a steel mixture plant; to see the impact of current on anomalies or faults, it has to be sampled at a higher rate (micro- or milliseconds), unlike temperature. We would thus miss an important feature because the data collection rate is wrong. Such understanding can only come from domain experts, so business understanding and ML are equally important for feature engineering.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="line-height: 17.12px;"></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">There are white-box models, such as decision trees and Random Forest, that provide feature importance from the model itself. In our case we started with the coefficients of logistic regression (see the comparison of all the algorithms on GitHub: <a href="https://github.com/ArpitSisodia/Classfication_explainability">link</a>). Here we rely on the models with the highest accuracy: RF and XGBoost.<span style="font-size: 9pt;"><o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCC74AfwvDczhYrLw78fopW6WRaSltudY9woTLj6JC9ZWtU3q56-Y735wFudpin5fMPHGd3YSJa9tg4viS2PyNxkHdcLi52DojIFkRZqfNmuaJPgLg3YqIQ_5ElKifOEGBBgtv3STq68Q/s1600/xgb_feature_imp.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="469" data-original-width="857" height="218" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCC74AfwvDczhYrLw78fopW6WRaSltudY9woTLj6JC9ZWtU3q56-Y735wFudpin5fMPHGd3YSJa9tg4viS2PyNxkHdcLi52DojIFkRZqfNmuaJPgLg3YqIQ_5ElKifOEGBBgtv3STq68Q/s400/xgb_feature_imp.PNG" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjq-f7W20sG2HGYm2RnEkuysvzixKVd2GV7yxmvniB4ov2CRu3GVRHyqmNAa2hhPnJmrLl62iO-TEeD5LTea6YruCU2IQdoez3Oe-aOQabaQndiAnQJIGMkyA7o0Eu9ebjTiTtZO4ZzZ7s/s1600/over_all_var_imp.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="354" data-original-width="515" height="273" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjq-f7W20sG2HGYm2RnEkuysvzixKVd2GV7yxmvniB4ov2CRu3GVRHyqmNAa2hhPnJmrLl62iO-TEeD5LTea6YruCU2IQdoez3Oe-aOQabaQndiAnQJIGMkyA7o0Eu9ebjTiTtZO4ZzZ7s/s400/over_all_var_imp.png" width="400" /></a></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">Overall, features 8, 6, 4, 0, 1, and 3 look important for the classification model. Feature 7 appears to have no importance in XGBoost because its classification power is captured by other features. This ordering of feature importance was also visible in the box plot.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">Recursive Feature Elimination is useful for selecting a subset of features, as it identifies the top features to keep for modeling. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<br />
<br />
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">D) Feature Importance on sensor data ( Local)-<o:p></o:p></b></div>
<br />
<br />
<br />
<div class="MsoNormal">
<span style="line-height: 17.12px;">With the advancement of ML and deep learning, global importance alone is not enough. Business users and data scientists are looking for local explanations too. In our analysis, we used the IBM AIX360 framework to get the importance of rules on the features (the importance of a feature based on its values and the output value). The packages and frameworks available include:<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184; width: 624px;"><tbody>
<tr style="height: .2in; mso-yfti-firstrow: yes; mso-yfti-irow: 0;"><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: black; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">AIX360<o:p></o:p></span></div>
</td><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://github.com/IBM/AIX360/blob/master/examples/tutorials/HELOC.ipynb"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://github.com/IBM/AIX360/blob/master/examples/tutorials/HELOC.ipynb</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
<tr style="height: .2in; mso-yfti-irow: 1;"><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: black; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">Skater<o:p></o:p></span></div>
</td><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://github.com/oracle/Skater"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://github.com/oracle/Skater</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
<tr style="height: .2in; mso-yfti-irow: 2;"><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: black; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">ELI5<o:p></o:p></span></div>
</td><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://github.com/TeamHG-Memex/eli5"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://github.com/TeamHG-Memex/eli5</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
<tr style="height: .2in; mso-yfti-irow: 3;"><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: black; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">Alibi<o:p></o:p></span></div>
</td><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://github.com/SeldonIO/alibi"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://github.com/SeldonIO/alibi</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
<tr style="height: .2in; mso-yfti-irow: 4;"><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: black; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">H2O<o:p></o:p></span></div>
</td><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://www.h2o.ai/products-dai-mli/"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://www.h2o.ai/products-dai-mli/</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
<tr style="height: .2in; mso-yfti-irow: 5;"><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: black; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">MS Azure Explainability<o:p></o:p></span></div>
</td><td nowrap="" style="height: 0.2in; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-machine-learning-interpretability"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-machine-learning-interpretability</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="height: 15pt; padding: 0in 5.4pt; width: 84.3pt;" valign="bottom" width="112"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<span style="color: #24292e; font-family: "segoe ui" , sans-serif;">DALEX<o:p></o:p></span></div>
</td><td nowrap="" style="height: 15pt; padding: 0in 5.4pt; width: 383.7pt;" valign="bottom" width="512"><div class="MsoNormal" style="line-height: normal; margin-bottom: 0in;">
<a href="https://github.com/ModelOriented/DrWhy"><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;">https://github.com/ModelOriented/DrWhy</span></a><u><span style="color: #0563c1; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: "Times New Roman"; mso-hansi-font-family: Calibri;"><o:p></o:p></span></u></div>
</td></tr>
</tbody></table>
<br />
<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2aL-AEUJ88vlNnbhGnJ-95ald0Xsw5ZucwDWOq32QtgE2GWV7P57jdYQxGIjHyRGJE6zoEMTybcth-IFD1Z4iJt6ww58v9se8uragfiBX7KztlwwXJ6hiX0PcAEA8MwscVwc0x1XAkvw/s1600/lrr.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="390" data-original-width="398" height="313" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2aL-AEUJ88vlNnbhGnJ-95ald0Xsw5ZucwDWOq32QtgE2GWV7P57jdYQxGIjHyRGJE6zoEMTybcth-IFD1Z4iJt6ww58v9se8uragfiBX7KztlwwXJ6hiX0PcAEA8MwscVwc0x1XAkvw/s320/lrr.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhflKzcXf81MjC3Zmhqc0a34AoWk0PSayEPDsbS-xTAoNtu5eT-ABdRUbn0qlO83bVWldeHhJ1naXYEr2Xrl923XgKvWU7FmFIdaq23yr4b8ly12K05VvziyLMp_iIecjfoAMFYBb8A-1o/s1600/data_desc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="323" data-original-width="1124" height="113" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhflKzcXf81MjC3Zmhqc0a34AoWk0PSayEPDsbS-xTAoNtu5eT-ABdRUbn0qlO83bVWldeHhJ1naXYEr2Xrl923XgKvWU7FmFIdaq23yr4b8ly12K05VvziyLMp_iIecjfoAMFYBb8A-1o/s400/data_desc.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">The image above shows that feature 8 is the most important overall, but for specific predictions a subset of feature 6's values seems more important. Such rules give useful insights: for example, sensor 6 has little importance in its 1st and 4<sup>th</sup> quartiles but very strong importance in quartiles 2 and 3. If we knew the exact feature names, we could get many more valuable insights.<o:p></o:p></span></div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><span style="line-height: 17.12px;">F) SHAP Values explanation-<span style="font-size: 9pt;"><o:p></o:p></span></span></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><span style="line-height: 17.12px;"><br /></span></b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZZS3jolGjM-DIF4o8xFV9mXjs1PlYWhrP9PqTgeYiITSR64tAGMOveECzP9_BFiiQ-UAoec719AX0EyGClDXefOzMn0Uk13QEdSV8jGK-gH_GCfa-gSx542TEaomMeAB6csWnOwjQ-KU/s1600/shap_values.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="510" data-original-width="845" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZZS3jolGjM-DIF4o8xFV9mXjs1PlYWhrP9PqTgeYiITSR64tAGMOveECzP9_BFiiQ-UAoec719AX0EyGClDXefOzMn0Uk13QEdSV8jGK-gH_GCfa-gSx542TEaomMeAB6csWnOwjQ-KU/s400/shap_values.PNG" width="400" /></a></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><span style="line-height: 17.12px;"><br /></span></b></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">In the above plot, 10 data points from class 1 are selected. For these data points, feature 6 is clearly the most important, while the importance of feature 8 changes with the feature values. At the same time, features 1, 2, 3, 5 and 7 contribute almost nothing to the prediction. (One series represents one observation.)</span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9IH0ILVgL4M_ysijKsmUYe0NcnunK1pN-6FdqTDeJ51rIjD9ZFJvtOJUmeTv4H4HBK4_L7E2FG_cHCSRXO64iz2FBvMVVAyWsfcfKIC-72yIQIiMxY9Ij7ZrGDwRCcg3Y6UjWScdzVQQ/s1600/shap_values_explanation2.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="387" data-original-width="864" height="178" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9IH0ILVgL4M_ysijKsmUYe0NcnunK1pN-6FdqTDeJ51rIjD9ZFJvtOJUmeTv4H4HBK4_L7E2FG_cHCSRXO64iz2FBvMVVAyWsfcfKIC-72yIQIiMxY9Ij7ZrGDwRCcg3Y6UjWScdzVQQ/s400/shap_values_explanation2.PNG" width="400" /></a></div>
<div>
<br /></div>
<div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">The above plot has 10 observations from class -1. It shows that for class -1, feature 0 is also important for a few predictions, and that features 7 and 9 are more important than features 8 and 6.</span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;"><br /></span></div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">Such findings matter most in scenarios like multiple-fault prediction and anomaly classification in industrial applications. Once we know the actual names of the signals, this analysis becomes very insightful.</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9Ds0GN2iHrfNSVPL3iQERcu0lCnlEQGvG4iAefuG1iDB5_v-7Bf46Fl_YJ7rNHkqkrHheAqZZcxah7RGFQXGeUqVGFf2FBenbRimy1gzhwbHzI2JUc5dsHT9v40t3CFj4QAj3jwD8ddw/s1600/sig_c_contribution.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="411" data-original-width="729" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9Ds0GN2iHrfNSVPL3iQERcu0lCnlEQGvG4iAefuG1iDB5_v-7Bf46Fl_YJ7rNHkqkrHheAqZZcxah7RGFQXGeUqVGFf2FBenbRimy1gzhwbHzI2JUc5dsHT9v40t3CFj4QAj3jwD8ddw/s400/sig_c_contribution.PNG" width="400" /></a></div>
<div>
<br /></div>
<div>
<div class="MsoNormal">
<span style="line-height: 17.12px;">The above plot shows that signal 6 is useful for most predictions, but there are many instances where it has no influence on the predicted value. Feature 6 also has more classifying power for class 1 than for class -1. A similar analysis can be done on the other features for a more exhaustive understanding of feature importance.</span><br />
<span style="line-height: 17.12px;"><br /></span></div>
</div>
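One way to compare feature importance across classes, as done in the plots above, is to average the absolute SHAP values per feature separately for each class. The sketch below is a minimal illustration in plain Python. The attribution numbers and the function name mean_abs_attribution are made up for the example and are not taken from the linked GitHub code.<br />

```python
from collections import defaultdict

def mean_abs_attribution(attributions, labels):
    """Average |attribution| per feature, separately for each class label.

    attributions: list of rows, one row of per-feature values per observation
    labels: class label of each observation (same length as attributions)
    """
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(attributions, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += abs(value)
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

# Hypothetical attributions for 3 features and 4 observations (illustrative numbers only)
attributions = [
    [0.50, -0.10, 0.00],   # class 1
    [0.30, 0.10, 0.00],    # class 1
    [0.00, -0.40, 0.20],   # class -1
    [0.10, 0.60, -0.20],   # class -1
]
labels = [1, 1, -1, -1]

per_class = mean_abs_attribution(attributions, labels)
for label, importance in per_class.items():
    top_feature = max(range(len(importance)), key=importance.__getitem__)
    print(f"class {label}: most important feature index = {top_feature}")
```

With these made-up numbers the top feature differs between the two classes, which is the kind of per-class difference the plots above point to.<br />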
<div>
The detailed code is available on GitHub: <a href="https://github.com/ArpitSisodia/Classfication_explainability"><b>link to GitHub code</b></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-2460816675769065902019-11-23T21:25:00.000-08:002019-11-23T21:34:03.211-08:00Automation of customer-care tickets resolution using NLP<div dir="ltr" style="text-align: left;" trbidi="on">
When we call customer care, we are often passed from one department to another: the technical department, the billing department, and so on. What if the first executive could suggest some quick fixes, even without being an expert on the problem?<br />
<br />
Keeping this in mind, let's build a simple <b>solution recommendation system</b> for internet service providers, based on the cosine similarity between earlier questions and the present question. The higher the similarity, the more likely it is that the same solution applies. The assumption is that questions with similar titles and content have similar solutions.<br />
<br />
The following steps are required to implement the solution in Python 3-<br />
<br />
1) load nlp libraries.<br />
<br />
2) create dummy data with some questions and answers.<br />
<br />
3) create a function that calculates cosine similarity of new ticket with all existing ticket titles.<br />
<br />
4) show the solution/answer of the ticket that has maximum similarity with present ticket.<br />
<br />
<br />
<br />
<b>Step 1 - import required libraries-</b><br />
<br />
<i><span style="color: blue;">import pandas as pd </span></i><br />
<i><span style="color: blue;">import nltk</span></i><br />
<i><span style="color: blue;">from nltk.corpus import stopwords </span></i><br />
<i><span style="color: blue;">from nltk.tokenize import word_tokenize</span></i><br />
<i style="color: blue;">nltk.download('punkt') </i># tokenizer models used by word_tokenize to split text into word tokens<br />
<i><span style="color: blue;">nltk.download('stopwords')</span> # list of stop words (very common words such as articles and prepositions) </i><br />
<span style="color: #0b5394;"><br /></span>
<i><span style="color: #0b5394;"><br /></span></i>
<i><span style="color: #0b5394;"><br /></span></i>
<b>Step 2 -create a dummy dataset-</b><br />
<i><span style="color: #0b5394;"><br /></span></i>
<i><span style="color: blue;">question_ans_data= pd.DataFrame()</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;">question_ans_data['question']= ['there is no internet','no ineternet since last 2 days','net speed is slow','wrong bill','too much charge']</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<br />
<i><span style="color: blue;">question_ans_data['answer']= ['restart router, check if lights blinking','technician will be sent, check lights, restart router','technician will be sent','will get back to you','will get back to you'] </span></i><br />
<i><span style="color: #0b5394;"><br /></span></i>
have a look at data-<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfWmuz_2zOu2oc7RG1UR6g4KYend2PaTW48_KO02aph7aM2efJs9Ch1yc_sFsC0zRYvvmykopDoUvYfIn-IPB7oF5mLm9QTvDxkrLVG49nUMKXrMGPjvoQLo684MK1GTz-W0mDyc3rAGA/s1600/ticket_data.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="275" data-original-width="573" height="191" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfWmuz_2zOu2oc7RG1UR6g4KYend2PaTW48_KO02aph7aM2efJs9Ch1yc_sFsC0zRYvvmykopDoUvYfIn-IPB7oF5mLm9QTvDxkrLVG49nUMKXrMGPjvoQLo684MK1GTz-W0mDyc3rAGA/s400/ticket_data.PNG" width="400" /></a></div>
<span style="color: blue;"><i><br /></i></span>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<b><span style="color: blue;"> </span>Step 3</b>- <b>create a function (set_con) that pre-processes the text and calculates the cosine similarity between two strings-</b><br />
<br />
<i><span style="color: blue;">def set_con(X, Y):</span></i><br />
<i><span style="color: blue;"> X_list = word_tokenize(X) </span></i><br />
<i><span style="color: blue;"> Y_list = word_tokenize(Y) # convert string into word tokens</span></i><br />
<i><span style="color: blue;"> sw = stopwords.words('english') </span></i><br />
<i><span style="color: blue;"> l1 =[];l2 =[]</span></i><br />
<i><span style="color: blue;"> X_set = {w for w in X_list if not w in sw} # remove stop words</span></i><br />
<i><span style="color: blue;"> Y_set = {w for w in Y_list if not w in sw}</span></i><br />
<i><span style="color: blue;"> rvector = X_set.union(Y_set) </span></i><br />
<i><span style="color: blue;"><br /></span></i>
<i style="color: blue;"> </i># rvector is the union of the keywords of both strings; it is used to build the binary vectors for the cosine similarity (sklearn.metrics.pairwise.cosine_similarity can also be used)<br />
<i><span style="color: blue;"> for w in rvector: </span></i><br />
<i><span style="color: blue;"> if w in X_set: l1.append(1) # create a vector </span></i><br />
<i><span style="color: blue;"> else: l1.append(0) </span></i><br />
<i><span style="color: blue;"> if w in Y_set: l2.append(1) </span></i><br />
<i><span style="color: blue;"> else: l2.append(0) </span></i><br />
<i><span style="color: blue;"> c = 0</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<i style="color: blue;"> </i> # cosine formula<br />
<i><span style="color: blue;"> for i in range(len(rvector)): </span></i><br />
<i><span style="color: blue;"> c+= l1[i]*l2[i] </span></i><br />
<i><span style="color: blue;"> cosine = c / float((sum(l1)*sum(l2))**0.5) if c else 0.0 </span># returns 0 when no keywords are shared, which also avoids division by zero</i><br />
<i><span style="color: blue;"> return(cosine)</span></i><br />
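Because both vectors contain only 0s and 1s, the loop above reduces to a closed form: the dot product is the number of shared keywords, and each norm is the square root of a set size. Below is a minimal sketch of that shortcut. It assumes a small hand-written stop-word set (so that it runs without downloading the NLTK data) and plain whitespace splitting instead of word_tokenize; the name set_cosine is made up for the example.<br />

```python
import math

# Tiny illustrative stop-word set; the post uses nltk's English list instead
STOP_WORDS = {"there", "is", "no", "not", "the", "a", "since"}

def set_cosine(x, y, stop_words=STOP_WORDS):
    """Cosine similarity of two strings treated as binary keyword sets."""
    x_set = {w for w in x.lower().split() if w not in stop_words}
    y_set = {w for w in y.lower().split() if w not in stop_words}
    if not x_set or not y_set:
        return 0.0  # avoid dividing by zero when a string has no keywords
    shared = len(x_set & y_set)            # dot product of the 0/1 vectors
    return shared / math.sqrt(len(x_set) * len(y_set))

score = set_cosine("there is no internet", "broadband internet not working")
print(round(score, 3))  # 1 shared keyword / sqrt(1 * 3)
```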
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<b>Step 4 - create the subject/title of the input ticket as a string-</b><br />
<br />
<i><span style="color: blue;">input_ticket= 'broadband internet not working' </span># input ticket</i><br />
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<b>Step 5 - find the existing ticket title(s) most similar to the input ticket-</b><br />
<br />
<i style="color: blue;">question_ans_data['cosine_similiarity']= [set_con(x ,input_ticket) for x in question_ans_data['question']] </i># calculating cosine similarity with existing tickets<br />
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;">sorted_main_df=question_ans_data.sort_values(by=['cosine_similiarity'], ascending=False)</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<i style="color: blue;">output_dataset= sorted_main_df[sorted_main_df['cosine_similiarity'] == max(sorted_main_df['cosine_similiarity'])] </i># most similar tickets based on similarity of questions<br />
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;">output_dataset</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_j3aiLeDUW0yVppuglvqyLQaYVV5vMD2urkA30y53aZS70rgmDdPKZJo7CNh048_gxopstpiHFD3-G-j-saP9YkO-BTnxsx9qbmVj3Dymnn4U4w8fEPhhuqUpoSfCpySB5uHp1Cth8vA/s1600/op_ticket.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="93" data-original-width="588" height="62" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_j3aiLeDUW0yVppuglvqyLQaYVV5vMD2urkA30y53aZS70rgmDdPKZJo7CNh048_gxopstpiHFD3-G-j-saP9YkO-BTnxsx9qbmVj3Dymnn4U4w8fEPhhuqUpoSfCpySB5uHp1Cth8vA/s400/op_ticket.PNG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
So for the input ticket 'broadband internet not working', the closest existing question is 'there is no internet', and the suggested solution is to restart the router and check whether the lights are blinking. Given a large data set with many tickets and their solutions, this can be a great help for customer care executives. </div>
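The lookup in Step 5 can be wrapped into one function that also declines to answer when nothing is similar enough. The sketch below is one possible arrangement, not the exact code of this post: the names keyword_cosine and suggest_answer, the min_score threshold of 0.3 and the hand-written stop-word set are all assumptions made to keep the example self-contained.<br />

```python
import math

STOP_WORDS = {"there", "is", "no", "not", "the", "a", "too"}  # illustrative only

def keyword_cosine(x, y):
    """Binary keyword cosine similarity, as in Step 3 but with a toy stop-word set."""
    xs = {w for w in x.lower().split() if w not in STOP_WORDS}
    ys = {w for w in y.lower().split() if w not in STOP_WORDS}
    if not xs or not ys:
        return 0.0
    return len(xs & ys) / math.sqrt(len(xs) * len(ys))

def suggest_answer(ticket, history, min_score=0.3):
    """Return (answer, score) of the most similar past question, or (None, score)."""
    best_question, best_answer = max(history, key=lambda qa: keyword_cosine(qa[0], ticket))
    score = keyword_cosine(best_question, ticket)
    if score < min_score:
        return None, score   # nothing close enough: hand over to a human agent
    return best_answer, score

history = [
    ("there is no internet", "restart router, check if lights blinking"),
    ("net speed is slow", "technician will be sent"),
    ("wrong bill", "will get back to you"),
]

print(suggest_answer("broadband internet not working", history))
print(suggest_answer("change my registered address", history))
```

The threshold keeps the system from returning an arbitrary answer when every similarity is zero, which the plain "take the maximum" approach would do.<br />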
<br />
For a product recommendation approach in the retail industry, see-<br />
<br />
<a href="https://machinelearningstories.blogspot.com/2016/11/recommendation-engine-market-basket.html" target="_blank">Product Recommendation using MBA</a><br />
<i><span style="color: blue;"><br /></span></i></div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-70576080442859876372019-09-25T19:54:00.000-07:002019-09-25T19:54:09.788-07:00Sentiment Analysis using NLTK and Sklearn in Python<div dir="ltr" style="text-align: left;" trbidi="on">
Data can be downloaded from -<br />
<br />
<a href="http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz">http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz</a><br />
<br />
<b>Step 1</b> - loading required libraries<br />
<br />
<i><span style="color: blue;">import os</span> # to check working path</i><br />
<i><span style="color: blue;">from sklearn.datasets import load_files</span> # load_files automatically labels classes when input data is present in different folders</i><br />
<i><span style="color: blue;">import re</span> # for regular expressions</i><br />
<i><span style="color: blue;">import nltk</span> # for nlp </i><br />
<i><span style="color: blue;">from nltk.stem import WordNetLemmatizer</span> # WordNet-based lemmatizer (reduces words to their dictionary form)</i><br />
<i><span style="color: blue;">nltk.download('wordnet')</span></i><br />
<i><span style="color: blue;">from sklearn.feature_extraction.text import TfidfVectorizer</span> # get tf-idf values</i><br />
<i><span style="color: blue;">from sklearn.model_selection import train_test_split</span> # to split the data into train and test sets</i><br />
<i><span style="color: blue;">from sklearn.ensemble import RandomForestClassifier</span> # for classification</i><br />
<i><span style="color: blue;">from sklearn.metrics import classification_report, confusion_matrix, accuracy_score</span></i><br />
<span style="color: blue; font-style: italic;">import pickle </span># <i>to save model</i><br />
<i><br /></i>
<i><br /></i>
<i><br /></i>
<b>Step 2 - </b>loading data<br />
<br />
<i><span style="color: blue;">movie_data = load_files("C:\\D\\Learning\\Sentiment Analysis usinf sklearn\\txt_sentoken")</span></i><br />
<i><span style="color: blue;">X,y=movie_data.data, movie_data.target</span></i><br />
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<b>Step 3</b>- data preprocessing and conversion into tf-idf values (each document becomes one row of an array that holds the tf-idf value of every vocabulary word for that document)<br />
<br />
<i><span style="color: blue;">new_X= []</span></i><br />
<i><span style="color: blue;">for data in X:</span></i><br />
<i><span style="color: blue;"> data1= str(data)</span></i><br />
<i style="color: blue;"> data2= re.sub(r'[^\w]', " ", data1) </i># replaces every non-word character (punctuation etc.) with a space<br />
<i style="color: blue;"> data3= re.sub(r'[\s+\W+\s]', " ", data2) </i># character class: replaces any whitespace or non-word character with a space (effectively a no-op after the previous line)<br />
<i style="color: blue;"> data4= re.sub(r'[ ][ ]+', " ", data3) </i># collapses multiple spaces into one<br />
<i style="color: blue;"> data5 = re.sub(r'^b\s+', '', data4) </i># removes the leading b left over from converting bytes to str<br />
<i><span style="color: blue;"> document = re.sub(r'\s+[a-zA-Z]\s+', ' ', data5) </span># removes single-letter words surrounded by spaces</i><br />
<i><span style="color: blue;"> document= document.lower()</span> # lower-case the text before splitting</i><br />
<i><span style="color: blue;"> document_splitted= document.split() </span># lemmatization is applied word by word, so split the string into a list</i><br />
<i><span style="color: blue;"> stemmer = WordNetLemmatizer()</span></i><br />
<i><span style="color: blue;"> stemmed_doc= [stemmer.lemmatize(word) for word in document_splitted]</span></i><br />
<span style="color: blue; font-style: italic;"> stemmed_str= " ".join(stemmed_doc) </span># converting list back to str<br />
<i style="color: blue;"> new_X.append(stemmed_str) </i># creating list of documents<br />
<i><span style="color: blue;">vectorizer = TfidfVectorizer()</span></i><br />
<i><span style="color: blue;">X= vectorizer.fit_transform(new_X)</span></i><br />
<i><span style="color: blue;">X_arr= X.toarray() </span></i> <br />
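To make the tf-idf step less of a black box, here is a small pure-Python sketch of the weighting that TfidfVectorizer applies with its default settings (smoothed idf, raw term counts, L2-normalized rows). It is an illustration only: tokenization is a plain whitespace split, the function name tfidf is made up, and the two sample documents are invented.<br />

```python
import math
from collections import Counter

def tfidf(documents):
    """Return (vocabulary, rows) where rows[i][j] is the tf-idf of term j in doc i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in tokenized for term in set(doc))
    # smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0   # L2 normalization
        rows.append([v / norm for v in raw])
    return vocabulary, rows

vocabulary, rows = tfidf(["good movie", "bad movie"])
for term, weight in zip(vocabulary, rows[0]):
    print(f"{term}: {weight:.3f}")
```

In the first document "good" gets a higher weight than "movie", because "movie" appears in both documents and is therefore less informative.<br />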
<br />
<br />
<br />
<b>Step 4</b>- Splitting into train and test sets and fitting the classifier <br />
<br />
<span style="color: blue;"><i>X_train, X_test, y_train, y_test = train_test_split(X_arr, y, test_size= .2)</i></span><br />
<div>
<div>
<span style="color: blue;"><i>classifier = RandomForestClassifier(n_estimators=1000, random_state=0)</i></span></div>
<div>
<span style="color: blue;"><i>classifier.fit(X_train, y_train)</i></span></div>
</div>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<br />
<b>Step 5-</b> model Evaluation-<br />
<i><span style="color: blue;"><br /></span></i>
# model evaluation on train data<br />
<span style="color: blue;"><i>y_predicted= classifier.predict(X_train)</i></span><br />
<span style="color: blue;"><i>cf= confusion_matrix(y_train, y_predicted)</i></span><br />
<i><span style="color: blue;"></span></i><br />
<span style="color: blue;"><i>print(classification_report(y_train, y_predicted)) </i></span><br />
# model evaluation on test data<br />
<span style="color: blue;"><i>y_test_predicted= classifier.predict(X_test)</i></span><br />
<span style="color: blue;"><i>print(confusion_matrix(y_test, y_test_predicted))</i></span><br />
<span style="color: blue;"><i></i></span><br />
<span style="color: blue;"><i>print(classification_report(y_test, y_test_predicted))</i></span><br />
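classification_report derives its numbers from the confusion matrix. As a worked example, the sketch below computes accuracy, precision, recall and F1 for the positive class from a 2x2 matrix laid out the way sklearn's confusion_matrix returns it (rows are actual classes, columns are predicted classes). The counts are invented for illustration and are not results from the movie-review model, and the function name binary_metrics is made up.<br />

```python
def binary_metrics(matrix):
    """Metrics for class 1 from [[tn, fp], [fn, tp]] (rows = actual, columns = predicted)."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts for 400 test reviews
metrics = binary_metrics([[160, 40], [30, 170]])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A large gap between these numbers on the train data and on the test data is a sign of overfitting, which is why the post evaluates both.<br />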
<br />
<br />
<br />
<b>Step 6</b>- storing and loading model again-<br />
<br />
<i><span style="color: blue;">with open('text_classifier', 'wb') as picklefile: </span></i><br />
<i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;pickle.dump(classifier,picklefile)</span></i><br />
<i><span style="color: blue;">with open('text_classifier', 'rb') as mfile:</span></i><br />
<i><span style="color: blue;">&nbsp;&nbsp;&nbsp;&nbsp;model= pickle.load(mfile)</span></i><br />
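One caveat with this step: the classifier on its own cannot score raw text, because new documents must be transformed with the same fitted vectorizer. A common approach is to pickle both objects together. The sketch below shows the round trip with stand-in objects (a dict and a list) in place of the fitted TfidfVectorizer and RandomForestClassifier; the file name text_pipeline.pkl is hypothetical.<br />

```python
import os
import pickle
import tempfile

# Stand-ins for the fitted objects; in the post these would be `vectorizer` and `classifier`
fitted_vectorizer = {"vocabulary": ["bad", "good", "movie"]}
fitted_classifier = ["tree_1", "tree_2"]

bundle = {"vectorizer": fitted_vectorizer, "classifier": fitted_classifier}

path = os.path.join(tempfile.mkdtemp(), "text_pipeline.pkl")
with open(path, "wb") as picklefile:
    pickle.dump(bundle, picklefile)      # write both objects in one file

with open(path, "rb") as picklefile:
    restored = pickle.load(picklefile)   # only unpickle files you trust

print(restored["vectorizer"]["vocabulary"])
```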
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<i><span style="color: blue;"><br /></span></i>
<b>Step 7-</b> test on new document<br />
<i><span style="color: blue;"><br /></span></i>
<span style="color: blue;"><i>file1 = open("nerw_review.txt","r")</i></span><br />
<span style="color: blue;"><i>data_file= file1.readlines()</i></span><br />
<span style="color: blue;"><i><br /></i></span>
<i><span style="color: blue;">X1= vectorizer.transform(data_file) </span># vectorizer.transform is used to convert new doc into tf-idf</i><br />
<span style="color: blue;"><i>predict_review= classifier.predict(X1)</i></span><br />
<i><span style="color: blue;"></span></i><br />
<span style="color: blue;"><i>predict_review.view()</i></span><br />
<br /></div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-51058842532168712532019-07-09T04:36:00.002-07:002019-07-09T04:52:12.445-07:00Deep Learning with H2O in Python<div dir="ltr" style="text-align: left;" trbidi="on">
H2O.ai is focused on bringing AI to businesses through software. Its flagship
product is H2O, the leading open source platform that makes it easy for
financial services, insurance companies, and healthcare companies to deploy AI
and deep learning to solve complex problems. More than 9,000 organizations and
80,000+ data scientists depend on H2O for critical applications like predictive
maintenance and operational intelligence. The company – which was recently
named to the CB Insights AI 100 – is used by 169 Fortune 500 enterprises,
including 8 of the world’s 10 largest banks, 7 of the 10 largest insurance
companies, and 4 of the top 10 healthcare companies. Notable customers
include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen
Catalina Solutions, Macy’s, Walgreens, and Kaiser Permanente.<br />
<br />
Using in-memory compression, H2O handles billions of data rows in-memory,
even with a small cluster. To make it easier for non-engineers to create complete
analytic workflows, H2O’s platform includes interfaces for R, Python, Scala,
Java, JSON, and CoffeeScript/JavaScript, as well as a built-in web interface,
Flow. H2O is designed to run in standalone mode, on Hadoop, or within a
Spark Cluster, and typically deploys within minutes.<br />
<br />
H2O includes many common machine learning algorithms, such as generalized
linear modeling (linear regression, logistic regression, etc.), Naïve Bayes, principal
components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting,
and deep learning. H2O also includes a Stacked Ensembles method, which finds
the optimal combination of a collection of prediction algorithms using a process
known as "stacking." With H2O, customers can build thousands of models and
compare the results to get the best predictions.<br />
<br />
<b>Here is an example to use H2O-deeplearning in Python- </b><br />
<b><br /></b>
<b><br /></b>
<br />
Step 1- First of all, we need to install the H2O package for Python.<br />
<br />
Run this at the Anaconda prompt:<br />
<span style="color: blue;">pip install h2o</span><br />
<br />
<br />
Step 2- Initialize and start the cluster -<br />
<br />
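<pre style="background-color: #f7f7f7; border-radius: 2px; border: none; box-sizing: border-box; color: #333333; font-size: 14px; line-height: inherit; overflow-wrap: break-word; overflow: auto; padding: 0px; white-space: pre-wrap; word-break: break-all;"><span class="kn" style="box-sizing: border-box; color: green; font-weight: bold;">import</span> <span class="nn" style="box-sizing: border-box; color: blue; font-weight: bold;">h2o</span></pre>
<pre style="background-color: #f7f7f7; border-radius: 2px; border: none; box-sizing: border-box; color: #333333; font-size: 14px; line-height: inherit; overflow-wrap: break-word; overflow: auto; padding: 0px; white-space: pre-wrap; word-break: break-all;"><span class="n" style="box-sizing: border-box;">h2o</span><span class="o" style="box-sizing: border-box; color: #666666;">.</span><span class="n" style="box-sizing: border-box;">init</span><span class="p" style="box-sizing: border-box;">()</span></pre>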
<pre style="background-color: #f7f7f7; border-radius: 2px; border: none; box-sizing: border-box; color: #333333; font-size: 14px; line-height: inherit; overflow-wrap: break-word; overflow: auto; padding: 0px; white-space: pre-wrap; word-break: break-all;"><span class="kn" style="box-sizing: border-box; color: green; font-weight: bold;">from</span> <span class="nn" style="box-sizing: border-box; color: blue; font-weight: bold;">h2o.estimators.deeplearning</span> <span class="k" style="box-sizing: border-box; color: green; font-weight: bold;">import</span> <span class="n" style="box-sizing: border-box;">H2ODeepLearningEstimator</span></pre>
<br />
<br />
Step 3- load the data set (it is split into train and test sets in the next step)-<br />
<br />
<pre style="background-color: #f7f7f7; border-radius: 2px; border: none; box-sizing: border-box; font-size: 14px; line-height: inherit; overflow-wrap: break-word; overflow: auto; padding: 0px; white-space: pre-wrap; word-break: break-all;"><span style="color: blue;"><span class="n" style="box-sizing: border-box;">train</span> <span class="o" style="box-sizing: border-box;">=</span> <span class="n" style="box-sizing: border-box;">h2o</span><span class="o" style="box-sizing: border-box;">.</span><span class="n" style="box-sizing: border-box;">import_file</span><span class="p" style="box-sizing: border-box;">(</span><span class="s2" style="box-sizing: border-box;">"https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv"</span><span class="p" style="box-sizing: border-box;">)</span></span></pre>
<br />
<br />
Step 4- Creating test and train data set using split-<br />
<br />
<pre style="background-color: #f7f7f7; border-radius: 2px; border: none; box-sizing: border-box; color: #333333; font-size: 14px; line-height: inherit; overflow-wrap: break-word; overflow: auto; padding: 0px; white-space: pre-wrap; word-break: break-all;"><span class="n" style="box-sizing: border-box;">splits</span> <span class="o" style="box-sizing: border-box; color: #666666;">=</span> <span class="n" style="box-sizing: border-box;">train</span><span class="o" style="box-sizing: border-box; color: #666666;">.</span><span class="n" style="box-sizing: border-box;">split_frame</span><span class="p" style="box-sizing: border-box;">(</span><span class="n" style="box-sizing: border-box;">ratios</span><span class="o" style="box-sizing: border-box; color: #666666;">=</span><span class="p" style="box-sizing: border-box;">[</span><span class="mf" style="box-sizing: border-box; color: #666666;">0.75</span><span class="p" style="box-sizing: border-box;">],</span> <span class="n" style="box-sizing: border-box;">seed</span><span class="o" style="box-sizing: border-box; color: #666666;">=</span><span class="mi" style="box-sizing: border-box; color: #666666;">1234</span><span class="p" style="box-sizing: border-box;">)</span></pre>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuIhcbB0T1ESH4EZoqU07sL19Pupu0alym7h4v1vgOb23AvMfwtBoWsJZKrnqFC8dfMolbqx1Z4QOIX40HzdmgmM3LMG5z_0ZWgd71q7KP8uQEL-jNSlPlNfWR62hChtc_r60H0MvX4TM/s1600/head_data.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="581" height="308" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuIhcbB0T1ESH4EZoqU07sL19Pupu0alym7h4v1vgOb23AvMfwtBoWsJZKrnqFC8dfMolbqx1Z4QOIX40HzdmgmM3LMG5z_0ZWgd71q7KP8uQEL-jNSlPlNfWR62hChtc_r60H0MvX4TM/s400/head_data.JPG" width="400" /></a></div>
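As far as I understand H2O's documentation, split_frame assigns each row to a split at random according to the given ratios, so the resulting sizes are approximate rather than exact. The sketch below imitates that idea in plain Python. It is not H2O's implementation, and the function name random_split is made up.<br />

```python
import random

def random_split(rows, ratio=0.75, seed=1234):
    """Send each row to the first split with probability `ratio`, else to the second."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < ratio else second).append(row)
    return first, second

rows = list(range(150))                  # iris has 150 rows
train_rows, test_rows = random_split(rows)
# sizes are close to, but not necessarily exactly, a 75/25 split
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, which is what seed=1234 does in the call above.<br />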
<br />
<br />
Step 5- Configuring the model-<br />
<br />
<span style="color: blue;">model = H2ODeepLearningEstimator(distribution = "AUTO",activation = "RectifierWithDropout",hidden = [32,32],input_dropout_ratio = 0.2,l1 = 1e-5,epochs = 10)</span><br />
<span style="color: blue;"><br /></span>
<span style="color: blue;"><br /></span>Step 6- train (fit) the model-<br />
<br />
<span style="color: blue;">model.train(x=["petal_len"], y="sepal_len", training_frame=splits[0])</span> # y is the single column to predict (sepal_len), x lists the predictor columns<br />
<span style="color: blue;"><br /></span>
<span style="color: blue;"><br /></span>Step 7- predicting using trained model and creating a new column in test data-<br />
<span style="color: blue;"><br /></span><span style="color: blue;">splits[1]['predicted_sepal_len'] = model.predict(splits[1])</span><br />
<span style="color: blue;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibmUjoW6kyFdIvK848gL9ASNfMmplfCyMfFv4FA0CxLZFa-oqo_fwGsv-AcbU2nkJ1ji6khQjXoHV7IpA8-bvyh6y8vKxMCFCAthO3npb_l5DQEKcHaEXBoDp0zbV666nAZrKZK3H7eWQ/s1600/head_op.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="473" data-original-width="728" height="258" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibmUjoW6kyFdIvK848gL9ASNfMmplfCyMfFv4FA0CxLZFa-oqo_fwGsv-AcbU2nkJ1ji6khQjXoHV7IpA8-bvyh6y8vKxMCFCAthO3npb_l5DQEKcHaEXBoDp0zbV666nAZrKZK3H7eWQ/s400/head_op.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
One can now compare the sepal_len (actual) and predicted_sepal_len (predicted) values.<br />
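To turn that comparison into a number, one could compute the mean absolute error (MAE) and the root mean squared error (RMSE) between the two columns. Below is a minimal sketch in plain Python, with made-up values standing in for the columns of the H2O frame; the function name regression_errors is an assumption.<br />

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, RMSE) for two equally long sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical sepal_len values and predictions (not taken from the screenshot)
actual = [5.0, 6.0, 7.0, 6.0]
predicted = [5.5, 6.0, 6.0, 6.5]
mae, rmse = regression_errors(actual, predicted)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

RMSE penalizes large misses more heavily than MAE, so a big gap between the two hints at a few badly predicted rows.<br />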
<span style="color: blue;"><br /></span>
<span style="color: blue;"><br /></span></div>
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-8230692877620938204.post-12139670701128836942019-07-04T07:26:00.001-07:002019-07-05T04:25:36.286-07:00How to survive in data science and the first steps<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
A few years ago I read this article, and it made sense during
2012-2017:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century">https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century</a><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWJuMen9FSG0w539ZZ4h3dBG8D3uZxg8-ctK3n5ZWhN3e63QUreE9WslCWsKfdG-w8tTp1nugSjjRVj7DE-tWp6gMi45WFj3JYh4CuuZs_COcVd6yMvGJCOmRBE-GiA-uUTuL7vW1w7M4/s1600/start.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="561" data-original-width="752" height="297" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWJuMen9FSG0w539ZZ4h3dBG8D3uZxg8-ctK3n5ZWhN3e63QUreE9WslCWsKfdG-w8tTp1nugSjjRVj7DE-tWp6gMi45WFj3JYh4CuuZs_COcVd6yMvGJCOmRBE-GiA-uUTuL7vW1w7M4/s400/start.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
Gone are the days when IT organizations looked for a core data science profile, one that covers doing
research and completing a POC. There is a lot of hype around data science, and
in the very near future this profile will become obsolete. People in data science
roles know it. The profile looks attractive to people in other IT roles because training
institutes and start-ups bombard them with material. The current demand is short
term (organizations are still in an exploration phase, working out what to do with their data
and delivering POCs). Most organizations are now looking for an ML engineer
profile, which combines three roles: data engineer, data scientist, and
someone who can deploy to production (in the cloud, most of the time).</div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
The sooner the better: so-called data scientists should move
into data engineering and embrace the cloud. Here I give a short
introduction to getting started with Azure Databricks, so that people like me
can become more employable.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Step-1 Create an Azure trial account and a Databricks workspace, then
launch the workspace-</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<a href="https://docs.azuredatabricks.net/getting-started/try-databricks.html">https://docs.azuredatabricks.net/getting-started/try-databricks.html</a><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Step-2 Databricks quick start-</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://docs.azuredatabricks.net/getting-started/quick-start.html#quick-start">https://docs.azuredatabricks.net/getting-started/quick-start.html#quick-start</a><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
Step-3 Why not try Keras-<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://keras.io/">https://keras.io/</a><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;"> A) </span><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;"> </span><span style="text-indent: -0.25in;">The Sequential model is a container provided by
Keras that holds a linear stack of layers. Layers are added to it according to the neural network architecture-</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-indent: -24px;">
<span style="color: blue;"> <span style="font-family: "arial" , "helvetica" , sans-serif;"><b style="text-align: center;"><span style="font-size: 9pt;">from</span></b><span style="background: white; font-size: 9pt; text-align: center;">
keras.models </span><b style="text-align: center;"><span style="font-size: 9pt;">import</span></b><span style="background: white; font-size: 9pt; text-align: center;"> Sequential</span></span></span></div>
<div class="MsoNormal" style="text-indent: -24px;">
<span style="background-color: white; font-size: 9pt; text-align: center;"><span style="color: blue; font-family: "arial" , "helvetica" , sans-serif;"> model = Sequential()</span></span></div>
<div class="MsoNormal" style="text-indent: -24px;">
<span style="background-color: white; font-family: "consolas"; font-size: 9pt; text-align: center;"><br /></span></div>
<div class="MsoNormal" style="text-indent: -24px;">
<span style="background-color: white; font-family: "consolas"; font-size: 9pt; text-align: center;"> B</span><span style="text-indent: -0.25in;">)</span><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;">
</span><span style="text-indent: -0.25in;">add the layers according to structure of neural network-</span></div>
<div class="MsoNormal" style="text-indent: -24px;">
<span style="text-indent: -0.25in;"><br /></span></div>
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: .25in; margin-right: 0in; margin-top: 0in;">
<span style="color: blue;"><b><span style="font-family: "consolas"; font-size: 9pt;">from</span></b><span style="background: white; font-family: "consolas"; font-size: 9pt;">
keras.layers </span><b><span style="font-family: "consolas"; font-size: 9pt;">import</span></b><span style="background: white; font-family: "consolas"; font-size: 9pt;"> Dense<o:p></o:p></span></span></div>
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: .25in; margin-right: 0in; margin-top: 0in;">
<span style="color: blue;"><span style="background: white; font-family: "consolas"; font-size: 9pt;">model.add(Dense(units=</span><span style="font-family: "consolas"; font-size: 9pt;">4</span><span style="background: white; font-family: "consolas"; font-size: 9pt;">,
activation=</span><span style="font-family: "consolas"; font-size: 9pt;">'relu'</span><span style="background: white; font-family: "consolas"; font-size: 9pt;">, input_dim=</span><span style="font-family: "consolas"; font-size: 9pt;">2</span><span style="background: white; font-family: "consolas"; font-size: 9pt;">))<o:p></o:p></span></span></div>
<div class="MsoNormal" style="text-indent: -24px;">
</div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="color: blue;"><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">model.add(Dense(units=</span><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;">1</span><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">, activation=</span><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;">'linear'</span><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">))</span></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="text-indent: -0.25in;"><span style="font-family: "consolas";"><span style="background-color: white; font-size: 12px;"><br /></span></span></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="text-indent: -0.25in;"><span style="font-family: "consolas";"><span style="background-color: white; font-size: 12px;">C</span></span>)</span><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;">
</span><span style="text-indent: -0.25in;">configure the model by passing arguments-</span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="text-indent: -0.25in;"><br /></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;">model.compile(loss='mean_squared_error',<o:p></o:p></span></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;">
optimizer='sgd',<o:p></o:p></span></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
</div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;">
metrics=['mae', 'mape'])<o:p></o:p></span></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
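<div class="MsoNormal">
The loss and the two metrics passed to model.compile can be written out by hand to see what the training log reports. This is a minimal NumPy sketch, not Keras's own implementation; the helper name regression_metrics and the sample numbers are made up for illustration.</div>

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (mse, mae, mape) for two equal-length arrays (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)           # mean squared error, the loss used above
    mae = np.mean(np.abs(err))        # mean absolute error
    # Mean absolute percentage error; assumes no true value is exactly zero.
    mape = 100.0 * np.mean(np.abs(err / y_true))
    return mse, mae, mape

# Made-up values, only to exercise the helper.
mse, mae, mape = regression_metrics([2.0, 4.0, 5.0], [1.0, 4.0, 7.0])
print(mse, mae, mape)
```

<div class="MsoNormal">
Because MAPE divides by the true value, it can become very large when the targets are close to zero, which is likely for the random targets generated in step D below.</div>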
<div class="MsoNormal" style="margin-left: 0.25in; text-indent: 0px;">
<span style="font-family: "consolas"; font-size: 9pt; line-height: 107%; text-indent: -0.25in;"><span style="background-color: white;">D</span>)<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%; text-indent: -0.25in;"><span style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;">creating
X and Y values</span>-</span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%; text-indent: -0.25in;"><br /></span></div>
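<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">  <span style="color: blue;">import numpy as np<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">  <span style="color: blue;">import pandas as pd<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">  <span style="color: blue;">x1
= np.random.randn(10000, 2) <o:p></o:p></span></span></div>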
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;"> dataframe_X=
pd.DataFrame(x1)<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;"> dataframe_X.columns
=['x1','x2']<o:p></o:p></span></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%; text-indent: -0.25in;"><span style="color: blue;">
</span></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;"> Y1
= np.random.randn(10000, 1)</span><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
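<div class="MsoNormal">
Steps E and F below call model.fit and model.evaluate with x_train, y_train, x_test and y_test, which the steps so far do not define. One way to obtain them is a simple hold-out split of the arrays from step D. The sketch below assumes an 80/20 split and a fixed random seed; neither choice comes from the original notebook.</div>

```python
import numpy as np

# Fixed seed so the split is repeatable (an assumption, not in the original).
rng = np.random.default_rng(0)
x1 = rng.standard_normal((10000, 2))   # two input features, as in step D
Y1 = rng.standard_normal((10000, 1))   # one target column, as in step D

# Hypothetical 80/20 hold-out split: first 80% for training, the rest for testing.
split = int(0.8 * len(x1))
x_train, x_test = x1[:split], x1[split:]
y_train, y_test = Y1[:split], Y1[split:]

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

<div class="MsoNormal">
The same slicing works on the DataFrame with dataframe_X.iloc[:split] and dataframe_X.iloc[split:].</div>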
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"> E</span><span style="text-indent: -0.25in;">)</span><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;">
</span><span style="text-indent: -0.25in;">fitting the model by calling model.fit</span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%; text-indent: -0.25in;"><br /></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%; text-indent: -0.25in;"></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"> <span style="color: blue;"> model.fit(x_train, y_train, epochs=</span></span><span style="color: blue;"><span class="hljs-number"><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="box-sizing: border-box;">5</span></span></span><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">, batch_size=</span><span class="hljs-number"><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="box-sizing: border-box;">32</span></span></span></span><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;">)</span><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7lYvJ7Ymac8vCPj-6RTUbfrtb0H-5DihBJLHh8PPdFGu5HnkEK1ebnpeYxWWdBaMnZSSGFBq2xr0-afsrwRjIfGmTUYfNXDZ0iE_HzG8h3B6noALmwfuaT0Z11QQQQHWAr3JpCI6aH3Y/s1600/kears_fitting.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="259" data-original-width="1456" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7lYvJ7Ymac8vCPj-6RTUbfrtb0H-5DihBJLHh8PPdFGu5HnkEK1ebnpeYxWWdBaMnZSSGFBq2xr0-afsrwRjIfGmTUYfNXDZ0iE_HzG8h3B6noALmwfuaT0Z11QQQQHWAr3JpCI6aH3Y/s640/kears_fitting.JPG" width="640" /></a></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;"> F</span><span style="text-indent: -0.25in;">)</span><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;"> </span><span style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;">model </span><span style="text-indent: -0.25in;">evaluation</span><span style="text-indent: -0.25in;">-</span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<br /></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="color: blue;"> evaluation_metrics=
model.evaluate(x_test, y_test)</span></span><o:p></o:p></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlrYZq9y3vrsnTPJbqeJq6WaLtBDVCT0qqhla6Rp3fX1SnDfide4JSC07gE6HB3DF00S1ukGDRdSgqMW4gsTh8zidKqvc9iZX2E9_I50jgQx7ti5vUePj4FSyKtfw7VgooaERs3tVCvVw/s1600/results.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="135" data-original-width="659" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlrYZq9y3vrsnTPJbqeJq6WaLtBDVCT0qqhla6Rp3fX1SnDfide4JSC07gE6HB3DF00S1ukGDRdSgqMW4gsTh8zidKqvc9iZX2E9_I50jgQx7ti5vUePj4FSyKtfw7VgooaERs3tVCvVw/s400/results.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;"> G)<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><span style="text-indent: -0.25in;">use model for prediction</span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"> <span style="color: blue;"> predicted_value =
model.predict(dataframe_X)</span><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">    <o:p></o:p></span>H) predicting on the test data-</div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"> <span style="color: blue;">predicted_vals = model.predict(x_test,
batch_size=</span></span><span class="hljs-number"><span style="color: blue;"><span style="font-family: "consolas"; font-size: 9pt; line-height: 107%;"><span style="box-sizing: border-box;">32</span></span><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">)</span></span><o:p></o:p></span></div>
<div class="MsoNormal">
<span class="hljs-number"><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></span></div>
<div class="MsoNormal">
<span class="hljs-number"><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></span></div>
<div class="MsoNormal">
Although this code is written in plain Python, we have now run our
first ML program on Databricks. One should start replacing Python commands with PySpark
commands and make it a habit over time. <o:p></o:p></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
In production, this notebook can read run-time data by
scheduling a job (<a href="https://docs.databricks.com/user-guide/jobs.html#create-a-job"><span style="color: windowtext; text-decoration-line: none;">how to
schedule a job in Databricks</span></a>), and from the notebook one can save
the predicted values to any database, from which a visualization
tool or another application can read them.<o:p></o:p></div>
<div class="MsoNormal">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 12.84px;"><br /></span></div>
<div class="MsoNormal">
<span class="hljs-number"><span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;">
</span></span></div>
<div class="MsoNormal">
Data scientists should move beyond a pure research, statistics, or
R/Python profile to stay relevant in the IT industry. Remember the golden words attributed to Charles
Darwin-<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3jgyj2Cay_AlXwXuxGMgz_sPJ6qNTCHHR0NhEJMV2n-jzRYvONhtn0wEfO6TI42U6PtE1Q9tkH7tdn01zj1gkfPcyml0BoDRGc31OSaREQehnTwJUZS76yj0Amm_PdTvEmLAcPiFL9Pg/s1600/darvin.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="487" data-original-width="736" height="263" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3jgyj2Cay_AlXwXuxGMgz_sPJ6qNTCHHR0NhEJMV2n-jzRYvONhtn0wEfO6TI42U6PtE1Q9tkH7tdn01zj1gkfPcyml0BoDRGc31OSaREQehnTwJUZS76yj0Amm_PdTvEmLAcPiFL9Pg/s400/darvin.JPG" width="400" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal" style="margin-left: .25in;">
<span style="background: white; font-family: "consolas"; font-size: 9pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal">
<br /></div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-68808639594792912192018-12-14T02:32:00.000-08:002019-11-29T07:53:36.655-08:00Easiest and most effective way of detection abnormality/ outlier in time-series data <div dir="ltr" style="text-align: left;" trbidi="on">
We have read many blogs on various anomaly detection algorithms. Many times, though, we don't need a complex algorithm to detect abnormality in a system. <br />
<br />
<a href="https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html" target="_blank">Different machine learning approaches to detect abnormality in system</a> .<br />
<br />
Data scientists use everything from multi-angle PCA to auto-encoders to detect abnormality in time-series data. There are other complex techniques such as <a href="https://machinelearningstories.blogspot.com/2018/08/anomaly-detection-in-high-dimensional.html" target="_blank">ABOD</a>, used for high-dimensional data, and <a href="https://machinelearningstories.blogspot.com/2018/09/connectivity-based-outlier-detection.html" target="_blank">CBOF</a>, used when density-based algorithms fail. These techniques are effective only if you know the properties of the abnormality you expect in the system.<br />
<br />
The most effective approach, as mentioned in <a href="https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html" target="_blank">Anomaly detection approaches</a>, is to build an expected rule from the variables involved; any deviation from this rule is an indication of abnormality in the time series. One can use an auto-encoder, PCA or regression to build such rules. We are using regression so that the audience understands the concept and doesn't get bogged down by the related algorithms.<br />
<br />
<br />
We can take any home appliance as an example, say an electric fan. Let's say we know the temperature of the fan's motor and the current going into it.<br />
<br />
<span style="color: blue;"><i>from sklearn import datasets, linear_model</i></span><br />
<span style="color: blue;"><i>import matplotlib.pyplot as plt</i></span><br />
<span style="color: blue;"><i>import pandas as pd</i></span><br />
<span style="color: blue;"><i>import numpy as np</i></span><br />
<br />
# take data values from a normal running scenario, when there is hopefully no issue in the motor. Generally this is the time when the fan has just been installed -<br />
<br />
# creating a dummy data<br />
<span style="color: blue;"><i>data = [[352,88],[350,90],[350,89],[400,95],[400,94], [390,92], [400,93], [352,88],[350,90],[352,91],[400,95],[400,94], [390,92], [400,93],[350,90],[350,89],[400,95]]</i></span><br />
<br />
<i><span style="color: blue;">df = pd.DataFrame(data,columns=['Current','Temp'],dtype=float)</span></i><br />
<br />
# taking the independent variable (current) and the dependent variable (temperature) for the relation (to be built using regression)<br />
<span style="color: blue;"><i>X= df['Current']</i></span><br />
<span style="color: blue;"><i>X1= X.values.reshape(X.size,1)</i></span><br />
<span style="color: blue;"><i>Y= df['Temp']</i></span><br />
<span style="color: blue;"><i>Y1 = Y.values.reshape(Y.size,1)</i></span><br />
<br />
# fitting the regression model<br />
<i><span style="color: blue;">regr = linear_model.LinearRegression()</span></i><br />
<i><span style="color: blue;">regr.fit(X1, Y1)</span></i><br />
<i><span style="color: blue;">predictions =regr.predict(X1)</span></i><br />
<br />
# plotting error and analyzing it<br />
<span style="color: blue;"><i>error =Y1- predictions</i></span><br />
<span style="color: blue;"><i>plt.plot(error)</i></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjI9tYkWMrkxCS7bDOjZlwmqVRC8H47bvkRDYZVcGp-m6p7z1yXpWxJktUALoobUcnw0zKaKUSnP58lJeTQBTcS1UcEuIZt0ITjFGol-AFomlNlRL92yPutAolZ8kt5Clq_RQSjQnSnsYA/s1600/train+Error.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="326" data-original-width="485" height="215" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjI9tYkWMrkxCS7bDOjZlwmqVRC8H47bvkRDYZVcGp-m6p7z1yXpWxJktUALoobUcnw0zKaKUSnP58lJeTQBTcS1UcEuIZt0ITjFGol-AFomlNlRL92yPutAolZ8kt5Clq_RQSjQnSnsYA/s320/train+Error.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The plot shows that the errors are scattered randomly around y=0 and stay within about ±1.5, so the fit seems good. Thus we get a relation between the current and the temperature of the motor: if we know the actual current, we can predict the temperature with some accuracy. Now the concept is: if the actual temperature is far higher than what it should be (as predicted from the current), then there might be some thermal abnormality in the motor. Let's extend our example further-<br />
<br />
<br />
Now taking run-time data (run-time values of current and temperature) from the fan:<br />
<br />
<span style="color: blue;"><i>test_X= np.array([400,380,370,355, 370,370,350, 360, 355,352,350,350,400,400,390,400,400,380,400,380,390,400,350,350])</i></span><br />
<span style="color: blue;"><i>test_Y= np.array([96,94,93, 92, 93,98,97, 98,97,88,90,89,95,94,92,94,96,94,96,93,92,94,90,90])</i></span><br />
<br />
# predicting temperature for the present values of current ( at run time)<br />
<span style="color: blue;"><i>test_X1= test_X.reshape(test_X.size,1)</i></span><br />
<span style="color: blue;"><i>test_Y1 = test_Y.reshape(test_Y.size,1)</i></span><br />
<span style="color: blue;"><i>run_time_predictions =regr.predict(test_X1)</i></span><br />
<br />
# plotting the errors<br />
<span style="color: blue;"><i>plt.plot(test_Y1- run_time_predictions)</i></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgajOT9eL6CGNRh0do7OXWuhyXIkFLbFOwdkSSo09p4Xs7TH43FyO5ZjV1SgjMKTihgh0Efbna8vPxGvKJWTlhymPaTezZG9ya57xSi8_hKp2cU8V-JUsuOFz5j0qjvB6sqHc7kJCuIIT8/s1600/test_error.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="310" data-original-width="458" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgajOT9eL6CGNRh0do7OXWuhyXIkFLbFOwdkSSo09p4Xs7TH43FyO5ZjV1SgjMKTihgh0Efbna8vPxGvKJWTlhymPaTezZG9ya57xSi8_hKp2cU8V-JUsuOFz5j0qjvB6sqHc7kJCuIIT8/s320/test_error.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The error seems high for a few minutes (between 5 and 9). Let's combine the train and test error values-<br />
<br />
# combining train and test errors to include longer period of time in analysis<br />
<span style="color: blue;"><i>X_values= np.concatenate((X1, test_X1), axis=0)</i></span><br />
<span style="color: blue;"><i>Y_values= np.concatenate((Y1, test_Y1))</i></span><br />
<span style="color: blue;"><i>predictions_values= np.concatenate((predictions,run_time_predictions))</i></span><br />
<span style="color: blue;"><i>Error_values= Y_values- predictions_values</i></span><br />
<span style="color: blue;"><i>plt.plot(Error_values)</i></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiItrAD0B8KWH0NsOYOzEP1Kd94DEqrcTgLBJZitaWXS7mf_NDA3u6y1tNnuoVdJjY_ESBynVSWvfF1tXoXTcNSjf__aOr3bLmvnNQVOEta1VFeOtEw13qTcUURvi8K_nCu2ZEo9W0vf8Q/s1600/test_train.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="321" data-original-width="461" height="222" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiItrAD0B8KWH0NsOYOzEP1Kd94DEqrcTgLBJZitaWXS7mf_NDA3u6y1tNnuoVdJjY_ESBynVSWvfF1tXoXTcNSjf__aOr3bLmvnNQVOEta1VFeOtEw13qTcUURvi8K_nCu2ZEo9W0vf8Q/s320/test_train.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The error values around x=22 to x=25 show that the temperature is far higher than expected for that amount of current flow. This has to be investigated further (the cooling might not be working, sparking might be occurring, etc.). After that, the motor is running fine again, as the error is randomly distributed around y=0.<br />
<br />
Thus at run time a high positive error (i.e. actual more than expected) is an indicator of an abnormal system. I don't know how the plot came out looking like somebody showing the middle finger, but it is exactly the middle finger that is abnormal here. haha!!<br />
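<br />
The "high positive error" rule can be turned into an automatic check. The sketch below is one possible version, not the code from the linked notebook: it uses np.polyfit in place of sklearn's LinearRegression for the same straight-line fit, and it assumes a cut-off of three standard deviations of the training error, a value that should be tuned for a real motor.<br />

```python
import numpy as np

# Training pairs (current, temperature) copied from the dummy data above.
train = np.array([[352, 88], [350, 90], [350, 89], [400, 95], [400, 94],
                  [390, 92], [400, 93], [352, 88], [350, 90], [352, 91],
                  [400, 95], [400, 94], [390, 92], [400, 93], [350, 90],
                  [350, 89], [400, 95]], dtype=float)

# Run-time readings copied from test_X and test_Y above.
test_X = np.array([400, 380, 370, 355, 370, 370, 350, 360, 355, 352, 350, 350,
                   400, 400, 390, 400, 400, 380, 400, 380, 390, 400, 350, 350],
                  dtype=float)
test_Y = np.array([96, 94, 93, 92, 93, 98, 97, 98, 97, 88, 90, 89,
                   95, 94, 92, 94, 96, 94, 96, 93, 92, 94, 90, 90], dtype=float)

# Straight-line fit of temperature on current (stand-in for LinearRegression).
slope, intercept = np.polyfit(train[:, 0], train[:, 1], 1)

train_error = train[:, 1] - (slope * train[:, 0] + intercept)
run_time_error = test_Y - (slope * test_X + intercept)

# Assumed rule: a reading is abnormal when the actual temperature exceeds the
# prediction by more than 3 standard deviations of the training error.
threshold = 3 * train_error.std()
abnormal_idx = np.flatnonzero(run_time_error > threshold)
print(abnormal_idx)
```

Only positive errors are flagged, since a motor running cooler than predicted is not the failure of interest here. With the numbers above, the flagged positions are 5 to 8 of the run-time data, the same region that stands out in the error plot.<br />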
<br />
The code is available on GitHub at - <a href="https://github.com/ArpitSisodia/Anomaly_Detection/blob/master/Anomaly%20Detection%20through%20Linear%20Regression.ipynb">Python_Regression_Anomaly_Detection</a><br />
<br />
Read about the mother of all time series algorithms here- <a href="https://machinelearningstories.blogspot.com/2018/08/forecasthybrid-daddy-of-all-time-series.html" target="_blank">ForecastHybrid</a><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br /></div>
Unknownnoreply@blogger.com3Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-86481306993983104162018-10-06T12:47:00.000-07:002018-10-06T12:47:08.640-07:00Religious demographics of India in future: A Machine Learning View<div dir="ltr" style="text-align: left;" trbidi="on">
According to the Sachar Committee report (ref-1) in 2005, the projected religious demographics of India for the next 100 years are shown below-<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBHkj7fNL5H0y5Y1qGwpl9cNrH0ZH2OahOWKm0qNUSXeJg7VHaZlpx1uLnK1ia-sd3n6GkN2WxYNaenXOTJt5MUBjfEb63Pg5YtGhRSn3kSSZl3fUToexQ4zGyIC6wNQ4lJMo4YgOLOFA/s1600/Sachar_report.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="441" data-original-width="747" height="235" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBHkj7fNL5H0y5Y1qGwpl9cNrH0ZH2OahOWKm0qNUSXeJg7VHaZlpx1uLnK1ia-sd3n6GkN2WxYNaenXOTJt5MUBjfEb63Pg5YtGhRSn3kSSZl3fUToexQ4zGyIC6wNQ4lJMo4YgOLOFA/s400/Sachar_report.JPG" width="400" /></a></div>
<br />
<br />
We took a machine learning approach and built different time series to show the demographics (of the 2 major religions) in the coming years. The data is taken from Wikipedia (2011 Census of India; ref 2). The data used is given below-<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSP8VxjjfEpUH1OTlUPZEL6tY-9LHdehxemwONOfcXuUONBMuoer9tSMx6Vj3Fbp_R5xUM8ZXWs4ElOQCbdFRY4DWKm8qQPzAMvWlOzLB6OZShlyV-5nxVCV4S1fPalfXq2RuI0ZrzyJQ/s1600/data.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="349" data-original-width="1160" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSP8VxjjfEpUH1OTlUPZEL6tY-9LHdehxemwONOfcXuUONBMuoer9tSMx6Vj3Fbp_R5xUM8ZXWs4ElOQCbdFRY4DWKm8qQPzAMvWlOzLB6OZShlyV-5nxVCV4S1fPalfXq2RuI0ZrzyJQ/s640/data.JPG" width="640" /></a></div>
<br />
<br />
<br />
The above image clearly shows that Hinduism is the major religion, followed by Islam. Let's create a new variable, the ratio of 'Hinduism to Islam', for these 60 years (1951 to 2011)-<br />
<br />
for 1951 ratio is 84.1/9.8, which is 8.581633, similarly for other decades-<br />
<br />
8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871,<br />
<br />
So Hinduism, which was about 8.6 times Islam in 1951, was about 5.6 times Islam in 2011.<br />
<br />
Now, let's build an ARIMA time-series model on the ratio variable-<br />
<br />
<span style="color: blue;">library(forecast)</span><br />
<span style="color: blue;">comman_ratio <- auto.arima(ratio)</span><br />
<span style="color: blue;">forecasted_ratio <-forecast(comman_ratio, 10)</span><br />
<span style="color: blue;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNgBCudUMGkDJVSdTlb_Xt4nnpanWGSlF50fQDbm_qbp55YB-NeWW4HlthgvFWOF6nRcQum5qmuneaI9aW78nGyv9OycIp-r6-MHZkf-WsJLlFEsLEp-w-voz-MvKbFHX0Bqvynz5Edf0/s1600/ratio_table.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="425" data-original-width="268" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNgBCudUMGkDJVSdTlb_Xt4nnpanWGSlF50fQDbm_qbp55YB-NeWW4HlthgvFWOF6nRcQum5qmuneaI9aW78nGyv9OycIp-r6-MHZkf-WsJLlFEsLEp-w-voz-MvKbFHX0Bqvynz5Edf0/s400/ratio_table.JPG" width="251" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqTJRwHwqAjZYzgVgz5tl50EpOrQ7P_9JBsyiuZz5XxT-HbEruXzf2Q6pZvWeTrp0HXemxEi2gruid4oo3ZKMbmD_3URjhseCg_Nq7a8z8DmYyGsgxvY5hkJyjV-CjADU85WrCVRtsqko/s1600/ratio.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="745" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqTJRwHwqAjZYzgVgz5tl50EpOrQ7P_9JBsyiuZz5XxT-HbEruXzf2Q6pZvWeTrp0HXemxEi2gruid4oo3ZKMbmD_3URjhseCg_Nq7a8z8DmYyGsgxvY5hkJyjV-CjADU85WrCVRtsqko/s400/ratio.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The above table and image show that around 2100, Islam and Hinduism will have an equal number of followers. Is this forecast correct?<br />
<br />
<br />
<br />
Let's build another time series with a different ratio: now the variable is the ratio of the Islam population to the Hinduism population. This variable gives the Islam population as a fraction of the Hinduism population in India.<br />
<br />
0.1165279, 0.1281007 ,0.1355010, 0.1427704, 0.1546670, 0.1669152, 0.1783208 ( ratio1)<br />
<br />
In 1951, Islam was about 11.7% of Hinduism, and in 2011 it was about 17.8% of Hinduism in India.<br />
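<br />
The second variable is simply the reciprocal of the first, so the two series carry the same information. A small Python check, using only the ratio values listed above (Python is used here although the forecasting code in this post is R):<br />

```python
import numpy as np

# Hinduism-to-Islam ratio per census decade, copied from the list above.
ratio = np.array([8.581633, 7.806361, 7.380018, 7.004255,
                  6.465504, 5.991065, 5.607871])

ratio1 = 1.0 / ratio        # Islam-to-Hinduism ratio
percent = 100.0 * ratio1    # the same ratio expressed as a percentage

for year, p in zip(range(1951, 2012, 10), percent):
    print(year, round(p, 1))
```

The values printed for 1951 and 2011 correspond to the first and last entries of ratio1.<br />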
<br />
<span style="color: blue;">comman_ratio1 <- auto.arima(ratio1)</span><br />
<span style="color: blue;">forecasted_ratio <-forecast(comman_ratio1, 80)</span><br />
<br />
<span style="color: blue;">qq <- c(ratio1, forecasted_ratio$mean)</span><br />
<span style="color: blue;">year= seq(from = 1951, to=2811, by=10)</span><br />
<span style="color: blue;">df <- data.frame(percentage_of_islam_compare_to_hinduism= qq, year =year )</span><br />
<span style="color: blue;">ggplot2::ggplot(df, ggplot2::aes(year, percentage_of_islam_compare_to_hinduism)) + ggplot2::geom_line()</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNt8NfcqFqMrvJ4uvMIReR6G5NzF_SIRLNdcFNqrIgOr6foO_jCuBJVwVqMAs46wkHZnGScC5r53rSeey56mnPG8dA5JIOmpuQeCmIs6Z1WFtgratct9bLERwgNycGSkMyV-YRtEg7ghU/s1600/islam_to_hindu.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="745" height="331" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNt8NfcqFqMrvJ4uvMIReR6G5NzF_SIRLNdcFNqrIgOr6foO_jCuBJVwVqMAs46wkHZnGScC5r53rSeey56mnPG8dA5JIOmpuQeCmIs6Z1WFtgratct9bLERwgNycGSkMyV-YRtEg7ghU/s400/islam_to_hindu.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVKs5OfXth5exvkqPZos9nCx5UzhSSphae87GyTqcReh2c1fMF6401tlsSb-ZLabqgvY82EDulsCrwPN1rUDN6LQlMYRKBBurDYWF_ZmhcuvI6d_rZBYDQxJ-e28W1BHidIspWJ_pjZQ4/s1600/islam_tohindu.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="416" data-original-width="339" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVKs5OfXth5exvkqPZos9nCx5UzhSSphae87GyTqcReh2c1fMF6401tlsSb-ZLabqgvY82EDulsCrwPN1rUDN6LQlMYRKBBurDYWF_ZmhcuvI6d_rZBYDQxJ-e28W1BHidIspWJ_pjZQ4/s400/islam_tohindu.JPG" width="325" /></a></div>
<br />
So this forecast says that by 2100 the two will not be equal: Islam will be about 28% of the Hinduism population, and at the current growth rate it would take about 800 years for Islam to equal Hinduism in number of followers.<br />
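As a rough cross-check of these two figures, here is a minimal Python sketch (the post itself uses R). It assumes the fitted model behaves like a random walk with drift, where the drift is the mean decade-to-decade change of ratio1. That is an assumption made for illustration, since the model actually selected by auto.arima is not shown here. The names drift and extrapolate are made up for this sketch.<br />

```python
# Ratio of Islam to Hinduism population per census, 1951..2011 (from the post).
ratio1 = [0.1165279, 0.1281007, 0.1355010, 0.1427704,
          0.1546670, 0.1669152, 0.1783208]

# Assumption: random walk with drift, drift = mean change per decade.
# This is not necessarily the model that auto.arima selected.
drift = (ratio1[-1] - ratio1[0]) / (len(ratio1) - 1)


def extrapolate(steps):
    """Extend the last observed ratio by `steps` decades at constant drift."""
    return ratio1[-1] + drift * steps


ratio_2101 = extrapolate(9)                      # 9 decades after 2011
decades_to_parity = (1.0 - ratio1[-1]) / drift   # decades until the ratio is 1

print(f"drift per decade: {drift:.5f}")
print(f"ratio in 2101:    {ratio_2101:.3f}")
print(f"years to parity:  {decades_to_parity * 10:.0f}")
```

Under that assumption the ratio is close to 27% in 2101 and reaches 1 after roughly 800 years, which is in line with the numbers above.<br />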
<br />
So what is the correct demographic composition in 2100? Machine learning gives different results depending on the variable chosen. Besides, 7 data points are not sufficient to forecast 70 future values. ☺☺ The results might differ again if we modelled the population of each religion directly instead of the ratios.<br />
<br />
<br />
ref:-<br />
<br />
1) <a href="https://en.wikipedia.org/wiki/Sachar_Committee" target="_blank">Sachar_Committee</a><br />
2) <a href="https://en.wikipedia.org/wiki/2011_Census_of_India" target="_blank">2011_Census_of_India</a><br />
3) <a href="https://www.quora.com/What-was-the-Muslim-population-in-India-in-1947-and-now-in-2016" target="_blank">https://www.quora.com/What-was-the-Muslim-population-in-India-in-1947-and-now-in-2016</a></div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-7121276912721748012018-09-25T11:33:00.001-07:002018-09-25T11:52:46.521-07:00Connectivity Based Outlier Detection and its implementation in R<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">Identifying
abnormalities in industrial processes, banking fraud, ad clicks and so on is one of
the major challenges for a data scientist. There are many ways of detecting an
abnormality.</span><br />
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span>
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><a href="https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html" target="_blank">different ways of detecting abnormalities through machine learning</a></span><br />
<br />
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">There are many
outlier detection techniques. One of them is the connectivity-based
outlier factor (COF). It is an improved version of the local outlier factor (LOF)
technique, which can miss an outlier lying off a line-shaped pattern of points, as the figures below show. </span><br />
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwF9hxECbp3uchnDfXUx-mXY8sZ1lReWnrTG0uff5Cl5FN-pyWzAcYfg2sqIDjxoBUCTBrcVS5SjpHRSLiMKEtKPjm6Igehi-7Z1fAjORyWKC2pva06DNgJjyzjg8977xYWuo4t3d1epM/s1600/lof_weekness.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="349" data-original-width="533" height="209" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwF9hxECbp3uchnDfXUx-mXY8sZ1lReWnrTG0uff5Cl5FN-pyWzAcYfg2sqIDjxoBUCTBrcVS5SjpHRSLiMKEtKPjm6Igehi-7Z1fAjORyWKC2pva06DNgJjyzjg8977xYWuo4t3d1epM/s320/lof_weekness.JPG" width="320" /></a></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA_WM8RuFVSg65ssA6jCd-HSriAaSEni-q4A40xgS-GpZGs5qFyHr8DVNd9U5R2ReLt6kTMwCSXvgIeGGRwXq2qgTzHGF5Z3y89QDAdgKG9btsDry7tiim7FubFPo9lsSPCaBTmouJ4hs/s1600/lof_fails.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="344" data-original-width="529" height="208" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA_WM8RuFVSg65ssA6jCd-HSriAaSEni-q4A40xgS-GpZGs5qFyHr8DVNd9U5R2ReLt6kTMwCSXvgIeGGRwXq2qgTzHGF5Z3y89QDAdgKG9btsDry7tiim7FubFPo9lsSPCaBTmouJ4hs/s320/lof_fails.JPG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A data point lying away from a linear set of data points<br />
should have been picked as an outlier</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">The idea of
the connectivity-based outlier algorithm is to assign a degree of outlierness to each
data point. This degree is called the connectivity-based outlier factor
(COF) of the data point. A high COF value indicates a high
probability of being an outlier.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">Let’s
understand COF step by step with an example.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">The diagram
below shows 9 data points in the plane. Two of them,
P1 and P2, lie away from the trend line and look like outliers, so the COF values
for P1 and P2 should be higher than those of the other data points on the trend line. Here
we use the k=5 nearest neighbors for the COF calculation. <o:p></o:p></span></div>
<div class="separator" style="clear: both;">
</div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">The following
steps compute the COF value for the data point P1.</span><br />
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b><br /></b></span>
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>1) <span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">  </span></b></span><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>Find the k nearest neighbors (k-NN)</b> of the
data point P1 (k=5).</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFi_I0gM8zChKO30IqozcKdx3KAKwxmHu6F5uW6KEHfeCtQpxpAPGWDoVnj6qMXSPgZj0sAUV5w46SGlxxvKmxLrfiWQSgSgDwZCZz2zxRxrinzxXSF5iauuM_liEqb2qI_K8IbiO2s2A/s1600/image2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="264" data-original-width="771" height="136" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFi_I0gM8zChKO30IqozcKdx3KAKwxmHu6F5uW6KEHfeCtQpxpAPGWDoVnj6qMXSPgZj0sAUV5w46SGlxxvKmxLrfiWQSgSgDwZCZz2zxRxrinzxXSF5iauuM_liEqb2qI_K8IbiO2s2A/s400/image2.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">N<sub>5 </sub>(P1)
= {P2, P5, P4, P7, P6} is the set of the 5 data points nearest to P1. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b><br /></b></span>
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>2) <span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">  </span></b></span><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>Find the set-based nearest (SBN) path:</b>
the k nearest data points listed in the order in which they are reached, s={P<sub>1</sub>,P<sub>2</sub>,……., P<sub>k</sub>}</span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">SBN path = {P1,
P2, P5, P4, P6, P7}. The data points are arranged so that they form a
path: P2 is the nearest data point to P1, then P5 is the nearest data
point to P2, then either P6 or P4 can be chosen as the nearest data point to P5, and finally
P7 is the nearest data point to P6. At each step the next point is the remaining neighbor closest to any point already on the path, and every chosen data point must belong
to the nearest-neighbor set N<sub>5 </sub>(P1).<o:p></o:p></span></div>
<div class="MsoNormal">
<b style="text-indent: -0.25in;"><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin;"><br /></span></b>
<b style="text-indent: -0.25in;"><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin;">3) <span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">  </span></span></b><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>Find the set-based nearest (SBN) trail:</b>
the sequence of edges that builds the SBN path, e={e<sub>1</sub>,e<sub>2</sub>, …,e<sub>k</sub>}.
SBN trail = {(P1, P2), (P2, P5), (P5, P4), (P5, P6), (P6, P7)}. These pairs of
data points are the edges e1, e2, e3, e4, e5 respectively.</span></div>
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b><br /></b></span>
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>4) <span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;">  </span></b></span><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>Find the cost of the SBN trail:</b> the
distance between the two data points of each edge (the edge weight). Cost description = {3, 2, 1,
1, 1}, one weight per edge e1 to e5.</span></div>
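<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">Steps 2 to 4 can be sketched in Python as follows (the implementation shown later in this post is in R). The coordinates are hypothetical, since the post gives none; they were picked so that the edge costs come out as 3, 2, 1, 1, 1. The function name sbn_trail is also made up for this sketch.</span></div>

```python
import math

# Hypothetical coordinates for P1 and its 5 nearest neighbours
# (not taken from the post's diagram).
points = {
    "P1": (0.0, 5.0),
    "P2": (0.0, 2.0),
    "P4": (-1.0, 0.0),
    "P5": (0.0, 0.0),
    "P6": (1.0, 0.0),
    "P7": (2.0, 0.0),
}


def sbn_trail(start, neighbours):
    """Return (path, trail, costs) for `start` and its nearest neighbours.

    At every step the unvisited point closest to ANY point already on the
    path is added. Ties are broken by the order of `neighbours`.
    """
    path, trail, costs = [start], [], []
    remaining = list(neighbours)
    while remaining:
        # Shortest edge from a point on the path to a point not yet on it.
        cost, src, dst = min(
            ((math.dist(points[a], points[b]), a, b)
             for b in remaining for a in path),
            key=lambda edge: edge[0],
        )
        path.append(dst)
        trail.append((src, dst))
        costs.append(cost)
        remaining.remove(dst)
    return path, trail, costs


path, trail, costs = sbn_trail("P1", ["P2", "P4", "P5", "P6", "P7"])
print(path)
print(trail)
print(costs)
```

<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">With these coordinates the sketch gives the same path, trail and costs as listed above. P4 and P6 are tied at distance 1 from P5, and the tie is broken by the order in which the neighbors are listed.</span></div>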
<div class="MsoListParagraphCxSpLast">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>5) <span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span></b></span><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>Find Average chaining distance of the
data point</b></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS5SV0d0Bw-r0qt_6Etmdv67erCELfFnR6U9rzHqOoUcKt0kphAYz6BSu4WEvl4pmSh9k_P5WxuBxPgHzJkPXxgkt23FIZUJhA_NoStNVGhYIaEceE8bAqiwFJQjpVHaOeqV5-l5Xiifc/s1600/chaining_distance.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="69" data-original-width="448" height="61" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS5SV0d0Bw-r0qt_6Etmdv67erCELfFnR6U9rzHqOoUcKt0kphAYz6BSu4WEvl4pmSh9k_P5WxuBxPgHzJkPXxgkt23FIZUJhA_NoStNVGhYIaEceE8bAqiwFJQjpVHaOeqV5-l5Xiifc/s400/chaining_distance.png" width="400" /></a></div>
<div class="MsoListParagraph">
<span style="font-size: 14.0pt; line-height: 107%;">dist(e<sub>i</sub>)
denotes the length of edge e<sub>i</sub>, i.e. the distance between its two data points. For example:<o:p></o:p></span></div>
<div class="MsoListParagraph">
<span style="font-size: 14.0pt; line-height: 107%;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGnTQwIWrcWtsbZrkndO_xQtOcjuG6dS15t3FkeLXo83l8hPWDnlz8cleW4FtTf95zZJvFH_ifb-4gpv3SkyA-w9QU5fP5fjr6qpmZ-_6jAB3PVWwVyBht_NfMqFDMTYENq1rN3fMXH_E/s1600/cost.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="139" data-original-width="619" height="88" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGnTQwIWrcWtsbZrkndO_xQtOcjuG6dS15t3FkeLXo83l8hPWDnlz8cleW4FtTf95zZJvFH_ifb-4gpv3SkyA-w9QU5fP5fjr6qpmZ-_6jAB3PVWwVyBht_NfMqFDMTYENq1rN3fMXH_E/s400/cost.JPG" width="400" /></a></div>
<div class="MsoListParagraph">
<span style="font-size: 14.0pt; line-height: 107%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">In the same way as for P1, find the average
chaining distance of each of its 5 nearest neighbors P2, P4, P5, P6 and P7.<o:p></o:p></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-size: 14.0pt; line-height: 107%;">
</span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Formula Explanation:<o:p></o:p></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2cYKRSfzGzGxKWM9chwnLWSXvY1ksTnIQPxptPUZZpltf_cKT2greDhLa7ejE6pYvdSt-3SCnC1OAiuWIzUfTB6xzHriCgcF7-tilRgn8olJxrU7bBmyOFrncVO9VgsDHpH0pLgcxJvY/s1600/formula_explanation.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="89" data-original-width="728" height="47" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2cYKRSfzGzGxKWM9chwnLWSXvY1ksTnIQPxptPUZZpltf_cKT2greDhLa7ejE6pYvdSt-3SCnC1OAiuWIzUfTB6xzHriCgcF7-tilRgn8olJxrU7bBmyOFrncVO9VgsDHpH0pLgcxJvY/s400/formula_explanation.JPG" width="400" /></a></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">Total
number of edges between the 6 points (P1 and its 5 neighbors) = {(P1,P2),(P1,P4),(P1,P5),(P1,P6),(P1,P7),(P2,P4),(P2,P5),(P2,P6),
(P2,P7),(P4,P5),(P4,P6),(P4,P7),(P5,P6),(P5,P7),(P6,P7)} = 15</span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">k(k+1)/2 = 5(5+1)/2=15</span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span></span></div>
<div class="MsoListParagraphCxSpFirst" style="margin-left: .75in; mso-add-space: auto; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "symbol"; font-size: 14.0pt; line-height: 107%;">·<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;">
</span></span><!--[endif]--><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Running total of
the edge weights along the SBN trail, up to each neighbor in turn<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; mso-add-space: auto; mso-list: l1 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "wingdings"; font-size: 14.0pt; line-height: 107%;">Ø<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><!--[endif]--><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Edge weight from P1 to P2 = 3<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; mso-add-space: auto; mso-list: l1 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "wingdings"; font-size: 14.0pt; line-height: 107%;">Ø<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><!--[endif]--><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Edge weight from P1 to P5 = 3+2 =5<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; mso-add-space: auto; mso-list: l1 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "wingdings"; font-size: 14.0pt; line-height: 107%;">Ø<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><!--[endif]--><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Edge weight from P1 to P4 = 3+2+1 =6<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="margin-left: 1.25in; mso-add-space: auto; mso-list: l1 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "wingdings"; font-size: 14.0pt; line-height: 107%;">Ø<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><!--[endif]--><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Edge weight from P1 to P6 = 3+2+1+1 =7<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="margin-left: 1.25in; mso-add-space: auto; mso-list: l1 level1 lfo2; text-indent: -.25in;">
<!--[if !supportLists]--><span style="font-family: "wingdings"; font-size: 14.0pt; line-height: 107%;">Ø<span style="font-family: "times new roman"; font-size: 7pt; font-stretch: normal; line-height: normal;"> </span></span><!--[endif]--><span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;">Edge weight from P1 to P7 = 3+2+1+1+1 =8<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">
<span style="font-size: 14pt; line-height: 107%;">Total edge weight =(3+5+6+7+8) = 29</span></span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><br /></span></span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><br /></span></span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;">ac-dist(P1) = 29/15 = 1.933</span></span></span></div>
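<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">The arithmetic above can be checked with a few lines of Python. This is a sketch that assumes the usual COF weighting, in which edge e_i of a trail over r = k + 1 points gets the weight 2(r - i) / (r(r - 1)). The function name ac_dist is illustrative.</span></div>

```python
from itertools import accumulate


def ac_dist(costs):
    """Average chaining distance from the SBN trail costs dist(e_1..e_k).

    Earlier edges get larger weights: 2 * (r - i) / (r * (r - 1)), r = k + 1.
    """
    r = len(costs) + 1
    return sum(2 * (r - i) / (r * (r - 1)) * cost
               for i, cost in enumerate(costs, start=1))


costs_p1 = [3, 2, 1, 1, 1]   # cost description of P1 from the post

# Same value via the running totals used above: (3 + 5 + 6 + 7 + 8) / 15
running_totals = list(accumulate(costs_p1))
k = len(costs_p1)
by_running_totals = sum(running_totals) / (k * (k + 1) / 2)

print(round(ac_dist(costs_p1), 3))     # 1.933
print(round(by_running_totals, 3))     # 1.933
```

<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">Both routes give 29/15, because weighting edge e_i by (r - i) is the same as counting it once in every running total that includes it.</span></div>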
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><br /></span></span></span></div>
<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%; mso-bidi-font-family: "TimesNewRoman\,Italic"; mso-bidi-font-style: italic;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><b>6)</b> </span></span></span><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span></span><span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><b>Find COF value of the data point-</b></span></div>
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_qLvIbsL3DtZcEYi5QiNf0TRzQqORP2j0M_-1Ismxb-dYpVR8BqBmYlq8P-zzsAAVaFxZ19nDqCnCQcT7QZfS7OwR_D2YdzVetFZ9UbxbyacGDFGJ5N03CAPK5d-QRszob-vdzkx3vVY/s1600/cof_value.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="102" data-original-width="410" height="79" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_qLvIbsL3DtZcEYi5QiNf0TRzQqORP2j0M_-1Ismxb-dYpVR8BqBmYlq8P-zzsAAVaFxZ19nDqCnCQcT7QZfS7OwR_D2YdzVetFZ9UbxbyacGDFGJ5N03CAPK5d-QRszob-vdzkx3vVY/s320/cof_value.png" width="320" /></a></div>
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">COF
is the ratio of the average chaining distance of the data point to the mean of the
average chaining distances of its k nearest neighbors.</span></span></div>
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSuDTR41bTZsi1oH812SzDq3c7RTtZzTuQ77TbdyJ3L8cjQ7eBo7QjPlYhbk1zfr2VFl0R9Q0en6O84rxWx-NL-gHJcY37ky5dUXC_Oy1o3W8qO_9MUqzl7O2jWDvK4JROKqrJFfmyWK8/s1600/Cof2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="108" data-original-width="798" height="52" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSuDTR41bTZsi1oH812SzDq3c7RTtZzTuQ77TbdyJ3L8cjQ7eBo7QjPlYhbk1zfr2VFl0R9Q0en6O84rxWx-NL-gHJcY37ky5dUXC_Oy1o3W8qO_9MUqzl7O2jWDvK4JROKqrJFfmyWK8/s400/Cof2.JPG" width="400" /></a></div>
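<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">The ratio itself is short to write down in Python. In the sketch below only ac-dist(P1) = 29/15 comes from this post; the average chaining distances of the five neighbors are made-up placeholders, not the values from the figure above, and the function name cof is illustrative.</span></div>

```python
def cof(ac_dist_point, ac_dist_neighbours):
    """COF = ac-dist of the point / mean ac-dist of its k nearest neighbours."""
    mean_neighbours = sum(ac_dist_neighbours) / len(ac_dist_neighbours)
    return ac_dist_point / mean_neighbours


ac_p1 = 29 / 15   # average chaining distance of P1, from the post

# Hypothetical ac-dist values for P2, P4, P5, P6 and P7 (placeholders only).
ac_neighbours = [1.4, 1.0, 1.0, 1.0, 1.2]

score = cof(ac_p1, ac_neighbours)
print(round(score, 3))
```

<div class="MsoNormal">
<span style="font-size: 14.0pt; line-height: 107%;">A value near 1 means the point is connected to its neighborhood about as tightly as its neighbors are; the further above 1 the value is, the more isolated the point.</span></div>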
<div class="MsoNormal">
<span style="font-size: 14pt; line-height: 107%; text-indent: -0.25in;"><span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-size: 14.0pt; line-height: 107%;"><o:p></o:p></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-size: 14.0pt; line-height: 107%;"><br /></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-size: 14.0pt; line-height: 107%;">                     <o:p></o:p></span><span style="font-family: "calibri" , sans-serif; font-size: 14pt;">In the same way as for COF(P1), find the COF of every data point in the diagram. The
data points with high COF values are considered outliers.</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWQGuV6TPHLDpQtRuBwVVMqGI5p7f0YXHJKNFn_9UfBTOsaIFylwd5m4HAwSEMtjrD5O8EFIQj9WhbD8fXmlSFhi7hvK7f3qcE1qHQME2LO8anF9zBipNte3blt76XCgTzNviavg7VxbE/s1600/R_code.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="217" data-original-width="770" height="111" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWQGuV6TPHLDpQtRuBwVVMqGI5p7f0YXHJKNFn_9UfBTOsaIFylwd5m4HAwSEMtjrD5O8EFIQj9WhbD8fXmlSFhi7hvK7f3qcE1qHQME2LO8anF9zBipNte3blt76XCgTzNviavg7VxbE/s400/R_code.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX1bAq9wAGH7C3KZoJi1iBZzwsysxo0PuGsL9J9mLnKtwXAqdWLDuB8o6JXOFxISkx32YCV6-EwfWjmqUHgPqytU9Y-7qtvmAcpB7FRJFGZJLwt9cfDoUWLD5hKdoN7uiMjeiiXAhBsPM/s1600/R_code_result.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="653" data-original-width="820" height="317" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX1bAq9wAGH7C3KZoJi1iBZzwsysxo0PuGsL9J9mLnKtwXAqdWLDuB8o6JXOFxISkx32YCV6-EwfWjmqUHgPqytU9Y-7qtvmAcpB7FRJFGZJLwt9cfDoUWLD5hKdoN7uiMjeiiXAhBsPM/s400/R_code_result.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;">Darker
data points are the strongest outliers. One can compare COF with the angle-based
outlier detection technique (ABOD).</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><a href="https://machinelearningstories.blogspot.com/2018/08/anomaly-detection-in-high-dimensional.html" target="_blank">Click here to learn about the angle-based outlier detection (ABOD) algorithm</a> </span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-family: "calibri" , sans-serif; font-size: 14pt;"><br /></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-family: "calibri" , sans-serif; font-size: 14pt;"><br /></span></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="font-family: "calibri" , sans-serif; font-size: 14pt;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<span style="font-family: "calibri" , sans-serif; font-size: 14.0pt; line-height: 107%;"><br /></span></div>
Unknownnoreply@blogger.com0Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-21356028396030975852018-08-13T10:43:00.000-07:002018-09-30T21:22:44.199-07:00Anomaly Detection in High Dimensional data :- Angle based outlier detection technique<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">
<span style="font-family: "times new roman";">
</span></span><br />
<h4 class="MsoNormal" style="margin: 0in 0in 8pt; text-align: center;">
<span style="font-size: 14pt; line-height: 107%;"><u>Angle-Based Outlier Detection
(ABOD)<o:p></o:p></u></span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman";">
</span></span><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Before
starting with the ABOD method, let’s try to understand what an outlier is, the different types
of methods for detecting outliers, and how ABOD differs from other outlier
detection methods.<o:p></o:p></span></span><br />
<br />
<span style="font-family: "times" , "times new roman" , serif;">
</span><br />
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">As per
Hawkins' definition, “An outlier is an observation which deviates so much from
the other observations as to arouse suspicions that it was generated by a
different mechanism.”<o:p></o:p></span></span></div>
<span style="font-family: "times" , "times new roman" , serif;">
</span><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif; font-size: small;"><span style="font-size: 14pt; line-height: 107%;">There are 3 main
types of outlier detection methods:</span></span></span></span><br />
<span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif; font-size: small;"><span style="font-size: 14pt; line-height: 107%;"><o:p></o:p></span></span></span></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-size: small;"><span style="font-family: "times" , "times new roman" , serif;">
<span style="font-size: 14pt; line-height: 107%;">1) Statistical or model-based methods:
these include parametric and non-parametric approaches.</span></span></span><br />
<span style="font-family: "times" , "times new roman" , serif; font-size: small;"><span style="font-size: 14pt; line-height: 107%;"></span></span><br />
<span style="font-family: "times" , "times new roman" , serif; font-size: small;"><span style="font-size: 14pt; line-height: 107%;"></span></span><br />
<span style="font-size: small;"><span style="font-family: "times" , "times new roman" , serif;">
<span style="font-size: 14pt; line-height: 107%;">2) Proximity-based methods: these can be
classified into 3 categories<o:p></o:p></span></span></span><br />
<span style="font-family: "times" , "times new roman" , serif; font-size: small;">
</span><br />
</span></span></span><br />
<ul style="text-align: left;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;">
<li><div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Cluster based methods</span></span></div>
</li>
<li><div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Distance based methods</span></span></span></div>
</li>
<li><div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Density based methods</span></span></span></span></div>
</li>
</span></span></span></ul>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;">
</span></span></span>
<br />
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;">
<span style="font-family: "times" , "times new roman" , serif; font-size: small;">
<span style="font-size: 14pt;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">3) Angle based methods<o:p></o:p></span></span></span></span></span></span><br />
<span style="font-family: "times" , "times new roman" , serif; font-size: small;">
</span><br />
</span></span></span><br />
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Statistical
models are a relatively simple way of identifying an abnormal data point.
Abnormal data points are outliers, which can be identified even with a box plot
or by looking for extreme values under a normal distribution.</span> <o:p></o:p></span></span></span></span></div>
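<div class="MsoNormal" style="margin: 0in 0in 8pt;">
The box-plot idea can be sketched in a few lines of Python. This is only a minimal illustration: the function name <i>iqr_outliers</i>, the sample data and the whisker factor k = 1.5 (the usual box-plot convention) are my own assumptions, not something taken from a particular library.</div>

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return a boolean mask marking points outside the box-plot whiskers.

    A point is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile (k = 1.5 is an assumed default).
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical toy data: seven similar readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 45]
mask = iqr_outliers(data)
flagged = [x for x, is_outlier in zip(data, mask) if is_outlier]
print(flagged)
```

<div class="MsoNormal" style="margin: 0in 0in 8pt;">
Such a rule looks at one variable at a time, which is why it is simple but limited for multivariate data.</div>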
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;">
<span style="font-family: "times new roman"; font-size: small;">
</span><span style="font-family: "times new roman"; font-size: small;"> </span><br />
<span style="font-family: "times new roman"; font-size: small;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Model-based
and proximity-based approaches, however, rely on an assessment of distances
in the full-dimensional Euclidean data space. In high-dimensional data, these
approaches are bound to deteriorate due to the notorious “curse of
dimensionality”. <span style="font-size: small;"><span style="font-size: 14pt;">You can read this article to
know more about it: <a href="https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html" target="_blank">Distance & Density Based Clustering</a></span></span></span></span></span><br />
</span></span></span><br />
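<div class="MsoNormal" style="margin: 0in 0in 8pt;">
To get a feel for why distances deteriorate, here is a small Python experiment. It is a sketch under my own assumptions (uniform random data, 500 points, a helper called <i>relative_contrast</i>); it measures how much farther the farthest point is than the nearest one, relative to the nearest distance.</div>

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(max distance - min distance) / min distance from a random query point.

    Data and query are drawn uniformly from the unit hypercube; this setup
    is an assumption chosen only to illustrate the effect.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast between "near" and "far" shrinks as dimensions are added.
contrasts = {d: relative_contrast(500, d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(d, round(c, 3))
```

<div class="MsoNormal" style="margin: 0in 0in 8pt;">
When the contrast is close to zero, the nearest and the farthest neighbour are almost equally far away, so a purely distance-based outlier score carries little information.</div>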
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 14pt;"></span></span><br /></span></span></span>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"></span></span></span><br /></span></span></span>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"></span></span></span><br /></span></span></span>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 14pt;">The idea of the ABOD (Angle-Based Outlier Detection) algorithm is to find outliers based on the
variance </span><span style="font-size: 14pt; mso-bidi-font-family: CMR9;">of the
angles between the difference vectors of data objects</span><span style="font-size: 14pt;"> in the dataset. <span style="font-size: 14pt; line-height: 107%;">This way,
the effects of the “curse of dimensionality” are alleviated compared to purely
distance-based approaches.<o:p></o:p></span></span></span></span></span></span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span><o:p></o:p></span></span></span></span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhNP0gPyszzDIv_8Bnxq53ng3RT7Y797wpq-fsYGTbB1u6I5IlSMyD3Ge1EeGwKFTqLZtUibANqG5qudiEJZE3dJI6U8MNZswTleI3cmhOhDIA1AGPsZF7Pmk_EWKXQNYlqD6kE0YxN54/s1600/ABOD_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="402" data-original-width="499" height="321" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhNP0gPyszzDIv_8Bnxq53ng3RT7Y797wpq-fsYGTbB1u6I5IlSMyD3Ge1EeGwKFTqLZtUibANqG5qudiEJZE3dJI6U8MNZswTleI3cmhOhDIA1AGPsZF7Pmk_EWKXQNYlqD6kE0YxN54/s400/ABOD_1.png" width="400" /></a></span></span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
</div>
</div>
</div>
</div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri" , sans-serif; font-size: 14pt; line-height: 107%;">
</span><br />
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 14pt; line-height: 107%;">In the figure above, for an outlier point P, the angle between the vectors PX and PY for any two points X and Y from the dataset is
substantially smaller than the corresponding angles for the other points Q and R. Seen from a far-away point, the rest of the data lies in a narrow range of directions,
so the angles are small; seen from a point inside the cluster, the other points lie in all directions.
If you think about it further, the variance of all the possible angles to the rest of the
data points will be lower for the far-away points than for the nearer ones. Thus the data
point with the lowest variance of angles is considered an outlier.</span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif; font-size: small;">
</span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Angles are
more stable than distances in high-dimensional spaces:<o:p></o:p></span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif; font-size: small;">
</span></span></div>
<div class="MsoListParagraphCxSpFirst" style="margin: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<!--[if !supportLists]--><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 14pt; line-height: 107%; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: Calibri; mso-hansi-font-family: Calibri;"><span style="mso-list: Ignore;">•<span style="font-size-adjust: none; font-stretch: normal; font: 7pt/normal "Times New Roman";">
</span></span></span><!--[endif]--><span style="font-size: 14pt; line-height: 107%;">Object
o is an outlier if most other objects are located in similar directions (low
variance of angles)</span></span></span></div>
<div class="MsoListParagraphCxSpFirst" style="margin: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1; text-indent: -0.25in;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 14pt; line-height: 107%;"><!--[if !supportLists]--><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 14pt; line-height: 107%; mso-ascii-font-family: Calibri; mso-bidi-font-family: Calibri; mso-fareast-font-family: Calibri; mso-hansi-font-family: Calibri; mso-no-proof: yes;"><span style="mso-list: Ignore;">•<span style="font-size-adjust: none; font-stretch: normal; font: 7pt/normal "Times New Roman";">       </span></span></span><!--[endif]--><span style="font-size: 14pt; line-height: 107%;">Object o is not an outlier if many other
objects are located in varying directions (high variance of angles)<span style="mso-no-proof: yes;"><o:p></o:p></span></span></span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
</div>
</span><div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEge1lFqjAIsgHpyQ2UkYq200cJ6agYEPzyNbKSkNNaiIeAS2EkRWP3_f_5fb_GMa1VkE00ca0z34OltOjKA2FcHdNFDUhftBX5U5m2TF6KE-8RQj6SLUcLaifZzA2ARMaeyETspcLZ2xDo/s1600/ABOD2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="447" data-original-width="1213" height="145" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEge1lFqjAIsgHpyQ2UkYq200cJ6agYEPzyNbKSkNNaiIeAS2EkRWP3_f_5fb_GMa1VkE00ca0z34OltOjKA2FcHdNFDUhftBX5U5m2TF6KE-8RQj6SLUcLaifZzA2ARMaeyETspcLZ2xDo/s400/ABOD2.png" width="400" /></a></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
</div>
<br />
<br />
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;"><span style="mso-spacerun: yes;"> </span>In the actual implementation, the angle is not used on its own:
the term is also divided by the distances between the points, so that distance is
taken into account as well. (A small angle alone does not make a point an outlier; nearby points can produce small angles too.) So the angular distance is:<o:p></o:p></span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLZL6ARa6uCtyafJQqN3ppp2YpCut0CJV2JP2spb0GXYt5a5PZ0zpnoZJ2KL2xCRHi-LtEIrz6DpqFel9L5g5FRf-GOLS3a0gh5dw_hzczWdgDcaue0or_jucIhoRbyVIUC_zZ92y8lTA/s1600/formula.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="117" data-original-width="419" height="111" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLZL6ARa6uCtyafJQqN3ppp2YpCut0CJV2JP2spb0GXYt5a5PZ0zpnoZJ2KL2xCRHi-LtEIrz6DpqFel9L5g5FRf-GOLS3a0gh5dw_hzczWdgDcaue0or_jucIhoRbyVIUC_zZ92y8lTA/s400/formula.JPG" width="400" /></a></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;">(AB, AC) - dot product of the vectors AB and AC</span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;">|AB|, |AC| - distances between A and B, and between A and C</span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;">So cosine = (AB, AC) / (|AB| * |AC|)</span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;">cosine / distances = (AB, AC) / (|AB|^2 * |AC|^2)</span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times" , "times new roman" , serif;">To calculate the angle-based outlier factor of A, the variance of cosine / distances over all possible pairs of other points B and C is taken. A lower value means more outlier-ness.</span></span></div>
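<div class="MsoNormal" style="margin: 0in 0in 8pt;">
The calculation described above can be sketched directly in Python. This is a brute-force illustration of the simplified description in this post (a plain variance over all pairs), not the code of any package; the function name <i>abof</i> and the toy points are my own assumptions, and it assumes there are no duplicate points (a duplicate would give a zero distance in the denominator).</div>

```python
import numpy as np
from itertools import combinations

def abof(points, index):
    """Variance of (AB, AC) / (|AB|^2 * |AC|^2) over all pairs (B, C).

    A is points[index]; B and C run over every pair of the other points.
    Assumes all points are distinct.
    """
    a = points[index]
    others = np.delete(points, index, axis=0)
    values = []
    for b, c in combinations(others, 2):
        ab, ac = b - a, c - a
        # np.dot(ab, ab) is the squared distance |AB|^2
        values.append(np.dot(ab, ac) / (np.dot(ab, ab) * np.dot(ac, ac)))
    return np.var(values)

# Hypothetical toy data: a small cluster plus one far-away point (index 5).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [1.0, 1.0], [0.5, 0.5], [10.0, 10.0]])
scores = [abof(points, i) for i in range(len(points))]
most_outlying = int(np.argmin(scores))  # lowest score = most outlier-like
print(most_outlying, scores)
```

<div class="MsoNormal" style="margin: 0in 0in 8pt;">
Looping over all pairs for every point costs on the order of n^3 operations, so this form is only practical for small datasets.</div>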
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b style="mso-bidi-font-weight: normal;"><u><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";">Implementation of ABOD method in R<o:p></o:p></span></span></u></b></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "times new roman"; font-size: small;">
</span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="color: black; font-family: "times" , "times new roman" , serif;"># Sub-setting the data</span></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><em>iris_dataset
<- iris[,1:4]</em></span></span></span></span></span></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-size: 14pt; line-height: 107%;"><span style="color: black; font-family: "times" , "times new roman" , serif;"># Running
ABOD code</span></span></span></span></span></div>
</div>
</span><br />
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><em>angular_distance
<- abodOutlier::abod(iris_dataset, method = "complete")</em></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="font-size: 14pt; line-height: 107%;"><span style="color: black; font-family: "times" , "times new roman" , serif;"># </span></span><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;"><span style="color: black;">plotting</span> the data</span> </span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="font-family: "calibri";"><span style="color: blue;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><em>library(ggplot2)</em></span></span></span></span></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><em>gg <- ggplot(data = iris_dataset,
aes(x=Sepal.Length, y= Sepal.Width)) + geom_point(aes(col=angular_distance))</em></span></span></span></span></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "calibri";"><span style="color: blue;"><em>plot(gg)<o:p></o:p></em></span></span></span></span></span></span></span></span></span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "times new roman"; font-size: small;">
</span></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhokRWKuSNONjU8DSTPDLirUNh5h8KRqYT38kQLhPzShz4HLFT4QsrCIyzX86AX-YEDXk6k-guMqYSDlaGcmTy-rhsZo8Vg0Fccl_PBaYLszKwxuahML2OyC38wgQ1jfTAVG9ILh1b6lr4/s1600/ABOD_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="693" data-original-width="1348" height="205" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhokRWKuSNONjU8DSTPDLirUNh5h8KRqYT38kQLhPzShz4HLFT4QsrCIyzX86AX-YEDXk6k-guMqYSDlaGcmTy-rhsZo8Vg0Fccl_PBaYLszKwxuahML2OyC38wgQ1jfTAVG9ILh1b6lr4/s400/ABOD_3.png" width="400" /></a></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
</div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "times new roman"; font-size: small;">
</span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">Here the darker points (smaller angular
distance) are clearly visible as outliers (abnormal data points).</span></span><br />
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;"><br /></span></span>
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;"><a href="https://machinelearningstories.blogspot.com/2018/09/connectivity-based-outlier-detection.html" target="_blank">Connectivity based Outlier Detection method</a></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times";">Read more about other interesting ML topics-</span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times";"><a href="http://machinelearningstories.blogspot.com/2017/09/hierarchical-clustering-bottom-up.html" target="_blank">Hierarchical Clustering and performance parameters of clustering.</a></span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
<span style="background: white; color: #222222; font-size: 14pt; line-height: 107%;"><span style="font-family: "times";"><a href="https://machinelearningstories.blogspot.com/2016/08/documentclassification-or-text.html" target="_blank">Text Classification Algorithms</a> </span></span></div>
<div class="MsoNormal" style="margin: 0in 0in 8pt;">
</div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
<span style="font-family: "times new roman"; font-size: small;">
</span></div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
</div>
<div class="MsoNormal" style="background: white; line-height: normal; margin: 9pt 0in 0pt; mso-outline-level: 3;">
</div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;">
</span></span></div>
Unknownnoreply@blogger.com0Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-43489106924874958512018-08-10T08:42:00.000-07:002018-08-10T10:50:55.821-07:00forecastHybrid :- daddy of all time series algorithms<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: center;">
<u>Time Series Ensembling using 'forecastHybrid'</u></h2>
<br />
<br />
R is hard to beat when it comes to readily available packages. Whereas in Python you have to write your own code even for auto.arima, R already offers ensembling of advanced time series algorithms such as neural nets, seasonal ARIMA, state space models and seasonal decomposition models.<br />
<br />
In conceptual terms, most of the models are nothing but an extended version of regression. You can read <b><a href="https://machinelearningstories.blogspot.com/2016/08/time-series-and-fitting-regression-on.html" target="_blank">this</a> </b>article to know more about it. Any time series is made up of 3 components:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjevnM89Xl5PH6K-8Ej30KlvHFlwW6EyEkyNJUEzp8egpAf8Wwztp-uT6jU525pq12JxnIuGF60RuqzvtewSma6_C8oe6r28kICAz9ZQBSOjgx9mpJJrIDGBNsMSgsO4wDUo7MmSF3lmHc/s1600/Components_of_ts.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="36" data-original-width="212" height="33" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjevnM89Xl5PH6K-8Ej30KlvHFlwW6EyEkyNJUEzp8egpAf8Wwztp-uT6jU525pq12JxnIuGF60RuqzvtewSma6_C8oe6r28kICAz9ZQBSOjgx9mpJJrIDGBNsMSgsO4wDUo7MmSF3lmHc/s200/Components_of_ts.JPG" width="200" /></a><b> or</b></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjX38q8RFqlqF1bdd9KMZH2KY-mGKB0Zw2UaiwjeKy358dUTUO3LkoCwY6lbCscEednNz1hFEAIzWHm_a3i4ebi63uJN77wjz1cnepTa2ROvf22_WOz8OIxy9Q0uk-aDtdgCpzooJmt8H0/s1600/multiplicative.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="52" data-original-width="228" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjX38q8RFqlqF1bdd9KMZH2KY-mGKB0Zw2UaiwjeKy358dUTUO3LkoCwY6lbCscEednNz1hFEAIzWHm_a3i4ebi63uJN77wjz1cnepTa2ROvf22_WOz8OIxy9Q0uk-aDtdgCpzooJmt8H0/s1600/multiplicative.JPG" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
One equation is an <b>additive </b>and the other a <b>multiplicative </b>combination of 3 components: S<sub>t</sub> is the seasonal component, T<sub>t</sub> is the trend component and R<sub>t</sub> is the remainder component. Box-Jenkins (ARMA) methods, time series decomposition methods (STL, X11, SEATS, etc.) and exponential smoothing methods are all based on these equations.<br />
Advanced time series algorithms like <b><a href="https://machinelearningstories.blogspot.com/2017/12/gaussian-state-space-model-in-r-using.html" target="_blank">State space model with Kalman Filter</a>, <a href="https://machinelearningstories.blogspot.com/2017/05/facebooks-phophet-model-for-forecasting.html" target="_blank">Facebook's Prophet Model</a></b> and<b> <a href="https://machinelearningstories.blogspot.com/2017/02/hidden-markov-model-session-1.html" target="_blank">Hidden Markov Models</a> </b>are not based on purely linear (additive) or multiplicative relations between the time series components.<br />
<br />
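<div style="text-align: left;">
The two decomposition equations can be illustrated with a synthetic series in Python. Everything here (the linear trend, the sine-shaped seasonality with period 12, the noise levels and the variable names) is a made-up assumption for illustration. The last lines also check a well-known property that is not discussed above: taking logs turns the multiplicative form into an additive one.</div>

```python
import numpy as np

t = np.arange(24)                      # two "years" of monthly data (assumed)
rng = np.random.default_rng(0)

trend = 10 + 0.5 * t                   # T_t: linear trend
seasonal_add = 2 * np.sin(2 * np.pi * t / 12)        # S_t for the additive form
seasonal_mul = 1 + 0.2 * np.sin(2 * np.pi * t / 12)  # S_t for the multiplicative form
remainder_add = rng.normal(0, 0.1, t.size)           # R_t, centred on 0
remainder_mul = 1 + rng.normal(0, 0.01, t.size)      # R_t, centred on 1

# y_t = S_t + T_t + R_t
y_additive = seasonal_add + trend + remainder_add
# y_t = S_t * T_t * R_t
y_multiplicative = seasonal_mul * trend * remainder_mul

# log(y_t) = log(S_t) + log(T_t) + log(R_t): multiplicative becomes additive
log_sum = np.log(seasonal_mul) + np.log(trend) + np.log(remainder_mul)
print(np.allclose(np.log(y_multiplicative), log_sum))
```

<div style="text-align: left;">
In the multiplicative series the seasonal swings grow with the level of the trend, while in the additive series they stay the same size.</div>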
To start from the beginning, one should first try time series forecasting with <b><a href="https://machinelearningstories.blogspot.com/2016/08/time-series-and-fitting-regression-on.html" target="_blank">regression itself</a></b>. If you are looking for quick forecasting, I would recommend the forecastHybrid package in R. It can combine the time series methods below:<br />
<br />
<div style="text-align: left;">
<b>A) Auto.Arima-</b> Identifies the best orders for <a href="https://machinelearningstories.blogspot.com/2017/05/residual-plot-in-regression-acf-pacf-in.html" target="_blank">AR, MA and differencing</a>. It can be applied to a non-stationary series as long as differencing gives a stationary series. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<b>B) NNetar</b> - 1) feedforward neural network with only 1 hidden layer.</div>
<div style="text-align: left;">
2) seasonality can also be included</div>
<div style="text-align: left;">
3) data can also be transformed using the Box-Cox transformation (generally used to make the data closer to normally distributed)</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<b>C) STLM </b>- 1) time series is deseasonalised.</div>
<div style="text-align: left;">
2) Decompose a time series into seasonal, trend and irregular components using loess.</div>
<div style="text-align: left;">
3) Forecast after decomposition</div>
<div style="text-align: left;">
4) Re-seasonalizing using the last year of the seasonal component</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
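<b>D) Thetam </b>- Fits the theta method, which is equivalent to simple exponential smoothing with drift and can therefore be written as an exponential smoothing state space model.</div>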
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<b>E) ETS- </b>1) exponential smoothing state space model (ETS stands for Error, Trend, Seasonal).</div>
<div style="text-align: left;">
2) Older observations get exponentially decreasing weightage in the forecast. <b> </b></div>
<div style="text-align: left;">
<b><br /></b></div>
<div style="text-align: left;">
<b>F) TBATS</b>- Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<b>G</b>) <b>SNAIVE</b> - Returns forecasts and prediction intervals from an ARIMA(0,0,0)(0,1,0)m model, where m is the seasonal period (a seasonal ARIMA model)</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
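<div style="text-align: left;">
To make the SNAIVE definition above concrete, here is a small base R sketch. It assumes the built-in AirPassengers series as example data and a horizon of 24 months (both chosen only for illustration). Each forecast simply repeats the last observed value from the same season, and the last lines compare this with the equivalent ARIMA(0,0,0)(0,1,0)m fit from stats::arima.</div>

```r
y <- AirPassengers   # example data (an assumption for this sketch)
m <- frequency(y)    # seasonal period, 12 for monthly data
h <- 24              # forecast horizon

# Seasonal naive: repeat the last full season of observations
last_season <- tail(as.numeric(y), m)
snaive_fc <- ts(rep(last_season, length.out = h),
                start = tsp(y)[2] + 1 / m, frequency = m)

# The same forecasts from a seasonal ARIMA with only seasonal differencing
arima_fit <- arima(y, order = c(0, 0, 0),
                   seasonal = list(order = c(0, 1, 0), period = m))
arima_fc <- predict(arima_fit, n.ahead = h)$pred

# Largest gap between the two sets of point forecasts
print(max(abs(as.numeric(arima_fc) - as.numeric(snaive_fc))))
```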
<div style="text-align: left;">
We can also control how much weight is given to each time series method, or ensemble only a few of these methods instead of all of them if we are more confident about the data.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<i><u>to run the model -</u></i></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
ts_train contains a uni-variate time series </div>
<div style="text-align: left;">
<i><u><br /></u></i></div>
<span style="color: blue;">install.packages("forecastHybrid")</span><br />
<span style="color: blue;">library(forecastHybrid)</span><br />
<div style="text-align: left;">
<span style="color: blue;">hm_model <- hybridModel(ts_train, weights = "insample.errors", errorMethod = "MASE")</span></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<i><u>output</u></i></div>
<div style="text-align: left;">
<i><u><br /></u></i></div>
<span style="color: blue;">> hm_model</span><br />
Hybrid forecast model comprised of the following models: auto.arima, ets, thetam, nnetar, tbats<br />
<br />
auto.arima with weight 0.196<br />
ets with weight 0.192<br />
thetam with weight 0.192<br />
nnetar with weight 0.224
tbats with weight 0.196
<br />
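<div style="text-align: left;">
To show the idea behind error-based weights like the ones printed above, here is a hedged base R sketch that combines just two models. It assumes the weights are inversely proportional to the in-sample RMSE, which is a simplification; the exact scheme inside forecastHybrid may differ. The AirPassengers data and the two models (a seasonal ARIMA and Holt-Winters) are stand-ins chosen for illustration.</div>

```r
y <- AirPassengers   # example data (an assumption for this sketch)
h <- 12              # forecast horizon

# Two stand-in component models from base R
fit_arima <- arima(y, order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
fit_hw <- HoltWinters(y)

# In-sample error of each model
rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
err <- c(arima = rmse(residuals(fit_arima)),
         hw    = rmse(y - fit_hw$fitted[, "xhat"]))

# Assumed weighting rule: inverse error, normalised to sum to 1
w <- (1 / err) / sum(1 / err)

# Weighted combination of the point forecasts
fc_arima <- as.numeric(predict(fit_arima, n.ahead = h)$pred)
fc_hw    <- as.numeric(predict(fit_hw, n.ahead = h))
fc_combined <- w[["arima"]] * fc_arima + w[["hw"]] * fc_hw

print(round(w, 3))
```

<div style="text-align: left;">
The model with the smaller in-sample error gets the larger weight, and every combined forecast lies between the two individual forecasts.</div>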
<u><i>values can be forecasted using </i></u><br />
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
n ahead forecasts- </div>
<div style="text-align: left;">
<span style="color: blue;">forecasted_values <- forecast(hm_model, n)</span></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<u><i>to see the fitting of all methods involved-</i></u></div>
<div style="text-align: left;">
<u><i><br /></i></u></div>
<div style="text-align: left;">
<span style="color: blue;">plot(hm_model, type = "fit")</span></div>
<div style="text-align: left;">
<span style="color: blue;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix-IRcoSV4cCunPsC_mbJHWo9-uorLletHkyZqiloJ8-SRoqCamNgKW_a04d87vZQLLOCojtbxGrPEL5kTi2CMf6lefXSsbdXwJgfz4v_EHONueA4HmhllXdtH3kypn8-VUIH0otmk0Sg/s1600/fitted_plot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="745" height="529" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix-IRcoSV4cCunPsC_mbJHWo9-uorLletHkyZqiloJ8-SRoqCamNgKW_a04d87vZQLLOCojtbxGrPEL5kTi2CMf6lefXSsbdXwJgfz4v_EHONueA4HmhllXdtH3kypn8-VUIH0otmk0Sg/s640/fitted_plot.png" width="640" /></a></div>
<div style="text-align: left;">
The fitted lines overlap, so it is hard to tell them apart in the plot, but nnetar has the lowest in-sample error among all the methods, so it has received the largest weight (0.224).</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I would suggest people try hybridModel on a uni-variate time series. It often performs well because it is an ensemble: it combines the forecasts of several different models, which tends to reduce the error of any single model. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<a href="https://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html" target="_blank">Click here to know about all Anomaly Detection techniques</a></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<a href="https://machinelearningstories.blogspot.com/2017/09/hierarchical-clustering-bottom-up.html" target="_blank">Click here to know about Hierarchical Clustering</a></div>
<div style="text-align: left;">
<br /></div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-45410410881350583082018-08-07T09:04:00.000-07:002018-08-10T07:21:30.884-07:00Structural Equation Modeling and Its Implementation in R<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal" style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;">Structural Equation Modeling (SEM) is a quantitative research
technique that is used for the following-</span></div>
<h4 style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;">1)<span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal; text-indent: -0.25in;"> </span><b style="text-indent: -0.25in;">Causal
Modeling / Path Analysis </b><span style="text-indent: -0.25in;">- <span style="font-weight: normal;">I always get confused between correlation and
causation. This simple explanation from the abs.gov site always helps me-</span></span></span></h4>
<h4 style="text-align: left;">
<b style="font-family: Times, "Times New Roman", serif;"><span style="font-size: 12.0pt; line-height: 107%;">Correlation is
a statistical measure (expressed as a number) that describes the size and
direction of a relationship between two or more variables.</span></b><span style="font-family: "times" , "times new roman" , serif; font-size: 12pt; line-height: 107%;"> <span style="font-weight: normal;">A correlation between
variables, however, does not automatically mean that the change in one variable
is the cause of the change in the values of the other variable.
</span><b>Causation indicates that one event is the result of the occurrence of the
other event; i.e. there is a causal relationship between the two events. This
is also referred to as cause and effect.</b> </span></h4>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;">
Theoretically, the difference between the two types of relationships are easy
to identify — an action or occurrence can <i>cause</i> another (e.g.
smoking causes an increase in the risk of developing lung cancer), or it
can <i>correlate</i> with another (e.g. smoking is correlated with
alcoholism, but it does not cause alcoholism). In practice, however, it remains
difficult to clearly establish cause and effect, compared with establishing
correlation.</span><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 107%;"><span style="font-family: "times" , "times new roman" , serif;"><br /></span></span></div>
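<div class="MsoNormal">
The difference between correlation and causation described above can be seen in a small simulation. This is only a sketch on made-up data: a hidden variable z (hypothetical) drives both x and y, so x and y are strongly correlated even though x has no effect on y.</div>

```r
set.seed(1)
n <- 1000
z <- rnorm(n)            # hidden common cause (hypothetical)
x <- 2 * z + rnorm(n)    # z drives x
y <- 3 * z + rnorm(n)    # z drives y; x has no direct effect on y

# x and y are strongly correlated ...
cor_xy <- cor(x, y)

# ... and a naive regression suggests that x "explains" y
coef_naive <- coef(lm(y ~ x))[["x"]]

# Once the common cause is included, the effect of x shrinks towards zero
coef_adjusted <- coef(lm(y ~ x + z))[["x"]]

print(c(correlation = cor_xy, naive = coef_naive, adjusted = coef_adjusted))
```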
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 107%;"></span></div>
<div class="MsoNormal">
SEM estimates hypothesized causal relationships among latent variables and
observed variables. These relationships also give the path analysis (<b>path
analysis</b> is used to describe the directed dependencies among a set of
variables).<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h4 style="text-align: left;">
<b>2) </b><span style="font-family: "times" , "times new roman" , serif;"><b><span style="font-size: 12pt; line-height: 107%;">Confirmatory Factor Analysis</span></b><span style="font-size: 12pt; line-height: 107%;"><b>- </b><span style="font-weight: normal;">It is used to test whether measures of a factor are
consistent with a researcher's understanding of the nature of that construct
(or factor). As such, the objective of confirmatory factor analysis is to test
whether the data fit a hypothesized measurement model. Constructs are generally
a set of questions representing a factor.</span></span></span></h4>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 12pt; line-height: 107%;"><span style="font-weight: normal;"><br /></span></span></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 12pt; line-height: 107%;"><b>3)</b><span style="font-weight: normal;"> </span></span><span style="font-size: 12pt;"> </span><b style="font-size: 12pt;">Partial least squares path modeling</b><span style="font-size: 12pt;">-</span><span style="font-size: 12pt;"> </span><span style="font-size: 12pt;">allows estimating complex cause-effect
relationship models with latent variables</span></span></div>
<div>
<span style="font-size: 12pt;"><span style="font-family: "times" , "times new roman" , serif;"><br /></span></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 12pt;"><b>4) </b></span><b><span lang="EN" style="font-size: 12pt; line-height: 107%;">Latent growth modeling</span></b><span lang="EN" style="font-size: 12pt; line-height: 107%;"> – used </span><span style="font-size: 12pt; line-height: 107%;">to estimate a growth trajectory. It is a
longitudinal analysis technique to estimate growth over a period of time. It is
widely used in the fields of behavioral science, education and social science.</span></span></div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinGLMmr-YfWgNUpsiShhe0rOHsnsiOTvws4uuwCHy-hH_nx7XRJ841Pz0USEnYmM56YCwFlRXkbQ62gcDUeIWfxzLIBDSmN5ext2R8iar1bwGtX_WPaXrS08oxmUi4YvBHO6Zie92Yjew/s1600/symbols.GIF" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="" border="0" data-original-height="410" data-original-width="757" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinGLMmr-YfWgNUpsiShhe0rOHsnsiOTvws4uuwCHy-hH_nx7XRJ841Pz0USEnYmM56YCwFlRXkbQ62gcDUeIWfxzLIBDSmN5ext2R8iar1bwGtX_WPaXrS08oxmUi4YvBHO6Zie92Yjew/s400/symbols.GIF" title="symbols to understand structured relationship" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><b>Symbols to understand structured relationship</b></td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoListParagraph">
<span style="font-family: "times" , "times new roman" , serif;">R has the lavaan package, which is still in beta
release (08/2018). Let’s use this package to run an SEM model. In our example, we will use the built-in PoliticalDemocracy dataset. The measured variables are survey questionnaire items that are constructed to capture a factor (latent variable). SEM estimates the multiple relationships between the variables (latent and observed) included in the model. </span><o:p></o:p></div>
<div class="MsoListParagraph">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnq1BhayXic3ybcYWxFu68fXSitBd0KLwE_NcHtgjqN0iboJ3m4xNgMY72nO79S1fv7Ge0LfSPc-Q9guYVkosIvQciXEwoAzP550ldvgcgdoN367SZWEW676KGTWHyAW26VJ52Wt3O74k/s1600/path.GIF" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="font-family: "times" , "times new roman" , serif;"><img border="0" data-original-height="770" data-original-width="786" height="313" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnq1BhayXic3ybcYWxFu68fXSitBd0KLwE_NcHtgjqN0iboJ3m4xNgMY72nO79S1fv7Ge0LfSPc-Q9guYVkosIvQciXEwoAzP550ldvgcgdoN367SZWEW676KGTWHyAW26VJ52Wt3O74k/s320/path.GIF" width="320" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><b><span style="font-family: "times" , "times new roman" , serif;">Graphical representation of the model that we want to fit</span></b></td></tr>
</tbody></table>
<div class="MsoListParagraph">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div class="MsoListParagraph">
<span style="font-family: "times" , "times new roman" , serif;">The image above can be explained as follows -</span></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
</div>
<ul style="text-align: left;">
<li style="text-indent: 0px;"><span style="font-family: "times" , "times new roman" , serif;"><span style="background: white; line-height: 107%; text-indent: -0.25in;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">x1, x2, x3, y1,
y2, y3, y4, y5, y6, y7 and y8 are measured variables (MV).</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">ind60, dem60
and dem65 are latent variables (LV).</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">ind60 has
direct relationship with x1, x2, x3 MVs and dem60, dem65 LVs</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">dem60 has
direct relationship with y1, y2, y3, y4 MVs and dem65 LV</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">dem65 has
direct relationship with y5, y6, y7 and y8 MVs</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">y1 has
correlation with y5 which is not explained by their latent variables.</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">y2 has
correlation with y4 and y6 which is not explained by their latent variables.</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"> <span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">y3 has
correlation with y7 which is not explained by their latent variables.</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">y4 has
correlation with y8 which is not explained by their latent variables.</span></span></li>
<li><span style="font-family: "times" , "times new roman" , serif;"><span style="font-size: 7pt; font-stretch: normal; font-variant-east-asian: normal; font-variant-numeric: normal; line-height: normal;"> </span><span style="background: white; color: #333333; font-size: 12pt; line-height: 107%; text-indent: -0.25in;">y6 has correlation
with y8 which is not explained by their latent variables. </span></span></li>
</ul>
<!--[if !supportLists]--><br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<h4 style="clear: both; text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><i style="font-weight: bold; text-decoration-line: underline;">Complete code to run SEM in R- </i><span style="font-weight: normal;">( data -PoliticalDemocracy is </span><span style="font-weight: 400;">available with lavaan package itself)</span><span style="font-weight: normal;"> </span></span></h4>
<div>
<b><span style="font-family: "times" , "times new roman" , serif;"><br /></span></b></div>
<div>
<pre style="background-color: #fafafa; border-radius: 3px; border: 1px solid rgb(214, 214, 214); color: #0a0a0a; font-family: Consolas, "Liberation Mono", Courier, monospace; font-size: 14px; font-stretch: inherit; font-variant-east-asian: inherit; font-variant-numeric: inherit; line-height: inherit; margin-bottom: 20px; overflow: auto; padding: 6px 10px; vertical-align: baseline;"><code class="language-r" data-lang="r" style="background-color: transparent; border-radius: 3px; border: medium none; font-family: Consolas, "Liberation Mono", Courier, monospace; font-size: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; vertical-align: baseline;">library(lavaan) # only needed once per session
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
summary(fit, standardized=TRUE)</code></pre>
</div>
<div class="separator" style="clear: both; text-align: left;">
After running the above code we get the fitted estimates for all the relations we specified in the structural model. The result can be read much like a linear regression output, where a p-value less than 0.05 is conventionally taken as evidence of a statistically significant relationship between the variables.</div>
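<div class="separator" style="clear: both; text-align: left;">
To pull the regression paths and their p-values out of the fitted model as a table, one option is lavaan's parameterEstimates(). The sketch below refits the same model so that it runs on its own; the 0.05 cut-off is the usual convention, not something lavaan imposes.</div>

```r
library(lavaan)

# Same model specification as in the post
model <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'
fit <- sem(model, data = PoliticalDemocracy)

# One row per estimated parameter; op == "~" selects the regression paths
est <- parameterEstimates(fit)
regressions <- est[est$op == "~", c("lhs", "op", "rhs", "est", "se", "pvalue")]
print(regressions)

# Paths that are significant at the conventional 0.05 level
significant <- regressions[regressions$pvalue < 0.05, ]

# A few overall fit measures
print(fitMeasures(fit, c("chisq", "df", "cfi", "rmsea")))
```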
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
To know about graphical explanation of regression assumption- <a href="https://machinelearningstories.blogspot.com/2017/09/assumptions-of-linear-regression.html" target="_blank">Graphical analysis of regression assumption</a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
To know about the relation of time series with regression- <a href="https://machinelearningstories.blogspot.com/2016/08/time-series-and-fitting-regression-on.html" target="_blank">Regression and Time Series</a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<span style="font-family: "times" , "times new roman" , serif;"></span></div>
Unknownnoreply@blogger.com0Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-52568248000809827682018-07-12T06:58:00.002-07:002018-12-14T01:13:48.971-08:00All approaches of Anomaly Detection & Anomaly detection by distance and density based clustering algorithms. <div dir="ltr" style="text-align: left;" trbidi="on">
<h2>
<b> <u> Anomaly Detection -</u></b></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Anomalies are present in many industrial and non-industrial applications. Intrusion detection, fraud prevention, identifying issues in a running industrial device and identifying some illnesses all require some kind of anomaly detection. From a machine learning perspective there are 3 types of anomaly detection techniques-<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h3>
<u>Types of Anomaly Detection-</u></h3>
<b>1. Unsupervised Anomaly detection</b> – Clustering algorithms like K-means are used for unsupervised anomaly detection. Here all the features are passed to the clustering algorithm and the points that lie far from every cluster (outliers) are treated as abnormal data points.<br />
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>2. Semi Supervised Anomaly detection technique </b>- In this approach, a model of normal behaviour, i.e. the relation among the features, is built and treated as the ideal model. For example, an electric motor might have thermal abnormalities, so a regression or a more complex relationship is established between current and temperature. Let’s say a neural network is used to fit this relation and temperature is predicted from current. At run time the actual temperature should match the predicted temperature, i.e. the relationship learned by the NN should continue to hold. The higher the error, the higher the chance of an abnormality in the system.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>3. Supervised Anomaly detection techniques- </b>These are used when the abnormalities are already known in the training period, so the problem can be solved using classification techniques like decision trees. For example, a bent pipe in a water pump system is a known abnormality: the characteristics of the system (water pressure, temperature, electricity used, etc.) with a bent pipe are different from those with a normally running pipe. If one classifies this data, one gets rules that can be used to identify the abnormal condition in future.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
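<div class="MsoNormal">
The semi-supervised approach (type 2 above) can be sketched in a few lines of R. This is a simplified illustration on simulated data: a linear model stands in for the neural network, the current and temperature values are made up, and the threshold of three residual standard deviations is an assumption rather than a rule.</div>

```r
set.seed(7)

# Hypothetical training data recorded during normal operation
train <- data.frame(current = runif(200, 5, 20))
train$temp <- 30 + 2.5 * train$current + rnorm(200, sd = 1)

# "Normal" model: temperature as a function of current
# (lm is a stand-in for the neural network mentioned in the text)
normal_model <- lm(temp ~ current, data = train)

# Assumed rule: flag readings more than 3 residual SDs away from the prediction
threshold <- 3 * sd(residuals(normal_model))

# New run-time readings; the third one is far hotter than its current implies
new_data <- data.frame(current = c(8, 12, 15, 18),
                       temp    = c(50.3, 59.6, 80.0, 75.4))

predicted  <- predict(normal_model, newdata = new_data)
error      <- abs(new_data$temp - predicted)
is_anomaly <- unname(error > threshold)

print(data.frame(new_data, predicted = round(predicted, 1), is_anomaly))
```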
<div class="MsoNormal">
Unsupervised techniques require running the algorithm every time we want to identify abnormalities, while the other two require building the model just once. A combination of unsupervised and supervised techniques is also used sometimes: the output of the unsupervised detection is converted into labelled classification data, and rules are then obtained by running a classifier on the same data. This avoids the problem of building the model every time.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<h3>
<o:p><u>Anomaly Detection by Distance and Density Based Algorithm </u></o:p></h3>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
<b>Anomaly Detection using K-Means Clustering</b> – This is a type of distance based unsupervised anomaly detection technique-<o:p></o:p></div>
<div class="MsoNormal">
## subsetting IRIS data-<o:p></o:p></div>
<div class="MsoNormal">
<span style="color: blue;">iris2 <- iris[,1:4]</span><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
# running K means clustering<o:p></o:p></div>
<div class="MsoNormal">
<span style="color: blue;">kmeans.result <- kmeans(iris2, centers=3)</span></div>
<div class="MsoNormal">
<span style="color: blue;">plot(iris2[,c("Sepal.Length", "Sepal.Width")], pch=19, col=kmeans.result$cluster, cex=1)<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;">centers <- kmeans.result$centers[kmeans.result$cluster, ] </span># "centers" is a matrix with one row per row of iris2, holding the center of that row's cluster<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
# distance from the respective center<o:p></o:p></div>
<div class="MsoNormal">
<span style="color: blue;">distances <- sqrt(rowSums((iris2 - centers)^2))<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;">outliers <- order(distances, decreasing=T)[1:5]<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
## plotting outliers+ centers and all data points<o:p></o:p></div>
<div class="MsoNormal">
<span style="color: blue;">print(outliers) # these rows are 5 top outliers<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;">points(kmeans.result$centers[,c("Sepal.Length", "Sepal.Width")], col=1:3, pch=15, cex=2)<o:p></o:p></span></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
<span style="color: blue;">points(iris2[outliers, c("Sepal.Length", "Sepal.Width")], pch="+", col=4, cex=3)</span><o:p></o:p></div>
<div class="MsoNormal">
<span style="color: blue;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOudL196LWGQR0sTRW3ibIYLnZj-Z5Xv06Mo_RIjaDUhM4WKeBLffTqK87u6a6sTc4K_OdRoXAi7Uv_Z-wU77VkO0LOxHEfPR1pNgA6wmDQOG_hZz9-NqEFLeibxTLq0ZwzAyydBjNh_s/s1600/Rplot-+Kmeans.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="745" height="331" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOudL196LWGQR0sTRW3ibIYLnZj-Z5Xv06Mo_RIjaDUhM4WKeBLffTqK87u6a6sTc4K_OdRoXAi7Uv_Z-wU77VkO0LOxHEfPR1pNgA6wmDQOG_hZz9-NqEFLeibxTLq0ZwzAyydBjNh_s/s400/Rplot-+Kmeans.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<b>Anomaly Detection Using Local Outlier Factor (LOF)</b>- Local outlier factor is more useful when there are multiple operating conditions for the system. Due to the local approach, LOF is able to identify outliers in a data set that would not be outliers in another area of the data set. For example, a point at a "small" distance to a very dense cluster is an outlier, while a point within a sparse cluster might exhibit similar distances to its neighbors. It is nicely explained here- <a href="https://en.wikipedia.org/wiki/Local_outlier_factor">LOF</a><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
#Sub-setting the data</div>
<div class="MsoNormal">
<span style="color: blue;">iris2 <- iris[,1:4]<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: blue;">k <- 5 </span># number of neighbours<span style="color: blue;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;">library(DMwR)<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;"><br /></span></div>
<div class="MsoNormal">
# running LOF Code</div>
<div class="MsoNormal">
<span style="color: blue;">outlier.scores <- lofactor(iris2, k)</span><o:p></o:p></div>
<div class="MsoNormal">
<o:p> </o:p> </div>
<div class="MsoNormal">
# taking data points with high LOF score only </div>
<div class="MsoNormal">
<span style="color: blue;">iris2$LOF_Score <- outlier.scores<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;">iris3 <- iris2[order(iris2$LOF_Score, decreasing = T),]<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
# subsetting and plotting the data</div>
<div class="MsoNormal">
<span style="color: blue;">iris4 <- iris3[, c("Sepal.Length", "Sepal.Width")]<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="color: blue;">plot(iris4)</span></div>
<div class="MsoNormal">
<span style="color: blue;">lof_outlier <- iris4[c(1:5),]<o:p></o:p></span></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
<span style="color: blue;">points(lof_outlier,pch="+", col=4, cex=3)</span><o:p></o:p></div>
<div class="MsoNormal">
<span style="color: blue;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxH2ElzxchpWV0Ot8PzU9OFeMiHU9vTRTvyL8U4rNgwR-K6759TF2uNaxibUgVhZcTnNDZT90QjB1GfiJJJJMn4QiBLqmT6qcRq54pPnZe8fvmwQQahJijqJiNFhY8PMLKuak4bNIrvpw/s1600/LOF.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="745" height="331" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxH2ElzxchpWV0Ot8PzU9OFeMiHU9vTRTvyL8U4rNgwR-K6759TF2uNaxibUgVhZcTnNDZT90QjB1GfiJJJJMn4QiBLqmT6qcRq54pPnZe8fvmwQQahJijqJiNFhY8PMLKuak4bNIrvpw/s400/LOF.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<span style="color: blue;"><br /></span></div>
Here we can see that LOF has identified local outliers while K-means picks up global outliers. K-means might pick its outliers from the less dense cluster, while LOF tries to treat clusters of different density equally by dividing the average local reachability density of a point's neighbours by the local reachability density of the point itself.<br />
<br />
Other complex outlier detection techniques-<br />
<br />
<a href="https://machinelearningstories.blogspot.com/2018/08/anomaly-detection-in-high-dimensional.html" target="_blank">Angle based outlier detection method</a><br />
<br />
<a href="https://machinelearningstories.blogspot.com/2018/09/connectivity-based-outlier-detection.html" target="_blank">Connectivity based outlier detection method</a><br />
<br />
<br />
Read about <a href="http://machinelearningstories.blogspot.com/2017/09/hierarchical-clustering-bottom-up.html" target="_blank">Hierarchical Clustering (Bottom-Up Clustering) & Performance Parameters</a> if you are looking for a lucid explanation of the difference between the two types of clustering.<br />
<br />
Basic statistics topics: a graphical explanation of linear regression. <a href="http://machinelearningstories.blogspot.com/2017/09/assumptions-of-linear-regression.html" target="_blank">Assumptions of LR- (Graphical Explanation) </a><br />
<div>
<br /></div>
</div>
Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-8230692877620938204.post-24951881895077856432018-06-15T05:39:00.002-07:002018-06-15T05:47:01.961-07:00LDA ( Linear discriminant Analysis )- Where is it in data science !<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
LDA is used to classify objects based on a set of
features and for dimensionality reduction. The groups are known in advance, so it is a
supervised technique, unlike PCA, which is unsupervised.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
It also differs from
clustering: in clustering, the rule used to group the data is fixed by the
type of algorithm. For example, K-means takes distance into
consideration, while LOF takes density into consideration. LDA instead learns its
classification rules from the labeled data, much like logistic regression. The goal is to
project a dataset onto a lower-dimensional space with good class separability
in order to avoid overfitting (the “curse of dimensionality”) and to reduce
computational cost.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>PreCondition-</b><o:p></o:p></div>
<div class="MsoNormal">
<b><br /></b></div>
<div class="MsoNormal">
Groups should be linearly separable.<o:p></o:p></div>
<div class="MsoNormal">
Standardization of data is optional.<o:p></o:p></div>
<div class="MsoNormal">
Groups should be normally distributed.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Both PCA and LDA are linear transformation techniques. PCA finds the component axes that
maximize the variance of the data, whereas LDA is additionally interested in the
axes that maximize the separation between multiple classes.<o:p></o:p></div>
<div class="MsoNormal">
So, in a nutshell, the goal of LDA is often to project a
feature space (a dataset of n-dimensional samples) onto a smaller
subspace of dimension k (where k ≤ n−1) while maintaining the
class-discriminatory information.<o:p></o:p></div>
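<div class="MsoNormal">
For two classes, the projection can be sketched with Fisher's discriminant direction, w = Sw<sup>-1</sup>(m2 − m1), where Sw is the pooled within-class scatter matrix and m1, m2 are the class means. The Python below is a minimal illustration for 2-dimensional data only; the function name fisher_direction and the toy samples are assumptions made for this example.</div>

```python
def fisher_direction(class_a, class_b):
    """Return the 2-D Fisher discriminant direction w = Sw^-1 (mean_b - mean_a)."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        # within-class scatter: sum of outer products of centred samples
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))   # pooled scatter Sw
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # multiply the mean difference by the inverse of the 2x2 matrix Sw
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)

# Toy labeled samples (hypothetical data)
class_a = [(1, 2), (2, 3), (3, 3), (2, 1)]
class_b = [(6, 5), (7, 8), (8, 7), (7, 6)]
w = fisher_direction(class_a, class_b)

# Project every 2-D sample onto the single discriminant axis
proj_a = [w[0] * x + w[1] * y for x, y in class_a]
proj_b = [w[0] * x + w[1] * y for x, y in class_b]
```

<div class="MsoNormal">
Here the two-dimensional samples are reduced to one dimension, and the projected values of the two classes do not overlap.</div>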
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Comparison of LDA and PCA-</b></div>
<div class="MsoNormal">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS0bcHpXKwEl53HvDfo0siQH_zaNpZXAzGfDPbTW5-Xll-S5LJbySfjbRX7qlIR9xnjhjAzk5TqwhtfguS_61T0kMI2UcWu8cofsNfxkibFxAkIMF18mW4wWl-ztns_jOHEZYM1mtvBw0/s1600/output_24_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="264" data-original-width="383" height="220" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS0bcHpXKwEl53HvDfo0siQH_zaNpZXAzGfDPbTW5-Xll-S5LJbySfjbRX7qlIR9xnjhjAzk5TqwhtfguS_61T0kMI2UcWu8cofsNfxkibFxAkIMF18mW4wWl-ztns_jOHEZYM1mtvBw0/s320/output_24_1.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPz86jBysWPv5SKv7H86nyZXQRdDrUZ6i1i2af1FCGu4BcHZEiinmAsoWl8jszDkiIvmianrmMFQ9nUc2dGM4snKDhwRImpWeB4evs35qcVixTFJ_e15AEJxF7zZGhYh1PkqBLYyYeDzM/s1600/output_24_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="264" data-original-width="374" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPz86jBysWPv5SKv7H86nyZXQRdDrUZ6i1i2af1FCGu4BcHZEiinmAsoWl8jszDkiIvmianrmMFQ9nUc2dGM4snKDhwRImpWeB4evs35qcVixTFJ_e15AEJxF7zZGhYh1PkqBLYyYeDzM/s320/output_24_2.png" width="320" /></a></div>
<div class="MsoNormal">
<b><br /></b></div>
<div class="MsoNormal">
Both appear to separate the data well. The within-class spread should be smaller with LDA, since it is designed to separate the classes well. </div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio3r6RA2xRYioTv_KS8GH6yzCTkjTRdCSKFEofwA4yoWJtnQmuDe_1ts4RpknQj5Rx-LeDJzUW87oBojAGvupgs75ZHoHMyvPbS7aNzLq2de3YKi0Ue9Nbye10VnrkVSSPrIFThm9R16w/s1600/Capture.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="197" data-original-width="943" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio3r6RA2xRYioTv_KS8GH6yzCTkjTRdCSKFEofwA4yoWJtnQmuDe_1ts4RpknQj5Rx-LeDJzUW87oBojAGvupgs75ZHoHMyvPbS7aNzLq2de3YKi0Ue9Nbye10VnrkVSSPrIFThm9R16w/s640/Capture.PNG" width="640" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
On the Iris dataset, for two classes, LDA performs better than PCA in terms of dimensionality reduction. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p>A detailed treatment is available at: </o:p></div>
<div class="MsoNormal">
<a href="https://sebastianraschka.com/Articles/2014_python_lda.html#21-within-class-scatter-matrix-s_w">LDA by Sebastian Raschka</a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<br /></div>
Unknownnoreply@blogger.com0India20.593684 78.962880000000041-8.6041045000000018 37.654286000000042 49.7914725 120.27147400000004tag:blogger.com,1999:blog-8230692877620938204.post-63890942599736886402017-12-28T05:40:00.001-08:002018-08-03T07:07:47.276-07:00Gaussian State Space Model in R using Kalman Filtering and Smoothing<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
<span class="tex2jaxignore"><b><span style="border: 1pt none; font-family: "georgia" , serif; padding: 0in;">State space model</span></b></span><span style="background: white; font-family: "georgia" , serif;"> (SSM) refers to a class of probabilistic graphical
models (Koller and Friedman, 2009) that describe the probabilistic dependence
between the latent state variable and the observed measurement. The state or
the measurement can be either continuous or discrete. The term “</span><span style="background: white; border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">state space</span><span style="background: white; font-family: "georgia" , serif;">” originated
in the 1960s in the area of control engineering (Kalman, 1960). SSM provides a
general framework for analyzing deterministic and </span><span style="background: white; border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">stochastic dynamical systems</span><span style="background: white; font-family: "georgia" , serif;"> that are
measured or observed through a </span><span style="background: white; border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">stochastic process. So what are a state space and a
dynamical system?</span><o:p></o:p></div>
<div class="MsoNormal">
<span style="background: white; border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><b><span style="border: 1pt none; font-family: "georgia" , serif; padding: 0in;">State space</span></b></span><span style="background: white; font-family: "georgia" , serif;"> is the set of all possible states of a </span><span style="background: white; border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">dynamical system</span><span style="background: white; font-family: "georgia" , serif;">; each state
of the system corresponds to a unique point in the state space. For example,
the state of an idealized pendulum is uniquely defined by its angle and angular
velocity, so the state space is the set of all possible pairs (angle, velocity).</span>
<span style="background: white; font-family: "georgia" , serif;">A
dynamical system is a rule for time evolution on a state space.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;">The most well studied SSM is the Kalman filter, which defines
an optimal </span><span style="background: white; border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">algorithm</span><span style="background: white; font-family: "georgia" , serif;"> for inferring linear Gaussian systems</span>.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;">As in hidden Markov models, an SSM is described by two equations:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><br /></span></div>
<div class="MsoNormal">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0R09JzNqRRd58WuhmZJ2POKhyBIdReDpuUE-HUlYAmJ1gfKWhqrGPzKcjSBBWiP8cVamEBWewPlHMykwW-OhQQqNgMEvRb6_BNKMxn8CNtnQPBfl4J_mzm9z_WRH1lZ-hMVW0Vn5u0vQ/s1600/state%2526transition.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="140" data-original-width="747" height="116" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0R09JzNqRRd58WuhmZJ2POKhyBIdReDpuUE-HUlYAmJ1gfKWhqrGPzKcjSBBWiP8cVamEBWewPlHMykwW-OhQQqNgMEvRb6_BNKMxn8CNtnQPBfl4J_mzm9z_WRH1lZ-hMVW0Vn5u0vQ/s640/state%2526transition.PNG" width="640" /></a></div>
<ol style="text-align: left;">
<li>Observation Equation</li>
<li>State Equation.</li>
</ol>
<div>
where yt is the observation at time t; Zt, Tt and Rt are system matrices; and alpha is the latent state variable. The observation noise et is normally distributed with mean 0 and covariance Ht, and the state noise nt is also normally distributed, with mean 0 and covariance Qt.</div>
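<div>
Since the equations above are shown only as an image, here they are written out in the notation of the KFAS documentation. This is a reconstruction under that assumption: H<sub>t</sub> and Q<sub>t</sub> are the covariance matrices of the two noise terms, and a<sub>1</sub>, P<sub>1</sub> describe the initial state.</div>

```latex
\begin{aligned}
y_t &= Z_t \alpha_t + \epsilon_t, & \epsilon_t &\sim N(0, H_t) && \text{(observation equation)} \\
\alpha_{t+1} &= T_t \alpha_t + R_t \eta_t, & \eta_t &\sim N(0, Q_t) && \text{(state equation)} \\
\alpha_1 &\sim N(a_1, P_1) &&&& \text{(initial state)}
\end{aligned}
```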
<div>
<br /></div>
<div>
<br /></div>
<div>
<b>Example of Gaussian State space model ( using KFAS package in R)-</b></div>
<br />
<div class="MsoNormal">
<br /></div>
<div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><u>Providing Initial values of parameters-</u><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
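<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">library(KFAS)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">data("alcohol")<o:p></o:p></span></i></span></div>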
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">View(head(alcohol))<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">deaths <- window(alcohol[,2], end =2007)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">population <- window(alcohol[,6], end = 2007)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">## defining all system matrices<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">Zt <- matrix(c(1,0),1,2)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">Ht<- matrix(NA)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">Tt <- matrix(c(1,0,1,1),2,2)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">Rt <- matrix(c(1,0), 2,1)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">Qt <- matrix(NA)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">a1 <- matrix(c(1,0),2,1)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">P1 <- matrix(0,2,2)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><i><span style="color: blue;">P1inf <- diag(2)<o:p></o:p></span></i></span></div>
<div class="MsoNormal">
<br /></div>
</div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzfE4mpNiIh34qVvu2fU_jKfHXW0lfr5bZ5CxDTOSzQJ4FBQtbqHixOydwoUIbhMasxrog_ODaA0Dk4RAnftMeif8GOM3Ut1ViS4-5GPrsIyW4o3o9no4g-a80VGGYPLheegXb4aEkz94/s1600/input_data.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="225" data-original-width="604" height="238" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzfE4mpNiIh34qVvu2fU_jKfHXW0lfr5bZ5CxDTOSzQJ4FBQtbqHixOydwoUIbhMasxrog_ODaA0Dk4RAnftMeif8GOM3Ut1ViS4-5GPrsIyW4o3o9no4g-a80VGGYPLheegXb4aEkz94/s640/input_data.PNG" width="640" /></a></div>
<div class="MsoNormal">
<span style="background: white; font-family: "georgia" , serif;"><br /></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><u>Create a State Space Model Object of Class SSModel-</u><o:p></o:p></span></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><span style="color: blue;"><i>model_gaussian <- SSModel(deaths / population ~
-1 + SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1 = P1,P1inf =
P1inf),H = Ht)</i><o:p></o:p></span></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">Here y, the univariate time series, is deaths / population. The function SSModel, together with its helper functions (SSMarima, SSMcustom, SSMcycle, SSMregression,
SSMseasonal, SSMtrend), creates a state
space object of class SSModel, which
can be used as input to other functions of the KFAS package, such as fitSSM.<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><u>Fitting the model-</u><o:p></o:p></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><u><br /></u></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">fit_gaussian <- fitSSM(model_gaussian, inits =
c(0, 0), method = "BFGS")</span></i><o:p></o:p></span></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">
</span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">The function fitSSM finds the maximum
likelihood estimates of the unknown parameters of an arbitrary state space model,
given a user-defined model updating function. One of its arguments,
‘updatefn’, is used to update the model with new parameter values. fitSSM is a wrapper around the optim
function, a general-purpose optimizer based on Nelder–Mead,
quasi-Newton and conjugate-gradient algorithms.<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">out_gaussian <- KFS(fit_gaussian$model)</span></i><o:p></o:p></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">
</span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">KFS performs state filtering and state
smoothing, and it also computes one-step-ahead predictions. The smoothed fitted values can be
taken from<span style="color: blue;"> ts_data_forecasted <- out_gaussian$muhat</span>. The plot of fitted and actual values can be drawn with the code below:<o:p></o:p></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><span style="color: blue;"><br /></span></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">ts_data_forecasted <- ts_data_forecasted[1:(length(ts_data_forecasted) - 1)]<o:p></o:p></span></i></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">x<- 1:length(ts_data_forecasted)</span></i></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">alcohol_plots <-
data.frame(X=x,actuals=ts_data,forecasted=ts_data_forecasted)<o:p></o:p></span></i></span></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
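<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">library(ggplot2)<o:p></o:p></span></i></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">ggplot(alcohol_plots, aes(x=X)) + <o:p></o:p></span></i></span></span></div>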
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">
</span></i></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><i><span style="color: blue;">
geom_line(aes(y = forecasted, colour = "red")) +
geom_line(aes(y = actuals, colour = "blue"))<o:p></o:p></span></i></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicw1p0XlLUIS-Qmrf0KY07dR33jpaIdSiW4txFfuJC0nRFPOym4QXvLZ8YBY5vl-2U7Ua39Da0l1pPCk6DsIJTZuMQxrrwWL3Aunpi3XyEWK3otGxKxr8bPfb7AvwIEJUzcfEV4064HoU/s1600/Rplot.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="611" data-original-width="725" height="538" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicw1p0XlLUIS-Qmrf0KY07dR33jpaIdSiW4txFfuJC0nRFPOym4QXvLZ8YBY5vl-2U7Ua39Da0l1pPCk6DsIJTZuMQxrrwWL3Aunpi3XyEWK3otGxKxr8bPfb7AvwIEJUzcfEV4064HoU/s640/Rplot.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">red-forecasted & black-actual values</td></tr>
</tbody></table>
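<div class="MsoNormal">
To make the filtering step less of a black box, the recursion that a Kalman filter runs for the two-state model above (Z = [1, 0], T = [[1, 1], [0, 1]], R = [1, 0]') can be sketched in plain Python. This is an illustration, not the KFAS implementation: the function name, the fixed variances H and Q, the synthetic series, and the large initial variance used in place of the exact diffuse initialization (P1inf) are all assumptions.</div>

```python
def kalman_filter_local_trend(ys, H, Q, a0=(0.0, 0.0), p0=1e7):
    """Kalman filter for a level/slope state; returns the filtered levels."""
    level, slope = a0
    # P is the symmetric 2x2 state covariance [[p11, p12], [p12, p22]]
    p11, p12, p22 = p0, 0.0, p0
    filtered = []
    for y in ys:
        # Prediction: a = T a and P = T P T' + R Q R'
        level, slope = level + slope, slope
        p11, p12, p22 = p11 + 2 * p12 + p22 + Q, p12 + p22, p22
        # Update with the observation, using Z = [1, 0]
        v = y - level                  # innovation
        F = p11 + H                    # innovation variance
        k1, k2 = p11 / F, p12 / F      # Kalman gain
        level, slope = level + k1 * v, slope + k2 * v
        p11, p12, p22 = p11 - k1 * p11, p12 - k1 * p12, p22 - k2 * p12
        filtered.append(level)
    return filtered

# Synthetic noiseless trend y = 2t + 1 (hypothetical data, not the alcohol series)
ys = [2 * t + 1 for t in range(20)]
filtered = kalman_filter_local_trend(ys, H=1.0, Q=0.01)
```

<div class="MsoNormal">
In KFAS the variances H and Q are not fixed by hand as they are here; fitSSM estimates them by maximum likelihood before KFS is run.</div>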
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;">More about state based Models ( Hidden Markov) -</span></span><br />
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span><a href="https://machinelearningstories.blogspot.com/2017/02/hidden-markov-model-session-1.html" target="_blank">HMM- 1 ( Hidden Markov Model 1)</a><br />
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span><a href="https://machinelearningstories.blogspot.com/2017/03/hidden-markov-model-session-2.html" target="_blank">HMM- 2 ( Hidden Markov Models 2)</a><br />
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span>
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span>
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span>
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
<div class="MsoNormal">
<span class="tex2jaxignore"><span style="border: none 1.0pt; font-family: "georgia" , serif; padding: 0in;"><br /></span></span></div>
</div>
Unknownnoreply@blogger.com0Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-40524382038128258192017-09-18T05:10:00.003-07:002017-09-18T05:13:33.870-07:00Assumptions of Linear Regression, Multicollinearity & Outliers Detection<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Assumptions of Linear Regression-</b><o:p></o:p></span></div>
<div class="MsoNormal">
<b><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></b></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<!--[if !supportLists]--><span style="font-family: "arial" , "helvetica" , sans-serif;">1.<span style="font-size: 7pt; font-stretch: normal; font-variant-numeric: normal; line-height: normal;">
</span><!--[endif]-->The mean of the response at each value of the predictor x
is a <span style="color: red;">L</span>inear function of x.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<!--[if !supportLists]--><span style="font-family: "arial" , "helvetica" , sans-serif;">2.<span style="font-size: 7pt; font-stretch: normal; font-variant-numeric: normal; line-height: normal;">
</span><!--[endif]-->Error terms should be <span style="color: red;">I</span>ndependent
of each other.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpMiddle" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<!--[if !supportLists]--><span style="font-family: "arial" , "helvetica" , sans-serif;">3.<span style="font-size: 7pt; font-stretch: normal; font-variant-numeric: normal; line-height: normal;">
</span><!--[endif]-->Error terms should be <span style="color: red;">N</span>ormally
distributed.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<!--[if !supportLists]--><span style="font-family: "arial" , "helvetica" , sans-serif;">4.<span style="font-size: 7pt; font-stretch: normal; font-variant-numeric: normal; line-height: normal;">
</span><!--[endif]-->Error terms should have <span style="color: red;">E</span>qual
variance.<o:p></o:p></span></div>
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">These can be termed as <span style="color: red;">LINE
assumptions. <o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="color: red;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span></div>
<br />
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">A <b>residuals vs. fitted</b>
values plot can be used to check two assumptions: that the relationship between Y and x
is linear, and that the error terms have equal variance. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWRr0quVQEHMr-upA8-GvTlK3hsdviWr35aOmykVIoWVNldNVspm5BvaSZkNs3HpqtXbp9UfbbUcToHIRAwFeg1CERa4ZeIHtVX27HX4IRyP9DQK4jNTVmPwilzZLhdJEjebdQwDgxcsY/s1600/residualPplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><img border="0" data-original-height="333" data-original-width="505" height="263" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWRr0quVQEHMr-upA8-GvTlK3hsdviWr35aOmykVIoWVNldNVspm5BvaSZkNs3HpqtXbp9UfbbUcToHIRAwFeg1CERa4ZeIHtVX27HX4IRyP9DQK4jNTVmPwilzZLhdJEjebdQwDgxcsY/s400/residualPplot.png" width="400" /></span></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<ul style="margin-top: 0cm;" type="square">
<li class="MsoNormal"><span style="font-family: "arial" , "helvetica" , sans-serif;">The
residuals "bounce randomly" around the 0 line. This suggests
that the assumption that the relationship is linear is reasonable.<o:p></o:p></span></li>
<li class="MsoNormal"><span style="font-family: "arial" , "helvetica" , sans-serif;">The
residuals roughly form a "horizontal band" around the 0 line.
This suggests that the variances of the error terms are equal.<o:p></o:p></span></li>
<li class="MsoNormal"><span style="font-family: "arial" , "helvetica" , sans-serif;">No
one residual "stands out" from the basic random pattern of
residuals. This suggests that there are no outliers.</span></li>
</ul>
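<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">The points on such a plot come from a fitted model. As a minimal sketch, assuming a single predictor and made-up data, the Python below fits a line by ordinary least squares and builds the (fitted, residual) pairs that the plot displays; the function name fit_line is hypothetical.</span></div>

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Toy data, roughly y = 2x (hypothetical values)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# These pairs are the points of a residuals vs. fitted values plot
pairs = list(zip(fitted, residuals))
```

<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">By construction, least squares residuals sum to zero and are uncorrelated with the fitted values, so any visible curve or change in spread in the plot points to a violated assumption rather than to the fitting procedure.</span></div>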
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Residuals v/s fitted values graph when relationship is not
linear or variance is not constant for error terms.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiMwYP5BJshxeETyRDrsCetDVRMmPiGQirO8esZ9gpWjqTf4u1Rw49svlIZEDC1sYKskN9SuXqYRr0dyTaozGsxcc7NqCD0bVccekgK7rhui9gG45zoDk8UE06DbXavYQNaNEDOeWJ7qY/s1600/nonlinearRelation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><img border="0" data-original-height="337" data-original-width="503" height="267" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiMwYP5BJshxeETyRDrsCetDVRMmPiGQirO8esZ9gpWjqTf4u1Rw49svlIZEDC1sYKskN9SuXqYRr0dyTaozGsxcc7NqCD0bVccekgK7rhui9gG45zoDk8UE06DbXavYQNaNEDOeWJ7qY/s400/nonlinearRelation.png" width="400" /></span></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Relationship is not linear: the residuals depart from the 0 line
in a systematic pattern instead of scattering randomly around it, showing a deviation from a linear relationship. With a perfectly linear
relation, all residuals would be 0.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
</div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Error terms don’t have constant variance: as the fitted values increase, the spread of the error
terms increases.<o:p></o:p></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbhNRmTYJ4JERUjvjJnT5VgQzUoz65t60_Czf2VKOqy6-5-WaFiTMgve1NWGTaVM2LJIeelkf5bv2ManX4H1liqZ30lf_4ht9kZ6tlFFUSnDuyPIgH5sUuURgSDkxWw8FDepnRmni24ag/s1600/non_constant_variation.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><img border="0" data-original-height="333" data-original-width="502" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbhNRmTYJ4JERUjvjJnT5VgQzUoz65t60_Czf2VKOqy6-5-WaFiTMgve1NWGTaVM2LJIeelkf5bv2ManX4H1liqZ30lf_4ht9kZ6tlFFUSnDuyPIgH5sUuURgSDkxWw8FDepnRmni24ag/s320/non_constant_variation.png" width="320" /></span></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Heteroscedastic residuals</span></td></tr>
</tbody></table>
<ul style="margin-top: 0cm;" type="square">
<li class="MsoNormal"><span style="font-family: "arial" , "helvetica" , sans-serif;">The
plot has a "<b>fanning</b>" effect. That is, the residuals are
close to 0 for small <i>x</i> values and are more spread out for
large <i>x</i> values.<o:p></o:p></span></li>
<li class="MsoNormal"><span style="font-family: "arial" , "helvetica" , sans-serif;">The
plot has a "<b>funneling</b>" effect. That is, the residuals are
spread out for small <i>x</i> values and close to 0 for
large <i>x</i> values.<o:p></o:p></span></li>
<li class="MsoNormal"><span style="font-family: "arial" , "helvetica" , sans-serif;">Or,
the spread of the residuals in the residuals vs. fits plot varies in some
complex fashion.<o:p></o:p></span></li>
</ul>
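<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">A fanning or funneling pattern can also be checked roughly in numbers by comparing the residual spread for small and large fitted values. The Python sketch below is an informal check, not a formal test of heteroscedasticity; the function name spread_ratio and the residual values are made up for illustration.</span></div>

```python
import statistics

def spread_ratio(fitted, residuals):
    """Residual std. dev. in the upper half of fitted values divided by the lower half."""
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return statistics.pstdev(ordered[-half:]) / statistics.pstdev(ordered[:half])

fitted = [1, 2, 3, 4, 5, 6, 7, 8]
fanning = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.5, -1.6]   # spread grows with fitted value
steady = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3]    # constant spread

ratio_fanning = spread_ratio(fitted, fanning)
ratio_steady = spread_ratio(fitted, steady)
```

<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">A ratio far above 1 corresponds to fanning, a ratio far below 1 to funneling, and a ratio near 1 to a roughly horizontal band.</span></div>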
<div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Residuals vs. order
plot- </b>Error terms should be independent of each other, so the residuals should show no autocorrelation or trend.
If the data are collected in a time (or space) sequence, a residuals vs. order plot helps to
see whether error terms that are near each
other in the sequence are correlated.<o:p></o:p></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFWPlKLjw8zKqPYVz87W5e9J3juhoAE-qqNtm3mPXc1MDqdxPeqzPQ7W2v4eVvnnaAwqg5x-YhyphenhyphencE5oXe0LlzZnyvO2T-zEcGeGm7Rwei6yCCpR52J9N0VVb_pJsZH09c21pfJVmx6KPc/s1600/orderplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><img border="0" data-original-height="233" data-original-width="323" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFWPlKLjw8zKqPYVz87W5e9J3juhoAE-qqNtm3mPXc1MDqdxPeqzPQ7W2v4eVvnnaAwqg5x-YhyphenhyphencE5oXe0LlzZnyvO2T-zEcGeGm7Rwei6yCCpR52J9N0VVb_pJsZH09c21pfJVmx6KPc/s320/orderplot.png" width="320" /></span></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjabvC274amHMJ88V5CliScpVe-FjPOiNUNRYpu_JlGFO88LbJ81lqBxHHni0-avmSfi_P7LSFMDPemN1B3hh-qXO55Sb7xQuCt0wiJCyhw0YzV4bNIBZW16pVoU3VrHujxJANoDl6KVE/s1600/order_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><img border="0" data-original-height="223" data-original-width="341" height="209" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjabvC274amHMJ88V5CliScpVe-FjPOiNUNRYpu_JlGFO88LbJ81lqBxHHni0-avmSfi_P7LSFMDPemN1B3hh-qXO55Sb7xQuCt0wiJCyhw0YzV4bNIBZW16pVoU3VrHujxJANoDl6KVE/s320/order_2.png" width="320" /></span></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-size: 11pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;">If
we connect the residuals of the first residuals vs. order plot in sequence, we get the second plot,
which shows negative autocorrelation: a positive residual tends to be followed by a negative one. When the residuals are correlated like this, it is time to
move from ordinary regression to a time-series model.</span></span></div>
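<div class="MsoNormal">
The visual check above can also be backed by a number. The sketch below is an addition to the post, not part of it: it computes the lag-1 autocorrelation of the residuals and the Durbin-Watson statistic, a common summary that is close to 2 for uncorrelated residuals and moves towards 4 under negative autocorrelation. The function names and the alternating residuals are made up for illustration.</div>

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one just before it."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered)
    return num / den


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals that flip sign at every step (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r1 = lag1_autocorrelation(alternating)
dw = durbin_watson(alternating)
print(r1, dw)
```

<div class="MsoNormal">
A strongly negative lag-1 value together with a Durbin-Watson value well above 2 points to the same conclusion as the second order plot.</div>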
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: 11pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-size: 11pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 11pt; line-height: 115%;">We
can also simply plot the distribution of the residuals. A normal distribution has a
bell-shaped density curve described by its mean and standard
deviation. The density curve is symmetrical, centered about its mean,
with its spread determined by its standard deviation.</span><span style="font-size: 11.5pt; line-height: 115%;"> </span><span style="font-size: 11pt; line-height: 115%;">If
the residuals follow a normal distribution with mean <i>µ</i> and
variance <i>σ</i><sup>2</sup>, then a plot of the theoretical percentiles of the normal
distribution versus the observed sample percentiles (a normal probability plot) should be approximately linear.</span></span></span></div>
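<div class="MsoNormal">
As a rough sketch of how the points of such a normal probability plot can be computed, the snippet below pairs each sorted residual with the matching percentile of a normal distribution fitted to the residuals. The plotting position (k - 0.5)/n is one common convention chosen here as an assumption, and the residual values and the name qq_points are hypothetical.</div>

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical percentile, observed percentile) pairs."""
    n = len(residuals)
    ordered = sorted(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    # (k - 0.5) / n keeps the probabilities strictly between 0 and 1.
    theoretical = [fitted.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals from a fitted regression.
residuals = [-1.2, -0.4, 0.1, 0.3, 1.1, -0.7, 0.8]
pairs = qq_points(residuals)
for theo, obs in pairs:
    print(round(theo, 3), obs)
```

<div class="MsoNormal">
If the pairs fall close to a straight line when plotted, the normality assumption looks reasonable.</div>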
<div class="separator" style="clear: both; text-align: left;">
<span style="font-size: 11pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 11pt; line-height: 115%;"><br /></span></span></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-size: 11pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 11pt; line-height: 115%;"><br /></span></span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK1qMaQCt3t-l4JyjsNWg2DOLzaAG18CB9hdStJK1TXJTYdsd8IPbi9Bfyxv7Ko9te32okTP5YsBrV4jxbGoLYovl2IgLIHGyPKqEcv96XvPXq3saNaq_dVQ9PYJGXRf-h4X1ovSdeA5o/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="337" data-original-width="365" height="295" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK1qMaQCt3t-l4JyjsNWg2DOLzaAG18CB9hdStJK1TXJTYdsd8IPbi9Bfyxv7Ko9te32okTP5YsBrV4jxbGoLYovl2IgLIHGyPKqEcv96XvPXq3saNaq_dVQ9PYJGXRf-h4X1ovSdeA5o/s320/1.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvAgngNppWTTX1tIfGy1pwqsaKHXUfX3DRPJk_5xBhIXk40Lj2r2DuEZ2uy0JHJORJ98UD-UiW2p_65e4t841Qfp2FotFytxZR0a7JCuGUYHsjU-T4m6h6IJMWnE4dvfWUDz19VEOohwE/s1600/2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="340" data-original-width="352" height="309" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvAgngNppWTTX1tIfGy1pwqsaKHXUfX3DRPJk_5xBhIXk40Lj2r2DuEZ2uy0JHJORJ98UD-UiW2p_65e4t841Qfp2FotFytxZR0a7JCuGUYHsjU-T4m6h6IJMWnE4dvfWUDz19VEOohwE/s320/2.png" width="320" /></a></div>
<div class="MsoNormal">
<b>Handling Outliers and
Multi-collinearity-<o:p></o:p></b></div>
<div class="MsoNormal">
<b><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></b></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Outliers or influential data points can be identified by
Cook’s distance or by difference in fits (DFFITS). Both rely on the same idea for identifying
influential points.<span style="font-size: 11.5pt; line-height: 115%;">
They fit the regression without the i’th observation and measure the change in the fitted y
values. The larger the change, the more influential the i’th observation is. <o:p></o:p></span></span></div>
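<div class="MsoNormal">
The leave-one-out idea can be sketched for a simple one-predictor regression as below. This is a simplified illustration, not the full Cook’s distance or DFFITS formula (both also scale the change by an estimate of the error variance). The helper names and the data, with one deliberately odd point at x = 10, are made up.</div>

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_out_shift(xs, ys):
    """Change in the fitted value at x_i when observation i is dropped."""
    slope, intercept = fit_line(xs, ys)
    shifts = []
    for i in range(len(xs)):
        s_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        full_fit = slope * xs[i] + intercept
        loo_fit = s_i * xs[i] + b_i
        shifts.append(abs(full_fit - loo_fit))
    return shifts


# Hypothetical data: the last point is far from the trend of the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
shifts = leave_one_out_shift(xs, ys)
most_influential = max(range(len(xs)), key=lambda i: shifts[i])
print(most_influential, [round(s, 2) for s in shifts])
```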
<div class="MsoNormal">
<span style="font-size: 11.5pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: 11.5pt; line-height: 115%;">More
information -</span> <span style="font-size: 11.5pt; line-height: 115%;"><a href="https://onlinecourses.science.psu.edu/stat501/node/340">https://onlinecourses.science.psu.edu/stat501/node/340</a><o:p></o:p></span></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both;">
<span style="font-size: 11pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;">
</span></span></div>
<div class="MsoNormal">
<span style="font-size: 11.5pt; line-height: 115%;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>VIF
</b>(variance inflation factor) is used to identify multicollinearity in regression. Say y = a·x1 + b·x2 + c
is the regression equation. We need to check whether x1 and x2 are correlated.
We can fit an auxiliary regression x1 = p·x2 + q; the R<sup>2</sup> of this
regression measures how strongly x1 is explained by x2. The VIF of x1 is 1/(1 − R<sup>2</sup>) from that
auxiliary regression. If R<sup>2</sup> is more than 0.8, that is VIF &gt; 5, we say the predictors are strongly
correlated (multicollinear). </span></span><o:p></o:p></div>
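<div class="MsoNormal">
A minimal sketch of the VIF calculation for the two-predictor case follows. With a single predictor in the auxiliary regression, R squared equals the squared correlation between x1 and x2, which is what the hypothetical helper r_squared computes. The sample values are invented for illustration.</div>

```python
def r_squared(xs, ys):
    """R^2 of the simple regression ys = p * xs + q (squared correlation)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Hypothetical predictor values.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r2 = r_squared(x2, x1)       # auxiliary regression x1 = p * x2 + q
vif_x1 = 1.0 / (1.0 - r2)    # VIF = 1 / (1 - R^2)
print(r2, vif_x1)

# The rule of thumb from the text: R^2 = 0.8 corresponds to VIF = 5.
print(1.0 / (1.0 - 0.8))
```

<div class="MsoNormal">
With more than two predictors, the auxiliary regression would use all the other predictors at once, so a multiple-regression routine would be needed instead of this two-variable shortcut.</div>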
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
</div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div class="MsoNormal">
<br /></div>
</div>
Unknownnoreply@blogger.com0Mumbai, Maharashtra, India19.0759837 72.87765590000003618.5957917 72.232208900000032 19.556175699999997 73.52310290000004tag:blogger.com,1999:blog-8230692877620938204.post-14885918512202553932017-09-13T07:58:00.002-07:002017-09-13T08:07:05.741-07:00Neural Network: Forward Propagation in Python<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7phNoAGhNp9BRoVzaqp2n0FNercVsFgaaaCEfOBIQElyyEaW6-B0BBcxZoxTIC14qRiQ7XOR0OjsUW7TvYa9t8W1EscDMRTnGpIzpBL3Lod0UXuYOYkli3C7SBd5upCTnTh5rB3HCiEM/s1600/nn.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="219" data-original-width="572" height="152" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7phNoAGhNp9BRoVzaqp2n0FNercVsFgaaaCEfOBIQElyyEaW6-B0BBcxZoxTIC14qRiQ7XOR0OjsUW7TvYa9t8W1EscDMRTnGpIzpBL3Lod0UXuYOYkli3C7SBd5upCTnTh5rB3HCiEM/s400/nn.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Structure of Neural Network</td></tr>
</tbody></table>
<br />
<br />
<div class="MsoNormal" style="margin-left: 18.0pt;">
<b><u>Here is your first forward propagation Algorithm in python-</u><o:p></o:p></b></div>
<div class="MsoListParagraphCxSpFirst">
<br /></div>
<div class="MsoListParagraphCxSpMiddle">
import numpy as np<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
inputs= np.array([1,2])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
weight0 =np.array([1,-1])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
weight1 = np.array([1,2])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
weight2= np.array([2,-1])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
hiddenValue1 = (inputs*weight0).sum()<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
hiddenValue2 = (inputs*weight1).sum()<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
hiddenlayer_val
=np.array([hiddenValue1,hiddenValue2])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
output_val=(hiddenlayer_val*weight2).sum()<o:p></o:p></div>
<div class="MsoListParagraphCxSpLast">
print(output_val)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<!--[if !supportLists]-->1)<span style="font-size: 7pt; font-stretch: normal; font-variant-numeric: normal; line-height: normal;">
</span><!--[endif]--><b>Use of
Activation functions</b>- to include non-linearity. An "activation
function" is a function applied at each node. It converts the node's input
into some output. Ex-<o:p></o:p></div>
<br />
<div class="MsoNormal">
<b>ReLU (Rectified
Linear Activation)<o:p></o:p></b></div>
<div class="MsoNormal">
<b><br /></b></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaul11TECc-Fr3ljj0D2fgZ0AvpiMg0WH3CcFZ1gP-VZNp5iu1ybpLXss-99AGky54EWIFtO58bDTONS_nWJd321xvTv2ljsi3sK5O6FQphEzHra8oMYllTKsJorFMmhWD9-UaGAavKIk/s1600/relu.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="260" data-original-width="488" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaul11TECc-Fr3ljj0D2fgZ0AvpiMg0WH3CcFZ1gP-VZNp5iu1ybpLXss-99AGky54EWIFtO58bDTONS_nWJd321xvTv2ljsi3sK5O6FQphEzHra8oMYllTKsJorFMmhWD9-UaGAavKIk/s400/relu.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">ReLU Activation Function</td></tr>
</tbody></table>
<div class="MsoNormal">
<b><u>Here is your first
forward propagation Algorithm with Activation function in python-</u></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b><o:p></o:p></b>import numpy as np<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
def relu(x):<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: 36.0pt;">
op = max(x, 0)  # pass positive values through, clamp negatives to 0<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: 36.0pt;">
return op<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle" style="text-indent: 36.0pt;">
<br /></div>
<div class="MsoListParagraphCxSpMiddle">
inputs= np.array([1,2])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
weight0 =np.array([1,-1])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
weight1 = np.array([1,2])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
weight2= np.array([2,-1])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
node0_input = (inputs * weight0).sum()<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
node1_input = (inputs * weight1).sum()<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
node0_output = relu(node0_input)<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
node1_output = relu(node1_input)<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
hidden_layer_output = np.array([node0_output,
node1_output])<o:p></o:p></div>
<div class="MsoListParagraphCxSpMiddle">
output = (hidden_layer_output * weight2).sum()<o:p></o:p></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoListParagraphCxSpLast">
print(output)<o:p></o:p><br />
<br />
<br />
<a href="https://machinelearningstories.blogspot.in/2017/05/before-abc-of-deep-learning.html">ABC of deep Learning</a></div>
<div class="MsoNormal">
<b><br /></b></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
Unknownnoreply@blogger.com0Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-79587076533395906132017-09-10T00:01:00.000-07:002018-08-03T11:03:58.439-07:00Hierarchical Clustering (Bottom-Up Clustering) & Performance Parameters<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
K-means requires the number of clusters to be set beforehand. When
we want to see how clusters merge and do not want to specify the number
of clusters beforehand, we use hierarchical clustering (agglomerative clustering). <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
</div>
It starts by calculating the distance between every pair of
objects. The nearest pair of objects is then merged into a cluster, so new clusters
are created. The distances between the new clusters are calculated again, and the closest
pair is merged again. This process goes on until
a single cluster is obtained.<br />
<o:p></o:p><br />
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1WeYuo6Xbi0V0-KVsfau970EOaxmheDyUn82tcMIT2GW-c_4wU3KMCtvwAjh8ZPM1aPjzEFrXTNjAezndAXrStnOBI_7Lkx4D4rMy_GkRZcC8ZRQCDRkb1tltSt-q-C6kisSg0kF-ACs/s1600/hcust.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="368" data-original-width="732" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1WeYuo6Xbi0V0-KVsfau970EOaxmheDyUn82tcMIT2GW-c_4wU3KMCtvwAjh8ZPM1aPjzEFrXTNjAezndAXrStnOBI_7Lkx4D4rMy_GkRZcC8ZRQCDRkb1tltSt-q-C6kisSg0kF-ACs/s400/hcust.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Bottom-up approach of Hierarchical Clustering</td></tr>
</tbody></table>
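<div class="MsoNormal">
The bottom-up procedure described above can be sketched in a few lines of plain Python. This is a naive illustration that assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance. The function names and the five sample points are hypothetical, and library implementations are far more efficient.</div>

```python
import math


def single_linkage(c1, c2, points):
    """Smallest distance between a member of c1 and a member of c2."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points):
    """Merge the two closest clusters until one is left; return the merges."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical 2-D points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = agglomerate(points)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

<div class="MsoNormal">
The list of merges and their distances is exactly the information a dendrogram displays. Cutting the sequence before a merge with a large distance gives the clusters at that level.</div>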
<div>
<br /></div>
<div>
<br /></div>
<div>
This differs from K-means in the following ways-</div>
<div class="MsoNormal">
</div>
<ol style="text-align: left;">
<li>It is computationally expensive, as the distance between every pair of points is
calculated initially.</li>
<li>It is greedy: a merge is never undone, so it may give locally optimised clusters. Note that k-means
does not guarantee globally optimised clusters either; its result depends on the initial centroids.</li>
<li>The resulting clusters in hierarchical clustering may differ
based on the linkage, for example single linkage or complete linkage. Single linkage can also
suffer from chaining of clusters.</li>
<li>When there are many objects, reading the tree (dendrogram) becomes
difficult.</li>
</ol>
<div>
Like k-means, hierarchical clustering with a numeric distance cannot handle categorical variables directly. One-hot encoding can help to some extent, but as the number of variables increases, the clustering degrades. In those scenarios we can use K-modes clustering. </div>
<o:p></o:p><br />
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<b><br /></b></div>
<div class="MsoNormal">
<b>Performance parameters of clustering-</b><o:p></o:p></div>
<div class="MsoNormal">
<b><br /></b></div>
<div class="MsoNormal">
<b>Dunn’s Index-</b> The Dunn index assesses the goodness of a
clustering by relating the minimal distance between clusters to
the maximal diameter of a cluster. Higher values are desirable. It can be calculated by- </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
minimum
distance between the centres of two clusters / maximum diameter of a cluster.</div>
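<div class="MsoNormal">
A small sketch of this ratio follows, using the centre-to-centre distance given in the formula above. Other variants of the Dunn index measure the separation between the closest members of two clusters instead. The helper names and the two toy clusters are assumptions made for the example.</div>

```python
import math
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def diameter(cluster):
    """Largest distance between any two points of the cluster."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)),
               default=0.0)


def dunn_index(clusters):
    """Minimum distance between cluster centres / maximum cluster diameter."""
    min_separation = min(math.dist(centroid(a), centroid(b))
                         for a, b in combinations(clusters, 2))
    max_diameter = max(diameter(c) for c in clusters)
    return min_separation / max_diameter


# Hypothetical clusters: compact (diameter 2) and far apart (centres 10 apart).
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
dunn = dunn_index(clusters)
print(dunn)
```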
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
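<b>Jaccard Index-</b> |A intersection B| / |A union B|. It lies between 0 and 1. When A and B are two different clusters, lower values (less overlap)
are desirable; when A is a cluster and B is a known reference class, higher values indicate better agreement.<o:p></o:p></div>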
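<div class="MsoNormal">
For sets of member ids the Jaccard index is a one-liner. The sketch below uses made-up point ids, and returning 0.0 for two empty sets is a convention chosen here, not something the definition dictates.</div>

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


# Hypothetical member ids of a cluster and of a reference class.
cluster_members = {"p1", "p2", "p3", "p4"}
reference_class = {"p3", "p4", "p5"}
score = jaccard_index(cluster_members, reference_class)
print(score)
```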
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx0O9Dvso8Dwam3JKE3FtStsJfOJD2NiQAupaAIWRTy_ui7UtknUSXE81vAfdPGfwy8qIEVAuobOVHnmvi3mCdy1j__u8fVkh4dfHeTHljDOKI3tuCPufUPoECiSNKHtZRNWGvQ1WyYQM/s1600/jaccard.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="211" data-original-width="273" height="154" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx0O9Dvso8Dwam3JKE3FtStsJfOJD2NiQAupaAIWRTy_ui7UtknUSXE81vAfdPGfwy8qIEVAuobOVHnmvi3mCdy1j__u8fVkh4dfHeTHljDOKI3tuCPufUPoECiSNKHtZRNWGvQ1WyYQM/s200/jaccard.PNG" width="200" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Silhouette Distance-</b> It is calculated for each point in a cluster and depends on 2
parameters- 1) a(i): the average distance
of the i’th point to the other points in its cluster. 2) b(i): the average distance of the i’th point
to the points of the nearest other cluster. a(i) measures compactness and b(i) measures separation.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
s(i) = {b(i) - a(i)} / max{a(i), b(i)}<o:p></o:p></div>
<div class="MsoNormal">
It lies between -1 and 1. A higher value is desirable. A value near 0 indicates that the point lies on the boundary between two clusters.<o:p></o:p></div>
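<div class="MsoNormal">
The silhouette value of a single point can be sketched as follows, assuming Euclidean distance and at least two clusters. The function name, the 0.0 returned for a single-point cluster, and the four one-dimensional sample points are assumptions made for the illustration.</div>

```python
import math


def silhouette(i, points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for the point at index i."""
    own = labels[i]
    same = [math.dist(points[i], points[j])
            for j in range(len(points)) if labels[j] == own and j != i]
    if not same:
        return 0.0  # assumed convention for a cluster with a single point
    a = sum(same) / len(same)  # compactness: mean distance within own cluster
    b = math.inf               # separation: mean distance to the nearest other cluster
    for other in set(labels) - {own}:
        dists = [math.dist(points[i], points[j])
                 for j in range(len(points)) if labels[j] == other]
        b = min(b, sum(dists) / len(dists))
    return (b - a) / max(a, b)


# Hypothetical 1-D points forming two well-separated clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = [silhouette(i, points, labels) for i in range(len(points))]
print([round(s, 3) for s in scores])
```

<div class="MsoNormal">
Averaging the values over all points gives a single score for the whole clustering.</div>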
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
BIC values can also be used as a measure of clustering fit. These indexes are relative: they can be used to compare clustering algorithms, but there is no cut-off above which a clustering can be called perfect. The accuracy of a clustering can, however, be validated if we have some pre-defined classes.<br />
<br />
Clustering is closely related to identifying abnormalities. See how they are related-<br />
<br />
<a href="http://machinelearningstories.blogspot.com/2018/07/anomaly-detection-anomaly-detection-by.html" target="_blank">Anomaly detection techniques</a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Other articles that you may like-</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://machinelearningstories.blogspot.in/search?updated-max=2016-11-02T05:49:00-07:00&max-results=8">All text classification Algo</a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://machinelearningstories.blogspot.in/2016/11/recommendation-engine-market-basket.html">Recommendation Engine - Market Basket Analysis</a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
<br /></div>
</div>
Unknownnoreply@blogger.com0Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-44287959951887197042017-05-25T06:34:00.001-07:002017-05-25T06:46:42.882-07:00Residual Plot in Regression, ACF, PACF in ARIMA<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjp3_5r-lWwOLn8CoWj0eUVzHgxNyURBri74MgcFu5iAi0TwP3UZaINVcGXm1yDc2STtnd-JOAVrK6WhwQykoTi9vstuV-pJORRSs2lWMu2o4yv-4iw2smuFN3a197pTEA6R3wKfH3EUpQ/s1600/Constant+mean.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="283" data-original-width="424" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjp3_5r-lWwOLn8CoWj0eUVzHgxNyURBri74MgcFu5iAi0TwP3UZaINVcGXm1yDc2STtnd-JOAVrK6WhwQykoTi9vstuV-pJORRSs2lWMu2o4yv-4iw2smuFN3a197pTEA6R3wKfH3EUpQ/s320/Constant+mean.PNG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>What is Constant Mean for a time series</b>- If we draw the mean value line on a time series and, throughout the series, roughly half of the points lie above the line and half below it, we can say that the mean is constant over time for this series.</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3niRcdi-fiVCqkF4dA7r5e_87i0-o9rIRSQXF2-e9nxr4duSB_qK5m2wqADi9ehFARYBLoSW62gPfsSNib1P5NkYr1SRLda2xpO0AU_ksCWYZPLTHuXyuYH8rptlxEACbQZD89xFO_Kc/s1600/residual+plot.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="142" data-original-width="583" height="95" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3niRcdi-fiVCqkF4dA7r5e_87i0-o9rIRSQXF2-e9nxr4duSB_qK5m2wqADi9ehFARYBLoSW62gPfsSNib1P5NkYr1SRLda2xpO0AU_ksCWYZPLTHuXyuYH8rptlxEACbQZD89xFO_Kc/s400/residual+plot.PNG" width="400" /></a></div>
<br />
<br />
<div style="text-align: left;">
<b><span style="font-family: "arial" , "helvetica" , sans-serif;">What is the importance of the residual plot in modelling a linear relationship</span>- </b><span style="background-color: white;"><span style="font-family: "arial" , "helvetica" , sans-serif;">The first plot shows a random pattern, indicating a good fit for a linear model. The other plot patterns are non-random (U-shaped and inverted U), suggesting a better fit for a non-linear model.</span></span></div>
<div style="text-align: left;">
<span style="background-color: white;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span></div>
<div style="text-align: left;">
<span style="background-color: white;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>What is constant mean and variance (stationarity) of a time series- </b></span></span></div>
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; text-indent: -18.0pt;">
</div>
<ol style="text-align: left;">
<li> The mean E(xt) is the same for all t.</li>
<li> The variance
of xt is the same for all t.</li>
</ol>
<br />
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; text-indent: -18.0pt;">
<span style="background: white; font-family: "arial" , sans-serif; font-size: 13.5pt;">     </span><span style="background: white; font-family: "arial" , sans-serif;"> In other words, the mean of the series x_t and of the lagged series
x_(t-h) is the same.</span><span style="font-family: "times new roman" , serif;"><o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; text-indent: -18.0pt;">
<span style="background: white; font-family: "arial" , sans-serif;">      The standard
deviation of the series x_t is the same as the standard deviation of the lagged series x_(t-h).</span><span style="font-family: "times new roman" , serif;"><o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin: 0cm; mso-list: l1 level1 lfo1; tab-stops: list 36.0pt; text-indent: -18.0pt;">
</div>
<div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm;">
<span style="background: white; font-family: "arial" , sans-serif;">An interesting property of a stationary series is
that theoretically it has the same structure forwards as it does backwards.</span><span style="font-family: "times new roman" , serif; font-size: 12pt;"><o:p></o:p></span></div>
<div style="text-align: left;">
<br /></div>
<b><span style="font-family: "arial" , "helvetica" , sans-serif;">What is ACF for a time series</span>- </b> <span style="font-family: "arial" , "helvetica" , sans-serif;">The ACF (autocorrelation function) is the correlation between x<sub>t</sub> and x<sub>t-h</sub> for each lag h. For an AR(1) process, if the correlation between x<sub>t</sub> and x<sub>t-1</sub> is .3, then by the simple multiplicative rule the correlation is .3*.3 between x<sub>t</sub> and x<sub>t-2</sub>, and .3*.3*.3 between x<sub>t</sub> and x<sub>t-3</sub>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOArXymInbYYcK1YXMh21jWquOSmd2hQHJDxEllfIFzeY-YV7vF43hOiQsp0zMJPLfTEsN6fH3ln3JyRBolPntx3DOqA8VSEZEyYO-eohH99zIYEW59M8GsKmDWpOzvb4DnKh-kmNO-Cg/s1600/ACF.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="286" data-original-width="425" height="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOArXymInbYYcK1YXMh21jWquOSmd2hQHJDxEllfIFzeY-YV7vF43hOiQsp0zMJPLfTEsN6fH3ln3JyRBolPntx3DOqA8VSEZEyYO-eohH99zIYEW59M8GsKmDWpOzvb4DnKh-kmNO-Cg/s400/ACF.PNG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
In the above image the correlation at lag 1 is .6, at lag 2 it is .36, and so on. Based on this graph we can say that it is an AR(1) process where y<sub>t</sub> = .6*y<sub>t-1</sub> + constant + error.</div>
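<div class="MsoNormal">
The geometric decay of the ACF for an AR(1) process can be checked with a short simulation. The sketch below assumes a coefficient of 0.6, no constant term and standard normal errors. The series length, the seed and the name sample_acf are arbitrary choices for the illustration.</div>

```python
import random


def sample_acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    den = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / den


random.seed(0)          # fixed seed so the run is repeatable
phi = 0.6               # assumed AR(1) coefficient
series = [0.0]
for _ in range(5000):
    series.append(phi * series[-1] + random.gauss(0.0, 1.0))

# Theoretical ACF of AR(1) is phi ** lag; the sample values should be close.
for lag in (1, 2, 3):
    print(lag, round(phi ** lag, 3), round(sample_acf(series, lag), 3))
```

<div class="MsoNormal">
The sample values fluctuate around the theoretical ones, which is why ACF plots are usually read together with their confidence bands.</div>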
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<b><br /></b></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>What is Moving Average and how is it related to PACF - </b> An MA (moving average) model expresses the series in terms of past errors, each multiplied by some constant. PACF (partial autocorrelation function) is more difficult to understand. It is a conditional correlation between variables (series): the correlation between 2 variables after accounting for the part of that correlation that comes through some other variables.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">If we have a regression of y on x1 and x2, the partial correlation between y and x2 will be-</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">covariance(y, x2 | x1) / (sd(y | x1) * sd(x2 | x1))</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For a time series, the PACF between y<sub>t</sub> and y<sub>t-2</sub> is given by-</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">covariance(y<sub>t</sub>, y<sub>t-2</sub> | y<sub>t-1</sub>) / (sd(y<sub>t</sub> | y<sub>t-1</sub>) * sd(y<sub>t-2</sub> | y<sub>t-1</sub>))</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">ACF is used to identify order of MA and PACF is used to identify order of AR terms in stationary time series.</span><br />
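<div class="MsoNormal">
For a stationary series, the lag-2 partial autocorrelation can be written in terms of the ordinary autocorrelations at lags 1 and 2 as (rho2 - rho1^2) / (1 - rho1^2). The sketch below applies this standard identity; the function name and the second set of input values are illustrative assumptions.</div>

```python
def pacf_lag2(rho1, rho2):
    """Lag-2 partial autocorrelation from the lag-1 and lag-2 autocorrelations."""
    return (rho2 - rho1 ** 2) / (1.0 - rho1 ** 2)


# AR(1) example from the ACF plot: rho1 = .6 and rho2 = .36 = .6 * .6.
# All of the lag-2 correlation passes through lag 1, so the PACF is about 0.
ar1_pacf2 = pacf_lag2(0.6, 0.36)
print(ar1_pacf2)

# Hypothetical series with more lag-2 correlation than lag 1 alone explains.
extra_pacf2 = pacf_lag2(0.5, 0.4)
print(extra_pacf2)
```

<div class="MsoNormal">
This is why the PACF of an AR(1) process cuts off after lag 1, while its ACF decays gradually.</div>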
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Know more about the relation between time series and regression-</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="http://machinelearningstories.blogspot.in/2016/08/time-series-and-fitting-regression-on.html">Regression and time series</a></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<b><br /></b></div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-58167634401445697772017-05-23T07:44:00.001-07:002018-08-03T11:10:30.443-07:00Before ABC of Deep Learning.<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcXjlDC9HjS3Eskg6il7fFdoPcToLC7kLMnakh16O0M_ZAH2tFv6WlPSjxbXY5pCkPPqwTYhr86fIO1Mx5wq-gomuoS4JBfsNtAr4hfjUEpG92RgXab_vLymS1k4_Voz5EcfIuH4ei_6w/s1600/intro.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="110" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcXjlDC9HjS3Eskg6il7fFdoPcToLC7kLMnakh16O0M_ZAH2tFv6WlPSjxbXY5pCkPPqwTYhr86fIO1Mx5wq-gomuoS4JBfsNtAr4hfjUEpG92RgXab_vLymS1k4_Voz5EcfIuH4ei_6w/s400/intro.PNG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="MsoNormal">
<span style="background: white; font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;">Mark Cuban has rightly said the above lines about deep learning. First of all, when was the term first
coined? Deep learning is used by Google in its voice and image
recognition algorithms, Netflix and Amazon say that they use it in their
recommendation engines, and researchers at MIT say they are relying more on deep
learning now.</span></div>
<span style="background-color: white; font-family: "georgia" , "times new roman" , "times" , serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuJitz-apCqRq7gczfmQHpfdpB6nJd4vfXjfYijNgzK2Q1DNBY9HeCVAB4bqRWGZoWuSz0fv8fAv0wYFuIfBabv7_9zmD40IPzS9z-BQKBj_d_u6PtXduBhLazpZTzcItZBj5xOgTynw0/s1600/Capture.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="196" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuJitz-apCqRq7gczfmQHpfdpB6nJd4vfXjfYijNgzK2Q1DNBY9HeCVAB4bqRWGZoWuSz0fv8fAv0wYFuIfBabv7_9zmD40IPzS9z-BQKBj_d_u6PtXduBhLazpZTzcItZBj5xOgTynw0/s320/Capture.PNG" width="320" /></a></div>
<span style="background-color: white; font-family: "georgia" , "times new roman" , "times" , serif;"><br /></span></div>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; font-size: 13.5pt;">According to Jack Rae, a research
engineer at Google DeepMind, deep learning refers to artificial neural
networks that are composed of many layers. Essentially, deep learning
involves feeding a computer system a lot of data, which it can use to make
decisions about other data. This data is fed through neural networks, as is the
case in machine learning. These networks are logical constructions that ask a
series of binary true/false questions of, or extract a numerical value from, every
piece of data that passes through them, and classify it according to the answers received.
In the language of data scientists, deep learning uses a cascade of many
layers of </span><span style="background-color: white; font-size: 13.5pt;">nonlinear processing</span><span style="background-color: white; font-size: 13.5pt;"> units
for </span><span style="background-color: white; font-size: 13.5pt;">feature extraction</span><span style="background-color: white; font-size: 13.5pt;"> and
transformation. It is based on the (unsupervised) learning of multiple
levels of features or representations of the data, and these
levels correspond to different levels of abstraction. </span></span><br />
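As a rough illustration of the cascade of nonlinear layers described above, here is a minimal Python sketch of a forward pass through two hidden layers. The weights, layer sizes and function names are made up for illustration; a real network would learn its weights from data.<br />

```python
import math


def dense(inputs, weights, biases):
    # One fully connected layer: each row of weights produces one output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    # Nonlinear processing unit: negative values are clipped to zero.
    return [max(0.0, v) for v in values]


def sigmoid(value):
    # Squashes the final score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical, hand-picked weights (assumption: not learned from any data).
W1, B1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, B2 = [[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]
W3, B3 = [[1.0, -1.0]], [0.0]


def forward(inputs):
    # Each layer transforms the representation produced by the previous one.
    hidden1 = relu(dense(inputs, W1, B1))
    hidden2 = relu(dense(hidden1, W2, B2))
    score = dense(hidden2, W3, B3)[0]
    return sigmoid(score)


print(forward([1.0, 2.0]))
print(forward([2.0, 1.0]))
```

Each call to dense followed by relu is one "layer" of the cascade; stacking more of them gives a deeper network.<br />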
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; font-size: 13.5pt;"><br /></span></span>
<span style="background-color: white; font-family: "arial" , sans-serif; font-size: 13.5pt;"><br /></span>
<br />
</div>
<div style="text-align: left;">
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;">Deep learning
methods are often looked at as a black box, with most confirmations done
empirically, rather than theoretically.</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="background: white; font-family: "arial" , sans-serif; font-size: 13.5pt;">Deep learning is
used across many industries for a number of different tasks. Commercial apps
that use image recognition, </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">open source<span style="background: white;"> platforms with consumer recommendation apps, navigation
of self-driving cars, re-colouring of black and white images, automated analysis
and reporting, and medical research tools that explore the possibility of
reusing drugs for new ailments are all examples of deep learning in use.</span></span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="background: white; font-family: "arial" , sans-serif; font-size: 13.5pt;">Many software
libraries are available for deep learning-</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Deeplearning4j</span></b><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;"> - Written in Java; its algorithms can be
integrated with Hadoop and Spark. It was </span><span style="background: white; font-family: "arial" , sans-serif; font-size: 13.5pt;">developed
mainly by a </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">machine learning<span style="background: white;"> group
in </span>San Francisco<span style="background: white;"> led by Adam
Gibson.</span></span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Torch</span></b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;"> -
Open source and written in the Lua language. It is presently used by Facebook, IBM
and Yandex.</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Theano</span></b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;"> - An open
source library for Python, developed at the Université de Montréal.</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">TensorFlow</span></b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;"> -
Developed by the Google Brain team and used by Google in its products. It is Google's
second-generation machine learning system.</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">PaddlePaddle</span></b><span class="apple-converted-space"><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;"> </span></span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">- Baidu's deep learning platform.</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Keras</span></b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">- It </span><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;">is an </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">open source</span><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;"> </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">neural
network</span><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;"> library written in </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Python</span><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;">. It is capable of running on top of </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Deeplearning4j</span><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;">, </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Tensorflow</span><span style="background: white; color: #222222; font-family: "arial" , "sans-serif"; font-size: 13.5pt;"> or </span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt;">Theano.</span><span style="font-size: 13.5pt;"><o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<b><span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;">CNTK</span></b><span class="apple-converted-space"><span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;"> </span></span><span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;">- A deep learning
framework developed by Microsoft Research, also known as the Microsoft
Cognitive Toolkit.</span><br />
<span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;"><br /></span>
<span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;"><br /></span>
<span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;">Another article on advanced machine learning, on the model named after the Russian mathematician Andrey Markov-</span><br />
<span style="font-family: "arial" , sans-serif; font-size: 13.5pt; line-height: 115%;"><br /></span>
<span style="font-family: "arial" , sans-serif; font-size: 18px; line-height: 115%;"><a href="http://machinelearningstories.blogspot.in/2017/02/hidden-markov-model-session-1.html">http://machinelearningstories.blogspot.in/2017/02/hidden-markov-model-session-1.html</a></span><br />
<br />
Start your first feed-forward neural network from <a href="http://machinelearningstories.blogspot.com/2017/09/assumptions-of-linear-regression.html" target="_blank">here</a>-<br />
<br />
<br />
<br />
<br /></div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8230692877620938204.post-61740590730225363292017-05-15T02:58:00.000-07:002017-05-20T02:38:54.160-07:00Facebook's Prophet Model for forecasting<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "times" , "times new roman" , serif;">Forecasting is central to data science activities. Facebook's open source forecasting tool '<b>PROPHET</b>' is available in R and Python. It has been very useful for forecasting website page views and road traffic, and in other areas where there are multiple levels of seasonality. Prophet is useful in the scenarios below-</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<br />
<ul style="text-align: left;">
<li><span style="font-family: "times" , "times new roman" , serif;">Seasonal data (hourly, daily, weekly, monthly).</span></li>
<li><span style="font-family: "times" , "times new roman" , serif;">Data with outliers.</span></li>
<li><span style="font-family: "times" , "times new roman" , serif;">Data with holiday information.</span></li>
<li><span style="font-family: "times" , "times new roman" , serif;">Data with multiple trend change points.</span></li>
</ul>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">Prophet fits an additive regression model with four main components.</span></div>
<br />
<div style="margin-bottom: .0001pt; margin: 0cm;">
</div>
<ol style="text-align: left;">
<li><span style="font-family: "times" , "times new roman" , serif;">A piece-wise linear or logistic growth curve trend. Prophet
automatically detects changes in trends by selecting change points from the
data.</span></li>
<li><span style="font-family: "times" , "times new roman" , serif;">A yearly seasonal component modelled using Fourier series.</span></li>
<li><span style="font-family: "times" , "times new roman" , serif;">A weekly seasonal component using dummy variables.</span></li>
<li><span style="font-family: "times" , "times new roman" , serif;">A user-provided list of important holidays.</span></li>
</ol>
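<div>
To make the additive structure above more concrete, here is a small Python sketch of a piece-wise linear trend plus a Fourier-series yearly component, summed with an optional holiday effect. This is only an illustration under assumptions: the function names, coefficients and change point are hypothetical and are not Prophet's internals, and the weekly component is left out for brevity.</div>

```python
import math


def piecewise_linear_trend(t, offset, base_rate, changepoints, rate_changes):
    # Linear growth whose slope changes by rate_changes[k] after changepoints[k].
    value = offset + base_rate * t
    for changepoint, delta in zip(changepoints, rate_changes):
        if t > changepoint:
            value += delta * (t - changepoint)
    return value


def fourier_features(t, period, order):
    # sin/cos pairs for harmonics 1..order of the given period.
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def yearly_seasonality(t, coefficients, period=365.25):
    # Weighted sum of the Fourier features; coefficients has 2 * order entries.
    order = len(coefficients) // 2
    return sum(c * f for c, f in zip(coefficients, fourier_features(t, period, order)))


def additive_forecast(t, holiday_effect=0.0):
    # Hypothetical parameter values, chosen only to show how the parts add up.
    trend = piecewise_linear_trend(t, offset=10.0, base_rate=0.1,
                                   changepoints=[100.0], rate_changes=[-0.05])
    season = yearly_seasonality(t, coefficients=[2.0, 0.0, 0.5, 0.0])
    return trend + season + holiday_effect


print(additive_forecast(0.0))
print(additive_forecast(200.0, holiday_effect=1.5))
```

<div>
In a fitted model the slope changes, Fourier coefficients and holiday effects would be estimated from the data rather than typed in by hand.</div>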
<div>
<span style="font-family: "times" , "times new roman" , serif;">Here is an example of using the Prophet model in R.</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">We take the model_data data set. It must have a column 'ds' holding the dates and a column 'y' holding the values to forecast (a univariate time series). This is how the data looks -</span></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgFtlPZk0ydzFA8F_JUEWTQSVXb2c7OzNa4M8IZmIUG-gs89OHc4Q90cCtDSsrTNCnAVlBs7VUzxxy8CVxXnzWN9Er7bgaYCxUmpTN1qers-RqgnnKntR1Cpu5kbJmjSEwWI_LExuYDaQ/s1600/data.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgFtlPZk0ydzFA8F_JUEWTQSVXb2c7OzNa4M8IZmIUG-gs89OHc4Q90cCtDSsrTNCnAVlBs7VUzxxy8CVxXnzWN9Er7bgaYCxUmpTN1qers-RqgnnKntR1Cpu5kbJmjSEwWI_LExuYDaQ/s1600/data.PNG" /></a></div>
<div>
<br /></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><i><b>running the model - </b></i></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">mod <- prophet::prophet(model_data, weekly.seasonality = FALSE)</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">The data has no weekly seasonality, so the weekly.seasonality argument is set to FALSE.</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><i><b>Preparing the output data set to store the result -</b></i></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">future <- make_future_dataframe(mod, periods = 4, freq = 'month')</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">We want to predict the next 4 data points on a monthly basis. The frequency can also be day, week, etc.</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><b><i>predicting future values -</i></b></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">forecast <- predict(mod, future)</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><i><b>Visualising the result -</b></i></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">plot(mod, forecast)</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioPbMXO7A4wIjy_rkcdp1Bjbrcv_SJLQPJrJ2qZhkkKOOS69UHdTbkHlxNCEtFbfVNNe91gZiqKiobYWfukK8mYZG86BULrhdy_ESb4HQ-8kVq1J1X_G0rM2Dn5RUJm5HJeDIT53fuexs/s1600/result.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioPbMXO7A4wIjy_rkcdp1Bjbrcv_SJLQPJrJ2qZhkkKOOS69UHdTbkHlxNCEtFbfVNNe91gZiqKiobYWfukK8mYZG86BULrhdy_ESb4HQ-8kVq1J1X_G0rM2Dn5RUJm5HJeDIT53fuexs/s1600/result.PNG" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><i><br /></i></span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">Here are the forecasted values for the coming months. yhat holds the forecast values, and yhat_lower and yhat_upper give the range around them.</span></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaYfRdtWkdzOxDHll0tGxMZGkHI30ol6E65CWyno4SPQq9s56McJGTFn6d6gKKweSbNI76W-CzHgTi8GRfAzlVLaWvQG4vC1ma5u8C1EvShleppfjRTJTJK9HXtQ7_gbmg1OCpHVHJ1q8/s1600/op.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaYfRdtWkdzOxDHll0tGxMZGkHI30ol6E65CWyno4SPQq9s56McJGTFn6d6gKKweSbNI76W-CzHgTi8GRfAzlVLaWvQG4vC1ma5u8C1EvShleppfjRTJTJK9HXtQ7_gbmg1OCpHVHJ1q8/s1600/op.PNG" /></a></div>
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div>
</div>
<div>
<span style="font-family: "times" , "times new roman" , serif;">The different components of the forecast can also be plotted like this-</span></div>
<div>
<div class="MsoNormal">
<span style="font-family: "times new roman" , serif; line-height: 115%;">prophet_plot_components(mod,
forecast) </span></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinyMntE2mIdHjDavn49SPYurJzT6xM-909TFYbfiMzLbBzagxgIBMdEp6ChSJMGvPn6hPDDjfWwfYY408AyXfdcRk37iaYa9mgV_LuoUmsCXK9wo2Jr76E9I3xexruG5-NrbiJTwZpN9I/s1600/Component.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="318" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinyMntE2mIdHjDavn49SPYurJzT6xM-909TFYbfiMzLbBzagxgIBMdEp6ChSJMGvPn6hPDDjfWwfYY408AyXfdcRk37iaYa9mgV_LuoUmsCXK9wo2Jr76E9I3xexruG5-NrbiJTwZpN9I/s400/Component.PNG" width="400" /></a></div>
<o:p></o:p></div>
<br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<span style="font-family: "times" , "times new roman" , serif;"><br /><span style="font-family: "Times New Roman";"><i><b>Impact of the confidence interval width on forecasting-</b></i></span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><i><br /></i></span>
<span style="font-family: "times" , "times new roman" , serif;">m1 <- prophet::prophet(n_train, weekly.seasonality = FALSE, interval.width = 0.99)</span><br />
<span style="font-family: "times" , "times new roman" , serif;">m2 <- prophet::prophet(n_train, weekly.seasonality = FALSE, interval.width = 0.80)</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><i><br /></i></span>
<span style="font-family: "times" , "times new roman" , serif;">I ran the two models above to see the impact of different interval widths on the forecast.</span><br />
<span style="font-family: "times" , "times new roman" , serif;">Here are the forecasts from the two models-</span><br />
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiveSbIMxk7N6mpUhq1cQIBWy3WQcDzmlvLwUP5t8P59ue33CSMOMRvOmgO-v9Hrs8JWOAIdDan-k87eAyVP2v-hNLYWQQRT43HhwDyqH1plctazawgUuuyLNoiyWHe2F5vwTXVNjFS5N4/s1600/confidence+interval.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiveSbIMxk7N6mpUhq1cQIBWy3WQcDzmlvLwUP5t8P59ue33CSMOMRvOmgO-v9Hrs8JWOAIdDan-k87eAyVP2v-hNLYWQQRT43HhwDyqH1plctazawgUuuyLNoiyWHe2F5vwTXVNjFS5N4/s400/confidence+interval.PNG" width="400" /></a></div>
<span style="font-family: "times" , "times new roman" , serif;"><br /><br />The mean forecast values are the same, but the range between yhat_lower and yhat_upper widens as the interval width increases.</span><br />
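The same effect can be reproduced with a small Python sketch. It assumes the interval is read off as quantiles of a set of simulated forecast values for one date; the samples, function names and quantile rule below are hypothetical and only illustrate why a wider interval.width gives a wider band around the same centre.<br />

```python
def quantile(sorted_values, q):
    # Linear interpolation between the two closest ranks.
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def interval(samples, width):
    # Central interval covering `width` of the samples (e.g. 0.80 or 0.99).
    ordered = sorted(samples)
    tail = (1.0 - width) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Hypothetical simulated forecasts for a single date: 90, 91, ..., 110.
samples = [float(v) for v in range(90, 111)]

low80, high80 = interval(samples, 0.80)
low99, high99 = interval(samples, 0.99)

print("80% interval:", low80, high80)
print("99% interval:", low99, high99)
```

Both intervals are centred on the same value; only the distance between the lower and upper bounds changes.<br />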
<span style="font-family: "times" , "times new roman" , serif;"><br /></span>
This is how we can use the Prophet model for time series forecasting!<br />
<br />
Here is another time series technique, which is more closely related to pure linear regression and is good to know-<br />
<a href="http://machinelearningstories.blogspot.in/2016/08/time-series-and-fitting-regression-on.html" target="_blank">Time-Series and Regression Relation</a><br />
<br />
<br />
MBA is no longer just a Master's in business analytics; it's something else. ↙😋 If you want to know-<br />
<a href="http://machinelearningstories.blogspot.in/2016/11/recommendation-engine-market-basket.html" target="_blank">Market- Basket Analysis</a><br />
<br />
<br />
<br />
<br />
<br /></div>
Unknownnoreply@blogger.com3Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987tag:blogger.com,1999:blog-8230692877620938204.post-46412550198935967242017-03-11T15:07:00.000-08:002017-03-11T15:07:21.502-08:00Hidden Markov Model: Session -2<div dir="ltr" style="text-align: left;" trbidi="on">
<b>Probabilistic inference in an HMM :-</b><br />
<br />
1) Compute the probability of a given sequence of observed states when the tag (hidden state) sequence is hidden<br />
2) Find the most likely hidden sequence path<br />
3) Given the observation sequence, find the parameters that make the observations most likely.<br />
<br />
These three are known as the Evaluation problem, the Decoding problem and the Learning problem.<br />
<br />
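1) Evaluation problem - Identify all possible hidden state sequences and compute the probability with which each one generates the observed states; adding these up gives the probability of the observations. (Picking the single sequence that generates the observed states with maximum probability is the decoding problem.) Let us explain it conceptually-<br />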
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0h71CYiE-MAAQdbub1K6vpg56JnMtI55COK_PCBmZexIuroZP5rXfHz-wG0iccWWtAeYnSxSm98Zr7Zb4IOF1r3FbjZjJr8lTr69hsJgK5Amh6p9Y_cWYToXcS09JhRaq5US2PJhjc5Q/s1600/hmm_blogger2.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="304" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0h71CYiE-MAAQdbub1K6vpg56JnMtI55COK_PCBmZexIuroZP5rXfHz-wG0iccWWtAeYnSxSm98Zr7Zb4IOF1r3FbjZjJr8lTr69hsJgK5Amh6p9Y_cWYToXcS09JhRaq5US2PJhjc5Q/s640/hmm_blogger2.PNG" width="640" /></a></div>
<br />
w0, w1, w2, ... are hidden states. t=0, t=1, t=2 are successive time steps. Transition probabilities are written like a11 (transition from state 1 to state 1) and emission probabilities like b1 (emission from state 1).<br />
<br />
At t=0, say only 2 states, w1 and w2, are present, with probabilities .5 and .5 (we are not taking all the hidden states). There would be 4 possible transitions:<br />
a11, a12, a21, a22. So the next state probabilities would be-<br />
hidden state being 1 = .5*a11 + .5*a21<br />
hidden state being 2 = .5*a12 + .5*a22<br />
<br />
These state probabilities are in turn used to calculate the state probabilities at the next step. For example, at time t=2:<br />
<br />
hidden state being 1 = (.5*a11 + .5*a21) * a11 + (.5*a12 + .5*a22) * a21<br />
hidden state being 2 = (.5*a11 + .5*a21) * a12 + (.5*a12 + .5*a22) * a22<br />
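<br />
The two steps above can be checked numerically with a short Python sketch. The transition values below are hypothetical (the post does not give numbers), and the function name is made up for illustration.<br />

```python
def next_state_distribution(current, transition):
    # current[i]       = probability of being in state i at this time step
    # transition[i][j] = a_ij, probability of moving from state i to state j
    size = len(current)
    return [sum(current[i] * transition[i][j] for i in range(size))
            for j in range(size)]


# Hypothetical transition matrix: rows are a11, a12 and a21, a22.
transition = [[0.7, 0.3],
              [0.4, 0.6]]

p0 = [0.5, 0.5]                                # t=0: both states equally likely
p1 = next_state_distribution(p0, transition)   # t=1: .5*a11 + .5*a21, .5*a12 + .5*a22
p2 = next_state_distribution(p1, transition)   # t=2: reuse the t=1 probabilities

print(p1)
print(p2)
```

At every step the two probabilities still add up to 1, which is a quick sanity check on the arithmetic.<br />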
<br />
Using this forward algorithm, and multiplying in the emission probability of the observed output at each step, the probabilities of all the hidden state sequences are accumulated step by step. The sequence with the maximum probability of emitting the output values is considered the most likely hidden state sequence.<br />
<br />
This is the evaluation problem: it tells us with what probability the given output sequence is generated by the model's hidden states.<br />
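<br />
A minimal Python sketch of the evaluation problem is given below. It assumes the first output is emitted at t=0 from the initial state probabilities, and all the numbers and function names are hypothetical. The forward pass is compared against a brute-force sum over every possible hidden state sequence, which should give the same probability.<br />

```python
from itertools import product


def forward_probability(initial, transition, emission, observations):
    # alpha[j] = probability of the outputs seen so far AND being in state j now.
    size = len(initial)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(size)]
    for symbol in observations[1:]:
        alpha = [sum(alpha[i] * transition[i][j] for i in range(size)) * emission[j][symbol]
                 for j in range(size)]
    return sum(alpha)


def brute_force_probability(initial, transition, emission, observations):
    # Enumerate every hidden state sequence and add up its joint probability.
    size = len(initial)
    total = 0.0
    for path in product(range(size), repeat=len(observations)):
        p = initial[path[0]] * emission[path[0]][observations[0]]
        for prev, cur, symbol in zip(path, path[1:], observations[1:]):
            p *= transition[prev][cur] * emission[cur][symbol]
        total += p
    return total


# Hypothetical model with 2 hidden states and 2 possible output symbols.
initial = [0.5, 0.5]
transition = [[0.7, 0.3], [0.4, 0.6]]   # a_ij
emission = [[0.9, 0.1], [0.2, 0.8]]     # b_i(symbol)
observations = [0, 1, 0]

fast = forward_probability(initial, transition, emission, observations)
slow = brute_force_probability(initial, transition, emission, observations)
print(fast, slow)
```

The brute-force version grows exponentially with the length of the output sequence, which is why the forward algorithm is preferred in practice.<br />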
<br />
Detailed sessions are available at- <a href="https://www.youtube.com/watch?v=E3qrns5f3Fw">https://www.youtube.com/watch?v=E3qrns5f3Fw</a><br />
Basic information about HMMs is available at- <a href="http://machinelearningstories.blogspot.in/2017/02/hidden-markov-model-session-1.html">http://machinelearningstories.blogspot.in/2017/02/hidden-markov-model-session-1.html</a><br />
<br />
</div>
Unknownnoreply@blogger.com3Bengaluru, Karnataka, India12.9715987 77.59456269999998312.4764182 76.949115699999979 13.4667792 78.240009699999987