Showing posts from July, 2017

Detecting outlying patterns in Categorical Variables

It has always been amazing to play with data, especially continuous data. However, it's a pain when I try to analyse the insights in categorical data. Of course, there are different ways to handle categorical variables, such as converting them to binary form, creating dummy columns for them, or factorizing them so that each category gets its own integer code. I wonder if it's only me, but these methods have never really helped me when I get real data with many categorical variables and many categories in each of them.

Real data at work mostly comes with categorical variables, sometimes even with more than 100 categories in a single variable.

For example, say we are given the problem of finding the most uncommon patterns in the categorical variables. Below is the data with countries and their country codes:

AFGHANISTAN    AFG
ALBANIA    ALB
ALGERIA    DZA
AMERICAN SAMOA    ASM
ANDORRA    AND
ANGOLA    AGO
ANGUILLA    AIA
BAHAMA...
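One simple baseline for "uncommon patterns" is frequency counting: map each row to a pattern, count how often each pattern occurs, and flag the rows whose pattern is rare. The sketch below is an illustration under assumptions, not necessarily the approach this post goes on to use. The pattern function `first_letter_matches`, the helper `rare_patterns`, and the `max_share` threshold are all hypothetical names chosen here, and only the first rows of the country table are used because the full list is truncated.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple

Row = Tuple[str, str]  # (country name, country code)

# First rows of the country table; the full list is truncated in the post.
ROWS: List[Row] = [
    ("AFGHANISTAN", "AFG"),
    ("ALBANIA", "ALB"),
    ("ALGERIA", "DZA"),
    ("AMERICAN SAMOA", "ASM"),
    ("ANDORRA", "AND"),
    ("ANGOLA", "AGO"),
    ("ANGUILLA", "AIA"),
]


def first_letter_matches(row: Row) -> bool:
    """Hypothetical pattern: does the code start with the country's first letter?"""
    country, code = row
    return country[:1] == code[:1]


def rare_patterns(
    rows: List[Row],
    pattern: Callable[[Row], Hashable],
    max_share: float = 0.2,
) -> List[Row]:
    """Return rows whose pattern value covers at most `max_share` of all rows."""
    # Count how many rows produce each pattern value.
    counts = Counter(pattern(r) for r in rows)
    total = len(rows)
    # Keep only rows whose pattern is rare relative to the whole dataset.
    return [r for r in rows if counts[pattern(r)] / total <= max_share]


if __name__ == "__main__":
    for country, code in rare_patterns(ROWS, first_letter_matches):
        print(country, code)
```

Here six of the seven codes start with the country's first letter, so only ALGERIA / DZA falls under the 20% threshold. Counting the raw (country, code) pairs instead would flag every row, since each pair occurs exactly once; with high-cardinality categorical data, the choice of pattern matters more than the counting itself.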