Saturday, 19 September 2015

Introduction to Predictive Analytics



by Bakhshash Kaur Saini, Mekete Berhanie & Nuwan Ramawickrama

If the words “Predictive Analytics” make you think of your local fortune teller, something has gone wrong somewhere. Predictive analytics is the process of analysing existing data to identify patterns and relationships that are likely to appear in the future. It is not fortune telling, but it can become a factor that decides the fortune of your business. With new trends such as the IoT (Internet of Things), big data and one-to-one marketing, the marketplace has changed drastically. Niche businesses such as Uber and Airbnb are booming because of the recent burst of technological growth. As technology advances every second and market conditions change quickly, knowledge about the future has become hugely valuable. Even though we would like to know the future, there is no proven way of looking into it. That is where predictive analytics comes into the picture, with its ability to predict market trends and patterns from existing data.

Predictive analytics is heavily based on applied statistics and mathematical modelling. The need for predictive modelling and analysis has made “Data Scientist” one of the most talked-about jobs of the 21st century. The HBR article “Data Scientist: The Sexiest Job of the 21st Century” by Thomas H. Davenport and D.J. Patil discusses how important this job can be and how data scientists are becoming key players in their organisations.
In one of his articles, the popular writer Bernard Marr describes the skill set needed to succeed in the data science field. The qualities are as follows.
  • Multidisciplinary. It is not only Maths PhDs who become successful; you can come from various other fields.
  • Business savvy. Regardless of your higher degree, you must understand business.
  • Analytical. Naturally analytical, able to spot patterns.
  • Good at visual communication. Able to produce the right graph at the right time.
  • Versed in computer science. People familiar with Hadoop, R, Java, Python, etc. are in high demand.
  • Creative. Creative enough to find answers to questions with the existing data.
  • Able to add significant value to data.
  • A storyteller. Able to create a story that adds value to the organisation.
With this in mind, anyone interested in a future in predictive analytics should develop the qualities above. To get you started, this blog contains a brief tutorial on R basics. The R basics section works through a case study on the social determinants of health and on finding value in existing data.


A bit about the Case study

SDOH: SOCIAL DETERMINANTS OF HEALTH

According to the WHO (World Health Organization), whether people die young or have poor health depends on where they live and what they do (WHO 2015). The WHO identified nine social determinants of health (SDOH) that affect people with low socioeconomic status, who have less access to them (WHO 2015). The nine social determinants of health are:

  • Social gradient: the mortality rate is higher for communities with poor socioeconomic status.
  • Stress at work or in life in general.
  • Early childhood development, including conditions such as alcohol or drug use during pregnancy.
  • Social exclusion, such as racism.
  • Unemployment.
  • Social support networks.
  • Addiction.
  • Availability of healthy food.
  • Availability of transportation.

SDOH DATA FROM ADELAIDE UNIVERSITY 

Socioeconomic data was extracted and transformed for predictive analytics on social health. The data can be acquired from Adelaide University at www.publichealth.gov.au (Adelaide 2015).
The data covers the demography and socioeconomic status of the Australian population across the states, broken down by LGA (local government area). The values are percentages (per 100 population), so they are on a common scale for statistical analysis.




R Basics

R is a versatile statistical computing package commonly used in predictive analytics. In this section we will look at a few basic functions that can get you started in R.

Uploading data into R

There are a few ways to load data. The most common and easiest way is to use the “read.csv” command. This is how it can be used:
Convert your data file into a “.csv” file.
Enter the following in your R console. Start by creating an object with any name (e.g. data1), followed by the file-loading command.

data1 <- read.csv(file.choose(), header = TRUE)

The “file.choose” command is one of the easiest ways to browse for the required file without typing the actual file path. It opens a new window in which you can find the file.
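If you already know where the file lives, you can pass the path to “read.csv” directly instead of browsing for it. The sketch below is only an illustration: the file is a temporary one created on the spot, and its column names and values are made up, not taken from the SDOH data set.

```r
# Hypothetical example file: write a tiny CSV to a temporary location so the
# snippet is self-contained. The values are invented for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("LGA,Obese,Smokers",
             "A,25.1,18.3",
             "B,30.4,22.0"),
           csv_path)

# Read the file by its path instead of browsing with file.choose().
# header = TRUE tells R that the first row holds the column names.
data_demo <- read.csv(csv_path, header = TRUE)
print(data_demo)
```

In your own work you would replace csv_path with the path to your file, for example a file in your working directory.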

Exploration

To start exploring we can generate a summary of the data set.

summary(data1)

A similar command, “head()”, shows the first 6 rows of the data set.

head(data1)

To find out what type of data your data set or a variable holds, you can run the following commands.

class(data1)
class(data1$Obese)   # replace Obese with your own variable name
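A few other base R functions are often handy at this stage. The sketch below uses a small made-up data frame (the name demo and its values are assumptions for illustration, not the real SDOH data) to show how to check the size, column names and column types in one go.

```r
# Small invented data frame standing in for the real data set.
demo <- data.frame(
  LGA     = c("A", "B", "C"),
  Obese   = c(25.1, 30.4, 27.8),
  Smokers = c(18.3, 22.0, 19.5)
)

dim(demo)            # number of rows and columns
names(demo)          # column names
str(demo)            # compact overview: type and first values of each column
sapply(demo, class)  # class of every column at once
```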





  Basic Visualisations with R



Scatter plot

To make a scatter plot you can use the plot command in the following manner.

plot(
  data1$Obese, data1$Smokers,   # x and y variables you are looking at
  xlab = "Obese People %",      # X axis label
  ylab = "Smokers %",           # Y axis label
  main = "Obesity vs Smoking"   # the main title, which appears at the top of the plot
)


To show the mean in your scatter plot
mean.ob <- mean(data1$Obese)   # calculates the mean of the required variable
plot(Obese ~ Full.time.Education.at.age.16, data = data1)   # creates a scatter plot
abline(h = mean.ob)   # draws a horizontal line at the mean


Linear regression
model1 <- lm(Obese ~ Full.time.Education.at.age.16, data = data1)   # creates a linear model
model1   # prints the model
abline(model1, col = "red")   # adds the regression line to the current scatter plot

Multiple Linear regression
model2 <- lm(Obese ~ Full.time.Education.at.age.16 + Unemployed, data = data1)   # two predictors
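Once a model is fitted, “summary()” and “coef()” let you inspect it. The sketch below is a rough illustration on simulated data: the column names match the ones used above, but the numbers are randomly generated under assumed coefficients and say nothing about the real SDOH data.

```r
# Simulated stand-in data; the relationship below is an assumption
# chosen only so that the example has something to estimate.
set.seed(1)
n <- 50
sim <- data.frame(
  Full.time.Education.at.age.16 = runif(n, 60, 90),
  Unemployed                    = runif(n, 3, 12)
)
sim$Obese <- 40 - 0.2 * sim$Full.time.Education.at.age.16 +
  0.5 * sim$Unemployed + rnorm(n, sd = 1)

# Fit the multiple linear regression, passing the data frame explicitly.
fit <- lm(Obese ~ Full.time.Education.at.age.16 + Unemployed, data = sim)

summary(fit)   # coefficients, standard errors, R-squared and more
coef(fit)      # just the estimated intercept and slopes
```

Passing data = ... to “lm()” keeps the formula short and avoids depending on variables that happen to exist in your workspace.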


3D Scatter plots
The following commands can be used to create 3D scatter plots.
install.packages("scatterplot3d")   # installs the required package (only needed once)
library(scatterplot3d)   # loads the package
scatterplot3d(data1[3:5])   # creates the 3D scatter plot from columns 3 to 5

Interactive 3D scatter plot

library("rgl")   # load the libraries, or install them first if you don't have them already
library("RColorBrewer")
plot3d(data1$Obese, data1$Smokers, data1$Unemployed, xlab = "Obesity", ylab = "Smokers", zlab = "Unemployed", col = brewer.pal(3, "Dark2"), size = 8)


Basic Predictions
You can make basic predictions using the linear model that you created earlier.
predict(model1)   # R predicts the outcomes for the existing data using the linear model
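Called without further arguments, “predict()” returns fitted values for the data the model was built on. To predict for new cases you can pass a data frame through the newdata argument. The sketch below uses simulated data and hypothetical new education values, so the numbers it prints are purely illustrative.

```r
# Simulated stand-in data under an assumed relationship (illustration only).
set.seed(2)
sim <- data.frame(Full.time.Education.at.age.16 = runif(40, 60, 90))
sim$Obese <- 45 - 0.25 * sim$Full.time.Education.at.age.16 + rnorm(40, sd = 1)

fit <- lm(Obese ~ Full.time.Education.at.age.16, data = sim)

# Hypothetical new areas; the column name must match the predictor in the model.
new_areas <- data.frame(Full.time.Education.at.age.16 = c(65, 75, 85))

# interval = "prediction" adds lower and upper bounds around each prediction.
pred <- predict(fit, newdata = new_areas, interval = "prediction")
print(pred)
```

The result is a matrix with the columns fit, lwr and upr, one row per new case.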
