by Bakhshash Kaur Saini, Mekete Berhanie & Nuwan Ramawickrama
If the words “Predictive Analytics” make you think of your local fortune
teller, something has gone wrong somewhere. Predictive analytics is the process
of analysing existing data to identify patterns and relationships that are
likely to occur in the future. It is not fortune telling, but it can become a
factor that decides the fortune of your business. With new trends such as the
IoT (Internet of Things), big data and one-to-one marketing, the marketplace
has changed drastically. Niche markets such as Uber and Airbnb are booming
because of the recent burst of technological growth. As technology advances
every second and the market changes quickly, knowledge about the future has
become hugely valuable. Even though we would like to know the future, there is
no proven way of looking into it. That is where predictive analytics comes into
the picture, with its ability to predict market trends and patterns using
existing data.
Predictive analytics is heavily based on applied statistics and mathematical
modelling. The need for predictive modelling and analysis has made “Data
Scientist” one of the most talked-about jobs of the 21st century. The HBR
article “Data Scientist: The Sexiest Job of the 21st Century” by Thomas H.
Davenport and D.J. Patil discusses how important this job can be, and describes
data scientists becoming key players in their organisations.
In one of his articles, the popular writer Bernard Marr discusses the skill
set one should possess to be successful in the data science field. The
qualities are as follows.
- Multidisciplinary. It is not only maths PhDs who become successful; you can come from various other fields.
- Business savvy. Regardless of your higher degree, you must understand business.
- Analytical. Naturally analytical, able to spot patterns.
- Good at visual communication. Able to produce the right graph at the right time.
- Versed in computer science. People familiar with Hadoop, R, Java, Python, etc. are in high demand.
- Creative. Creative enough to find answers to questions with the existing data.
- Able to add significant value to data.
- A storyteller. Able to create a story that adds value to the organisation.
Anyone interested in a future in predictive analytics should develop these
qualities. To get you started, this blog contains a brief tutorial on R basics.
The tutorial works through a case study on the social determinants of health
and on finding value in existing data.
A bit about the Case study
SDOH: SOCIAL DETERMINANTS OF HEALTH
According to the WHO (World Health Organization), people die young or have
poor health depending on where they live and what they do (WHO 2015). The WHO
identified nine social determinants of health (SDOH) that affect people of low
socioeconomic status, who have less access to these determinants (WHO 2015).
The nine social determinants of health are:
- Social gradient: the mortality rate is higher in communities with poor socioeconomic status.
- Stress at work or in life in general.
- Early childhood development, including conditions such as alcohol or drug use during pregnancy.
- Social exclusion, such as racism.
- Unemployment.
- Social support networks.
- Addiction.
- Availability of healthy food.
- Availability of transportation.
SDOH DATA FROM ADELAIDE UNIVERSITY
Socioeconomic data was extracted and transformed for predictive analytics on
social health. The data can be acquired from Adelaide University at
www.publichealth.gov.au (Adelaide 2015).
The data covers the demography and socioeconomic status of the Australian
population across the states, broken down by LGA (local government area). The
values are percentages of the population (per 100 people), so they are scaled
for statistical analysis.
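The post does not say how the values were scaled. One common option, assumed here purely for illustration, is z-score standardisation with R's built-in scale() function. The data frame below is invented toy data, not the real SDOH data.

```r
# Toy percentages, invented purely for illustration (not the real SDOH data).
toy <- data.frame(
  Obese      = c(22.5, 30.1, 27.4, 19.8, 25.0),
  Smokers    = c(15.2, 21.7, 18.9, 12.4, 16.8),
  Unemployed = c(5.1, 8.3, 6.7, 4.2, 5.9)
)

# scale() centres each column on its mean and divides by its standard deviation.
# This is an assumed scaling method; the original analysis may have used another.
toy_scaled <- as.data.frame(scale(toy))

# After scaling, every column has mean 0 and standard deviation 1.
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)
```

Standardising like this puts variables with different ranges on a comparable footing before modelling.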
R Basics
R is a versatile statistical computing package commonly used in predictive
analytics. In this section we will look at a few basic functions that can get
you started in R.
Uploading data into R
There are a few ways to load data into R. The most common and easiest is the
“read.csv” command. This is how it can be used:
Convert your data file into a “.csv” file.
Enter the following in your R console. Start by creating an object with any
name (e.g. data1), followed by the file loading command.
data1 <- read.csv(file.choose(), header = TRUE)
The “file.choose” command is an easy way to browse to the required file
without typing its actual path. It opens a new window in which you can find
the file.
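If you would rather not pick the file by hand each time, read.csv also accepts a file path directly. The sketch below writes a tiny made-up CSV to a temporary file so that it runs on its own; the file contents and the name data_example are hypothetical.

```r
# Write a tiny made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("LGA,Obese,Smokers",
             "Area A,22.5,15.2",
             "Area B,30.1,21.7",
             "Area C,27.4,18.9"),
           csv_path)

# Read it back by path instead of file.choose().
# header = TRUE tells R to use the first row as column names.
data_example <- read.csv(csv_path, header = TRUE)

dim(data_example)    # number of rows and columns
names(data_example)  # column names taken from the header row
```

Using an explicit path makes a script repeatable, while file.choose() is handy for one-off interactive work.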
Exploration
To start exploring we can generate a summary of the data
set.
summary(data1)
A similar command, “head()”, shows the first 6 rows of the data set.
head(data1)
To find out what type of data your variable has you can run
the following command
class(data1)
class(data1$Obese)   # replace Obese with your variable name
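A few other base R functions are useful at this stage. The sketch below uses an invented toy data frame (not the SDOH data) to show str(), checking the class of every column at once, and counting missing values.

```r
# Toy data frame, invented for illustration; one value is deliberately missing.
toy <- data.frame(
  LGA     = c("Area A", "Area B", "Area C", "Area D"),
  Obese   = c(22.5, 30.1, NA, 19.8),
  Smokers = c(15.2, 21.7, 18.9, 12.4),
  stringsAsFactors = FALSE
)

# str() gives a compact overview: dimensions, column types and first values.
str(toy)

# sapply() applies class() to every column in one go.
col_types <- sapply(toy, class)
col_types

# Count the missing values (NA) in each column.
missing_counts <- colSums(is.na(toy))
missing_counts
```

Checking types and missing values early helps avoid surprises later, for example mean() returning NA when a column has gaps.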
Basic Visualisations with R
Scatter plot
To make a scatter plot you can use the plot command in the
following manner.
plot(
  data1$Obese, data1$Smokers,   # x and y variables you are looking at
  xlab = "Obese People %",      # X axis label
  ylab = "Smokers %",           # Y axis label
  main = "Obesity vs Smoking"   # main title, shown at the top of the plot
)
To show the mean in your scatter plot
mean.ob <- mean(data1$Obese)   # calculates the mean of the required variable
plot(Obese ~ Full.time.Education.at.age.16, data = data1)   # creates a scatter plot
abline(h = mean.ob)   # draws a horizontal line at the mean
Linear regression
model1 <- lm(Obese ~ Full.time.Education.at.age.16, data = data1)   # creates a linear model
model1   # prints the model
abline(model1, col = "red")   # adds the regression line to the existing scatter plot
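Printing the model only shows the coefficients. To judge how well the line fits, summary() and coef() are the usual next step. The sketch below uses made-up numbers under the same column names, so the values themselves mean nothing.

```r
# Made-up values for illustration only; column names follow the tutorial.
toy <- data.frame(
  Full.time.Education.at.age.16 = c(60, 65, 70, 75, 80, 85),
  Obese                         = c(30.0, 28.5, 27.0, 24.0, 23.5, 21.0)
)

fit <- lm(Obese ~ Full.time.Education.at.age.16, data = toy)

# summary() reports coefficients, standard errors, p-values and R-squared.
summary(fit)

# Extract individual pieces for later use.
slope <- coef(fit)[["Full.time.Education.at.age.16"]]  # change in Obese per unit of education
r2    <- summary(fit)$r.squared                        # share of variance explained
```

A negative slope in this toy data simply reflects how the numbers were chosen; with real data the sign and size are what you would interpret.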
Multiple Linear regression
model2 <- lm(Obese ~ Full.time.Education.at.age.16 + Unemployed, data = data1)
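One way to check whether the extra predictor is worth having is to compare the two nested models with anova(). This is a sketch on invented data, not a result from the case study.

```r
# Invented values for illustration only; column names follow the tutorial.
toy <- data.frame(
  Full.time.Education.at.age.16 = c(60, 65, 70, 75, 80, 85, 72, 68),
  Unemployed                    = c(8.1, 7.4, 6.0, 5.2, 4.9, 3.8, 6.5, 7.9),
  Obese                         = c(30.2, 28.4, 27.1, 24.3, 23.6, 21.0, 26.0, 28.9)
)

fit_one <- lm(Obese ~ Full.time.Education.at.age.16, data = toy)
fit_two <- lm(Obese ~ Full.time.Education.at.age.16 + Unemployed, data = toy)

# anova() on nested models tests whether the added term reduces the residual error.
comparison <- anova(fit_one, fit_two)
comparison

# Adjusted R-squared penalises extra predictors, so it is fairer for comparing models.
adj_r2_one <- summary(fit_one)$adj.r.squared
adj_r2_two <- summary(fit_two)$adj.r.squared
```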
3D Scatter plots
The following commands can be used to create 3D scatter
plots
install.packages("scatterplot3d")   # installs the required package (only needed once)
library(scatterplot3d)              # loads the package
scatterplot3d(data1[, 3:5])         # creates the 3D scatter plot from columns 3 to 5
Interactive 3D scatter plot
library("rgl")            # load the packages; install them first if you don't have them already
library("RColorBrewer")
plot3d(data1$Obese, data1$Smokers, data1$Unemployed, xlab = "Obesity", ylab = "Smokers", col = brewer.pal(3, "Dark2"), size = 8)
Basic Predictions
You can make basic predictions using the linear model that you created
earlier.
predict(model1)   # R predicts the outcomes using the linear model
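Called with just the model, predict() returns fitted values for the rows the model was built on. To predict for new cases, pass a data frame through the newdata argument. The sketch below uses invented numbers, and new_areas is a hypothetical name.

```r
# Made-up values for illustration only; column names follow the tutorial.
toy <- data.frame(
  Full.time.Education.at.age.16 = c(60, 65, 70, 75, 80, 85),
  Obese                         = c(30.0, 28.5, 27.0, 24.0, 23.5, 21.0)
)
fit <- lm(Obese ~ Full.time.Education.at.age.16, data = toy)

# New cases must use the same predictor column name as the model formula.
new_areas <- data.frame(Full.time.Education.at.age.16 = c(62, 78))

# interval = "prediction" adds lower and upper bounds around each predicted value.
pred <- predict(fit, newdata = new_areas, interval = "prediction")
pred
```

The result has one row per new case, with the columns fit, lwr and upr.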






