By: Shehan Rodrigo, Piyumi Gunamuthu & Samson Akinyemi
(Master's students at Victoria University, Melbourne)
Data is everywhere, but we can
measure its value only by the knowledge we extract from it. Visual
tools and other analytical techniques support data exploration for meaningful
discovery of information. This blog post was written as part of an
assignment for the unit Predictive Analytics at Victoria University,
Melbourne. We would like to thank our lecturer, Dr. Shah Miah, who guided us
in understanding the value and scope of predictive analytics.
Data analytics creates opportunities
for businesses by providing deeper and more meaningful insights, which enable
them to strengthen their decision-making and improve the customer
experience. Today, however, most companies are
overwhelmed by the variety and volume of data they hold and struggle
to analyze, interpret and visualize it in a meaningful way. To overcome
these issues, organizations are moving towards predictive analytics, data
discovery methods and visualization tools.
To use these analytics and visualization tools properly,
it is important to identify the nature of the data and to feed it
to each tool in the format that tool requires.
What is Data Exploration?
Data exploration is
the first step in data analysis and essentially involves summarizing
the main characteristics of a data set. It is usually carried out
with visual analytics tools such as SAP Lumira, SAP Analytics Cloud and
Tableau, and can also be done with more advanced statistical software such as R.
These tools let users quickly view the most relevant features
of their data set. By displaying data graphically, for example through scatter
plots or bar charts, users can see whether two or more variables correlate and
decide whether they are good candidates for further in-depth analysis.
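As a rough illustration of what "summarizing the main characteristics" and "checking whether two variables correlate" can mean in practice, here is a minimal sketch in Python using only the standard library. It is not taken from any of the tools named above; the function names and the two toy columns are hypothetical and invented purely for illustration.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy columns, invented purely for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]

print(summarize(exam_score))
print(round(pearson(hours_studied, exam_score), 3))
```

A coefficient close to 1 or -1 suggests the pair is worth a closer look, which is the same judgement a scatter plot supports visually.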
What is R?
Next, we would like to discuss
some aspects of the data exploration tools mentioned above, starting with R. R is
an open source programming and scripting language
for predictive analytics and data visualization. It allows academic
statisticians and others with sophisticated programming skills to perform
complex statistical analysis and display the results in any of a multitude
of graphics. The name "R" is derived from the first letter of
the first names of its two developers, Ross Ihaka and Robert Gentleman, who were
associated with the University of Auckland in 1995. R
provides a wide variety of statistical and graphical techniques, such as linear
and nonlinear modelling, classical statistical tests, time-series analysis,
classification and clustering, and it is highly extensible. One of R’s
strengths is the ease with which well-designed publication-quality plots can be
produced, including mathematical symbols and formulae where needed.
Examples of Data Exploration Using Different Tools
Below is a screenshot from a
tutorial session in which we used the R programming language to generate a histogram
of airline delay data. The tutorial used Revolution R Enterprise, which adds parallel external memory
algorithms that compute results from data stored in the XDF file format. The R language itself includes
conditionals, loops, user-defined recursive functions and input and output
facilities. R has its own LaTeX-like documentation format, which is used to
supply comprehensive documentation, both online in a number of formats and in
hard copy. R also helps to move, manage, munge and merge data to retrieve useful
insights.
Source: https://campus.datacamp.com/courses/big-data-revolution-r-enterprise-tutorial/chapter-1-introduction-1?ex=6
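The tutorial itself was done in R. As a language-neutral illustration of the binning step that underlies any histogram, the sketch below uses Python's standard library. It is not the tutorial's code, and the delay values and the `histogram` helper are hypothetical, made up only to show the idea.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each fixed-width bin.

    Returns a dict mapping the lower edge of each bin to its count.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical arrival delays in minutes (negative = early); not real airline data.
delays = [-5, 0, 3, 12, 14, 17, 25, 31, 33, 48, 95]

for lower, count in histogram(delays, bin_width=15).items():
    print(f"{lower:>4} to {lower + 15:>4} min | {'#' * count}")
```

Each row of `#` characters plays the role of one bar, so long delays and unusual outliers stand out even in plain text.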
Similarly, other data visualization
tools provide further ways to explore data meaningfully.
Below is another example, in which we explored how refugees
were spread across the world in 2015. We used the SAP Analytics Cloud visualization
software to explore data obtained from the World Bank. We first
entered the data into an Excel sheet and then uploaded it to the online software.
This made it easy to identify the regions with the most refugees and to
rank the countries. Done manually, this would have taken hours.
With the visualization software, it was easy to explore the story
behind the data. At a glance, we can identify which country has the highest
refugee population and where it is located. This saves time and makes
analysis easier.
Data obtained from: data.worldbank.org/indicator/SM.POP.REFG.OR
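We did the ranking inside SAP Analytics Cloud, but the underlying operation is simple to sketch. The Python example below, using only the standard library, ranks countries by a refugee count read from CSV text. The column names, the `top_countries` helper and all figures are hypothetical placeholders, not real World Bank values and not the layout of the actual download.

```python
import csv
import io

# Hypothetical CSV layout and placeholder figures; not real World Bank values.
SAMPLE_CSV = """country,refugees_2015
Country A,120000
Country B,4500000
Country C,
Country D,870000
"""


def top_countries(csv_text, n):
    """Return the n (country, count) pairs with the largest counts, largest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["refugees_2015"].strip()
        if value:  # skip countries with no reported figure
            rows.append((row["country"], int(value)))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


for rank, (country, refugees) in enumerate(top_countries(SAMPLE_CSV, 3), start=1):
    print(rank, country, refugees)
```

Skipping empty cells before sorting matters here, because indicator files of this kind often have missing values for some countries and years.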
As another example, the visualization below was generated for a previous research
study. The study was conducted to
present data sets in a visualization report that gives a
clearer, simplified insight into illicit drug use in Australia, using the SAP BusinessObjects Lumira tool. We found the tool very useful,
with many options for analysing and visualizing data. Even a beginner
can easily produce outputs with it. The visualization below reflects the popularity of different types of drugs in
Australia across two different year combinations.
Data obtained from: https://www.aihw.gov.au/reports-statistics/behaviours-risk-factors/illicit-use-of-drugs/data
From all these examples, we can conclude that proper data exploration methods allow analysts and other interested parties to explore the stories behind data sets.
Impact of Data Exploration in Predictive Analytics
It is also important to
discuss the impact of data exploration on predictive analytics. The
quality of your data inputs determines the quality of your output. So, once you
have your business hypothesis ready, it makes sense to spend a lot
of time and effort here. According to researchers’ estimates, data
exploration, cleaning and preparation can take up to 70% of your
total project time. Analysts put so much effort into these tasks because
they lead to high-quality, accurate output. Organizations
allocate large budgets to these tasks because they expect a reliable analysis
from the process, which is important for making decisions and predicting the future of
the organization in order to achieve competitive advantage in the industry. Most
organizations rely on the outputs of this process to plan for the future and to set goals and
targets.
Challenges Related to Data Exploration
Furthermore, different studies have identified challenges
related to data exploration. User
interaction with a database at various levels of request is still a challenge.
Automatically steering the database with minimal query formulation is a
task that requires a high level of technical know-how, including the skilled use
of programming instructions and their integration.
Idreos et al. (2015), in their overview of
data exploration techniques, highlighted the challenge experienced by data
analysts at the user interaction layer, suggesting that we still lack
declarative “exploration” languages to present and reason about popular
navigational idioms. Such languages could facilitate custom optimizations, such
as user-driven prefetching, reusing past or in-progress query results and
customizing visualization tools. Other future directions include processing
past user interaction histories to predict exploration trajectories and
identify interesting exploration patterns. Similarly, at the database system
layer there are numerous opportunities to reconsider fundamental assumptions
about data and storage patterns and how they can be driven dynamically by high
level requests.
The overall aim of data
exploration is to achieve data navigation systems that automatically steer
users towards interesting data. Visualization tools should inherently support
exploration by helping to identify patterns that are driven entirely by the
exploration paths users take. Exploration methods should be able to
provide answers instantly, even if those answers are incomplete, and should
eventually lead users towards interesting stories.
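One simple way to picture "answers instantly even if they are not complete" is an estimate that is refined as more data is read. The Python sketch below is only an illustration of that idea under our own assumptions; it is not a method described by Idreos et al., and the `progressive_mean` helper and the sample values are hypothetical.

```python
def progressive_mean(values, chunk_size):
    """Yield (rows_seen, running_mean) after each chunk of the data.

    Early estimates are based on partial data; the last one is the exact mean.
    """
    total = 0.0
    seen = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        yield seen, total / seen


# Hypothetical measurements, invented for illustration.
data = [10, 12, 11, 30, 9, 13, 12, 11]

for seen, estimate in progressive_mean(data, chunk_size=3):
    print(f"after {seen} rows: mean is about {estimate:.2f}")
```

The user gets a first, rough answer after the first chunk and can decide whether the trend is interesting enough to wait for the complete result.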
Used Resources
Idreos, S., Papaemmanouil, O. and Chaudhuri, S., 2015, May. Overview of data exploration techniques. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 277-281). ACM.
Linoff, G.S. and Berry, M.J., 2011. Data mining techniques: for marketing, sales, and customer relationship management. John Wiley & Sons.
Matloff, N., 2011. The art of R programming: A tour of statistical software design. No Starch Press.
Ohlhorst, F.J., 2012. Big data analytics: turning big data into big money. John Wiley & Sons.