This guide introduces data visualization: why it is important, the types of visualizations, data cycles, and design principles. It also connects you with tools and learning resources.
First, data is everywhere. About 2.5 quintillion bytes of data are created every day! If you covered the earth in 2.5 quintillion pennies, they would wrap around it FIVE times over. That is a lot of data. And data output is growing. It is difficult to think of a professional industry that is not being impacted by data. Data also affects us personally: social media, applications, and Internet of Things (IoT) sensors create a deluge of data. Even government is data-driven, with "open data" and smart cities. Our world is moving from analog to digital at a rapid and increasing pace, and this "digital transformation" has accelerated especially in the last few years. So what is data visualization?
Data Visualization Definitions
Data visualization is the term for information collected and expressed in the form of an image, chart, graph, plot, cloud, tree, or other graphical means. - Sinclair Data Visualization
Data visualization is an interdisciplinary field that deals with the graphic representation of data. It is a particularly efficient way of communicating when the data is numerous as for example a time series. - Wikipedia
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. - Tableau.com
"Every two days now we create as much information as we did from the dawn of civilization up until 2003, according to Schmidt. That’s something like five exabytes of data. Let me repeat that: we create as much information in two days now as we did from the dawn of man through 2003." - Eric Schmidt, CEO and Chairman of the board of Google
"90% of all data has been created in the last two years." (Source: IBM)
Data viz is increasingly important, and that will not change as data keeps growing. This means we need to get a "handle" on data and how to use it. The world needs more people who can work with data (the current shortage of such people is increasing salaries).
Data viz helps to define outcomes in our personal lives, in public policy, and for organizational gains.
Data Cycles
Data is found or collected, then processed or analyzed to surface certain values, and finally displayed as visualizations. Even the simplest expression of data in a plain table is a visualization. Below are the technical stages of the data cycle associated with data wrangling.
Data Collection
Data collection is the process of gathering and curating data for use in decision-making, strategic planning, research, and other purposes. The first step in the data cycle is to find, collect, and curate the data.
Data Storage
Data storage refers to the use of recording media to retain data using computers or other devices. The most prevalent forms of data storage are file storage, block storage, and object storage, each being ideal for different purposes or types of data. Data can be stored in flat files or databases. Common types of data are unstructured (think social media), semi-structured (PDF files), and structured (relational databases).
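As a rough illustration of the difference between semi-structured and structured data, the sketch below loads a small JSON document (whose records do not all share the same fields) into a relational table with a fixed schema. JSON is used here only as a convenient stand-in for semi-structured data; the records, the table name `people`, and the column names are invented for this example, and an in-memory SQLite database is assumed purely for convenience.

```python
import json
import sqlite3

# Semi-structured: records in one JSON document need not share the same fields.
# These records are made up for illustration.
raw_json = '[{"name": "Ada", "city": "Dayton"}, {"name": "Grace", "tags": ["navy", "cobol"]}]'
records = json.loads(raw_json)

# Structured: a relational table with a fixed schema (in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL, city TEXT)")

# Fields that do not fit the schema (such as "tags") are left out,
# and missing fields (such as "city") are stored as NULL.
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [(r["name"], r.get("city")) for r in records],
)

rows = conn.execute("SELECT name, city FROM people ORDER BY name").fetchall()
conn.close()
print(rows)
```

The point of the sketch is the trade-off: the table is easier to query consistently, but anything that does not fit its columns has to be dropped or modeled separately.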
Data Cleaning (sometimes called cleansing)
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. Cleansing can be done manually with tools like Excel or Google Sheets, with programming languages (VBA, Perl, or Python), or through highly automated workflows using advanced commercial or open-source tools such as Alteryx, KNIME, Blue Prism, or UiPath. Some tools today blur the lines between cleansing data and transforming it.
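The cleaning steps described above can be sketched in a few lines of plain Python. This is a minimal example under stated assumptions, not a complete cleaning workflow: the function name `clean_rows`, the `name` and `email` fields, and the sample records are all hypothetical, and the email check is deliberately rough.

```python
def clean_rows(rows):
    """Return rows with formatting fixed, incomplete rows dropped, duplicates removed."""
    cleaned = []
    seen = set()
    for row in rows:
        # Fix formatting: trim whitespace and normalize letter case.
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        # Drop incomplete records (missing name or email).
        if not name or not email:
            continue
        # Drop incorrectly formatted emails (a very rough check, for illustration).
        if "@" not in email:
            continue
        # Drop duplicates, using the normalized email as the key.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Made-up sample data containing one duplicate, one incomplete row,
# and one badly formatted email.
raw = [
    {"name": "  ada lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": "Grace Hopper", "email": ""},
    {"name": "alan turing", "email": "alan.example.com"},
    {"name": "Katherine Johnson", "email": "kj@example.com"},
]
cleaned = clean_rows(raw)
print(cleaned)
```

Of the five sample rows, only two survive. In practice you would usually log or review the dropped rows rather than discard them silently.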
Data Analysis
Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. Analysis is a way to distinguish the "signal" from the "noise," like finding the proverbial needle in the haystack; such unusual values are commonly referred to as outliers or inliers. Some of the top tools for data analysis are R, SAS, and SPSS.
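To make the idea of separating signal from noise concrete, here is a small sketch that summarizes a list of numbers and flags outliers. It uses Python's standard `statistics` module rather than R, SAS, or SPSS, and it assumes one common rule of thumb (values more than 1.5 times the interquartile range beyond the quartiles); the guide itself does not prescribe a method, and the sample values are made up.

```python
import statistics


def find_outliers(values):
    """Flag values outside 1.5 * IQR of the quartiles (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Made-up daily measurements with one suspicious reading.
readings = [12, 14, 13, 15, 14, 13, 98, 12]

print("mean:  ", statistics.mean(readings))
print("median:", statistics.median(readings))
outliers = find_outliers(readings)
print("outliers:", outliers)
```

Note how the single extreme reading pulls the mean well above the median. That gap is often the first hint that a dataset contains outliers worth investigating.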
Communication, data visualization
Data visualization is the representation of data and information through charts and diagrams. Data viz can clearly communicate important information, and it is often a much faster way of sharing a message than describing everything in text and numbers. Data visualization helps us understand information faster.
"A picture says a thousand words." We’re familiar with that expression, and it also holds true when it comes to data. Data viz helps us to persuade, for example when communicating visually at a board meeting.
Data-Driven Decisions
Data-driven decision making (DDDM), sometimes called a decision management system (DMS), is the process of making organizational decisions based on actual data rather than on intuition, gut feeling, or observation alone.
Insights from data help us improve processes, for example by identifying patterns and relationships between operations and performance.
Ethical and Privacy Considerations
Digital ethics describes the moral principles governing behaviors and beliefs about how we use technology and data, in particular personal data. In short, data ethics means doing the right thing with data, especially with regard to the conduct of data collectors. Data ethics is of increasing relevance as the quantity of internet data increases exponentially. It is a very important area for consideration and solutions.
Data Storytelling
Data storytelling is the ability to effectively communicate insights from a dataset using narratives and visualizations. It can be used to put data insights into context for and inspire action from your audience. Data storytelling is an approach for communicating data-driven insights, and it involves using a combination of three key areas: data, visuals, and narrative.
Telling the story allows us to understand users, markets, and competition, for example by using marketing data to shape a strategy for increasing customer loyalty.
The role of data visualization in communicating the complex insights hidden inside data is vital. While analysts and other data roles have an eye for discovering the key insights from datasets, a top business stakeholder or an average consumer might not be able to do the same.
Data design is what makes effective communication possible in the last step of delivery. Communicating data effectively is an art coupled with some science. However, many data analysts lag in design skills when it comes to data visualization.
“Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.” - Edward R. Tufte, The Visual Display of Quantitative Information
Usefulness
The visualization provides logical information that the user can easily decipher, draw conclusions from, and act on. The data has no errors and is formatted appropriately.
Pay attention to basic elements
Choose the right charts
Remove unneeded elements (chart junk)
Give your chart a title
Label your chart elements (such as each axis)
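The checklist above can be illustrated with a very small chart builder. The sketch below draws a horizontal bar chart in plain text, with a title, a label on every bar, and nothing else (no chart junk). The function name `text_bar_chart`, the `#` bar character, and the visit counts are assumptions made for this example; a real project would more likely use a charting tool, but the same principles apply.

```python
def text_bar_chart(title, data, width=30):
    """Return a titled horizontal bar chart as text, scaled to the largest value.

    Assumes `data` maps labels to positive numbers.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = [title]  # give the chart a title
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        # Label every bar and show its value so nothing has to be guessed.
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Made-up data for illustration.
visits = {"Mon": 120, "Tue": 90, "Wed": 150, "Thu": 60}
chart = text_bar_chart("Library visits per day (made-up data)", visits)
print(chart)
```

A bar chart is chosen here because the data compares a few categories. A time series with many points would usually call for a line chart instead.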
Truthfulness/Intuitiveness
The visualization conveys the intended information and the chart is appropriate for the data.
Data
Data Quality
Link to the source of the data.
Aesthetics
Every aspect of the design enhances the message and guides the user's attention. Title, headings, and labels are present. The font, color, and style are memorable to the user. The visualization is clean, clear, concise, and shows attention to detail.
Simplify less important information
Use color carefully
Layout
Typography/Fonts
Interactivity
The visualization changes as the user interacts with the data, and the data is of interest. Interactivity can be provided through techniques such as dropdowns, sliders, markings, and filters.
Completeness / Timeliness
The visualization is submitted on time and contains all required elements. Include a textual description of the visualization in surrounding text, and include alt text for the visualization when it is in an image format.
All of these design elements can be checked via a data visualization rubric or scorecard. Here are a few suggestions:
Interpreting Data Visualizations
General Accessibility
Data visualization best practices and tools do not always discuss accessibility, which can exclude groups of people.
Color
Using Color for Design
Blogs