Four Mistakes To Avoid If You’re Analyzing Data

Analyzing and graphing data helps us understand our work in science, business, and everyday life. This post covers a few principles we think about as a startup. We made the graphs with Plotly's free web app. Contact us if you'd like to use Plotly Enterprise to power your graphing and collaboration.

Source: xkcd

1. Choose The Right Metrics

Teams must be able to tell apart two types of metrics: vanity metrics, which sound appealing but are ultimately irrelevant, and actionable metrics, which are relevant to the team's goals.

For example, a startup studying only the first graph below might conclude that things are going well. The second graph reveals that despite the increased traffic, only 1% of visitors are actually signing up. A better goal, then, is to increase not only the number of visitors who sign up but also the proportion of visitors who do.
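The difference between the two graphs comes down to one division. The sketch below uses hypothetical monthly numbers (not Plotly's data), chosen only to reproduce the pattern described: visitors grow while the signup rate stays at 1%. The function name `conversion_rates` is illustrative, not from any library.

```python
# Hypothetical monthly numbers, for illustration only:
# traffic grows month over month while 1% of visitors sign up.
visitors = [10_000, 15_000, 22_000]
signups = [100, 150, 220]

def conversion_rates(visitors, signups):
    """Return signups / visitors for each period (the actionable metric)."""
    return [s / v for v, s in zip(visitors, signups)]

for v, s, r in zip(visitors, signups, conversion_rates(visitors, signups)):
    # Raw visitor counts (a vanity metric on their own) next to the rate
    print(f"{v:>6} visitors  {s:>4} signups  {r:.1%} conversion")
```

Visitors more than double across the three months, yet the conversion column never moves, which is the signal a visitors-only graph hides.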

2. Correlation vs. Causation

As the comic at the beginning of this post notes, correlation does not imply causation. An increase in sales can't automatically be attributed to a new marketing strategy, just as cheese consumption can't be attributed to doctorates awarded. Any claim that a correlation reflects causation needs further testing, analysis, and study.

Cheese Consumption & Degrees Awarded
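One common reason unrelated series correlate is that both simply grow over time. The sketch below uses two hypothetical yearly series (not the actual cheese or degree figures) that share only an upward trend, and compares the Pearson correlation of their levels with the correlation of their year-over-year changes. Differencing is just one simple check we've chosen for illustration; a weak correlation afterward does not prove anything on its own, and a causal claim would still need experiments or controlled study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def first_differences(xs):
    """Year-over-year changes; turns a shared linear trend into a constant."""
    return [b - a for a, b in zip(xs, xs[1:])]

# Hypothetical yearly series (not real data): both rise steadily,
# each with its own unrelated year-to-year wiggle.
series_a = [1, 1, 5, 5, 9, 9, 13, 13]
series_b = [1, 3, 3, 5, 9, 11, 11, 13]

print(f"levels:      r = {pearson(series_a, series_b):.2f}")
print(f"differences: r = {pearson(first_differences(series_a), first_differences(series_b)):.2f}")
```

With these numbers the levels correlate at about 0.95, while the yearly changes correlate at about -0.26: most of the apparent relationship came from the shared trend.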

3. Ignoring the Tail

Looking at the "top 10" for a metric is natural, but it can be misleading when the "other" category is much larger than the top categories. For example, consider the next two graphs. Most of the traffic comes from smaller contributors in the "other" category, yet someone looking at the first graph might focus only on Facebook and Twitter. For more on this kind of distribution, see heavy-tailed distribution.

Top 4 Traffic Sources for the Month of May
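A simple safeguard is to never drop the remainder when building a "top N" view: sum everything outside the top N into an explicit "Other" row. The sketch below assumes a plain dictionary of visits per referrer; apart from Facebook and Twitter, the source names and all counts are hypothetical, and `top_n_with_other` is an illustrative helper, not a library function.

```python
def top_n_with_other(counts, n):
    """Return the n largest (label, count) pairs plus an "Other" row for the rest."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    other = sum(v for _, v in ranked[n:])
    return top + [("Other", other)]

# Hypothetical May traffic: four large referrers plus a long tail
# of 40 small sites sending 100 visits each.
visits = {"Facebook": 1200, "Twitter": 900, "Google": 400, "Reddit": 300}
visits.update({f"site{i}": 100 for i in range(40)})

summary = top_n_with_other(visits, 4)
total = sum(v for _, v in summary)
for label, v in summary:
    print(f"{label:10s} {v:6d} {v / total:6.1%}")
```

Here the "Other" row carries roughly 59% of all visits, more than the top four combined, which a chart of only the top four would never show.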

4. Avoid Averages

Focusing too much on averages can be misleading because an average does not show how the data is dispersed. For example, say our analytics report an "Average Time Spent" on the site of 1 minute and 33 seconds. Yet graphing all the times spent on the site yields this:

Number of Seconds Spent on Site by Users

The two things to notice are (1) that many users leave the site in under 10 seconds, and (2) that a portion of them stay between 181 and 1800 seconds. Neither group is anywhere near 93 seconds, so the average does not explain how users are interacting with the site. Pro tip: look at a histogram or a boxplot to get a better feel for a distribution.
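To see how an average can describe nobody, here is a sketch with hypothetical session data (not real analytics) built to match the scenario above: 80 users leave after 5 seconds and 20 stay for 445 seconds, which averages to exactly 93 seconds (1 minute 33 seconds). It prints the mean, the median, and simple histogram-style bucket counts using the same cutoffs as the text; the helper `bucket_counts` and the `EDGES` values are our own illustrative choices.

```python
from statistics import mean, median

# Bucket edges in seconds: [0, 10), [10, 181), [181, 1801),
# i.e. "under 10 s", "10-180 s", and "181-1800 s".
EDGES = [0, 10, 181, 1801]

def bucket_counts(values, edges):
    """Count values falling in each half-open bucket [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical sessions: 80 quick exits and 20 long visits.
times = [5] * 80 + [445] * 20

print("mean:", mean(times), "s")
print("median:", median(times), "s")
print("bucket counts:", bucket_counts(times, EDGES))
```

The mean is 93 seconds and the median is 5 seconds, yet no session lasts anywhere near 93 seconds; only the bucket counts reveal the two distinct groups of users.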

Analyzing data is not easy. We hope this post helps. Has your team made or avoided any of these mistakes? Do you have suggestions for a future post? Let us know; we’re @plotlygraphs, or email us at feedback at plot dot ly.
