Bar Plot and Histogram Made Simple

Bar plot displaying categorical data. Each bar represents a specific category or group, with the height or length of the bar indicating the value or frequency associated with that category.

Bar plots and histograms both use bars to visualize data, but they suit different kinds of data. Here, we discuss how and when to use each of them.

What is a Bar Plot?

A bar plot, or a bar chart, is a graphical representation that uses rectangular bars to display categorical data.

In a bar plot of a single variable, each bar represents a different category (subgroup) of that variable.

For example, an ice cream company might have sales data for different flavors of ice cream, with each bar representing one flavor.

Bar plot depicting ice cream sales data by flavor: vanilla, strawberry, and chocolate. Strawberry flavor shows the highest sales.

As the figure above shows, the height of each bar represents how much of that flavor was sold. At a glance, we can tell that strawberry ice cream sold the most.

Let’s Talk About Pie Charts

Pie charts represent data as proportions of a whole, but in some cases they are a poor way to visualize the data.

Let’s take the ice cream sales example above, but this time, what if the sales for each flavor are similar in proportion?

Pie chart illustrating ice cream sales data by flavor. The vanilla, chocolate, and strawberry slices look nearly equal in size, suggesting a similar proportion of sales for each flavor.

A pie chart like this makes the slices look as if they are all the same size. What happens if we visualize the same data with a bar plot instead?

A bar plot of the same data lets us see instantly that vanilla has the most sales, and chocolate has slightly more sales than strawberry.

Bar chart representing ice cream sales data for vanilla, chocolate, and strawberry flavors. Vanilla flavor demonstrates the highest sales.
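To try this comparison yourself, here is a minimal sketch in R. The sales figures are hypothetical values chosen to be close together; they are not the data behind the figures above. `prop.table()` gives each flavor's share of the total, which is what the pie slices encode. The bar plot shows the same numbers as heights on a common baseline.

```r
# Hypothetical sales figures chosen to be close together (not the post's data)
sales <- c(vanilla = 36, chocolate = 33, strawberry = 31)

# Percentage share of total sales per flavor: what the pie slices represent
round(100 * prop.table(sales), 1)

op <- par(mfrow = c(1, 2))                 # draw the two charts side by side
pie(sales, main = 'Pie Chart')             # slice labels come from the names
barplot(sales, col = rainbow(3), main = 'Bar Plot')
par(op)                                    # restore the previous layout
```

With values this close, the ordering is hard to judge from the slices. It is much easier to read from the bar heights.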

How to make a Bar Plot in R?

A bar plot can be built easily in R with the built-in barplot function. Suppose I have this data frame of ice cream sales by flavor:

> df
      Flavor Sales
1    vanilla    37
2  chocolate     9
3 strawberry    28
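The post doesn't show how `df` was created. Here is a minimal sketch that rebuilds it from the values printed above. It assumes a plain `data.frame` typed in by hand; the original may well have been loaded from a file.

```r
# Rebuild the ice cream sales data frame from the values printed above
df <- data.frame(
  Flavor = c('vanilla', 'chocolate', 'strawberry'),
  Sales  = c(37, 9, 28)
)
df
```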

Now, to build the graph, we just call the function:

# One bar per flavor; names.arg labels the bars, rainbow(3) gives one colour each
barplot(height = df$Sales, names.arg = df$Flavor, col = rainbow(3),
        main = 'Number of Ice Cream Sold by Flavor',
        xlab = 'Ice Cream Flavors',
        ylab = 'Total Sale of Ice Cream')

Bar chart displaying ice cream sales data for vanilla, chocolate, and strawberry flavors. Vanilla has the highest sales, while chocolate has the lowest sales.

What is a Histogram?

A histogram also uses bars, but it represents continuous data rather than categories, and it builds its bars differently. It is created by dividing the range of values into intervals, called bins, and then counting the number of data points that fall into each bin.
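The binning step can be sketched by hand with `cut()` and `table()`. The values in `x` and the 5-unit break points are hypothetical. `right = FALSE` makes each bin include its left edge, so the bins are [0, 5), [5, 10), and so on. Passing the same breaks to `hist()` with `plot = FALSE` should give the same counts without drawing anything.

```r
# Hypothetical values, for illustration only
x <- c(2, 7, 8, 12, 13, 14, 21)

# Break points define the bins: [0,5), [5,10), [10,15), [15,20), [20,25)
breaks <- seq(0, 25, by = 5)

# Assign each value to a bin, then count how many values land in each bin
bins <- cut(x, breaks = breaks, right = FALSE)
counts <- table(bins)
counts

# hist() performs the same counting; plot = FALSE returns the result only
h <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)
h$counts
```

Empty bins, like [15, 20) here, still appear with a count of zero. In the plot they show up as gaps.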

Age, for example, can be grouped into bins, which lets us visualize the distribution of the data.

The figure below shows ages from 1 to 80, and each bar counts the values that fall within its range. For example, the first bar covers all values greater than or equal to 0 but less than 5.

Histogram depicting the distribution of age from 1 to 80 years old.

Histograms are also good at revealing the skewness of the data, as the following example shows.

For example, people in different jobs may have different levels of physical activity, and there may be more people who do not walk much than people who are very active.

Histogram displaying the distribution of the number of steps walked per day. The histogram reveals that fewer individuals tend to walk higher numbers of steps per day.

We can observe that more people have a sedentary lifestyle: as the number of steps increases, fewer people fall into each bin.

This data shows a right skew, meaning the distribution has a long tail on the right side. A left-skewed distribution, by contrast, has its tail on the left side.
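The direction of skew can also be checked numerically with the sample skewness (the third standardized moment), which is positive for a right-skewed distribution. In a right-skewed distribution the long tail typically pulls the mean above the median. The sketch below uses simulated step counts drawn from an exponential distribution. This is hypothetical data, not the smart-device data from my analysis. `sample_skewness()` is a hypothetical helper written for this example, not a base R function.

```r
# Hypothetical data: simulated daily step counts with a long right tail
set.seed(1)
steps <- round(rexp(1000, rate = 1 / 5000))

# Hypothetical helper: moment-based sample skewness (third standardized moment)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

sample_skewness(steps)                         # positive here: right skew
c(mean = mean(steps), median = median(steps))  # the tail pulls the mean up

hist(steps, col = 'steelblue', xlab = 'Steps per Day',
     main = 'Simulated Daily Steps (Right-Skewed)')
```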

In my Google Data Analytics post, I did a more in-depth analysis of the physical activity of people who use smart devices.

How to make Histograms in R?

To build a histogram in R, we can use the built-in hist function. Let’s use the following data as an example:

age <- sample(1:80, replace = TRUE)  # 80 random ages from 1 to 80 (values differ on every run)

Output:

> age
[1] 30 40 31 22 71 26 70  3 66 63 57 49 61 23  5 62 19 42 54 12 33 68 72  4 11 79 47 49 28 69 53 77 40  5 68 27 20 59 31 25 53 25 52 14 29 57 79 11 53 67 59  9  6 59 10
[56] 54 69 55 12 11 29 38 72  2 43 47 50 55 75  3 75 17 70 45 39 63  8 76  2 60

# Bins are chosen automatically; only as many colours are used as there are bars
hist(age, col = rainbow(20),
     xlab = 'Age',
     main = 'Histogram of Age')

Histogram depicting the distribution of age from 1 to 80 years old.
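By default, `hist()` chooses the bin breaks automatically (starting from Sturges' rule). Its default `right = TRUE` makes each bin right-closed, (a, b]. The breaks and interval closure can be set explicitly to get 5-year bins where the first bar counts ages greater than or equal to 0 but less than 5, as described earlier. Here is a sketch that regenerates random ages the same way as above:

```r
# 80 random ages from 1 to 80, as in the example above (values change each run)
age <- sample(1:80, replace = TRUE)

# 5-year bins from 0 to 80. right = FALSE makes each bin [a, b), so the first
# bin holds ages >= 0 and < 5; the last bin is [75, 80] because include.lowest
# defaults to TRUE
h <- hist(age, breaks = seq(0, 80, by = 5), right = FALSE, plot = FALSE)

# Inspect each bin's edges and count without drawing anything
data.frame(from = head(h$breaks, -1), to = h$breaks[-1], count = h$counts)

# Draw it: 16 bins, so 16 colours
plot(h, col = rainbow(16), xlab = 'Age', main = 'Histogram of Age (5-Year Bins)')
```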

Differences between Bar Plot and Histogram

Now that we have seen what each type of visualization is for, and what precautions to take when using pie charts, let us discuss the differences between the two visualization methods.

Bar Plot

  • Used for categorical data with different groups
  • Each bar represents a value for its group, such as its number of occurrences or its total
  • For examining patterns and relationships among categorical variables
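When the data are raw observations rather than pre-summed totals, `table()` can count the occurrences of each group, and the result can go straight to `barplot()`. Here is a sketch with hypothetical sales records, one entry per ice cream sold:

```r
# Hypothetical raw records: one entry per ice cream sold
sold <- c('vanilla', 'chocolate', 'vanilla', 'strawberry', 'vanilla', 'strawberry')

# Count occurrences of each flavor (groups are sorted alphabetically)
counts <- table(sold)
counts

# One bar per flavor, with height equal to its count
barplot(counts, col = rainbow(length(counts)),
        xlab = 'Ice Cream Flavors', ylab = 'Number Sold')
```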

Histogram

  • Used for continuous data
  • Each data point falls into the bin whose range contains it
  • For examining the density of data over a continuous range
  • For detecting skewness in continuous data
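To see the contrast in one place, the sketch below draws both chart types side by side. The left panel is a bar plot of the ice cream sales from the bar plot example, with one bar per category. `barplot()` leaves gaps between bars by default. The right panel is a histogram of random ages, with one bar per bin. Its bars touch because the bins cover a continuous range. The ages are regenerated randomly, so the histogram will differ slightly on each run.

```r
# Categorical data: the ice cream sales from the bar plot example
df <- data.frame(Flavor = c('vanilla', 'chocolate', 'strawberry'),
                 Sales  = c(37, 9, 28))

# Continuous data: 80 random ages, as in the histogram example
age <- sample(1:80, replace = TRUE)

op <- par(mfrow = c(1, 2))   # two plots side by side
barplot(df$Sales, names.arg = df$Flavor, col = rainbow(3),
        main = 'Bar Plot', ylab = 'Total Sales')     # separate bars, one per flavor
hist(age, col = 'grey', main = 'Histogram', xlab = 'Age')  # touching bars, one per bin
par(op)                      # restore the previous layout
```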