In this data visualization project, we focus on analyzing the sales data of Super Store, a renowned retail outlet in the United States. The dataset includes comprehensive information about sales, profit, and regional statistics for a diverse selection of over 1,000 products. Our objective is to uncover significant insights from the data through visualization and analysis, using the tools Tableau and RStudio for efficient data processing.
By employing various visualization techniques such as line charts, bar graphs, and interactive dashboards, we aim to reveal sales trends, product performance, and opportunities for Super Store to grow in a competitive market. Our primary goals include identifying profitable product categories, assessing the impact of discounts on profitability, and understanding sales patterns across different states and product segments. Through this data-driven approach, we intend to provide stakeholders with actionable insights that will optimize sales strategies, enhance decision-making, and maximize profits for Super Store.
Goal: To find various supermarket statistics such as sales, profit, and discount patterns by state, category, segment, and shipment mode.
The research is based on Kaggle's Superstore dataset, which records profit and loss alongside order ID, shipment mode, delivery status, online transactions, and store descriptions. The dataset consists of 9,994 rows and 21 variables, covering 24 states and cities. To visualize the data effectively, we used maps, bar charts, line plots, scatterplots, circular plots, violin plots, and mosaic plots. After cleaning the data, we focused on sales, profit, segment, shipment mode, and product descriptions across US cities to derive insights.
Before beginning the exploratory analysis, we cleaned the data to protect the dataset's integrity, addressing missing values and removing duplicate records.
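The cleaning step can be sketched in base R as follows. This is a minimal illustration on a toy data frame, not the project's actual cleaning script: the column names and values are placeholders, and dropping incomplete rows is only one possible way to address missing data.

```r
# Toy stand-in for the Superstore table (hypothetical values, not real data).
store_data <- data.frame(
  Order.ID = c("A-1", "A-2", "A-2", "A-3"),
  Sales    = c(100, 250, 250, NA),
  Profit   = c(20, -15, -15, 5),
  stringsAsFactors = FALSE
)

# Count missing values per column before deciding how to handle them.
missing_per_column <- colSums(is.na(store_data))

# Drop exact duplicate rows, then drop rows that still contain a missing value.
deduplicated <- store_data[!duplicated(store_data), ]
clean_data   <- deduplicated[complete.cases(deduplicated), ]

print(missing_per_column)
print(nrow(clean_data))
```

Checking the missing-value counts first makes it easier to decide whether dropping rows is acceptable or whether a column needs a different treatment.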
The initial exploratory visualizations focused on how the variables interact. We started with the relationship between states and profit. The bar graph plots profit for each US state, giving an overview of the state-wise profit distribution.
The exploratory bar graph reveals that Indiana has the highest profit, while Ohio has the lowest profit among the states in the United States.
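The ranking behind a chart like this can be reproduced with a simple aggregation. The sketch below assumes a `State` column and uses made-up placeholder states and values, so it shows the technique only and does not reproduce the result reported above.

```r
# Hypothetical rows; states "A", "B", "C" and their profits are placeholders.
store_data <- data.frame(
  State  = c("A", "B", "A", "C", "B"),
  Profit = c(120, -80, 60, 10, -30),
  stringsAsFactors = FALSE
)

# Sum profit per state, then sort from most to least profitable.
profit_by_state <- aggregate(Profit ~ State, data = store_data, FUN = sum)
profit_by_state <- profit_by_state[order(-profit_by_state$Profit), ]

top_state    <- profit_by_state$State[1]
bottom_state <- profit_by_state$State[nrow(profit_by_state)]

print(profit_by_state)
```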
The Tree Map visualization displays the relationship between sales and product sub-categories, enabling us to assess demand and purchase frequency per product. "Phones" emerge as the top-selling sub-category, while "Fasteners" have the lowest sales of all.
The Cartogram visualization provides an intuitive representation of aggregate sales by state, highlighting sales disparities between regions. The geographic map groups states together and displays the total sales in each state over time. A graduated color scale, running from peach to red, portrays the distribution of sales from low to high. While color is not the best attribute for direct comparison, it conveys a sense of scale and helps in reading sales patterns. Annotations are also available to help the audience locate specific sales figures.
The line chart shows sales by shipment date for each product category, with data points connected to show the rate of change from the previous year. Office supplies record their highest sales in December (Q4) and decline in January (Q1). Furniture and technology sales likewise peak in December and drop in January. This trend may be attributed to the discounts and promotions available towards the end of the year. Connecting the data points makes these seasonal sales patterns easy to compare across product categories.
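A seasonal pattern of this kind can be checked numerically by totalling sales per category and month. The sketch below is an assumption-laden illustration: the `Ship.Date` column name and all values are hypothetical, chosen only to show the grouping step.

```r
# Hypothetical shipments; dates and sales figures are placeholders.
store_data <- data.frame(
  Ship.Date = as.Date(c("2017-01-10", "2017-12-05", "2017-12-20",
                        "2017-01-15", "2017-12-11")),
  Category  = c("Furniture", "Furniture", "Technology",
                "Technology", "Technology"),
  Sales     = c(50, 200, 300, 80, 120),
  stringsAsFactors = FALSE
)

# Extract the two-digit month from each shipment date.
store_data$Month <- format(store_data$Ship.Date, "%m")

# Total sales for every category and month combination.
monthly_sales <- aggregate(Sales ~ Category + Month, data = store_data, FUN = sum)

print(monthly_sales)
```

Grouping by month alone pools all years together, which suits a seasonal comparison; adding a year column to the formula would keep the years separate.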
The data set includes numerous states for analysis, as shown in the previous visualizations. Our exploration covered sales, profit, discount, and shipment methods. After analyzing sales by state, we extended the investigation to sales across other categories. Each team member explored different variables, and we then regrouped to synthesize our findings. This approach let us examine more aspects of the data and made the visualization process more comprehensive.
This Tableau-generated visualization compares sales and profit across segment groups. The line graph shows quarterly sales differences, while the bar chart shows profit trends over time in the furniture, office supplies, and technology subcategory groups. The grouped bar chart displays the combined sales of these subcategories by profit level, using a color scheme to encode the ordered groups. Together, the views let the reader examine sales and profit dynamics across the subcategory groups.
The visualization compares sales and profit by shipment mode. Using ggplot in RStudio, the scatterplot draws one point per record, colored by shipment mode, to show the relationship between sales and profit. Standard Class stands out as accounting for more of both the profits and the losses than the other modes, though its profit range is not significantly higher. The plot gives insight into how sales and profit relate under each shipment mode, which can inform decision-making and strategy for Super Store.
library(ggplot2)  # assumes ggplot2 is installed and store_data is already loaded
ggplot(data = store_data, aes(x = Sales, y = Profit, color = Ship.Mode)) + geom_point()
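To put numbers next to the scatterplot, the spread of profit within each shipment mode can be summarized directly. This is a sketch on hypothetical rows, not the project's data; the `Orders`, `Min.Profit`, and `Max.Profit` column names are introduced here for illustration.

```r
# Hypothetical records; shipment modes are real labels, values are placeholders.
store_data <- data.frame(
  Ship.Mode = c("Standard Class", "Standard Class", "Standard Class",
                "First Class", "First Class", "Same Day"),
  Sales     = c(100, 400, 900, 200, 500, 300),
  Profit    = c(10, -120, 250, 30, -20, 15),
  stringsAsFactors = FALSE
)

# For each shipment mode, count records and record the lowest and highest profit.
profit_summary <- do.call(rbind, lapply(
  split(store_data, store_data$Ship.Mode),
  function(rows) {
    data.frame(
      Ship.Mode  = rows$Ship.Mode[1],
      Orders     = nrow(rows),
      Min.Profit = min(rows$Profit),
      Max.Profit = max(rows$Profit)
    )
  }
))

print(profit_summary)
```

A mode with many more records will naturally show more extreme points, so comparing the record count alongside the range helps avoid over-reading the scatterplot.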
The visualization shows how shipment mode relates to sales at each quantity of products sold. Using ggplot in RStudio, the bar graph places the quantity of products on the x-axis and the corresponding sales on the y-axis, with each bar color-coded by shipment mode. Notably, Standard Class has driven the highest sales among all modes.
ggplot(data = store_data, aes(x = Quantity, y = Sales, fill = Ship.Mode)) + geom_col()
The visualization reveals a clear negative relationship between discounts and profitability: as discounts increase, profitability decreases. Products sold without a discount show a wide range of profits, while higher discounts are associated with more losses and less profit.
ggplot(data = store_data, aes(x = Discount, y = Profit, fill = Ship.Mode)) + geom_col()
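The discount effect described above can also be checked with a small numeric summary. The sketch below bins discounts into bands and averages profit per band; the band boundaries, the `Discount.Band` column, and all values are assumptions made for illustration, not figures from the dataset.

```r
# Hypothetical rows; discounts are fractions (0.2 means 20% off).
store_data <- data.frame(
  Discount = c(0, 0, 0.2, 0.2, 0.5, 0.8),
  Profit   = c(40, 10, 5, -5, -30, -60)
)

# Bin discounts into three illustrative bands (intervals are right-closed).
store_data$Discount.Band <- cut(
  store_data$Discount,
  breaks = c(-Inf, 0, 0.3, 1),
  labels = c("none", "up to 30%", "above 30%")
)

# Average profit within each band.
mean_profit <- aggregate(Profit ~ Discount.Band, data = store_data, FUN = mean)

# Pearson correlation between discount and profit across all rows.
discount_profit_cor <- cor(store_data$Discount, store_data$Profit)

print(mean_profit)
print(discount_profit_cor)
```

A negative correlation together with falling band averages would support the reading of the chart; the correlation alone can hide a threshold effect that the bands make visible.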
The visualization highlights the distribution of sales across product categories. Technology emerges as the top-selling category, followed by Furniture and Office Supplies. Notably, the West and East regions account for the majority of sales.
ggplot(data = store_data, aes(x = Category, y = Sales, fill = Region)) + geom_col()
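The stacked bars correspond to a category-by-region table of summed sales, which can be built directly to read off totals and shares. This is a sketch with placeholder values, so the numbers it prints say nothing about the real dataset.

```r
# Hypothetical rows; categories and regions are real labels, sales are placeholders.
store_data <- data.frame(
  Category = c("Technology", "Technology", "Furniture",
               "Furniture", "Office Supplies"),
  Region   = c("West", "East", "West", "South", "East"),
  Sales    = c(300, 200, 150, 50, 100),
  stringsAsFactors = FALSE
)

# Cross-tabulate summed sales: one row per category, one column per region.
sales_table <- xtabs(Sales ~ Category + Region, data = store_data)

# Row sums give total sales per category (the full height of each bar).
category_totals <- rowSums(sales_table)

# Column sums, rescaled to fractions, give each region's share of all sales.
region_share <- prop.table(colSums(sales_table))

print(sales_table)
print(round(region_share, 3))
```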
The visualization indicates that the Furniture category incurs more losses than the Technology and Office Supplies categories. Profitability in the Furniture category also varies across a wide range, mirroring its sales pattern from low to high.
ggplot(data = store_data, aes(x = Category, y = Profit, fill = Region)) + geom_col()
"Thanks for diving into this case study! I would love to hear your thoughts and feedback. Feel free to reach out using the contact form below. Let's keep the conversation flowing!"