How to Interpret and Understand DataHawk's Sales Estimator Tool

Take an insider's look at the new and improved Amazon sales estimator, from its inception to its latest version, released in September 2021. After reading this article, you will understand how feature building works at DataHawk and how to use the sales estimator in the application. Last but not least, you will see how our sales estimator compares with competing tools in our own error-rate benchmark.


1. What is the DataHawk Sales Estimator?

The DataHawk Sales Estimator gives you an estimate of the units sold and of the sales made (in your marketplace's currency) during the last 30 days for a given product.

The Sales Estimator displays a historical view of daily sales estimations. Both of these features are available as soon as you start tracking a product on the DataHawk platform.

2. Why DataHawk Created the Sales Estimator Tool

At DataHawk, we excel at large-scale product data retrieval. Data, pulled daily from Amazon's product pages, include the name of the product, seller, best seller rank (referred to as BSR), reviews, ratings, and so forth. You name it, we extract it.

Producing beautiful display charts of historical data is another of our many talents. To support our growth, we're continuing to bring new members of the data team on board. This includes analysts, scientists, and machine learning experts. Our capacity to scale and leverage Big Data tables for your benefit is also growing exponentially. One of the ways to leverage big data is by creating homemade KPIs (Key Performance Indicators) based on the data we fetch and process. This is how sales estimates are determined.

Since our first sales estimator tool release in 2018, the goal has remained the same: deliver estimated product sales figures using only public data.

The 2018 DataHawk Sales Estimator was our first iteration. At that time, we pulled data from historical customer reviews. If a product had 20 additional reviews each day, we would consider these reviews as a percentage of all the sales made. This simple model worked well for some marketplaces and categories, but less so for others. Our sales estimates weren't as trustworthy as we wanted. In order to improve the accuracy of sales estimate results, we launched a renewal mission: EMS 2.0 (Estimated Monthly Sales 2.0) in November 2020.

3. November 2020 to January 2021: The research phase

The first question we asked ourselves was: Which data point, available for every product on Amazon, is most closely correlated to sales?

The obvious answer was the Best Seller Rank.

The vast majority of product listings on Amazon display two different bestseller ranks:

  • the rank of a product in its root category (e.g. Toys & Games - #15,710)
  • the rank of a product in the category for which it ranks the best (e.g. Toys & Games > Kids' Electronics > MP3 Players - #13)

As one can expect, the best seller rank of a product in a category is highly correlated with its sales. DataHawk has an extensive repository of historized BSR data because of the hundreds of thousands of products tracked using our platform.

The research phase was documented using Google Sheets. We used a root category in the US marketplace for which we have a lot of data and tried to create a model linking the median of ranks for all the different quantities sold within a month. Here is what our first models looked like:

[Figure: Sales estimator tool, how best seller rank is calculated]

The resulting insights and iterations of the research phase are as follows:

  • Product sales estimates have to be on a Parent level, not on a Child level. For the vast majority of products, sales ranks are given at a parent level (every child has the same sales rank) and we have no solution (right now) to divide the estimated sales between the different child ASINs of a product. Example: if a product has 1,000 USD of estimated sales, the sales of all its variants sum to 1,000 USD, and we can't divide the estimate between variants for now.
  • Product sales estimates are product level, not seller level. Sales ranks are also agnostic of the number of active sellers for a product. Thus, the estimate provided is for all of the sellers combined. We have no solution (as of right now) to divide estimated sales between sellers.
  • Our model has to follow a logarithmic scale for it to follow the rankings' evolution.
  • We need marketplace-wide models to cover products that were in categories with too few data points.
  • Our models should be based on root categories ranks. At least at that time, creating a limited number of models helped us maintain their accuracy.

After reproducing our Google Sheets models for different root categories and marketplaces, we figured that this fairly straightforward version worked well enough to go further. We were estimating sales in the same ranges as JungleScout or FBAtoolkit. It was encouraging enough that we decided to transfer our knowledge into automated Python notebooks.

4. February and March 2021: EMS 2.0 into production

After mapping out a strategy, we started implementing data intelligence in a Python notebook on Databricks. Its job was to query the data from a server and create all the possible data science models linking units sold to ranks (following the discoveries we made in the previous months), using NumPy and Pandas functions.

For those interested in mathematical equations, a model is defined by 3 variables A, B & C and it looks as follows:

$EstimatedUnitsSold = Model (A,B,C,Rank)$

With the data science automation step, 3 variables were defined for all the root categories across marketplaces with enough data points. For every product having a root category BSR or being in a covered marketplace, we could then compute the sales of the last 30 days.
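The article does not disclose the functional form behind `Model(A, B, C, Rank)`. As a minimal sketch only, the example below assumes a quadratic in log-log space, `ln(units) = A + B*ln(rank) + C*ln(rank)^2`, which is one common way to get a three-parameter model that follows a logarithmic scale. The function names `fit_rank_model` and `estimate_units_sold` and the synthetic numbers are hypothetical, not DataHawk's actual implementation.

```python
import numpy as np


def fit_rank_model(ranks, units_sold):
    """Fit ln(units) = A + B*ln(rank) + C*ln(rank)^2 by least squares.

    The functional form is an assumption made for illustration.
    """
    log_rank = np.log(np.asarray(ranks, dtype=float))
    log_units = np.log(np.asarray(units_sold, dtype=float))
    # np.polyfit returns coefficients from the highest degree to the lowest.
    c, b, a = np.polyfit(log_rank, log_units, deg=2)
    return a, b, c


def estimate_units_sold(a, b, c, rank):
    """Evaluate the assumed model for one rank or an array of ranks."""
    log_rank = np.log(np.asarray(rank, dtype=float))
    return np.exp(a + b * log_rank + c * log_rank ** 2)


# Synthetic (made-up) data generated from known parameters,
# used only to check that the fit recovers them.
true_a, true_b, true_c = 9.0, -0.8, -0.01
ranks = np.array([10, 100, 1_000, 10_000, 100_000])
units = estimate_units_sold(true_a, true_b, true_c, ranks)

a, b, c = fit_rank_model(ranks, units)
print(a, b, c)
print(estimate_units_sold(a, b, c, 500))
```

Fitting in log space keeps the very large units sold at low ranks from dominating the fit, which is one reason a logarithmic scale suits rank data.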

We put the new DataHawk sales estimator into production in April 2021, and the former review-based estimator became legacy. This was a big moment because it showed us two important things:

  • We have the capacity to build data science models and put them into production within a few months.
  • Our new estimator worked well: with only a few engineers on the project and several months of work, we were surpassing the competition with a much more accurate tool.

Each month, we would refresh our EMS models by launching the notebook and the data pipeline.

Problems of EMS 2.0

After several months of using this new version of the DataHawk sales estimator, some problems surfaced:

  • We had recently started using Snowflake as a warehouse and dbt as a warehouse manager and we didn't take advantage of these amazing data tools.
  • Our category referential had degraded, and because of wrong category mapping, our estimate coverage (across all the tracked products on our platform) went from 75% down to 45%.
  • When tracking a product, we sometimes miss out on the price because of the BuyBox being unavailable. We can also miss out on a BSR because of layout changes on the product pages. Even if these glitches are super rare, they caused holes in the sales estimates data.
  • Last but not least, we had to launch our modeling and data pipeline each month and we needed an automated system.

5. August and September 2021: EMS 3.0

In August 2021, intending to resolve all of version 2's problems, we launched a new project: the third version of our sales estimator.

Browse node referential renewal

We are currently using the Amazon categories referential in several features of the DataHawk application: Sales Rank Tracker, BSR Browser, Best Selling Categories Report, Estimated Monthly Sales Chart, etc.

The way we used to create and update categories within our database wasn't working anymore. We started working on rebuilding this process. The transition to Snowflake and dbt was a blessing for this project and using views, tables, and procedures, we managed to create a new Amazon Browse Node referential pipeline taking its source in the BSR top 100 data and the product pages data.

With ranks mapped to correct categories, the number of data points per category grew, thus the number of created models grew, and ultimately, our coverage went from a sloppy 45% up to 83% of estimations available for all the tracked products in DataHawk. As you can expect, this renewal mission of our browse node referential has also positively impacted all the features using it, previously cited.

Data Smoothing

83% of coverage still wasn't up to the standards we wanted so we started working on filling out data holes, more commonly known as data smoothing.

First of all, as previously stated, we cannot always retrieve the price of a product page. Even if we estimate units sold, we need the price to compute estimated sales. We added a price wherever it was possible (referencing every tracked product), meaning we filled every null price with the last one we saw for that product. Price coverage, for each product, is either 0% or 100%. Overall 90% of products have a 100% price coverage.
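A minimal sketch of this price fill, assuming a Pandas table with one row per product per day. The column names `asin`, `date` and `price` and the sample values are hypothetical; the production version runs in dbt on Snowflake, not in this form.

```python
import pandas as pd


def fill_missing_prices(df):
    """Replace each null price with the last price seen for the same product."""
    out = df.sort_values(["asin", "date"]).copy()
    # Forward-fill within each product so prices never leak across ASINs.
    out["price"] = out.groupby("asin")["price"].ffill()
    return out


# Made-up observations: A1 has one known price, B2 has none at all.
observations = pd.DataFrame({
    "asin": ["A1", "A1", "A1", "B2", "B2"],
    "date": pd.to_datetime([
        "2021-08-01", "2021-08-02", "2021-08-03",
        "2021-08-01", "2021-08-02",
    ]),
    "price": [19.99, None, None, None, None],
})

filled = fill_missing_prices(observations)
print(filled)
```

In this sample, product A1 ends up with a price on every day, while B2, for which no price was ever seen, stays empty.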

Secondly, we had some data holes in the estimations as well (e.g. a missing rank for a day meant no estimation for that day). The data smoothing we conducted was a weighted average that gives more importance to the data closest to the missing data's date. Thanks to this data smoothing step, coverage of our estimates increased by around 6 percentage points (from 83% to 89%).
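The exact weights are not given in this article. The sketch below assumes inverse-distance weights (a value one day away counts twice as much as a value two days away) and a hypothetical `window` parameter limiting how far we look; both are illustrative choices rather than the production logic.

```python
import numpy as np
import pandas as pd


def smooth_missing(values, window=3):
    """Fill NaNs with an inverse-distance weighted average of nearby known values."""
    series = pd.Series(np.asarray(values, dtype=float))
    known = series.dropna()
    filled = series.copy()
    for pos in series.index[series.isna()]:
        # Distance in days between the missing day and every known day.
        distance = np.abs(known.index.to_numpy() - pos)
        nearby = distance <= window
        if not nearby.any():
            continue  # nothing close enough: leave the hole as it is
        weights = 1.0 / distance[nearby]
        filled.loc[pos] = np.average(known.to_numpy()[nearby], weights=weights)
    return filled


# Made-up daily estimates with a two-day hole.
daily_units = [10.0, np.nan, np.nan, 40.0]
smoothed = smooth_missing(daily_units)
print(smoothed.tolist())
```

Each missing day is pulled toward its nearest known neighbor, so a hole between 10 and 40 is filled with values that rise from the left value to the right one.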

Next, to reach our initial goal of 100% coverage for version 3.0 of the DataHawk Sales Estimator, we used the first version of estimates (v1.0), based on reviews, for the remaining 11% of uncovered products. Because of the high error margin of this version, we first decided to shrink the error by comparing estimated data to real data and by having different review-based models per marketplace. This process worked quite efficiently and we were able to reduce the error substantially (by a factor of 10 in the US marketplace).

Eventually, we merged both estimation strategies (BSR-based EMS and review-based EMS), prioritizing the former over the latter, and the coverage went from 89% to 99%.
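The prioritization can be sketched as a simple fallback: use the BSR-based estimate when it exists, otherwise the review-based one. The column names and values below are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Made-up estimates: A1 has both, B2 only a review-based one, C3 neither.
estimates = pd.DataFrame({
    "asin": ["A1", "B2", "C3"],
    "bsr_based_units": [120.0, None, None],
    "review_based_units": [95.0, 40.0, None],
})

# Prefer the BSR-based estimate; fall back to the review-based one.
estimates["units"] = estimates["bsr_based_units"].fillna(
    estimates["review_based_units"]
)

# Record which strategy produced each estimate (BSR wins when both exist).
estimates["source"] = "none"
estimates.loc[estimates["review_based_units"].notna(), "source"] = "reviews"
estimates.loc[estimates["bsr_based_units"].notna(), "source"] = "bsr"

print(estimates[["asin", "units", "source"]])
```

Keeping a `source` column alongside the merged value makes it easy to report how much of the coverage comes from each strategy.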

To sum up, 89% of our estimations are based on the ranks, 10% of our estimations are based on the reviews of that product and the missing percent of estimates comes from the fact that we need a bit of historical data to have a review-based estimate.

Implementation and release of DataHawk's newest sales estimator

The new data smoothing was built entirely with dbt views and tables. We created a brand-new Python notebook for the model creation part that queries our cleaned Snowflake data directly. The process was smooth enough that we were able to fully automate it, and our models are now updated every month. We then worked with the front-end team to refresh all the dashboards, charts, tables, and reports where sales estimate data is shown. As of September 2021, all of our customers have been able to enjoy our brand new version of the DataHawk Sales Estimator. Now, let's have a look at the application-based tool in action.

6. Using the Sales Estimator on the DataHawk Dashboard

There are two places in the app where we display estimated monthly sales & estimated monthly units sold:

Product list

What you see in the sales column is the latest estimated monthly sales computed. As expected, the unit column displays the latest estimated monthly units sold.

On the right of the two metrics, you will see a colored dot representing the confidence score (green, orange, or red).

[Figure: Tracked products in sales estimator tool]

Product metrics chart

The two metrics on the top left are the same as the ones shown in the product list. These are the latest estimated monthly sales and the latest estimated monthly units sold.

Even though we compute estimated monthly sales and estimated monthly units sold (an estimation of how much of a product was sold over the last 30 days), we can switch to a daily view simply by dividing estimated sales and estimated units sold by 30.

This chart is a visualization of estimated daily sales (center blue line & scatter - left axis) & units sold (green bars - right axis).

The top & bottom blue lines are the maximum estimated sales and the minimum estimated sales.

The last metric on the bottom left of the chart is the aggregated estimations. We compile what's shown on the chart above and display it here. This aggregation changes when the date range input (at the top of the page) is changed.

Confidence score proportions

The confidence score shown in the app depends on the computed error of the model used to compute the estimate.

Confidence Score:

  • Green: High Trust Index ⇒ ModelErrorRate ≤ 50%
  • Orange: Moderate Trust Index ⇒ 50% < ModelErrorRate ≤ 100%
  • Red: Low Trust Index ⇒ ModelErrorRate > 100%
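The thresholds above translate directly into a small function. This is a sketch: the function name `confidence_score` is hypothetical, and the error rate is assumed to be expressed as a fraction (0.5 for 50%).

```python
def confidence_score(model_error_rate):
    """Map a model error rate (as a fraction, e.g. 0.5 for 50%) to a color."""
    if model_error_rate <= 0.5:
        return "green"   # High Trust Index
    if model_error_rate <= 1.0:
        return "orange"  # Moderate Trust Index
    return "red"         # Low Trust Index


for rate in (0.12, 0.5, 0.75, 1.0, 1.4):
    print(rate, confidence_score(rate))
```

Note that both boundaries are inclusive on the lower band: exactly 50% is green and exactly 100% is orange.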

Using these parameters to determine confidence score, there are approximately 1/3 of high, 1/3 of moderate, and 1/3 of low confidence estimates throughout all our products. These parameters won't change and we're always working to better our algorithms to increase the high and moderate ratios.

The main reason your product could have a low confidence score is the lack of data points within that product's category. As a DataHawk customer, you should aim to track as many products as possible to provide our algorithm with more data to work with.

7. Error Rate & Competitors

We asked customers for real sales data to compute an error rate and to compare our estimator with our competitors'. We will not yet discuss the confidence score (computed upstream, depending on the model used) but instead, the error rate computed afterward: the difference between real sales and estimated sales figures.

We follow this strict process to compute our error rate:

  1. We sum the real quantity sold for all the ASINs for which we have sales data and group this sum by parent ASIN and by month. As said earlier, the estimates are on a parent level.
  2. We sum the estimated quantity sold and group it by parent ASIN and by month.
  3. We compare the estimate and real sales for each parent ASIN for each month as follows: $ErrorRate = Abs(Estimate-Real)/Real$. We use the absolute value so that the error rate is always positive, which eases the analysis.
  4. We take the median error rate of all parents within a marketplace within a month. We believe using the median is the most accurate way to understand our accuracy. The smaller the error the better.
  5. Another way to look at this is by computing an accuracy rather than an error rate. $Accuracy = 1-Median(ErrorRate)$
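The five steps above can be sketched with Pandas as follows. The table layout, the column names (`marketplace`, `parent_asin`, `child_asin`, `month`, `units`) and the sample figures are all made up for illustration; they are not real customer data.

```python
import pandas as pd

KEYS = ["marketplace", "parent_asin", "month"]


def median_error_rate(real, estimated):
    """Median of Abs(Estimate - Real) / Real per marketplace and month."""
    # Steps 1 and 2: sum quantities by parent ASIN and by month.
    real_by_parent = real.groupby(KEYS, as_index=False)["units"].sum()
    est_by_parent = estimated.groupby(KEYS, as_index=False)["units"].sum()
    # Step 3: compare estimate and real sales for each parent and month.
    merged = real_by_parent.merge(est_by_parent, on=KEYS, suffixes=("_real", "_est"))
    merged["error_rate"] = (
        (merged["units_est"] - merged["units_real"]).abs() / merged["units_real"]
    )
    # Step 4: take the median over all parents within a marketplace and month.
    return merged.groupby(["marketplace", "month"])["error_rate"].median()


# Made-up real sales, reported per child ASIN.
real = pd.DataFrame({
    "marketplace": ["US", "US", "US", "US"],
    "parent_asin": ["P1", "P1", "P2", "P3"],
    "child_asin": ["C1", "C2", "C3", "C4"],
    "month": ["2021-08"] * 4,
    "units": [60, 40, 200, 50],
})
# Made-up estimates, already at parent level.
estimated = pd.DataFrame({
    "marketplace": ["US", "US", "US"],
    "parent_asin": ["P1", "P2", "P3"],
    "month": ["2021-08"] * 3,
    "units": [120, 150, 80],
})

errors = median_error_rate(real, estimated)
accuracy = 1 - errors  # Step 5
print(errors)
print(accuracy)
```

In this sample the three parents have error rates of 20%, 25% and 60%, so the median error rate is 25% and the accuracy is 75%. The one poor estimate (60%) does not move the median, which illustrates why the median is less sensitive to outliers than the average.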

We computed the error rate of 20 products around the Amazon-US marketplace across 7 different root categories and compared the results with estimates from our competitors.

[Figure: Sales estimator tool, error rate and competitors]

1st Quartile: The first-quartile error rate is 12%. This means that 25% of the estimations we give have an accuracy of 88% or more.

Median: The median error rate stays around 35% (accuracy of 65%), which means 50% of the estimations we give have an accuracy of 65% or more.

3rd Quartile: The third-quartile error rate is 52%. This means that 75% of the estimations we give have an accuracy of 48% or more.

Average: The average error for these estimations is 42%.

  1. DataHawk's estimator is by far the most stable in this comparison (see the differences in average error). Even at the third quartile of error rates, our estimates have a 52% error rate, which is two times better than JungleScout's and four times better than Helium 10's. In this sample, DataHawk's Sales Estimator rarely produces estimates that are far off, and the chart shows that you can trust the vast majority of the estimates we provide.
  2. When looking at our best estimations (1st quartile and median figures), we can say that in addition to being stable, DataHawk's estimator has one of the best accuracy rates, especially among other top-selling products.

Of course, we understand our analysis' sample is rather small and the ranking between competitors could be different using a different one.

The output of this analysis shows an important thing: most Amazon third-party tools, like DataHawk, that provide sales estimates are using similar logarithmic, root category best seller rank-based models to compute these estimates and these estimates are tightly bunched between competitors.

8. What's next?

Our latest sales estimator tool was released in September 2021. In addition to working on planned feature updates, we are also implementing customer feedback (thanks!). The next update to our sales estimator tool will include:

  • Statistical differences between the listings (reviews, keyword ranks, ratings) of the child variations of a product. With additional analytics and intelligence, we aim to provide a trustworthy way to divide the sales estimates between different variants of the same product.
  • Looking at data such as BuyBox winners, and the number of sellers for a product to enable seller-level sales estimates.
  • Possibly include historical estimates for a maximum of products by using their ranks in sub-nodes categories thanks to the BSR top 100 data we already have. This would also enable estimates summed up for a whole category and the evolution of categories compared to each other.
  • We are aiming to increase our accuracy even more. You can help us achieve this by continuing to track as many products as you can.

9. Conclusion

Hopefully, you have a better understanding of what it's like to create a data-science feature from scratch at DataHawk. Those of you who made it this far are honorary DataHawk data analysts now! Thanks to everyone who took the time to read this article and thanks to everyone on the team who worked on this great feature. As an Amazon Vendor or Seller, the DataHawk Sales Estimator is a simple way for you to spy on your competitors' sales, unveil how competing products are doing and also analyze niches & improve your strategies. Enjoy, we cannot wait for your feedback!
