Using Python Jupyter
Using Python Jupyter with BigQuery
Jupyter notebooks are a great way to explore, analyze, and visualize your Amazon data stored in BigQuery.
Prerequisites
- You have completed the setup from How to connect (BigQuery integration + Google Cloud credentials).
- You have Python 3.10 or higher installed.
Step 1: Install dependencies
Create a project directory and install the required packages:
1mkdir my-datadoe-analysis && cd my-datadoe-analysis
2pip install jupyter google-cloud-bigquery pandas db-dtypes matplotlibStep 2: Start Jupyter
Launch the notebook server:
1jupyter notebookThis opens Jupyter in your browser. Create a new notebook with a Python 3 kernel.
Step 3: Set up the BigQuery client
In the first cell, import the packages and create a BigQuery client. You can find your project ID and dataset ID on the DataDoe BigQuery Integrations page.
1from google.cloud import bigquery
2import pandas as pd
3
4project = "YOUR_PROJECT_ID"
5dataset = "YOUR_DATASET_ID"
6
7client = bigquery.Client(project=project)Step 4: Run a test query
Create a simple query to verify the connection - daily sales for the last 30 days:
1query = f"""
2SELECT
3 order_date,
4 SUM(item_price_value) AS sales_value
5FROM `{project}.{dataset}.amazon_order_items_with_cogs`
6WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
7GROUP BY order_date
8ORDER BY order_date
9"""
10
11df = client.query(query).to_dataframe()
12df["order_date"] = pd.to_datetime(df["order_date"])
13print(df.head())
14df.to_csv("sales_value_last_30d.csv", index=False)Step 5: Query product stats with ads data
This more advanced query joins orders and ads tables to compute product-level stats including ACoS, ROAS, CPC, and CTR:
1orders_table = f"`{project}.{dataset}.amazon_order_items_with_cogs`"
2ads_table = f"`{project}.{dataset}.amazon_ads_performance_by_child_asin_and_date`"
3
4query = f"""
5WITH
6 orders AS (
7 SELECT
8 child_asin,
9 product_name,
10 SUM(quantity) AS units_sold,
11 SUM(item_price_value) AS sales_value
12 FROM {orders_table}
13 WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
14 GROUP BY child_asin, product_name
15 ),
16 ads AS (
17 SELECT
18 child_asin,
19 product_name,
20 SUM(ad_spend) AS ad_spend,
21 SUM(ad_clicks) AS clicks,
22 SUM(ad_impressions) AS impressions,
23 SUM(ad_units_sold_14d) AS ad_conversions,
24 SUM(ad_sales_14d) AS ad_attributed_sales
25 FROM {ads_table}
26 WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
27 GROUP BY child_asin, product_name
28 )
29SELECT
30 COALESCE(o.child_asin, a.child_asin) AS child_asin,
31 COALESCE(o.product_name, a.product_name) AS product_name,
32
33 IFNULL(o.units_sold, 0) AS units_sold,
34 IFNULL(o.sales_value, 0) AS sales_value,
35
36 IFNULL(a.ad_spend, 0) AS ad_spend,
37 IFNULL(a.clicks, 0) AS clicks,
38 IFNULL(a.impressions, 0) AS impressions,
39 IFNULL(a.ad_conversions, 0) AS ad_conversions,
40 IFNULL(a.ad_attributed_sales, 0) AS ad_attributed_sales,
41
42 SAFE_DIVIDE(a.ad_spend, NULLIF(o.sales_value, 0)) AS acos_on_total_sales,
43 SAFE_DIVIDE(o.sales_value, NULLIF(a.ad_spend, 0)) AS roas_on_total_sales,
44 SAFE_DIVIDE(a.ad_spend, NULLIF(a.clicks, 0)) AS cpc,
45 SAFE_DIVIDE(a.clicks, NULLIF(a.impressions, 0)) AS ctr
46FROM orders o
47FULL OUTER JOIN ads a
48 USING (child_asin, product_name)
49ORDER BY sales_value DESC
50"""
51
52df_product_stats = client.query(query).to_dataframe(create_bqstorage_client=False)
53df_product_stats.to_csv("product_stats.csv", index=False)Step 6: Find your top products
Aggregate and rank products by sales value:
1top = (
2 df_product_stats
3 .groupby(["product_name", "child_asin"], as_index=False)
4 .agg(
5 sales_value=("sales_value", "sum"),
6 units_sold=("units_sold", "sum"),
7 ad_spend=("ad_spend", "sum"),
8 ad_conversions=("ad_conversions", "sum"),
9 clicks=("clicks", "sum"),
10 impressions=("impressions", "sum"),
11 )
12)
13top["roas"] = top["sales_value"] / top["ad_spend"].replace({0: pd.NA})
14top = top.sort_values("sales_value", ascending=False).head(20)
15
16print(top)
17top.to_csv("top_products.csv", index=False)Step 7: Visualize the data
Bar chart — top 5 products by sales
1import matplotlib.pyplot as plt
2
3top_five = top.head(5)
4plt.figure(figsize=(10, 4))
5plt.bar(top_five["child_asin"], top_five["sales_value"], color="skyblue")
6plt.xlabel("Child ASIN")
7plt.ylabel("Sales Value")
8plt.title("Top 5 Products by Sales Value")
9plt.show()Full example notebook
A complete, ready-to-run Jupyter notebook with more examples is available on GitHub: DataDoe BigQuery Jupyter Example
