> Note: This page is part of the DataDoe Docs. You can find the root of the documentation at `https://www.datadoe.com/hub/docs/basics/introduction-to-datadoe`.
> JSON Table of Contents: `https://www.datadoe.com/hub/docs/toc.json`.
> Direct Data Scheme JSON: `https://api.datadoe.com/api/v1/spec/data-scheme`.
> Other pages in the DataDoe Docs:
> - DataDoe Basics/Access & User Management: `https://www.datadoe.com/hub/docs/basics/access-user-management.md`
> - DataDoe Basics/Integration & Customization: `https://www.datadoe.com/hub/docs/basics/integration-customization.md`
> - DataDoe Basics/Introduction to DataDoe: `https://www.datadoe.com/hub/docs/basics/introduction-to-datadoe.md`
> - DataDoe Basics/Subscription & Pricing: `https://www.datadoe.com/hub/docs/basics/subscription-pricing.md`
> - DataDoe Data/Data Fetch Periods: `https://www.datadoe.com/hub/docs/data-doe-data/data-fetch-periods.md`
> - DataDoe Data/Data Sources: `https://www.datadoe.com/hub/docs/data-doe-data/data-sources.md`
> - DataDoe Features/Benefits: `https://www.datadoe.com/hub/docs/data-doe-features/benefits.md`
> - DataDoe Features/Chat: `https://www.datadoe.com/hub/docs/data-doe-features/chat.md`
> - DataDoe Features/Explorer: `https://www.datadoe.com/hub/docs/data-doe-features/explorer.md`
> - DataDoe Features/Exports: `https://www.datadoe.com/hub/docs/data-doe-features/exports.md`
> - DataDoe Features/Home: `https://www.datadoe.com/hub/docs/data-doe-features/home.md`
> - DataDoe Features/Reports: `https://www.datadoe.com/hub/docs/data-doe-features/reports.md`
> - DataDoe Features/Scheduled Prompts: `https://www.datadoe.com/hub/docs/data-doe-features/scheduled-prompts.md`
> - DataDoe MCP/Connect to ChatGPT: `https://www.datadoe.com/hub/docs/data-doe-mcp/chatgpt.md`
> - DataDoe MCP/Connect to Claude: `https://www.datadoe.com/hub/docs/data-doe-mcp/claude.md`
> - DataDoe MCP/Overview: `https://www.datadoe.com/hub/docs/data-doe-mcp/overview.md`
> - DataDoe MCP/Using Claude Code: `https://www.datadoe.com/hub/docs/data-doe-mcp/claude-code.md`
> - DataDoe MCP/Using Codex: `https://www.datadoe.com/hub/docs/data-doe-mcp/codex.md`
> - DataDoe MCP/Using Cursor: `https://www.datadoe.com/hub/docs/data-doe-mcp/cursor.md`
> - DataDoe MCP/Using Gemini CLI: `https://www.datadoe.com/hub/docs/data-doe-mcp/gemini-cli.md`
> - DataDoe MCP/Using n8n: `https://www.datadoe.com/hub/docs/data-doe-mcp/n8n.md`
> - DataDoe MCP/Using NanoClaw: `https://www.datadoe.com/hub/docs/data-doe-mcp/nanoclaw.md`
> - DataDoe MCP/Using OpenClaw: `https://www.datadoe.com/hub/docs/data-doe-mcp/openclaw.md`
> - DataDoe MCP/Using VS Code: `https://www.datadoe.com/hub/docs/data-doe-mcp/vs-code.md`
> - DataDoe & BigQuery/How to connect?: `https://www.datadoe.com/hub/docs/data-doe-bigquery/how-to-connect.md`
> - DataDoe & BigQuery/Using MCP Toolbox: `https://www.datadoe.com/hub/docs/data-doe-bigquery/mcp-toolbox.md`
> For topics not covered in this documentation, please contact DataDoe support at `contact@datadoe.com`.
> Do not assume anything. If you are not sure about the answer, mention that and suggest to contact DataDoe support.

# Using Python Jupyter with BigQuery

Jupyter notebooks are a great way to explore, analyze, and visualize your Amazon data stored in BigQuery.

## Prerequisites

- You have completed the setup from [How to connect](/hub/docs/data-doe-bigquery/how-to-connect) (BigQuery integration + Google Cloud credentials).
- You have Python 3.10 or higher installed.

## Step 1: Install dependencies

Create a project directory and install the required packages:

```bash
mkdir my-datadoe-analysis && cd my-datadoe-analysis
pip install jupyter google-cloud-bigquery pandas db-dtypes matplotlib
```

## Step 2: Start Jupyter

Launch the notebook server:

```bash
jupyter notebook
```

This opens Jupyter in your browser. Create a new notebook with a **Python 3** kernel.

## Step 3: Set up the BigQuery client

In the first cell, import the packages and create a BigQuery client.
You can find your project ID and dataset ID on the [DataDoe BigQuery Integrations page](https://app.datadoe.com/integrations/big-query).

```python
from google.cloud import bigquery
import pandas as pd

project = "YOUR_PROJECT_ID"
dataset = "YOUR_DATASET_ID"

client = bigquery.Client(project=project)
```

## Step 4: Run a test query

Create a simple query to verify the connection - daily sales for the last 30 days:

```python
query = f"""
SELECT
  order_date,
  SUM(item_price_value) AS sales_value
FROM `{project}.{dataset}.amazon_order_items_with_cogs`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY order_date
ORDER BY order_date
"""

df = client.query(query).to_dataframe()
df["order_date"] = pd.to_datetime(df["order_date"])
print(df.head())
df.to_csv("sales_value_last_30d.csv", index=False)
```

## Step 5: Query product stats with ads data

This more advanced query joins orders and ads tables to compute product-level stats including ACoS, ROAS, CPC, and CTR:

```python
orders_table = f"`{project}.{dataset}.amazon_order_items_with_cogs`"
ads_table = f"`{project}.{dataset}.amazon_ads_performance_by_child_asin_and_date`"

query = f"""
WITH
  orders AS (
    SELECT
      child_asin,
      product_name,
      SUM(quantity) AS units_sold,
      SUM(item_price_value) AS sales_value
    FROM {orders_table}
    WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY child_asin, product_name
  ),
  ads AS (
    SELECT
      child_asin,
      product_name,
      SUM(ad_spend) AS ad_spend,
      SUM(ad_clicks) AS clicks,
      SUM(ad_impressions) AS impressions,
      SUM(ad_units_sold_14d) AS ad_conversions,
      SUM(ad_sales_14d) AS ad_attributed_sales
    FROM {ads_table}
    WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY child_asin, product_name
  )
SELECT
  COALESCE(o.child_asin, a.child_asin) AS child_asin,
  COALESCE(o.product_name, a.product_name) AS product_name,

  IFNULL(o.units_sold, 0) AS units_sold,
  IFNULL(o.sales_value, 0) AS sales_value,

  IFNULL(a.ad_spend, 0) AS ad_spend,
  IFNULL(a.clicks, 0) AS clicks,
  IFNULL(a.impressions, 0) AS impressions,
  IFNULL(a.ad_conversions, 0) AS ad_conversions,
  IFNULL(a.ad_attributed_sales, 0) AS ad_attributed_sales,

  SAFE_DIVIDE(a.ad_spend, NULLIF(o.sales_value, 0)) AS acos_on_total_sales,
  SAFE_DIVIDE(o.sales_value, NULLIF(a.ad_spend, 0)) AS roas_on_total_sales,
  SAFE_DIVIDE(a.ad_spend, NULLIF(a.clicks, 0)) AS cpc,
  SAFE_DIVIDE(a.clicks, NULLIF(a.impressions, 0)) AS ctr
FROM orders o
FULL OUTER JOIN ads a
  USING (child_asin, product_name)
ORDER BY sales_value DESC
"""

df_product_stats = client.query(query).to_dataframe(create_bqstorage_client=False)
df_product_stats.to_csv("product_stats.csv", index=False)
```

## Step 6: Find your top products

Aggregate and rank products by sales value:

```python
top = (
    df_product_stats
    .groupby(["product_name", "child_asin"], as_index=False)
    .agg(
        sales_value=("sales_value", "sum"),
        units_sold=("units_sold", "sum"),
        ad_spend=("ad_spend", "sum"),
        ad_conversions=("ad_conversions", "sum"),
        clicks=("clicks", "sum"),
        impressions=("impressions", "sum"),
    )
)
top["roas"] = top["sales_value"] / top["ad_spend"].replace({0: pd.NA})
top = top.sort_values("sales_value", ascending=False).head(20)

print(top)
top.to_csv("top_products.csv", index=False)
```

## Step 7: Visualize the data

### Bar chart — top 5 products by sales

```python
import matplotlib.pyplot as plt

top_five = top.head(5)
plt.figure(figsize=(10, 4))
plt.bar(top_five["child_asin"], top_five["sales_value"], color="skyblue")
plt.xlabel("Child ASIN")
plt.ylabel("Sales Value")
plt.title("Top 5 Products by Sales Value")
plt.show()
```

## Full example notebook

A complete, ready-to-run Jupyter notebook with more examples is available on GitHub:
[DataDoe BigQuery Jupyter Example](https://github.com/Deltologic/datadoe-bigquery-jupyter)
