# 6. Step E — Granule density comparison WT vs AD (`benchmark_subtyping.ipynb`)

[← Tutorial index](./README.md)

This step assumes each granule has a **`granule_subtype_manual_simple`** label, coordinates (**`global_x` / `global_y`** in **`obs`** after `profile`, or `sphere_*` before the rename), and a **sample** column (**`"WT"`** / **`"AD"`** or batch names).

### Step E1 — Spatial reference: spots per sample

Load **`spots.h5ad`** for WT and AD separately. Each should expose **`brain_area`** and spot centroids **`global_x`**, **`global_y`** (after any sample-specific alignment used in your pipeline).
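Before computing densities, it can help to confirm that both samples expose the expected columns. The sketch below is a minimal, hypothetical check (the helper name `missing_spot_columns` and the example column lists are illustrative and not part of the notebook). In practice the column names would presumably come from the loaded object, for example `adata.obs.columns` after `anndata.read_h5ad(...)`, assuming the fields live in `obs`.

```python
# Columns this step expects on each sample's spot table.
REQUIRED_SPOT_COLUMNS = ("brain_area", "global_x", "global_y")


def missing_spot_columns(columns, required=REQUIRED_SPOT_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical column listings for the two samples.
obs_columns = {
    "WT": ["brain_area", "global_x", "global_y", "n_counts"],
    "AD": ["brain_area", "sphere_x", "sphere_y"],  # not yet renamed
}

# Map each sample to the columns it still lacks (empty list means OK).
problems = {sample: missing_spot_columns(cols) for sample, cols in obs_columns.items()}
```

Here `problems["AD"]` flags the coordinates that still carry their pre-rename names, which is a cheap way to catch a skipped alignment or rename step.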

### Step E2 — Density definition (50 µm grid by default in the notebook helpers)

The helper **`compute_subtype_density_per_region`** (in the benchmark notebook) implements:

- For each **sample**, each **brain_area**, and each **subtype** (plus an **"overall"** row):
  - Sum over spots: for each spot center, count granules whose \((x,y)\) falls in a **square window** of half-width **`grid_len/2`** (default **`grid_len=50`**).
  - **Density** = (total granule–spot hits) / (**number of spots** in that brain area).

So density is the mean number of granules per spot window (“expected granules per spot”) under that counting rule, not a volume density in granules per µm³.
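The counting rule above can be sketched in a few lines. This is an illustrative, brute-force version for a single sample, not the notebook's `compute_subtype_density_per_region`; the function name `density_per_region`, the tuple-based inputs, and the inclusive window boundary are all assumptions made for the example.

```python
from collections import defaultdict


def density_per_region(spots, granules, grid_len=50.0):
    """Mean granules per spot window for each (brain_area, subtype).

    spots:    iterable of (x, y, brain_area) spot centers
    granules: iterable of (x, y, subtype) granule positions
    Brute force, O(n_spots * n_granules); single sample only.
    """
    half = grid_len / 2.0
    spots = list(spots)
    granules = list(granules)
    hits = defaultdict(int)     # (brain_area, subtype) -> granule-spot hits
    n_spots = defaultdict(int)  # brain_area -> number of spots
    for sx, sy, area in spots:
        n_spots[area] += 1
        for gx, gy, subtype in granules:
            # Square window centered on the spot, side length grid_len.
            # Treating the boundary as inclusive is an assumption.
            if abs(gx - sx) <= half and abs(gy - sy) <= half:
                hits[(area, subtype)] += 1
                hits[(area, "overall")] += 1
    # Pairs with zero hits are simply absent from the result.
    return {key: count / n_spots[key[0]] for key, count in hits.items()}


# Toy data: two spots in one area, four granules of two subtypes.
spots = [(0.0, 0.0, "CA1"), (100.0, 0.0, "CA1")]
granules = [
    (10.0, 5.0, "A"),     # inside the window of the first spot
    (-20.0, 20.0, "B"),   # inside the window of the first spot
    (110.0, -10.0, "A"),  # inside the window of the second spot
    (60.0, 0.0, "A"),     # outside both windows
]
density = density_per_region(spots, granules)
```

With these toy inputs, subtype `"A"` has two hits over two spots and the `"overall"` row has three, so the denominators are spot counts rather than areas or volumes.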

### Step E3 — AD capture-efficiency correction

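The notebook divides **AD** densities and per-spot counts by a fixed capture-efficiency coefficient so that they can be compared to WT. Because the coefficient is below 1, the division raises the AD values by a factor of about 1.22: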

```python
CAPTURE_EFFICIENCY_COEF = 0.818691
# After computing AD densities or counts:
# density_ad = density_ad / CAPTURE_EFFICIENCY_COEF
```

Adjust the coefficient to match your own calibration, or omit the correction if your study does not use one.

### Step E4 — Per-spot counts for statistics

**`compute_subtype_per_spot_counts`** builds one row per (sample, brain_area, subtype, spot) with the number of granules in that spot’s window. These per-spot counts feed:

- Bootstrap **95% CI** for mean density (optional loop in the notebook).
- **Welch t-test** on **`log1p(count)`** between WT and AD per (brain_area, subtype).
- **Bonferroni** and **Benjamini–Hochberg FDR** on p-values.
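The testing and correction steps listed above can be sketched with the standard library alone. This is a hedged illustration, not the notebook's code: the function names and toy counts are hypothetical, and only the Welch statistic and its Welch-Satterthwaite degrees of freedom are computed here. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the corresponding Welch p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values with non-zero variance overall.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy per-spot counts for one (brain_area, subtype) pair.
wt_counts = [0, 1, 2, 3]
ad_counts = [2, 3, 4, 5]
t_stat, df = welch_t(
    [math.log1p(c) for c in wt_counts],
    [math.log1p(c) for c in ad_counts],
)

# Toy p-values, one per (brain_area, subtype) comparison.
pvals = [0.01, 0.04, 0.03, 0.20]
p_bonf = bonferroni(pvals)
p_bh = benjamini_hochberg(pvals)
```

Applying both corrections across the same family of (brain_area, subtype) tests makes the trade-off visible: Bonferroni controls the family-wise error rate and is the stricter of the two, while Benjamini-Hochberg controls the false discovery rate.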

### Step E5 — Export

Results are merged into tables such as **`subtype_density_per_region_{setting_key}.csv`** and label Parquets (**`granule_subtype_labels_{setting_key}.parquet`**). Use the same **`setting_key`** string your benchmark loop uses for traceability.
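One way to keep the file names tied to a single **`setting_key`** is to derive every output path from it in one place. The sketch below assumes a hypothetical output directory, helper names, and column set; it writes only the CSV with the standard library, since writing the Parquet file would need a library such as pandas with a Parquet engine.

```python
import csv
from pathlib import Path


def export_paths(out_dir, setting_key):
    """Build both output paths from one setting_key (names are illustrative)."""
    out_dir = Path(out_dir)
    return {
        "density_csv": out_dir / f"subtype_density_per_region_{setting_key}.csv",
        "labels_parquet": out_dir / f"granule_subtype_labels_{setting_key}.parquet",
    }


def write_density_csv(path, rows):
    """Write density rows (dicts) to CSV; the column set is an assumption."""
    fieldnames = ["sample", "brain_area", "subtype", "density"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# "results" and "demo" are placeholder values for this example.
paths = export_paths("results", "demo")
```

Calling `write_density_csv(paths["density_csv"], rows)` after creating the directory then produces a file whose name carries the same key as the label Parquet.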

**Next:** [Step F — Neuropil subdomains](./07_neuropil_subdomains.md)