# 7. Step F — Neuropil subdomain analysis, Isocortex (`7_neuropil_subdomains.ipynb`)

[← Tutorial index](./README.md)

This page covers only the **final** setting used in the notebook: **Isocortex**, **50×50** µm grids, **hard** `spot_embedding`, **row-normalized** subtype embedding, and **K-Means with k = 4**. Soft embeddings, 25×25 grids, unnormalized embeddings, inertia/ARI sweeps, LDA/GMM/MiniBatch variants, and **k = 5** are omitted.

### Step F1 — Region and grid constants

```python
ROI = "Isocortex"      # brain region analyzed
grid_len = "50_by_50"  # string tag for the 50×50 grid setting
grid_len_num = 50      # grid side length in µm
```

### Step F2 — Inputs (prepared elsewhere in the pipeline)

You need the following objects, saved consistently under your comparison output directory (paths in the notebook are relative to `code/`):

- **`spots`** — Combined WT + AD spot `AnnData` for Isocortex with **`global_x`**, **`global_y`**, **`layer_labels`**, **`batch`**, and aligned coordinates (for 50×50 grids the notebook uses **`recover_spots`** to attach **`layer_labels`** from a reference `h5ad`).
- **`adata`** — Cell-level `AnnData` (e.g. **`neuropil_subdomains_adata.h5ad`**).
- **`granule_adata`** — Granule `AnnData` restricted to the neuropil workflow (e.g. **`neuropil_subdomains_granule_adata.h5ad`**) with **`granule_subtype_kmeans`** in **`obs`** and raw counts in **`layers["counts"]`**.
- **`count_matrix`** — Per-cell gene counts with a **`cell_id`** column and gene columns aligned to **`granule_adata.var_names`**.

**Neurons for soma features:** subset `adata` to Isocortex excitatory and inhibitory neurons, e.g. **`brain_area`** matching **`ROI`** and **`cell_type`** in `["Glutamatergic", "GABAergic"]`. The result is **`adata_neuron`**, which Step F3 passes to `spot_embedding` as `adata`.
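The neuron subset can be expressed as a boolean mask over the cell metadata. This is a minimal sketch, not the notebook's code: it assumes `brain_area` must equal `ROI` exactly and that `cell_type` uses the labels `"Glutamatergic"` and `"GABAergic"`; the helper name `neuron_mask` and the toy table are hypothetical.

```python
import numpy as np
import pandas as pd


def neuron_mask(obs: pd.DataFrame, roi: str, cell_types=("Glutamatergic", "GABAergic")) -> np.ndarray:
    """Boolean mask of cells in `roi` whose type is one of `cell_types`.

    Hypothetical helper: assumes `obs` has 'brain_area' and 'cell_type' columns
    and that the region label must match `roi` exactly.
    """
    in_roi = obs["brain_area"] == roi
    is_neuron = obs["cell_type"].isin(list(cell_types))
    return (in_roi & is_neuron).to_numpy()


# Toy metadata standing in for adata.obs (made-up rows, not real data)
obs = pd.DataFrame(
    {
        "brain_area": ["Isocortex", "Isocortex", "Hippocampus", "Isocortex"],
        "cell_type": ["Glutamatergic", "Astrocyte", "GABAergic", "GABAergic"],
    }
)
mask = neuron_mask(obs, "Isocortex")
print(mask)  # [ True False False  True]
```

With a real `AnnData`, the same mask would be applied as `adata_neuron = adata[neuron_mask(adata.obs, ROI)].copy()`. Returning a NumPy array rather than a `Series` keeps the mask independent of the `obs` index.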
### Step F3 — Hard embedding (`spot_embedding`)

Call **`spot_embedding`** from **`mcDETECT.downstream`** with **hard** assignment to 50×50 windows, Gaussian smoothing of subtype counts, and soma features:

```python
import numpy as np
from mcDETECT.downstream import spot_embedding

embeddings, embeddings_features, aux_features, spot_granule_expression, spot_cell_expression = spot_embedding(
    spots=spots,
    granule_adata=granule_adata,
    adata=adata_neuron,                  # Isocortex neurons from Step F2
    count_matrix=count_matrix,
    spot_loc_key=("global_x", "global_y"),
    spot_width=grid_len_num,             # hard assignment to 50×50 µm windows
    spot_height=grid_len_num,
    granule_loc_key=("global_x", "global_y"),
    granule_subtype_key="granule_subtype_kmeans",
    # Subtype labels are expected to be the strings "0", "1", ..., n_subtypes - 1
    subtype_names=[str(i) for i in range(granule_adata.obs["granule_subtype_kmeans"].nunique())],
    granule_count_layer="counts",
    cell_loc_key=("global_x", "global_y"),
    cell_id_key="cell_id",
    count_matrix_cell_id_key="cell_id",
    include_soma_features=True,
    smoothing=True,
    # sqrt(2) * 50 + 1 ≈ 71.7 µm: on a regular 50 µm grid this just exceeds
    # the distance to diagonal neighbors (≈ 70.7 µm)
    smoothing_radius=np.sqrt(2) * grid_len_num + 1,
    smoothing_mode="gaussian",
)

# Copy the per-spot auxiliary features returned by spot_embedding into spots.obs
for aux_key, aux_val in aux_features.items():
    spots.obs[aux_key] = aux_val
```

**Meaning:** **`embeddings`** is a 2D array (**`n_spots × n_subtype_columns`**): for each spot, the counts (and soma-related columns, if included) aggregated into that grid cell, Gaussian-smoothed here because `smoothing=True`. With hard assignment, each granule or cell goes to exactly **one** spot: the one whose **`spot_width × spot_height`** square, centered on the spot centroid, contains it.

### Step F4 — Drop empty spots

```python
# Keep spots with at least one granule; a NumPy mask indexes both the AnnData and the array
mask = (spots.obs["granule_count"] > 0).to_numpy()
spots = spots[mask].copy()
embeddings = embeddings[mask]  # boolean indexing already returns a copy
```

### Step F5 — Hard **normalized** embedding for clustering

Normalize **each row** of **`embeddings`** to sum to 1 (subtype proportions per spot).
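A toy check of what row normalization does, including the guard for rows that sum to zero. The 3×3 count matrix is made up for illustration and is not notebook data.

```python
import numpy as np

# Made-up subtype counts for three spots; the last spot has no granules
embeddings = np.array([[2, 2, 0], [1, 0, 3], [0, 0, 0]])

row_sums = embeddings.sum(axis=1, keepdims=True)
# Divide only where the row sum is positive; other entries keep the 0 from `out`
X = np.divide(embeddings, row_sums, out=np.zeros_like(embeddings, dtype=float), where=row_sums > 0)

print(X.sum(axis=1))  # [1. 1. 0.]
```

Rows with a zero sum stay all-zero instead of becoming NaN, so `X` remains valid input for scikit-learn's `KMeans`, which rejects NaN values.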
Row normalization is the **“normalized”** mode used for clustering in the notebook:

```python
# Divide each row by its sum; rows that sum to 0 stay all-zero instead of becoming NaN
row_sums = embeddings.sum(axis=1, keepdims=True)
X = np.divide(embeddings, row_sums, out=np.zeros_like(embeddings, dtype=float), where=row_sums > 0)
```

### Step F6 — K-Means with **k = 4**

```python
from sklearn.cluster import KMeans

n_clusters = 4
# Fixed seed and 20 initializations for reproducible, stable clustering
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
kmeans_labels = kmeans.fit_predict(X)
# Store 1-based subdomain names per spot
spots.obs["subdomain_kmeans"] = [f"Subdomain {label + 1}" for label in kmeans_labels]
```

The full notebook optionally **relabels** clusters for consistent ordering across plots (a **`relabel_map`**); treat that as cosmetic once **`k = 4`** is fixed.

### Step F7 — Visualization and follow-ups (optional)

- Spatial scatter of the spots colored by **`subdomain_kmeans`** (or by **`layer_labels`** for comparison).
- **Clustermap** of the mean of **`X`** per subdomain (rows) across the granule-subtype columns (the `granule_subtype_kmeans` dimensions).
- For **differential expression** between two subdomains, the notebook runs **`sc.tl.rank_genes_groups`** on **`log1p`**-normalized **`spot_granule_expression`** / **`spot_cell_expression`**; this is needed only for gene-level contrasts.

**Next:** [Quick reference checklist](./08_checklist.md)
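For the Step F7 clustermap, the input is a subdomain × granule-subtype table of mean proportions. A minimal pandas sketch, assuming the rows of `X` are aligned with `spots.obs` and its columns follow the subtype names passed to `spot_embedding`; the helper `mean_by_subdomain` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def mean_by_subdomain(X: np.ndarray, labels, subtype_names) -> pd.DataFrame:
    """Mean row-normalized embedding per subdomain (rows) and granule subtype (columns)."""
    df = pd.DataFrame(X, columns=list(subtype_names))
    df["subdomain"] = list(labels)  # list() drops any index carried by a Series
    return df.groupby("subdomain").mean()


# Toy inputs: four spots, two subdomains, three granule subtypes (made-up values)
X = np.array([[0.5, 0.5, 0.0], [0.3, 0.7, 0.0], [0.0, 0.2, 0.8], [0.0, 0.4, 0.6]])
labels = ["Subdomain 1", "Subdomain 1", "Subdomain 2", "Subdomain 2"]
means = mean_by_subdomain(X, labels, ["0", "1", "2"])
print(means)
```

With real data this would be `mean_by_subdomain(X, spots.obs["subdomain_kmeans"], subtype_names)`, where `subtype_names` is the same list given to `spot_embedding`; the resulting table could then be passed to `seaborn.clustermap`.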