7. Step F — Neuropil subdomain analysis, Isocortex (`7_neuropil_subdomains.ipynb`)

← Tutorial index

The following is the final-used setting: Isocortex, 50×50 µm grids, hard spot_embedding, row-normalized subtype embedding, K-Means with k = 4. Soft embeddings, 25×25 grids, unnormalized embeddings, inertia/ARI sweeps, LDA/GMM/MiniBatch variants, and k = 5 are omitted.

Step F1 — Region and grid constants

ROI = "Isocortex"
grid_len = "50_by_50"
grid_len_num = 50

Step F2 — Inputs (prepared elsewhere in the pipeline)

You need consistent objects under your comparison output directory (paths in the notebook are relative to code/):

spots — Combined WT + AD spot AnnData for Isocortex with global_x, global_y, layer_labels, batch, and aligned coordinates (the notebook uses recover_spots for 50×50 to attach layer_labels from a reference h5ad).
adata — Cell-level AnnData (e.g. neuropil_subdomains_adata.h5ad).
granule_adata — Granule AnnData restricted to the neuropil workflow (e.g. neuropil_subdomains_granule_adata.h5ad) with granule_subtype_kmeans in obs and raw counts in layers["counts"].
count_matrix — Per-cell gene counts with cell_id, gene columns aligned to granule_adata.var_names.

Neurons for soma features: subset cells to Isocortex and excitatory/inhibitory types, e.g. brain_area matching ROI and cell_type in ["Glutamatergic", "GABAergic"] → adata_neuron.

Step F3 — Hard embedding (`spot_embedding`)

Call spot_embedding from mcDETECT.downstream with hard assignment to 50×50 windows, Gaussian smoothing of subtype counts, and soma features:

import numpy as np
from mcDETECT.downstream import spot_embedding

embeddings, embeddings_features, aux_features, spot_granule_expression, spot_cell_expression = spot_embedding(
    spots=spots,
    granule_adata=granule_adata,
    adata=adata_neuron,
    count_matrix=count_matrix,
    spot_loc_key=("global_x", "global_y"),
    spot_width=grid_len_num,
    spot_height=grid_len_num,
    granule_loc_key=("global_x", "global_y"),
    granule_subtype_key="granule_subtype_kmeans",
    subtype_names=[str(i) for i in range(granule_adata.obs["granule_subtype_kmeans"].nunique())],
    granule_count_layer="counts",
    cell_loc_key=("global_x", "global_y"),
    cell_id_key="cell_id",
    count_matrix_cell_id_key="cell_id",
    include_soma_features=True,
    smoothing=True,
    smoothing_radius=np.sqrt(2) * grid_len_num + 1,
    smoothing_mode="gaussian",
)

for aux_key, aux_val in aux_features.items():
    spots.obs[aux_key] = aux_val

Meaning: embeddings is a 2D array (n_spots × n_subtype_columns): per spot, counts (and soma-related columns if included) aggregated into that grid cell. Granules/cells are assigned to one spot by containment in the spot_width × spot_height square around each centroid.

Step F4 — Drop empty spots

mask = spots.obs["granule_count"] > 0
spots = spots[mask].copy()
embeddings = embeddings[mask].copy()

Step F5 — Hard normalized embedding for clustering

Normalize each row of embeddings to sum to 1 (subtype proportion per spot). This is the “normalized” mode used for clustering in the notebook:

row_sums = embeddings.sum(axis=1, keepdims=True)
X = np.divide(embeddings, row_sums, out=np.zeros_like(embeddings, dtype=float), where=row_sums > 0)

Step F6 — K-Means with k = 4

from sklearn.cluster import KMeans

n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
kmeans_labels = kmeans.fit_predict(X)
spots.obs["subdomain_kmeans"] = [f"Subdomain {l + 1}" for l in kmeans_labels]

The full notebook optionally relabels clusters for consistent ordering across plots (a relabel_map); treat that as cosmetic once k = 4 is fixed.

Step F7 — Visualization and follow-ups (optional)

Spatial scatter of subdomain_kmeans colored by layer or subdomain.
Clustermap of mean X per subdomain × k-means cluster column (granule subtype dimensions).
For differential expression between two subdomains, the notebook uses spot_granule_expression / spot_cell_expression with sc.tl.rank_genes_groups on log1p-normalized counts — only if you need gene-level contrasts.

Next: Quick reference checklist

7. Step F — Neuropil subdomain analysis, Isocortex (7_neuropil_subdomains.ipynb)