7. Step F — Neuropil subdomain analysis, Isocortex (7_neuropil_subdomains.ipynb)
The following is the final-used setting: Isocortex, 50×50 µm grids, hard spot_embedding, row-normalized subtype embedding, K-Means with k = 4. Soft embeddings, 25×25 grids, unnormalized embeddings, inertia/ARI sweeps, LDA/GMM/MiniBatch variants, and k = 5 are omitted.
Step F1 — Region and grid constants
ROI = "Isocortex"
grid_len = "50_by_50"
grid_len_num = 50
Step F2 — Inputs (prepared elsewhere in the pipeline)
You need consistent objects under your comparison output directory (paths in the notebook are relative to code/):
spots— Combined WT + AD spotAnnDatafor Isocortex withglobal_x,global_y,layer_labels,batch, and aligned coordinates (the notebook usesrecover_spotsfor 50×50 to attachlayer_labelsfrom a referenceh5ad).adata— Cell-levelAnnData(e.g.neuropil_subdomains_adata.h5ad).granule_adata— GranuleAnnDatarestricted to the neuropil workflow (e.g.neuropil_subdomains_granule_adata.h5ad) withgranule_subtype_kmeansinobsand raw counts inlayers["counts"].count_matrix— Per-cell gene counts withcell_id, gene columns aligned togranule_adata.var_names.
Neurons for soma features: subset cells to Isocortex and excitatory/inhibitory types, e.g. brain_area matching ROI and cell_type in ["Glutamatergic", "GABAergic"] → adata_neuron.
Step F3 — Hard embedding (spot_embedding)
Call spot_embedding from mcDETECT.downstream with hard assignment to 50×50 windows, Gaussian smoothing of subtype counts, and soma features:
import numpy as np
from mcDETECT.downstream import spot_embedding
embeddings, embeddings_features, aux_features, spot_granule_expression, spot_cell_expression = spot_embedding(
spots=spots,
granule_adata=granule_adata,
adata=adata_neuron,
count_matrix=count_matrix,
spot_loc_key=("global_x", "global_y"),
spot_width=grid_len_num,
spot_height=grid_len_num,
granule_loc_key=("global_x", "global_y"),
granule_subtype_key="granule_subtype_kmeans",
subtype_names=[str(i) for i in range(granule_adata.obs["granule_subtype_kmeans"].nunique())],
granule_count_layer="counts",
cell_loc_key=("global_x", "global_y"),
cell_id_key="cell_id",
count_matrix_cell_id_key="cell_id",
include_soma_features=True,
smoothing=True,
smoothing_radius=np.sqrt(2) * grid_len_num + 1,
smoothing_mode="gaussian",
)
for aux_key, aux_val in aux_features.items():
spots.obs[aux_key] = aux_val
Meaning: embeddings is a 2D array (n_spots × n_subtype_columns): per spot, counts (and soma-related columns if included) aggregated into that grid cell. Granules/cells are assigned to one spot by containment in the spot_width × spot_height square around each centroid.
Step F4 — Drop empty spots
mask = spots.obs["granule_count"] > 0
spots = spots[mask].copy()
embeddings = embeddings[mask].copy()
Step F5 — Hard normalized embedding for clustering
Normalize each row of embeddings to sum to 1 (subtype proportion per spot). This is the “normalized” mode used for clustering in the notebook:
row_sums = embeddings.sum(axis=1, keepdims=True)
X = np.divide(embeddings, row_sums, out=np.zeros_like(embeddings, dtype=float), where=row_sums > 0)
Step F6 — K-Means with k = 4
from sklearn.cluster import KMeans
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
kmeans_labels = kmeans.fit_predict(X)
spots.obs["subdomain_kmeans"] = [f"Subdomain {l + 1}" for l in kmeans_labels]
The full notebook optionally relabels clusters for consistent ordering across plots (a relabel_map); treat that as cosmetic once k = 4 is fixed.
Step F7 — Visualization and follow-ups (optional)
Spatial scatter of
subdomain_kmeanscolored by layer or subdomain.Clustermap of mean
Xper subdomain × k-means cluster column (granule subtype dimensions).For differential expression between two subdomains, the notebook uses
spot_granule_expression/spot_cell_expressionwithsc.tl.rank_genes_groupsonlog1p-normalized counts — only if you need gene-level contrasts.