Transcriptomics-to-Image Retrieval

We use spatial transcriptomics (ST) data paired with H&E images to identify image features associated with high expression of a given gene (for example, SERPING1). Simply retrieving the image patches with the highest expression of the gene would not account for visual similarity among those patches, so we take a more structured approach. For each sample, we group the image patches surrounding the ST detection spots into eight clusters by applying the k-means algorithm to the patch encodings. Clustering is performed separately for each sample to avoid batch effects. We then test, for each cluster, whether the expression of the query gene is significantly higher in spots inside the cluster than in spots outside it. The result is, for each gene, a set of image clusters strongly associated with its high expression (Cohen’s d > 1 and FDR < 0.05).
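The procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes NumPy and SciPy, precomputed patch encodings, and one expression vector for the query gene per sample. The significance test is not specified above, so a one-sided Welch t-test is assumed here, and Benjamini-Hochberg correction is assumed to be applied across all (sample, cluster) tests for the gene. The function names (`gene_associated_clusters`, `cohens_d`, `bh_fdr`) and the input layout are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.stats import ttest_ind

N_CLUSTERS = 8  # eight image clusters per sample, as in the text


def cohens_d(inside, outside):
    """Cohen's d using the pooled standard deviation of the two groups."""
    n1, n2 = len(inside), len(outside)
    pooled_var = ((n1 - 1) * inside.var(ddof=1) + (n2 - 1) * outside.var(ddof=1)) / (n1 + n2 - 2)
    if pooled_var <= 0:
        return 0.0
    return float((inside.mean() - outside.mean()) / np.sqrt(pooled_var))


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


def gene_associated_clusters(samples, k=N_CLUSTERS, d_min=1.0, fdr_max=0.05, seed=0):
    """Return (sample_id, cluster_id, cohens_d, q_value) for clusters linked to high expression.

    samples: hypothetical layout, a dict mapping sample_id -> (encodings, expression),
    where encodings is an (n_spots, dim) array of patch encodings and expression is an
    (n_spots,) array of the query gene's expression at the same spots.
    """
    tests = []
    for sample_id, (encodings, expression) in samples.items():
        # Cluster each sample on its own to avoid batch effects across samples.
        _, labels = kmeans2(np.asarray(encodings, dtype=float), k, minit="++", seed=seed)
        y = np.asarray(expression, dtype=float)
        for c in range(k):
            inside, outside = y[labels == c], y[labels != c]
            if len(inside) < 2 or len(outside) < 2:
                continue  # too few spots to test
            # Assumed test: one-sided Welch t-test, expression higher inside cluster c.
            p = ttest_ind(inside, outside, equal_var=False, alternative="greater").pvalue
            if not np.isfinite(p):
                p = 1.0  # e.g. zero variance in both groups
            tests.append((sample_id, c, cohens_d(inside, outside), p))
    if not tests:
        return []
    # Assumed: FDR controlled across all (sample, cluster) tests for this gene.
    q = bh_fdr([t[3] for t in tests])
    return [
        (sample_id, c, d, float(qi))
        for (sample_id, c, d, _), qi in zip(tests, q)
        if d > d_min and qi < fdr_max
    ]
```

The patches belonging to each returned cluster are the ones to display for the query gene. Pooling every test for a gene into a single Benjamini-Hochberg correction is one reasonable choice; correcting within each sample would be an equally plausible alternative.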