Project Log 2
The Needle in the Gigapixel Haystack
I am officially starting work on HistoHelper.
If you have read my About page, you know why I am in Medical Laboratory Science. I am here to understand the mechanics of disease, specifically melanoma, from the ground up. But understanding the theory is only half the battle. The other half is building tools that actually do something useful with that theory.
The Problem with SVS Files
In digital pathology, tissue biopsies are scanned into Whole Slide Images (WSIs), usually saved as .svs files. To call these files images in the same way we call a .png an image is a massive understatement. They are gigapixel monsters. A single file can easily be 2GB to 3GB in size and contain billions of pixels.
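To put "gigapixel monster" in perspective, here is a back-of-the-envelope calculation. The dimensions are illustrative assumptions, not from any specific file, but they are the right order of magnitude for a slide scanned at high magnification:

```python
# Back-of-the-envelope maths for a hypothetical WSI.
# The dimensions below are illustrative, not from a specific file.
width, height = 100_000, 80_000   # pixels, typical order of magnitude for a WSI
pixels = width * height           # 8 billion pixels
raw_bytes = pixels * 3            # 24-bit RGB, uncompressed

print(f"{pixels / 1e9:.1f} gigapixels")           # 8.0 gigapixels
print(f"~{raw_bytes / 1e9:.0f} GB uncompressed")  # ~24 GB before compression
```

That 2GB to 3GB on disk is only achievable because the pixel data is heavily compressed inside the SVS container, which is exactly why you cannot just load the whole thing into memory.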
When a pathologist reviews one of these, they are panning and zooming across a massive digital landscape looking for microscopic anomalies: the proverbial needle in a gigapixel haystack. Visual fatigue is a very real threat to diagnostic accuracy.
I am building HistoHelper to act as a digital second pair of eyes. It isn’t there to replace the expert. It is there to flag regions of interest so the expert can focus their energy where it matters most.
The Boss Fight: Predictive Morphology
Basic anomaly detection is the first step, but the ultimate goal of this project, the real reason I am building the infrastructure, is to answer a much harder question:
Can an AI model differentiate between a Melanoma that metastasised to the Lung and one that metastasised to the Brain, based solely on the morphology of the Primary Skin Lesion?
In oncology, this is known as Organ Tropism. Right now, predicting where a cancer will spread is largely reactive. But what if the ‘seeds’ of organ-specific spread are already visible in the primary biopsy, just in patterns too subtle or complex for the human eye to reliably classify?
If a neural network can find those patterns, we move from reactive treatment to proactive, personalised oncology.
Step One: Don’t Melt the Homelab
Before I can train a model to detect cancer, I have to figure out how to load a 3GB image into Python without my computer catching fire.
The immediate roadmap isn’t about deep learning yet. It is about data pipelines. I need to build a robust way to open SVS files, tile them into thousands of manageable 256×256-pixel squares, filter out the blank background space, and normalise the tissue colours.
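The tiling and background-filtering steps can be sketched before any OpenSlide code exists, because the bookkeeping is just arithmetic. This is a minimal pure-Python sketch; the dimensions and the brightness threshold are illustrative assumptions I will tune against real slides:

```python
# Sketch of the tiling bookkeeping, independent of OpenSlide.
# Dimensions and the background threshold are illustrative assumptions.

TILE = 256

def tile_grid(width, height, tile=TILE):
    """Yield (x, y) top-left coordinates of every full tile in the slide."""
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y

def looks_blank(mean_rgb, threshold=220):
    """Crude background test: glass scans as near-white, so a tile whose
    mean channel values are all very bright is probably empty space."""
    return all(channel >= threshold for channel in mean_rgb)

# A 10,000 x 8,000 pixel region yields 39 x 31 = 1,209 full tiles.
coords = list(tile_grid(10_000, 8_000))
print(len(coords))                   # 1209
print(looks_blank((245, 240, 248)))  # True  (near-white glass)
print(looks_blank((180, 120, 160)))  # False (stained tissue)
```

Scaling that grid up to a real slide is what produces the "thousands of tiles" figure, and the blank filter is what stops most of them from ever reaching the GPU.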
Once I have a pipeline that can reliably digest the data, then I get to play with PyTorch.
For now, it is time to dig into the OpenSlide documentation and see if I can get this first SVS file to render. I will log the progress (and the inevitable errors) here.
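My first-render attempt will probably look something like the sketch below. It assumes `openslide-python` (plus the native OpenSlide library) is installed, and the slide path is a hypothetical placeholder. The calls used here (`OpenSlide`, `dimensions`, `level_count`, `level_dimensions`, `get_thumbnail`) are from the OpenSlide Python API:

```python
from pathlib import Path

SLIDE_PATH = Path("data/sample.svs")  # hypothetical path, swap in a real file

def first_render(path):
    import openslide  # requires `pip install openslide-python` + the C library

    slide = openslide.OpenSlide(str(path))
    print("Level 0 dimensions:", slide.dimensions)   # (width, height) in pixels
    print("Pyramid levels:", slide.level_count)
    print("Per-level dims:", slide.level_dimensions)

    # Rendering the full slide would eat all the RAM; a thumbnail is enough
    # to prove the file opens and decodes correctly.
    thumb = slide.get_thumbnail((1024, 1024))  # returns a PIL.Image
    thumb.save("first_render.png")
    slide.close()

if __name__ == "__main__":
    if SLIDE_PATH.exists():
        first_render(SLIDE_PATH)
    else:
        print(f"No slide at {SLIDE_PATH}; nothing to render yet.")
```

The thumbnail route sidesteps the memory problem entirely: OpenSlide reads from a lower pyramid level rather than decoding all of level 0.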