AI Algorithms in Radiology: How Machines Learned to See, Understand, and Assist Radiologists

From Perceptrons to Foundation Models

Executive Summary

Radiology has been one of the earliest medical specialties to adopt artificial intelligence because imaging data are inherently digital, high-dimensional, and information-rich. The evolution of radiology AI has progressed through multiple generations of algorithms, each developed to overcome the limitations of its predecessors.

The journey began with rule-based symbolic systems and handcrafted feature engineering, progressed through radiomics and classical machine learning, accelerated with deep learning and convolutional neural networks (CNNs), expanded through segmentation and detection architectures, and entered a new era with transformers, foundation models, vision-language systems, and multimodal AI.

This white paper reviews that evolution from first principles, explaining not only what each algorithm is, but why it emerged, what problem it solved, and how it contributed to modern radiology AI.

PART I: Foundations of Machine Intelligence

Chapter 1. Why Radiologists Should Understand AI

The Imaging Explosion

Modern radiology faces unprecedented challenges:

  • Increasing imaging volumes and image complexity
  • Multimodality datasets and longitudinal patient histories
  • Workforce shortages and demand for faster reporting

AI has been adopted in large part as a response to these compounding pressures, offering tools to augment human capabilities and manage the rapid growth in diagnostic data.

Chapter 2. What Does It Mean for a Machine to Learn?

The Umbrella Analogy

Suppose we must make a binary decision: carry an umbrella, or do not carry an umbrella.

Professor's Note: Bias and Weights

In a neural network, the bias acts as a baseline assumption or predisposition (e.g., the current season): it shifts the decision before any individual feature is considered. If it is the rainy season, your bias toward carrying an umbrella is naturally very high, regardless of the daily forecast. Weights determine how much each specific feature matters. For instance, if hurricane-level winds are predicted, the negative weight on wind speed must be strong enough to override the rainy-season bias, because an umbrella would simply break.

Interactive Tool: Weights & Season Bias

This tool tests how a strongly negative weight (wind) interacts with a strong baseline bias (season). Its two inputs are:

  • Chance of Rain (default 50%): weight +1.0, a positive factor
  • Wind Speed (danger; default Low): weight -2.0, a strong negative factor

Math: (Rain × 1.0) + (Wind × -2.0) + Bias = Z

A positive Z means carry the umbrella; a negative Z means leave it at home.

Chapter 3. Symbolic AI: The First Generation

Before machine learning, AI relied primarily on human-encoded rules. This is often called "Good Old-Fashioned AI" (GOFAI).

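def rule_based_risk(nodule_size_mm, margin):
    # Hand-encoded rule: the threshold and descriptor come from a human expert, not from data
    if nodule_size_mm > 8 and margin == "Spiculated":
        return "High Malignancy Risk"
    return None  # no rule fires, so the system has no answer at all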

While transparent and deterministic, this approach broke down on edge cases (a 7.9 mm spiculated nodule, for example, fires no rule at all) and could not scale to the immense complexity of human anatomy.

Concept Check

Why is a strong negative weight necessary in the Umbrella example when wind speeds are hurricane-level?

PART II: Neural Networks and Deep Learning

Chapters 4 & 5. Perceptrons & Non-Linearity

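The perceptron, introduced by Frank Rosenblatt in 1958, was one of the earliest artificial neural networks and among the first able to learn its weights from data. It takes inputs, applies weights to them, adds a bias, and passes the result through an activation function.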

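Anatomy of a Perceptron

Inputs X₁, X₂, X₃ → multiplied by weights w₁, w₂, w₃ → summed with the bias, Σ(X × W) + Bias → non-linear activation function → Prediction (Y)

$$Y = f\left(w_1 X_1 + w_2 X_2 + w_3 X_3 + b\right)$$

where $b$ is the bias and $f$ is the activation function.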

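Without an activation function (like ReLU or Sigmoid) introducing non-linearity, chaining perceptrons together is mathematically equivalent to a single linear layer, because two linear maps compose into one: $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$. Non-linearity allows AI to model the complex, non-linear relationships found in biological data.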
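The collapse of stacked linear layers can be checked numerically. The minimal sketch below uses plain Python lists and arbitrary, hypothetical 2x2 weights and biases (not taken from any real model); two linear layers with no activation between them give exactly the same outputs as one merged layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$.

```python
def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Multiply matrices A (n x m) and B (m x p)
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

# Hypothetical weights for two stacked linear layers (no activation in between)
W1, b1 = [[1, 2], [3, 4]], [1, 0]
W2, b2 = [[0, 1], [1, -1]], [2, 3]

def two_linear_layers(x):
    return vadd(matvec(W2, vadd(matvec(W1, x), b1)), b2)

# The same mapping merged into a single linear layer
W = matmul(W2, W1)            # [[3, 4], [-2, -2]]
b = vadd(matvec(W2, b1), b2)  # [2, 4]

def single_linear_layer(x):
    return vadd(matvec(W, x), b)

print(two_linear_layers([1, 1]), single_linear_layer([1, 1]))  # [9, 0] [9, 0]
```

Placing a ReLU between the two layers breaks this identity whenever an intermediate value would be negative, which is what lets deeper networks represent non-linear decision boundaries.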

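Interactive Tool: The Artificial Neuron Simulator

The simulator models a nodule classifier with two features. Its default settings are:

  • Feature 1 (Nodule Size): 0.5, with weight (importance) 0.8
  • Feature 2 (Spiculation): 0.2, with weight (importance) 0.6
  • Bias (Clinical Threshold): -0.5

Math: (F1×W1) + (F2×W2) + Bias = Z, so Z = (0.5 × 0.8) + (0.2 × 0.6) - 0.5 = 0.02

The sigmoid activation maps Z onto a scale from Benign/Low (0) to Malignant/High (1). Because sigmoid(0.02) ≈ 0.505, the prediction is about 50%: Indeterminate.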
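The simulator's arithmetic can be reproduced in a few lines of Python. This is a minimal sketch using the default feature values, weights, and bias listed above; the function names `sigmoid` and `neuron` are illustrative, not part of any library.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(features, weights, bias):
    # Weighted sum of the inputs plus the bias, then the non-linear activation
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return z, sigmoid(z)

# Default simulator settings: [nodule size, spiculation]
z, p = neuron(features=[0.5, 0.2], weights=[0.8, 0.6], bias=-0.5)
print(f"Z = {z:.2f}, prediction = {p:.1%}")  # Z = 0.02, prediction = 50.5%
```

Raising the spiculation feature from 0.2 to 0.8 with the same weights gives Z = 0.38 and a prediction of about 59%.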

Chapter 6. Multi-Layer Perceptrons (MLPs)

Chaining these artificial neurons together in layers produces the Multi-Layer Perceptron (MLP). Information flows through the layers in sequence:

Input → Hidden Layer 1 → Hidden Layer 2 → Output

This architecture allows AI systems to learn increasingly complex, non-linear relationships hidden in data.
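A classic illustration of what a hidden layer adds is XOR (output 1 when exactly one of two inputs is 1): no single perceptron can represent it, because the two classes cannot be separated by one straight line. The sketch below assumes hand-picked, hypothetical weights rather than trained ones, and a single ReLU hidden layer, which is enough for this toy problem.

```python
def relu(v):
    return max(0.0, v)

def mlp_xor(x1, x2):
    # Hidden layer: two ReLU neurons with hand-set weights and biases
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a linear combination of the hidden activations
    return 1.0 * h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp_xor(a, b))
# 0 0 0.0
# 0 1 1.0
# 1 0 1.0
# 1 1 0.0
```

In practice these weights are learned by backpropagation rather than set by hand, and deeper stacks of hidden layers are used for richer problems.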

Chapter 7. Why Deep Learning Changed Everything

The major paradigm shift between Classical AI and Deep Learning lies in feature engineering.

Classical AI

Human defines the features (e.g., shape, texture, histogram metrics, density). The machine learns to weigh these predefined features.

Deep Learning

The machine learns the features automatically from raw data. This is known as Representation Learning.

Visualizing Representation Learning

A deep learning model learns a concept (e.g., a cat) from pure pixels through a sequence of increasingly abstract layers:

1. Pixels: the machine only sees a grid of numbers (RGB pixel values) and has no concept of objects yet.
2. Edges: simple local patterns such as lines and boundaries.
3. Whiskers/Ears: edges combined into recognizable parts.
4. The "Cat": parts combined into the whole object.

PART III & IV: Computer Vision & Quantitative Imaging

Chapter 8. Radiomics: The Quantitative Imaging Revolution

Radiomics represents a major milestone in radiology AI, creating the bridge between classical machine learning and deep learning.

Traditional Radiology: Image → Visual interpretation
Radiomics: Image → Feature Extraction → Machine Learning → Prediction

Radiomics extracts hundreds of quantitative features from an image, many of them imperceptible to the human eye, including:

  • Texture (e.g., gray-level co-occurrence matrix, GLCM; gray-level run-length matrix, GLRLM)
  • Shape and morphology
  • Entropy and histogram features
  • Wavelet transformations
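Two of the feature families above can be sketched on a toy example. The code below assumes a hypothetical 4x4 region of interest already discretized to four gray levels, computes first-order histogram entropy, and builds a normalized GLCM for a single horizontal offset to derive its contrast feature. Production radiomics toolkits (PyRadiomics, for example) add steps this sketch omits, such as configurable intensity discretization, multiple offsets and directions, and symmetric co-occurrence matrices.

```python
import math
from collections import Counter

# Hypothetical 4x4 region of interest, already discretized to gray levels 0-3
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

def histogram_entropy(image):
    # First-order feature: Shannon entropy (bits) of the intensity histogram
    values = [v for row in image for v in row]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def glcm(image, levels, dx=1, dy=0):
    # Gray-level co-occurrence matrix for one non-negative pixel offset
    # (default: right-hand neighbor), normalized so its entries sum to 1
    m = [[0] * levels for _ in range(levels)]
    for r in range(len(image) - dy):
        for c in range(len(image[0]) - dx):
            m[image[r][c]][image[r + dy][c + dx]] += 1
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m]

def glcm_contrast(p):
    # Texture feature: large when neighboring pixels differ strongly in intensity
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

print(round(histogram_entropy(roi), 3))              # 1.924
print(round(glcm_contrast(glcm(roi, levels=4)), 3))  # 0.583
```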

Applications include precision oncology, tumor characterization, prognosis prediction, survival analysis, and treatment-response assessment.

Chapter 9. Classical Machine Learning in Radiology

Algorithms like Logistic Regression, Support Vector Machines (SVMs), Random Forests, and XGBoost were used to process radiomic data. While successful for focused tasks, they were limited by their dependence on handcrafted features, poor scalability, and limited generalization.

Chapter 10. CNNs: Teaching Machines to See

Convolutional Neural Networks (CNNs) revolutionized image analysis. Instead of learning from handcrafted features, CNNs learn representations directly from pixels through a stack of specialized feed-forward layers:

1. Convolution Layer

Slides small kernels (learned weight matrices) across the image. By computing dot products over local receptive fields, the network extracts local features such as edges, curves, and textures; stacked layers combine these into spatial hierarchies.

2. Activation (ReLU)

Passes raw outputs through element-wise non-linear functions (typically Rectified Linear Units: $f(x) = \max(0, x)$). Negative values are set to zero, enabling the model to learn complex, non-linear anatomical structures.

3. Pooling Layer

Reduces the spatial size of the feature maps, typically by Max Pooling (keeping the maximum value in each sub-window, e.g., a 2x2 grid) or Average Pooling. This cuts compute requirements and adds a degree of translation invariance.
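The first two layers can be sketched in plain Python. Like most deep learning frameworks, the sketch computes a "valid" cross-correlation (the kernel is not flipped), which is what CNN convolution layers compute in practice. The image and the 2x2 vertical-edge kernel are hypothetical, hand-set values; in a trained CNN the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely ("valid" padding)
    # and take the dot product with the local receptive field
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu_map(fmap):
    # Element-wise ReLU: f(x) = max(0, x)
    return [[max(0, v) for v in row] for row in fmap]

# Hypothetical image: a bright vertical band on a dark background
image = [
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
]
# Hand-set vertical edge detector: positive for dark-to-bright (left to right)
kernel = [
    [-1, 1],
    [-1, 1],
]

fmap = conv2d_valid(image, kernel)
print(fmap)            # [[18, 0, -18], [18, 0, -18]]
print(relu_map(fmap))  # [[18, 0, 0], [18, 0, 0]]
```

The kernel responds positively to the dark-to-bright edge on the left of the band and negatively to the bright-to-dark edge on the right; ReLU keeps only the positive response.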

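How Max Pooling (2x2 Matrix Reduction) Works

Before max pooling (4x4 input, split into four 2x2 windows):

  12  20 |  8  12
   8  19 |  5   6
  -------+-------
   3   0 |  2   4
   2   9 |  1   8

After max pooling (the maximum of each 2x2 window):

  20  12
   9   8

Learned feature hierarchy: Pixels → Edges → Shapes → Structures → Pathology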
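The max pooling step can be verified in a few lines. This sketch assumes non-overlapping 2x2 windows with stride 2 and uses the 4x4 values from the max pooling figure, read row by row; `max_pool_2x2` is an illustrative name.

```python
def max_pool_2x2(fmap):
    # Non-overlapping 2x2 windows (stride 2); keep the maximum of each window.
    # An odd trailing row or column is dropped.
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]

fmap = [
    [12, 20, 8, 12],
    [8, 19, 5, 6],
    [3, 0, 2, 4],
    [2, 9, 1, 8],
]
print(max_pool_2x2(fmap))  # [[20, 12], [9, 8]]
```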

Chapter 11. Landmark CNN Architectures & Radiology Applications

Different CNN families suit different medical imaging workloads. The overview below outlines how each is applied to specific clinical tasks and why it was chosen:

U-Net / nnU-Net
  • Primary AI task: Semantic segmentation
  • Clinical use case: Liver donor volumetry and brain tumor contouring
  • Why chosen: A symmetric contracting encoder path (for abstract context) and expanding decoder path (for localization) are connected via skip connections. These preserve spatial detail, allowing precise volumetric calculations of prospective liver transplant donor grafts.

YOLO (v4 to v8)
  • Primary AI task: Bounding-box object detection
  • Clinical use case: Emergency fracture and pneumothorax detection
  • Why chosen: "You Only Look Once" processes the entire image in a single pass, regressing bounding boxes directly instead of running a separate multi-stage region-proposal step. This delivers near-instantaneous (sub-second) identification of acute, life-threatening pathologies in trauma screening.

ResNet (50 / 101)
  • Primary AI task: Disease classification
  • Clinical use case: Mammographic breast density grading
  • Why chosen: Residual "shortcut connections" bypass intermediate layers. This mitigates vanishing gradients, allowing very deep networks that can learn subtle patterns in breast tissue, such as microcalcifications and architectural distortion.

3D-CNN
  • Primary AI task: Volumetric (3D) processing
  • Clinical use case: Lung nodule assessment across sequential CT slices
  • Why chosen: Extends 2D convolutions along a third, through-plane axis, so kernels convolve across consecutive CT or MRI slices simultaneously. This captures the spatial continuity needed to distinguish true nodules, which are compact, from adjacent vessels, which continue across slices.

Chapter 12. Computer Vision Tasks in Radiology

Each computer vision task below is summarized with the architectures typically used to perform it and a short explanation of how they work.

Classification

Determines whether a specific disease or abnormality is present or absent within an entire medical scan.

Target Architectures: ResNet-50, DenseNet-121, EfficientNet

These models use global average pooling followed by fully connected layers. They output probabilities between 0 and 1 indicating the likelihood of pathologies such as pneumonia, pleural effusion, or fractures anywhere in the image.

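A minimal sketch of the classification head described above, assuming a hypothetical CNN backbone has already produced two 2x2 feature maps and that the final layer's weights and bias are hand-picked rather than trained. Global average pooling reduces each channel to its mean, and a fully connected layer with a sigmoid turns the pooled vector into a probability.

```python
import math

def global_average_pool(feature_maps):
    # Collapse each channel's 2D feature map to a single number: its mean activation
    return [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]

def classify(feature_maps, weights, bias):
    # Fully connected layer on the pooled vector, then sigmoid -> probability in (0, 1)
    pooled = global_average_pool(feature_maps)
    logit = sum(w * f for w, f in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical backbone output: 2 channels, each a 2x2 feature map
feature_maps = [
    [[1.0, 3.0], [2.0, 2.0]],  # channel 0, mean 2.0
    [[0.0, 0.0], [1.0, 1.0]],  # channel 1, mean 0.5
]
# Hypothetical weights for a single "pneumonia present" output
p = classify(feature_maps, weights=[1.5, -2.0], bias=-1.0)
print(round(p, 3))  # 0.731
```

Multi-label chest radiograph models typically attach one such sigmoid output per finding, so each pathology receives its own independent probability.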
Detection

Identifies lesions or anatomical landmarks and localizes them with bounding boxes.

Target Architectures: YOLO (v5/v8), Faster R-CNN, RetinaNet

Two-stage detectors such as Faster R-CNN use a Region Proposal Network (RPN) to propose candidate regions before classifying them; single-stage detectors such as RetinaNet and YOLO predict boxes directly over a multi-scale grid. Each output is a set of box parameters, e.g., [x, y, width, height], paired with a class confidence score.

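Detection outputs are commonly compared with reference boxes using Intersection over Union (IoU), the overlap area divided by the combined area. This is a minimal sketch assuming boxes in [x, y, width, height] form with (x, y) as the top-left corner (some detectors, YOLO among them, use the box center instead); the boxes are hypothetical.

```python
def iou(box_a, box_b):
    # Boxes are [x, y, width, height], with (x, y) the top-left corner
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis (zero if the boxes do not intersect)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted vs. radiologist-annotated nodule boxes (pixels)
predicted = [40, 60, 20, 20]
ground_truth = [50, 70, 20, 20]
print(round(iou(predicted, ground_truth), 3))  # 0.143
```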
Segmentation

Delineates the pixel-by-pixel boundaries of organs, lesions, or pathways, enabling detailed volumetric diagnostics.

Target Architectures: U-Net, nnU-Net, DeepLabv3

U-Net and nnU-Net apply symmetric contracting and expanding paths linked by skip connections, which recover fine spatial detail; DeepLabv3 instead captures multi-scale context with atrous (dilated) convolutions. All output a mask in which each individual pixel is assigned a class index.

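Segmentation masks are commonly scored with the Dice coefficient, $2|A \cap B| / (|A| + |B|)$, which is 1 for perfect overlap and 0 for none. This minimal sketch assumes binary masks stored as nested lists; both masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def dice(mask_a, mask_b):
    # Dice = 2|A ∩ B| / (|A| + |B|) for binary masks (1 = structure, 0 = background)
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    inter = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    return 2.0 * inter / total if total else 1.0

# Hypothetical 3x4 tumor masks: model prediction vs. manual contour
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
manual = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(dice(predicted, manual), 3))  # 0.8
```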
Registration

Aligns multiple datasets (e.g., baseline vs. follow-up, or MRI vs. CT) into a shared geometric space.

Target Architectures: VoxelMorph, Spatial Transformer Networks (STN)

These networks predict a dense deformation field, a displacement vector for every voxel, and use a spatial transformer to resample (warp) one scan onto the other at the voxel level.
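The warping step can be sketched in 2D. In this minimal example the dense displacement field is hand-set (a uniform one-pixel shift) rather than predicted by a network, and sampling uses nearest-neighbor rounding; learned methods such as VoxelMorph work on 3D volumes and typically use linear interpolation so the warp stays differentiable. `warp_nearest` and the data are hypothetical.

```python
def warp_nearest(moving, disp_y, disp_x):
    # For every output pixel (r, c), sample the moving image at (r + dy, c + dx)
    # using nearest-neighbor rounding; samples outside the image become 0
    rows, cols = len(moving), len(moving[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr = round(r + disp_y[r][c])
            sc = round(c + disp_x[r][c])
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = moving[sr][sc]
    return out

# Hypothetical follow-up scan: a bright lesion pixel at (1, 2)
moving = [
    [0, 0, 0],
    [0, 0, 5],
    [0, 0, 0],
]
# Hypothetical dense displacement field: every pixel samples one column to the right
disp_y = [[0.0] * 3 for _ in range(3)]
disp_x = [[1.0] * 3 for _ in range(3)]
print(warp_nearest(moving, disp_y, disp_x))  # [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
```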

Chapter 13. AI Across Radiology Subspecialties

Artificial intelligence has moved beyond academic research into real-world subspecialty workflows. Specialized computer vision pipelines have changed clinical operations in four main areas:

Neuroradiology & Acute Stroke Care

In acute neurology, "time is brain." CNNs are deployed as automated triage engines on emergency non-contrast head CT. They screen for intracranial hemorrhage (ICH), flagging suspected subdural, epidural, or subarachnoid bleeds in seconds and immediately escalating the study in the PACS reading queue.

Additionally, on CT angiography (CTA), deep learning models locate large vessel occlusions (LVOs), such as those of the proximal middle cerebral artery, and trigger mobile alerts to the neuro-interventional team. In neuro-oncology, longitudinal segmentation engines track treatment response in high-grade gliomas, producing reproducible tumor measurements free of human inter-observer variability.

Thoracic Imaging & Pulmonary Triage

The chest radiograph is the most commonly performed imaging examination worldwide. Modern thoracic pipelines use CNNs to detect pneumothorax, moving high-risk trauma patients to the top of the reading worklist.

On CT pulmonary angiography, AI detects pulmonary embolism (PE) by analyzing the pulmonary arterial tree for filling defects, down to the segmental arteries. Meanwhile, specialized segmentation networks characterize interstitial lung disease (ILD), quantifying fibrotic change over time to monitor response to immunomodulatory treatment.

Breast Imaging & Screening Analytics

Mammography screening involves identifying exceptionally subtle microcalcifications and architectural distortion within complex glandular tissue. ResNet-based models serve as double-reading assistants and have been reported to reduce false-negative rates by up to 15%.

These models also assign breast density categories automatically, flagging dense parenchyma that may mask malignancies. Advanced multimodal frameworks combine high-resolution 2D mammograms and 3D digital breast tomosynthesis (DBT) scans with demographic risk factors to estimate long-term breast cancer risk.

Abdominal & Musculoskeletal Interventions

In abdominal imaging, precise tracing of vascular and organ boundaries is crucial. For prospective liver transplant donors, U-Net models perform liver donor volumetry, computing total hepatic volume and split ratios in under a minute, a task that previously took hours of tedious manual tracing.

In musculoskeletal (MSK) radiology, deep learning classifiers detect subtle fractures, such as hairline cortical fractures of the scaphoid and pediatric growth-plate injuries, which are frequently missed by fatigued readers. For chronic joint disease, deep neural networks grade osteoarthritis severity on radiographs, systematically tracking joint space narrowing, an indirect marker of cartilage loss.

Chapter 14. The Segmentation Revolution

Architectures like U-Net and nnU-Net enabled precise pixel-level contouring, vital for organ segmentation, radiation therapy planning, and surgical planning.

Interactive Tool: Image Segmentation Masking

The tool overlays an AI-generated semantic segmentation mask on a raw brain scan, simulating a U-Net output that localizes two classes: tumor and ventricles.

Chapter 15. Detection Networks

Architectures like Faster R-CNN, RetinaNet, and YOLO (You Only Look Once) specialize in rapidly drawing bounding boxes around findings. This makes them valuable in emergency radiology for spotting lung nodules or fractures in near real time.

PART V: Teaching Machines Context

Chapter 15b. RNNs, LSTMs, and the Roots of Radiology NLP

Before the Transformer revolution, clinical language processing and speech recognition relied on architectures designed to understand time, order, and sequence. While images are spatial 2D structures, reports and dictated speech are 1D sequences.

Temporal Sequences: Recurrent Neural Networks (RNNs)

Feed-forward neural networks treat each input independently of the others. However, to predict the next word in a diagnostic sentence, a model must know which words came before. Recurrent Neural Networks (RNNs) solve this by carrying a hidden state from one time step to the next, $h_t = \tanh(W x_t + U h_{t-1} + b)$, so each step's output depends on everything the network has seen so far.

Overcoming Memory Loss: LSTMs

When basic RNNs are trained on long sequences (such as an extensive patient history or a long MRI report), the gradient signal fades as it is propagated back through many time steps, a problem known as the Vanishing Gradient problem. LSTMs (Long Short-Term Memory networks) address this by introducing an internal "cell state" governed by three specialized gates:

  • Forget Gate: Decides how much historical memory to discard.
  • Input Gate: Selects which new diagnostic information to save into the cell state.
  • Output Gate: Controls what part of the state actually impacts the final output prediction.
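One step of an LSTM cell can be written out directly. This is a minimal scalar sketch (hidden size 1) with hypothetical, untrained parameters; real models use weight matrices and vectors, but the gate equations have the same form. Alongside the three gates, the standard LSTM also computes a candidate value `g`, which the input gate scales before it is added to the cell state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # One time step of a scalar LSTM cell; p holds hypothetical weights and biases
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate: how much old memory to keep
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate: how much new information to write
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value to write
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate: how much of the state to expose
    c = f * c_prev + i * g   # additive cell-state update eases gradient flow over long sequences
    h = o * math.tanh(c)     # hidden state passed to the next step and to the prediction
    return h, c

def run_lstm(sequence, p):
    h, c = 0.0, 0.0
    hidden_states = []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        hidden_states.append(h)
    return hidden_states, c

# Hypothetical parameters, chosen only so the example runs
params = {
    "w_f": 0.5, "u_f": 0.1, "b_f": 1.0,
    "w_i": 0.6, "u_i": 0.2, "b_i": 0.0,
    "w_g": 1.0, "u_g": 0.3, "b_g": 0.0,
    "w_o": 0.4, "u_o": 0.1, "b_o": 0.0,
}
# A toy numeric sequence standing in for encoded report tokens
hidden_states, final_cell = run_lstm([1.0, 0.0, -1.0, 0.5], params)
print([round(h, 3) for h in hidden_states])
```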

Structured Reporting with NLP & Clinical Text Mining

Historically, radiologists produced narrative, free-text reports, which created major challenges for secondary clinical data mining. Natural Language Processing (NLP) pipelines, built initially on LSTMs and later upgraded with clinical transformers, are used to turn these narratives into structured, actionable datasets. This process rests on three foundational tasks:

1. Named Entity Recognition (NER)

NLP engines scan unstructured diagnostic reports to isolate clinical concepts. By aligning free text with standardized biomedical vocabularies (such as RadLex for radiology terms, SNOMED CT for clinical findings, and UMLS concept codes), the system extracts key terms such as "spiculated mass", "left lower lobe", and "pleural effusion" and tags them automatically.

2. Clinical Relation Extraction

Identifying isolated words is insufficient; the model must parse syntax to connect relationships. It maps clinical entities to their modifiers—such as matching a size measurement ("12mm") with a lesion location ("right thyroid lobe") and asserting negation ("no evidence of acute fracture") to prevent false positives in the medical record.

3. Standardized Reporting Mapping (BI-RADS, PI-RADS, LI-RADS)

To ensure diagnostic clarity across care teams, NLP parsers automatically map free-text clinical summaries directly into structured risk classification templates. For example, a narrative describing a "strongly hypoechoic breast mass, taller than wide, with microcalcifications" is mapped to a structured BI-RADS 5 classification (highly suggestive of malignancy). Similar structures exist for prostate (PI-RADS) and liver lesions (LI-RADS).
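A deliberately simplified sketch of the first two tasks: dictionary-based entity recognition, a regular expression for size measurements, and a NegEx-style rule that marks an entity as negated when a negation cue appears earlier in the same sentence. The lexicon, cue list, and labels are hypothetical; this rule-based approach stands in for the LSTM and transformer models described above, and real systems map terms to RadLex, SNOMED CT, or UMLS codes and scope negation far more carefully.

```python
import re

# Hypothetical mini-lexicon (real systems map terms to standardized vocabulary codes)
LEXICON = {
    "pleural effusion": "FINDING",
    "spiculated mass": "FINDING",
    "acute fracture": "FINDING",
    "left lower lobe": "LOCATION",
    "right thyroid lobe": "LOCATION",
}
# Simplified negation cues; naive substring matching, for illustration only
NEGATION_CUES = ("no evidence of", "no ", "without", "negative for")

def extract_entities(sentence):
    text = sentence.lower()
    results = []
    for term, label in LEXICON.items():
        start = text.find(term)
        if start == -1:
            continue
        # Negated if any cue appears earlier in the same sentence
        negated = any(cue in text[:start] for cue in NEGATION_CUES)
        results.append((term, label, "negated" if negated else "affirmed"))
    # Size measurements such as "12mm" or "12 mm", candidates for relation extraction
    sizes = re.findall(r"\d+(?:\.\d+)?\s?mm", text)
    return results, sizes

print(extract_entities("12mm spiculated mass in the left lower lobe."))
print(extract_entities("No evidence of acute fracture."))
```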

Natural Language Processing & Speech Recognition

In modern reading rooms, sequential models such as LSTMs became the backbone of two supporting technologies:

  • Speech-to-Text: Capturing audio waveforms from the radiologist's dictation microphone and mapping them sequentially into text strings.
  • Structured NLP Parsing: Scanning unstructured clinical EHR reports to automatically extract historic tumor staging criteria.

Chapter 16. Self-Supervised Learning

Traditional AI suffered from the "labeling bottleneck": it required thousands of radiologist-annotated images. Self-supervised learning lets models learn representations from unlabeled data by solving pretext tasks, such as matching two augmented views of the same image (SimCLR, MoCo, DINO) or predicting missing image patches (masked autoencoders). This was a key enabler of modern foundation models.

Chapters 17 & 18. Self-Attention Mechanisms and Vision Transformers (ViT)

Convolutional Neural Networks (CNNs) process medical images through local sliding windows (receptive fields), so they capture wide-area spatial relationships only indirectly, through many stacked layers. Transformers address this with global Self-Attention Mechanisms, which let the network relate every region of the image to every other region within a single layer.

The Math Behind Self-Attention

Self-Attention projects the input (a sequence of token embeddings) into three matrices: Queries ($Q$), Keys ($K$), and Values ($V$). The relationship between any two regions of an image is scored by the dot product of one region's Query with the other's Key. The scores are divided by $\sqrt{d_k}$, the square root of the key dimension, because large dot products would otherwise push the softmax into saturated regions where gradients vanish:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

In **Multi-Head Attention**, this operation is executed in parallel across several independent projection subspaces ("heads"). The outputs are then concatenated and linearly projected back to the original dimension:

$$\text{MultiHead}(Q,K,V) = \text{Concat}(\text{head}_1, \dots, \text{head}_h)W^O$$

$$\text{where} \quad \text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$

This allows the network to attend to multiple regions (such as lung margins and lymph nodes) simultaneously and to capture different kinds of relationships in each head.
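The attention formula can be implemented directly. This minimal single-head sketch uses plain Python and hypothetical Q, K, and V values for three image patches with $d_k = 2$; in a real model these matrices come from learned projections of the patch embeddings.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V, with one row of attention weights per query token
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

# Hypothetical Q, K, V for 3 image patches (tokens) with d_k = 2
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, w = attention(Q, K, V)
for row in w:
    print([round(x, 3) for x in row])  # each row of weights sums to 1
```

Multi-head attention runs this same function h times on separately projected Q, K, and V and concatenates the results before the output projection $W^O$.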

Deconstructing the Vision Transformer (ViT) Pipeline

Standard Transformer architectures were built to process 1D sequences of text tokens. To apply them to 2D medical images, the Vision Transformer (ViT) uses the following steps:

  1. Patch Extraction: A high-resolution 2D image $X \in \mathbb{R}^{H \times W \times C}$ is divided into a grid of non-overlapping flat patches $X_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $(P, P)$ is the patch resolution (typically 16x16 pixels) and $N = (H \cdot W)/P^2$ is the total number of patches (tokens).
  2. Linear Patch Projection: Each flattened image patch is passed through a trainable linear layer to project it into a vector space of dimension $D$. This is mathematically equivalent to a convolutional layer with a kernel size and stride equal to the patch size.
  3. Positional Embeddings: Because self-attention is permutation-invariant (it doesn't inherently know which patch came from where), trainable 1D positional vectors are added to the projected patch embeddings to preserve spatial topology: $$\mathbf{z}_0 = [\mathbf{x}_{\text{class}}; \, \mathbf{x}_p^1 \mathbf{E}; \, \mathbf{x}_p^2 \mathbf{E}; \, \dots; \, \mathbf{x}_p^N \mathbf{E}] + \mathbf{E}_{\text{pos}}$$
  4. The Class $[CLS]$ Token: Like BERT in NLP, a specialized learnable token ($\mathbf{x}_{\text{class}}$) is prepended to the patch sequence. As it passes through the Transformer's self-attention blocks, this token aggregates diagnostic information from all other patches. The final state of the $[CLS]$ token is then fed into a classification head (MLP) to determine whether a tumor or pathology is present.
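Step 1 (patch extraction) and the token count can be sketched as follows. The `patchify` helper is hypothetical; the example assumes a tiny 4x4 single-channel image with 2x2 patches, then checks the token count for a 224x224 RGB input with 16x16 patches, a common ViT configuration. The linear projection, positional embeddings, and [CLS] token from steps 2 to 4 are not included.

```python
def patchify(image, p):
    # Split an H x W x C image (nested lists) into N = (H * W) / p^2 non-overlapping
    # p x p patches, each flattened row by row into a vector of length p * p * C
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            patch = [value
                     for i in range(r, r + p)
                     for j in range(c, c + p)
                     for value in image[i][j]]
            patches.append(patch)
    return patches

# Hypothetical 4x4 single-channel image (C = 1); each pixel stored as [intensity]
image = [[[4 * r + c] for c in range(4)] for r in range(4)]
tokens = patchify(image, p=2)
print(len(tokens), len(tokens[0]))  # 4 4
print(tokens[0])                    # [0, 1, 4, 5]

# Token count for a 224x224 RGB image with 16x16 patches
H = W = 224
P = 16
C = 3
print((H * W) // P**2, P * P * C)  # 196 768
```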

Concept Check

What was the primary architectural limitation of early CNNs that Transformers solved?

Chapter 19. Hybrid Architectures

Why choose one? Hybrid architectures such as TransUNet, UNETR, and Swin UNETR combine the local precision of CNNs with the global context understanding of Transformers.

PART VI: Foundation Models

Chapter 20. From Feature Learning to Representation Learning

The arc of AI evolution is clear:

  • Symbolic AI: Follows human-written rules
  • Radiomics: Learns weights over handcrafted features
  • CNN: Learns image features
  • Transformer: Learns relationships
  • Foundation Model: Learns general-purpose representations

Chapter 21. Medical Foundation Models: Pretraining and Scale

The current boom in AI is driven by **Foundation Models**: large networks trained on broad, diverse, largely unlabeled datasets using self-supervised learning. In medicine, these models act as general-purpose feature extractors that can be adapted to many different clinical tasks with very little downstream training.

Self-Supervised Pretraining Paradigms

Instead of relying on millions of manual labels, medical foundation models learn anatomy and pathology by solving pretext tasks built into the data itself:

  • Masked Autoencoders (MAE): The model hides a large fraction (e.g., 75%) of the patches of a medical image (like a chest CT) and learns to reconstruct the missing pixels. To do this well, it must learn an accurate internal model of human anatomy.
  • Contrastive Language-Image Pretraining (CLIP): Models learn by aligning images with the text that accompanies them, such as radiology reports or, in BiomedCLIP's case, figure captions from biomedical literature. Training pulls matching image-text pairs close together in a shared vector space while pushing unrelated pairs apart.
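The contrastive objective can be sketched as a symmetric cross-entropy over a similarity matrix, as in CLIP: each image should score highest against its own text, and each text against its own image. This minimal sketch assumes hypothetical 3-dimensional embeddings and a fixed temperature of 0.07 (CLIP learns its temperature during training); real encoders produce embeddings with hundreds of dimensions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_entropy(logits, target):
    # -log softmax(logits)[target], computed stably
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    # Cosine similarity of every image with every text; matched pairs lie on the diagonal
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    n = len(imgs)
    # Image-to-text: each image should pick its own text
    loss_i2t = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text-to-image: each text should pick its own image
    cols = [list(col) for col in zip(*logits)]
    loss_t2i = sum(cross_entropy(cols[k], k) for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Hypothetical embeddings for 3 image/text pairs
images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
texts = [[0.9, 0.2, 0.0], [0.1, 1.0, 0.0], [0.0, 0.1, 0.9]]
print(round(clip_loss(images, texts), 4))        # low: pairs are aligned
print(round(clip_loss(images, texts[::-1]), 4))  # higher: pairs are mismatched
```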

Downstream Parameter-Efficient Fine-Tuning (PEFT)

Because updating billions of model parameters for every clinical site or task is impractical, practitioners use Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA):

  • LoRA (Low-Rank Adaptation): Freeze the foundation model's pre-trained weight matrices ($W_0 \in \mathbb{R}^{d \times k}$) and learn a low-rank update alongside them, $\Delta W = BA$ with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and rank $r \ll \min(d, k)$, so the adapted layer computes $h = W_0 x + BAx$.
  • Parameter Efficiency: Only the small adapter matrices $A$ and $B$ are trained. This slashes GPU memory requirements during training, leaves the original weights intact (reducing the risk of degrading the base model), and allows rapid deployment of custom clinical tools (e.g., bone age calculators or COVID-19 markers).
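A minimal sketch of a LoRA-adapted linear layer, assuming tiny hypothetical sizes and random stand-in weights. Following the LoRA paper, $B$ is initialized to zero so the adapted layer starts out identical to the frozen one; the optional scaling factor used in practice is omitted here. The parameter count at the end compares full fine-tuning with a rank-8 adapter for an assumed 768x768 weight matrix.

```python
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

d, k, r = 4, 4, 1  # hypothetical sizes: W0 is d x k, adapter rank r << min(d, k)
random.seed(0)

# Frozen pre-trained weight (stand-in values; never updated during fine-tuning)
W0 = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
# Trainable low-rank factors: A (r x k) starts random, B (d x r) starts at zero,
# so the adapted layer initially behaves exactly like the frozen one
A = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]

def lora_forward(x):
    # h = W0 x + B (A x); the full d x k update BA is never materialized
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]

x = [1.0, 0.5, -0.5, 2.0]
print(lora_forward(x) == matvec(W0, x))  # True before any training

def trainable_params(d, k, r):
    # Full fine-tuning updates d * k weights; LoRA trains only r * (d + k)
    return d * k, r * (d + k)

print(trainable_params(768, 768, 8))  # (589824, 12288)
```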

Chapters 22 & 23. MedSAM, BiomedCLIP & Vision-Language Models

MedSAM adapts Meta's Segment Anything Model (SAM) to medicine by fine-tuning it on a large collection of medical images, providing general-purpose segmentation and interactive annotation. Given a simple prompt such as a bounding box, MedSAM segments complex target structures like renal cysts or lung tumors.

Vision-Language Models (VLMs) such as BiomedCLIP, LLaVA-Med, and Med-Flamingo bridge the gap between pixels and text. Rather than only classifying images, they link image content to clinical language, enabling auto-generated report drafts, image-text search, and interactive clinical question answering in the reading room.

Chapter 24. Multimodal AI

The ultimate goal: integrating imaging, clinical notes, laboratory data, pathology, and genomics into a single model that supports integrated clinical reasoning.

PART VII: The Future of Radiology AI

Chapter 25. Radiology Copilots

The near future is the Copilot: an assistant that drafts reports, supports differential diagnosis, retrieves patient context, and orchestrates workflow.

Chapter 26. Neuro-Symbolic and Deterministic AI

Healthcare demands explainability and auditability. We are seeing a return to rules, combined with learning: Neuro-Symbolic AI. It combines the pattern-recognition of Neural Networks with the hard logic guardrails of Symbolic AI to ensure patient safety and regulatory governance.

Chapter 27. Validation, Drift and Governance

AI is not "set and forget." Model performance can degrade after deployment, so deployments must be monitored for risks such as:

  • Data Drift: Changes in scanner hardware or protocols.
  • Concept Drift: Changes in disease prevalence or definitions (e.g., COVID-19).
  • Automation Bias: The psychological risk of radiologists blindly trusting the algorithm.

Chapter 28. Emerging Horizons

Looking further ahead:

  • Radiogenomics: Fusing Imaging + Genomics for Precision Oncology.
  • Edge AI: Running AI directly on the CT/MRI scanner hardware.
  • Quantum Computing & Neuromorphic Computing: Currently exploratory, potentially revolutionizing optimization and simulation.

Chapter 29. Clinical Case Matcher: Which Algorithm is Required?

When deploying AI in clinical workflows, choosing the right tool is vital. The Diagnostic Sandbox pairs clinical scenarios with the family of algorithms suited to each, and explains why.

Interactive Tool: The Diagnostic Sandbox


Final Synthesis: The Evolution Timeline

The history of radiology AI is the evolution of how machines represent, interpret, and integrate medical information.