PATENT APPLICATION SPECIFICATION
TITLE: SYSTEM AND METHOD FOR EXPERT-INFORMED MULTI-SPECTRAL FEATURE ENGINEERING FOR RADIOMIC CLASSIFICATION
INVENTOR: Moran Danieli Cohen
ASSIGNEE: Tenders Lab LTD
FIELD OF THE INVENTION
The present invention relates generally to the field of medical image processing, computer-aided diagnosis (CAD), and artificial intelligence. More specifically, it relates to a system and method for transforming monochromatic volumetric radiological data (such as Computed Tomography) into synthetic multi-spectral representations that encode expert radiological heuristics to facilitate automated classification of neoplasms using standard computer vision architectures.
BACKGROUND OF THE INVENTION
Medical imaging, particularly Computed Tomography (CT), is the standard of care for diagnosing internal pathologies, including renal neoplasms. However, the automated interpretation of these images by artificial intelligence systems presents significant challenges that have not been fully addressed by existing art.
1. The Dimensionality Mismatch Problem
Standard state-of-the-art deep learning architectures (e.g., ResNet, EfficientNet, Vision Transformers) are primarily engineered and pre-trained on natural optical images (e.g., ImageNet), which are composed of three spectral channels: Red, Green, and Blue (RGB). In contrast, radiological data such as CT is fundamentally monochromatic, consisting of a single scalar value per voxel representing radiodensity (Hounsfield Units).
Existing solutions typically either:
(a) Replicate the single monochromatic channel across all three RGB inputs, which satisfies the network's input shape but wastes two-thirds of the input capacity on redundant data; or
(b) Utilize 3D Convolutional Networks (3D-CNNs), which are computationally expensive, memory-intensive, and notoriously difficult to train on small medical datasets due to the "curse of dimensionality."
2. The Segmentation-First Bias
The dominant paradigm in renal analysis (e.g., KiTS23 Challenge) focuses on Semantic Segmentation (delineating boundaries) rather than Classification (determining pathology). State-of-the-art segmentation models (e.g., nnU-Net, 2D SCNet) often achieve high Dice scores (spatial overlap) but fail to distinguish between histologically similar but pathologically distinct subtypes (e.g., Oncocytoma vs. Chromophobe RCC). Furthermore, these Dual-Task networks require expensive pixel-level annotations, which creates a significant bottleneck for training data acquisition.
3. The Dynamic Range Problem
The human eye and standard 8-bit image formats (JPG/PNG) can only resolve approximately 256 shades of gray. However, a CT scan contains dynamic radiodensity data ranging from -1000 HU (Air) to +3000 HU (Cortical Bone/Metal).
Traditional "Windowing" techniques select a single sub-range (e.g., a Soft Tissue Window of -50 to +250 HU) to display. While effective for human viewing, this process explicitly destroys data outside the window. For example, a "Soft Tissue Window" will clip both Adipose Tissue (Fat, approx -100 HU) and Calcifications (Bone, approx +400 HU) to essentially uniform black and white pixels, respectively. This data loss is catastrophic for differentiating tumors such as Angiomyolipoma (benign, fat-containing) from Renal Cell Carcinoma (malignant, often calcified).
4. The Texture Subtlety Problem
Critical differentiators between benign neoplasms (e.g., Oncocytoma) and malignant ones (e.g., Clear Cell RCC) often reside in subtle, high-frequency local texture heterogeneity. Malignant tumors frequently exhibit chaotic localized angiogenesis and necrosis, while benign tumors often present with homogenous tissue density. Standard global linear windowing preserves absolute density but often compresses local contrast, rendering these subtle textural signatures invisible to the neural network.
5. The Small Data Problem
Medical datasets are notoriously small (often <500 cases) compared to natural image datasets (>10M images). Deep learning models trained on raw medical data often fail to converge or severely overfit because they must "re-discover" basic laws of radiology (e.g., that bone is dense, or that fat is distinctive) from scratch. There is a need for a feature engineering approach that explicitly encodes these radiological heuristics into the input data itself, thereby "teaching" the model to look for relevant features before training begins.
SUMMARY OF THE INVENTION
The present invention provides a system and method for generating a "Synthetic Multi-Spectral Sensor" view of radiological data. The method involves decomposing a single volumetric slice into three distinct spectral bands, each processed via a distinct transfer function (linear or non-linear) selected to highlight specific biological properties, and combining them into a composite multi-channel image that is compatible with standard pre-trained computer vision architectures.
Technological Advantages:
- Information Density: By mapping three different radiological "views" to the R, G, and B channels, the system triples the information content provided to the network without increasing the input tensor size.
- Expert Encoding: The specific transfer functions (Linear, Hyper-Wide, and Adaptive Equalization) mimic the cognitive process of an expert radiologist who scans for Structure, Composition, and Texture simultaneously.
- Attention Without Architecture: The invention introduces "Chromatically Baked Attention," a method of encoding the Region of Interest (ROI) directly into the pixel color space via a spectral shift, eliminating the need for complex, computationally heavy attention mechanisms in the neural network.
In one embodiment, the system generates:
- a Red Channel for Structural Morphology (Standard Soft Tissue Window);
- a Green Channel for Density Extremum detection (Hyper-Wide Window capturing Fat and Calcification); and
- a Blue Channel for Texture Variance (Contrast-Limited Adaptive Histogram Equalization).
The resulting composite image serves as a highly efficient training artifact that allows "Off-the-Shelf" 2D networks to achieve state-of-the-art performance on 3D volumetric tasks with limited training data.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a high-level block diagram of the system architecture.
graph LR
subgraph Input Data
A["Volumetric <br/> CT Scan"] --> B["Slice <br/> Extractor"]
M["Segmentation <br/> Mask"] --> B
end
subgraph "Spectral Transformation Engine"
B --> C1["Structural <br/> Channel Generator"]
B --> C2["Density-Extremum <br/> Generator"]
B --> C3["Texture-Variance <br/> Generator"]
C1 -- "Window(40,300)" --> R["Red Channel <br/> Buffer"]
C2 -- "Window(100,1000)" --> G["Green Channel <br/> Buffer"]
C3 -- "CLAHE(0.03)" --> BL["Blue Channel <br/> Buffer"]
end
subgraph "Chromatic Attention"
M --> ATT["Attention <br/> Module"]
R --> ATT
G --> ATT
BL --> ATT
ATT -- "Apply Spectral <br/> Shift" --> OUT["Composite 3-Channel <br/> Tensor"]
end
OUT --> CNN["Convolutional <br/> Neural Network"]
FIG. 2 is a flowchart illustrating the method of generating the multi-spectral composite image.
flowchart TD
Start([Start]) --> S201["201 Ingest Volume V <br/> and Mask M"]
S201 --> S202["202 Extract <br/> 2D Slice Sz"]
S202 --> S202b["202b Crop to ROI <br/> plus Margin"]
S202b --> S203["203 Decompose to <br/> Buffers B1 B2 B3"]
subgraph Parallel Processing
S203 --> T1["Transform B1 <br/> Soft Tissue Window"]
S203 --> T2["Transform B2 <br/> Wide Range Window"]
S203 --> T3["Transform B3 <br/> CLAHE Texture Filter"]
end
T1 --> S204["204 Merge to <br/> Composite Image"]
T2 --> S204
T3 --> S204
S204 --> S205{"205 Apply <br/> Mask?"}
S205 -- Yes --> S205b["205b Suppress <br/> Background"]
S205b --> S206["206 Apply Chromatic <br/> Weighting Matrix"]
S205 -- No --> S207["207 Output <br/> Final Tensor"]
S206 --> S207
S207 --> End([End])
FIG. 3 illustrates the spectral transfer functions applied to the Hounsfield Unit histogram.
graph LR
subgraph "Input Value (HU)"
H["-1000 to +3000"]
end
subgraph "Transfer Functions"
H --> F1["Soft Tissue: <br/> Linear Clip <br/> (-50 to 250)"]
H --> F2["Wide Range: <br/> Linear Clip <br/> (-400 to 600)"]
H --> F3["Texture: CLAHE <br/> Non-Linear <br/> Histogram Eq"]
end
subgraph "Output Channel"
F1 --> Red["Red Channel <br/> (0.0 - 1.0)"]
F2 --> Green["Green Channel <br/> (0.0 - 1.0)"]
F3 --> Blue["Blue Channel <br/> (High Freq)"]
end
FIG. 4 is a schematic comparison of a standard CT slice versus the multi-spectral embodiment.
graph LR
subgraph "Prior Art - Standard"
RAW1["Raw CT Data"] --> W["Soft Tissue Window"]
W --> GRAY["Grayscale Image <br/> 1 Channel"]
GRAY --> LOSS["Loss of <br/> Deep High Range <br/> Data"]
end
subgraph "Present Invention"
RAW2["Raw CT Data"] --> SPLIT["Split Channels"]
SPLIT --> CH1["Red Structure"]
SPLIT --> CH2["Green Density"]
SPLIT --> CH3["Blue Texture"]
CH1 --> RGB["Composite RGB <br/> 3 Channels"]
CH2 --> RGB
CH3 --> RGB
RGB --> GAIN["Full Data <br/> Retention"]
end
FIG. 5 is a diagram of the Deterministic Chromatic Weighting mechanism (formerly "Digital Staining") applied to a tumor ROI.
graph TD
Pixel["Pixel P at (x,y)"] --> Check{"Is P in Mask?"}
Check -- "No (Background)" --> Unchanged["Output = <br/> Original RGB"]
Check -- "Yes (Tumor ROI)" --> Shift["Apply Spectral <br/> Weights"]
Shift --> R["Red * 1.2 (Boost)"]
Shift --> G["Green * 0.8 (Suppress)"]
Shift --> B["Blue * 0.8 (Suppress)"]
R --> Tinted["Tinted Output <br/> Pixel"]
G --> Tinted
B --> Tinted
Tinted --> Result["Neural Net sees <br/> Red Feature"]
DEFINITIONS
For the purpose of this specification, the following terms are defined:
- "Voxel": A three-dimensional element of a medical image.
- "Hounsfield Unit (HU)": A quantitative scale for describing radiodensity.
- "Windowing": The process of selecting a subset of the full dynamic range of a medical image for display or processing.
- "Deterministic Chromatic Weight" (formerly "Digital Stain"): An algorithmic modification of pixel values to enhance range or feature discriminability, mimicking the effect of a chemical histological stain but using fixed mathematical scalars.
DETAILED DESCRIPTION OF THE INVENTION
1. Hardware and System Architecture (Ref. FIG. 1)
The invention may be implemented on a computing system comprising:
- Data Ingestion Module: Configured to read volumetric medical images (e.g., in DICOM, NIfTI, or MHD formats) and corresponding segmentation masks.
- Preprocessing Engine: A CPU or GPU-based processor carrying out the pixel-level transformation logic described herein.
- Training Interface: A neural network training environment (e.g., PyTorch, TensorFlow) configured to accept the 3-channel composite images as input tensors.
- Storage Memory: Storing the generated dataset of multi-spectral 2D slices.
- Computing Device: The system is not limited to a general-purpose computer; it may be embedded in a PACS workstation, a cloud server, or a specialized radiology appliance.
2. Method of Operation (Ref. FIG. 2)
Referring to FIG. 2, the method proceeds as follows:
- Step 201: Ingestion. The system loads a 3D volume $V$ and a Mask $M$.
- Step 202: Slicing. A 2D plane is extracted from $V$ at index $z$.
- Step 202b: Cropping & Zooming. The slice is cropped to the bounding box of the kidney plus a safety margin (e.g., 20px). This ROI is then resized (zoomed) to a fixed input tensor dimension (e.g., $256 \times 256$ pixels).
- Technical Effect: This normalization effectively "zooms in" on the organ, ensuring that small tumors occupy a significant portion of the receptive field, unlike standard whole-abdomen analysis where they might occupy <1% of pixels.
- Step 203: Channel Decomposition. The cropped slice $S_z$ is copied to three buffers: $B_1, B_2, B_3$.
- Step 204: Transformation.
- $B_1$ undergoes Transformation function $F_{struct}$.
- $B_2$ undergoes Transformation function $F_{density}$.
- $B_3$ undergoes Transformation function $F_{texture}$.
- Step 205: Background Suppression. Pixels outside the kidney Region of Interest (ROI) are set to zero to remove confounding anatomical context.
- Step 206: Chromatic Shift. The mask $M_z$ is applied to the composite buffer, shifting vector weights for tumor pixels.
- Step 207: Tensor Output. The result is serialized as a .png or tensor file.
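As a minimal illustrative sketch (not the reference implementation), steps 201 through 207 can be expressed in NumPy. Three simplifications are assumed here for brevity: a nearest-neighbour resize stands in for the zoom, a single mask drives both the background suppression and the chromatic shift, and the Blue channel is a placeholder rather than a true CLAHE texture channel:

```python
import numpy as np

def nn_resize(img, size):
    """Nearest-neighbour zoom to a fixed size x size grid (step 202b)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def window(hu, level, width):
    """Linear windowing clamp to [0, 1] (step 204, F_struct / F_density)."""
    return np.clip((hu - (level - width / 2.0)) / width, 0.0, 1.0)

def make_composite(volume, mask, z, size=256, margin=20):
    """Steps 201-207 on one slice; uses a single mask for both the ROI crop
    and the chromatic shift (a simplification of the full method)."""
    s, m = volume[z], mask[z]                        # steps 201-202
    ys, xs = np.nonzero(m)
    if ys.size:                                      # step 202b: crop + margin
        y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, s.shape[0])
        x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, s.shape[1])
        s, m = s[y0:y1, x0:x1], m[y0:y1, x0:x1]
    s = nn_resize(s, size)                           # step 202b: zoom
    roi = nn_resize(m.astype(float), size) > 0.5
    r = window(s, 40, 300)                           # structural channel
    g = window(s, 100, 1000)                         # density-extremum channel
    b = r.copy()                                     # placeholder: CLAHE texture channel
    rgb = np.stack([r, g, b], axis=-1)               # step 204: merge
    rgb[~roi] = 0.0                                  # step 205: background suppression
    rgb[roi] *= np.array([1.2, 0.8, 0.8])            # step 206: chromatic shift
    return np.clip(rgb, 0.0, 1.0)                    # step 207
```

The output tensor can then be serialized or fed directly to the training interface.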
3. The Multi-Spectral Transformation Method
The core of the invention is the deterministic mapping of a single scalar input domain ($D_{in} \in \mathbb{R}$) into a multi-dimensional output vector space ($V_{out} \in \mathbb{R}^3$).
A. The Structural Channel (First Spectral Band)
The first channel (typically mapped to Red in an RGB container) is configured to provide the "Anchor View" or structural context.
- Objective: Maximize contrast in the biologically relevant range for soft organ tissue (Kidney, Liver, Spleen).
- Algorithm:
Let $I(x,y)$ be the input voxel density in HU.
Let $L_1$ be the Window Level (e.g., 40 HU).
Let $W_1$ be the Window Width (e.g., 300 HU).
The normalized output $C_1(x,y)$ is calculated as:
$$ C_1(x,y) = \text{clamp}\left( \frac{I(x,y) - (L_1 - \frac{W_1}{2})}{W_1}, 0, 1 \right) $$
- Implementation Note: This channel serves as the equivalent of a standard radiologist's workstation view, ensuring the neural network retains access to traditional morphological features (shape, size, location).
B. The Density-Extremum Channel (Second Spectral Band)
The second channel (typically mapped to Green) is configured to act as a "Range Compressor."
- Objective: To capture features that fall outside the dynamic range of the Structural Channel, specifically Adipose Tissue (Fat) and Calcified Tissue (Bone/Stones).
- Problem Solved: In the Structural Channel ($C_1$), both -100 HU (Fat) and -1000 HU (Air) are rendered at or near 0.0. Similarly, both +400 HU (Bone) and +3000 HU (Metal) are clamped to 1.0. This makes distinguishing an Angiomyolipoma (containing fat) from a Cyst (fluid) impossible if relying only on $C_1$.
- Algorithm:
Let $L_2$ be the Window Level (e.g., 100 HU).
Let $W_2$ be the Window Width (e.g., 1000 - 1500 HU).
$$ C_2(x,y) = \text{clamp}\left( \frac{I(x,y) - (L_2 - \frac{W_2}{2})}{W_2}, 0, 1 \right) $$
- Result: In this channel, Fat appears as distinct mid-dark gray (approx 0.3), while Air is black (0.0). Calcification appears as bright gray (0.8), distinct from Metal (1.0).
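The complementary behaviour of the two windowing formulas can be checked numerically. The following sketch, assuming the example parameters above ($L_1=40$, $W_1=300$; $L_2=100$, $W_2=1000$), reproduces the stated $C_2$ values of 0.3 for Fat, 0.8 for Calcification, and 1.0 for Metal:

```python
import numpy as np

def window(hu, level, width):
    """C(x, y) = clamp((I - (L - W/2)) / W, 0, 1)."""
    return np.clip((hu - (level - width / 2.0)) / width, 0.0, 1.0)

# Representative radiodensities: Air, Fat, Renal tissue, Calcification, Metal
hu = np.array([-1000.0, -100.0, 40.0, 400.0, 3000.0])

c1 = window(hu, level=40, width=300)     # structural channel (L1=40, W1=300)
c2 = window(hu, level=100, width=1000)   # density-extremum channel (L2=100, W2=1000)

# In C1, Fat is rendered near black and Bone/Metal both saturate at 1.0;
# in C2, Fat (0.3), Calcification (0.8), and Metal (1.0) remain distinct.
```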
C. The Texture-Variance Channel (Third Spectral Band)
The third channel (typically mapped to Blue) is configured to act as a "High-Pass Texture Filter."
- Objective: To extract high-frequency local variance independent of the baseline DC offset (absolute density).
- Algorithm: The system applies Contrast-Limited Adaptive Histogram Equalization (CLAHE).
- The image is tiled into regions (e.g., $8 \times 8$ grid).
- For each tile, a local histogram is computed.
- The histogram is clipped at a pre-defined limit (e.g., 0.03) to prevent noise amplification in uniform regions. This parameter is critical for renal tissue; a higher limit (>0.05) would amplify the inherent "grain" of the CT acquisition, leading to false-positive texture detection in healthy parenchyma.
- The histogram is equalized and bilinearly interpolated across tile boundaries.
- Technical Effect: This process reveals "micro-textures." A malignant tumor with necrotic centers will exhibit a "noisy" or "rough" signature in this channel, whereas a simple cyst will appear "smooth," even if their average densities in HUs are identical. This solves the prior art problem where standard U-Nets conflate density and texture feature maps.
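The CLAHE steps listed above (tiling, local histogram, clipping, redistribution, equalization) can be sketched in pure NumPy. This is a simplified, illustrative version: unlike production CLAHE it omits the bilinear interpolation across tile boundaries, and it assumes the tile grid divides the image dimensions evenly and that the input is pre-normalized to [0, 1]:

```python
import numpy as np

def clahe_simplified(img, tiles=8, clip_limit=0.03, nbins=256):
    """Per-tile clipped histogram equalization (simplified CLAHE sketch).

    Assumes img is normalized to [0, 1] and that `tiles` divides both
    image dimensions evenly. Production CLAHE additionally interpolates
    the per-tile mappings bilinearly across tile boundaries.
    """
    img = np.clip(img, 0.0, 1.0)
    out = np.zeros_like(img)
    h, w = img.shape
    th, tw = h // tiles, w // tiles
    for i in range(tiles):
        for j in range(tiles):
            sl = np.s_[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            tile = img[sl]
            # 1) local histogram, as a fraction of tile pixels per bin
            hist, _ = np.histogram(tile, bins=nbins, range=(0.0, 1.0))
            hist = hist / tile.size
            # 2) clip the histogram and redistribute the excess uniformly,
            #    limiting noise amplification in uniform regions
            excess = np.maximum(hist - clip_limit, 0.0).sum()
            hist = np.minimum(hist, clip_limit) + excess / nbins
            # 3) equalize: map each pixel through the local CDF
            cdf = np.cumsum(hist)
            cdf /= cdf[-1]
            idx = np.clip((tile * nbins).astype(int), 0, nbins - 1)
            out[sl] = cdf[idx]
    return out
```

In practice a library implementation (e.g., scikit-image's `equalize_adapthist`) would be preferred; the sketch only makes the clip-and-redistribute mechanism explicit.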
4. Mask-Guided Deterministic Chromatic Weighting ("Hard Attention")
Deep learning models typically struggle to focus on small Regions of Interest (ROIs) within large images. Rather than relying on a learned "Soft Prior" (e.g., Transformer Self-Attention) which requires heavy computation, the present invention utilizes a "Hard Prior" encoded directly into the pixel values.
Method:
The system accepts a binary segmentation mask $M(x,y)$. A diagonal weighting transform (element-wise scaling) is applied to the RGB vector $\vec{V} = [C_1, C_2, C_3]$ for all pixels where $M=1$.
The Warm-Shift Embodiment:
The system performs element-wise scalar multiplication using a fixed weighting vector $\vec{W} = [1.2, 0.8, 0.8]$:
$$ R_{new} = R_{old} \times 1.2 $$
$$ G_{new} = G_{old} \times 0.8 $$
$$ B_{new} = B_{old} \times 0.8 $$
- Effect: This tints the ROI reddish.
- Neural Impact: This deterministic weighting creates a linear separability in the color manifold, allowing the CNN to maintain focus on the ROI without an additional spatial attention head.
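The Warm-Shift embodiment reduces to a few lines of array arithmetic; `chromatic_weight` is an illustrative name introduced for this sketch, not part of the claimed system:

```python
import numpy as np

def chromatic_weight(rgb, mask, weights=(1.2, 0.8, 0.8)):
    """Warm-Shift: scale ROI pixels by the fixed weighting vector W,
    leaving background pixels untouched (Hard Attention)."""
    out = rgb.copy()
    out[mask.astype(bool)] *= np.array(weights)   # tint ROI reddish
    return np.clip(out, 0.0, 1.0)
```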
5. Alternative Embodiments
The invention is not limited to the specific parameters described above. The following variations are within the scope of the invention:
Variation A: HSV Color Space Encoding
Instead of mapping to RGB, the channels can be mapped to Hue, Saturation, and Value (HSV).
- Hue: Mapped to basic tissue density.
- Saturation: Mapped to texture variance (rougher = more saturated color).
- Value: Mapped to confidence or segmentation probability.
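A per-pixel sketch of this HSV encoding follows, using the standard-library `colorsys` converter. The function name and the [0, 1] normalization of all three inputs are assumptions of this sketch; the explicit loop is illustrative rather than optimized:

```python
import colorsys
import numpy as np

def hsv_encode(density, texture_var, confidence):
    """Variation A sketch: density -> Hue, texture variance -> Saturation,
    confidence -> Value, converted to RGB for a 3-channel container.
    All inputs are assumed pre-normalized to [0, 1]."""
    h, w = density.shape
    out = np.zeros((h, w, 3))
    for i in range(h):
        for j in range(w):
            out[i, j] = colorsys.hsv_to_rgb(density[i, j],
                                            texture_var[i, j],
                                            confidence[i, j])
    return out
```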
Variation B: 3D-to-2D Projection (The "Thick Slice" Embodiment)
Instead of a single slice, the input may be a grouping of 3 adjacent Z-axis slices ($Z-1, Z, Z+1$).
- Channel R = Slice $Z-1$
- Channel G = Slice $Z$
- Channel B = Slice $Z+1$
- Technical Effect: Encodes volumetric continuity data into a 2D image, allowing a 2D network to perceive 3D spherical features.
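The Thick Slice embodiment is a simple channel-stacking operation. Clamping the edge slices to the volume bounds is one possible boundary policy, assumed here rather than specified by the embodiment:

```python
import numpy as np

def thick_slice(volume, z):
    """Variation B: adjacent slices Z-1, Z, Z+1 packed into R, G, B.
    Edge slices are clamped to the volume bounds (an assumed policy)."""
    zm = max(z - 1, 0)
    zp = min(z + 1, volume.shape[0] - 1)
    return np.stack([volume[zm], volume[z], volume[zp]], axis=-1)
```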
Variation C: Organ-Specific Parameter Tuning
- For Lung Analysis: $C_1$ uses Lung Window (-600/1500), $C_2$ uses Mediastinal Window (50/350), $C_3$ uses Edge Detection (Sobel/Canny).
- For Brain Hemorrhage: $C_1$ uses Stroke Window (40/40), $C_2$ uses Bone Window (for fracture detection), $C_3$ uses CLAHE (for identifying subtle midline shifts).
Variation D: Visual Verification Interface (Explainability Mode)
The system may further comprise a Graphical User Interface (GUI) configured to allow a clinician to "toggle" between the composite view and individual spectral channels. This enables a human operator to verify the specific radiological features driving the AI's decision (e.g., inspecting the Blue channel to confirm chaotic texture), thereby providing a "Safety by Explainability" mechanism that is absent in traditional "Black Box" deep learning systems.
6. Training and Inference Workflow ("Curriculum Learning")
The invention utilizes a novel Two-Stage Curriculum Learning approach to maximize both feature discovery and deployment robustness.
Phase 1: Mask-Guided Feature Discovery (Proximal Training)
- Ingest labeled volumetric dataset with precise segmentation masks.
- Apply "Hard Attention" (Deterministic Chromatic Weighting) to tint tumor ROIs.
- Train the CNN (e.g., ResNet18) to convergence.
- Technical Effect: The chromatic shifting acts as a "scaffolding" or "teacher signal," forcing the network to attend to the specific spectral signatures (texture/density) of the tumor relative to the background.
Phase 2: Unsegmented Adaptation (Distal Training)
- Initialize the model with the feature extraction weights learned in Phase 1.
- Ingest a dataset of Unsegmented (Cropped-Only) slices where the Multi-Spectral Transformation (CLAHE/Wide-Window) is applied, but the Hard Attention (Masking) is disabled.
- Fine-tune the entire model (or a subset of layers) on this unmasked data.
- Technical Effect: The model learns to locate and classify the tumor based solely on the Multi-Spectral signatures (e.g., heterogeneous texture in the Blue channel) without relying on the artificial color tint.
- Industrial Advantage: This enables deployment in scenarios where precise segmentation masks are unavailable or computationally expensive to generate.
Inference (Deployment) Phase:
- Receive raw patient CT scan.
- Perform coarse bounding-box detection (high tolerance) rather than precise pixel-wise segmentation.
- Generate Multi-Spectral composite (Red=Structure, Green=Density, Blue=Texture).
- Feed tensor to the Phase 2 Adapted Classifier.
7. Experimental Validation
Recent advancements in semantic segmentation, highlighted by the 2023 Kidney Tumor Segmentation Challenge (KiTS23) [1], have spurred development in automated volumetric analysis. While these segmentation-first approaches yield high volumetric overlap (Dice scores > 0.90 for kidney), they often falter in the specific downstream task of classifying the tumor pathology needed for surgical decision making.
The efficacy of the present invention was validated using the KiTS23 dataset [1], which represents the most comprehensive open-source repository of renal tumor CT scans to date. The KiTS23 cohort exhibits a significant class imbalance. To train a robust classifier without bias, we constructed a balanced training corpus using adaptive multi-planar sampling.
From the KiTS23 cohort, we utilized 210 cases for Phase 1 (The KiTS19 subset) and expanded to 422 cases for Phase 2 (The KiTS23 extension). The adaptive sampling strategy extracts slices relative to class scarcity, drawing significantly more slices from the rare benign cases to achieve parity. This process resulted in a balanced training corpus of 8640 labeled 2D slices.
Methodology:
A comparative analysis was performed using the Multi-Spectral Method of the present invention. The model utilized a ResNet18 architecture initialized with ImageNet weights and fine-tuned on the synthetic multi-spectral images. ResNet18 was selected for its computational efficiency and robustness in handling the specific feature sets of the processed 2D slices.
Results:
The inventive method was benchmarked against a rigorous baseline of standard monochromatic CT data (Soft Tissue Window).
- Phase 1 Performance (Mask-Guided): In the initial training phase utilizing Hard Attention, the model achieved a validation accuracy of 95.6% and an F1-Score of 0.96. This established a strong baseline for feature extraction.
- Phase 2 Performance (Unsegmented Adaptation): Following the unsegmented adaptation phase, the model's performance improved to an overall classification accuracy of 97.0%. The system achieved high precision and recall across both classes on the unsegmented validation set:
- Benign: Precision 0.96, Recall 0.98.
- Malignant: Precision 0.98, Recall 0.95.
- F1-Score: 0.97.
Robustness of Unsegmented Adaptation:
A critical strength of the inventive method is its validated performance on a significantly broader patient population in Phase 2. The data utilization and balancing strategies were purposefully inverted between phases to maximize learning:
- Phase 1 (Proximal Training - Benign Oversampling): Utilized the full 210 case KiTS19 cohort (0-209). To address scarcity (17 Benign vs 193 Malignant), we employed Adaptive Slice Oversampling, repeating benign views ~60x to force the model to learn rare features without discarding any malignant data.
- Phase 2 (Distal Adaptation - Malignant Downsampling): Utilized 422 cases (including 244 New Patients). With the influx of 92 new benign patients, we shifted strategy to Malignant Downsampling (selecting ~4230 malignant slices to match the ~4410 benign slices). This created a balanced 1:1 priors regime (~8640 total slices) driven by unique benign pathology rather than repetition.
- The Delta: Phase 1 optimized for exposure (seeing rare cases often), while Phase 2 optimized for generalization (seeing many unique cases once). The transition from "synthetic repetition" to "organic diversity" resulted in the final 97.0% accuracy.
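The two inverted balancing strategies can be sketched as follows; the function names and the seeded random subsampling policy are illustrative assumptions, not the exact procedure used in the experiments:

```python
import numpy as np

def balance_phase1(benign, malignant):
    """Phase 1 (Proximal): Adaptive Slice Oversampling - repeat the rare
    benign slices until they match the malignant count; no malignant
    data is discarded."""
    reps = -(-len(malignant) // len(benign))      # ceiling division
    benign_over = (benign * reps)[:len(malignant)]
    return benign_over, malignant

def balance_phase2(benign, malignant, seed=0):
    """Phase 2 (Distal): Malignant Downsampling - randomly subsample the
    malignant slices to match the larger pool of unique benign slices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(malignant), size=len(benign), replace=False)
    return benign, [malignant[i] for i in idx]
```

With 17 benign cases against roughly a thousand malignant slices, `balance_phase1` repeats the benign views about 60x, matching the ratio described above.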
Held-out Test Set Robustness:
To verify the system's generalization to completely new patients, we evaluated the model on the remaining held-out cases of the KiTS21/23 dataset (176 cases). Analysis confirmed this test set was 100% Malignant, reflecting the natural prevalence of renal cell carcinoma. On this independent set, the model achieved 100% Sensitivity, correctly identifying all 176 malignant cases. While specificity on new benign cases could not be assessed due to their absence in the hold-out set, the perfect sensitivity verifies that the model does not "forget" malignancy features when applied to unseen patients.
- Architectural Efficiency: The specific feature engineering enables the use of lightweight, high-inference-speed architectures (e.g., ResNet18, approx. 11M parameters) while matching the performance of significantly larger or more complex state-of-the-art models (e.g., ADP-CNN ensembles or DenseNets). This reduction in parameter complexity facilitates deployment on edge medical devices without specialized GPU hardware.
- Edge Viability (Latency vs Gain): While the pre-processing step introduces a nominal latency (measured at 33ms per slice on CPU), it enables the use of ResNet18 instead of ResNet50/VGG16. The inference speed gain from switching to the smaller backbone is approx. 85ms per slice. Thus, the net system latency is reduced by ~52ms per slice, validating suitability for edge-deployment in scanner consoles.
- Orthogonal Feature Separation: A key advantage over prior art is feature orthogonality. Standard U-Nets learning from raw Hounsfield units often entangle "density" and "texture" in the early feature maps. By explicitly separating these into the Green (Density) and Blue (Texture-CLAHE) channels, we enforce valid feature extraction, preventing the model from learning spurious correlations.
Comparative Analysis:
The following table illustrates the performance of the Inventive Method against state-of-the-art approaches. Note that direct comparison is limited by differences in reported metrics (Accuracy vs. AUC) and evaluation levels (Slice-Level vs. Patient-Level).
| Model / Paper | Task Type | Parameters (Approx.) | Performance Metric | Reference |
|---|---|---|---|---|
| Inventive Method (Phase 2) | Unsegmented Classification | ~11.7 M | 97.0% Accuracy | [Experimental] |
| Inventive Method (Phase 1) | Masked Classification | ~11.7 M | 95.6% Accuracy | [Experimental] |
| ADP-CNN-TTAO | KiTS23 Classification | High (Ensemble) | 99.3% Accuracy | ResearchGate 2023 |
| DenseAUXNet-201 | KiTS23 Classification | ~20 M (Heavy) | 98.0% Accuracy | ResearchGate 2023 |
| KiTS23 ML Baseline | Radiomics + XGB | Low | 72% AUC (Low Recall) | MDPI 2023 [4] |
| ColorNephroNet | Slice Classification | ~138 M | 86.0% Accuracy | Access 2020 [3] |
- Analysis of Prior Art:
- Complexity vs. Performance: Recent KiTS23 models like ADP-CNN-TTAO achieve marginally higher accuracy (99.3%) but require complex, heavy ensembles that are difficult to deploy on edge devices. The present invention achieves competitive accuracy (97%) with a standard, lightweight ResNet18.
- Recall Imbalance: Other baselines (e.g., KiTS23 ML Baseline) often trade Recall for Precision (High Precision, Low Recall ~68%). Our method maintains high recall (>95%) for Malignant cases, which is critical for safety.
- Data Efficiency: Pure Deep Learning approaches ("ColorNephroNet") struggle on the renal tumor datasets, achieving only 86-88% accuracy despite massive parameter counts (44M-138M). This confirms the "Data Scarcity" problem.
- Feature Engineering Validation: Radiomics approaches built on explicit hand-crafted features have reported accuracies as high as 97.4% in this domain [1]. This validates the premise that explicit feature engineering is superior to implicit representation learning for this task.
- The Inventive Step (Spectral vs. Spatial): While Radiomics relies on spatial texture statistics (e.g., GLCM) computed in the image domain, the present invention introduces Spectral feature engineering. By decomposing the scalar field into orthogonal density and texture channels (Wide-Window, CLAHE), it achieves comparable high-tier performance (95.6-97.0%) suitable for Deep Learning pipelines, without the manual feature selection complexity of Radiomics or the pixel-level annotation cost of Dual-Task networks [2].
- Methodological Note: The Inventive Method results are reported at Slice-Level (2D) accuracy, which is inherently more challenging than Patient-Level (3D) classification, where predictions can be averaged across hundreds of slices ("Voting") to reduce error. Achieving 97.0% accuracy on individual slices demonstrates a higher degree of featurization robustness compared to volumetric approaches that rely on aggregation to achieve similar metrics [2].
References:
- [1] Heller et al., "The 2023 Kidney Tumor Segmentation Challenge (KiTS23)," arXiv/MICCAI, 2023.
- [2] Gong et al., "Segmentation and classification of renal tumors based on convolutional neural network," Biomedical Signal Processing and Control, 2021.
- [3] Obuchowski et al., "ColorNephroNet: Kidney tumor malignancy prediction using medical image colorization," IEEE Access, 2020.
- [4] Bolocan et al., "Convolutional Neural Network Model for Segmentation and Classification," Diagnostics, 2023.
Conclusion:
While achieving state-of-the-art accuracy equivalent to standard methods, the Multi-Spectral Feature Engineering method provides a significant technical advancement in Interpretability and Training Efficiency. By actively encoding radiological heuristics into the data representation, the specific technical improvement lies in bridging the semantic gap between "Black Box" AI predictions and human-verifiable radiological features.
CLAIMS
1. A system for facilitating automated classification of radiological data, comprising:
a processor configured to access a volumetric dataset;
a transformation module configured to extract a scalar field slice from said dataset, identify a region of interest, and spatially normalize (zoom) said region to a fixed pixel dimension;
a first windowing unit applying a first transfer function defined by a first level and width parameters optimized for soft tissue contrast;
a second windowing unit applying a second transfer function defined by a width parameter sufficiently broad to encompass the full radiodensity dynamic range of the dataset (e.g., -1000 HU to +3000 HU) without signal clipping, thereby retaining compositional data for air, fat, and bone;
a texture extraction unit configured to apply a Contrast Limited Adaptive Histogram Equalization (CLAHE) operator with a clip limit optimized to separate tissue micro-texture from acquisition noise; and
a compositing unit configured to merge outputs of the first, second, and third units into a single multi-channel data structure.
2. The system of Claim 1, further comprising a deterministic chromatic weighting module configured to identify voxels corresponding to a region of interest and apply a fixed scalar multiplication vector (e.g., $\vec{W} = [1.2, 0.8, 0.8]$) to said voxels, thereby linearly shifting the chromatic manifold of the region of interest relative to the background.
3. The system of Claim 1, wherein said multi-channel data structure is an RGB image, and wherein the outputs of the first, second, and third units are mapped to the Red, Green, and Blue channels, respectively.
4. A method for training a neural network to classify neoplasms, comprising: (a) generating a dataset of synthetic multi-spectral images according to the system of Claim 1; (b) initializing a convolutional neural network with weights pre-trained on natural optical images; and (c) fine-tuning said weights using said synthetic dataset, thereby enabling transfer learning from optical to radiological domains.
5. The system of Claim 1, further comprising a visualization interface configured to: (a) decompose the composite tensor into its constituent spectral channels; and (b) display said channels individually or as toggleable overlays alongside the original radiological slice, thereby enabling visual verification of the specific morphological features (texture, density, borders) contributing to the automated classification.
6. The system of Claim 1, wherein the compositing unit maps the outputs to an alternative color space such as YCbCr, LAB, or HSV to optimize feature decorrelation.
7. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause a computing device to perform the method of: extracting a slice from a volumetric dataset; decomposing said slice into structural, density-extremum, and texture-variance spectral bands; and compositing said bands into a tensor compatible with pre-trained optical neural networks.