Refining Vision-Language Models for Lithography Defect Detection


The semiconductor industry continues to push the boundaries of chip miniaturization, making defect detection in lithography processes increasingly critical. By 2026, advanced nodes (sub-3nm) will require inspection systems that can resolve defects only a few nanometers across while maintaining throughput. Vision-language models (VLMs) are emerging as a promising solution: they combine computer vision with natural language understanding to improve defect classification and root cause analysis.


The Challenge in Lithography Inspection


Traditional rule-based and purely visual inspection systems struggle with the complexity of modern lithography defects. These include:

  • Bridge defects and pinch points in dense metal layers
  • Line edge roughness (LER) variations
  • Pattern collapse in high-aspect-ratio structures
  • Stochastic defects from extreme ultraviolet (EUV) lithography

Current machine vision approaches often require extensive labeled datasets and fail to generalize to novel defect types.
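
Of the defect types listed above, line edge roughness is the most directly quantifiable: it is conventionally reported as three standard deviations (3σ) of an edge's deviation from its mean position. The minimal sketch below assumes the edge has already been extracted from an SEM image as one x-position per scan row; the function name and the straight-line detrending step are illustrative choices, not a standard metrology recipe (production tools also correct for SEM image noise, which is not modeled here).

```python
import random
import statistics


def line_edge_roughness(edge_positions_nm: list[float]) -> float:
    """Return LER as 3 sigma of an edge's deviation from a best-fit straight line (nm).

    edge_positions_nm: detected edge x-position for each SEM scan row
    (hypothetical input format; edge extraction itself is not shown).
    """
    n = len(edge_positions_nm)
    mean_row = (n - 1) / 2
    mean_x = statistics.fmean(edge_positions_nm)
    # Least-squares line fit, so image tilt or offset is not counted as roughness
    sxx = sum((r - mean_row) ** 2 for r in range(n))
    sxy = sum((r - mean_row) * (x - mean_x) for r, x in enumerate(edge_positions_nm))
    slope = sxy / sxx
    intercept = mean_x - slope * mean_row
    residuals = [x - (slope * r + intercept) for r, x in enumerate(edge_positions_nm)]
    # Sample standard deviation of the residuals, reported with the usual 3-sigma convention
    return 3.0 * statistics.stdev(residuals)


# Usage: a slightly tilted synthetic edge with 0.5 nm Gaussian roughness (LER should come out near 1.5 nm)
rng = random.Random(0)
edge = [100.0 + 0.05 * r + rng.gauss(0.0, 0.5) for r in range(512)]
print(f"LER = {line_edge_roughness(edge):.2f} nm")
```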


How Vision-Language Models Improve Detection


VLMs bridge the gap between image analysis and human-like reasoning. Key advancements include:


1. Multi-modal Feature Fusion

By jointly encoding SEM/TEM images with textual defect descriptions (e.g., “bridging between metal lines at pitch 32nm with 2nm CD variation”), VLMs can learn rich representations that separate subtle defect signatures from process variation noise.
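
The article does not specify how images and text are jointly encoded. One widely used recipe, introduced with CLIP, trains an image encoder and a text encoder so that matched image-caption pairs score higher than mismatched ones under a symmetric contrastive (InfoNCE) loss. The sketch below shows only that loss, in plain Python; the random vectors stand in for encoder outputs, and the 64-dimensional embeddings and 0.07 temperature are illustrative assumptions.

```python
import math
import random


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row):
    # Numerically stable log-softmax: subtract the max before exponentiating
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, text) pairs.

    Row i of image_embs and row i of text_embs describe the same defect;
    every other pairing in the batch is treated as a negative.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by 1/temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: each image should pick out its own caption (row-wise softmax)
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: each caption should pick out its own image (column-wise softmax)
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Usage with stand-in embeddings (hypothetical; a real system would use encoder outputs)
rng = random.Random(0)
images = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(8)]
matched_text = [[x + rng.gauss(0, 0.01) for x in v] for v in images]
unrelated_text = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(8)]
print(clip_contrastive_loss(images, matched_text))    # near zero: pairs are aligned
print(clip_contrastive_loss(images, unrelated_text))  # much larger: pairs are unrelated
```

Training drives the encoders toward the first situation, so that a defect image and its description land close together in the shared embedding space.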


2. Zero-shot and Few-shot Learning

In 2026 production environments, retraining models for every new defect type is impractical. CLIP- and BLIP-based architectures enable zero-shot detection of previously unseen defects: rather than learning each class from labeled examples, they compare an image against text descriptions of candidate defect types, drawing on knowledge acquired during large-scale pre-training.
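
In CLIP-style zero-shot classification, each candidate class is written as a text prompt, the prompts and the image are embedded, and the class whose prompt is most similar to the image wins. The sketch below shows that scoring step only; the embeddings are random stand-ins (in practice they would come from the model's text and image encoders), and the labels and 0.01 temperature are illustrative assumptions.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(image_emb, prompt_embs, labels, temperature=0.01):
    """Pick the defect label whose text-prompt embedding best matches the image."""
    logits = [cosine(image_emb, p) / temperature for p in prompt_embs]
    # Softmax over classes turns similarities into a probability per label
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


labels = ["bridge", "pinch", "pattern collapse", "no defect"]
# Hypothetical prompt embeddings; a real system would embed prompts such as
# "an SEM image of collapsed resist lines" with the VLM's text encoder.
rng = random.Random(1)
prompt_embs = [[rng.gauss(0, 1) for _ in range(512)] for _ in labels]
# Stand-in image embedding that lies close to the "pattern collapse" prompt
image_emb = [x + rng.gauss(0, 0.3) for x in prompt_embs[2]]
label, probs = zero_shot_classify(image_emb, prompt_embs, labels)
print(label)  # pattern collapse
```

Adding a new defect type in this scheme only means adding a label and a prompt; no model weights change.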


3. Interactive Defect Analysis

Engineers can query models via natural language (“show all stitching defects in block B of the DRAM array”), enabling rapid root cause analysis without manual image sifting.
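
The article does not say how such queries are executed. One plausible minimal design, sketched below purely as an assumption, splits the query into a structured filter (the block name) and a free-text part; the free-text part is embedded with the VLM's text encoder and matched against stored image embeddings by cosine similarity. Parsing the sentence into those two parts (for example with an LLM) is not shown, and `DefectRecord` and `search_defects` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class DefectRecord:
    defect_id: str
    block: str              # layout block where the defect was found, e.g. "B"
    embedding: list[float]  # image embedding from a (hypothetical) VLM image encoder


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_defects(query_emb, records, block=None, top_k=5, min_score=0.0):
    """Rank stored defects by similarity to a text-query embedding, after a metadata filter."""
    hits = []
    for rec in records:
        if block is not None and rec.block != block:
            continue  # structured part of the query, e.g. "in block B"
        score = _cosine(query_emb, rec.embedding)
        if score >= min_score:
            hits.append((rec.defect_id, score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:top_k]


# Usage with toy 3-dimensional embeddings (real ones would have hundreds of dimensions)
query = [1.0, 0.0, 0.0]  # stands in for the embedding of "stitching defect"
records = [
    DefectRecord("d1", "B", [0.9, 0.1, 0.0]),
    DefectRecord("d2", "A", [1.0, 0.0, 0.0]),  # perfect match, but wrong block
    DefectRecord("d3", "B", [0.0, 1.0, 0.0]),  # right block, unrelated appearance
]
print(search_defects(query, records, block="B", min_score=0.5))
```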


4. Knowledge Distillation from Expert Annotations

Hybrid approaches combine synthetic defect generation with actual fab data. VLMs trained on synthetic data and transferred to real fab images show up to 40% improvement in recall for rare defect classes (according to 2025 studies by imec and MIT).
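
The article gives no detail on how synthetic defects are generated. The sketch below shows the simplest form of the idea under stated assumptions: render an idealized line/space pattern as a binary image and paint a bridge across one space, returning a caption alongside the image so the pair can serve as a VLM training example. Real pipelines would more likely start from CAD layout and simulate resist and SEM imaging effects; the function names, pixel units, and caption format here are hypothetical.

```python
def line_space_pattern(height, width, pitch, line_width):
    """Binary top-down image of vertical lines: 1 = line, 0 = space (units are pixels)."""
    row = [1 if (col % pitch) < line_width else 0 for col in range(width)]
    return [row[:] for _ in range(height)]


def insert_bridge(image, row, col_start, col_end, thickness=2):
    """Return a copy of image with a horizontal bridge drawn across a space,
    plus a text caption describing the inserted defect."""
    out = [r[:] for r in image]  # copy so the clean reference image is left untouched
    for r in range(row, row + thickness):
        for c in range(col_start, col_end):
            out[r][c] = 1
    caption = (f"bridging defect between adjacent lines, rows {row}-{row + thickness - 1}, "
               f"columns {col_start}-{col_end - 1}")
    return out, caption


# Usage: pitch 8 px, line width 4 px, so columns 4-7 are the first space
clean = line_space_pattern(height=32, width=32, pitch=8, line_width=4)
defect, caption = insert_bridge(clean, row=10, col_start=4, col_end=8)
print(caption)
for r in defect[8:14]:
    print("".join("#" if v else "." for v in r[:16]))
```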


Implementation Strategies for 2026 Fabs


| Component | Recommendation |
|-----------|----------------|
| Model Architecture | Transformer-based vision and text encoders fused by cross-attention (e.g., Flamingo-like variants) |
| Training Data | Mix of real SEM images, synthetic defect inserts, and CAD layout overlays |
| Inference Hardware | Edge-optimized NPUs delivering 4-8 TOPS/W for inline inspection |
| Defect Taxonomy | Structured ontology covering structural, optical, and material defects |
| Retrieval-Augmented Generation (RAG) | Integrate process control history for context-aware detection |
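
The table names cross-attention as the fusion mechanism. A minimal single-head version, sketched below in plain Python with random stand-in weights, lets each text token (for example, a word of a defect description) compute a weighted average over image-patch features from an SEM image. Flamingo-style models additionally pass this output through a learned tanh gate before adding it to the language model's residual stream; the gate, multiple heads, and layer normalization are omitted here, and all dimensions are illustrative assumptions.

```python
import math
import random


def matmul(a, b):
    # result[i][j] = sum_k a[i][k] * b[k][j]; zip(*b) iterates over the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_tokens, image_patches, w_q, w_k, w_v):
    """Single-head cross-attention: each text token attends over all image patches.

    text_tokens: T x d_text, image_patches: P x d_img, w_*: projections to dimension d.
    Returns (T x d fused features, T x P attention weights).
    """
    q = matmul(text_tokens, w_q)    # queries come from the text side
    k = matmul(image_patches, w_k)  # keys and values come from the image side
    v = matmul(image_patches, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose: d x P
    scores = matmul(q, k_t)               # T x P
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Usage: 5 text tokens attending over a 7 x 7 grid of image patches (random stand-ins)
rng = random.Random(0)
rand = lambda rows, cols: [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
fused, attn = cross_attention(rand(5, 32), rand(49, 48), rand(32, 16), rand(48, 16), rand(48, 16))
print(len(fused), len(fused[0]), len(attn[0]))
```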


Real-World Results


Pilot deployments at leading foundries (2025–2026) demonstrate:

  • 15–25% higher capture rate for killer defects compared with conventional deep-learning (DL) methods
  • 70% reduction in false positives on critical layers via language-guided filtering
  • Adaptation to new defect types in under 5 minutes using prompt tuning
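
The results above do not say whether "15–25% higher capture rate" is a relative improvement or a gain in percentage points, and the two readings differ noticeably. The short sketch below defines capture rate as recall over real defects and computes both readings, plus the false-positive reduction, from made-up counts (160 versus 190 of 200 killer defects, 1,000 versus 300 false positives) that are purely illustrative and not taken from the pilots.

```python
def capture_rate(detected_real_defects: int, total_real_defects: int) -> float:
    """Fraction of real (e.g. killer) defects the inspection flagged, i.e. recall."""
    return detected_real_defects / total_real_defects


def relative_change(before: float, after: float) -> float:
    """Signed relative change; negative values are reductions."""
    return (after - before) / before


# Hypothetical counts for illustration only
baseline = capture_rate(160, 200)  # conventional DL classifier
vlm = capture_rate(190, 200)       # VLM-based inspection
print(f"absolute gain: {100 * (vlm - baseline):.1f} points")             # 15.0 points
print(f"relative gain: {100 * relative_change(baseline, vlm):.2f}%")     # 18.75%
print(f"false-positive change: {100 * relative_change(1000, 300):.0f}%")  # -70%
```

The same pilot could therefore be reported as a 15-point or an 18.75% improvement, which is worth pinning down when comparing vendor claims.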

Looking Ahead


By 2027, we anticipate fully closed-loop VLM systems that not only detect defects but also suggest lithography process adjustments (e.g., dose or focus corrections). The combination of large language models (LLMs) with physics-aware vision encoders will redefine defect metrology, making zero-defect manufacturing increasingly attainable.
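
The article does not describe how such a system would turn detections into dose or focus corrections. For context, a long-established building block of lithography process control is the run-to-run controller with an exponentially weighted moving average (EWMA) estimate of process drift. The sketch below applies it to dose alone, with a hypothetical linear CD-versus-dose model and made-up numbers; how a VLM would feed such a controller is not specified in the source and is not modeled here.

```python
class EwmaDoseController:
    """EWMA run-to-run controller for a hypothetical linear model: CD = offset + gain * dose.

    gain (nm per mJ/cm^2) is assumed known from a dose-CD calibration; offset drifts
    and is re-estimated after every lot from the measured CD.
    """

    def __init__(self, target_cd_nm, gain, initial_offset_nm, weight=0.3):
        self.target_cd_nm = target_cd_nm
        self.gain = gain
        self.offset_nm = initial_offset_nm
        self.weight = weight  # 0 < weight <= 1; larger reacts faster but passes more noise

    def recommend_dose(self):
        # Invert the model so the predicted CD equals the target
        return (self.target_cd_nm - self.offset_nm) / self.gain

    def update(self, dose_used, measured_cd_nm):
        # Offset implied by this lot's measurement, blended into the running estimate
        observed_offset = measured_cd_nm - self.gain * dose_used
        self.offset_nm = self.weight * observed_offset + (1 - self.weight) * self.offset_nm


# Usage: the simulated tool's true offset (70 nm) differs from the initial estimate (68 nm)
ctrl = EwmaDoseController(target_cd_nm=32.0, gain=-1.2, initial_offset_nm=68.0)
true_offset_nm = 70.0
for lot in range(10):
    dose = ctrl.recommend_dose()
    cd = true_offset_nm + ctrl.gain * dose  # noise-free stand-in for a CD-SEM measurement
    ctrl.update(dose, cd)
    print(f"lot {lot}: dose {dose:.2f} mJ/cm^2, CD {cd:.2f} nm")
```

Each lot shrinks the remaining CD error by a factor of (1 - weight) in this noise-free simulation, so the measured CD converges toward the 32 nm target.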


This article is based on papers presented at SPIE Advanced Lithography 2026 and industry briefings from ASML, Applied Materials, and KLA.

via Semiconductor Engineering
