Article Overview
arXiv:2606.11207 (cs.AI) — Submitted on 23 Apr 2026
Authors: Liu Hung Ming
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2606.11207 [cs.AI]
Abstract
We present SemantiClean, a modular framework that extracts structured semantic signals from e-commerce session data and drives pluggable inference targets (including purchase intent, customer segmentation, and product affinity) through a shared element library. Unlike conventional end-to-end predictors that optimize solely for accuracy, SemantiClean prioritizes auditability, structural governance, and sigma=0 reproducibility, meaning zero variation across repeated runs. The framework explicitly trades marginal predictive gains for element-level transparency and defensible decision trails.
Built upon the Online Shoppers Purchasing Intention (OSPI) dataset, the framework organizes twenty-four behavioral elements into a four-layer architecture: Functional, Interaction, Systemic, and Contextual. Three anti-inflation mechanisms enforce signal quality:
- RedundancyGroup contribution caps
- TieredPenaltyCalculator bias penalties
- AdaptiveConstraintMode cold-start protection
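As a rough illustration of the first mechanism, a contribution cap could limit how much a group of correlated elements adds to a score in total. The sketch below is an assumption, not the SemantiClean implementation: the function name capped_score, the cap parameter, the contribution values, and the grouping of the OSPI columns BounceRates and ExitRates into one redundancy group are all hypothetical.

```python
from collections import defaultdict


def capped_score(contributions, groups, cap):
    """Sum element contributions, limiting each redundancy group's total to `cap`.

    contributions: element name -> non-negative contribution to the score
    groups: element name -> redundancy group name (elements not listed are ungrouped)
    cap: largest total any single group may contribute (hypothetical parameter)
    """
    group_totals = defaultdict(float)
    ungrouped_total = 0.0
    for name, value in contributions.items():
        group = groups.get(name)
        if group is None:
            ungrouped_total += value
        else:
            group_totals[group] += value
    # Each group contributes at most `cap`, however many of its members fire.
    return ungrouped_total + sum(min(total, cap) for total in group_totals.values())


# Hypothetical contributions; BounceRates and ExitRates are treated as redundant.
contributions = {"BounceRates": 0.5, "ExitRates": 0.75, "PageValues": 0.25}
groups = {"BounceRates": "exit_behavior", "ExitRates": "exit_behavior"}

score = capped_score(contributions, groups, cap=1.0)
print(score)
```

Without the cap the three elements would sum to 1.5. With it, the two correlated elements together contribute at most 1.0, so one underlying behavior measured twice cannot inflate the result.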
This report introduces the LLM-Integrated Semantic Inference Engine, a fully implemented two-phase, LLM-driven inference architecture that uses complete element metadata at inference time. All quantitative results reported herein are produced by this engine. Deterministic engine outputs remain fully reproducible (sigma=0). LLM-dependent results for targets E8 and E10, by contrast, are subject to controlled output variability even when the provider, model, and temperature settings are held fixed. The gender inference target remains non-functional in the current implementation and is excluded from all quantitative results.
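One way to read the sigma=0 claim is as a zero standard deviation across repeated runs on the same input. The following minimal sketch checks that property for a stand-in scoring function. It assumes that interpretation, and run_sigma, deterministic_score, and the weights are hypothetical names and values rather than parts of the framework.

```python
import statistics


def run_sigma(score_fn, session, runs=5):
    """Population standard deviation of score_fn(session) over repeated calls."""
    outputs = [score_fn(session) for _ in range(runs)]
    return statistics.pstdev(outputs)


def deterministic_score(session):
    # Hypothetical stand-in for a deterministic engine path: a fixed weighted sum.
    return 0.5 * session["PageValues"] + 0.25 * session["ProductRelated"]


session = {"PageValues": 2.0, "ProductRelated": 4.0}
sigma = run_sigma(deterministic_score, session)
print(sigma)
```

The same helper applied to an LLM-backed target would be expected to return a small nonzero value, which is the kind of variability the report describes for E8 and E10.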
Significance for 2026
As e-commerce platforms rely more heavily on AI for user understanding, demand for explainable and auditable AI systems continues to grow. SemantiClean addresses the regulatory and ethical need for transparent decision-making in behavioral targeting. By offering a structured, reproducibility-focused alternative to black-box predictors, the framework aligns with emerging 2026 standards for responsible AI and governance-ready machine learning in commercial environments.
