A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem

Overview


This paper introduces a novel Transformer-based scheduling policy for the Open Shop Scheduling Problem (OSSP), a computationally challenging combinatorial optimization problem common in industrial and service environments. As OSSP instances grow in size (number of jobs and machines), exact methods become intractable, and classical heuristics often require extensive tuning. The proposed approach uses an encoder-decoder Transformer architecture with multi-head attention. It is trained exclusively on small-scale Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10) and takes only the processing-time matrix as input.


Methodology


On the training benchmarks, the model generates feasible schedules whose makespans are typically within 15–30% of the best-known values. To evaluate scalability, a critical concern for real-world deployment in 2026 and beyond, the trained policy is applied without any retraining to randomly generated instances ranging from 40x40 to 100x100 (jobs x machines).
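
To make the evaluation setup concrete, here is a minimal Python sketch; it is not the authors' code. It assumes integer processing times drawn uniformly from 1 to 99 (the summary does not state the distribution), and it decodes an ordering of (job, machine) operations into a schedule by starting each operation as soon as both its job and its machine are free. The names random_instance and makespan_of_order are hypothetical.

```python
import random


def random_instance(n_jobs, n_machines, low=1, high=99, seed=0):
    """Return a processing-time matrix p, where p[j][m] is the time job j needs on machine m.

    The uniform integer range [low, high] is an assumption made for illustration.
    """
    rng = random.Random(seed)
    return [[rng.randint(low, high) for _ in range(n_machines)]
            for _ in range(n_jobs)]


def makespan_of_order(p, order):
    """Decode an ordering of (job, machine) operations into a makespan.

    In an open shop each job visits every machine once, in any order.
    A schedule is feasible if no job and no machine handles two operations
    at the same time, which this greedy decoding guarantees.
    """
    n_jobs, n_machines = len(p), len(p[0])
    expected = [(j, m) for j in range(n_jobs) for m in range(n_machines)]
    if sorted(order) != expected:
        raise ValueError("order must contain every (job, machine) pair exactly once")

    job_free = [0] * n_jobs        # time at which each job is next available
    mach_free = [0] * n_machines   # time at which each machine is next available
    for j, m in order:
        start = max(job_free[j], mach_free[m])
        job_free[j] = mach_free[m] = start + p[j][m]
    return max(mach_free)
```

Under these assumptions, any policy that outputs a complete ordering of operations, learned or heuristic, can be scored with the same decoder. For example, random_instance(40, 40) yields a matrix of the smallest large-scale size mentioned above.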


Results


The Transformer is compared against classical dispatching heuristics: Shortest Processing Time (SPT), Longest Processing Time (LPT), Most Work Remaining (MWKR), and Earliest Start Time (EST). The key findings include:


  • Average gaps of 12.89–15.12% relative to a standard lower bound across large instances.
  • Performance competitive with EST, typically within a modest margin of its makespan.
  • Substantial outperformance of SPT and LPT across all large-scale tests.
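
The summary does not say which lower bound is used. A common choice for open shop is the larger of the maximum job load and the maximum machine load, and the Python sketch below assumes that bound. It also shows simplified versions of three of the dispatching rules named above. Rule definitions vary across the literature, so these are illustrative interpretations, not the baselines from the paper. The names lower_bound, dispatch, and gap_percent are hypothetical.

```python
def lower_bound(p):
    """Assumed bound: no schedule can finish before the busiest job or machine does."""
    max_job_load = max(sum(row) for row in p)
    max_machine_load = max(sum(col) for col in zip(*p))
    return max(max_job_load, max_machine_load)


def dispatch(p, rule="SPT"):
    """Greedy list scheduler: repeatedly pick one unscheduled operation by a priority rule.

    SPT picks the shortest operation, LPT the longest, and EST the operation
    that can start earliest. Ties are broken by (job, machine) index.
    """
    n_jobs, n_machines = len(p), len(p[0])
    job_free = [0] * n_jobs
    mach_free = [0] * n_machines

    def earliest_start(op):
        j, m = op
        return max(job_free[j], mach_free[m])

    priorities = {
        "SPT": lambda op: (p[op[0]][op[1]], op),
        "LPT": lambda op: (-p[op[0]][op[1]], op),
        "EST": lambda op: (earliest_start(op), op),
    }
    if rule not in priorities:
        raise ValueError(f"unknown rule: {rule}")

    remaining = {(j, m) for j in range(n_jobs) for m in range(n_machines)}
    while remaining:
        j, m = min(remaining, key=priorities[rule])
        remaining.remove((j, m))
        start = earliest_start((j, m))
        job_free[j] = mach_free[m] = start + p[j][m]
    return max(mach_free)


def gap_percent(makespan, bound):
    """Relative gap to the lower bound, in percent."""
    return 100.0 * (makespan - bound) / bound
```

With these definitions, a makespan of 115 against a bound of 100 gives a gap of 15%, which is the kind of figure reported in the first bullet. Because the bound is not always attainable, a gap measured this way can overstate the distance to the true optimum.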

These results demonstrate that a Transformer policy trained on small OSSP instances can generalize effectively to substantially larger problems, offering a feature-light, learning-based alternative to classical dispatching rules. This approach is particularly relevant given the growing demand for scalable, automated scheduling solutions in smart manufacturing and logistics as of 2026.


Conclusion


The study highlights the potential of deep reinforcement learning combined with Transformer architectures for combinatorial scheduling, paving the way for further research into zero-shot generalization and integration with real-time decision systems.


Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


Cite as: arXiv:2606.13682 [cs.AI]


DOI: https://doi.org/10.48550/arXiv.2606.13682

via ArXiv AI
