Evaluating the Feasibility and Validity of Cross-Domain Deep Learning Model Comparisons: A Case Study of YOLO and RNN

  • Chairuddin, STMIK IM
  • Yudhi Widya Arthana Rustam, STMIK IM
  • Muhammad Haniif Muzaki, STMIK IM

Abstract

The rapid development of deep learning has encouraged the use of diverse neural network architectures across a wide range of computational tasks. However, models with different characteristics and objectives are increasingly compared without a clear methodological framework, which can lead to scientific misconceptions. This study analyzes the validity of a direct comparison between Recurrent Neural Networks (RNN) and You Only Look Once (YOLO). A mixed-method approach was employed, combining a conceptual analysis of fundamental differences (model objectives, data types, output spaces, and evaluation metrics) with limited empirical validation of each architecture within its own task domain. The results indicate that RNN and YOLO operate in entirely different representation spaces: RNN is designed to model temporal dependencies in sequential data, whereas YOLO processes spatial data for object detection. A direct comparison between the two architectures is therefore methodologically invalid, because image data lacks a meaningful temporal dimension for an RNN to process, and sequential data lacks the spatial annotations that YOLO requires as ground truth. Evaluation of a deep learning model must be aligned with its original task domain to avoid biased and misleading conclusions.
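The point about evaluation metrics can be illustrated with a minimal sketch. The abstract does not name the specific metrics used in the study, so the choices here are assumptions: Intersection over Union (IoU), the overlap measure underlying common object detection scores, stands in for the YOLO side, and root mean squared error (RMSE) stands in for a sequence prediction task on the RNN side. The function names and sample values are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rmse(predicted, actual):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical detection result: predicted box vs. annotated ground-truth box.
detection_score = iou((0, 0, 10, 10), (5, 0, 15, 10))

# Hypothetical sequence result: predicted values vs. observed values.
sequence_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

print(f"IoU (unitless overlap ratio in [0, 1], higher is better): {detection_score:.3f}")
print(f"RMSE (error in the units of the series, lower is better): {sequence_error:.3f}")
```

The two numbers differ in range, unit, and direction of improvement, and each needs a kind of ground truth the other task does not have (annotated boxes versus observed future values), so ranking one model above the other on these scores has no defined meaning.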

Published
2026-05-31