Evaluating the Feasibility and Validity of Cross-Domain Deep Learning Model Comparisons: A Case Study of YOLO and RNN
Abstract
The rapid development of deep learning has encouraged the use of a wide range of neural network architectures for diverse computational tasks. However, there is a growing tendency to compare the performance of models with different characteristics and objectives without a clear methodological framework, which can lead to scientific misconceptions. This study analyzes whether a direct comparison between Recurrent Neural Networks (RNN) and You Only Look Once (YOLO) is valid. A mixed-method approach was employed, combining a conceptual analysis of their fundamental differences (model objectives, data types, output spaces, and evaluation metrics) with limited empirical evidence gathered within each architecture's own task domain. The results indicate that RNN and YOLO operate in entirely different representation spaces: RNN is designed to model temporal dependencies in sequential data, whereas YOLO processes spatial data to detect objects. The study therefore concludes that a direct comparison between the two architectures is methodologically invalid, because image data has no meaningful temporal dimension for an RNN to model, and sequential data lacks the spatial annotations that YOLO requires as ground truth. Deep learning models should always be evaluated within the task domain they were designed for, to avoid biased and misleading conclusions.
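As an illustration of the mismatch in data types, output spaces, and metrics described above, the following Python sketch sets a scalar Elman-style recurrence and an RMSE score beside the Intersection over Union (IoU) computation on which detection metrics, such as those used to evaluate YOLO, are built. It is a simplified sketch under stated assumptions (scalar hidden state, hand-picked weights, single boxes), not the study's experimental code; all function names and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# ---- Sequential side (RNN): input is an ordered sequence -------------------

def elman_rnn(xs: Sequence[float], w_x: float, w_h: float, b: float) -> List[float]:
    """Scalar Elman-style recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Each hidden state depends on the previous one; this carried state is the
    temporal dependency an RNN is built to model. Weights are illustrative.
    """
    h = 0.0  # initial hidden state h_0
    states: List[float] = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error: one common metric for sequence forecasts."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# ---- Spatial side (YOLO-style detection): input is an image region ---------

Box = Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes.

    Detection metrics such as mAP count a prediction as correct only if its
    IoU with an annotated ground-truth box exceeds a threshold (0.5 in
    Pascal VOC), so they cannot be computed without box annotations.
    """
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Sequence task: states after zero inputs still carry the effect of x_1 = 1.0.
    print([round(h, 3) for h in elman_rnn([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)])
    print(round(rmse([0.0, 0.0], [3.0, 4.0]), 3))    # 3.536
    # Detection task: overlap between a predicted and a ground-truth box.
    print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.143
```

The two halves take different inputs (an ordered sequence versus box coordinates on an image), and their scores live on different scales: RMSE is an unbounded error where lower is better, while IoU lies in [0, 1] where higher is better. Placing such numbers side by side says nothing about which architecture is "better", which mirrors the conclusion that each model should be judged within its own task domain.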
References
Abiodun, B. I., Kumar, A., & Zomaya, A. Y. (2023). A systematic review of deep learning architectures for time series forecasting and image classification: Divergence in design and evaluation. IEEE Access, 11, 45678–45695. https://doi.org/10.1109/ACCESS.2023.3271234
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
Chen, L., Wang, Y., Liu, Z., & Li, H. (2022). On the incompatibility of sequence modeling and spatial detection tasks in deep neural networks. Neural Networks, 156, 112–125. https://doi.org/10.1016/j.neunet.2022.09.007
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Garcia-Martin, E., Rodrigues, C. F., Riley, G., & Grahn, H. (2021). Estimation of energy consumption in deep learning models across different hardware platforms. Journal of Parallel and Distributed Computing, 158, 1–13. https://doi.org/10.1016/j.jpdc.2021.07.010
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Khan, S., Naseer, M., Hayat, M., Arif, S., & Shah, M. (2022). Transformers in vision: A survey. ACM Computing Surveys, 55(10), 1–41. https://doi.org/10.1145/3543570
Liu, X., Zhang, F., Hou, Z., Wang, Z., Mian, L., Zhang, J., & Tang, J. (2021). Self-supervised learning: Generative or contrastive. IEEE Transactions on Knowledge and Data Engineering, 35(1), 857–876. https://doi.org/10.1109/TKDE.2021.3090091
Rasheed, F., Al-Fuqaha, A., Qadir, J., & Erbad, A. (2024). Benchmarking deep learning models: A critical analysis of evaluation metrics across domains. Information Fusion, 102, 345–360. https://doi.org/10.1016/j.inffus.2023.10.012
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788.
Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7263–7271.
Shiri, F. M., Perumal, T., Mustapha, N., & Mohamed, R. (2024). A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU. Journal on Artificial Intelligence, 6(1), 301–360. https://doi.org/10.32603/jai.2024.054314
Talaei Khoei, T., Ould Slimane, H., & Kaabouch, N. (2023). Deep learning: Systematic review, models, challenges, and research directions. Neural Computing and Applications, 35, 23103–23124. https://doi.org/10.1007/s00521-023-08957-4
Wang, P., Liu, H., Zhou, X., Xue, Z., Ni, L., Han, Q., & Li, J. (2024). Multidimensional evaluation methods for deep learning models in target detection for SAR images. Remote Sensing, 16(6), 1097. https://doi.org/10.3390/rs16061097
Zhou, H., Lan, T., & Huang, T. (2025). Task-aware model selection in deep learning: Why one-size-fits-all evaluation fails. Pattern Recognition, 158, 110234. https://doi.org/10.1016/j.patcog.2024.110234
Copyright (c) 2026 INFORMASI (Jurnal Informatika dan Sistem Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Author Responsibilities
- Authors present research articles or ideas clearly, honestly, and without plagiarism.
- Authors must cite the sources of any opinions and works of others that they quote.
- Authors are responsible for any confirmation submitted concerning the article they have written.
- Authors must write articles ethically, honestly, and responsibly, in accordance with the applicable rules of scientific writing.
- Authors do not object to their article being edited, provided its substance is not changed.