Comparison of the Random Forest and LightGBM Methods for Diamond Price Prediction: A Probabilistic and Statistical Approach

  • Daud Aldo Santoso, Universitas Katolik Darma Cendika
  • Yosefina Finsensia Riti, Universitas Katolik Darma Cendika

Abstract

Diamond prices are difficult to estimate because they fluctuate considerably and depend heavily on the physical characteristics of each stone. Manual appraisal is error-prone, and a wrong estimate can translate directly into financial loss. This study compares two machine learning algorithms, Random Forest and LightGBM, to determine which predicts diamond prices more accurately and reliably. The dataset contains 53,940 historical diamond sales records with nine main features, including carat weight, cut quality, color, and physical dimensions. During preprocessing, anomalous or physically impossible rows were removed and the categorical text attributes were encoded as numbers. The data were then split so that 80 percent was used for training and the remaining 20 percent was held out for testing. On the test set, Random Forest outperformed LightGBM on all three metrics: it achieved an R² of 0.9835, a mean absolute error (MAE) of $38.89, and a root mean squared error (RMSE) of $77.24, compared with an R² of 0.9830, an MAE of $41.28, and an RMSE of $78.27 for LightGBM. These results indicate that the ensemble approach of Random Forest is the more accurate and stable of the two for predicting diamond prices, although its margin over LightGBM is small.
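
The preprocessing and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation. Several details are assumptions made for illustration: "impossible" rows are taken to mean rows with a non-positive dimension or price, the cut grade is encoded ordinally using the grade order of the widely used public diamonds dataset, and the shuffle seed, field names (`x`, `y`, `z`, `cut`, `price`), and function names are hypothetical. Model training itself is left out; the metric functions apply to the predictions of any regressor.

```python
import math
import random

# Cut grades of the public diamonds dataset, ordered worst to best.
# Using this order as an ordinal encoding is an assumption of this sketch.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]


def clean(rows):
    """Keep only rows whose dimensions (x, y, z) and price are all positive."""
    return [r for r in rows if min(r["x"], r["y"], r["z"], r["price"]) > 0]


def encode_cut(rows):
    """Return copies of the rows with the text cut grade replaced by its rank."""
    return [{**r, "cut": CUT_ORDER.index(r["cut"])} for r in rows]


def split_80_20(rows, seed=42):
    """Shuffle reproducibly, then return (train, test) with a 20% test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * 0.2)
    return shuffled[n_test:], shuffled[:n_test]


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

In use, the rows would pass through `clean`, `encode_cut`, and `split_80_20` in that order; each model would be fitted on the training part, and `mae`, `rmse`, and `r2` would be computed on its predictions for the held-out test part so that both models are scored on the same unseen data.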

Published
2026-06-02
How to Cite
Santoso, D. A., & Riti, Y. F. (2026). Perbandingan Metode Random Forest dan LightGBM untuk Prediksi Harga Berlian: Pendekatan Probabilistik dan Statistik. Jurnal Teknologika, 16(1), 1074-1082. https://doi.org/10.51132/teknologika.v16i1.685