Application of Machine Learning Models to Predict Small Rodent Populations (using Myodes rutilus as an example)
Yamborko A.V.1,2,3, Timoshilov V.I.3,4
1Federal Center for Analysis and Assessment of Technogenic Impact, Research Center for Rare and Endangered Species of Animals and Plants (branch), Moscow, Russia
2Institute of Biological Problems of the North, Far Eastern Branch of the Russian Academy of Sciences, Magadan, Russia
3Moscow Institute of Physics and Technology, Moscow, Russia
4Kursk State Medical University, Kursk, Russia
Abstract. A comparative analysis of machine learning regression models was conducted for predicting changes in the relative abundance of natural northern red vole (Myodes rutilus) populations one year in advance. Demographic and climate data for the eastern subarctic as a whole, and for two locations within the region, were used to train, validate, and test random forest and multilayer perceptron models. Each annual observation in a dataset is represented in the model as a feature vector (the target feature is relative abundance one year ahead; the predictors are population and climate data), without reference to a temporal structure. Traditional time series forecasting derives future values from a sequence of past values, whereas vector forecasting treats each observation as a separate point in feature space. The multilayer perceptron yielded higher forecast accuracy on all samples, both for the entire region and for the individual locations, while the random forest was less robust and less accurate. A random forest model can still be used when fast modeling or interpretability is required. Using northern red vole populations as an example, it was shown that an accurate forecast of the relative abundance of small rodents one year in advance can be obtained from a dataset on the status of populations and their habitats that covers a limited period of time. Such machine learning models can be applied to problems in epidemiology and plant protection.
Key words: machine learning, random forest, multilayer perceptron, population forecasting, Myodes rutilus
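
The vector representation described in the abstract can be illustrated with a minimal sketch. It pairs the indicators of each year with the relative abundance of the following year, so that every row becomes an independent point in feature space. The function name build_feature_vectors, the field names (abundance, mean_summer_temp, snow_depth), and all numeric values are hypothetical placeholders rather than the variables or data of the study; treating current-year abundance as one of the predictors is likewise an assumption.

```python
from typing import Dict, List, Tuple


def build_feature_vectors(
    records: List[Dict[str, float]],
    target_key: str = "abundance",
) -> Tuple[List[List[float]], List[float], List[str]]:
    """Pair each year's indicators with the next year's target value."""
    ordered = sorted(records, key=lambda r: r["year"])
    # Every field except the year itself is used as a predictor (assumption).
    feature_names = [k for k in ordered[0] if k != "year"]
    X: List[List[float]] = []
    y: List[float] = []
    for current, following in zip(ordered, ordered[1:]):
        # Skip gaps in the series: the target must be exactly one year ahead.
        if following["year"] - current["year"] != 1:
            continue
        X.append([current[name] for name in feature_names])
        y.append(following[target_key])
    return X, y, feature_names


# Synthetic placeholder records, not data from the study.
records = [
    {"year": 2001, "abundance": 4.0, "mean_summer_temp": 11.2, "snow_depth": 55.0},
    {"year": 2002, "abundance": 9.5, "mean_summer_temp": 12.1, "snow_depth": 48.0},
    {"year": 2003, "abundance": 2.0, "mean_summer_temp": 10.4, "snow_depth": 61.0},
    {"year": 2005, "abundance": 6.5, "mean_summer_temp": 11.8, "snow_depth": 50.0},
]

X, y, names = build_feature_vectors(records)
for row, target in zip(X, y):
    print(row, "->", target)
```

Rows built this way carry no explicit ordering, so they could be passed to any tabular regressor, such as a random forest or a multilayer perceptron, which is the sense in which the forecast makes no reference to a temporal structure.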