Peer-Reviewed

Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective

Received: 8 January 2021     Accepted: 15 January 2021     Published: 22 January 2021
Abstract

Estimating the population total in undeveloped and developing countries has recently attracted considerable interest from researchers, because the estimates are needed to plan resource allocation, personnel training and infrastructure in the social, health, transport, communication and education sectors. In many countries a comprehensive census is conducted every ten years, whereas the government administration changes every four to five years because of constitutional term limits, so a change of administration does not coincide with the census. Further, the emerging COVID-19 pandemic requires ministry of health social-distancing protocols, so census surveys that rely on questionnaires and personal interviews need to be avoided. There is therefore a need for better and more reliable models for estimating the population total, which is the main focus of this study. Existing and newly developed exponential and logistic classes of population total estimation models are considered and compared. The main difficulty with logistic models is estimating the highest possible population that each administrative unit can attain. In this study, a machine learning logistic regression is proposed to search for and estimate this constant through a supervised learning process. The performance of the methods is compared using the Root Mean Square Error (RMSE), with values of 1.062, 1.524, 0.477, 0.819 and 0.286 for the exponential, logistic I, logistic II, logistic III and machine learning logistic (logistic IV) models respectively; the proposed model performed best, with the lowest RMSE of 0.286.
The proposed model was then used to project the population total for all regions as 51.00, 55.02, 62.50, 69.10, 74.65 and 79.14 million in the years 2024, 2029, 2039, 2049, 2059 and 2069 respectively.
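The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's algorithm: the abstract does not give the model equations, so the sketch assumes the textbook forms P(t) = P0·exp(r·t) for exponential growth and P(t) = K / (1 + A·exp(-r·t)) for logistic growth, and it replaces the supervised learning step with a simple grid search over candidate values of the maximum attainable population K. All function names are hypothetical, and the data are synthetic values generated from a known logistic curve, not census figures.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def fit_exponential(ts, pops):
    """Fit P(t) = exp(a + b*t) by regressing ln(P) on t."""
    a, b = fit_line(ts, [math.log(p) for p in pops])
    return lambda t: math.exp(a + b * t)

def fit_logistic(ts, pops, k):
    """Fit P(t) = k / (1 + exp(a + b*t)) for a fixed k,
    using the linearisation ln(k/P - 1) = a + b*t."""
    a, b = fit_line(ts, [math.log(k / p - 1.0) for p in pops])
    return lambda t: k / (1.0 + math.exp(a + b * t))

def search_carrying_capacity(ts, pops, candidates):
    """Return (k, error, model) for the candidate k with the lowest RMSE.
    Candidates not exceeding the largest observation are skipped,
    because the linearisation needs k/P - 1 > 0."""
    best = None
    for k in candidates:
        if k <= max(pops):
            continue
        model = fit_logistic(ts, pops, k)
        err = rmse(pops, [model(t) for t in ts])
        if best is None or err < best[1]:
            best = (k, err, model)
    return best

# Synthetic, noise-free observations from a logistic curve with
# K = 100, A = 9, r = 0.3 (illustrative assumptions only).
ts = list(range(10))
pops = [100.0 / (1.0 + 9.0 * math.exp(-0.3 * t)) for t in ts]

exp_model = fit_exponential(ts, pops)
exp_err = rmse(pops, [exp_model(t) for t in ts])

best_k, best_err, logistic_model = search_carrying_capacity(ts, pops, range(70, 151))

print("exponential RMSE:", exp_err)
print("logistic RMSE:", best_err, "at K =", best_k)
```

Once a model has been selected, a projection is just an evaluation of the fitted curve at a later time, for example `logistic_model(20)`. With real census data the observations would be noisy, so the lowest-RMSE K would be an estimate rather than an exact recovery, and a finer or adaptive search over K may be preferable to a fixed grid.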

Published in American Journal of Theoretical and Applied Statistics (Volume 10, Issue 1)
DOI 10.11648/j.ajtas.20211001.14
Page(s) 22-31
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Population Total Estimates, Growth, COVID-19, Logistic Regression and Projection

Cite This Article
  • APA Style

    Thomas Mageto. (2021). Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective. American Journal of Theoretical and Applied Statistics, 10(1), 22-31. https://doi.org/10.11648/j.ajtas.20211001.14

    ACS Style

    Thomas Mageto. Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective. Am. J. Theor. Appl. Stat. 2021, 10(1), 22-31. doi: 10.11648/j.ajtas.20211001.14

    AMA Style

    Thomas Mageto. Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective. Am J Theor Appl Stat. 2021;10(1):22-31. doi: 10.11648/j.ajtas.20211001.14

  • BibTeX

    @article{10.11648/j.ajtas.20211001.14,
      author = {Thomas Mageto},
      title = {Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {10},
      number = {1},
      pages = {22-31},
      doi = {10.11648/j.ajtas.20211001.14},
      url = {https://doi.org/10.11648/j.ajtas.20211001.14},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20211001.14},
     year = {2021}
    }
    

  • RIS

    TY  - JOUR
    T1  - Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective
    AU  - Thomas Mageto
    Y1  - 2021/01/22
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajtas.20211001.14
    DO  - 10.11648/j.ajtas.20211001.14
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 22
    EP  - 31
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20211001.14
    VL  - 10
    IS  - 1
    ER  - 

Author Information
  • Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya
