## Saturday, 6 October 2018

### Religious demographics of India in future: A Machine Learning View

According to the Sachar Committee report of 2005 (ref 1), the projected religious demographics of India for the next 100 years are shown below:

We took a machine learning approach and built several time series models to project the demographics of the two major religions in the coming years. The data comes from Wikipedia (2011 Census of India; ref 2) and is given below:

The image above clearly shows that Hinduism is the major religion, followed by Islam. Let's create a new variable, the ratio of Hinduism to Islam, for the seven censuses from 1951 to 2011:

For 1951 the ratio is 84.1/9.8, which is 8.581633. Similarly, for the other decades:

8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871

So Hinduism, which had about 8.6 times as many followers as Islam in 1951, has about 5.6 times as many in 2011.
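To make the arithmetic easy to check, here is a minimal sketch in base R that enters the series and reproduces the 1951 figure. The names `years`, `ratio_1951` and `decade_change` are our own labels for illustration; only the 1951 shares (84.1 and 9.8) are quoted in the text, so the other ratios are taken as given.

```r
# Census years and the Hinduism-to-Islam ratios quoted above
years <- seq(from = 1951, to = 2011, by = 10)
ratio <- c(8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871)
names(ratio) <- years

# The 1951 value is the ratio of the two population shares
ratio_1951 <- 84.1 / 9.8

# Change in the ratio from one census to the next
decade_change <- diff(ratio)

print(round(ratio_1951, 6))
print(round(decade_change, 3))
```

Every entry of `decade_change` is negative, which is the steady decline the forecast has to extend.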

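Now, let's fit an ARIMA time series model on the ratio variable:

```r
library(forecast)  # provides auto.arima() and forecast()
ratio <- c(8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871)
comman_ratio <- auto.arima(ratio)  # model order is selected automatically
forecasted_ratio <- forecast(comman_ratio, h = 10)  # next 10 decades
```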

The table and image above show that around 2100, Islam and Hinduism will have an equal number of followers. Is this forecast correct?
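As a rough sanity check on that date, we can extend the average per-decade decline in a straight line and see when the ratio would hit 1. This is a back-of-the-envelope sketch in base R, not the model that `auto.arima` selects, and the names `slope`, `decades_to_parity` and `parity_year` are ours.

```r
ratio <- c(8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871)

# Average change per decade between 1951 and 2011 (six intervals)
slope <- (ratio[7] - ratio[1]) / 6

# Decades after 2011 until the extrapolated ratio reaches 1 (parity)
decades_to_parity <- (1 - ratio[7]) / slope
parity_year <- 2011 + 10 * decades_to_parity

print(round(parity_year))
```

The straight line reaches parity a few years after 2100, so the ARIMA result is close to what a simple linear trend on this variable would give.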

Let's build another time series on a different variable: the ratio of Islam to Hinduism. This variable gives the number of followers of Islam as a fraction of the number of followers of Hinduism in India.

0.1165279, 0.1281007, 0.1355010, 0.1427704, 0.1546670, 0.1669152, 0.1783208 (ratio1)

In 1951, Islam had about 11.7% as many followers as Hinduism; in 2011 it has about 17.8%.
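The second variable is simply the reciprocal of the first, which a short base R sketch can confirm. The name `pct_1951_2011` is our own label; the values are the ones listed in this post.

```r
ratio <- c(8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871)

# Islam-to-Hinduism is the reciprocal of Hinduism-to-Islam
ratio1 <- 1 / ratio

# First and last values expressed as percentages
pct_1951_2011 <- round(100 * ratio1[c(1, 7)], 1)

print(round(ratio1, 7))
print(pct_1951_2011)
```

Both series therefore carry exactly the same information; any difference between the two forecasts comes from the transformation, not from the data.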

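```r
library(forecast)
library(ggplot2)
ratio1 <- c(0.1165279, 0.1281007, 0.1355010, 0.1427704, 0.1546670, 0.1669152, 0.1783208)

# Fit on the Islam-to-Hinduism ratio and forecast 80 decades ahead
comman_ratio1 <- auto.arima(ratio1)
forecasted_ratio <- forecast(comman_ratio1, h = 80)  # uses the model fitted on ratio1

# 7 observed values followed by 80 forecasts, one per decade
qq <- c(ratio1, forecasted_ratio$mean)
year <- seq(from = 1951, to = 2811, by = 10)
df <- data.frame(percentage_of_islam_compare_to_hinduism = qq, year = year)
ggplot(df, aes(year, percentage_of_islam_compare_to_hinduism)) + geom_line()
```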

So this forecast says that Islam will not draw level with Hinduism: around 2100 it will have about 28% as many followers, and at the current growth rate it would take about 800 years for Islam to equal Hinduism in number of followers.
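The same straight-line check can be applied to this variable. Again, this is a base R sketch of a linear trend, not the fitted ARIMA model, and `slope1`, `ratio1_near_2100` and `years_to_parity` are names we made up for it.

```r
ratio1 <- c(0.1165279, 0.1281007, 0.1355010, 0.1427704, 0.1546670, 0.1669152, 0.1783208)

# Average change per decade between 1951 and 2011 (six intervals)
slope1 <- (ratio1[7] - ratio1[1]) / 6

# Extrapolated values 9 and 10 decades after 2011
ratio1_near_2100 <- ratio1[7] + c(9, 10) * slope1
names(ratio1_near_2100) <- c("2101", "2111")

# Years after 2011 until the extrapolated ratio reaches 1 (parity)
years_to_parity <- 10 * (1 - ratio1[7]) / slope1

print(round(ratio1_near_2100, 3))
print(round(years_to_parity))
```

A linear trend on this variable stays well below one third around 2100 and needs roughly eight centuries to reach 1, which is in line with the forecast above.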

So what is the correct composition of demographics in 2100? Machine learning gives different results depending on which variable we model. On top of that, 7 data points are not sufficient to forecast 10 future values, let alone 80. Results might also be different if we had modelled the populations of each religion directly rather than their ratios. ☺☺
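To get a feel for how fragile a forecast from 7 points is, here is a small base R sketch that repeats a straight-line extrapolation of the Hinduism-to-Islam ratio using three different slopes. It is only an illustration under a linear-trend assumption, and `parity_year` and `estimates` are hypothetical names, not part of the analysis above.

```r
ratio <- c(8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871)

# Year in which a straight line with the given per-decade slope reaches a ratio of 1
parity_year <- function(slope) 2011 + 10 * (1 - ratio[7]) / slope

estimates <- c(
  first_decade = parity_year(ratio[2] - ratio[1]),        # slope from 1951 to 1961
  all_decades  = parity_year((ratio[7] - ratio[1]) / 6),  # average slope, 1951 to 2011
  last_decade  = parity_year(ratio[7] - ratio[6])         # slope from 2001 to 2011
)

print(round(estimates))
```

The three estimates are decades apart, so the answer depends heavily on which part of the short history the trend is taken from.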

References:

1) Sachar Committee
2) 2011 Census of India
3) https://www.quora.com/What-was-the-Muslim-population-in-India-in-1947-and-now-in-2016