Winners of Zindi Kenya Festive Hack 2020: Tourism Expenditure Prediction Challenge
We don't grow when things are easy, we grow when we face challenges.
On the 19th of December 2020, the Zindi Ambassador team in Kenya organized a Kenya hacking challenge for students and the community to take part in. Zindi is Africa’s largest competition platform creating an opportunity for data enthusiasts to solve complex challenges using Artificial Intelligence and Machine Learning. The hack was sponsored by Nairobi Women in Machine Learning and Data Science, Alliance4ai, and Safaricom.
The turnout for the hack was impressive. It shows how motivated the youth are to make an impact in the world, and how hard they are working. Congratulations to all the participants who attempted the challenge. We were also happy to have 39% gender representation, which is above the platform average of 23%. We had a total of 80 participants and 1465 submissions, and our winner achieved a prediction score of 4426609.717. Without further ado, here are the challenge and the top three winners sharing their insights and approaches.
TANZANIA TOURISM EXPENDITURE PREDICTION CHALLENGE
Tanzania is an East African country known for its vast wilderness areas. They include the plains of Serengeti National Park, a safari mecca populated by the “big five” game (elephant, lion, leopard, buffalo, rhino), and Kilimanjaro National Park, home to Africa’s highest mountain. Offshore lie the tropical islands of Zanzibar, with Arabic influences, and Mafia, with a marine park home to whale sharks and coral reefs.
The tourism sector plays a significant role in the Tanzanian economy, contributing about 17% of the country’s GDP and 25% of all foreign exchange revenues. The sector, which provides direct employment for more than 600,000 people and up to 2 million people indirectly, generated approximately $2.4 billion in 2018 according to government statistics. Tanzania received a record 1.1 million international visitor arrivals in 2014, mostly from Europe, the US, and Africa.
The objective of this hackathon was to develop a machine learning model to predict what a tourist will spend when visiting Tanzania. The model can be used by different tour operators and the Tanzania Tourism Board to automatically help tourists across the world estimate their expenditure before visiting Tanzania.
The hackathon took 24 hours, from Saturday at 11.00 am to Sunday at 11.00 am. For sure these guys meant business.
MEET OUR AVENGERS OF THE ZINDI KENYA FESTIVE HACK 2020
WINNER: Daniel Kuria
Daniel Kuria is a recent Moringa School graduate. He has a background in Chemistry and previously worked in the manufacturing sector.
1. Please explain your solution and approach to the challenge?
I explored various ways of encoding the categorical data, including mean encoding, CatBoost’s built-in implementation, and a mix of label encoding and one-hot encoding. The last of these achieved the best score.
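As a rough illustration of mixing the two encodings, here is a minimal sketch in plain Python. The column names, the values, and the choice of which column gets which encoding are assumptions made for the example; they are not taken from Daniel's solution or from the competition data.

```python
def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Return one 0/1 column per distinct category (sorted for a stable order)."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical toy columns, not the competition data.
country = ["KENYA", "ITALY", "KENYA", "FRANCE", "ITALY"]  # many levels: label encode
tour_type = ["Package", "Independent", "Package", "Package", "Independent"]  # few levels: one-hot

country_codes, country_map = label_encode(country)
tour_columns = one_hot_encode(tour_type)

print(country_codes)
print(tour_columns)
```

A common rule of thumb, assumed here, is to one-hot encode columns with only a few categories and label encode columns with many, so the feature matrix does not grow too wide.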
This was one of the most important parts of the solution. I explored feature interactions by testing combinations of different variables, especially those with the most predictive power according to the baseline model’s feature-importance plot.
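One simple way to build interaction features is to join pairs of categorical columns into a single combined category. The sketch below assumes that approach; the column names and values are hypothetical, and the interactions Daniel actually tested are not described in this post.

```python
from itertools import combinations


def add_interactions(rows, columns):
    """Add one combined categorical feature for every pair of the given columns."""
    enriched_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for a, b in combinations(columns, 2):
            new_row[f"{a}_x_{b}"] = f"{row[a]}|{row[b]}"
        enriched_rows.append(new_row)
    return enriched_rows


# Hypothetical toy rows, not the competition data.
rows = [
    {"purpose": "Leisure", "tour_type": "Package", "age_group": "25-44"},
    {"purpose": "Business", "tour_type": "Independent", "age_group": "45-64"},
]

enriched = add_interactions(rows, ["purpose", "tour_type", "age_group"])
print(enriched[0])
```

In practice you would pass only the handful of columns that rank highest in the baseline feature-importance plot, since the number of pairs grows quickly with the number of columns.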
My model consisted of two variations of a 2-layer stack of diverse models. In each, the base models’ predictions were used to train a CatBoost meta-learner. The two resulting predictions were then blended into the final solution.
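The final blending step can be sketched as a weighted average of two sets of predictions. The stacking itself is not shown. The numbers are made up, and choosing the weight by grid search on a held-out set is an assumption for illustration rather than something the post confirms.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two prediction lists."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def best_blend_weight(y_true, pred_a, pred_b, steps=10):
    """Try weights 0.0, 0.1, ..., 1.0 and keep the one with the lowest MAE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mae(y_true, blend(pred_a, pred_b, w)))


# Hypothetical held-out targets and predictions from two stacks.
y_valid = [100.0, 200.0, 300.0]
stack_1 = [110.0, 190.0, 330.0]
stack_2 = [90.0, 210.0, 270.0]

weight = best_blend_weight(y_valid, stack_1, stack_2)
final = blend(stack_1, stack_2, weight)
print(weight, final)
```

Blending tends to help most when the two models make different kinds of errors, as in this toy case where one stack overshoots wherever the other undershoots.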
3. What were the things that made a difference for you that you think others can learn from?
Most models’ default loss function is mean squared error. It is quite different from the hackathon’s evaluation metric, mean absolute error, because it penalizes outliers more heavily. Setting that parameter correctly was therefore very important, and it improved the score dramatically.
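To see why the choice of loss matters, consider the best constant prediction under each metric. The mean minimizes mean squared error and the median minimizes mean absolute error, so a single large outlier pulls them far apart. The spending figures below are made up for illustration.

```python
from statistics import mean, median


def mse(y_true, constant):
    """Mean squared error of predicting the same constant for every row."""
    return sum((y - constant) ** 2 for y in y_true) / len(y_true)


def mae(y_true, constant):
    """Mean absolute error of predicting the same constant for every row."""
    return sum(abs(y - constant) for y in y_true) / len(y_true)


# Hypothetical spend values with one extreme outlier.
spend = [100.0, 120.0, 130.0, 150.0, 5000.0]

mean_guess = mean(spend)      # the constant that minimizes MSE
median_guess = median(spend)  # the constant that minimizes MAE

print("MAE with mean:  ", mae(spend, mean_guess))
print("MAE with median:", mae(spend, median_guess))
print("MSE with mean:  ", mse(spend, mean_guess))
print("MSE with median:", mse(spend, median_guess))
```

A model trained on squared error behaves like the mean here and gets dragged toward the outlier, which hurts when it is scored on absolute error. In CatBoost, for example, the training objective can be switched with `loss_function="MAE"`; the post does not say which setting each winner used.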
4. What are the biggest areas of opportunity you see in AI in Kenya over the next few years?
One of the biggest areas of opportunity is in agriculture. Agriculture is the backbone of our economy, yet agricultural productivity has stagnated due to inefficient technology. AI-driven technologies, from soil and crop monitoring to predictive weather forecasting, will have a significant impact on the sector.
5. What are you looking forward to most about the Zindi community?
More hackathons. Participating in data science datathons has been an important step in my data science journey as I’ve come to learn new techniques while working to solve new problems.
6. Any advice you would like to leave the participants with or tips?
The importance of optimizing for the competition’s evaluation metric cannot be overstated!
FIRST RUNNER UP: PEREZ OGAYO
Perez Ogayo is a final year Computer Science student at African Leadership University. She is currently interested in Natural Language Processing.
Zindi Username: Ogayo
1. Please explain your solution and approach in detail?
I chose to use XGBoost because it has proven to get really good results across different problem sets. Within the first few minutes of the competition, I realized that I was overfitting. From the private leaderboard scores, I found out that there was a range of scores on my validation set that I should not go below or above. This made it really difficult for me to figure out the best approach to take, as normally you would want to go lower. So I figured I would try to get scores within that range, but using different feature engineering and preprocessing.
2. What were the things that made the difference for you that you think others can learn from?
Choosing the right validation strategy. The dataset was really small and the private leaderboard scores were not really reliable, so having a good validation strategy and trusting it made the difference. Also, when you have such unreliable leaderboard scores, you might want to make as many submissions as you can. You never know which one will be the winning one. My winning score was never my best score on the private leaderboard at any point.
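The post does not say which validation scheme Perez used. A common choice for small datasets is k-fold cross-validation, sketched here in plain Python under that assumption; the function name and parameters are illustrative.

```python
import random


def k_fold_indices(n_rows, k=5, seed=42):
    """Yield (train_indices, valid_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds reproducible
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


splits = list(k_fold_indices(10, k=5))
for train, valid in splits:
    print("train:", sorted(train), "valid:", sorted(valid))
```

Each row is used for validation exactly once, so averaging the metric over the folds gives a steadier estimate than a single split or a leaderboard computed on a small sample.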
3. What are the biggest areas of opportunity you see in AI in Kenya over the next few years?
E-commerce, agriculture, and agribusiness. This is mostly because our economy is based on agriculture and anyone who can tap into this opportunity will reap big.
4. What are you looking forward to most about the Zindi community?
Taking part in many diverse challenges and learning from others.
5. Any advice you would like to leave the participants with or tips?
Keep learning and taking part in challenges. You only improve by practicing what you learn.
SECOND RUNNER UP: TheGeeks
They are both fourth-year students: Wilberforce Wairagu from the Technical University of Kenya, pursuing a BSc in Electrical and Electronics Engineering, and Geoffrey Nyamwaya from Karatina University, pursuing a BSc in Computer Science.
Please explain your solution and approach?
For the Kenya Hack — Tanzania Tourism Prediction Challenge, you were to develop a machine learning model that would predict what a tourist will spend while visiting Tanzania based on the data given.
Our first step was to identify what type of problem this is, and we clearly saw that it is a regression problem. We then did some exploratory data analysis to understand and gain insights from the data, followed by some data cleaning, for instance dropping null values.
We then did some feature engineering, where we tried coming up with new features from existing ones that would aid in the training of our model. We created four new features, as explained in our GitHub repository: GitHub Repo
For the modeling part, our best score came from a CatBoost regressor. We made ten different predictions, each from a model with a different depth, and averaged them to get the test-set prediction that gave the third winning score.
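The averaging step can be sketched as follows. `fake_model_predict` is a hypothetical stand-in that returns made-up numbers instead of training a real regressor, and the depth range 1 to 10 is an assumption; the team's actual code is in their repository.

```python
from statistics import mean


def average_predictions(prediction_sets):
    """Average row by row across several models' predictions."""
    return [mean(row) for row in zip(*prediction_sets)]


def fake_model_predict(depth, n_rows=3):
    """Hypothetical stand-in for training a regressor at a given depth."""
    return [1000.0 * (row + 1) + 10.0 * depth for row in range(n_rows)]


depths = range(1, 11)  # ten models, one per depth (assumed range)
per_depth = [fake_model_predict(depth) for depth in depths]
final = average_predictions(per_depth)
print(final)
```

With a real library you would replace the stand-in with a model trained at each depth, for example by varying CatBoost's `depth` parameter, and keep the averaging unchanged.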
What were the things that made the difference for you that you think others can learn from?
We spent most of our time on Exploratory Data Analysis (EDA) and feature engineering, which did the trick. This helped us come up with new features that ranked highest in feature importance, and hence gave great prediction scores.
What are the biggest areas of opportunity you see in AI in Kenya over the next few years?
Artificial intelligence is a predominant field in rising technology. First and foremost, in medicine, it can work alongside human intelligence to make tasks easier, for example in performing major surgeries with fewer human errors, manufacturing medicines, and diagnosing and treating diseases, among others. Furthermore, it can be used in security to detect imminent threats and analyze log files, which strengthens cybersecurity. It can also be used in war, for example in robots that carry dangerous weapons.
What are you looking forward to most about the Zindi community?
We are looking forward to competing in more challenges, sharing knowledge with other skilled data scientists to come up with great solutions, and finally, emerging as the best of the best in other competitions and hackathons.
Any advice you would like to leave the participants with or tips?
We would like to encourage all the participants never to give up and to keep practicing each and every day, as practice makes perfect. The tip we would love to leave is this: let us all perfect the art of feature engineering.
Well, I guess you have heard of how they managed to practice their skills and secure the bag. If you would like to communicate more with them, visit the Zindi platform and message them via their username stated above.
Thank you for visiting this blog and for taking the time to learn from our winners. Feel free to share the blog link with your community, Follow, Clap, and Comment😁.
Happy Holidays 🎇🎊
David Davis: https://davis-david.medium.com/ — Tanzania’s approach to the tourist expenditure prediction challenge.