On 13 December, the 6th edition of the Challenges in Machine Learning (CiML) workshop took place in Vancouver, Canada. This event, which is held annually, acts as a discussion platform for stakeholders involved in challenges in machine learning and data science – open online competitions that address problems by providing datasets or simulated environments. The aim of CiML is to allow participants to share their experience with challenge organization and to discuss best practices in challenge design.
This year’s edition was co-organized by Dr Adriënne Mendrik, eScience Coordinator at the Netherlands eScience Center. In the following interview, Adriënne looks back on the event and her involvement in it.
Can you briefly explain what the workshop is about?
The workshop focuses mostly on challenge design and gives challenge organizers the opportunity to share lessons learned: what worked and what didn’t. Organizing challenges is a complex task and a huge amount of work that involves both social and technical aspects. Therefore, it is important to learn from each other and gain new insights. In general, challenges in machine learning and data science are open online competitions that address problems by providing datasets or simulated environments. They measure the performance of machine learning algorithms with respect to a given problem, resulting in a leaderboard. The playful nature of challenges naturally attracts students, making challenges a great teaching resource. However, in addition to serving as educational tools, challenges have a role to play in democratizing AI and machine learning. They function as cost-effective problem-solving tools and as a means of encouraging the development of reusable problem templates and open-source solutions.
How did the idea for this year’s workshop come about?
Last year, I submitted an abstract to the CiML workshop at the Conference on Neural Information Processing Systems (NeurIPS). During this workshop, the organizer Isabelle Guyon asked participants whether they would be interested in organizing the 2019 edition of CiML. I volunteered, since challenge design is one of my research interests – I recently submitted a joint paper on the topic of challenge design in medical image analysis to arXiv.
CiML is one of the few workshops I know of that focuses on challenge design, so it presented a great opportunity to get together with other experts in the field and discuss the topic.
What were the main goals?
This year’s topic was “Machine Learning Competitions for All”. I think challenges or machine learning competitions are a great way to bring people with different expertise together to work on relevant problems, and to gain insight into the strengths and weaknesses of algorithms. However, I also think there is a lot to gain in terms of challenge design, as well as in terms of diversity and awareness.
If challenges are about bringing people together to work on relevant problems and gain insight, it is good to think about how we can increase diversity in order to get better and more interesting results. This means that we should carefully consider the topics that we choose for challenges, but also think about how to include, for example, people with less experience in data science, people from different countries, and people who are not competitive by nature. Emphasizing collaboration is one way to do that.
Our objective was twofold: one, to enlarge the challenge design community and foster greater diversity among participants and organizers; and two, to promote the organization of challenges for the benefit of more diverse communities.
Do you think you achieved these?
We are not there yet, but I think we made a great start and brought together some great people with various backgrounds. I especially loved the talks by Dina, Tara, Emily, Isabell and Frank. Dr. Dina Machuve is a lecturer and researcher at the Nelson Mandela African Institution of Science and Technology (NM-AIST) in Arusha, Tanzania. She talked about her outlook from Africa, all the great work that is being done there, but also the hurdles that still need to be overcome. As a result of her talk, Kaggle (Google) reached out to Dina to discuss what they could do to overcome some of the hurdles. Dr. Tara Chklovski (CEO and founder of Technovation) was invited by Amir Banifatemi (XPrize). She talked about applying challenges to democratize AI, and the wonderful work she is doing to empower girls and families to be leaders and problem solvers in their lives and communities. Professor Emily M. Bender is the director of the Computational Linguistics Laboratory at the University of Washington (USA). She made us aware that we should consider how challenges impact direct and indirect stakeholders, citing as an example a challenge (shared task) organized in her field on “Prediction of Intellectual Ability”. She proposed including feedback from stakeholders in challenge design. Dr. Isabell Kiral (data scientist and blockchain researcher at IBM, Australia) talked about the epilepsy detection challenge she organized within IBM, where she tried to make it as easy as possible to participate, which enabled people with various backgrounds to take part. Professor Frank Hutter is head of the machine learning lab at the University of Freiburg (Germany). He shared his thoughts about how we could make challenges less about engineering and give computer scientists the opportunity to benchmark their latest methodologies. At the end of the workshop we had open space sessions in which we discussed the organization of challenges for the benefit of more diverse communities.
Why are workshops like CiML important?
At present, the geographic and sociological distribution of challenge participants and organizers is very biased. While recent successes in machine learning have raised high hopes, there is a growing concern that the societal and economic benefits might increasingly be in the hands and under the control of a few. Therefore, it is absolutely crucial that organizers and practitioners regularly come together to discuss ways to increase diversity and design challenges for the benefit of more diverse communities.
Is this workshop related to your work at the eScience Center?
Yes, at the eScience Center I’m leading the EYRA benchmark project together with Annette Langedijk from SURF. It is important to keep up to date with the latest developments in challenge design so as to be able to advise researchers on how to design good benchmarks for science.
Are you planning to organize a similar workshop or event in 2020?
Yes, many people were enthusiastic about this year’s workshop. As a result, many were interested in co-organizing the workshop next year. Isabelle Guyon (UPSud/INRIA, U. Paris-Saclay and ChaLearn) and Evelyne Viegas (Microsoft Research) are going to step down, having organized this workshop since 2014. I will continue next year together with Wei-Wei Tu (4Paradigm Inc. and ChaLearn), Tara Chklovski, Max Cappellari (XPrize), Justin Guinney (Sage Bionetworks), Isabell Kiral, Amir Banifatemi, and Gustavo Stolovitzky (IBM).