Build a Machine Learning Model from Noisy Labeled Data
Learn how to build a machine learning model from noisy labeled data. Machine learning models operating in enterprises big and small are often trained on noisy, crowdsourced, or user-generated data. Because annotators are rarely experts in the application domain or in data labeling, ML specialists have to take this into account when training and operating the model. This talk is intended for ML engineers and researchers and shows how to account for the specifics of crowdsourced annotations when building or improving their own machine learning models.

We will look at three important issues:
• How to properly account for noisy labeled data when training a model (see the Crowd-Kit sketch after the resource list)
• How to take annotators' subjective responses into account
• How to track distribution bias using model monitoring

Resources:
1. Zheng et al. "Truth Inference in Crowdsourcing: Is the Problem Solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, Jan. 2017, pp. 541–552. https://doi.org/10.14778/3055540.3055547
2. Uma et al. "Learning from Disagreement: A Survey." Journal of Artificial Intelligence Research, vol. 72, Dec. 2021, pp. 1385–1470. https://doi.org/10.1613/jair.1.12752
3. Ustalov et al. "Learning from Crowds with Crowd-Kit." arXiv. https://arxiv.org/abs/2109.08584 & https://github.com/Toloka/crowd-kit
4. Ustalov. "Guide to Data Labeling for Search Relevance Evaluation." Towards Data Science. https://towardsdatascience.com/guide-to-data-labeling-for-search-relevance-evaluation-a197862e5223
5. Maystre and Grossglauser. "Just Sort It! A Simple and Effective Approach to Active Preference Learning." Proceedings of the 34th International Conference on Machine Learning, pp. 2344–2353. https://proceedings.mlr.press/v70/maystre17a.html
6. Zhang et al. "Crowdsourced Top-k Algorithms: An Experimental Evaluation." Proceedings of the VLDB Endowment, vol. 9, no. 8, Apr. 2016, pp. 612–623. https://doi.org/10.14778/2921558.2921559
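To make the first bullet concrete, here is a minimal sketch using the Crowd-Kit library from resource 3 (pip install crowd-kit). The toy annotations are invented for illustration and are not from the talk:

```
import pandas as pd
from crowdkit.aggregation import DawidSkene, MajorityVote

# One row per annotation: which worker gave which label to which task.
# The data below is made up for the example.
annotations = pd.DataFrame(
    [
        ("w1", "t1", "spam"),
        ("w2", "t1", "spam"),
        ("w3", "t1", "not_spam"),
        ("w1", "t2", "not_spam"),
        ("w2", "t2", "spam"),
        ("w3", "t2", "not_spam"),
    ],
    columns=["worker", "task", "label"],
)

# Baseline: a simple majority vote per task.
mv_labels = MajorityVote().fit_predict(annotations)

# Dawid-Skene runs EM to estimate each worker's confusion matrix and
# weighs their votes accordingly, which usually beats majority vote
# when workers vary in skill.
ds_labels = DawidSkene(n_iter=100).fit_predict(annotations)

print(mv_labels)  # pandas Series: task -> aggregated label
print(ds_labels)
```

The aggregated labels can then serve as training targets in place of the raw, conflicting annotations.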
Table of Contents:
00:00 Introduction
11:56 Objective Data
24:07 Subjective Data
33:17 Model Operations
50:47 Conclusion and Q&A

For more captivating community talks featuring renowned speakers, check out this playlist: https://youtube.com/playlist?list=PL8eNk_zTBST-EBv2LDSW9Wx_V4Gy5OPFT

For further tutorials on the fundamentals of machine learning, check out this exclusive playlist: https://youtube.com/playlist?list=PL8eNk_zTBST-RTog7CPYvRfs1pYRWkPHG

💼 Learn to build LLM-powered apps in just 40 hours with our Large Language Models bootcamp: https://hubs.la/Q01ZZGL-0

#machinelearningmodel #machinelearning #finetuning #llm