Title: A Hybrid Approach for Large-Scale Product Categorization Based on Weighted KNN and LSTM-BPV
Contributors: Huang, Xiangji; Hu, Haohao
Type: Electronic Thesis or Dissertation
Dates: 2019-08; 2019-12-04
URI: http://hdl.handle.net/10315/36836
Subject: Artificial intelligence
Keywords: E-commerce Product Taxonomy Classification; Information Retrieval; K-Nearest Neighbour; Ensemble; Text Classification; Data Mining
Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.

Abstract: In modern e-commerce systems, large volumes of new items are added to the product list every day, which calls for automatic product categorization. In this thesis, we propose a weighted K-Nearest Neighbour (KNN) based classification system for the large-scale e-commerce product taxonomy classification problem. We use information retrieval (IR) models as the similarity function in our weighted KNN algorithm. Among all the IR models used in this study, the information-based (IB) model achieved the highest classification performance when used as the similarity function in the KNN algorithm. Moreover, our proposed method can improve overall performance when its predictions are combined with those of an advanced neural-network-based method, Long Short-Term Memory with Balanced Pooling Views (LSTM-BPV). The hybrid system achieves results comparable to the state of the art (SotA). We also obtain good results by fine-tuning a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model.
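To make the core idea concrete, here is a minimal sketch of a weighted KNN classifier that uses an IR model as its similarity function. The abstract does not specify the exact IB variant, the value of K, or the voting rule, so all of these are assumptions: the sketch uses the log-logistic IB model (term weight `log((tfn + λ) / λ)`, with `tfn = tf * log(1 + c * avgdl / dl)` and `λ = df / N`), a whitespace tokenizer, and a vote in which each of the top-K neighbours contributes its similarity score to its category. The names `IBIndex`, `weighted_knn`, and the toy product titles and category labels are hypothetical and for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


class IBIndex:
    """Hypothetical index scoring documents with a log-logistic IB model."""

    def __init__(self, docs, c=1.0):
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.n_docs = len(self.docs)
        self.avgdl = sum(self.lengths) / self.n_docs
        self.df = Counter()  # document frequency of each term
        for tf in self.docs:
            self.df.update(tf.keys())
        self.c = c  # length-normalization parameter (assumed value)

    def score(self, query, i):
        q = Counter(tokenize(query))
        tf_doc, dl = self.docs[i], self.lengths[i]
        s = 0.0
        for term, qtf in q.items():
            tf = tf_doc.get(term, 0)
            if tf == 0:
                continue  # an absent term contributes log(lam / lam) = 0
            tfn = tf * math.log(1 + self.c * self.avgdl / dl)
            lam = self.df[term] / self.n_docs
            s += qtf * math.log((tfn + lam) / lam)
        return s


def weighted_knn(index, labels, query, k=3):
    """Predict a category by summing IB similarity over the top-k neighbours."""
    ranked = sorted(((index.score(query, i), i) for i in range(index.n_docs)),
                    reverse=True)
    votes = defaultdict(float)
    for sim, i in ranked[:k]:
        if sim > 0:
            votes[labels[i]] += sim
    return max(votes, key=votes.get) if votes else None


# Hypothetical toy training data (product titles and taxonomy paths).
train = ["apple iphone case black", "samsung galaxy phone case",
         "running shoes men size 10", "women running sneakers",
         "stainless steel kitchen knife"]
labels = ["Electronics>Phone Accessories", "Electronics>Phone Accessories",
          "Shoes>Athletic", "Shoes>Athletic", "Home>Kitchen"]
index = IBIndex(train)
print(weighted_knn(index, labels, "black phone case for iphone"))
```

Weighting each neighbour's vote by its retrieval score, rather than counting neighbours equally, lets a single very close match outweigh several weak ones; a query that shares no terms with any training title returns `None` here rather than guessing.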