MNIST-Image-Recognition-Based-on-Xgboost-Algorithm-and-Features-Extraction

by Yingxin LIN

Introduction

Unlike the common practice of using a CNN for MNIST image recognition, I use NumPy and OpenCV to extract relevant features from each MNIST image and then train an XGBoost recognition model. After gradual parameter tuning, the accuracy of the best model on the test set reaches 88%.
In addition, because the code makes extensive use of NumPy's broadcasting mechanism instead of loops, it runs fast.
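As a rough illustration of the loop-free style described above, the sketch below computes a few simple features for a whole stack of images at once and passes them to XGBoost. The feature set (on-pixel counts per row and column), the function names, the threshold of 127, and the model parameters are all assumptions made for this example. They are not the features or the tuned settings used in this repository.

```python
import numpy as np


def extract_features(images, threshold=127):
    """Turn an image stack of shape (n, h, w) into features of shape (n, h + w + 1).

    Hypothetical features: on-pixel counts per row and per column, plus the
    total on-pixel count. All images are processed at once, with no Python loop.
    """
    on = images > threshold                # scalar is broadcast over the whole stack
    row_counts = on.sum(axis=2)            # (n, h): on pixels in each row
    col_counts = on.sum(axis=1)            # (n, w): on pixels in each column
    total = on.sum(axis=(1, 2))[:, None]   # (n, 1): on pixels in each image
    return np.hstack([row_counts, col_counts, total]).astype(np.float32)


def train_model(features, labels):
    """Fit a multi-class XGBoost classifier. The parameters are placeholders."""
    import xgboost as xgb  # imported here so extract_features works without it

    model = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    model.fit(features, labels)
    return model
```

With arrays `train_images` and `train_labels` already in memory, `train_model(extract_features(train_images), train_labels)` returns a fitted model whose `predict` method accepts features extracted from the test images in the same way.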
I also define a handwritten-digit edge-scanning function built entirely on NumPy, which counts the on pixels within the image edge quickly and precisely. Some scanning results are shown below:

Fig.1 Scanning from right to left (The first 49 pictures in MNIST)

Fig.2 Scanning from top to bottom (The first 49 pictures in MNIST)
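The exact definition of the edge scan lives in the repository code. The sketch below shows one plausible reading, assuming that a scan starts at one image edge and moves inward along each row or column until it meets the first on pixel. The function name `scan_from_edge`, the threshold of 127, and the choice to report the number of off pixels crossed are assumptions for illustration only.

```python
import numpy as np


def scan_from_edge(images, direction="right", threshold=127):
    """Count the off pixels crossed before the first on pixel on each scan line.

    images: array of shape (n, h, w).
    Returns shape (n, h) for "left"/"right" and (n, w) for "top"/"bottom".
    A scan line with no on pixel gets the full line length.
    """
    on = images > threshold
    if direction == "right":
        on = on[:, :, ::-1]      # reverse columns so index 0 is the right edge
    elif direction == "bottom":
        on = on[:, ::-1, :]      # reverse rows so index 0 is the bottom edge
    elif direction not in ("left", "top"):
        raise ValueError(f"unknown direction: {direction!r}")

    axis = 2 if direction in ("left", "right") else 1
    first_on = on.argmax(axis=axis)   # index of the first True along the scan axis
    has_on = on.any(axis=axis)        # argmax returns 0 for all-False lines, so mask them
    return np.where(has_on, first_on, on.shape[axis])
```

Because `argmax` and `any` work on the whole stack, one call scans every image in every row or column without a Python loop.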

Files loaded

Train set: train-labels.gz (labels) + train-images-idx3-ubyte.gz (features)
Test set: test-labels.gz (labels) + t10k-images-idx3-ubyte.gz (features)
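MNIST files use the IDX layout: a big-endian header (magic number 2051 for images, 2049 for labels) followed by raw unsigned bytes. A minimal sketch of parsing the unzipped bytes with NumPy follows. The function names are made up for this example, and the repository's own loading code may differ.

```python
import struct

import numpy as np


def parse_idx_images(raw):
    """Parse IDX3 image bytes into a uint8 array of shape (n, rows, cols)."""
    magic, n, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"not an IDX3 image file (magic={magic})")
    pixels = np.frombuffer(raw, dtype=np.uint8, count=n * rows * cols, offset=16)
    return pixels.reshape(n, rows, cols)


def parse_idx_labels(raw):
    """Parse IDX1 label bytes into a uint8 array of shape (n,)."""
    magic, n = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"not an IDX1 label file (magic={magic})")
    return np.frombuffer(raw, dtype=np.uint8, count=n, offset=8)
```

To use these, open an unzipped file in binary mode and pass the result of `read()` to the matching function.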

Tips

Unzip the files with the '.gz' suffix before running the code.
You can learn more details from the PDF file Data ming report & Userguide (in Simplified Chinese).pdf.
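The unzip step from the first tip can also be done from Python with the standard library. This is only a convenience sketch. The helper name `gunzip` is hypothetical, and it assumes the code expects each unzipped file next to the original, under the same name without the '.gz' suffix.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dst=None):
    """Decompress src; by default write next to it with the '.gz' suffix dropped."""
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)   # stream the data instead of loading it all
    return dst
```

For example, `gunzip("train-images-idx3-ubyte.gz")` writes `train-images-idx3-ubyte` in the same directory and returns its path.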

Copyright notice

AUTHOR: Yingxin LIN
Company: School of Finance, Central University of Finance and Economics (CUFE)
Contact: lyxurthebest@163.com or lyxurthebest@outlook.com
The copyright belongs to Yingxin LIN, 2021/08/11.

Enjoy（。＾▽＾) ! (...and extend/modify) 😊