Machine Learning with Swift
上QQ阅读APP看书,第一时间看更新

Balanced dataset

The application allows you to record samples of different motion types. As you train the model, you may notice one interesting effect: to get accurate predictions, you need not only enough samples, but you also need the proportion of different classes in your dataset to be roughly equal. Think about it: if you have 100 samples of two classes (walk and run), and 99 of them belong to one class (walk), the classifier that delivers 99% accuracy may look like this:

func predict(x: [Double]) -> MotionType { 
   return .walk 
} 

But this is not what we want, obviously.

This observation lead us to the notion of the balanced data set; for most machine learning algorithms, you want the data set in which samples of different classes are represented equally frequently.