Dealing with unbalanced data for classification?

Question

Ratan Singh · Accepted Answer

From a data perspective, oversampling and undersampling are the techniques that can be used. If the majority class has a lot of data (say, 10 million samples), undersampling can be used, but it generally poses a risk of losing information. It is therefore preferable to use oversampling algorithms like SMOTE, which increase the number of minority-class samples by generating synthetic ones.
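Both resampling ideas can be sketched in plain Python. This is a minimal illustration, not a library implementation: random undersampling keeps a random subset of the majority class, and the SMOTE-style function creates each synthetic point by picking a minority sample, picking one of its k nearest minority neighbours (brute-force Euclidean distance), and interpolating a random fraction of the way between them. The function names `random_undersample` and `smote` and their parameters are hypothetical; in practice the imbalanced-learn package provides `RandomUnderSampler` and `SMOTE` with a `fit_resample` method.

```python
import math
import random


def random_undersample(majority, n_keep, seed=0):
    """Hypothetical helper: keep n_keep randomly chosen majority-class samples."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority samples, SMOTE-style."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself (by index).
        neighbour_idx = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbour_idx)]
        gap = rng.random()  # uniform fraction in [0, 1)
        # New point lies on the segment between base and the chosen neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[float(i), 5.0] for i in range(100)]
new_points = smote(minority, n_new=10, k=2)
kept = random_undersample(majority, n_keep=4)
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE adds variety without copying points exactly, which is what distinguishes it from naive duplication-based oversampling.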

From an algorithm perspective, one should refrain from using Random Forest and neural network techniques and stick to techniques like SVM.
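When an SVM is trained on unbalanced data, the misclassification penalty is commonly weighted per class so that errors on the minority class cost more. scikit-learn's `SVC(class_weight="balanced")` uses weights of `n_samples / (n_classes * count_of_class)` and scales `C` by the weight of each sample's class. The sketch below (hypothetical function names, labels assumed to be +1/-1, linear SVM only) computes those weights and evaluates the resulting weighted soft-margin objective; it is an illustration, not a training routine.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights n_samples / (n_classes * count), as in class_weight="balanced"."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_hinge_objective(w, b, X, y, class_weight, C=1.0):
    """Linear soft-margin SVM objective with per-class penalty weights.

    0.5 * ||w||^2 + C * sum_i class_weight[y_i] * max(0, 1 - y_i * (w . x_i + b))
    Labels y_i are assumed to be +1 or -1.
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        penalty += class_weight[yi] * max(0.0, 1.0 - margin)
    return reg + C * penalty


weights = balanced_class_weights([1] * 10 + [-1] * 90)
loss = weighted_hinge_objective([0.0], 0.0, [[1.0], [2.0]], [1, -1], {1: 5.0, -1: 0.5})
```

With a 10:90 split, each minority-class error is weighted 5.0 and each majority-class error about 0.56, so the optimizer can no longer ignore the minority class cheaply.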

If the data is extremely unbalanced, with a class ratio of, say, 1:100, choose anomaly detection techniques such as one-class SVM.
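The anomaly-detection framing means fitting a model on the majority class alone and treating anything that falls outside the learned "normal" region as a candidate minority sample. With scikit-learn this would be `sklearn.svm.OneClassSVM`, whose `predict` returns +1 for inliers and -1 for outliers. To keep the example dependency-free, the sketch below swaps in a much simpler detector, distance to the majority-class centroid with a quantile threshold. It is not a one-class SVM, only an illustration of the same fit-on-normal-data recipe; the function names and the 0.95 quantile are assumptions.

```python
import math


def fit_centroid_detector(normal, quantile=0.95):
    """Fit on majority-class ("normal") samples only (simplified stand-in, not an SVM).

    Returns the centroid and a distance threshold that roughly `quantile`
    of the training samples fall within.
    """
    dim = len(normal[0])
    centroid = [sum(p[d] for p in normal) / len(normal) for d in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in normal)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def predict(point, centroid, threshold):
    """+1 = looks like the majority class, -1 = anomaly (candidate minority)."""
    return 1 if math.dist(point, centroid) <= threshold else -1


# Majority class: a small grid of points around the origin.
normal = [[x * 0.1, y * 0.1] for x in range(-5, 6) for y in range(-5, 6)]
centroid, threshold = fit_centroid_detector(normal)
```

A one-class SVM replaces the single centroid and radius with a learned, possibly non-linear boundary, which matters when the majority class does not form one compact blob.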

J.P. Morgan

J.P. Morgan interview question
