scikit-learn 0.17 is out!

Scikit-learn 0.17 adds features and improvements that might help me:

  • stochastic average gradient solver for logistic regression is faster on big data sets
  • speed and memory enhancements in several classes
  • ensemble classifier that supports hard and soft voting as well as hyperparameter tuning of the components in grid search
  • robust feature scaler centers on the median and scales by the interquartile range, so outliers don’t distort the scaling

The full changelog is here. I’ve been testing the changes to see how they’ll impact my work in predicting match winners in League of Legends.

Stochastic average gradient

Like lbfgs and newton-cg, sag supports warm_start, so it works well in conjunction with LogisticRegressionCV to tune the regularization parameter.
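
A minimal sketch of this kind of solver comparison, for anyone who wants to try it. It uses synthetic data from make_classification as a stand-in for the match data, and the sizes, Cs, cv and max_iter values are illustrative assumptions rather than the settings behind the numbers in this post. The imports follow recent scikit-learn releases; in 0.17 train_test_split lived in sklearn.cross_validation.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the match data (assumption): 5,000 rows x 61 features.
X, y = make_classification(n_samples=5000, n_features=61, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# sag converges much more easily when the features share a similar scale.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

results = {}
for solver in ("lbfgs", "sag", "newton-cg"):
    # LogisticRegressionCV tunes the regularization strength C by cross-validation.
    model = LogisticRegressionCV(Cs=5, cv=3, solver=solver, max_iter=500,
                                 random_state=0)
    start = time.time()
    model.fit(X_train, y_train)
    elapsed = time.time() - start
    accuracy = model.score(X_test, y_test)
    best_c = float(np.ravel(model.C_)[0])
    results[solver] = (elapsed, accuracy)
    print("%-10s %6.2f s  accuracy %.4f  best C %g"
          % (solver, elapsed, accuracy, best_c))
```

Timings on a toy dataset like this won’t match the tables below; the point is only the shape of the experiment.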

First I tried on the 200k match dataset with 61 features. I ran each solver twice to check that the results were repeatable.

Solver       Training time   Accuracy
lbfgs        0.5 min         66.59%
lbfgs        0.5 min         66.59%
sag          1.7 min         66.60%
sag          1.8 min         66.60%
newton-cg    2.4 min         66.60%
newton-cg    2.6 min         66.60%

sag is faster than newton-cg but still about 3x slower than lbfgs. It does eke out that last 0.01% accuracy though.

sag is designed for large data sets so I also tried on the 1.8 mil x 61 dataset:

Solver       Training time   Accuracy
lbfgs        7.2 min         66.07%
sag          45.8 min        66.07%

On this dataset sag is over 6x slower than lbfgs and achieves the same accuracy. Maybe sag’s benefit only shows on datasets with a large number of features: the scikit-learn team’s tests used 500 features and 47k features.

Conclusion: Staying with lbfgs.

Other performance

The patch notes briefly mentioned speed and memory improvements in random forests and gradient boosting.


I tried random forests on the 200k x 61 dataset:

Model and version       Training time   Accuracy
Random Forest 0.16.1    14.1 min        66.34%
Random Forest 0.17      13.7 min        66.39%

The training time and accuracy fluctuations could just be differences due to randomization; random forests tend to fluctuate more than other methods from test to test. In the worst case, it doesn’t seem that much has changed. In the best case there are slight improvements.
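
One way to get a feel for how much of that is just randomization is to retrain the same forest with different seeds and look at the spread. A minimal sketch on synthetic data (the dataset, n_estimators and the number of seeds are assumptions for illustration, not my actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption).
X, y = make_classification(n_samples=3000, n_features=61, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Same data, same settings; only the forest's random seed changes.
accuracies = []
for seed in range(5):
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    forest.fit(X_train, y_train)
    accuracies.append(forest.score(X_test, y_test))

spread = max(accuracies) - min(accuracies)
print("accuracies:", ["%.4f" % a for a in accuracies])
print("spread between best and worst seed: %.4f" % spread)
```

If the spread between seeds is as large as the gap between two versions, the gap doesn’t say much on its own.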

Conclusion: Random forest is about the same, but I didn’t test memory usage.


Gradient boosting trains much more slowly than other methods so I started on the 50k x 61 dataset. I ran some tests multiple times to be certain of the results.

Configuration                                 Training time   Accuracy
Gradient Boosting 0.16.1                      7.6 min         66.08%
Gradient Boosting 0.16.1 with feature scaling 8.8 min         66.10%
Gradient Boosting 0.17                        11.0 min        66.17%
Gradient Boosting 0.17                        11.6 min        66.34%
Gradient Boosting 0.17 with feature scaling   11.7 min        66.14%
Gradient Boosting 0.17 with feature scaling   11.7 min        66.17%
Gradient Boosting 0.17 presort=False          14.0 min        65.94%
Gradient Boosting 0.17 max_features=auto      11.2 min        66.19%

Gradient boosting is clearly slower in 0.17 and generally a tad more accurate. The default presort setting is good for runtime and accuracy. Feature scaling doesn’t really help. Adjusting the max_features setting seems to help a touch (should reduce variance and improve training time).
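
A minimal sketch of the max_features comparison on synthetic data, written against a recent scikit-learn. Two assumptions to flag: as far as I know max_features="auto" meant sqrt(n_features) for the classifier in 0.17, and newer releases no longer accept that spelling, so the sketch uses "sqrt" instead. The presort option has also been removed from newer releases, so it is left out. Dataset size and n_estimators are illustrative.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption).
X, y = make_classification(n_samples=3000, n_features=61, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

gb_results = {}
for label, max_features in (("all features", None), ("sqrt", "sqrt")):
    # max_features limits how many features are considered at each split.
    gb = GradientBoostingClassifier(n_estimators=50,
                                    max_features=max_features,
                                    random_state=0)
    start = time.time()
    gb.fit(X_train, y_train)
    elapsed = time.time() - start
    gb_results[label] = (elapsed, gb.score(X_test, y_test))
    print("%-12s %6.2f s  accuracy %.4f"
          % (label, elapsed, gb_results[label][1]))
```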

I also tested on the 200k x 61 data:

Model and version           Training time   Accuracy
Gradient Boosting 0.16.1    43.9 min        67.66%
Gradient Boosting 0.17      62.3 min        67.75%

Again it’s slower but more accurate. I’ve opened a ticket and right now it’s under investigation. It sounds like a change in the error computation may be the culprit.
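
To put a number on the slowdown, here is the arithmetic on the default-settings rows of the two tables above (for the 50k data it uses the first 0.17 run, 11.0 min):

```python
# (0.16.1 minutes, 0.17 minutes) taken from the gradient boosting tables.
timings_min = {
    "50k x 61": (7.6, 11.0),
    "200k x 61": (43.9, 62.3),
}

slowdown_pct = {}
for name, (old, new) in timings_min.items():
    # Relative increase in training time from 0.16.1 to 0.17.
    slowdown_pct[name] = 100.0 * (new - old) / old
    print("%-10s %.1f%% slower" % (name, slowdown_pct[name]))
```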

Conclusion: Gradient boosting is roughly 40-45% slower in 0.17 but a little more accurate; a fix is being investigated.


In the previous post I described possible directions to get from 67.9% accuracy up to 70.0% and suggested that an ensemble of the best classifiers may be a fruitful direction but may take a bit of time to code.

Well, two things changed. First off, I found a great guide on making an ensemble in scikit-learn. I implemented a simple ensemble and improved my best results from 67.9% accuracy to 68.0% accuracy by a soft-voting ensemble of gradient boosting and neural networks. It’s not as much as I expected but it’s progress.

The second change is that scikit-learn 0.17 added VotingClassifier, implemented by Sebastian Raschka (who wrote the guide and implementation I found earlier). I ported my ensemble code to scikit-learn and it works great (though I had to change my neural network wrapper to return two columns rather than one for binary classification).
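
A minimal sketch of a soft-voting ensemble with VotingClassifier, including grid search over the components’ hyperparameters. This is not my actual ensemble: LogisticRegression is a stand-in for the neural network wrapper, the data is synthetic, and the parameter grid is purely illustrative. It also shows the two-column predict_proba output that soft voting expects from each component for binary classification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Soft voting averages the predicted class probabilities of the components.
ensemble = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),  # stand-in for the neural net
    ],
    voting="soft",
)

# Component hyperparameters are addressed as <estimator name>__<parameter>.
search = GridSearchCV(
    ensemble,
    param_grid={"gb__max_depth": [2, 3], "lr__C": [0.1, 1.0]},
    cv=3,
)
search.fit(X_train, y_train)

proba = search.predict_proba(X_test)  # one column per class
print("best params:", search.best_params_)
print("test accuracy: %.4f" % search.score(X_test, y_test))
print("probability matrix shape:", proba.shape)
```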

That said, I wish it had a flag to perform calibration of the probabilities of the individual classifiers. I’m currently looking into calibrating but not finding that it helps; gradient boosting has more skewed probabilities than neural networks which leads to more weight on gradient boosting. That’s an unintentionally good decision: putting more weight on the stronger classifier.
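
In the absence of such a flag, one option is to wrap each component in CalibratedClassifierCV before handing it to VotingClassifier. A minimal sketch under the same assumptions as before (synthetic data, LogisticRegression standing in for the neural network, sigmoid calibration and cv=3 chosen arbitrarily):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

gb = GradientBoostingClassifier(n_estimators=50, random_state=0)
lr = LogisticRegression(max_iter=1000)  # stand-in for the neural net

# Uncalibrated soft voting: raw probabilities are averaged as-is.
plain = VotingClassifier(estimators=[("gb", gb), ("lr", lr)], voting="soft")

# Calibrated soft voting: each component's probabilities are calibrated
# with cross-validation before they are averaged.
calibrated = VotingClassifier(
    estimators=[
        ("gb", CalibratedClassifierCV(gb, method="sigmoid", cv=3)),
        ("lr", CalibratedClassifierCV(lr, method="sigmoid", cv=3)),
    ],
    voting="soft",
)

scores = {}
for name, model in (("plain", plain), ("calibrated", calibrated)):
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print("%-10s accuracy %.4f" % (name, scores[name]))
```

If the goal is simply to lean on the stronger classifier, VotingClassifier also takes a weights argument, which makes that choice explicit rather than a side effect of skewed probabilities.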

Conclusion: VotingClassifier is easy and works like a charm.


In general using the robust scaler seems like an easy solution to save time in preprocessing your data.

I tried it with logistic regression because it’s so sensitive to feature scaling. But after several tests I didn’t find any difference in either scaling+training time or accuracy.
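
For reference, a small sketch of what RobustScaler does differently from StandardScaler when a feature has outliers. The data is made up for illustration: one roughly normal feature with ten extreme values planted in it.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature centered near 10, with ten extreme outliers (made-up data).
rng = np.random.RandomState(0)
X = rng.normal(loc=10.0, scale=2.0, size=(1000, 1))
X[:10] = 1000.0

standard = StandardScaler().fit(X)
robust = RobustScaler().fit(X)

# StandardScaler uses the mean and standard deviation, which the
# outliers pull upward. RobustScaler uses the median and the
# interquartile range, which barely move.
print("StandardScaler: mean   %.2f  std %.2f"
      % (standard.mean_[0], standard.scale_[0]))
print("RobustScaler:   median %.2f  IQR %.2f"
      % (robust.center_[0], robust.scale_[0]))

iqr = np.percentile(X, 75) - np.percentile(X, 25)
```

With well-behaved features the two scalers end up doing nearly the same thing, which would be consistent with seeing no difference in accuracy.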


I bolded the main point of each section so I won’t summarize. But I like the direction scikit-learn is taking.

