Extended Version of Table 2

Due to space constraints, the paper could not include complete information on the datasets and the models learned from them. The table below extends Table 2 from the paper with additional information.

Explanation of column headings:

  1. Name -- name of dataset
  2. #Vars -- number of variables (dimension) of the dataset
  3. Examples -- total number of training and test examples in the dataset
  4. Tuning -- number of tuning examples (all training examples except for the hold-out set)
  5. Holdout -- number of examples in the hold-out set
  6. Test -- number of examples in the test set
  7. NBE -- log likelihood of NBE model on the test data
  8. WinMine -- log likelihood of WinMine model on the test data
  9. Marginal -- log likelihood of a one-cluster NBE model on the test data
  10. Clusters -- number of clusters chosen by NBE
  11. Splits -- number of splits in all decision trees in the model learned by WinMine
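
The definitions above imply that, for each dataset, Examples should equal Tuning + Holdout + Test. A minimal Python sketch of that check follows, using the Abalone and Auto MPG counts from the table. The function name and the rounding tolerance are assumptions made here for illustration; they are not part of the original experiments.

```python
# Rows copied from the table: (name, examples, tuning, holdout, test).
rows = [
    ("Abalone", 4177, 2773, 985, 419),
    ("Auto MPG", 398, 242.1, 116.1, 39.8),
]


def split_sizes_consistent(examples, tuning, holdout, test, tol=0.15):
    """Return True if tuning + holdout + test matches examples.

    The tolerance (an assumption) allows for cross-validated means that
    were each rounded to one decimal place before being listed.
    """
    return abs((tuning + holdout + test) - examples) <= tol


# Map each dataset name to whether its split sizes add up.
results = {
    name: split_sizes_consistent(examples, tuning, holdout, test)
    for name, examples, tuning, holdout, test in rows
}
print(results)
```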

For cross-validated datasets, the numbers listed are means across all 10 training/test splits. When computing each mean, log likelihoods were weighted so that every test example counts equally, even though some cross-validation splits have more test examples than others.
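
One way to read this weighting is sketched below in Python. It assumes each split reports a mean per-example log likelihood together with its test-set size; the function name and the sample numbers are hypothetical and are not taken from the table.

```python
from typing import Sequence


def example_weighted_mean(fold_means: Sequence[float],
                          fold_sizes: Sequence[int]) -> float:
    """Combine per-split mean log likelihoods so each test example counts equally.

    Each split's mean is weighted by the number of test examples in that
    split, which is equivalent to averaging over all test examples pooled.
    """
    if len(fold_means) != len(fold_sizes):
        raise ValueError("fold_means and fold_sizes must have the same length")
    total = sum(fold_sizes)
    if total <= 0:
        raise ValueError("total number of test examples must be positive")
    return sum(m * n for m, n in zip(fold_means, fold_sizes)) / total


# Hypothetical values for three splits of unequal size.
fold_means = [-10.0, -12.0, -11.0]
fold_sizes = [20, 21, 20]

weighted = example_weighted_mean(fold_means, fold_sizes)
unweighted = sum(fold_means) / len(fold_means)
print(weighted, unweighted)
```

With splits of unequal size, the weighted mean differs slightly from the plain average of the per-split means, because the larger split contributes more test examples.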

Name | #Vars | Examples | Tuning | Holdout | Test | NBE | WinMine | Marginal | Clusters Splits
1985 Auto Imports | 26 | 205 | 125.1 | 59.4 | 20.5 | -25.68 ± 0.53 | -24.55 ± 0.42 | -36.16 ± 0.26 | 21169.6
Abalone | 9 | 4,177 | 2,773 | 985 | 419 | -7.25 ± 0.13 | -7.27 ± 0.14 | -13.909 ± 0.018 | 5190
Adult | 15 | 5,423 | 3,895 | 993 | 535 | -13.00 ± 0.16 | -12.72 ± 0.14 | -15.96 ± 0.14 | 50147
Annealing | 39 | 798 | 483.6 | 234.6 | 79.8 | -10.784 ± 0.094 | -10.238 ± 0.090 | -15.08 ± 0.13 | 22.3 105.4
Anonymous MSWeb | 294 | 37,711 | 29,441 | 3,270 | 5,000 | -9.91 ± 0.10 | -9.69 ± 0.10 | -11.36 ± 0.12 | 50 1,416
Audiology | 70 | 226 | 138.6 | 64.8 | 22.6 | -16.15 ± 0.38 | -15.65 ± 0.40 | -18.98 ± 0.42 | 10.6 85.4
Auto MPG | 8 | 398 | 242.1 | 116.1 | 39.8 | -9.17 ± 0.14 | -9.10 ± 0.13 | -12.720 ± 0.042 | 27.8 72.6
Breast Cancer Wisconsin | 31 | 569 | 343.5 | 168.6 | 56.9 | -36.57 ± 0.29 | -31.33 ± 0.24 | -48.9466 ± 0.0080 | 30.6 171.6
BUPA | 7 | 345 | 209.6 | 100.9 | 34.5 | -9.862 ± 0.075 | -9.874 ± 0.054 | -10.157 ± 0.031 | 254.3
Car | 7 | 1,728 | 1,037.2 | 518 | 172.8 | -7.8242 ± 0.0086 | -7.705 ± 0.010 | -8.301 ± 0.020 | 50.3 35.4
Census | 14 | 45,222 | 36,691 | 4,075 | 4,456 | -11.028 ± 0.056 | -10.788 ± 0.050 | -15.161 ± 0.061 | 306316
Chess Endgames | 37 | 3,196 | 1,932 | 937 | 327 | -10.79 ± 0.11 | -9.724 ± 0.093 | -15.11 ± 0.18 | 56723
Connect-4 | 43 | 67,557 | 54,738 | 6,111 | 6,708 | -15.079 ± 0.015 | -13.902 ± 0.017 | -20.629 ± 0.039 | 3287,954
Contraceptive Method Choice | 10 | 1,473 | 885.3 | 440.4 | 147.3 | -9.237 ± 0.051 | -9.305 ± 0.053 | -10.126 ± 0.047 | 3748.9
Credit Screening | 16 | 690 | 416.4 | 204.6 | 69 | -15.26 ± 0.11 | -14.785 ± 0.097 | -17.34 ± 0.11 | 26.9 79.9
Forest Cover Type | 55 | 28,862 | 23,296 | 2,620 | 2,946 | -16.030 ± 0.050 | -14.455 ± 0.044 | -22.914 ± 0.037 | 2883,629
Glass Identification | 10 | 214 | 130.9 | 61.7 | 21.4 | -11.04 ± 0.20 | -11.57 ± 0.17 | -13.201 ± 0.054 | 16.8 35
Hepatitis | 20 | 155 | 93.2 | 46.3 | 15.5 | -17.68 ± 0.30 | -17.81 ± 0.32 | -18.78 ± 0.37 | 1117.2
House Votes | 17 | 435 | 264.4 | 127.1 | 43.5 | -9.90 ± 0.20 | -10.52 ± 0.23 | -14.04 ± 0.19 | 26.1 28.2
Housing | 14 | 506 | 306.1 | 149.3 | 50.6 | -13.22 ± 0.17 | -13.08 ± 0.16 | -19.497 ± 0.064 | 36.2 169.4
Image Segmentation | 17 | 2,310 | 1,396 | 651 | 263 | -11.74 ± 0.22 | -11.29 ± 0.21 | -24.4852 ± 0.0043 | 150301
Ionosphere | 35 | 351 | 213.7 | 102.2 | 35.1 | -37.49 ± 0.54 | -37.64 ± 0.57 | -52.292 ± 0.051 | 30.7 472.4
Iris Types | 5 | 150 | 90.8 | 44.2 | 15 | -5.06 ± 0.12 | -5.28 ± 0.12 | -7.375 ± 0.046 | 11.4 12.1
Isolated Letter Speech | 618 | 7,797 | 5,250 | 988 | 1,559 | -798.7 ± 1.9 | -542.1 ± 1.4 | -912.89 ± 0.28 | 911,502
King Rook vs. King | 7 | 28,056 | 22,640 | 2,548 | 2,868 | -11.217 ± 0.018 | -11.517 ± 0.016 | -13.141 ± 0.019 | 454443
Labor Negotiations | 17 | 57 | 25 | 15 | 17 | -21.04 ± 0.87 | -19.93 ± 0.59 | -19.79 ± 0.62 | 111
Landsat | 37 | 6,435 | 3,449 | 986 | 2,000 | -26.70 ± 0.26 | -24.42 ± 0.22 | -59.540 ± 0.030 | 751,403
Letter Recognition | 17 | 20,000 | 16,222 | 1,790 | 1,988 | -15.734 ± 0.098 | -16.48 ± 0.10 | -26.927 ± 0.039 | 6663,950
Monks Problem #1 | 7 | 556 | 87 | 37 | 432 | -6.718 ± 0.027 | -6.573 ± 0.019 | -6.7803 ± 0.0094 | 311
Musk | 167 | 476 | 288.9 | 139.5 | 47.6 | -183.4 ± 1.8 | -125.4 ± 1.6 | -261.685 ± 0.089 | 47.8 2,592.7
New Thyroid | 6 | 215 | 131.5 | 62 | 21.5 | -7.61 ± 0.13 | -7.997 ± 0.078 | -8.832 ± 0.053 | 234.3
Nursery | 9 | 11,025 | 8,972 | 981 | 1,072 | -9.5277 ± 0.0080 | -9.4354 ± 0.0090 | -10.597 ± 0.015 | 177103
Page Blocks | 11 | 5,473 | 3,908 | 991 | 574 | -9.24 ± 0.11 | -9.33 ± 0.11 | -16.465 ± 0.041 | 174430
Pima Indians Diabetes | 9 | 768 | 466.6 | 224.6 | 76.8 | -11.982 ± 0.055 | -11.843 ± 0.049 | -12.586 ± 0.030 | 2914
Poisonous Mushrooms | 23 | 8,124 | 6,351 | 986 | 787 | -9.147 ± 0.022 | -9.222 ± 0.024 | -22.66 ± 0.15 | 233221
Promoter | 58 | 106 | 64.3 | 31.1 | 10.6 | -78.81 ± 0.54 | -79.18 ± 0.34 | -79.37 ± 0.34 | 18.8 1.4
Servo | 5 | 167 | 101.4 | 48.9 | 16.7 | -6.827 ± 0.083 | -6.648 ± 0.073 | -7.695 ± 0.058 | 16.4 4.8
Shuttle | 10 | 56,793 | 39,157 | 4,343 | 13,293 | -6.958 ± 0.012 | -6.956 ± 0.012 | -11.983 ± 0.012 | 358317
Solar Flare | 13 | 1,066 | 641.7 | 317.7 | 106.6 | -5.225 ± 0.058 | -5.319 ± 0.058 | -7.014 ± 0.070 | 15.6 23
Soybean Large | 36 | 683 | 216 | 91 | 376 | -18.12 ± 0.40 | -17.25 ± 0.35 | -37.3 ± 1.0 | 14145
Spambase | 58 | 4,601 | 3,166 | 1,000 | 435 | -13.38 ± 0.21 | -13.53 ± 0.21 | -16.85 ± 0.19 | 36327
Splice Junction | 61 | 3,190 | 1,431 | 662 | 1,097 | -79.98 ± 0.38 | -80.01 ± 0.13 | -83.281 ± 0.069 | 717127
Thyroid Disease (combined) | 33 | 3,772 | 1,883 | 917 | 972 | -12.97 ± 0.10 | -12.365 ± 0.092 | -16.50 ± 0.12 | 47237
Tic-Tac-Toe | 10 | 958 | 578.4 | 283.8 | 95.8 | -9.021 ± 0.020 | -9.644 ± 0.034 | -10.262 ± 0.021 | 35.9 77.3
Waveform | 22 | 5,000 | 3,495 | 989 | 516 | -29.17 ± 0.15 | -29.49 ± 0.14 | -34.8955 ± 0.0036 | 43150
Yeast | 9 | 1,484 | 891.9 | 443.7 | 148.4 | -10.182 ± 0.036 | -10.213 ± 0.035 | -10.870 ± 0.024 | 3130.7
Zoo | 17 | 101 | 61.7 | 29.2 | 10.1 | -6.53 ± 0.31 | -7.23 ± 0.33 | -11.77 ± 0.25 | 8.8 20.5
EachMovie (subset) | 1,648 | 6,117 | 4,524 | 1,002 | 591 | -121.6 ± 5.5 | -120.9 ± 5.6 | -173.5 ± 7.8 | 315,228
Jester | 100 | 17,998 | 14,681 | 1,621 | 1,696 | -95.44 ± 0.73 | -96.29 ± 0.71 | -130.2 ± 1.1 | 1422,552
KDD Cup 2000 (subset) | 65 | 13,552 | 9,024 | 976 | 3,552 | -2.103 ± 0.074 | -2.234 ± 0.087 | -2.41 ± 0.11 | 2989

Comments to Daniel Lowd (lowd at cs dot washington dot edu)