.. _reproducing_our_paper:

=====================
Reproducing Our Paper
=====================

In Hosseinzadeh et al. (`2020`_), we trained the classifier on a sample of 557 spectroscopically classified supernovae from the `Pan-STARRS1`_ Medium Deep Survey, and then applied it to an additional 2315 transients with host galaxy redshifts. To reproduce our work, first download the light curve data from `Zenodo`_ and unzip the file ``ps1_sne_zenodo.zip``.

For each light curve file, run ``superphot-fit``. You can either run these in series, by using a glob pattern (``*``) as below, or in parallel, by giving only one filename at a time (a sketch of one way to parallelize follows the example below). Decide where you want to store your models; here I am using a directory called ``stored_models/``. These fits will take a while to run.

.. code-block:: bash

    superphot-fit ps1_sne_zenodo/*.dat --output-dir stored_models/
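If you choose the parallel approach, one possibility (just a sketch, not part of Superphot itself) is to launch one ``superphot-fit`` process per file with ``xargs``; the job count ``-P 4`` is an arbitrary choice, and separate jobs on a cluster would work equally well.

.. code-block:: bash

    # Launch up to 4 superphot-fit processes at once, one light curve file each.
    # The -P value is arbitrary; set it to the number of cores you want to use.
    printf '%s\n' ps1_sne_zenodo/*.dat | xargs -P 4 -I {} superphot-fit {} --output-dir stored_models/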
When all the fits are finished, compile the resulting parameters into a single file. (I'm calling this ``params.npz``.) To reproduce our results exactly, use 0 as the seed for the random number generator.

.. code-block:: bash

    superphot-compile stored_models/ --output params --random-state 0

Next, download the machine-readable input tables from our paper. The test set is in `Table 3`_, and the training/validation set is in `Table 4`_. These tables include our classification results; you can either delete those columns or just ignore them.

Extract features for each of these sets. In our paper, we used the median parameters for training, and 10 draws from the posterior (the default) for validation and testing. Make sure to use the same PCA for the test and validation sets as for the training set. (The PCA fit during training-set extraction is saved automatically as ``pca.pickle``.) These commands will produce six output files (in addition to ``pca.pickle``):

- ``train_data.txt``
- ``train_data.npz``
- ``validation_data.txt``
- ``validation_data.npz``
- ``test_data.txt``
- ``test_data.npz``

.. code-block:: bash

    superphot-extract t4.txt params.npz --use-median --output train_data
    superphot-extract t4.txt params.npz --pcas pca.pickle --output validation_data
    superphot-extract t3.txt params.npz --pcas pca.pickle --output test_data

Now that the data are ready, you can initialize and train the classifier. Here I'm saving it to a Python pickle file ``pipeline.pickle``. Again, if you want to reproduce our results exactly, use a random seed of 0.

.. code-block:: bash

    superphot-train train_data.txt --output pipeline.pickle --random-state 0

You can then use the pipeline to classify the test set. Here the results will be saved to ``test_data_results.txt``.

.. code-block:: bash

    superphot-classify pipeline.pickle test_data.txt --output test_data

Note that in our paper, we also had a third set of "rare transients" in `Table 5`_. To reproduce our classifications for these, repeat the above procedure using ``t5.txt`` instead of ``t3.txt``.

.. code-block:: bash

    superphot-extract t5.txt params.npz --pcas pca.pickle --output other_data
    superphot-classify pipeline.pickle other_data.txt --output other_data

To reproduce our confusion matrices, you can run the leave-one-out cross-validation.

.. code-block:: bash

    superphot-validate pipeline.pickle validation_data.txt --train-data train_data.txt

Finally, if you want to reproduce the hyperparameter optimization in Appendix D, copy and paste the following into a JSON file (here I'm calling it ``param_dist.json``):

.. code-block:: json

    {
        "classifier__n_estimators": [500, 200, 100, 50, 20, 10],
        "classifier__criterion": ["gini", "entropy"],
        "classifier__max_depth": [30, 25, 20, 15, 10, 5],
        "classifier__max_features": [25, 20, 15, 10, 5]
    }

Then run the optimizer.

.. code-block:: bash

    superphot-optimize param_dist.json pipeline.pickle validation_data.txt --train-data train_data.txt
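For reference, the core procedure can be collected into a single shell script. This is just a recap of the commands documented above; it assumes ``ps1_sne_zenodo.zip`` has been unzipped and the tables ``t3.txt`` and ``t4.txt`` downloaded into the working directory.

.. code-block:: bash

    #!/bin/bash
    set -e  # stop at the first failed step

    # Fit every light curve (slow) and compile the fit parameters
    superphot-fit ps1_sne_zenodo/*.dat --output-dir stored_models/
    superphot-compile stored_models/ --output params --random-state 0

    # Extract features for the training, validation, and test sets
    superphot-extract t4.txt params.npz --use-median --output train_data
    superphot-extract t4.txt params.npz --pcas pca.pickle --output validation_data
    superphot-extract t3.txt params.npz --pcas pca.pickle --output test_data

    # Train the classifier, classify the test set, and cross-validate
    superphot-train train_data.txt --output pipeline.pickle --random-state 0
    superphot-classify pipeline.pickle test_data.txt --output test_data
    superphot-validate pipeline.pickle validation_data.txt --train-data train_data.txt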