Usage

Superphot runs in five main steps, plus two optional ones:

  1. For each light curve, fit a model to the observations. This is the slowest step, but it can be done in parallel for a sample of light curves.

  2. Compile the parameters from all the model fits into a single table. This is a separate step in case you did step 1 on a different computer (as we did).

  3. Extract features from those models to be used for classification.

  4. Initialize the classifier and train it on the training set.

  5. Use the trained classifier to classify the test set.

  6. (Optional) Cross-validate the classifier on the training set.

  7. (Optional) Optimize the hyperparameters of the classifier.

Running from the Command Line

For basic functionality, Superphot can be run from the command line. For example:

superphot-fit light_curves/*.dat --output-dir stored_models/  # this is parallelizable
superphot-compile stored_models/ --output params
superphot-extract train_input.txt params.npz --output train_data
superphot-extract test_input.txt params.npz --output test_data --pcas pca.pickle  # use the same PCA
superphot-train train_data.txt --output pipeline.pickle
superphot-classify pipeline.pickle test_data.txt
superphot-validate pipeline.pickle train_data.txt
superphot-optimize param_dist.json pipeline.pickle train_data.txt

To see additional command-line arguments for any of these scripts, use the -h flag (e.g., superphot-fit -h).

Scripting with Python

For more advanced use cases, you can import the module and use some version of the following workflow:

from superphot import fit, extract, classify
import numpy as np

# Fit the model to the data. Do this for each file.
light_curve = fit.read_light_curve('light_curves/PSc000001.dat')  # may need a custom parser; see "Light Curve Data Formats" below
fit.two_iteration_mcmc(light_curve, 'stored_models/PSc000001{}')

# Compile parameters
param_table = extract.compile_parameters('stored_models/', 'griz')
np.savez_compressed('params.npz', **param_table, **param_table.meta)

# Extract training features
train_params = extract.load_data('train_input.txt', 'params.npz')
train_data = extract.extract_features(train_params)

# Extract test features
test_params = extract.load_data('test_input.txt', 'params.npz')
test_data = extract.extract_features(test_params, stored_pcas='pca.pickle')

# Initialize and train the pipeline (can adjust hyperparameters here)
pipeline = classify.Pipeline([
    ('scaler', classify.StandardScaler()),
    ('sampler', classify.MultivariateGaussian(sampling_strategy=1000)),
    ('classifier', classify.RandomForestClassifier(criterion='entropy', max_features=5)),
])
classify.train_classifier(pipeline, train_data)

# Do the classification
results = classify.classify(pipeline, test_data)

# Validate the classifier (optional)
results_validate = classify.validate_classifier(pipeline, train_data)
classify.make_confusion_matrix(results_validate, pipeline.classes_)

Most of these functions have optional inputs that are not shown here. See the API Documentation.
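
Because the fitting step dominates the run time, you may want to distribute it over several processes. Below is a minimal sketch using Python's multiprocessing; the worker count and file pattern are assumptions, and depending on how the sampler itself uses multiple cores you may prefer to parallelize across machines instead (as the command-line comment above suggests).

from superphot import fit
from multiprocessing import Pool
from glob import glob
import os

def fit_one(filename):
    # Fit the model to a single light curve and store the result,
    # following the output filename pattern used in the workflow above.
    light_curve = fit.read_light_curve(filename)
    basename = os.path.splitext(os.path.basename(filename))[0]
    fit.two_iteration_mcmc(light_curve, os.path.join('stored_models', basename + '{}'))

if __name__ == '__main__':
    with Pool(4) as pool:  # adjust the number of workers to your machine
        pool.map(fit_one, glob('light_curves/*.dat'))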

Light Curve Data Formats

The light curve data we used in developing this package were stored in SNANA text format. Here is an example:

SURVEY: PS1MD
SNID:  PSc000001
IAUC:    UNKNOWN
RA:        52.4530625  deg
DECL:       -29.0749750  deg
MWEBV: 0.0075 +- 0.0003 MW E(B-V)
REDSHIFT_FINAL:  0.1260 +- 0.0010  (CMB)
SEARCH_PEAKMJD: 55207.0
FILTERS:    griz

# ======================================
# TERSE LIGHT CURVE OUTPUT
#
NOBS: 306
NVAR:   7
VARLIST:  MJD  FLT FIELD   FLUXCAL   FLUXCALERR    MAG     MAGERR

OBS: 55086.6 g NULL  -243.440 231.478 nan -1.032
OBS: 55089.6 g NULL  -62.931 13.480 nan -0.233
OBS: 55095.6 g NULL  -15.102 16.238 nan -1.167
OBS: 55098.6 g NULL  -94.646 13.910 nan -0.160
OBS: 55104.6 g NULL  -28.093 12.441 nan -0.481
OBS: 55191.3 g NULL  -27.414 10.304 nan -0.408
OBS: 55203.3 g NULL  1381.526 18.142 -12.851 0.014
OBS: 55446.6 g NULL  -3.432 9.291 nan -2.939
OBS: 55449.6 g NULL  9.291 10.095 -7.420 1.180
OBS: 55452.6 g NULL  -2.915 10.422 nan -3.881
...

Superphot includes a function that can parse data in this format (superphot.fit.read_light_curve()). It should also be able to recognize a simple text format like this:

PHASE FLT FLUXCAL FLUXCALERR
-120.4 g -243.44 231.478
-117.4 g -62.931 13.48
-111.4 g -15.102 16.238
-108.4 g -94.646 13.91
-102.4 g -28.093 12.441
-15.7 g -27.414 10.304
-3.7 g 1381.526 18.142
239.6 g -3.432 9.291
242.6 g 9.291 10.095
245.6 g -2.915 10.422
...

If your data are in a format Superphot does not recognize, you will have to write your own parser (see the sketch after this list). The data need to end up as an Astropy table with (at least) the following columns and metadata:

  • PHASE is the time of the observation in days relative to a reference epoch (SEARCH_PEAKMJD in our case)

  • FLT is the filter

  • FLUXCAL and FLUXCALERR are the flux and its uncertainty
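
As a starting point, here is a sketch of such a parser for a hypothetical comma-separated file with columns mjd, band, flux, and fluxerr; the input column names, the peak_mjd argument, and the metadata key are all assumptions to adapt to your own data (and to verify against what superphot.fit.read_light_curve produces for the built-in formats).

from astropy.table import Table

def parse_my_format(filename, peak_mjd, filters='griz'):
    # Hypothetical parser: convert a CSV into the table layout Superphot expects.
    raw = Table.read(filename, format='ascii.csv')
    light_curve = Table()
    light_curve['PHASE'] = raw['mjd'] - peak_mjd  # days relative to the reference epoch
    light_curve['FLT'] = raw['band']
    light_curve['FLUXCAL'] = raw['flux']
    light_curve['FLUXCALERR'] = raw['fluxerr']
    light_curve.meta['FILTERS'] = filters  # assumed metadata key; check read_light_curve's output
    return light_curve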

Input/Output Table Formats

Superphot writes all its outputs in Astropy’s ascii.fixed_width_two_line format, but it can read any plain text format guessable by Astropy.

The files called train_input.txt and test_input.txt should have the following columns:

  • filename: the name of the light curve data file, without the extension;

  • redshift: the redshift of the transient, used to calculate the luminosity distance and cosmological K-correction;

  • MWEBV: the Milky Way selective extinction E(B-V), used to correct the fluxes; and

  • type (optional): the supernova spectroscopic classification, used to train the classifier.

The filename column is used as the supernova identifier, so each filename must be unique (even if they are in different directories).
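
For illustration, here is a short sketch that builds and writes such an input table with Astropy; the identifiers and values are hypothetical placeholders.

from astropy.table import Table

train_input = Table({
    'filename': ['PSc000001', 'PSc000002'],  # hypothetical identifiers; must be unique
    'redshift': [0.126, 0.087],
    'MWEBV': [0.0075, 0.0132],
    'type': ['SNIa', 'SNII'],  # optional; omit for an unlabeled test set
})
train_input.write('train_input.txt', format='ascii.fixed_width_two_line')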

Superphot’s feature extraction step saves the features in two separate files with the same base name (e.g., test_data above) but different extensions. The test_data.txt file includes all the supernova metadata, which will be identical to test_input.txt unless stored model parameters are missing for any input supernovae. The test_data.npz file includes the features themselves, stored as a compressed multidimensional binary array.
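
If you want to inspect these outputs yourself, both files can be read back with standard tools; this sketch deliberately lists the arrays stored in the .npz file rather than assuming their names.

import numpy as np
from astropy.table import Table

meta = Table.read('test_data.txt', format='ascii.fixed_width_two_line')
features = np.load('test_data.npz')
print(features.files)  # names of the stored feature arrays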

The classification and validation results are also written to text files by superphot.classify.write_results(). The tables include the same metadata as the feature extraction step plus the photometric classification, the classification confidence, and probabilities for each possible classification.
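
These results tables can be read back the same way, e.g., to sort or filter the classifications; the filename here matches the one used by the utilities listed below.

from astropy.table import Table

results = Table.read('test_results.txt', format='ascii.fixed_width_two_line')
print(results.colnames)  # metadata plus classification, confidence, and per-class probabilities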

Other Command Line Utilities

In addition to the main commands listed above, Superphot includes four utilities to help produce tables and figures for publications:

  • superphot-confuse validation_results.txt plots a confusion matrix from saved cross-validation results.

  • superphot-bar validation_results.txt test_results.txt plots stacked bar plots showing the class fractions of the training and test sets.

  • superphot-latex test_results.txt converts the plain text results table into a nicely formatted AASTeX deluxetable.

  • superphot-hyperparameters hyperparameters.txt plots 3D scatter plots of various performance metrics vs. the classifier hyperparameters.