Frequently Asked Questions
If you are having difficultly uploading a spreadsheet to WhoseEgg, first go through the following check list to ensure that all steps have been taken correctly:
If all of the above are met, try one of these suggestions below:
The random forests used by WhoseEgg to make predictions have been validated for eggs collected in the figure below. In particular, the data collected in 2014 and 2015 were used to train the random forests, and the data collected in 2016 were used to validate the models. The models trained using data from 2014 and 2015 showed great performance on the data from 2016 in locations that were sampled in 2014 and/or 2015 but were less successful in locations not previously sampled. However, the sample size in 2016 was smaller than the original dataset. See Goode et al. (2021) for more details. These results suggest that the models in WhoseEgg may not perform well on data collected in different geographic regions and additional validations are needed. Note that the final models used in WhoseEgg were trained using all three years of data (2014-2016) to improve the performance for future predictions.
If there is interest in using WhoseEgg to make predictions on data collected in different geographic regions, we recommend the following as possible options:
Be cautious interpreting the predictions from WhoseEgg if the models are applied to data collected in different geographic regions, especially if the regions have a different fish species composition.
Compare the predictor variable values to those used to train the random forests in WhoseEgg. If your predictor variable values differ from the WhoseEgg data (especially values outside the range of the variable values), the models will have to extrapolate to make predictions. This often leads to untrustworthy predictions.
Perform your own validation of the random forests by applying WhoseEgg to eggs that have been genetically identified. Compare the predictions from WhoseEgg to the genetic identifications to determine if the WhoseEgg predictions are reasonably trustworthy for the new region. See Goode et al. (2021) for an example model validation.
If familiar with R, try updating the WhoseEgg models by training your own random forests based on the code and data available at the WhoseEgg GitHub repository. Add your data to the WhoseEgg training data and train new random forest models.
If you try validating or updating the WhoseEgg models, the creators of WhoseEgg would be interested to hear about your results. Let us know by emailing whoseegg@iastate.edu.
Random forests are only able to make predictions for response variable levels included in the training data. See the table below for a list of the family, genus, and species levels included the WhoseEgg training data. If you believe that your data contains a level not present in the training data, we caution the use of WhoseEgg. If you would still like to apply WhoseEgg to your data, we recommend the following as possible options:
The random forests will classify observations based on predictor variable similarity to those in the training data. Think about whether the different species that may be present in your data are similar to any species in the training data. Check to see if these species appear in the predictions made by WhoseEgg.
Determine if the species have similar egg characteristics to invasive carp. If they are different from invasive carp, then it may be okay to proceed using WhoseEgg if your main objective is to identify invasive carp.
Family | Genus | Common Name | Number of Eggs in Training Data |
---|---|---|---|
Catostomidae | Carpiodes | Carpsuckers sp. | 1 |
Catostomidae | Carpiodes | Quillback | 1 |
Catostomidae | Carpiodes | River Carpsucker | 8 |
Catostomidae | Ictiobus | Bigmouth Buffalo | 7 |
Catostomidae | Ictiobus | Black Buffalo | 1 |
Catostomidae | Ictiobus | Buffalo sp. | 10 |
Catostomidae | Ictiobus | Smallmouth Buffalo | 2 |
Clupeidae | Alosa | Skipjack Shad | 1 |
Clupeidae | Dorosoma | Gizzard Shad | 2 |
Cyprinidae | Cyprinella | Spotfin Shiner | 6 |
Cyprinidae | Luxilus | Common Shiner | 1 |
Cyprinidae | Macrhybopsis | Silver Chub | 36 |
Cyprinidae | Macrhybopsis | Speckled Chub | 28 |
Cyprinidae | Notropis | Channel Shiner | 32 |
Cyprinidae | Notropis | Emerald Shiner | 201 |
Cyprinidae | Notropis | River Shiner | 16 |
Cyprinidae | Notropis | Sand Shiner | 1 |
Cyprinidae | Notropis | Shiner sp. | 69 |
Cyprinidae | Pimephales | Fathead Minnow | 5 |
Hiodontidae | Hiodon | Goldeye | 7 |
Invasive Carp | Invasive Carp | Invasive Carp | 782 |
Moronidae | Morone | Striped Bass | 17 |
Moronidae | Morone | White Bass | 1 |
Percidae | Etheostoma | Banded Darter | 1 |
Percidae | Percina | Common Logperch | 1 |
Percidae | Sander | Walleye | 2 |
Sciaenidae | Aplodinotus | Freshwater Drum | 733 |
The creators of WhoseEgg are interested in updating the models to contain data from different geographic regions and with more species, but there are not plans to do so at this time.
The validation of the random forests used by WhoseEgg focused on the classification of invasive carp. If you would like to use WhoseEgg to identify other fish species, please take into account the following considerations:
See the table listed under the question ‘What can I do if I believe I collected fish eggs containing species not included in the training data?’ to determine how many eggs were included in the training data of the fish species of interest. If the number is large, it may be okay to trust the predictions from WhoseEgg. If the number is small, we urge users to be cautious about interpreting the results.
Individuals are welcome to use the random forests and training data available at the WhoseEgg GitHub repository to conduct their own validation of the models focused on other species. See Goode et al. (2021) for an example model validation.
If you try validating the WhoseEgg models for other species, the creators of WhoseEgg would be interested to hear about your results. Let us know by emailing whoseegg@iastate.edu.
While it is okay to upload extra variables to WhoseEgg, these variables will not be used by the random forests to make predictions. As a result, they are excluded from the processed data tab, which only contains the variables that will be used to make predictions. However, these variables will be included in the spreadsheet with predictions available for download. See the preview of the table with data for download on the ‘Downloads’ page.
Try zooming in using control (Windows) or command (Mac) and the + key (or a similar technique available via your computer).