This page gives a general description of what a data split is.
When training a machine learning model, the available data needs to be split so that one part is used to fit the model and the remaining parts are held out to evaluate it. The loss measured on the held-out data estimates how well the model generalizes to data it has not seen.
The input data is split into three groups:
Training set : The set of data used to train the model, i.e., to search for the parameters of the model.
Validation set : The set of data used to tune the hyperparameters of the model. For example, the validation set can be used to select the number of layers in a neural network.
Test set : The set of data used to assess the performance of the final model. It should not be used for training or for tuning hyperparameters.
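The roles of the three sets can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data, the one-parameter ridge model, the helper names fit_slope and mse, and the choice of a regularization strength (rather than the number of layers) as the hyperparameter.

```python
import random

def fit_slope(xs, ys, lam):
    """Closed-form ridge fit of y ~ w * x; lam is the hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumption): y = 3x plus a little Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]

# The samples are drawn at random, so slicing gives an 80/10/10 split.
train_x, val_x, test_x = xs[:80], xs[80:90], xs[90:]
train_y, val_y, test_y = ys[:80], ys[80:90], ys[90:]

# Training set: fit the parameter w for each candidate hyperparameter.
# Validation set: score each fitted model to choose the hyperparameter.
candidates = [0.0, 0.1, 1.0, 10.0]
val_losses = {
    lam: mse(fit_slope(train_x, train_y, lam), val_x, val_y)
    for lam in candidates
}
best_lam = min(val_losses, key=val_losses.get)

# Test set: used once, at the end, to assess the selected model.
best_w = fit_slope(train_x, train_y, best_lam)
test_loss = mse(best_w, test_x, test_y)
print(best_lam, test_loss)
```

The test set is touched only after the hyperparameter has been chosen, so the test loss is not influenced by the selection step.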
Common ratios used for the split:
80% training set, 10% validation set, 10% test set
70% training set, 15% validation set, 15% test set
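A split with these ratios could be produced along the following lines. This is a sketch using only the Python standard library; the function name split_dataset and its parameters train_frac, val_frac, and seed are hypothetical, chosen here for illustration.

```python
import random

def split_dataset(data, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle data and split it into (train, val, test) lists.

    The test set receives whatever remains after the training and
    validation fractions have been taken.
    """
    items = list(data)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# 80% / 10% / 10% split of 1000 samples.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100

# 70% / 15% / 15% split of the same samples.
train70, val70, test70 = split_dataset(range(1000), train_frac=0.7, val_frac=0.15)
```

Shuffling before slicing matters when the data is stored in some order (for example, sorted by label); otherwise the three sets may not be representative of the whole.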