In this article, we will discuss the workflow for building a model on historical data, starting with data preprocessing. Before building a model, we need to transform raw data, which is often incomplete, inconsistent, or lacking in certain behaviors or trends, into a structured format. Most of the time, we gather data from different sources that use different formats, and data in that state is not suitable for analysis and prediction.
Data goes through a series of steps during preprocessing:
Data Cleaning, Data Integration, Data Transformation, Data Reduction, etc.
Data Cleaning: Data can be cleaned by filling in missing values (there are several imputation techniques for this), smoothing noisy data, and removing unused columns such as ID columns.
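As a minimal sketch of the cleaning step, the function below drops an ID column and fills gaps using mean imputation, which is only one of the possible imputing techniques. It assumes the records are a list of Python dicts, that the columns being imputed are numeric, and that a missing value is stored as `None`; the name `clean` and the sample data are hypothetical.

```python
from statistics import mean


def clean(rows, drop_cols=("id",)):
    """Drop unused columns, then fill missing (None) values with the column mean."""
    # Build new dicts without the dropped columns so the input is not mutated.
    cleaned = [{k: v for k, v in row.items() if k not in drop_cols} for row in rows]
    columns = {k for row in cleaned for k in row}
    for col in sorted(columns):
        # Mean imputation: average only the values that are actually present.
        present = [row[col] for row in cleaned if row.get(col) is not None]
        if not present:
            continue  # nothing to impute from; leave the gaps as they are
        fill = mean(present)
        for row in cleaned:
            if row.get(col) is None:
                row[col] = fill
    return cleaned


data = [
    {"id": 1, "age": 25, "income": 40000},
    {"id": 2, "age": None, "income": 52000},
    {"id": 3, "age": 35, "income": None},
]
print(clean(data))
```

Here the missing age becomes 30 (the mean of 25 and 35) and the missing income becomes 46000. In practice, the median or a model-based imputer may be a better choice when a column has outliers.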
Data Integration: Data from different sources is combined in one place.
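Integration usually means joining records that describe the same entity. The sketch below performs an inner join of two sources on a shared key and, because sources often store keys in different formats, compares keys as trimmed strings. The source names (`crm`, `billing`), the `customer_id` key, and the helper names are hypothetical; with pandas, `DataFrame.merge` with `on=` and `how="inner"` covers the same join.

```python
def _norm_key(value):
    # Sources often store the same key differently (e.g. 1 vs " 1"),
    # so keys are compared as trimmed strings.
    return str(value).strip()


def integrate(left, right, key):
    """Inner-join two lists of records on a shared key column."""
    # Index the right-hand source by normalized key for fast lookups.
    index = {}
    for rec in right:
        index.setdefault(_norm_key(rec[key]), []).append(rec)
    merged = []
    for rec in left:
        for match in index.get(_norm_key(rec[key]), []):
            # On conflicting fields, values from the left-hand source win.
            merged.append({**match, **rec})
    return merged


crm = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Bo"}]
billing = [{"customer_id": "1", "total": 120.0}, {"customer_id": " 3", "total": 75.5}]
print(integrate(crm, billing, "customer_id"))
```

Only customer 1 appears in both sources, so the result is a single merged record containing the name from one source and the total from the other.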
Data Transformation: In this step, the data is normalized, aggregated, and generalized.
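Two of these transformations can be sketched in a few lines: min-max normalization, which rescales a numeric column to the range [0, 1], and generalization, which replaces a detailed value with a higher-level category. The function names and the age cut-offs are hypothetical and chosen only for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def generalize_age(age):
    """Replace a raw age with a coarser category (a one-level concept hierarchy)."""
    # The cut-offs below are hypothetical and only for illustration.
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


print(min_max_normalize([10, 20, 40]))
print([generalize_age(a) for a in (22, 45, 71)])
```

Normalization keeps features with large ranges (such as income) from dominating features with small ranges (such as age) in distance-based algorithms.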
Data Reduction: This step aims to produce a reduced representation of the data, for example by applying slicing and dicing operations in a data warehouse.
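The slicing and dicing operations can be sketched over a small, made-up fact table. Slicing fixes one dimension to a single value, dicing keeps a sub-cube defined by allowed values on several dimensions, and a roll-up aggregates a measure into a coarser summary. The table contents and function names are hypothetical; a real data warehouse would run these as OLAP or SQL queries.

```python
from collections import defaultdict

# A tiny, made-up fact table with three dimensions and one measure.
sales = [
    {"year": 2022, "region": "North", "product": "A", "amount": 100},
    {"year": 2022, "region": "South", "product": "A", "amount": 80},
    {"year": 2023, "region": "North", "product": "A", "amount": 120},
    {"year": 2023, "region": "North", "product": "B", "amount": 60},
    {"year": 2023, "region": "South", "product": "B", "amount": 90},
]


def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice_rows(rows, **allowed):
    """Dice: keep rows whose values fall in the allowed set for every listed dimension."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def roll_up(rows, dims, measure):
    """Sum the measure over the given dimensions, giving a coarser summary."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


sales_2023 = slice_rows(sales, "year", 2023)
north = dice_rows(sales, region={"North"}, product={"A", "B"})
print(roll_up(sales_2023, ["region"], "amount"))
```

The 2023 slice rolled up by region reduces three detailed rows to two totals: 180 for North and 90 for South.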
After applying these preprocessing techniques, our data is ready for building the model.
After obtaining the final historical data, we split it into two parts: a training set (70%) and a testing set (30%). First, we train the model on the training set according to our requirements, and then we evaluate it on the testing set.
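A 70/30 split can be sketched with the standard library: shuffle a copy of the data with a fixed seed so the split is reproducible, then cut it at 30%. The function name `split_train_test` and its defaults are hypothetical; scikit-learn provides the same operation as `sklearn.model_selection.train_test_split` with `test_size=0.3` and `random_state`.

```python
import random


def split_train_test(rows, test_ratio=0.3, seed=42):
    """Shuffle rows and split them into (training, testing) sets."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(100))
print(len(train), len(test))  # 70 30
```

Shuffling before cutting matters because historical data is often stored in order (by date, for example), and an unshuffled cut would test the model on a systematically different slice of the data. For time-series forecasting, however, keeping the chronological order is usually the right choice.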
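Depending on the accuracy and performance of the model, we then check whether it is over-fit or under-fit by comparing its results on the training and testing sets: a model that scores well on training data but poorly on testing data is over-fit, while a model that scores poorly on both is under-fit.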
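This comparison can be expressed as a rough heuristic. The sketch computes accuracy as the fraction of correct predictions and labels the fit using two thresholds, a minimum acceptable training accuracy and a maximum allowed train/test gap. Both threshold values and the function names are hypothetical assumptions; sensible values depend on the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def diagnose_fit(train_acc, test_acc, min_acc=0.8, max_gap=0.1):
    """Rough heuristic label for a model's fit; thresholds are assumptions."""
    if train_acc < min_acc:
        # The model does poorly even on the data it was trained on.
        return "under-fit"
    if train_acc - test_acc > max_gap:
        # Much better on training data than on unseen data: poor generalization.
        return "over-fit"
    return "good fit"


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(diagnose_fit(0.99, 0.72))  # over-fit
```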
To resolve these problems, we need to take certain measures while building the model. If the problem is low accuracy, the appropriate measures depend on the algorithm we are using, for example tuning its hyperparameters or applying regularization.
After ensuring that the model performs well and has good accuracy, we pass new data to it for prediction and build the reports accordingly.