Data Science EP 11 (Practical Exam)

Rushi Chudasama
Nov 18, 2021

Dataset Description using Orange tool

11.1 Orange Tool Logo
1. Creating Your Workflow
11.2 Creating Workflow

This is your blank workflow in Orange. You are now ready to explore and solve a problem by dragging widgets from the widget menu onto your workflow.

2.1 Problem

The problem we’re looking to solve in this tutorial is the practice problem of Cancer Risk Prediction that can be accessed via this link on Datahack.

2.2 Importing the data files
We begin with the first and most necessary step toward understanding our data and making predictions: importing the data.

11.3 Step 1 for Importing Data Files

Step 1: Click the “Data” tab in the widget selector menu and drag the “File” widget onto the blank workflow.

Step 2: Double-click the “File” widget and select the file you want to load into the workflow. As we are solving the Cancer Risk Prediction practice problem, I will import its training dataset.

11.4 Step 2 for Importing Data Files

Step 3: Once you have checked the structure of your dataset in the widget, close this window to go back.

Step 4: Now that the raw .csv data is loaded, we need to pass it on in a form we can use in our mining. Click the dotted line encircling the “File” widget, drag it out, and release it anywhere in the blank space.

Step 5: As we need a data table to better visualize our findings, we select the “Data Table” widget from the menu that appears.

Step 6: Now double-click the widget to view your table.
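Under the hood, the File and Data Table steps amount to reading the CSV into a header and rows. The stdlib-only sketch below is not Orange's implementation; it only illustrates the idea. The `?` missing-value marker and the `Age`/`Smokes` column names are assumptions for illustration, and the sample values are made up.

```python
import csv
import io

# Tokens treated as missing. "?" is an ASSUMPTION about the raw cancer-risk
# CSV; change this set to match whatever your file actually uses.
MISSING_TOKENS = {"", "?"}


def parse_cell(cell):
    """Return None for a missing value, a float for numbers, else the text."""
    cell = cell.strip()
    if cell in MISSING_TOKENS:
        return None
    try:
        return float(cell)
    except ValueError:
        return cell  # keep categorical values as plain text


def load_table(csv_text):
    """Parse CSV text into a header list and a list of parsed rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [[parse_cell(c) for c in record] for record in reader if record]
    return header, rows


# Toy, made-up sample with hypothetical column names (plus Biopsy).
sample = "Age,Smokes,Biopsy\n18,0,0\n34,?,1\n52,1,0\n"
header, rows = load_table(sample)
for row in rows:
    print(row)
```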

11.5 Step 4 for Importing Data Files

Let’s now visualize some columns to find interesting patterns in our data.

3. Preprocessing the data to replace missing values

11.6 Preprocessing of given data
11.7 Preprocessed data

In the image above you can see the preprocessed data, in which all the missing values have been replaced by average values.
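Replacing missing values with averages is mean imputation: each gap is filled with the mean of the observed values in the same column. A minimal pure-Python sketch follows, assuming numeric columns with `None` marking a missing cell; filling an all-missing column with 0.0 is an arbitrary choice made here, not something the article specifies.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's known values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Arbitrary fallback for a column with no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


rows = [[18.0, 0.0], [34.0, None], [52.0, 1.0]]
print(impute_mean(rows))  # [[18.0, 0.0], [34.0, 0.5], [52.0, 1.0]]
```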

4. Applying the preprocessed data to different models and evaluating them

Step-1: Select columns

11.8 Selecting Biopsy with select columns

Here we have used Select Columns to pick out the Biopsy column for further processing.
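Conceptually, selecting columns splits the table into the feature columns a model learns from and the target column it predicts. The sketch below assumes Biopsy is the target and every other column is a feature; the helper name `select_columns` and the toy rows are hypothetical.

```python
def select_columns(header, rows, target, features=None):
    """Split rows into feature names, a feature matrix X and a target list y."""
    t = header.index(target)
    if features is None:
        # Default: every column except the target is a feature.
        features = [name for name in header if name != target]
    idx = [header.index(name) for name in features]
    X = [[row[i] for i in idx] for row in rows]
    y = [row[t] for row in rows]
    return features, X, y


header = ["Age", "Smokes", "Biopsy"]          # hypothetical columns
rows = [[18.0, 0.0, 0.0], [34.0, 0.5, 1.0]]   # made-up values
features, X, y = select_columns(header, rows, target="Biopsy")
print(features, X, y)
```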

Step-2: Applying dataset to models

11.9 Applying data to different models

All our preprocessed data has been fed to different models: Random Forest, Naive Bayes, Neural Network, and kNN.
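To give a feel for what one of these models does, here is a minimal k-nearest-neighbours (kNN) classifier: it labels a new row by majority vote among the k closest training rows under Euclidean distance. This is an illustrative sketch, not Orange's kNN widget; k=3 and the toy data are chosen only for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours."""
    # Sort training rows by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up data: two features, binary label.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
train_y = [0, 0, 1, 1, 0]
print(knn_predict(train_X, train_y, [0.95, 0.9]))  # 1
print(knn_predict(train_X, train_y, [0.05, 0.05]))  # 0
```

An odd k avoids tied votes when there are only two classes.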

Step-3: Testing dataset

11.10 Testing dataset

Here you can see the scores obtained by the different models.
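Scores like these are typically estimated with cross-validation: the data is split into k folds, each fold is held out once for testing while the model trains on the rest, and the accuracies are averaged. The sketch below assumes simple interleaved folds with no shuffling or stratification (Orange's own sampling options may differ) and uses a 1-nearest-neighbour baseline as a stand-in model.

```python
import math


def nearest_neighbour(train_X, train_y, test_X):
    """1-NN baseline: copy the label of the closest training row."""
    preds = []
    for q in test_X:
        best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], q))
        preds.append(train_y[best])
    return preds


def cross_val_accuracy(X, y, fit_predict, k=5):
    """Mean classification accuracy over k interleaved folds (needs len(X) >= k)."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]  # made-up data
y = [0, 0, 0, 1, 1, 1]
print(cross_val_accuracy(X, y, nearest_neighbour, k=3))
```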

Step-4: Confusion Matrix

11.11 Confusion matrix of a given dataset

Here is the confusion matrix for the given dataset. For a selected model, it shows which instances were classified correctly and which were misclassified.
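A confusion matrix simply counts (actual, predicted) label pairs. In the sketch below, rows are actual classes and columns are predicted classes, so correct predictions sit on the diagonal; this orientation is an assumption and may not match how a given tool lays out its matrix. The labels are made up.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class (counts)."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


actual = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0]
cm = confusion_matrix(actual, predicted, labels=[0, 1])
for label, row in zip([0, 1], cm):
    print(label, row)
# 0 [2, 1]
# 1 [1, 2]
```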

5. Processing the given data and analyzing it

Step-1: Here we have a dataset that needs to be encoded and normalized, and its missing values must also be handled properly.

11.12 Processing data

As you can see, the data have been processed as the examiner required.
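Encoding and normalization can be sketched as two small column transforms: one-hot encoding turns a categorical column into one 0/1 indicator column per category, and min-max scaling maps a numeric column linearly onto [0, 1]. Min-max is only one normalization option; the exam setup may have used standardization instead, and mapping a constant column to 0 is an arbitrary choice here. The sample values are made up.

```python
def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded


def min_max_scale(column):
    """Rescale a numeric column linearly to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map everything to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


cats, enc = one_hot(["yes", "no", "yes"])
print(cats, enc)  # ['no', 'yes'] [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
scaled = min_max_scale([18.0, 34.0, 52.0])
print(scaled)
```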

After all the processing, the dataset looks like the image below.

11.13 Dataset after processing data

Step-2: Applying dataset to models

11.14 Applying data to different models

All our processed data has been fed to different models: Random Forest, Naive Bayes, Neural Network, and kNN.
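Naive Bayes, another of the models listed, picks the class with the highest posterior probability while treating features as independent given the class. Below is a minimal Gaussian Naive Bayes sketch for numeric features; it is an illustration and need not match how Orange's Naive Bayes handles continuous data. The small variance offset (1e-9) is an assumption that guards against division by zero.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        stats = []
        for col in zip(*rows):
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n + 1e-9  # avoid zero variance
            stats.append((mean, var))
        model[label] = (n / len(X), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for `row`."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


model = fit_gaussian_nb([[0.0], [0.2], [1.0], [1.2]], [0, 0, 1, 1])  # made-up data
print(predict_gaussian_nb(model, [0.15]), predict_gaussian_nb(model, [1.05]))
```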

Step-3: Testing dataset

11.15 Testing dataset

Here you can see the scores obtained by the different models.
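Besides accuracy, score tables usually report precision, recall, and F1. The sketch below computes them for a single positive class of a binary problem. Tools may instead average these metrics over classes, so the numbers can differ from what a score table shows. The labels are made up.

```python
def binary_scores(actual, predicted, positive=1):
    """Classification accuracy plus precision, recall and F1 for `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    return {"CA": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


scores = binary_scores([0, 0, 1, 1, 1, 0], [0, 1, 1, 0, 1, 0])
print(scores)  # every metric is 2/3 for this toy example
```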

Step-4: Confusion Matrix

11.16 Confusion matrix of a given dataset

Here is the confusion matrix for the processed dataset. For a selected model, it shows which instances were classified correctly and which were misclassified.

11.17 Flow Diagram

6. Power BI

Next, here is a graph view of our preprocessed dataset in Power BI.

11.18 Power BI’s graph view

Conclusion

Orange is a platform that can be used for almost any kind of analysis but, most importantly, for beautiful and easy visuals. In this article, we explored how to visualize a dataset. We also undertook predictive modeling, using Random Forest, Naive Bayes, Neural Network, and kNN models to predict the Biopsy outcome for each person.

I hope this tutorial has helped you understand aspects of the problem that you may have missed before. It is very important to understand the data science pipeline and the steps we take to train a model, and this should help you build better predictive models soon!
