Data Science EP 11 (Practical Exam)

Nov 18, 2021

Dataset Description Using the Orange Tool

Creating Your Workflow

This is your blank workflow in Orange. You are now ready to explore and solve a problem by dragging widgets from the widget menu onto your workflow.

2.1 Problem

The problem we’re looking to solve in this tutorial is the Cancer Risk Prediction practice problem, which can be accessed via this link on Datahack.

2.2 Importing the data files
We begin with the first and most essential step towards understanding our data and making predictions: importing the data.

Step 1: Click the “Data” tab in the widget menu and drag the “File” widget onto the blank workflow.

Step 2: Double-click the “File” widget and select the file you want to load into the workflow. Since we are solving the Cancer Risk Prediction practice problem, I will import its training dataset.

Step 3: Once the widget shows the structure of your dataset, close the dialog to return to the workflow.

Step 4: Now that the raw .csv data is loaded, we need to pass it on to other widgets so we can use it in our mining. Click the dotted line around the “File” widget, drag it out, and release it anywhere in the blank space; a menu of widgets you can connect will appear.

Step 5: Since a data table lets us inspect our data more easily, we choose the “Data Table” widget.

Step 6: Now double-click the widget to view your table.
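Orange's File widget does the parsing for you. For readers who want to see what that step involves, here is a minimal pure-Python sketch (not Orange's own code) that reads CSV text, converts numeric cells to floats, and marks empty cells or "?" as missing. The column names, sample rows, and the "?" missing-value marker are assumptions for illustration; check how your actual file marks gaps.

```python
import csv
import io

# Tokens treated as "missing"; "?" is an assumption about how the file marks gaps.
MISSING_TOKENS = {"", "?"}


def load_csv(text):
    """Parse CSV text into a list of dicts, converting numbers and marking gaps as None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, value in record.items():
            value = value.strip()
            if value in MISSING_TOKENS:
                row[key] = None
            else:
                try:
                    row[key] = float(value)
                except ValueError:
                    row[key] = value  # keep non-numeric values as strings
        rows.append(row)
    return rows


def describe(rows):
    """Print a short summary of the dataset's structure: size and missing counts."""
    columns = list(rows[0].keys())
    print(f"{len(rows)} rows, {len(columns)} columns")
    for col in columns:
        missing = sum(1 for r in rows if r[col] is None)
        print(f"  {col}: {missing} missing")


# Hypothetical sample; the real Datahack file has its own columns.
sample = """Age,Smokes,Biopsy
18,0,0
15,?,0
34,1,1
"""
data = load_csv(sample)
describe(data)
```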

Let’s now visualize some columns to find interesting patterns in our data.

3. Preprocessing the data to replace missing values

In the image above, you can see the preprocessed data, where all the missing values have been replaced by average values.
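Replacing missing values with averages (mean imputation) is simple enough to sketch directly. The `impute_mean` function below is a hypothetical helper, not part of Orange: for each listed numeric column it computes the mean of the observed values and fills the gaps with it. The column names and values are made up.

```python
from statistics import mean


def impute_mean(rows, columns):
    """Return a copy of rows with None in each listed column replaced by that column's mean."""
    means = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        means[col] = mean(observed)  # raises StatisticsError if a column is entirely missing
    imputed = []
    for r in rows:
        new_row = dict(r)  # copy so the original rows stay untouched
        for col in columns:
            if new_row[col] is None:
                new_row[col] = means[col]
        imputed.append(new_row)
    return imputed


rows = [
    {"Age": 20.0, "Smokes": 1.0},
    {"Age": None, "Smokes": 0.0},
    {"Age": 40.0, "Smokes": None},
]
print(impute_mean(rows, ["Age", "Smokes"]))
```

Categorical columns usually get the most frequent value instead of the mean, since an average of categories has no meaning.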

4. Applying the preprocessed data to different models and evaluating them

Step-1: Select columns

11.8 Selecting Biopsy with Select Columns

We have now selected the Biopsy column for further processing.
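If Biopsy is used as the target variable (an assumption; the post only says the column was selected), then the column selection amounts to separating each row into a feature vector and a label. Here is a minimal sketch with a hypothetical helper and made-up rows:

```python
def split_features_target(rows, target):
    """Separate each row into a feature vector (all other columns) and a target label."""
    feature_names = [name for name in rows[0] if name != target]
    X = [[r[name] for name in feature_names] for r in rows]
    y = [r[target] for r in rows]
    return feature_names, X, y


rows = [
    {"Age": 18.0, "Smokes": 0.0, "Biopsy": 0},
    {"Age": 34.0, "Smokes": 1.0, "Biopsy": 1},
]
names, X, y = split_features_target(rows, "Biopsy")
print(names, X, y)  # ['Age', 'Smokes'] [[18.0, 0.0], [34.0, 1.0]] [0, 1]
```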

Step-2: Applying dataset to models

We then feed all our preprocessed data to several models: Random Forest, Naive Bayes, Neural Network, and kNN.
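Orange's kNN widget handles this for you. To show what kNN itself does, here is a minimal pure-Python sketch (not Orange's implementation) using Euclidean distance and a majority vote among the `k` nearest training points. The value `k=3` and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(X_train)), key=lambda i: euclidean(X_train[i], x))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y_train = [0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.15, 0.15]))  # 0
print(knn_predict(X_train, y_train, [0.9, 0.9]))    # 1
```

Because kNN relies on distances, it works best on normalized features; otherwise a column with large values dominates the vote.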

Step-3: Testing dataset

Here you can see the evaluation scores obtained for each model.
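To make those scores concrete, here is how common ones (accuracy, precision, recall, F1) are computed for a binary target. This is a sketch with a hypothetical `binary_scores` helper and made-up labels, assuming class `1` is the positive class.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
print(binary_scores(y_true, y_pred))
```

For a medical target such as a biopsy result, recall on the positive class is often worth watching alongside accuracy, because a model can score high accuracy while missing most positive cases when they are rare.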

Step-4: Confusion Matrix

11.11 Confusion matrix of a given dataset

We have derived a confusion matrix for the given dataset. It shows, for a selected model, which instances are classified correctly and which are misclassified.
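A confusion matrix is just a count of (actual, predicted) pairs. Here is a minimal sketch with a hypothetical helper and made-up labels; rows are actual classes, columns are predicted classes, and the diagonal holds the correctly classified counts.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
for label, row in zip([0, 1], matrix):
    print(label, row)
# 0 [2, 1]
# 1 [1, 2]
```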

5. Processing the given data and analyzing it

Step-1: Here we have a dataset that needs to be encoded and normalized, and its missing values must be handled properly.
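Encoding and normalization can each be sketched in a few lines. The helpers below are hypothetical and assume missing values have already been imputed. `one_hot` turns a categorical column into one 0/1 indicator column per category, and `min_max` rescales a numeric column to [0, 1]. Min-max scaling is an assumption here; standardization to zero mean and unit variance is another common choice. The "Region" column is made up.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1.0 if r[column] == cat else 0.0
        encoded.append(new_row)
    return encoded


def min_max(rows, column):
    """Rescale a numeric column to the [0, 1] range (constant columns become 0.0)."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]


rows = [
    {"Age": 20.0, "Region": "north"},
    {"Age": 30.0, "Region": "south"},
    {"Age": 40.0, "Region": "north"},
]
processed = min_max(one_hot(rows, "Region"), "Age")
print(processed)
```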

As you can see, the data has been processed the way the examiner required.

After all the processing, the dataset looks like the image below.

Step-2: Applying dataset to models

We then feed all our preprocessed data to several models: Random Forest, Naive Bayes, Neural Network, and kNN.
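Since the data is now numeric and normalized, one of these models, Naive Bayes, can be illustrated with its Gaussian variant. This is a swapped-in sketch, not necessarily how Orange's Naive Bayes widget treats numeric features: it estimates a prior plus a per-feature mean and variance for each class, then picks the class with the highest log-posterior. Helper names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, samples in by_class.items():
        n = len(samples)
        means = [sum(col) / n for col in zip(*samples)]
        # A small epsilon keeps the variance positive when a feature is constant.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*samples), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for value, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [0.15, 0.15]))  # 0
print(predict_gaussian_nb(model, [0.85, 0.85]))  # 1
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow to zero.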

Step-3: Testing dataset

Here you can see the evaluation scores obtained for each model on the processed dataset.
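Scores like these are usually estimated on data the model did not train on, and k-fold cross-validation is one common way to do that. The post does not say which sampling method was used, so treat this as an illustrative sketch: the hypothetical `k_fold_indices` helper shuffles row indices once and yields disjoint train/test index lists, so every row lands in exactly one test fold.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and yield (train, test) index lists for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))  # 8 2 for each fold
```

Each model is trained on `train`, scored on `test`, and the fold scores are averaged; a fixed `seed` keeps the split reproducible.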

Step-4: Confusion Matrix

11.16 Confusion matrix of a given dataset

We have derived a confusion matrix for the processed dataset. It shows, for a selected model, which instances are classified correctly and which are misclassified.

6. Power BI

Now I will show you a graph view of our preprocessed dataset in Power BI.

Conclusion

Orange is a platform that can be used for almost any kind of analysis, but above all for beautiful and easy visuals. In this article, we explored how to visualize and preprocess a dataset. We also carried out predictive modeling, using Random Forest, Naive Bayes, Neural Network, and kNN models to predict cancer risk (the Biopsy column) for each patient.

I hope this tutorial has helped you understand aspects of the problem that you might have missed or not fully understood before. It is very important to understand the data science pipeline and the steps we take to train a model, and this should help you build better predictive models soon!

LinkedIn:

Rushi Chudasama - Chandubhai S. Patel Institute of Technology - Bharuch, Gujarat, India | LinkedIn

I am a student pursuing my B.Tech 4th year in Information technology at Charotar Institute of Science and Technology. I…

www.linkedin.com

More Projects and Blogs:

Rushi-45 - Overview


github.com

Blogs:

Rushi Chudasama - Medium

Read writing from Rushi Chudasama on Medium. Studying B.Tech in Information Technology @ Charusat University. Every…

medium.com

Final Note:

Thanks for reading! If you enjoyed this article, please hit the clap 👏 button as many times as you can. It would mean a lot and encourage me to keep sharing my knowledge. If you like my content, follow me on Medium; I will try to post as many blogs as I can.