By Mukul Kumar

This model uses Logistic Regression, a Machine Learning classification algorithm, to classify a speaker's gender from pre-processed speech-signal data.

The project is written in Python, and the model was trained and tested in a Jupyter Notebook environment.

**ABOUT THE DATASET**

The dataset contains 20 data features and 1 target feature (the data label).

The output of the **data.info()** command lists all the data attributes along with their non-null counts and data types:

`<class 'pandas.core.frame.DataFrame'>`

RangeIndex: 3168 entries, 0 to 3167

Data columns (total 21 columns):

| # | Column | Non-Null Count | Dtype |
|---|--------|----------------|-------|
| 0 | meanfreq | 3168 non-null | float64 |
| 1 | sd | 3168 non-null | float64 |
| 2 | median | 3168 non-null | float64 |
| 3 | Q25 | 3168 non-null | float64 |
| 4 | Q75 | 3168 non-null | float64 |
| 5 | IQR | 3168 non-null | float64 |
| 6 | skew | 3168 non-null | float64 |
| 7 | kurt | 3168 non-null | float64 |
| 8 | sp.ent | 3168 non-null | float64 |
| 9 | sfm | 3168 non-null | float64 |
| 10 | mode | 3168 non-null | float64 |
| 11 | centroid | 3168 non-null | float64 |
| 12 | meanfun | 3168 non-null | float64 |
| 13 | minfun | 3168 non-null | float64 |
| 14 | maxfun | 3168 non-null | float64 |
| 15 | meandom | 3168 non-null | float64 |
| 16 | mindom | 3168 non-null | float64 |
| 17 | maxdom | 3168 non-null | float64 |
| 18 | dfrange | 3168 non-null | float64 |
| 19 | modindx | 3168 non-null | float64 |
| 20 | label | 3168 non-null | object |

dtypes: float64(20), object(1)

memory usage: 519.9+ KB
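Since the label column is stored as an object (string) type while every feature is a float, the label has to be turned into a number before training. The following is only a minimal sketch of that idea using the Python standard library: the inline sample rows are made up, only two of the 20 feature columns are shown, and the label values "male" and "female" are an assumption about the file's contents. The helper names `load_rows` and `encode_labels` are hypothetical.

```python
import csv
import io

# Tiny inline stand-in for voice.csv. The values are invented for
# illustration, and the "male"/"female" label strings are an assumption.
SAMPLE_CSV = """meanfreq,sd,label
0.18,0.06,male
0.20,0.04,female
0.15,0.07,male
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting feature columns to float."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {name: float(value) for name, value in raw.items() if name != "label"}
        row["label"] = raw["label"]
        rows.append(row)
    return rows


def encode_labels(rows, positive="male"):
    """Map the string label to 1 for the positive class and 0 otherwise."""
    return [1 if row["label"] == positive else 0 for row in rows]


rows = load_rows(SAMPLE_CSV)
y = encode_labels(rows)
```

Which class is encoded as 1 is an arbitrary choice here; it only needs to stay consistent between training and testing.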

**INTRODUCTION**

Initially, I imported all the libraries required for reading the data, pre-processing it, plotting the data matrices, and splitting the data into training and testing sets. The following steps were performed during model training:

Step 1: Reading the data from a .csv file (voice.csv) using the pandas library

Step 2: Checking whether the data needs any pre-processing

Step 3: Normalizing the data to reduce the processing cost

Step 4: Splitting the data into training and testing sets

Step 5: Training the model with the Logistic Regression classification algorithm
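Steps 3 and 4 can be sketched in plain Python as follows. This is an illustration rather than the project's actual code: min-max scaling is assumed as the normalization method, the 20% test fraction and the random seed are assumptions, the data is a made-up toy set, and the helper names `min_max_normalize` and `split_train_test` are hypothetical. With scikit-learn installed, `train_test_split` from `sklearn.model_selection` does the same job as the second helper.

```python
import random


def min_max_normalize(X):
    """Rescale every column of X (a list of rows) to the range [0, 1]."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle the row indices and hold out a fraction test_size for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: five rows, two features (invented values).
X = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [2.5, 15.0], [7.5, 25.0]]
y = [0, 1, 1, 0, 1]

X_norm = min_max_normalize(X)
X_train, X_test, y_train, y_test = split_train_test(X_norm, y, test_size=0.2)
```

Bringing every feature into the same range keeps gradient-based training from being dominated by the columns with the largest raw values, which is one way normalization can lower the training cost.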

**RESULTS**

The final test accuracy of the model is 97.791% when the whole logistic regression code is written from scratch, using the sigmoid function as the logistic function. When Logistic Regression is imported from sklearn.linear_model instead, the test accuracy rises to 98.26%, an increase of roughly 0.5 percentage points.
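To show what "from scratch" can look like, here is a minimal sketch of logistic regression built on the sigmoid function and trained with batch gradient descent. It is not the project's code: the learning rate, the number of epochs, the 0.5 decision threshold, and the toy data are all assumptions, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of class 1 for a single feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = predict_proba(w, b, x) - target  # gradient of the loss w.r.t. z
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(X)


# Toy, linearly separable data (invented values, already scaled to [0, 1]).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.7], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train(X, y)
acc = accuracy(w, b, X, y)
```

The scikit-learn class `LogisticRegression` in `sklearn.linear_model` uses a different solver and applies regularization by default, which may account for part of the gap between the two accuracies reported above.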

Submitted by Mukul Kumar (mukul102000)
