Q:

Randomly split the messages into a training set D1 (80% of messages) and a testing set D2 (20% of messages). Calculate the testing accuracy, confusion matrix, precision, recall, and F-score of the Naïve Bayes classifier in determining whether a message is spam or ham. Submit your source code. Note: Let’s assume that spam is the positive class.

Accepted Solution

A:
Answer: See the step-by-step explanation below.

Step-by-step explanation: This is the code I created in R using the packages "caret" and "e1071". It runs on simulated data, but the script is meant to work in the general case once you substitute your own message dataset.

library(caret)
library(e1071)

# Categorical vector of class labels
spam <- c("spam", "not_spam")
spam_vec <- sample(spam, 60, replace = TRUE)

# Two predictors that are independent of the label, so kappa will be close to 0
x1 <- rnorm(60)
x2 <- rnorm(60)

# Creating the dataset (data.frame keeps x and y numeric; cbind would turn them into text)
data1 <- data.frame(spamvec = factor(spam_vec, levels = spam), x = x1, y = x2)

# Creating the 80/20 partition
index <- createDataPartition(data1$spamvec, p = 0.8, list = FALSE)
training_data <- data1[index, ]
testing_data <- data1[-index, ]

# Scaling the predictors using statistics from the training set only
preProcess_cs <- preProcess(training_data[, -1], method = c("center", "scale"))
spam_training_cs <- predict(preProcess_cs, training_data)
spam_testing_cs <- predict(preProcess_cs, testing_data)

# Training a Naive Bayes model to predict the binary outcome
# (e1071's naiveBayes does not use caret's tuneGrid or trControl, so they are omitted)
Naive_Bayes_Model <- naiveBayes(spamvec ~ ., data = spam_training_cs)

# Confusion matrix on the testing set, with spam as the positive class
prediction <- predict(Naive_Bayes_Model, spam_testing_cs)
confM <- confusionMatrix(prediction, spam_testing_cs$spamvec, positive = "spam")
confM

# Testing accuracy, precision, recall, and F-score
accuracy <- confM$overall["Accuracy"]
accuracy
confM$byClass[c("Precision", "Recall", "F1")]
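To make the requested metrics concrete, here is a minimal base-R sketch that computes them by hand from a confusion matrix, with spam as the positive class. The ten labels in `actual` and `predicted` are hypothetical values chosen for illustration, not results from any real message dataset, and the negative class is labelled "ham" as in the question.

```r
# Hypothetical true and predicted labels for ten test messages.
# "spam" is listed first so it is treated as the positive class.
actual <- factor(
  c("spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"),
  levels = c("spam", "ham")
)
predicted <- factor(
  c("spam", "spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham", "ham"),
  levels = c("spam", "ham")
)

# Confusion matrix: rows are predictions, columns are true labels
conf_mat <- table(Predicted = predicted, Actual = actual)

tp <- conf_mat["spam", "spam"]  # spam correctly flagged as spam
fp <- conf_mat["spam", "ham"]   # ham wrongly flagged as spam
fn <- conf_mat["ham", "spam"]   # spam missed and labelled ham
tn <- conf_mat["ham", "ham"]    # ham correctly labelled ham

accuracy  <- (tp + tn) / sum(conf_mat)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f_score   <- 2 * precision * recall / (precision + recall)

print(conf_mat)
print(c(accuracy = accuracy, precision = precision, recall = recall, f_score = f_score))
```

These hand-computed values should agree with what caret reports in `confM$overall["Accuracy"]` and `confM$byClass[c("Precision", "Recall", "F1")]` when the same positive class is used.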