Pavel Simakov - Sentiment analysis in massive online open courses (MOOCs)





by Pavel Simakov on 2013-04-22

I lead development of Course Builder, an open-source platform for running massive open online courses (MOOCs). It's a tool to teach hundreds of thousands of people online.

One of the most common complaints about MOOCs, mentioned as recently as today in the NYT, is the lack of direct student-to-professor communication. If you are a professor teaching a 100,000-student MOOC, how do you communicate with your students? If you are a student taking a MOOC, how do you reach out to your professor? How do you compete with the 99,999 other students for the professor's time?

We have numerous examples of how to communicate, including snail mail, SMS, email, chat, Twitter and so on. What is the best communication model for a MOOC?

First, let's see how we can capture the essence of such communication in a diagram. Look at the top portion of the picture below, the portion above the dotted line. This is how one can represent communication between a teacher and a student. It may happen face to face, by snail mail or by email, one message or one phrase at a time. We show the teacher as the conversation initiator, but the student can be the initiator as well. The bottom portion of the picture shows the equivalent shorthand notation. From here on I will use only this shorthand notation.

Email, SMS and online chat all have essentially the same communication model: one-to-one private messaging. Trying to use any of them for a MOOC will not work, because there are just way too many messages for a teacher to process. And it's not just the messages. It's simply impossible for a human to keep up 100,000 concurrent conversation threads. Here is the model for this:

The Twitter communication model is quite different. It uses multicast and lets anyone reach thousands of recipients with just one message. It works great for a teacher sending a message to all students at once. It does not work for the students, because teachers don't usually follow them back, unless the student is Justin Bieber, of course. Here is the model for Twitter:

One may improve the feedback loop in the Twitter model using data mining algorithms. Imagine that a teacher sends a message (1) to all students, asking them to review videos one through five and to reply with free-form feedback. The student replies arrive in a sort of sentiment analysis system, where a set of natural language processing and artificial intelligence algorithms "read" all the replies from all the students (2) and show the teacher a clustered summary (3). Here is the model for this:

The summary may be the only message that the professor actually receives. It may say:

Dear Dr. Smith,

  • 93% of students watched all the videos
  • 6% of students did not watch the videos and quickly fast-forwarded through them
  • 1% of students, in NY, were unable to complete the assignment due to a power outage caused by a snowstorm
  • there were 2,300 follow-up and clarification questions posted, most focused around the 15th minute of the third video

Yours,
Course Builder Artificial Intelligence Teaching Assistant

While this is not currently technically feasible for free-form replies, the approach will work well if the feedback is semi-structured. The feedback may be as simple as Like/Don't Like, Easy/Hard, Clear/Confusing or Rate 1-10, and still be very useful. One way or another, student feedback that can be aggregated, clustered and machine processed is essential for closing the education loop.
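To make the semi-structured case concrete, here is a minimal Python sketch of how such feedback could be aggregated into a single summary for the teacher. It is only an illustration, not how Course Builder does it: the Feedback record, its field names and the summarize function are all hypothetical, and the sample replies are made up.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    """One student's semi-structured reply (all field names are hypothetical)."""
    student_id: str
    clarity: str      # "Clear" or "Confusing"
    difficulty: str   # "Easy" or "Hard"
    rating: int       # 1-10


def summarize(replies):
    """Collapse many replies into one short message for the teacher."""
    total = len(replies)
    if total == 0:
        return "No feedback received yet."

    def share(counter, key):
        # Percentage of all replies that chose the given answer.
        return round(100 * counter[key] / total)

    clarity = Counter(r.clarity for r in replies)
    difficulty = Counter(r.difficulty for r in replies)
    average = mean(r.rating for r in replies)

    lines = [
        f"{total} students replied",
        f"{share(clarity, 'Confusing')}% found the material confusing",
        f"{share(difficulty, 'Hard')}% found it hard",
        f"average rating: {average:.1f} out of 10",
    ]
    return "\n".join(lines)


# Made-up sample data, for illustration only.
replies = [
    Feedback("s1", "Clear", "Easy", 9),
    Feedback("s2", "Clear", "Hard", 7),
    Feedback("s3", "Confusing", "Hard", 4),
    Feedback("s4", "Clear", "Easy", 8),
]
print(summarize(replies))
```

The same counting works whether there are four replies or 100,000, which is the point: the teacher reads one message instead of one message per student.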

Useful resources: