Saturday 20 December 2014

Competency 9.1

Competency 9.1: Identify and describe professional and research organizations that are prominent in developing learning analytics as a domain.

The Society for Learning Analytics Research (SoLAR)
  • SoLAR is an inter-disciplinary network of leading international researchers who are exploring the role and impact of analytics on teaching, training and development.  It pursues the development of research opportunities in learning analytics and educational data mining, increases the profile of learning analytics in education and advocates learning analytics to policy makers.
Columbia University
  • Columbia University offers cognitive studies in education.  Its curriculum trains students in basic theories of human cognition, the practice and interpretation of empirical cognitive and developmental research, and the use of research to improve educational practices and develop innovative methods built on new technologies.
Educational Technology & Society
  • Educational Technology & Society is a quarterly journal that brings together issues affecting the developers of educational systems and the educators who implement and manage such systems.  It encourages educators to proactively harness the available technologies and to influence further developments through systematic feedback and suggestions.
Educause
  • Educause is a nonprofit association and a community of IT leaders and professionals committed to advancing higher education.  It supports those who lead, manage, and use information technology to shape strategic IT decisions at every level within higher education.

Friday 19 December 2014

Competency 8

Competency 8.1: Prepare data for use in LightSIDE and use LightSIDE to extract a wide range of feature types.


Competency 8.2: Build and evaluate models using alternative feature spaces.






Competency 8.3: Compare the performance of different models.



Competency 8.4: Inspect models and interpret the weights assigned to different features as well as to reason about what these weights signify and whether they make sense.


Competency 7.4

Competency 7.4: Describe how models might be used in Learning Analytics research, specifically for the problem of assessing some reasons for attrition along the way in MOOCs.

Experience suggests several common reasons why MOOC learners drop out:
  • the course content is not what the learner expected;
  • the learner finds the learning system tiresome;
  • the learner is frustrated with the learning system;
  • the learner finds the pace of the course too slow and becomes bored;
  • the learner fails to keep up with the progress of the course.


Any of these can lead to disengaged behaviour on the part of the learner.

In a MOOC learning environment, teachers and course leaders face tens or even hundreds of thousands of learners at a time.  It is therefore extremely challenging for them to pay close attention to each individual learner and assess whether that learner is following the course comfortably, is struggling to keep pace, or is feeling bored, frustrated or disengaged.



This situation can be improved with support from analytic models.  By applying predictive models to learning data, learners' behaviour can be illuminated and monitored.  Some behaviours are clues of potential attrition, such as when a learner is “gaming” the system, submits assignments earlier than normal, or falls too far behind the course.  Predictive models can help single out these clues so that teachers or course leaders are able to “find the needle in the haystack” and provide appropriate intervention to the learners in need.
Survival Modeling:
A survival model is a regression model that captures how the probability of survival (here, remaining in the course) changes over time. It estimates this probability at each time point, and the effect of a predictor is expressed as a hazard ratio, which indicates how much more or less likely a student is to drop out. A hazard ratio > 1 means the student is more likely to drop out at the next time point; a hazard ratio < 1 means the student is less likely to.
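
To make the hazard idea concrete, here is a minimal Python sketch of a discrete-time life table. The weekly dropout counts and the two cohorts ("posters" and "lurkers") are made-up numbers for illustration, and the comparison is a simple ratio of raw weekly hazards, not a fitted survival regression.

```python
def life_table(at_start, dropouts_per_week):
    """Return (hazards, survival) lists, one entry per week.

    hazard[t]   = dropouts in week t / learners enrolled at the start of week t
    survival[t] = probability of still being enrolled after week t
    """
    hazards, survival = [], []
    enrolled, surviving = at_start, 1.0
    for dropped in dropouts_per_week:
        h = dropped / enrolled
        surviving *= 1.0 - h
        hazards.append(h)
        survival.append(surviving)
        enrolled -= dropped
    return hazards, survival

# Hypothetical cohorts of 1000 learners each (assumed data).
posters_h, posters_s = life_table(1000, [100, 90, 81])
lurkers_h, lurkers_s = life_table(1000, [200, 160, 128])

# Ratio of weekly hazards: a value > 1 means lurkers are more likely
# than posters to drop out in that week.
hazard_ratios = [l / p for l, p in zip(lurkers_h, posters_h)]
```

With these assumed counts the ratio stays above 1 in every week, which is the situation the hazard ratio is meant to flag.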

Sentiment analysis in MOOC forums looked at expressed sentiment and exposure to sentiment. Four independent variables (Individual Positivity, Individual Negativity, Thread Positivity and Thread Negativity) were used to predict the dependent variable, Dropout. The effects were relatively weak and inconsistent across courses.

Some factors that may contribute to student attrition, such as a student's prior motivation, skill set or knowledge in the area, and previous experience with MOOCs, are difficult to capture. We can link different analysis methods such as social network analysis, text mining, predictive modeling and survey data analysis to try to build a complete picture of an individual student and obtain more consistent results.

Competency 7.3

Competency 7.3: Use tools such as LightSIDE in a very simple way to run a text classification experiment.

I used the LightSIDE tool, as explained by Dr. Carolyn, to run a simple classification experiment. The tool is straightforward to use if we follow the steps.


 
I got an accuracy of 58% and a Kappa of 0.45 for the model given in the assignment.

Competency 7.2

Competency 7.2: Detail subareas of text mining such as collaborative learning process analysis.

Collaborative Learning Process Analysis
It is the process of analyzing the collaborative learning process of students using text mining techniques. Different indicators and language features are used for this study. Some of them are:

  • General indicators of interactivity
      ◦ Turn length
      ◦ Conversation length
      ◦ Number of student questions
      ◦ Student to tutor word ratio
      ◦ Student initiative
  • Features related to cognitive processes
      ◦ Transactivity

Familiarity with the data in the domain is important for understanding and developing relevant features.
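
Several of the general interactivity indicators listed above can be computed directly from a transcript. The sketch below is only an illustration: the four-turn tutoring transcript is invented, words are split on whitespace, and a "question" is assumed to be any student turn ending in a question mark.

```python
# Hypothetical transcript as (speaker, utterance) pairs.
transcript = [
    ("tutor", "What force acts on the ball after it leaves your hand?"),
    ("student", "Gravity I think"),
    ("tutor", "Right. Anything else?"),
    ("student", "Is air resistance a force too?"),
]

def word_count(text):
    """Count whitespace-separated tokens (a deliberately crude tokenizer)."""
    return len(text.split())

student_turns = [u for speaker, u in transcript if speaker == "student"]
tutor_turns = [u for speaker, u in transcript if speaker == "tutor"]

# Conversation length, measured in turns.
conversation_length = len(transcript)

# Average student turn length, measured in words.
mean_student_turn_length = sum(map(word_count, student_turns)) / len(student_turns)

# Number of student questions (assumption: a turn ending in "?").
student_questions = sum(u.strip().endswith("?") for u in student_turns)

# Student to tutor word ratio.
student_to_tutor_word_ratio = (
    sum(map(word_count, student_turns)) / sum(map(word_count, tutor_turns))
)
```

Features such as transactivity need annotated data and a trained model, so they are not covered by a sketch of this size.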

Competency 7.1

Competency 7.1: Describe prominent areas of text mining.

Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).

Prominent Areas of Text Mining

Information Retrieval:

Information Retrieval is the process of searching for and retrieving the required documents from a collection, based on a given search query. The search engines we use, such as Google and Yahoo, rely on IR techniques to match and return documents relevant to the user's query.
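
As a rough illustration of query matching, the following Python sketch ranks a tiny document collection with a basic TF-IDF score. The three documents are invented, and this particular weighting is just one common textbook variant, not a description of how any real search engine works.

```python
import math
from collections import Counter

# Hypothetical document collection.
docs = {
    "d1": "learning analytics for online courses",
    "d2": "text mining of course forum posts",
    "d3": "predicting dropout in online courses",
}

def tokenize(text):
    return text.lower().split()

doc_tokens = {name: tokenize(text) for name, text in docs.items()}
n_docs = len(docs)

# Document frequency: in how many documents does each term appear?
df = Counter(term for toks in doc_tokens.values() for term in set(toks))

def score(query, tokens):
    """Sum of term frequency * inverse document frequency over query terms."""
    tf = Counter(tokens)
    return sum(tf[t] * math.log(n_docs / df[t]) for t in tokenize(query) if t in df)

def search(query):
    """Return the names of matching documents, best match first."""
    scored = {name: score(query, toks) for name, toks in doc_tokens.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [name for name in ranked if scored[name] > 0]
```

For example, `search("dropout online")` puts `d3` ahead of `d1`, because "dropout" is rarer in the collection than "online" and so carries more weight.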

Document Classification/ Text Categorization:

Classification is the process of identifying the category a new observation belongs to, on the basis of a training set consisting of data with pre-defined categories (supervised learning). An example is the classification of email into spam/non-spam.
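
The spam example can be sketched with a very small Naive Bayes classifier. Naive Bayes is chosen here only because it is short to write; the four training messages are invented and far too few for a real model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled training set (pre-defined categories).
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

word_counts = defaultdict(Counter)   # label -> word -> count
label_counts = Counter()             # label -> number of messages
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the label with the highest log-probability for the text."""
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        # Log prior for the label.
        score = math.log(label_counts[label] / len(training))
        for w in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A new observation such as `classify("cheap money")` is assigned to the category whose training messages it most resembles.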

Clustering:

Clustering is an unsupervised procedure in which similar objects are grouped into clusters without pre-defined categories. An example analysis would be the summarization of common complaints based on open-ended survey responses.

Trend Analysis:

Trend Analysis is the process of discovering the trends of different topics over a given period of time. It is widely applied in summarizing news events and social network trends. An example would be the prediction of stock prices based on news articles.

Sentiment Analysis:

Sentiment analysis is the process of categorizing opinions by sentiment, such as positive, negative or neutral. Sample applications include identifying sentiments in movie reviews and gaining real-time awareness of users' feedback.
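
The simplest form of sentiment categorization counts words from a sentiment lexicon. The sketch below assumes two tiny, hand-made word lists; real systems use much larger lexicons or trained classifiers.

```python
# Hypothetical word lists, far smaller than any real sentiment lexicon.
POSITIVE = {"great", "love", "helpful", "clear"}
NEGATIVE = {"boring", "confusing", "hate", "slow"}

def sentiment(text):
    """Label text by counting lexicon hits; ties and no hits are neutral."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

This approach ignores negation ("not helpful") and context, which is one reason lexicon counts alone tend to be noisy.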

Thursday 18 December 2014

Competency 6.2

Competency 6.2: Learn about key diagnostic metrics and their uses.

 

Metrics for Classifiers

Accuracy:

The easiest measure of model goodness is accuracy. It is also called agreement, when measuring the inter-rater reliability.

Accuracy = # of agreements/ Total # of assessments

Accuracy is generally not considered a good metric across fields, because it is uninformative when the categories are unevenly distributed. E.g., a Kindergarten Failure Detector Model that, in the extreme case, always says Pass can still reach 92% accuracy.
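
The accuracy formula and the always-Pass case can be checked with a few lines of Python. The class split below (92 passing and 8 failing students) is an assumed illustration of the extreme case, not data from the course.

```python
def accuracy(actual, predicted):
    """Number of agreements divided by the total number of assessments."""
    agreements = sum(a == p for a, p in zip(actual, predicted))
    return agreements / len(actual)

# Assumed, heavily unbalanced data: 92 students pass, 8 fail.
actual = ["pass"] * 92 + ["fail"] * 8

# A "detector" that ignores its input and always says pass.
always_pass = ["pass"] * len(actual)

acc = accuracy(actual, always_pass)
failures_caught = sum(
    a == "fail" and p == "fail" for a, p in zip(actual, always_pass)
)
```

Here `acc` comes out at 0.92 even though `failures_caught` is 0, so the model never detects the one thing it was built to detect.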

Kappa:

Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)

If the Kappa value is
= 0, agreement is at chance
= 1, agreement is perfect
= -1, agreement is perfectly inverse (the lower bound, reached when the categories are balanced)
> 1, something is wrong
< 0, agreement is worse than chance
0 < Kappa < 1, there is no absolute standard. For data-mined models, 0.3-0.5 is considered good enough for publishing.
Kappa is scaled by the proportion of each category and is therefore influenced by the data set. We can compare Kappa values within the same data set, but not between two data sets.
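
A minimal sketch of the Kappa formula for two sets of labels follows. Expected agreement is computed from the category proportions in each set, which is the usual Cohen's Kappa definition; the example data are assumptions for illustration.

```python
from collections import Counter

def cohen_kappa(actual, predicted):
    """(Agreement - Expected Agreement) / (1 - Expected Agreement)."""
    n = len(actual)
    agreement = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq, pred_freq = Counter(actual), Counter(predicted)
    # Chance agreement: probability both pick the same category at random,
    # given how often each category is used in each set of labels.
    expected = sum(actual_freq[c] * pred_freq[c] for c in actual_freq) / (n * n)
    # Note: undefined (division by zero) if expected agreement is exactly 1.
    return (agreement - expected) / (1 - expected)

# Assumed unbalanced data: 92 pass, 8 fail, and a model that always says pass.
actual = ["pass"] * 92 + ["fail"] * 8
kappa_always_pass = cohen_kappa(actual, ["pass"] * 100)

# Perfect agreement, and perfectly inverse agreement on balanced labels.
kappa_perfect = cohen_kappa(actual, actual)
kappa_inverse = cohen_kappa([1, 0, 1, 0], [0, 1, 0, 1])
```

The always-pass model has 92% accuracy but a Kappa of 0, since its agreement is exactly what chance would give for these category proportions.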

ROC:

The Receiver Operating Characteristic (ROC) curve is used when a model predicts something with two possible values (e.g., correct/incorrect, dropout/not dropout) and outputs a probability or other real value (e.g., the student will drop out with 73% probability).

Any number can be taken as the cut-off (threshold); some number of predictions (possibly none) are then classified as 1s and the rest as 0s. For a given threshold, each prediction falls into one of four cases:
True Positive (TP) - Model and the Data say 1
False Positive (FP) - Data says 0, Model says 1
True Negative (TN) - Model and the Data say 0
False Negative (FN) - Data says 1, Model says 0

The ROC curve plots the percentage of false positives (versus true negatives) on the X axis against the percentage of true positives (versus false negatives) on the Y axis. The model is good if its curve lies above the diagonal chance line.
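
The four outcomes and the points of the ROC curve can be computed as in this sketch. The labels and scores are invented, and a prediction is assumed to count as 1 when its score is at or above the threshold.

```python
def confusion(actual, scores, threshold):
    """Return (TP, FP, TN, FN) for one classification threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0  # assumption: ties count as 1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_points(actual, scores):
    """(false positive rate, true positive rate) at every distinct threshold.

    Assumes the data contain at least one 1 and at least one 0.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion(actual, scores, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

# Hypothetical data: 1 = dropped out, scores = predicted dropout probability.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
```

Every point for this toy model lies on or above the diagonal, i.e. its true positive rate is never below its false positive rate.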

A':

A' is the probability that, if the model is given one example from each category, it will accurately identify which is which. It is a close relative of ROC and is mathematically equivalent to the Wilcoxon statistic. It gives useful results, since we can compute statistical tests for:
- whether two A' values are significantly different, in the same or in different data sets.
- whether an A' value is significantly different from chance.
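
The definition of A' translates almost directly into code: compare every positive example with every negative example and count how often the model scores the positive one higher. This brute-force sketch uses invented data and counts ties as half a win, which is a common convention and an assumption here.

```python
def a_prime(actual, scores):
    """Probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(actual, scores) if y == 1]
    negatives = [s for y, s in zip(actual, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical data: 1 = dropped out, scores = predicted dropout probability.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
a = a_prime(actual, scores)

# A model that gives everyone the same score is exactly at chance.
a_chance = a_prime(actual, [0.5] * 6)
```

In the example, 8 of the 9 positive/negative pairs are ordered correctly, so A' is 8/9.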

A' Vs Kappa:

A' is more difficult to compute and works only for two categories. Its meaning is invariant across data sets, i.e., A'=0.6 is always better than A'=0.5. It is easy to interpret statistically, and its values are almost always higher than Kappa values. It also takes confidence into account.

Precision and Recall:

Precision is the probability that a data point classified as true is actually true.
Precision = TP / (TP+FP)
Recall is the probability that a data point that is actually true is classified as true.
Recall = TP / (TP+FN)
They don't take confidence into account.
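
Both formulas can be computed from binary labels as in the sketch below. The eight labels are invented, and the function assumes there is at least one predicted positive and one actual positive so that neither denominator is zero.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 means true."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    # Assumption: tp + fp > 0 and tp + fn > 0.
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical labels: the model flags 3 students, 2 of them correctly,
# and misses 2 of the 4 students who are actually positive.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(actual, predicted)
```

Because the inputs are hard 0/1 labels, a prediction made with 51% confidence and one made with 99% confidence are treated identically.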

 

Metrics for Regressors

Linear Correlation (Pearson correlation):

r(A,B) asks: when A's value changes, does B change in the same direction?
It assumes a linear relationship.
If the correlation value is
1.0 : perfectly correlated
0.0 : not correlated at all
-1.0 : perfectly negatively correlated
Between 0 and 1 : depends on the field
0.3 is good enough in education, since many factors contribute to just about any dependent measure.
Very different functions (for example, ones dominated by outliers) may also have the same correlation.
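
A short sketch of the Pearson correlation shows both the extreme values and the linearity assumption. The data are made up; the last example is a perfect but non-linear relationship.

```python
import math

def pearson_r(a, b):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])     # B rises with A
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # B falls as A rises

# B is completely determined by A (B = A squared), but not linearly.
xs = [-2, -1, 0, 1, 2]
r_curve = pearson_r(xs, [x * x for x in xs])
```

`r_curve` is 0 even though B is fully predictable from A, which is what "assumes a linear relationship" means in practice.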

R square:

R square is the correlation squared. It measures what percentage of the variance in the dependent measure is explained by a model. When predicting A with B, C, D, E, it is often used as the measure of model goodness rather than r.

MAE/MAD:

Mean Absolute Error/Deviation is the average of the absolute value of (actual value minus predicted value), i.e., the average over data points of the absolute difference between actual and predicted values. It tells the average amount by which the predictions deviate from the actual values and is very interpretable.

RMSE:

Root Mean Square Error (RMSE) is the square root of the average of (actual value minus predicted value)^2. It can be interpreted similarly to MAD, but it penalizes large deviations more than small ones. It is largely preferred to MAD. Low RMSE is good.
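
The difference between the two error measures is easiest to see on a small example. In this sketch (invented numbers) both sets of predictions have the same MAE, but one spreads the error evenly and the other concentrates it in a single large miss.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: sqrt of the average squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

actual = [10, 10, 10, 10]
even_errors = [8, 12, 8, 12]       # every prediction is off by 2
one_big_error = [10, 10, 10, 18]   # three exact, one off by 8

mae_even, rmse_even = mae(actual, even_errors), rmse(actual, even_errors)
mae_big, rmse_big = mae(actual, one_big_error), rmse(actual, one_big_error)
```

MAE is 2 in both cases, while RMSE is 2 for the even errors and 4 for the single large error.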

RMSE/MAD   Correlation   Model
Low        High          Good
High       Low           Bad
High       High          Goes in the right direction, but systematically biased
Low        Low           Values are in the right range, but doesn’t capture relative change

Information Criteria:

BiC:

Bayesian Information Criterion (BiC) makes a trade-off between goodness of fit and flexibility of fit (number of parameters). The formula for linear regression is:
BiC' = n log (1-r^2) + p log n
where n is the number of students and p is the number of variables.
If value > 0, the model is worse than expected, given the number of variables
If value < 0, the model is better than expected, given the number of variables
It can also be used to judge the significance of the difference between models (e.g., a BiC' difference of 6 implies a statistically significant difference).
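
The BiC' formula is a one-liner in Python. The sketch assumes that "log" means the natural logarithm, and the sample sizes, correlations and variable counts are invented for illustration.

```python
import math

def bic_prime(n, r, p):
    """BiC' = n log(1 - r^2) + p log n, using natural logs (an assumption).

    n: number of students, r: model correlation, p: number of variables.
    """
    return n * math.log(1 - r ** 2) + p * math.log(n)

# A decent fit with few variables: the fit term outweighs the penalty.
strong_model = bic_prime(n=100, r=0.5, p=2)

# A weak fit with more variables: the penalty term dominates.
weak_model = bic_prime(n=100, r=0.1, p=5)
```

`strong_model` is negative (better than expected for 2 variables), while `weak_model` is positive (worse than expected for 5 variables).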

AiC:

An Information Criterion/Akaike's Information Criterion (AiC) is an alternative to BiC. It makes a slightly different trade-off between goodness of fit and flexibility of fit.

Note: There is no single measure to choose between classifiers. We have to understand multiple dimensions and use multiple metrics.

Types of Validity

Generalizability:

Does your model remain predictive when used in a new data set? Generalizability underlies the cross-validation paradigm that is common in data mining. Knowing the context in which the model will be used drives the kind of generalization to be studied. Example of failure: a model of boredom built on data from 3 students fails when applied to new students.
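
If the model is meant to generalize to new students, one way to set up cross-validation is to split by student, so that no student contributes rows to both training and test data. The function below is a minimal sketch of that idea; the function name and the row layout (one student id per data row) are assumptions made for illustration.

```python
def student_level_folds(student_ids, k):
    """Split row indices into k folds so each student is in one test fold only.

    student_ids: the student id for each data row.
    Returns a list of (train_indices, test_indices) pairs.
    """
    students = sorted(set(student_ids))
    folds = []
    for i in range(k):
        # Every k-th student (starting at i) is held out in this fold.
        test_students = set(students[i::k])
        test_idx = [j for j, s in enumerate(student_ids) if s in test_students]
        train_idx = [j for j, s in enumerate(student_ids) if s not in test_students]
        folds.append((train_idx, test_idx))
    return folds

# Hypothetical data set: six rows of log data from four students.
rows = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = student_level_folds(rows, k=2)
```

Splitting at the row level instead would let the same student appear on both sides, which tests generalization to new rows rather than to new students.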

Ecological Validity:

Do your findings apply to real-life situations outside of research settings? E.g., does a behavior detector built in lab settings work in real classrooms?

Construct Validity:

Does your model actually measure what it was intended to measure? Does your model fit the training data? (provided the training data is correct)

Predictive Validity:

Does your model predict not just the present, but the future as well?

Substantive Validity:

Do your results matter?

Content Validity:

From testing: does your test cover the full domain it is meant to cover? For behavior modeling, does the model cover the full range of behavior it is intended to?

Conclusion Validity:

Are your conclusions justified based on evidence?