Saturday 20 December 2014

Competency 9.1

Competency 9.1: Identify and describe professional and research organizations that are prominent in developing learning analytics as a domain.

The Society for Learning Analytics Research (SoLAR)
  • SoLAR is an inter-disciplinary network of leading international researchers who are exploring the role and impact of analytics on teaching, training and development.  It pursues the development of research opportunities in learning analytics and educational data mining, increases the profile of learning analytics in education and advocates for learning analytics to policy makers.
Columbia University
  • Columbia University offers a programme in cognitive studies in education.  Through implementing a curriculum in this area, the university trains students in basic theories of human cognition, the practices and interpretation of empirical cognitive and developmental research, and the use of research to improve educational practices and to develop innovative methods from new technologies.
Educational Technology & Society
  • Educational Technology & Society is a quarterly journal that brings together issues affecting both the developers of educational systems and the educators who implement and manage such systems.  It actively encourages educators to proactively harness the available technologies and to influence further developments through systematic feedback and suggestions.
Educause
  • Educause is a nonprofit association and a community of IT leaders and professionals committed to advancing higher education.  It supports those who lead, manage, and use information technology to shape strategic IT decisions at every level within higher education.

Friday 19 December 2014

Competency 8

Competency 8.1: Prepare data for use in LightSIDE and use LightSIDE to extract a wide range of feature types.


Competency 8.2: Build and evaluate models using alternative feature spaces.






Competency 8.3: Compare the performance of different models.



Competency 8.4: Inspect models and interpret the weights assigned to different features as well as to reason about what these weights signify and whether they make sense.


Competency 7.4

Competency 7.4: Describe how models might be used in Learning Analytics research, specifically for the problem of assessing some reasons for attrition along the way in MOOCs.

Experience reveals that there are a couple of typical reasons why MOOC learners drop out:
  • the course content is not what the learner expected;
  • the learner finds the learning system tiresome;
  • the learner is frustrated with the learning system;
  • the learner finds the pace of the course too slow and feels bored;
  • the learner fails to keep up with the progress of the course.


All of these can lead to disengaged behaviour in the learner concerned.

In a MOOC learning environment, teachers and course leaders alike may face tens or even hundreds of thousands of learners at a time.  It is clearly extremely challenging for teachers and course leaders to pay close attention to each individual learner to assess whether they are following the course comfortably, whether any of them has trouble keeping pace with its progress, or whether any of them is feeling bored, frustrated or disengaged.



This situation can be improved with support from analytic models.  By applying predictive models to learning data, learners' behaviour can be illuminated and monitored.  Some behaviours are clues of potential attrition, such as when a learner is “gaming” the system, submits an assignment earlier than usual, or falls too far behind the course's progress.  Predictive models can help single out these clues so that teachers or course leaders are able to “find the needle in the haystack” and provide appropriate intervention to the learner in need.
Survival Modeling:
A survival model is a regression model that captures the changes in the probability of survival over time. It captures the probability at each time point, measured in terms of a hazard ratio that indicates how much more or less likely a student is to drop out. If the hazard ratio is greater than 1, the student is significantly more likely to drop out at the next time point.
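Below is a minimal sketch of a discrete-time hazard model of this kind in Python (using pandas, numpy and statsmodels). The column names such as week, forum_posts and dropped_out, and the simulated data, are purely illustrative assumptions, not taken from any real MOOC.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a person-period table: one row per student per week,
# with dropped_out = 1 if the student drops out in that week.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "week": rng.integers(1, 9, n),
    "forum_posts": rng.poisson(3, n),
})
p = 1 / (1 + np.exp(-(-2.0 + 0.1 * df["week"] - 0.4 * df["forum_posts"])))
df["dropped_out"] = rng.binomial(1, p)

# Logistic regression on the weekly hazard of dropping out.
model = smf.logit("dropped_out ~ week + forum_posts", data=df).fit(disp=0)

# Exponentiated coefficients behave like hazard ratios:
# a value above 1 means the student is more likely to drop out at the next time point.
print(np.exp(model.params))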

Sentiment analysis in MOOC forums looked at expressed sentiment and exposure to sentiment. The four independent variables (individual positivity, individual negativity, thread positivity and thread negativity) were used to predict the dependent variable, dropout. The effects were relatively weak and inconsistent across courses.

Some factors that may contribute to student attrition, such as a student's prior motivation, skills and knowledge in the area, and previous experience with MOOCs, are difficult to capture. We can link different analysis methods, such as social network analysis, text mining, predictive modeling and survey data analysis, to try to get a complete picture of an individual student and obtain more consistent results.

Competency 7.3

Competency 7.3: Use tools such as LightSIDE in a very simple way to run a text classification experiment.

I used the LightSIDE tool as explained by Dr. Carolyn to run a simple classification experiment. The tool is easy to use and straightforward if we follow the steps.


 
I got an accuracy of 58% and Kappa = 0.45 for the model given in the assignment.

Competency 7.2

Competency 7.2: Detail subareas of text mining such as collaborative learning process analysis.

Collaborative Learning Process Analysis
It is the process of analyzing students' collaborative learning processes using text mining techniques. Different indicators and language features are used for this study. Some of them are:

  • General indicators of interactivity
      • Turn length
      • Conversation length
      • Number of student questions
      • Student-to-tutor word ratio
      • Student initiative
  • Features related to cognitive processes
      • Transactivity

Familiarity with the data and the domain is important in order to understand and develop relevant features.

Competency 7.1

Competency 7.1: Describe prominent areas of text mining.

Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).

Prominent Areas of Text Mining

Information Retrieval:

Information Retrieval is the process of searching for and retrieving the required documents from a collection of documents based on a given search query. Search engines such as Google and Yahoo make use of IR techniques to match and return documents relevant to the user's query.

Document Classification/ Text Categorization:

Classification is the process of identifying the category a new observation belongs to, on the basis of a training set consisting of data with pre-defined categories (supervised learning). An example is the classification of email into spam/non-spam.
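As a rough sketch of this idea, here is a tiny spam/non-spam classifier in Python with scikit-learn; the handful of example messages is made up purely for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "cheap meds online", "meeting at 10am tomorrow",
         "please review the attached report", "free offer just for you",
         "lunch with the project team"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# Train a simple supervised classifier on pre-labelled messages.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["claim your free prize", "report for tomorrow's meeting"]))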

Clustering:

Clustering is the unsupervised counterpart of classification, where sets of similar objects are grouped into clusters. An example analysis would be the summarization of common complaints based on open-ended survey responses.
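A small sketch of such clustering with scikit-learn, grouping made-up survey complaints into two themes (the responses and the cluster count are illustrative assumptions):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = ["the videos load too slowly", "slow video playback all week",
             "video buffering is terrible", "quiz deadlines are too strict",
             "the deadline policy is confusing"]

# No pre-defined categories: k-means groups similar responses together.
X = TfidfVectorizer().fit_transform(responses)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # responses sharing a label form one complaint theme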

Trend Analysis:

Trend Analysis is the process of discovering the trends of different topics over a given period of time. It is widely applied in summarizing news events and social network trends. An example would be the prediction of stock prices based on news articles.

Sentiment Analysis:

Sentiment analysis is the process of categorizing opinions by sentiment, such as positive, negative or neutral. Sample applications include identifying sentiments in movie reviews and gaining real-time awareness of users' feedback.
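For a quick illustration, here is a sketch using NLTK's VADER lexicon to score two made-up reviews; it assumes a one-off download of the vader_lexicon resource.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

for review in ["I loved this movie, brilliant acting!",
               "Terrible plot and a complete waste of time."]:
    # compound ranges from -1 (most negative) to +1 (most positive)
    print(review, "->", sia.polarity_scores(review)["compound"])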

Thursday 18 December 2014

Competency 6.2

Competency 6.2: Learn about key diagnostic metrics and their uses.

 

Metrics for Classifiers

Accuracy:

The easiest measure of model goodness is accuracy. It is also called agreement when measuring inter-rater reliability.

Accuracy = # of agreements/ Total # of assessments

It is generally not considered a good metric across fields, since assignment to categories is often uneven, which makes accuracy misleading. For example, a kindergarten failure detector can achieve 92% accuracy in the extreme case by always saying Pass.

Kappa:

Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)

If the Kappa value is
= 0, agreement is at chance
= 1, agreement is perfect
= negative infinity, agreement is perfectly inverse
> 1, something is wrong
< 0, agreement is worse than chance
between 0 and 1, there is no absolute standard; for data-mined models, 0.3-0.5 is considered good enough for publishing.
Kappa is scaled by the proportion of each category, so it is influenced by the data set. We can compare Kappa values within the same data set, but not between two data sets.
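A small sketch in Python of the formula above on two made-up label sequences (a detector's predictions against ground truth), checked against scikit-learn's cohen_kappa_score:

from sklearn.metrics import cohen_kappa_score

truth    = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
detector = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

agreement = sum(t == d for t, d in zip(truth, detector)) / len(truth)
p1 = sum(truth) / len(truth)          # proportion of 1s in the data
q1 = sum(detector) / len(detector)    # proportion of 1s from the detector
expected = p1 * q1 + (1 - p1) * (1 - q1)

kappa = (agreement - expected) / (1 - expected)
print(round(kappa, 3), round(cohen_kappa_score(truth, detector), 3))  # both 0.583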

ROC:

The Receiver Operating Characteristic (ROC) curve is used when a model predicts something with two values (e.g. correct/incorrect, dropout/not dropout) and outputs a probability or other real value (e.g. the student will drop out with 73% probability).

It takes any number as a cut-off (threshold); some number of predictions (possibly 0) are then classified as 1s and the rest are classified as 0s. For a given classification threshold there are four possibilities:
True Positive (TP) - Model and the Data say 1
False Positive (FP) - Data says 0, Model says 1
True Negative (TN) - Model and the Data say 0
False Negative (FN) - Data says 1, Model says 0

The ROC curve has on its X axis the percentage of false positives (vs. true negatives) and on its Y axis the percentage of true positives (vs. false negatives). The model is good if its curve is above the diagonal chance line.
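A minimal sketch of an ROC curve and its area with scikit-learn, assuming a model that outputs dropout probabilities (the labels and scores below are made up):

from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.73, 0.50, 0.90]  # e.g. "drops out with 73% probability"

# Each threshold yields one (false positive rate, true positive rate) point on the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr.round(2), tpr.round(2))))
print("AUC:", roc_auc_score(y_true, y_score))  # above 0.5 means better than the chance diagonal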

A':

A' is the probability that, if the model is given an example from each category, it will accurately identify which is which. It is a close relative of ROC and mathematically equivalent to the Wilcoxon statistic. It gives useful results, since we can compute statistical tests for:
- whether two A' values are significantly different, in the same or different data sets;
- whether an A' value is significantly different from chance.

A' Vs Kappa:

A' is more difficult to compute and works only for 2 categories. Its meaning is invariant across data sets (i.e. A' = 0.6 is always better than A' = 0.5). It is easy to interpret statistically, and its values are almost always higher than Kappa values. It also takes confidence into account.

Precision and Recall:

Precision is the probability that a data point classified as true is actually true:
Precision = TP / (TP + FP)
Recall is the probability that a data point that is actually true is classified as true:
Recall = TP / (TP + FN)
Neither metric takes confidence into account.
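A short sketch computing precision, recall and accuracy on the same made-up predictions, both from the TP/FP/TN/FN counts and with scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:", tp / (tp + fp), precision_score(y_true, y_pred))
print("recall:   ", tp / (tp + fn), recall_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))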

 

Metrics for Regressors

Linear Correlation (Pearson correlation):

In r(A,B) when A's value changes, does B change in the same direction?
It assumes a linear relationship.
If the correlation value is
1.0 : perfect
0.0 : none
-1.0 : perfectly negatively correlated
between 0 and 1 : it depends on the field; 0.3 is good enough in education, since many factors contribute to any given dependent measure.
Note that very different functions (and data with outliers) may have the same correlation.

R square:

R square is the correlation squared. It is the measure of what percentage of variance in the dependent measure is explained by a model. When predicting A with B, C, D and E, it is often used as the measure of model goodness rather than r.

MAE/MAD:

Mean Absolute Error/Deviation is the average of the absolute value of the actual value minus the predicted value, i.e. the average of each data point's difference between actual and predicted value. It tells the average amount by which the predictions deviate from the actual values and is very interpretable.

RMSE:

Root Mean Square Error (RMSE) is the square root of the average of (actual value minus predicted value)^2. It can be interpreted similarly to MAD, but it penalizes large deviations more than small ones. It is generally preferred to MAD. A low RMSE is good.

RMSE/MAD | Correlation | Model
Low      | High        | Good
High     | Low         | Bad
High     | High        | Goes in the right direction, but systematically biased
Low      | Low         | Values are in the right range, but doesn’t capture relative change
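Here is a brief sketch computing the regressor metrics above (r, R square, MAE and RMSE) for a small set of made-up actual and predicted values:

import numpy as np
from scipy.stats import pearsonr

actual    = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
predicted = np.array([2.5, 5.5, 2.0, 8.0, 4.0])

r, _ = pearsonr(actual, predicted)
mae  = np.mean(np.abs(actual - predicted))          # mean absolute error/deviation
rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # penalizes large deviations more

print("r:", round(r, 3), " R^2:", round(r ** 2, 3), " MAE:", round(mae, 3), " RMSE:", round(rmse, 3))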

Information Criteria:

BiC:

Bayesian Information Criterion (BiC) makes a trade-off between goodness of fit and flexibility of fit (number of parameters). The formula for linear regression is:
BiC' = n log(1 - r^2) + p log n
where n is the number of students and p is the number of variables.
If the value is > 0, the model is worse than expected given the number of variables;
if the value is < 0, the model is better than expected given the number of variables.
It can be used to assess the significance of the difference between models (e.g. a difference of 6 implies a statistically significant difference).
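As a tiny worked example of the BiC' formula above, the sketch below compares two hypothetical models fitted to the same students; the r^2 values, student counts and variable counts are made up.

import math

def bic_prime(r_squared, n_students, n_variables):
    # BiC' = n log(1 - r^2) + p log n
    return n_students * math.log(1 - r_squared) + n_variables * math.log(n_students)

# A 3-variable model explaining 20% of variance vs. an 8-variable model explaining 24%.
print(bic_prime(0.20, n_students=200, n_variables=3))   # about -28.7
print(bic_prime(0.24, n_students=200, n_variables=8))   # about -12.5
# The lower value is better; a difference of around 6 is treated as significant.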

AiC:

An Information Criterion/Akaike's Information Criterion (AiC) is an alternative to BiC. It makes a slightly different trade-off between goodness and flexibility of fit.

Note: There is no single measure to choose between classifiers. We have to understand multiple dimensions and use multiple metrics.

Types of Validity

Generalizability:

Does your model remain predictive when used on a new data set? Generalizability underlies the cross-validation paradigm that is common in data mining. Knowing the context in which the model will be used drives the kind of generalization to be studied. Failure example: a model of boredom built on data from 3 students fails when applied to new students.

Ecological Validity:

Do your findings apply to real-life situations outside of research settings? E.g. does a behavior detector built in lab settings work in real classrooms?

Construct Validity:

Does your model actually measure what it was intended to measure? Does your model fit the training data (provided the training data is correct)?

Predictive Validity:

Does your model predict not just the present, but the future as well?

Substantive Validity:

Do your results matter?

Content Validity:

From testing: does your test cover the full domain it is meant to cover? For behavior modeling, does the model cover the full range of behavior it is intended to?

Conclusion Validity:

Are your conclusions justified based on evidence?

Competency 6.1

Competency 6.1: Learn how to engineer both features and training labels
  
Feature Engineering
Feature engineering is the art of creating predictor variables. The model will not be good if our features (predictors) are not good. It involves lore rather than well-known and validated principles. It is the “massaging” (transformation) or “finishing touch” (organisation) of raw data into features that better represent the underlying problem to a model, so that the model can learn a solution to the problem from data (or provide improved and accurate prediction of unseen data).

We need good features to highlight the constructs inherent in the data.  With good features, one can choose the “wrong” model and still get “good” results.  The flexibility of good features allows us to use less sophisticated models that are faster to run, easier to maintain and easier to understand.  We are then closer to the underlying problem, with the features more readily illuminating it.

The big idea is how we can take the voluminous, ill-formed and yet under-specified data that we now have in education and shape it into a reasonable set of variables in an efficient and predictive way.

Process:
  1. Brainstorming features - IDEO tips for brainstorming
  2. Deciding what features to create - trade-off between effort and usefulness of feature
  3. Creating the features - Excel, OpenRefine, Distillation code
  4. Studying the impact of features on model goodness
  5. Iterating on features if useful - try close variants and test
  6. Go to 3 (or 1)
Feature engineering can over-fit --> Iterate and use cross-validation, testing on held-out data or newly collected data. Thinking carefully about our variables is likely to yield better results than using pre-existing variables from a standard set.
There are different approaches to engineer features and learning labels.  Here are some classical ones:
Feature Extraction
An automatic process to reduce the dimensionality of observations into a smaller set that can be modelled.

Feature Selection
An automatic process to select the subset of features most relevant to the problem (e.g. by scoring).

Manual Construction
A manual process of crafting features in a way that exposes them to the model (e.g. combining or decomposing features to create new ones).
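Below is a compact sketch of these three approaches on a tiny made-up table of learner activity, using pandas and scikit-learn; the column names such as videos_watched and passed are hypothetical.

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.DataFrame({
    "videos_watched": [10, 3, 8, 1, 7, 2],
    "quiz_attempts":  [5, 1, 4, 0, 6, 1],
    "forum_posts":    [2, 0, 3, 0, 1, 0],
    "minutes_online": [300, 50, 260, 20, 240, 60],
    "passed":         [1, 0, 1, 0, 1, 0],
})
X, y = df.drop(columns="passed"), df["passed"]

# Manual construction: combine raw columns into a new, more meaningful feature.
X["posts_per_hour"] = X["forum_posts"] / (X["minutes_online"] / 60)

# Feature extraction: automatically compress the observations into fewer dimensions.
X_extracted = PCA(n_components=2).fit_transform(X)

# Feature selection: automatically keep the columns most related to the outcome.
X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)

print(X_extracted.shape, X_selected.shape)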

Wednesday 17 December 2014

Competency 5.2

Competency 5.2: Understand core uses of prediction modeling in education.

Prediction modelling in education is the process of applying statistical techniques to educational data with the intention of constructing a model that infers (predicts) one aspect of the data from some combination of other aspects of the data.

Educators are able to leverage insights from “prediction models” to analyse learners’ behaviour traits as they interact during the learning process and leave data behind in the databases of the learning community.  Such models, when appropriately applied, can “infer” certain aspects of the learners and provide insights that help educators to:
  • Design course content which impacts positively on learners;
  • Introduce automatic software to provide instant help/assistance to learners as and when necessary during the learning interactions;
  • Prompt teachers of at-risk students for necessary follow-up and/or intervention.

My ideas in using prediction modeling for education:

1. Predict future career path and train accordingly:
If we are able to predict the future career path of students based on their interests in subjects, we can give more field-level training. That kind of education will be more meaningful, helping students gain the skills required in the industry. Students will also be more interested in learning what they like, rather than being forced to learn something they don't prefer.

2. Provide help for weak students:
Not all students require the same amount of help to understand a subject. Some students may learn more easily than others. If we predict the different points where students may find difficulties, we can provide help in those specific areas.

3. Identify competencies:
If we can identify the competencies of students and what they lack, we can provide more guidance in those areas. For example, if a student does his work perfectly and exhibits good leadership but doesn't practice teamwork, we can guide him to develop the teamwork competency.

Competency 5.1

Competency 5.1: Learn to conduct prediction modeling effectively and appropriately.

Screenshots of the prediction modeling done on data set "CogSci-Godwinetal-2013-3" with dependent variable as ONTASK.





Competency 4.2

Competency 4.2: Describe and interpret the results of social network analysis for the study of learning.

Social Network Analysis (SNA) has been widely adopted as a scientific and empirical tool to understand the characteristics and habitat of a social environment.  Its various metrics represent insights into the behaviour of its constituents and thus of the community as a whole.
In the context of a learning environment, inferences could be drawn from the respective SNA metrics as follows:
Homophily
  • It could be interpreted as the similarity of the learners' profiles within a cluster (group) versus the others (e.g. age, race, gender, majors, etc.)
Multiplexity
  • It could be interpreted as the strength (cohesiveness) of the relationship between two learners within the learning network.
Mutuality
  • It could be interpreted as the extent to which two learners reciprocate each other in their learning interactions.
Closure
  • It could be used to understand, when one learner has another learner as a friend, to what extent this relationship could extend to other learner(s).
Propinquity
  • It could be interpreted as the impact that geographic proximity has on the degree of learning interactions.
Bridge
  • It could be seen as a measure of an individual learner filling a “structural hole” (absence of ties) in a learning network.
Centrality
  • There is a good set of measurements around this metric (e.g. “betweenness”, “closeness”, etc.) which intend to describe the influence of a particular learner on the community.
Density
  • It could be interpreted as the cohesiveness of the learners within the community and the frequency of learning interactions therein.
Distance
  • It could be seen as the length of the path where information is transmitted within the learning community.
Strength
  • It could be seen as the intensity of the relationship and interactions amongst learners within the community.
Clustering Coefficient
  • It could be interpreted as the likelihood that two learners are associated with each other.

Competency 4.1

Competency 4.1: Describe and critically reflect on approaches to the use of social network analysis for the study of learning.

Social Network Analysis (SNA) is a useful tool to sketch out and analyse a social network: its constituents, its structure and the inter-relationships of its constituents.  Learning, by and large, is an interactive activity within a community (be it a massive open online course or a classroom-based lecture).

Consolidating and analysing the data pertaining to the learning community using SNA approaches facilitates a scientific examination of the features of the community and its structure, as well as the characteristics of its participants.

These approaches greatly enhance our understanding of the learning processes and the learners, deepening our sense of what works well and what destroys value in the learning processes.  Insights can also be drawn from SNA findings to support teaching staff in providing appropriate support and/or intervention to learners.  These insights should also help educators design learning curricula that meet the knowledge state of the learners and tailor the delivery approach to suit the particular circumstances of the learning environment.
 
The impact of social network analysis on educational constructs like learning design, sense of community, creative potential, social presence, academic performance and MOOC pedagogy looks promising. The possible data sources could be discussion boards, course enrollments, Twitter and other social network data, self-reports or course design. Metrics like network density, degree centrality, eccentricity, modularity etc. help us to get an idea about the network and the individuals in it.

Learning design can affect students’ activities in a big way. Students who are familiar with a design are generally more comfortable using it. To see whether students in a course are learning as expected, we can monitor them using SNA and guide them as needed. By examining the interactions, we can see at what stage the instructor's role is more important than peer facilitation and provide help to students accordingly.

Monitoring the sense of community will be useful in identifying isolated groups or individuals who may not receive all the information. In such cases we can guide them to become part of larger communities. We can also advise students to join new groups for assignments to get connected to more students. These factors can impact the creative potential, social presence and academic performance of students if suitable help is provided. Students should also be made aware of additional ways of communicating, and of their usefulness, to help them understand the distributed structure of MOOCs and be better involved.

Educators must, nevertheless, be mindful of a key consideration when applying SNA approaches in the context of “self-help” learning.  The study of learning using SNA approaches rests on learners actually participating, and therefore “behaving”, during the learning processes.  In the worst scenario, where certain learners are “inert” in the processes, SNA approaches might be too remote to provide adequate data for the analysis and thus fail to illuminate the situation and inform the study of learning.

Competency 3.2

Competency 3.2: Perform social network analysis and visualize analysis results in Gephi.


Here are a few visualizations related to the Modularity, Closeness and Betweenness measures in Gephi:
 



 

Tuesday 16 December 2014

Competency 3.1

Competency 3.1: Define social network analysis and its main analysis methods.

Social Network Analysis (SNA) provides insights into how different social processes unfold while learning happens in any learning environment. It helps us to study the effects of interaction and social context in education. The basic network elements are actors and their relations. Social Network Analysis can be defined as the mapping of connections between nodes or actors (people) and communities (organizations), based on a number of measures.
 

The nodes/actors could be students' email addresses, tweets or similar entities. I would typically use SNA to see the interaction between students, for example in a chatroom or discussion forum: who is talking to whom, who replies to whom, who is following which question, who voted for a question, and so on. Based on the interaction patterns, we can construct the network graph. From here we can see whether any measure from the network correlates with learning or performance.
Some measures in SNA for analysis are below:

Diameter:

Diameter determines the longest distance between any pair of nodes in a network. It measures the extent to which each individual node can communicate with any other node in the network. 

Density:

Density determines the potential of the entire network to talk to each other. It can be used to determine the extent to which some individual nodes share the information. The spread of information is very fast in a highly dense network. 

Degree Centrality:

Degree centrality is a simple measure that indicates the overall number of connections for each actor in a network. Network measures may have specific meaning when considered in the context of directed graphs.
In-Degree Centrality:
In-degree centrality is a measure of the number of other nodes that directly try to establish a connection to a particular node. It also reflects the popularity or prestige of a node in a network.
Out-Degree Centrality:
Out-degree centrality is the measure of the number of nodes to which a particular node is talking.

Betweenness Centrality:

Betweenness centrality indicates the extent to which a node lies on the paths connecting other nodes, in particular connecting the small sub-communities in the network. A brokerage role is best captured by this measure.

Closeness Centrality:

Closeness centrality measures the ease, or the shortest distance, of a node to everybody else in the network. It indicates how quickly a node can reach other nodes in the network.

Network Modularity:

Network modularity is used to identify common sub-groups talking to each other, where a group of actors have close ties to one another. An algorithm for finding the giant component can be used to identify the largest component of connected nodes in the network; this filters out single nodes that are not connected to the network, making it easier to identify and analyse communities in the network.
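The sketch below computes several of these measures with the NetworkX library on a tiny hypothetical forum-reply network (an edge means one student replied to another; the names are made up):

import networkx as nx
from networkx.algorithms import community

G = nx.Graph([("Ana", "Ben"), ("Ana", "Cid"), ("Ben", "Cid"),
              ("Cid", "Dee"), ("Dee", "Eli"), ("Eli", "Fay"), ("Dee", "Fay")])

print("density:", nx.density(G))                     # how connected the whole network is
print("diameter:", nx.diameter(G))                   # longest shortest path between any pair
print("degree centrality:", nx.degree_centrality(G))
print("betweenness:", nx.betweenness_centrality(G))  # brokerage between sub-communities
print("closeness:", nx.closeness_centrality(G))
print("communities:", [sorted(c) for c in community.greedy_modularity_communities(G)])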

Monday 15 December 2014

Competency 2.3

Competency 2.3: Evaluate the impact of policy and strategic planning on systems-level deployment of learning analytics.

Learning analytics will help in real-time assessment, which in turn is focused on improving individuals' learning skills. This cannot be achieved without good leadership and proper strategic plans. Planning the systems-level deployment of learning analytics depends on the size of the data sets: we need to know what data is required to deal with analytics, whether the data is small or big, and what the role of the data is.
 
Learning analytics could create a bigger impact on learning if implemented top-down rather than bottom-up, due to the availability of big data. However, the deployment of learning analytics faces many challenges at the institutional level:
 
1. Acceptance:
To work big on big data, big support is needed from the top management. The top management should foresee the future and possibilities of learning analytics and what it can achieve. Only with promised outcomes can they be expected to support it at a big level. It is not a small change to bring about in a day.

2. Management:
A new department may be needed to manage what should be done in learning analytics. This will require funding, responsible experts, manpower and technical training. Do the institutions have what it takes to commit to this new venture?

3. Ethics:
Personal Data Protection is a growing concern these days. When data is analyzed, it has to pass through humans and systems. How safe can our data be? Could there be a possible breach in security and what could be its implication?

When we have answers for all these questions, we could probably move forward to the next era of data analytics!