Learning Analytics: Competency 6.1

Competency 6.1: Learn how to engineer both features and training labels

Feature Engineering

Feature engineering is the art of creating predictor variables. A model cannot be good if its features (predictors) are not good. The practice relies on lore rather than on well-known, validated principles. It is the “massaging” (transformation) or “finishing touch” (organisation) of raw data into features that better represent the underlying problem, so that a model can learn a solution from the data (or make more accurate predictions on unseen data).

We need good features to highlight the constructs inherent in the data. With good features, one can choose the “wrong” model and still get “good” results. The flexibility of good features allows us to use less sophisticated models that are faster to run, easier to maintain and easier to understand. Good features also bring us closer to the underlying problem, because they illuminate it more readily.

The big idea is how we can take the voluminous, ill-formed and yet under-specified data that we now have in education and shape it into a reasonable set of variables in an efficient and predictive way.

Process:

1. Brainstorming features - IDEO tips for brainstorming
2. Deciding what features to create - trade-off between effort and usefulness of feature
3. Creating the features - Excel, OpenRefine, Distillation code
4. Studying the impact of features on model goodness
5. Iterating on features if useful - try close variants and test
6. Go to 3 (or 1)

Feature engineering can over-fit, so iterate, use cross-validation, and test on held-out or newly collected data. Thinking carefully about our own variables is likely to yield better results than using pre-existing variables from a standard set.
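As a rough illustration of studying the impact of a feature with cross-validation, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the student records are invented, the nearest-centroid classifier is only a stand-in for whatever model is actually used, and the interleaved fold assignment is a simplification of proper k-fold splitting.

```python
def nearest_centroid_predict(train_x, train_y, test_x):
    """Predict the label whose class centroid is closest to each test row."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab]))
            for x in test_x]


def cross_val_accuracy(xs, ys, k=4):
    """k-fold cross-validation; row i is held out in fold i % k."""
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        preds = nearest_centroid_predict(
            [xs[i] for i in train_idx],
            [ys[i] for i in train_idx],
            [xs[i] for i in test_idx],
        )
        correct += sum(p == ys[i] for p, i in zip(preds, test_idx))
    return correct / len(xs)


# Hypothetical records: (attempts, correct answers, passed the unit?)
records = [
    (10, 9, 1), (10, 3, 0), (4, 4, 1), (20, 8, 0),
    (6, 5, 1), (16, 6, 0), (2, 2, 1), (8, 2, 0),
]
labels = [passed for _, _, passed in records]

# Feature set A: the raw counts as they come out of the log.
raw_features = [[attempts, correct] for attempts, correct, _ in records]
# Feature set B: one engineered feature, the proportion correct.
engineered_features = [[correct / attempts] for attempts, correct, _ in records]

raw_acc = cross_val_accuracy(raw_features, labels)
engineered_acc = cross_val_accuracy(engineered_features, labels)
print("raw counts:", raw_acc, "engineered ratio:", engineered_acc)
```

On this toy data the engineered ratio scores higher on the held-out folds than the raw counts, but with only eight invented rows this shows the procedure, not a finding. The same comparison should be repeated on data that played no part in designing the feature.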

There are different approaches to engineering features and training labels. Here are some classical ones:

Feature Extraction

An automatic process that reduces the dimensionality of the observations into a smaller set that can be modelled.
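One common way to do this is principal component analysis, which the notes above do not name, so treat it as an assumed example. The sketch below finds the first principal component by power iteration in plain Python and collapses three hypothetical activity counts per student into a single score. The starting vector of ones works here because the invented columns are all positively correlated; a general implementation would need a safer starting point and a convergence check.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-row scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix of the centered data (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise, which converges towards the dominant eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project each centered row onto the direction: one number per student.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical columns: videos watched, forum posts read, quizzes attempted.
rows = [
    [2, 4, 1],
    [4, 9, 2],
    [6, 11, 4],
    [8, 16, 4],
    [10, 21, 6],
]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

The single score per student can then be used as a predictor in place of the three original columns, at the cost of being harder to interpret than any one of them.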

Feature Selection

An automatic process that selects the subset of features most relevant to the problem (e.g. by scoring each feature).
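A minimal sketch of selection by scoring follows. It assumes that "scoring" means ranking each feature by the absolute Pearson correlation with the label, which is only one of many possible scoring methods, and the column names and values are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, labels, k):
    """Score every feature against the label and keep the k best names."""
    scores = {name: abs(pearson(values, labels))
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical candidate features for six students.
columns = {
    "hints_used": [0, 1, 3, 4, 5, 6],
    "minutes_online": [30, 12, 45, 20, 33, 15],
    "odd_student_id": [1, 0, 1, 1, 0, 1],
}
passed = [1, 1, 1, 0, 0, 0]

selected, scores = select_top_k(columns, passed, k=2)
print(selected)
```

Scoring features one at a time is cheap but ignores interactions between features, so a feature that scores poorly on its own may still be useful in combination with others.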

Manual Construction

A manual process of crafting features so that the relevant information is exposed to the model (e.g. combining or decomposing existing features to create new ones).
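To make combining and decomposing concrete, here is a small sketch using only the Python standard library. The log rows and field names are hypothetical. One timestamp is decomposed into two simpler features, and two counts are combined into one ratio.

```python
from datetime import datetime

# Hypothetical raw log rows; the field names are illustrative only.
raw_rows = [
    {"student": "s1", "timestamp": "2014-12-15 09:30:00", "attempts": 4, "correct": 3},
    {"student": "s1", "timestamp": "2014-12-20 23:10:00", "attempts": 5, "correct": 1},
]


def build_features(row):
    """Craft new predictors from one raw log row."""
    when = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {
        # Decomposing: one timestamp becomes several simpler features.
        "hour_of_day": when.hour,
        "is_weekend": when.weekday() >= 5,  # Monday is 0, so 5 and 6 are the weekend
        # Combining: two counts become one ratio (guarding against zero attempts).
        "accuracy": row["correct"] / row["attempts"] if row["attempts"] else 0.0,
    }


features = [build_features(row) for row in raw_rows]
for feature_row in features:
    print(feature_row)
```

Which decompositions and combinations are worth building depends on the construct being modelled, which is why this step stays manual.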

Learning Analytics

Thursday, 18 December 2014
