Big Data in Learning: The Emerging Value of Online Learning Datasets

Part 1 of Our 4 Part Series

This series is excerpted from the CogBooks white paper, “Big Data.” Each article ends with a link to the next post. You can also use the menu below to read the articles in whatever order you prefer.

Part 2: Learning Data Is Incompetent: Refocusing Education Measurement
Part 3: Ten Level Taxonomy of Data: Potential Sources of Learner Insights
Part 4: Flipped Statistics: The Changing Paradigm of Education Data

Big data, at every level of learning, reveals secrets we never imagined we could discover. It reveals things to you, the user, searcher, buyer, and learner. It also reveals things about you to sellers, ad vendors, tech giants, and educational institutions. Big data is now big business, where megabytes mean megabucks. Given that less than 2% of all information is now non-digital, it is clear where data mining will unearth its treasure: online. As we do more online (searching, buying, selling, communicating, dating, banking, socializing, and learning), we create more and more data, and that data fuels algorithms that improve with big numbers. The more you feed these algorithms, the more useful they become.

Among the most fascinating examples is Google’s success with big data in its translation service, where a trillion-word dataset feeds translations between over a dozen languages. Amazon’s recommendation engine looks at what you bought, what you didn’t buy, how long you looked at things, and which books are bought together. This data-driven engine accounts for a third of all Amazon’s sales. Netflix’s recommendation engine accounts for an astonishing three-quarters of all new orders. Target, the U.S. retailer, can tell when a customer is pregnant without the mother-to-be telling them. This led to an irate father threatening legal action when his daughter received a mail voucher for baby clothes. He returned a few days later, sheepishly apologizing!

Why Online Data Is Big Data

Online learning, by definition, is data. It can also produce data. This is one of the great advantages of being online: it is a two-way form of communication. Data has been gathered and used in online learning for many years. De facto standards even emerged to make this data interoperable, namely the Sharable Content Object Reference Model (SCORM) and, more recently, Tin Can.


However, something new has happened: a growing awareness that the data produced by online learning is much more powerful than we ever imagined. It can be gathered and used to solve all sorts of difficult problems that have long plagued education and training: formative assessment, dropout rates, course improvement, productivity, and cost reduction.

Learning Data Is Merely Large Data

We need to start with an admission: big data in learning is really just ‘large data.’ We’re not dealing with the unimaginable amounts of relevant data that Google brings to bear when you search or translate. The datasets we’re talking about come from individual learners, courses, and individual institutions; sometimes, but rarely, from groups of institutions or national tests and examinations; and, rarer still, from international tests or large complexes of institutions where the same platform is used.

It is only when you get to very large populations of learners that you get BIG data, in the sense that Mayer-Schönberger uses the term in his book Big Data. So perhaps we should be realistic about the word ‘big’ in an educational context, as it is unlikely that many organizations, other than a few large multinational private companies, will have truly ‘big’ data. Blackboard, Coursera, and others may be able to muster massive datasets, but a typical school, college, or university may not. Big data theorists are really talking about datasets that are many orders of magnitude bigger.

Nevertheless, data on learners across an institution or a number of institutions may be useful for understanding performance, possible course improvements, and dropouts. At this organizational level, it is vital that institutions gather data that is much more fine-grained than just assessment scores and the number of students who leave.
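To make ‘fine-grained’ a little more concrete, here is a minimal sketch in Python of how raw learning events might be rolled up into per-learner indicators that go beyond a final score. Everything in it is an illustrative assumption rather than anything taken from the white paper or from a particular platform: the event fields (learner, activity, seconds spent, correct or not), the sample records, and the summarize function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical event log; each record is
# (learner_id, activity, seconds_spent, answered_correctly).
events = [
    ("ana", "quiz-1", 120, True),
    ("ana", "quiz-2", 300, False),
    ("ana", "quiz-2", 180, True),
    ("ben", "quiz-1", 60, False),
]


def summarize(events):
    """Roll raw events up into simple per-learner indicators."""
    totals = defaultdict(lambda: {"attempts": 0, "seconds": 0, "correct": 0})
    for learner, _activity, seconds, correct in events:
        row = totals[learner]
        row["attempts"] += 1          # how many times the learner tried something
        row["seconds"] += seconds     # total time on task
        row["correct"] += int(correct)
    # Add an accuracy figure alongside the raw counts.
    return {
        learner: {**row, "accuracy": row["correct"] / row["attempts"]}
        for learner, row in totals.items()
    }


summary = summarize(events)
```

Even a toy roll-up like this records attempts and time on task, which a single assessment score would hide.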

Even within delivered courses, large datasets (‘large’ may be more realistic than ‘big’ in this context) may be useful in course design and delivery. When smart algorithms are applied to these datasets, real improvements can be made to course design and even to the real-time delivery of courseware.

Read Part Two: “Learning Data Is Incompetent: Refocusing Education Measurement”
