Where Dark Data Hides and How to Use It to Your Advantage

We’re taking a short break from our Stakeholders in the Modern Credential Marketplace series in this blog post to talk about Dark Data. What is Dark Data? You have probably generated some today. You weren't travelling the Silk Road or looking for a new identity. You were doing something as ordinary as checking your bank balance, looking at your medical records or digitally signing a document.

Dark Data Defined

Put simply, Dark Data is data that is acquired through various computer network operations but is never used to derive insights or inform decisions. In other words, it is data that exists but is not easily accessible, because it has not been indexed, is not searchable, or exists only in physical form in a single location.

Dark data can also be defined as all of the “unused, unknown and untapped data across an organisation, generated as a result of users’ daily interactions.” In education, this perfectly describes the hundreds of millions of records of learner performance and assessment data that get locked away inside LMSs and assessment platforms. Although this data is often valuable, it goes unused.

Globally, about 55% of an organisation’s data is considered “dark”. In education, libraries and museums, that number is often much higher. In education we often call this chaotic assessment data, but that is not entirely accurate. The data is not usually chaotic or a mess; it is usually just hidden in the dark.

According to Katrina Biscay, Director, Office of Information Security, at the University of Cincinnati: “Universities are competing with each other for enrolment numbers and research dollars. Deeper understanding of the student and faculty experience on campus can give a competitive advantage to those willing to invest in ‘dark analysis’. The embedded advantage is that this data is already possessed and stored by the organisation. The only investments necessary would be targeted analysis and business correlation.”

The good news is that with programs like Credentialate, you can not only bring this data into the light but also gather better data in the first place. Instead of requiring new assessments to demonstrate learners’ skills, Credentialate surfaces personalised evidence from existing dark data. These insights can be just as valuable for learners who are still developing their skills as for those demonstrating competencies they have already achieved.

Dark Data and Big Data

Dark Data should not be confused with Big Data. Gartner defines big data as “the high-volume and/or high-variety information assets that demand cost-effective, innovative forms of information processing.” Bringing dark data into the light, by contrast, is about using data that already exists but is hidden, or that was never captured meaningfully and in an accessible format in the first place.

But the reality is that the more Big Data we gather, the more data is at risk of going “dark”. It’s important that meaningful data isn’t allowed to go dark in the first place, which is easier said than done. In education, much of this data could be important for making better strategic decisions and establishing meaningful alternative certifications and credentials.

The same data is vital for helping employers understand the value of those alternative credentials.

So if we are going to keep this data from going dark and if we are going to use it effectively, we inevitably face certain challenges.

The Need to do a Better Job of Collecting Data

For education to take the next step towards a more robust, skills-based personal evidence record, we need to get better at creating, collecting and analysing data.

In a recent ANSI project, nearly every report emphasised the lack of good data on certifications. In reality, some of this needed data is simply dark: it exists, but is not used effectively. The problem is systemic.

Students and career counsellors need more and better data to make intelligent decisions. The same is true for education and training providers, whose jobs involve gathering data from curriculum and training and preserving it in the form of certifications. Often, they don’t have the data they need because it is never captured. In other words, the data they need to enrich certifications and share them with students and career counsellors is worse than dark: because it is not captured, it does not exist at all, which represents a significant missed opportunity.

The same is true further up the chain: policymakers, researchers and others also need more data. They need data from education and training providers, from learners and the employers who hire them, and from others in order to make data-driven policy decisions.

This data brings value to everyone involved. But like any other data system, there are challenges, and they are not small ones.


The Language of Dark Data

One of the most frequent things we talk about in relation to education and alternative credentials is language. We must speak a common language. A common skills language is enabled by Rich Skills Descriptors (RSDs) and through applying those same principles to the personal evidence record of every learner.

Language is also one of the causes of dark data. If common terms are not used consistently in daily communication, in credential and assessment data and elsewhere, the data becomes “dark”. Ideally, a shared language is in place from the beginning, from the moment the data is generated to the moment it is seen by the end user. RSDs enable this (although even without RSDs in place, Credentialate can identify and use assessment data).

RSDs help to contextualise the dark data once it is brought to light, adding greater value and meaning for the learner and those they choose to share their credentials with. An example of this is in one element of the RSD metadata – the skill statement. The skill statement is a description of the applied capabilities and behaviours of an individual for a given task, occupation, or need and defines what the RSD is all about. This enables a curriculum to connect tightly with the skills that employers need and helps match emergent and hidden talent with the employers in need of those skilled workers.

Data Openness and Privacy in Credentials

Data openness in credentials is important for data mobility, and an intentional design for data privacy is equally important. Both work to enable learner agency.

The Digital Credentials Consortium recommends optimising the credential infrastructure for openness whenever possible, as a decentralised credentialing ecosystem is more robust, scalable, and flexible. The activities on the decentralised infrastructure can be transparently observed and, in combination with internal records of the centralised components, a complete audit trail can be created. Administrators can confidently monitor credential issuance and revocation.

Central to this effort, they say, is ensuring that issuers can easily integrate the issuing tools and services into their existing Student Information Systems and workflows, something Credentialate was built for. Among others, they specify the following requirements for prioritising learner agency and privacy:

  • Issue credentials that optimise for learner flexibility and privacy – for example, credentials that are not locked in to a specific system, and privacy-enhancing measures that ensure only the learner has agency while limiting other parties who may want to exploit the learner’s data
  • Enable seamless verification without involvement of the issuer – so that learners can present their credentials for frictionless verification
  • Offer multiple options for credential storage – the learner can choose where to store and manage their credentials
  • Minimise the need for disclosure – in particular for any personally identifying information (PII)

The European Union’s General Data Protection Regulation (GDPR) can be a useful starting point with its principle of data protection by design: consider the privacy implications of every decision in the design process. This combines technology design decisions, how individuals and organisations interface with the technology, and ways to document levels of compliance with the guidelines.

In education, regulated information includes files containing Personally Identifiable Information (PII): any information that can be used to distinguish one person from another or to deanonymise previously anonymous data. Failing to adequately secure this information could expose your organisation to a data breach, regulatory fines and, ultimately, reputational damage.

As you can see, the above points are all important considerations – but if you don’t know what data you have now, how can you secure it, manage it or utilise it effectively?


Using Dark Data

So how do you use Dark Data to your advantage? The first key is to bring it into the light, or to never let it get dark in the first place:

  • Work to capture the data contained in everyday communications and machine data that informs both strategies and decisions
  • Analyse the data you already have. Ensure that dark data is gathered, indexed, searchable, and secure
  • Watch your language, and ensure everyone uses the same terminology to mean the same things
  • Constantly re-evaluate assessment and certification data to ensure that vital information does not go “dark”

The future of alternative credentials is data-dependent. Through openRSD and Credentialate, we work to improve data gathering and utilisation institution-wide.

If you haven’t checked out what we are up to, Book a Skills Chat today and see how we can help make your data work not only for you, but for learners, employers and recruiters as well.

Connect with us

As the world’s first Credential Evidence Platform, Credentialate helps you discover and share evidence of workplace skills. It creates a highly interoperable skills infrastructure that connects, collates and creates order from chaotic or dark data. It identifies skills in the curriculum, maps them to globally recognised skill definitions and aligns them to frameworks. Institutions can manage and track skills attainment across the institution and against frameworks, and see where improvements can be made. For each learner, a personal evidence record is created, as unique as every learner’s journey. Rich skills information, qualitative and quantitative performance data and links to artefacts of learning are baked into a verifiable digital badge that can be shared. This gives learners the confidence to speak to their strengths and the evidence to prove them, and boosts their employability by sending a ‘ready-to-hire’ signal.

If you’d like to learn more about Credentialate, we invite you to Learn More or Schedule a Demo.

Stakeholders in the Modern Credential Marketplace

Read more on the Edalex blog series as we explore who stakeholders in the modern credential marketplace are, what drives them and what challenges and opportunities they face:

  • Lens on Learners – how can micro-credentials help today’s learners achieve their education and employment goals?
  • Lens on Educators – if micro-credentials are driving a shift in how education providers approach skill development, how are they responding and what impact does it have?  
  • Lens on Employers – how are companies responding to the need for work-specific skills and how are micro-credentials affecting their ability to verify candidate skills?

