Although big data governance on cloud architectures uses the same principles as “regular” data governance, there are several considerations businesses should take into account in order to prevent data assets becoming data liabilities.
In any type of IT environment, it is important that policies are implemented and enforced to govern the availability, usability, integrity, and security of data. Failing to govern data effectively can turn one of a business’s major assets into a liability, because poorly governed data cannot provide the insights the business needs to increase revenue, reduce risk, and drive competitive advantage.
When a business operates in the cloud, huge volumes of data can be amassed quickly. Data governance policies have to be capable of managing the huge volumes from potentially thousands of sources so that businesses can be assured the data remains trustworthy. Without trustworthy data it is impossible to make well-informed and accurate decisions.
The exact content of data governance policies will vary from business to business depending on the nature of its operations and the industry it operates in. However, there are four basic policy types that should be the foundation of big data governance on cloud architectures. These address the structure of data, how it is accessed, how it is used, and how it is secured.
Data structure policy
The data structure policy controls the format in which data is collected, processed, and stored. The policy’s purpose is to prevent unstructured data compromising the ability to turn raw data into actionable intelligence.
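As a rough illustration of a data structure policy, the sketch below checks incoming records against a set of format rules before they are stored. The field names, patterns, and the `validate_record` helper are hypothetical and chosen only for this example; a real policy would reflect the business's own schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """One format rule for a single field (hypothetical, for illustration)."""
    name: str
    pattern: str           # regular expression the whole value must match
    required: bool = True


# Assumed example rules; not taken from any real schema.
RULES = [
    FieldRule("customer_id", r"[A-Z0-9]{8}"),          # upper-case alphanumeric
    FieldRule("email", r"[^@\s]+@[^@\s]+\.[^@\s]+"),   # loose email shape
    FieldRule("country", r"[A-Z]{2}", required=False),
]


def validate_record(record: dict) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    violations = []
    for rule in RULES:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                violations.append(f"{rule.name}: missing")
            continue
        if not isinstance(value, str) or re.fullmatch(rule.pattern, value) is None:
            violations.append(f"{rule.name}: bad format")
    return violations
```

Records that return a non-empty list could be rejected or routed to a quarantine area for review, depending on how strictly the policy is enforced.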
Data access policy
The data access policy defines which users and applications can access which types of data. Predominantly a security measure, the data access policy also aims to preserve the integrity of data and prevent unauthorized alterations.
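One simple way to express a data access policy is as a list of explicit grants with everything else denied by default. The sketch below assumes hypothetical role names, data classes, and an `is_allowed` helper; it is not tied to any particular cloud provider's access control system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """Grants a role a set of actions on one class of data (illustrative)."""
    role: str
    data_class: str
    actions: frozenset


# Assumed example grants; real policies would come from the business.
POLICY = [
    AccessRule("analyst", "sales", frozenset({"read"})),
    AccessRule("data_engineer", "sales", frozenset({"read", "write"})),
    AccessRule("compliance", "pii", frozenset({"read"})),
]


def is_allowed(role: str, data_class: str, action: str) -> bool:
    """Deny by default; allow only when an explicit rule grants the action."""
    return any(
        rule.role == role
        and rule.data_class == data_class
        and action in rule.actions
        for rule in POLICY
    )
```

Under these assumed rules, an analyst can read sales data but cannot alter it, which reflects the policy's aim of preventing unauthorized alterations as well as unauthorized access.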
Data usage policy
A data usage policy is necessary for regulatory compliance. The policy should stipulate that data can only be used for the purpose it was collected for and that, once the purpose is concluded, the data should no longer be retained.
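The retention side of a data usage policy can be sketched as a lookup from collection purpose to a maximum retention period. The purposes, periods, record layout, and the `expired_records` helper below are assumptions made for illustration; actual retention periods depend on the regulations that apply to the business.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention periods per collection purpose (illustrative values only).
RETENTION = {
    "marketing": timedelta(days=365),
    "support_ticket": timedelta(days=730),
}


def expired_records(records: list, now: datetime) -> list:
    """Return the ids of records held longer than their purpose allows.

    Each record is a dict with "id", "purpose", and "collected_at" keys.
    Records with an unknown purpose are skipped so a person can review them.
    """
    expired = []
    for rec in records:
        limit = RETENTION.get(rec["purpose"])
        if limit is None:
            continue  # unknown purpose: leave for manual review
        if now - rec["collected_at"] > limit:
            expired.append(rec["id"])
    return expired


if __name__ == "__main__":
    sample = [
        {"id": "a", "purpose": "marketing",
         "collected_at": datetime(2022, 6, 1, tzinfo=timezone.utc)},
    ]
    print(expired_records(sample, datetime.now(timezone.utc)))
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the check easy to test and to re-run for audit purposes.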
Data security policy
A separate data security policy is necessary to govern how data is stored and transmitted. For example, data containing personally identifiable information (PII) should always be encrypted both at rest and in transit.
Considerations for big data governance on cloud architectures
The four policies above are the basics of data governance regardless of how much data is being collected, processed, or stored. However, due to the way in which big data is processed (i.e. using a parallel processing model rather than a linear processing model), there are many more endpoints to govern, which increases the complexity of big data governance in cloud architectures.
Big data governance at this scale is widely considered too complex to manage manually, which is why automated governance solutions are generally recommended. An automated cloud governance solution monitors the cloud environment to ensure compliance with data policies and, when a policy violation is found, can either alert administrators to the violation or initiate a function to prevent or correct the violation.
Examples of how automated big data governance works
To automate big data governance on cloud architectures, system administrators simply configure the automated cloud governance solution with the policies they want to enforce and the measures the solution should take when a policy violation is identified. In relation to the four types of policy mentioned above, an automated cloud governance solution can:
- Ensure data is collected in the required format by managing how it is recorded (e.g. alphanumeric or case-sensitive)
- Revoke user access if an attempt is made to access data outside of working hours or via an unrecognized IP address
- Enforce data retention policies by managing archived data and automatically deleting data that no longer serves a purpose
- Initiate a function to encrypt unencrypted storage volumes or block access to publicly-accessible storage volumes
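The general pattern behind these examples, where each policy either raises an alert or triggers a corrective function, can be sketched as a small rule loop. Everything below is a simplified in-memory model: the `Volume` and `Policy` types, the policy names, and the `enforce` function are hypothetical, and a real governance solution would call the cloud provider's APIs rather than flip fields on a Python object.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Volume:
    """In-memory stand-in for a cloud storage volume (illustrative only)."""
    name: str
    encrypted: bool
    public: bool


@dataclass(frozen=True)
class Policy:
    name: str
    violated: Callable[[Volume], bool]
    # None means "alert only"; otherwise the function corrects the violation.
    remediate: Optional[Callable[[Volume], None]] = None


def encrypt(volume: Volume) -> None:
    # Placeholder for a real encryption call to the cloud provider.
    volume.encrypted = True


# Assumed example policies: one auto-remediates, one only alerts.
POLICIES = [
    Policy("volumes-encrypted", lambda v: not v.encrypted, encrypt),
    Policy("no-public-volumes", lambda v: v.public),
]


def enforce(volumes: list, policies: list = POLICIES) -> list:
    """Check every volume against every policy and return an event log."""
    events = []
    for volume in volumes:
        for policy in policies:
            if not policy.violated(volume):
                continue
            if policy.remediate is None:
                events.append(f"ALERT {policy.name}: {volume.name}")
            else:
                policy.remediate(volume)
                events.append(f"FIXED {policy.name}: {volume.name}")
    return events
```

Keeping the choice between alerting and remediating inside the policy definition means administrators can start in alert-only mode and switch individual policies to automatic correction once they trust the results.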