Git Repository Security Hygiene – Part 2: How to React When Sensitive Information Leaves Your Internal Workspace
Last time, we covered what you can proactively do to prevent sensitive information from leaving your internal workspace when you make open source commits. But what should you do when that information has already made it out onto the internet? Unfortunately, even if it was just exposed for a millisecond, someone can copy it – so you have to assume that anything you inadvertently release is potentially out there for good. You can still minimize your exposure, however.
For GitHub projects, one helpful tool is the GitHub dependency graph, found in the “Insights” section of your repository. It maps out the projects that yours depends on, as well as the projects that depend on it.
You can combine this with GitHub’s vulnerability handling process, which can automatically issue advisories when security issues are found and then propagate those advisories as alerts to all your dependent projects. Alerts appear as yellow boxes listing the kind of issue found.
These alerts can sometimes be accompanied by pull requests from a tool called Dependabot, which offers automated dependency updates. These are very helpful but need to be addressed carefully.
You should definitely review a Dependabot pull request as you would any other PR. Don’t merge it immediately, however: Dependabot is an open source tool, and you can’t be absolutely certain that it hasn’t been compromised. So, as a matter of security hygiene, use a Dependabot PR primarily as an indication that you need to take a serious look at the piece of code it flagged.
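For reference, Dependabot’s update PRs are driven by a small configuration file checked into the repository. A minimal sketch — the npm ecosystem and weekly cadence here are placeholder assumptions to adapt to your own stack:

```yaml
# .github/dependabot.yml — hypothetical minimal configuration.
# Ecosystem and schedule are placeholders; adjust for your project.
version: 2
updates:
  - package-ecosystem: "npm"   # or "pip", "maven", "cargo", ...
    directory: "/"             # where the dependency manifest lives
    schedule:
      interval: "weekly"
```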
Your company might also consider placing tokens in its contributed software. These are typically deployed to recognize and then validate specific sections of code. But you can use the same mechanism to identify problematic code associated with a particular token and prevent it from being used.
GitHub tools already exist that will automatically scan commits for tokens from services like AWS and Azure, and then invalidate those tokens when necessary, ensuring that there’s only a narrow window in which they could be exploited.
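GitHub runs that scanning server-side, but you can approximate it locally by searching every revision — not just the current worktree — for known token shapes. A sketch in a throwaway demo repository, using AWS’s documented `AKIA` access-key-ID prefix as the pattern; the file name and key are invented for the demo:

```shell
# Build a throwaway repo whose history contains a fake AWS-style key.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com && git config user.name demo
echo 'aws_key=AKIAABCDEFGHIJKLMNOP' > deploy.cfg   # fake key for the demo
git add deploy.cfg && git commit -qm "add deploy config"

# Search the content of every reachable revision, not just the worktree.
# AKIA followed by 16 uppercase alphanumerics is the AWS access-key-ID shape.
git grep -nE 'AKIA[0-9A-Z]{16}' $(git rev-list --all)
```

If a sweep like this turns up a live token, invalidate it with the issuing service first — deleting it from the repository alone does not revoke it.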
Even better, if there’s a clear pattern that you want people to be able to look for, you can put that pattern into a pre-commit hook, proactively ensuring that those tokens will never be exposed in the first place. That can work nicely for you and also benefit your community.
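As a sketch of that idea, here is a minimal pre-commit hook that rejects any commit whose staged changes match a token pattern, demonstrated in a throwaway repository; the AWS-style `AKIA` pattern and file names are stand-ins for whatever your services actually use:

```shell
# Throwaway demo repo.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com && git config user.name demo

# Install a pre-commit hook that blocks commits whose staged diff
# matches the token pattern; swap in the patterns you care about.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
PATTERN='AKIA[0-9A-Z]{16}'
if git diff --cached -U0 | grep -qE "$PATTERN"; then
  echo "pre-commit: staged changes match $PATTERN; commit blocked" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

# Stage a file containing a fake key; the hook rejects the commit.
echo 'aws_key="AKIAABCDEFGHIJKLMNOP"' > settings.ini
git add settings.ini
git commit -m "add settings" || echo "commit was rejected"
```

Note that hooks live per-clone and are not copied by `git clone`, so teams typically distribute them through a setup script or a hook-manager tool.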
Unfortunately, there’s no perfect way to automatically recover from sharing sensitive information once it is online. All tools have their pros and cons and are only as good as the effort that has been put into developing them. When you become aware of information in your project that shouldn’t be there, there really is no substitute for going back into the code and doing your best to remediate the issue yourself.
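Remediation usually means rewriting history so the sensitive file is gone from every revision — and then rotating the exposed credential anyway, since anyone who fetched earlier still has a copy. A sketch using `git filter-branch` (built into git; the separately installed `git-filter-repo` tool is the faster modern replacement), again in a throwaway repository with invented file names:

```shell
# Throwaway repo whose history contains a leaked file.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com && git config user.name demo
echo 'password=hunter2' > secrets.env
git add secrets.env && git commit -qm "oops: committed credentials"
echo 'real work' > main.txt
git add main.txt && git commit -qm "add main.txt"

# Rewrite every commit, dropping secrets.env from each tree.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secrets.env' --prune-empty -- --all

# filter-branch keeps backups under refs/original/; delete them so the
# old commits are no longer reachable, then confirm the file is gone.
git for-each-ref --format='%(refname)' refs/original/ |
  xargs -r -n1 git update-ref -d
git log --all --name-only --pretty=format: | grep -c secrets.env || echo purged
```

After a real rewrite you would force-push the cleaned branches (`git push --force --all`) and have collaborators re-clone — and rotate the leaked credential regardless.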
Of course, the ideal is to be proactive and reduce the chances of ending up in that position as much as you can: use pre-commit hooks, and scan and filter your code before merging it upstream.
Finally, here’s a table indicating where different approaches are useful:

Approach              Origination (proactive)    Reaction / maintenance
Rebase                useful                     loses value
Filtering tools       useful                     loses value
Pre-commit scripts    useful                     still useful
You’ll see that rebase and filtering tools are really only useful as proactive aids in the origination phase, but lose value after that. Pre-commit scripts, however, remain useful throughout. In the proactive phase, they prevent you from releasing problematic code or other information. In the reactive or maintenance phase, they continue to help you avoid sharing sensitive information when you’re creating your fixed commits.