Microsoft’s AI research division inadvertently exposed sensitive company data while sharing open-source material with the research community. The incident underscores the risks of sharing and accessing large data sets, especially through Azure Storage’s Shared Access Signature (SAS) tokens.
The Genesis: Open-Source Code and the Unexpected Link
The Microsoft AI research team, aiming to contribute to the broader research community, published open-source code and AI models for image recognition in a public GitHub repository. However, a link embedded in those files inadvertently gave access to 38TB of private company data.
- Content exposed included backups of Microsoft employees’ computers, revealing passwords to Microsoft services, secret keys, and over 30,000 internal Teams messages from hundreds of the company’s employees.
- The link was intended to allow researchers to download pretrained models. Microsoft’s use of an Azure feature, “Shared Access Signature (SAS) tokens,” enabled them to create shareable links providing access to their Azure Storage accounts.
- While these tokens can be scoped to specific data sets or files, in this case the link gave unrestricted access to the full storage account. Because the token also carried write permissions, anyone holding the link could upload, overwrite, or delete files, not just read them.
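The difference between a narrowly scoped link and an account-wide one is visible in the SAS URL itself, because the scope and permissions travel as query parameters. The sketch below is only an illustration: the URLs, the `describe_sas` helper, and the placeholder signatures are assumptions made for this example, and it reads the token’s fields without validating them against Azure.

```python
from urllib.parse import parse_qs, urlsplit

# Permission letters that can appear in the `sp` (signed permissions) field.
PERMISSIONS = {
    "r": "read", "w": "write", "d": "delete", "l": "list",
    "a": "add", "c": "create", "u": "update", "p": "process",
}
MUTATING = {"w", "d", "a", "c", "u"}


def describe_sas(url: str) -> dict:
    """Summarise what a SAS URL grants, based only on its query string."""
    query = parse_qs(urlsplit(url).query)
    letters = query.get("sp", [""])[0]
    return {
        # Account SAS tokens carry `srt`; service SAS tokens carry `sr`.
        "scope": "account" if "srt" in query else "service",
        "permissions": [PERMISSIONS.get(ch, ch) for ch in letters],
        "can_modify": any(ch in MUTATING for ch in letters),
        "expires": query.get("se", [None])[0],
    }


# Hypothetical URLs; the signatures are placeholders, not real credentials.
read_only = (
    "https://example.blob.core.windows.net/models/resnet.pt"
    "?sv=2020-08-04&sr=b&sp=r&se=2023-07-01T00:00:00Z&sig=PLACEHOLDER"
)
full_control = (
    "https://example.blob.core.windows.net/"
    "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2050-01-01T00:00:00Z&sig=PLACEHOLDER"
)

print(describe_sas(read_only))
print(describe_sas(full_control))
```

The first URL is limited to reading a single blob, while the second covers the whole account and allows modification, which is the kind of link described above.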
Discovery, Reporting, and Immediate Response
Wiz, a cybersecurity firm, uncovered the security lapse on June 22, 2023. Its researchers traced the exposure to an overly permissive SAS token that opened up the entire Azure storage account.
- Wiz reported the issue to Microsoft the same day. By June 24, 2023, Microsoft had revoked the SAS token, cutting off external access to the exposed storage account.
- Microsoft confirmed that no customer data was jeopardized during the incident, and internal services remained uncompromised.
Root Cause and Previous Incidents
This isn’t the first time Microsoft has faced such challenges. In September 2022, SOCRadar, a threat intelligence firm, discovered another misconfigured Azure Blob Storage bucket linked to Microsoft. This contained sensitive data spanning from 2017 to August 2022, associated with a significant number of entities worldwide.
- The recurring problem with SAS tokens stems from their complexity and from the fact that they can be issued with expiry dates far in the future, making them effectively permanent.
- Wiz’s analysis notes that SAS tokens are hard to monitor and revoke, in part because they are generated on the client side and are not tracked centrally by Azure. Wiz therefore recommends limiting their use as much as possible.
- BleepingComputer previously reported on the risks of using Azure features without stringent checks.
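One way to act on the concern about tokens that last indefinitely is to check a link’s expiry before sharing it. The following is a minimal sketch, assuming the expiry is carried in the `se` query parameter as a UTC timestamp; the seven-day threshold and the helper names are illustrative choices, not recommendations from Microsoft or Wiz.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit

MAX_LIFETIME_DAYS = 7  # assumed policy threshold for this example only


def sas_expiry(url: str) -> Optional[datetime]:
    """Return the token's `se` (signed expiry) value as an aware UTC datetime."""
    values = parse_qs(urlsplit(url).query).get("se")
    if not values:
        return None
    # Older Python versions reject a trailing "Z", so normalise it first.
    parsed = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def is_long_lived(url: str, now: datetime, max_days: int = MAX_LIFETIME_DAYS) -> bool:
    """True when the expiry cannot be read from the URL or exceeds `max_days`."""
    expiry = sas_expiry(url)
    if expiry is None:
        return True  # nothing to judge from the URL alone, so flag for review
    return (expiry - now).days > max_days


# Hypothetical link with a placeholder signature.
link = "https://example.blob.core.windows.net/c/b?sp=r&se=2050-01-01T00:00:00Z&sig=PLACEHOLDER"
print(is_long_lived(link, datetime.now(timezone.utc)))
```

A check like this only inspects the link; it cannot revoke a token that has already been published.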
Microsoft’s Assurances and Further Action
Microsoft has been proactive in addressing the concerns of its user base and the broader tech community.
- The company reassured its customers that their data was not exposed and that internal services remained intact.
- They acknowledged the SAS token’s role in the incident and the need for stringent configurations when using such features.
- Microsoft has since revisited its security mechanisms. The company scans its public repositories for exposed credentials and secrets. Its scanning had flagged this particular link but wrongly marked it as a “false positive,” and Microsoft says it has since extended the system to detect overly permissive SAS tokens.
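To make the idea of scanning for overly permissive SAS tokens concrete, here is a rough sketch of how such a check could work on a piece of text. It is not Microsoft’s scanner: the regular expression, the `find_risky_sas` helper, and the rule for what counts as risky (account-wide scope or any modifying permission) are all assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Characters allowed inside a URL for the purposes of this rough pattern.
URL_CHAR = r"[^\s<>()\"']"
# Azure Storage URLs whose query string carries a SAS signature (`sig=`).
SAS_URL = re.compile(
    r"https://[\w-]+\.(?:blob|file|queue|table|dfs)\.core\.windows\.net/"
    + URL_CHAR + r"*\?" + URL_CHAR + r"*\bsig=" + URL_CHAR + "+"
)


def find_risky_sas(text: str) -> list:
    """Return SAS URLs in `text` that are account-wide or allow modification."""
    risky = []
    for match in SAS_URL.finditer(text):
        query = parse_qs(urlsplit(match.group()).query)
        permissions = set(query.get("sp", [""])[0])
        account_wide = "srt" in query  # `srt` only appears on account SAS tokens
        if account_wide or permissions & set("wdacu"):
            risky.append(match.group())
    return risky


# Hypothetical README content; signatures are placeholders.
sample = """
Download the model: https://example.blob.core.windows.net/models/net.pt?sv=2020-08-04&sr=b&sp=r&se=2023-07-01&sig=PLACEHOLDER
Old link: https://example.blob.core.windows.net/?sv=2020-08-04&ss=b&srt=sco&sp=rwdl&se=2050-01-01&sig=PLACEHOLDER
"""
print(find_risky_sas(sample))
```

In this sample only the second link is reported, since the first is a read-only link to a single file.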
Future Safeguards
Given the increasing importance of AI and the vast data sets it requires, such incidents underscore the necessity for enhanced security protocols. Microsoft has taken this as a learning opportunity, issuing guidelines on best practices for handling SAS tokens with the aim of preventing a recurrence.
In conclusion, while AI offers great potential for innovation and growth, the accompanying data security challenges cannot be ignored. Both tech giants and the larger community must ensure that the rush to advance AI doesn’t come at the expense of data integrity and security. Balancing innovation with stringent cybersecurity measures remains essential.