51 changes: 44 additions & 7 deletions Instructions.md
@@ -8,6 +8,7 @@ This guide walks you through everything needed to connect your GitHub organizati

- [Setup Purview \& GitHub](#setup-purview--github)
- [Table of Contents](#table-of-contents)
- [Architecture Overview](#architecture-overview)
- [1. Prerequisites](#1-prerequisites)
- [2. Create an Entra App Registration \& Add Permissions](#2-create-an-entra-app-registration--add-permissions)
- [Add API Permissions](#add-api-permissions)
@@ -33,6 +34,44 @@ This guide walks you through everything needed to connect your GitHub organizati

---

## Architecture Overview

At a high level, this guide wires three systems together — **GitHub**, **Microsoft Entra ID**, and **Microsoft Purview** — so that file changes in your repos are automatically scanned for compliance. The diagram below shows what you're setting up and how the pieces interact at runtime.

```mermaid
flowchart LR
    Dev[Developer] -->|push / pull request| Repo[Target Repository]

    subgraph GH[GitHub Organization]
        Repo
        Ruleset[(Org Ruleset<br/>requires Purview workflow)]
        WFRepo[Workflow Repo<br/>purview-workflow]
        Action[Purview GitHub Action]
        Secrets[(Org Secrets:<br/>AZURE_CLIENT_ID,<br/>TENANT_ID, CERT/SECRET)]
    end

    Ruleset -.enforces.-> Repo
    Repo -->|triggers required workflow| WFRepo
    WFRepo -->|runs| Action
    Secrets -.provides credentials.-> Action

    Action -->|authenticate<br/>cert / secret / OIDC| Entra[Microsoft Entra ID<br/>App Registration]
    Entra -->|access token| Action

    Action -->|scan files &<br/>send activity| Purview[Microsoft Purview]
```

**What you're configuring in this guide:**

1. An **Entra App Registration** with Microsoft Graph permissions — gives the action an identity to call Purview.
2. A **Workflow Repository** that hosts the reusable scan workflow — referenced by every repo that needs scanning.
3. **GitHub Secrets** (org-level) holding the Entra credentials and an optional state-repo token.
4. An **Organization Ruleset** that makes the Purview scan a required check on protected branches.

At runtime, a push or PR triggers the required workflow. The action then authenticates to Entra, scans the changed files, and sends the results to Purview.

---

## 1. Prerequisites

Before you begin, make sure you have the following:
@@ -83,7 +122,7 @@ Choose one of the following authentication methods for the action to authenticat
3. Provide a description (e.g., `Purview GitHub Action`) and select an expiration period.
4. Click **Add**.
5. Copy the **Value** of the newly created secret immediately — it will not be shown again.
6. Store this value as a GitHub secret named `AZURE_CLIENT_SECRET` (see [Step 5](#5-add-github-secrets)).
6. Store this value as a GitHub secret named `AZURE_CLIENT_SECRET` (see [Step 5](#5-add-github-secrets)), then pass it via the `client-secret` action input.

---

@@ -138,7 +177,7 @@ Each organization that uses the Purview action needs the following secrets confi
| `AZURE_CLIENT_ID` | The **Application (client) ID** from your Entra App Registration | Yes |
| `AZURE_TENANT_ID` | The **Directory (tenant) ID** from your Entra App Registration | Yes |
| `AZURE_CLIENT_CERTIFICATE` | Full PEM file contents (private key + certificate) — **only if using certificate auth** | Conditional |
| `AZURE_CLIENT_SECRET` | The **Client Secret** value from your Entra App Registration — **only if using client-secret auth** | Conditional |
| `AZURE_CLIENT_SECRET` | The **Client Secret** value from your Entra App Registration — **only if using client-secret auth** (passed via the `client-secret` action input) | Conditional |
| `STATE_REPO_TOKEN` | A **Personal Access Token** or **Fine-Grained Token** with `contents:write` to the workflow repository (for state tracking) | Optional |

3. Set the **Repository access** policy to grant access to the repositories that will run the workflow.
@@ -344,8 +383,6 @@ jobs:
uses: PersonalPurview/purview-github-action@main
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# ── (Client-Secret auth only) Uncomment the line below ──
# AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
with:
# ══════════════════════════════════════
# Required inputs
@@ -356,9 +393,9 @@ jobs:
# ── Authentication (choose one) ──
# Certificate auth: provide the PEM secret
client-certificate: ${{ secrets.AZURE_CLIENT_CERTIFICATE }}
# Client-secret auth: remove the line above and pass the secret via the
# AZURE_CLIENT_SECRET env variable (see env section above).
# OIDC federated auth: remove the line above and configure federated
# Client-secret auth: remove the line above and uncomment the line below
# client-secret: ${{ secrets.AZURE_CLIENT_SECRET }}
# OIDC federated auth: remove both lines above and configure federated
# credentials in your Entra App Registration (see Appendix).

# ══════════════════════════════════════
1 change: 1 addition & 0 deletions README.md
@@ -90,6 +90,7 @@ For each commit author, the action checks the email against the `users` array. I
|-------|-------------|----------|---------|
| `client-id` | Azure AD application client ID | Yes | - |
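| `client-certificate` | PEM containing private key + certificate for certificate-based auth. Takes priority over `client-secret`. If omitted, the action falls back to `client-secret`, then to GitHub OIDC federated credentials. | No | - |
| `client-secret` | Azure AD application client secret for secret-based auth, used only when `client-certificate` is not provided. If both are omitted, the action uses GitHub OIDC federated credentials. | No | - |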
| `tenant-id` | Azure AD tenant ID | Yes | - |
| `users-json-path` | Path to `users.json` in the workflow-definition repo (relative to repo root). In cross-repo workflows the file is fetched via the GitHub API using `state-repo-token`. | No | `users.json` |
| `purview-account-name` | Name of the Purview account | No | - |
3 changes: 3 additions & 0 deletions action.yml
@@ -9,6 +9,9 @@ inputs:
client-certificate:
description: 'Optional PEM containing private key + certificate. If provided, certificate auth is used (takes priority over client-secret). If omitted, falls back to client-secret or GitHub OIDC federated credentials.'
required: false
client-secret:
description: 'Azure AD application client secret. If provided (and client-certificate is not), client-secret auth is used. If omitted, falls back to GitHub OIDC federated credentials.'
required: false
tenant-id:
description: 'Azure AD tenant ID'
required: true
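
The two input descriptions above imply an order of precedence among the three authentication methods: certificate first, then client secret, then OIDC. A minimal sketch of that order follows. The names `AuthMode` and `resolveAuthMode` are hypothetical and do not appear in the action's source; treating whitespace-only values as "not provided" is also an assumption.

```typescript
// Hypothetical type and function names, for illustration only.
type AuthMode = 'certificate' | 'client-secret' | 'oidc';

function resolveAuthMode(clientCertificate?: string, clientSecret?: string): AuthMode {
  // Certificate auth takes priority when both credentials are supplied.
  if (clientCertificate && clientCertificate.trim() !== '') {
    return 'certificate';
  }
  // The client secret is used only when no certificate is provided.
  if (clientSecret && clientSecret.trim() !== '') {
    return 'client-secret';
  }
  // Neither input provided: fall back to GitHub OIDC federated credentials.
  return 'oidc';
}

console.log(resolveAuthMode('-----BEGIN PRIVATE KEY-----', 'secret')); // certificate
console.log(resolveAuthMode('', 'secret'));                            // client-secret
console.log(resolveAuthMode());                                        // oidc
```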
26 changes: 10 additions & 16 deletions dist/index.js
@@ -62044,7 +62044,7 @@ class PurviewClient {
}
this.logger.info(`Processing content asynchronously.`);
const endpoint = `${this.baseUrl}/security/dataSecurityAndGovernance/processContentAsync`;
let payloadString = JSON.stringify(payload, this.jsonReplacer);
let payloadString = JSON.stringify(payload);
try {
const result = await this.retryHandler.executeWithRetry(async () => this.sendRequest(endpoint, payloadString, 'POST', {}, 'ProcessContentAsync'), 'ProcessContentAsync');
return result;
@@ -62060,7 +62060,7 @@ class PurviewClient {
}
this.logger.info(`Processing content for user ${userId} (mode: ${inline ? 'inline' : 'offline'})`);
const endpoint = `${this.baseUrl}/users/${userId}/dataSecurityAndGovernance/processContent`;
const payloadString = JSON.stringify(request, this.jsonReplacer);
const payloadString = JSON.stringify(request);
const additionalHeaders = {};
if (scopeIdentifier) {
additionalHeaders['If-None-Match'] = scopeIdentifier;
@@ -62083,7 +62083,7 @@ class PurviewClient {
}
this.logger.debug(`Uploading signal for ${payload.contentMetadata.contentEntries[0]?.identifier}`);
const endpoint = `${this.baseUrl}/users/${payload.userId}/dataSecurityAndGovernance/activities/contentActivities`;
let payloadString = JSON.stringify(payload, this.jsonReplacer);
let payloadString = JSON.stringify(payload);
try {
const result = await this.retryHandler.executeWithRetry(async () => this.sendRequest(endpoint, payloadString, 'POST', {}, 'UploadSignal'), 'UploadSignal');
return result;
@@ -62099,7 +62099,7 @@ class PurviewClient {
}
this.logger.info(`Searching tenant protection scope`);
const endpoint = `${this.baseUrl}/security/dataSecurityAndGovernance/protectionScopes/compute`;
let payloadString = JSON.stringify(payload, this.jsonReplacer);
let payloadString = JSON.stringify(payload);
try {
const result = await this.retryHandler.executeWithRetry(async () => this.sendRequest(endpoint, payloadString, 'POST', {}, 'SearchTenantProtectionScope'), 'SearchTenantProtectionScope');
const scopeCount = result.data?.value?.length ?? 0;
@@ -62117,7 +62117,7 @@ class PurviewClient {
}
this.logger.info(`Searching protection scope for user ${userId}`);
const endpoint = `${this.baseUrl}/users/${userId}/dataSecurityAndGovernance/protectionScopes/compute`;
let payloadString = JSON.stringify(payload, this.jsonReplacer);
let payloadString = JSON.stringify(payload);
try {
const result = await this.retryHandler.executeWithRetry(async () => this.sendRequest(endpoint, payloadString, 'POST', {}, 'SearchUserProtectionScope'), 'SearchUserProtectionScope');
const scopeCount = result.data?.value?.length ?? 0;
@@ -62240,13 +62240,6 @@ class PurviewClient {
throw error;
}
}
jsonReplacer(_key, value) {
// Remove sensitive data from logs
if (typeof value === 'string' && value.length > 1000) {
return value.substring(0, 100) + '... [truncated in logs]';
}
return value;
}
buildErrorResponse(error) {
const message = error instanceof Error ? error.message : 'Unknown error';
const statusCode = error?.statusCode;
@@ -64993,6 +64986,7 @@ class GitHubActionsRunner {
// by getWorkflowRun belongs to the *external* workflow-definition repo,
// not the target repo. listWorkflowRuns would 404 in that case.
const perPage = 10;
const maxRuns = 20;
let page = 1;
let totalFetched = 0;
while (true) {
@@ -65017,7 +65011,7 @@ class GitHubActionsRunner {
}
totalFetched += runs.workflow_runs.length;
this.logger.info(`Checked ${totalFetched} run(s) so far (${matchingRuns.length} matched workflow), no match in commit list yet`);
if (totalFetched >= runs.total_count) {
if (totalFetched >= runs.total_count || totalFetched >= maxRuns) {
break;
}
page++;
@@ -65179,11 +65173,11 @@ async function validateInputs() {
if (clientCertificatePem) {
validateClientCertificatePem(clientCertificatePem);
}
// Client secret is read from the AZURE_CLIENT_SECRET environment variable.
// Client secret is read from the client-secret action input.
// Certificate auth takes priority when both are provided.
const clientSecret = (process.env['AZURE_CLIENT_SECRET'] || '').trim() || undefined;
const clientSecret = (getInput('client-secret', { required: false }) || '').trim() || undefined;
if (clientCertificatePem && clientSecret) {
logger.info('Both client-certificate and AZURE_CLIENT_SECRET are provided; certificate auth takes priority.');
logger.info('Both client-certificate and client-secret are provided; certificate auth takes priority.');
}
// Get optional inputs
const filePatterns = getInput('file-patterns') || '**';
2 changes: 2 additions & 0 deletions sample/.github/workflows/purview-scan.yml
@@ -7,6 +7,7 @@ on:

permissions:
id-token: write
# Must be `contents: write` for the action to comment on a commit
contents: read
pull-requests: write
actions: read
@@ -34,6 +35,7 @@ jobs:

# Authentication (omit to use GitHub OIDC federated credentials)
client-certificate: ${{ secrets.AZURE_CLIENT_CERTIFICATE }}
# client-secret: ${{ secrets.AZURE_CLIENT_SECRET }} # Use instead of client-certificate for secret-based auth

# User mapping
users-json-path: 'users.json'
1 change: 1 addition & 0 deletions sample/README.md
@@ -20,6 +20,7 @@ See [.github/workflows/purview-scan.yml](.github/workflows/purview-scan.yml). It
| Parameter | Description | Default |
|-----------|-------------|---------|
| `client-certificate` | PEM containing private key + certificate. If omitted, GitHub OIDC federated credentials are used. | — |
| `client-secret` | Azure AD application client secret. If omitted, GitHub OIDC federated credentials are used. | — |
| `users-json-path` | Path to `users.json` (relative to workspace root) that maps commit author emails to Azure AD user IDs. | `users.json` |
| `file-patterns` | Comma-separated glob patterns of files to scan. | `**` |
| `exclude-patterns` | Comma-separated glob patterns of files to exclude. | `**/.git/**` |
18 changes: 5 additions & 13 deletions src/api/purviewClient.ts
@@ -55,7 +55,7 @@ export class PurviewClient {
this.logger.info(`Processing content asynchronously.`);

const endpoint = `${this.baseUrl}/security/dataSecurityAndGovernance/processContentAsync`;
let payloadString: string = JSON.stringify(payload, this.jsonReplacer);
let payloadString: string = JSON.stringify(payload);

try {
const result = await this.retryHandler.executeWithRetry(
@@ -78,7 +78,7 @@ export class PurviewClient {
this.logger.info(`Processing content for user ${userId} (mode: ${inline ? 'inline' : 'offline'})`);

const endpoint = `${this.baseUrl}/users/${userId}/dataSecurityAndGovernance/processContent`;
const payloadString: string = JSON.stringify(request, this.jsonReplacer);
const payloadString: string = JSON.stringify(request);

const additionalHeaders: Record<string, string> = {};
if (scopeIdentifier) {
@@ -109,7 +109,7 @@ export class PurviewClient {
this.logger.debug(`Uploading signal for ${payload.contentMetadata.contentEntries[0]?.identifier}`);

const endpoint = `${this.baseUrl}/users/${payload.userId}/dataSecurityAndGovernance/activities/contentActivities`;
let payloadString: string = JSON.stringify(payload, this.jsonReplacer);
let payloadString: string = JSON.stringify(payload);

try {
const result = await this.retryHandler.executeWithRetry(
@@ -132,7 +132,7 @@ export class PurviewClient {
this.logger.info(`Searching tenant protection scope`);

const endpoint = `${this.baseUrl}/security/dataSecurityAndGovernance/protectionScopes/compute`;
let payloadString: string = JSON.stringify(payload, this.jsonReplacer);
let payloadString: string = JSON.stringify(payload);

try {
const result = await this.retryHandler.executeWithRetry(
@@ -158,7 +158,7 @@ export class PurviewClient {
this.logger.info(`Searching protection scope for user ${userId}`);

const endpoint = `${this.baseUrl}/users/${userId}/dataSecurityAndGovernance/protectionScopes/compute`;
let payloadString: string = JSON.stringify(payload, this.jsonReplacer);
let payloadString: string = JSON.stringify(payload);

try {
const result = await this.retryHandler.executeWithRetry(
@@ -304,14 +304,6 @@ export class PurviewClient {
}
}

private jsonReplacer(_key: string, value: any): any {
// Remove sensitive data from logs
if (typeof value === 'string' && value.length > 1000) {
return value.substring(0, 100) + '... [truncated in logs]';
}
return value;
}

private buildErrorResponse(error: unknown): ApiResponse {
const message = error instanceof Error ? error.message : 'Unknown error';
const statusCode = (error as any)?.statusCode as number | undefined;
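
The diff does not state why `jsonReplacer` was removed. One observable effect is that the replacer was applied to the string passed to `sendRequest`, so any string value longer than 1000 characters was shortened in the request body itself, not only in log output. The sketch below reproduces the removed logic as a standalone function; the name `truncatingReplacer` and the payload shape are hypothetical.

```typescript
// Same logic as the removed jsonReplacer, reproduced as a standalone function.
function truncatingReplacer(_key: string, value: unknown): unknown {
  if (typeof value === 'string' && value.length > 1000) {
    return value.substring(0, 100) + '... [truncated in logs]';
  }
  return value;
}

// Hypothetical payload shape, used only for this illustration.
const payload = { identifier: 'README.md', content: 'x'.repeat(5000) };

// Serialize the same payload with and without the replacer, then parse it back.
const withReplacer = JSON.parse(JSON.stringify(payload, truncatingReplacer)) as typeof payload;
const withoutReplacer = JSON.parse(JSON.stringify(payload)) as typeof payload;

console.log(withReplacer.content.length);    // 123 (100 kept + 23-character marker)
console.log(withoutReplacer.content.length); // 5000
```

With the replacer in place, the serialized `content` carries only the first 100 characters plus the marker text. Calling `JSON.stringify(payload)` with no replacer preserves the full value.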
3 changes: 2 additions & 1 deletion src/runner/gitHubActionsRunner.ts
@@ -619,6 +619,7 @@ export class GitHubActionsRunner {
// by getWorkflowRun belongs to the *external* workflow-definition repo,
// not the target repo. listWorkflowRuns would 404 in that case.
const perPage = 10;
const maxRuns = 20;
let page = 1;
let totalFetched = 0;

@@ -649,7 +650,7 @@ export class GitHubActionsRunner {
totalFetched += runs.workflow_runs.length;
this.logger.info(`Checked ${totalFetched} run(s) so far (${matchingRuns.length} matched workflow), no match in commit list yet`);

if (totalFetched >= runs.total_count) {
if (totalFetched >= runs.total_count || totalFetched >= maxRuns) {
break;
}

6 changes: 3 additions & 3 deletions src/validation/inputValidator.ts
@@ -151,11 +151,11 @@ export async function validateInputs(): Promise<ActionConfig> {
validateClientCertificatePem(clientCertificatePem);
}

// Client secret is read from the AZURE_CLIENT_SECRET environment variable.
// Client secret is read from the client-secret action input.
// Certificate auth takes priority when both are provided.
const clientSecret = (process.env['AZURE_CLIENT_SECRET'] || '').trim() || undefined;
const clientSecret = (core.getInput('client-secret', { required: false }) || '').trim() || undefined;
if (clientCertificatePem && clientSecret) {
logger.info('Both client-certificate and AZURE_CLIENT_SECRET are provided; certificate auth takes priority.');
logger.info('Both client-certificate and client-secret are provided; certificate auth takes priority.');
}

// Get optional inputs
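
The `(… || '').trim() || undefined` expression used for `clientSecret` collapses every "not provided" case (missing, empty, or whitespace-only) into `undefined`, which is what the later `if (clientCertificatePem && clientSecret)` check relies on. A small sketch of the same normalization, using a hypothetical helper name `normalizeOptionalInput` that is not part of the action:

```typescript
// Hypothetical helper mirroring the expression used for clientSecret above.
function normalizeOptionalInput(raw: string | undefined): string | undefined {
  // Empty or whitespace-only values become undefined; others are trimmed.
  return (raw || '').trim() || undefined;
}

console.log(normalizeOptionalInput('  my-super-secret  ')); // my-super-secret
console.log(normalizeOptionalInput('   '));                 // undefined
console.log(normalizeOptionalInput(undefined));             // undefined
```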
11 changes: 5 additions & 6 deletions tests/validation/inputValidator.test.ts
@@ -42,6 +42,7 @@ describe('inputValidator', () => {
const defaults: Record<string, string> = {
'client-id': validGuid,
'client-certificate': '',
'client-secret': '',
'tenant-id': validTenantId,
'purview-account-name': 'test-account',
'purview-endpoint': 'https://graph.microsoft.com/v1.0',
@@ -72,7 +73,6 @@ describe('inputValidator', () => {
jest.clearAllMocks();
delete process.env['GITHUB_WORKFLOW_REF'];
delete process.env['GITHUB_WORKSPACE'];
delete process.env['AZURE_CLIENT_SECRET'];
fs.writeFileSync(usersJsonPath, JSON.stringify(validUsersJson), 'utf-8');
setupInputMocks();
});
@@ -178,13 +178,13 @@ describe('inputValidator', () => {
await expect(validateInputs()).rejects.toThrow(/state-repo-branch.*state-repo-token/);
});

it('reads AZURE_CLIENT_SECRET from environment variable', async () => {
process.env['AZURE_CLIENT_SECRET'] = 'my-super-secret';
it('reads client-secret from action input', async () => {
setupInputMocks({ 'client-secret': 'my-super-secret' });
const config = await validateInputs();
expect(config.clientSecret).toBe('my-super-secret');
});

it('sets clientSecret to undefined when AZURE_CLIENT_SECRET is not set', async () => {
it('sets clientSecret to undefined when client-secret is not set', async () => {
const config = await validateInputs();
expect(config.clientSecret).toBeUndefined();
});
@@ -198,8 +198,7 @@ describe('inputValidator', () => {
'MIIBvAIBADANBgk...',
'-----END PRIVATE KEY-----',
].join('\n');
setupInputMocks({ 'client-certificate': validPem });
process.env['AZURE_CLIENT_SECRET'] = 'my-secret';
setupInputMocks({ 'client-certificate': validPem, 'client-secret': 'my-secret' });
const config = await validateInputs();
expect(config.clientCertificatePem).toBe(validPem);
expect(config.clientSecret).toBe('my-secret');