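Transcribe short audio files

This page shows how to transcribe a short audio file to text using synchronous speech recognition.

Synchronous speech recognition returns the recognized text for short audio (less than 60 seconds).

You can send audio content to Speech-to-Text directly from a local file, or Speech-to-Text can process audio content stored in a Cloud Storage bucket. For the limits on synchronous speech recognition requests, see the quotas and limits page.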
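Because the synchronous method only applies to audio shorter than 60 seconds, it can help to check a file's duration before sending it. The following is a minimal sketch that assumes the audio is an uncompressed PCM WAV file, which Python's standard `wave` module can read; other formats would need a different tool. The names `SYNC_LIMIT_SECONDS`, `wav_duration_seconds`, and `fits_sync_recognition` are illustrative and not part of any Google library.

```python
import wave

# Assumption: the 60-second bound stated on this page for synchronous recognition.
SYNC_LIMIT_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def fits_sync_recognition(path: str) -> bool:
    """Return True if the WAV file is shorter than the synchronous limit."""
    return wav_duration_seconds(path) < SYNC_LIMIT_SECONDS
```

A file that fails this check is a candidate for one of the other recognition methods listed at the end of this page.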

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Make sure that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Enable the APIs

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
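Go to IAM
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
In the Select a role list, select a role.
To grant additional roles, click Add another role and add each additional role.
Click Save.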
Install the Google Cloud CLI.
To initialize the gcloud CLI, run the following command:
```
gcloud init
```
Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.

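Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see Authenticate for using client libraries.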

Create local authentication credentials for your user account:
```
gcloud auth application-default login
```
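To confirm that the login step produced local credentials, a script can look for the credential file. The sketch below assumes the default gcloud configuration directory (`$HOME/.config/gcloud` on Linux and macOS, `%APPDATA%\gcloud` on Windows) and also honors the `GOOGLE_APPLICATION_CREDENTIALS` environment variable when it is set. The function names are illustrative, and the check only tests that a file exists, not that the credentials are valid.

```python
import os
from pathlib import Path


def adc_file_path() -> Path:
    """Return the default location of the user's ADC file (assumed layout)."""
    if os.name == "nt":
        base = Path(os.environ.get("APPDATA", ""))
    else:
        base = Path.home() / ".config"
    return base / "gcloud" / "application_default_credentials.json"


def has_local_adc() -> bool:
    """Return True if a local credential file appears to be present."""
    # An explicit credential file path takes priority when it is set.
    override = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if override:
        return Path(override).is_file()
    return adc_file_path().is_file()
```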

Also make sure that you have installed the client library.
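One way to verify the installation from Python is to ask the import system whether the `google.cloud.speech_v2` module used by the samples on this page can be found. This is a minimal sketch; `module_available` and `speech_v2_installed` are illustrative helper names. The library is typically installed with `pip install google-cloud-speech`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing parent
        # (for example "google") raises ModuleNotFoundError here.
        return False


def speech_v2_installed() -> bool:
    """Return True if the Speech-to-Text v2 client module is importable."""
    return module_available("google.cloud.speech_v2")
```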

Perform synchronous speech recognition on a local file

The following example performs synchronous speech recognition on a local audio file.

Python

```python
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def transcribe_file_v2(
    project_id: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribes a local audio file.

    Args:
        project_id: The GCP project ID.
        audio_file: Path to the local audio file.

    Returns:
        The RecognizeResponse.
    """
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    # Lets the service detect the audio encoding and recognizes US English
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    # Sends the audio bytes inline in the request body
    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        config=config,
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
```
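The sample prints the top alternative of each result. When the full transcript is needed as a single string, the results can be joined, as in the sketch below. `collect_transcript` is an illustrative helper, not part of the client library. It skips any result whose `alternatives` list is empty, since indexing `alternatives[0]` on such a result would raise `IndexError`. The demo uses `SimpleNamespace` objects as stand-ins for a real response so that it runs without calling the API.

```python
from types import SimpleNamespace


def collect_transcript(response) -> str:
    """Join the top alternative of every result into one string."""
    parts = []
    for result in response.results:
        if not result.alternatives:
            # Nothing was recognized for this segment; skip it.
            continue
        parts.append(result.alternatives[0].transcript.strip())
    return " ".join(part for part in parts if part)


# Stand-in object with the same attribute names the sample reads.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello ")]),
        SimpleNamespace(alternatives=[]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="world")]),
    ]
)
print(collect_transcript(fake_response))  # hello world
```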

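Perform synchronous speech recognition on a remote file

The Speech-to-Text API can perform synchronous speech recognition directly on an audio file located in Cloud Storage, without the need to send the contents of the audio file in the body of your request.

Speech-to-Text uses a service account to access your files in Cloud Storage. By default, the service account has access to Cloud Storage files in the same project.

The service account email address is the following: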

service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com

To transcribe Cloud Storage files in another project, grant this service account the Speech-to-Text Service Agent role in the other project:

```
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/speech.serviceAgent
```
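The `--member` value is derived from the project number, which is a numeric identifier distinct from the project ID. A small helper can build it and catch the common mistake of passing a project ID instead. This is a sketch; the function names are illustrative.

```python
def speech_service_agent_email(project_number: str) -> str:
    """Build the Speech-to-Text service account email for a project number."""
    number = str(project_number).strip()
    if not (number.isascii() and number.isdigit()):
        raise ValueError(
            f"expected a numeric project number, got {project_number!r}"
        )
    return f"service-{number}@gcp-sa-speech.iam.gserviceaccount.com"


def iam_member(project_number: str) -> str:
    """Return the value to pass as --member in the command above."""
    return f"serviceAccount:{speech_service_agent_email(project_number)}"
```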

For more information about project IAM policies, see Manage access to projects, folders, and organizations.

To give the service account more granular access, grant it permission to a specific Cloud Storage bucket:

```
gsutil iam ch serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com:admin \
    gs://BUCKET_NAME
```

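For more information about managing access to Cloud Storage, see Create and manage access control lists in the Cloud Storage documentation.

The following example performs synchronous speech recognition on a file stored in Cloud Storage.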

Python

```python
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def transcribe_gcs_v2(
    project_id: str,
    gcs_uri: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribes audio from a Google Cloud Storage URI.

    Args:
        project_id: The GCP project ID.
        gcs_uri: The Google Cloud Storage URI.

    Returns:
        The RecognizeResponse.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        config=config,
        uri=gcs_uri,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
```
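The two samples on this page differ only in how the audio is supplied: `content` carries the file's bytes, while `uri` points at an object in Cloud Storage. The sketch below validates a `gs://BUCKET_NAME/OBJECT_NAME` URI and picks the matching request field for a given source. `parse_gcs_uri` and `audio_source_kwargs` are illustrative helpers, not library functions, and the validation is deliberately loose: it only checks for the scheme, a bucket, and an object name.

```python
from typing import Dict, Tuple, Union


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object: {uri!r}")
    return bucket, object_name


def audio_source_kwargs(source: str) -> Dict[str, Union[str, bytes]]:
    """Return the RecognizeRequest field to set for a local path or gs:// URI."""
    if source.startswith("gs://"):
        parse_gcs_uri(source)  # raises ValueError on a malformed URI
        return {"uri": source}
    # Otherwise treat the source as a local file and send its bytes inline.
    with open(source, "rb") as f:
        return {"content": f.read()}
```

The returned dictionary could then be unpacked into the request, for example `cloud_speech.RecognizeRequest(recognizer=..., config=config, **audio_source_kwargs(source))`.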

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

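Console

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.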

In the Google Cloud console, go to the Manage resources page.

Go to Manage resources

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

Delete a Google Cloud project:

```
gcloud projects delete PROJECT_ID
```

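What's next

See the reference documentation for synchronous recognition.
Learn how to transcribe streaming audio.
Learn how to transcribe long audio files.
Transcribe audio files using Chirp.
For best performance, accuracy, and other tips, see the best practices documentation.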
