The Whisper Script, Packaged

Overview

This is a script called whisper. Here's what it does:

Takes .m4a and .mp3 files from a Dropbox directory
Has the OpenAI Whisper endpoint transcribe them
Has ChatGPT take the Whisper transcription, correct it and make paragraphs
Creates a text file in another Dropbox folder
Sends an email to Evernote

You need:

An OpenAI API key
Dropbox folders on your Mac
If you use the email functionality, you need SMTP credentials and the "mail to" address Evernote gives you for emailing in notes (or you can comment this part out).

Instructions

You need to change the directory paths and generally think for yourself a bit. If you're not comfortable editing scripts, this might not be for you.

Some notes:

These scripts work for me, but they use /Users/yourUser as the home directory, so you need to change that to yours.
Get an OpenAI API key
Create the Credentials file
Modify the scripts
Run whisper on its own first. You might need to install Python dependencies (like the openai package) with pip. ChatGPT can help you.
When you run whisper inside the launchctl context, you might get lost. Note that it dumps log files in /tmp.
This blog screws up formatting a bit. You can see the raw Evernote note here. It's Markdown.
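Since the launchd job writes its output to /tmp, a quick way to see what happened on the last run is to print the end of both log files. This is a minimal sketch: the two paths come from the StandardOutPath and StandardErrorPath entries in the plist further down, and `tail` is a hypothetical helper, not part of the original scripts.

```python
from collections import deque
from pathlib import Path

# Log paths from StandardOutPath / StandardErrorPath in the launch agent plist.
LOG_FILES = [Path("/tmp/whisper-wrapper.out"), Path("/tmp/whisper-wrapper.err")]


def tail(path, n=20):
    """Return the last n lines of a text file, or an empty list if it doesn't exist yet."""
    path = Path(path)
    if not path.exists():
        return []
    # deque with maxlen keeps only the final n lines while reading the file once
    with path.open("r", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    for log in LOG_FILES:
        print(f"==> {log} <==")
        for line in tail(log):
            print(line)
```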

Credentials

I store these in $HOME/.secure_env.

They should look like this:

export OPENAI_API_KEY='' # beware this MIGHT have a dash in it!
export SMTP_SERVER=''
export SMTP_USERNAME=''
export SMTP_PASSWORD=''
export SMTP_PORT=''
export SMTP_SENDER=''
export EVERNOTE_EMAIL=''
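Before wiring anything into launchd, it can help to confirm the variables are actually set after you source the file. A minimal sketch, assuming the variable names above; the `REQUIRED` list and the `missing_vars` helper are hypothetical, not part of the original scripts. If you commented out the email part, drop the SMTP and Evernote names from the list.

```python
import os

# Variable names taken from the .secure_env template above.
REQUIRED = [
    "OPENAI_API_KEY",
    "SMTP_SERVER",
    "SMTP_USERNAME",
    "SMTP_PASSWORD",
    "SMTP_PORT",
    "SMTP_SENDER",
    "EVERNOTE_EMAIL",
]


def missing_vars(env=None, names=REQUIRED):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All credentials are set.")
```

Run it from a shell where you've already done `source $HOME/.secure_env`, so it sees the same environment the wrapper script will.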

The Whisper Script

I stored this in ~/scripts/whisper. Ensure it's executable with chmod +x whisper.

#!/usr/bin/env python3

import datetime
from pathlib import Path
import argparse
from email.message import EmailMessage
from openai import OpenAI
import os, sys
import smtplib

client = OpenAI()

# NOTE: you need this in the environment
# export OPENAI_API_KEY='your key'
# BEWARE of the launch agent running from here
# ~/Library/LaunchAgents/com.confusionstudios.watchvoicedictationfolder.plist
# and that guy runs the whisper-wrapper to handle paths and environment variables

def get_smtp_credentials():
    # Retrieve SMTP server, port, username, password, and sender from environment variables
    smtp_server = os.getenv("SMTP_SERVER")
    smtp_port = os.getenv("SMTP_PORT")
    smtp_username = os.getenv("SMTP_USERNAME")
    smtp_password = os.getenv("SMTP_PASSWORD")
    smtp_sender = os.getenv("SMTP_SENDER")
    return smtp_server, smtp_port, smtp_username, smtp_password, smtp_sender

def send_email_to_evernote(subject, body):
    recipient = os.getenv("EVERNOTE_EMAIL")

    try:
        # Get SMTP credentials (the port comes from the environment as a string)
        smtp_server, smtp_port, smtp_username, smtp_password, smtp_sender = get_smtp_credentials()

        # Compose the email message; EmailMessage handles non-ASCII transcripts,
        # which plain sendmail() with a str message rejects (it only accepts ASCII)
        message = EmailMessage()
        message["Subject"] = f"{subject} @Diary"
        message["From"] = smtp_sender
        message["To"] = recipient
        message.set_content(body)

        # Connect over SSL, log in, and send; the with-block closes the connection
        with smtplib.SMTP_SSL(smtp_server, int(smtp_port)) as server:
            server.login(smtp_username, smtp_password)
            server.send_message(message)

        print("Email sent successfully.")
    except Exception as e:
        print(f"Error sending email: {e}")

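def get_creation_time(file_path):
    # On macOS, st_ctime is the last metadata change, not creation,
    # so prefer st_birthtime when the platform provides it
    stat = file_path.stat()
    return datetime.datetime.fromtimestamp(getattr(stat, "st_birthtime", stat.st_ctime))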

def transcribe_with_whisper_api(file_path):
    print(f"Sending up audio file {file_path.stem} to Whisper for transcription.")
    # Open the audio in a with-block so the file handle gets closed
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file, response_format="text"
        )

    print(f"Got the transcription, length: {len(transcription)}. Now sending to ChatGPT for post-processing.")

    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "The following is a transcript. Please make paragraphs out of the transcript. Do not alter it except to fix homonyms, spelling discrepancies and possible voice transcription mistakes. Feel free to adjust punctuation. Also note the spelling of MIDI Designer. Do not respond to the transcript! THE FOLLOWING IS THE RAW TRANSCRIPT:",
            },
            {
                "role": "user",
                "content": transcription,
            },
        ],
    )

    return completion.choices[0].message.content

def process_file(file_path, output_file, dry_run=False):
    if dry_run:
        print(f"Would transcribe {file_path.name} into {output_file}")
    else:
        print(f"Processing {file_path.name}...")
        transcript = transcribe_with_whisper_api(file_path)
        with open(output_file, "w") as f:
            f.write(transcript)
        print(f"Generated {output_file.name}")
        send_email_to_evernote(output_file.name, transcript)
        print(f"Sent Evernote Email with Subject {output_file.name}")

def process_files(source_dir, output_dir, dry_run=False):
    # Load list of already processed files
    transcribed_file_path = source_dir / 'transcribed-files.txt'
    if transcribed_file_path.exists():
        with transcribed_file_path.open('r') as file:
            processed_files = {line.strip() for line in file}
    else:
        processed_files = set()

    voice_files = list(source_dir.glob("*.m4a")) + list(source_dir.glob("*.mp3"))
    for file_path in voice_files:
        if file_path.name in processed_files:
            print(f"Skipping {file_path.name}, already processed.")
            continue

        creation_time = get_creation_time(file_path)
        formatted_time = creation_time.strftime("%Y-%m-%d-%H-%M")
        old_filename_without_extension = file_path.stem

        if old_filename_without_extension.isdigit():
            output_file_name = f"{formatted_time}.txt"
        else:
            output_file_name = f"{formatted_time} - {old_filename_without_extension}.txt"

        output_file = output_dir / output_file_name

        process_file(file_path, output_file, dry_run)

        if not dry_run:
            # Add file to processed list and update file
            processed_files.add(file_path.name)
            with transcribed_file_path.open('a') as file:
                file.write(file_path.name + '\n')

def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Process voice dictations and generate text files."
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Print out actions without executing"
    )
    return parser.parse_args()

def main():
    args = parse_arguments()
    source_dir = Path("~/Dropbox/x-fer/voice-dictation").expanduser()
    if not os.path.exists(source_dir):
        print(f"The source directory {source_dir} does not exist.")
        sys.exit(1)

    output_dir = Path("~/Dropbox/x-fer/voice-dictation-output").expanduser()

    output_dir.mkdir(parents=True, exist_ok=True)

    process_files(source_dir, output_dir, dry_run=args.dry_run)

if __name__ == "__main__":
    main()
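The output-file naming rule in process_files is easy to miss: an audio file whose name is all digits gets only a timestamp, and anything else keeps its name after the timestamp. Here is that rule pulled out into a standalone function so you can check it (or change it) without calling the API. This is a sketch; `output_name_for` is a hypothetical helper, not part of the script above.

```python
import datetime
from pathlib import Path


def output_name_for(audio_path, created):
    """Mirror the naming logic in process_files: timestamp, plus the stem unless it is all digits."""
    formatted_time = created.strftime("%Y-%m-%d-%H-%M")
    stem = Path(audio_path).stem
    if stem.isdigit():
        return f"{formatted_time}.txt"
    return f"{formatted_time} - {stem}.txt"


if __name__ == "__main__":
    when = datetime.datetime(2024, 3, 5, 9, 7)
    print(output_name_for("20240305.m4a", when))   # 2024-03-05-09-07.txt
    print(output_name_for("Song idea.mp3", when))  # 2024-03-05-09-07 - Song idea.txt
```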

The Launchctl Plist That Loads the Whisper Wrapper

This is ~/Library/LaunchAgents/com.confusionstudios.watchvoicedictationfolder.plist.

NOTE: You need to adjust paths for everything in this plist except the /tmp outputs, which give great debugging information.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.confusionstudios.watchvoicedictationfolder</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/yourUser/scripts/whisper-wrapper</string>
    </array>
    <key>WatchPaths</key>
    <array>
        <string>/Users/yourUser/Dropbox/x-fer/voice-dictation</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <false/>
    <key>StandardOutPath</key>
    <string>/tmp/whisper-wrapper.out</string>
    <key>StandardErrorPath</key>
    <string>/tmp/whisper-wrapper.err</string>
</dict>
</plist>
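If you'd rather not hand-edit every /Users/yourUser path, you could generate the plist with Python's standard plistlib module, filling in your own home directory. A sketch, assuming the same label, script location, watched folder and log paths as the plist above; `build_agent` is a hypothetical helper. plistlib sorts keys alphabetically by default, which doesn't matter to launchd, but do review the output before saving it into ~/Library/LaunchAgents and loading it with launchctl.

```python
import plistlib
from pathlib import Path

LABEL = "com.confusionstudios.watchvoicedictationfolder"


def build_agent(home):
    """Return the launch agent as a dict, with paths rooted at the given home directory."""
    home = Path(home)
    return {
        "Label": LABEL,
        "ProgramArguments": [str(home / "scripts" / "whisper-wrapper")],
        "WatchPaths": [str(home / "Dropbox" / "x-fer" / "voice-dictation")],
        "RunAtLoad": True,
        "KeepAlive": False,
        # The /tmp log paths stay the same for everyone
        "StandardOutPath": "/tmp/whisper-wrapper.out",
        "StandardErrorPath": "/tmp/whisper-wrapper.err",
    }


if __name__ == "__main__":
    # plistlib.dumps writes XML by default, with the same Apple DOCTYPE as the file above
    print(plistlib.dumps(build_agent(Path.home())).decode("utf-8"))
```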

The Wrapper Script

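Jobs started by launchctl don't read your shell profile, so they're missing stuff from the environment, like OPENAI_API_KEY and the PATH entries that point at Homebrew. That's why you need this wrapper script. Ensure it's executable with chmod +x whisper-wrapper.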

#!/bin/bash
source "$HOME/.secure_env"
/opt/homebrew/bin/python3 "$HOME/scripts/whisper" # note the location of python AND the script, you might have to adjust

Also: you need to give the full path to the Python executable (the one you installed the dependencies into with pip) so it can pick up its dependencies.
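To find out which interpreter path belongs in the wrapper, you can ask the Python you installed the dependencies into. A small sketch: run it with the same python3 you use from your shell. `has_package` and `report` are hypothetical helpers, and openai is checked simply because it's the third-party package the whisper script imports.

```python
import importlib.util
import sys


def has_package(name):
    """True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


def report(packages=("openai",)):
    """Describe this interpreter and which of the given packages it can see."""
    lines = [f"Interpreter: {sys.executable}"]
    for name in packages:
        status = "found" if has_package(name) else "MISSING"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

If openai shows as found, the "Interpreter:" path is the one to put on the python3 line of whisper-wrapper.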

Version 0.01h

Dan Rosenstark, author of MIDI Designer, on Tech & Music
