The Whisper Script, Packaged
Overview
This is a script called whisper. It:
- Takes .m4a and .mp3 files from a Dropbox directory
- Has OpenAI's Whisper endpoint transcribe them
- Has ChatGPT take the Whisper transcription, correct it and make paragraphs
- Creates a text file in another Dropbox folder
- Sends an email to Evernote
You need:
- An OpenAI API key
- Dropbox folders on your Mac
- If you use the email functionality, you need SMTP credentials and a 'mail to' address for Evernote (or you can comment this part out).
Instructions
You need to change directories and generally think for yourself a bit. If you're not comfortable editing scripts, this might not be for you.
Some notes:
- These scripts work for me, but my home directory is /Users/yourUser, so you need to modify that.
- Get an OpenAI API key
- Create the Credentials file
- Modify the scripts
- Run whisper on its own first. You might need to handle Python dependencies with pip. ChatGPT can help you.
- When you run Whisper inside the Launchctl context, you might get lost. Note that it dumps log files in /tmp
- This blog screws up formatting a bit. You can see the raw Evernote note here. It's Markdown.
Credentials
I store these in $HOME/.secure_env. They should look like this:
export OPENAI_API_KEY='' # beware this MIGHT have a dash in it!
export SMTP_SERVER=''
export SMTP_USERNAME=''
export SMTP_PASSWORD=''
export SMTP_PORT=''
export SMTP_SENDER=''
export EVERNOTE_EMAIL=''
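Before running anything, it may help to confirm that these variables actually reached your shell. The following is a minimal sketch, not part of the original scripts: the helper name missing_variables and the split into "required" and "email only" groups are my own assumptions, while the variable names come from the credentials file above.

```python
import os

# Variable names taken from the credentials file above.
REQUIRED = ["OPENAI_API_KEY"]
EMAIL_ONLY = [
    "SMTP_SERVER",
    "SMTP_USERNAME",
    "SMTP_PASSWORD",
    "SMTP_PORT",
    "SMTP_SENDER",
    "EVERNOTE_EMAIL",
]


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("email only", EMAIL_ONLY)):
        missing = missing_variables(names)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

Run it in a terminal after `source $HOME/.secure_env`. If you commented out the email part of the script, the "email only" group can be ignored.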
The Whisper Script
I stored this in ~/scripts/whisper. Ensure it's executable with chmod +x whisper.
#!/usr/bin/env python3
import datetime
from pathlib import Path
import argparse
from openai import OpenAI
import os, sys
import smtplib
client = OpenAI()
# NOTE: you need this in the environment
# export OPENAI_API_KEY='your key'
# BEWARE of the launch agent running from here
# ~/Library/LaunchAgents/com.confusionstudios.watchvoicedictationfolder.plist
# and that guy runs the whisper-wrapper to handle paths and environment variables

def get_smtp_credentials():
    # Retrieve SMTP server, port, username, password, and sender from environment variables
    smtp_server = os.getenv("SMTP_SERVER")
    smtp_port = os.getenv("SMTP_PORT")
    smtp_username = os.getenv("SMTP_USERNAME")
    smtp_password = os.getenv("SMTP_PASSWORD")
    smtp_sender = os.getenv("SMTP_SENDER")
    return smtp_server, smtp_port, smtp_username, smtp_password, smtp_sender

def send_email_to_evernote(subject, body):
    recipient = os.getenv("EVERNOTE_EMAIL")
    # Get SMTP credentials
    smtp_server, smtp_port, smtp_username, smtp_password, smtp_sender = get_smtp_credentials()
    try:
        # Connect to the SMTP server
        smtp_server = smtplib.SMTP_SSL(smtp_server, smtp_port)
        smtp_server.login(smtp_username, smtp_password)
        # Compose the email message
        message = f"Subject: {subject} @Diary\n\n{body}"
        # Send the email
        smtp_server.sendmail(smtp_sender, recipient, message)
        # Close the connection
        smtp_server.quit()
        print("Email sent successfully.")
    except Exception as e:
        print(f"Error sending email: {e}")
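
def get_creation_time(file_path):
    # Note: on macOS, st_ctime is the time of the last metadata change;
    # the actual creation time is exposed as st_birthtime.
    return datetime.datetime.fromtimestamp(file_path.stat().st_ctime)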

def transcribe_with_whisper_api(file_path):
    print(f"Sending up audio file {file_path.stem} to Whisper for transcription.")
    # Open the audio in a with-block so the file handle is closed after upload
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file, response_format="text"
        )
    print(f"Got the transcription, length: {len(transcription)}. Now sending to ChatGPT for post-processing.")
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "The following is a transcript. Please make paragraphs out of the transcript. Do not alter it except to fix homonyms, spelling discrepancies and possible voice transcription mistakes. Feel free to adjust punctuation. Also note the spelling of MIDI Designer. Do not respond to the transcript! THE FOLLOWING IS THE RAW TRANSCRIPT:",
            },
            {
                "role": "user",
                "content": transcription,
            },
        ],
    )
    return completion.choices[0].message.content

def process_file(file_path, output_file, dry_run=False):
    if dry_run:
        print(f"Would transcribe {file_path.name} into {output_file}")
    else:
        print(f"Processing {file_path.name}...")
        transcript = transcribe_with_whisper_api(file_path)
        with open(output_file, "w") as f:
            f.write(transcript)
        print(f"Generated {output_file.name}")
        send_email_to_evernote(output_file.name, transcript)
        print(f"Sent Evernote Email with Subject {output_file.name}")

def process_files(source_dir, output_dir, dry_run=False):
    # Load list of already processed files
    transcribed_file_path = source_dir / 'transcribed-files.txt'
    if transcribed_file_path.exists():
        with transcribed_file_path.open('r') as file:
            processed_files = {line.strip() for line in file}
    else:
        processed_files = set()
    voice_files = list(source_dir.glob("*.m4a")) + list(source_dir.glob("*.mp3"))
    for file_path in voice_files:
        if file_path.name in processed_files:
            print(f"Skipping {file_path.name}, already processed.")
            continue
        creation_time = get_creation_time(file_path)
        formatted_time = creation_time.strftime("%Y-%m-%d-%H-%M")
        old_filename_without_extension = file_path.stem
        if old_filename_without_extension.isdigit():
            output_file_name = f"{formatted_time}.txt"
        else:
            output_file_name = f"{formatted_time} - {old_filename_without_extension}.txt"
        output_file = output_dir / output_file_name
        process_file(file_path, output_file, dry_run)
        if not dry_run:
            # Add file to processed list and update file
            processed_files.add(file_path.name)
            with transcribed_file_path.open('a') as file:
                file.write(file_path.name + '\n')

def parse_arguments():
    parser = argparse.ArgumentParser(
        description="Process voice dictations and generate text files."
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Print out actions without executing"
    )
    return parser.parse_args()

def main():
    args = parse_arguments()
    source_dir = Path("~/Dropbox/x-fer/voice-dictation").expanduser()
    if not os.path.exists(source_dir):
        print(f"The source directory {source_dir} does not exist.")
        sys.exit(1)
    output_dir = Path("~/Dropbox/x-fer/voice-dictation-output").expanduser()
    output_dir.mkdir(parents=True, exist_ok=True)
    process_files(source_dir, output_dir, dry_run=args.dry_run)

if __name__ == "__main__":
    main()
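One thing to watch in the email step: the script hands sendmail a plain string, and smtplib encodes such strings as ASCII, so a transcript containing accented characters or curly quotes could fail to send. Here is a hedged sketch of one possible alternative using the standard library's EmailMessage. The function name build_evernote_message is hypothetical and not part of the original script; the '@Diary' suffix mirrors what the script already does.

```python
from email.message import EmailMessage


def build_evernote_message(subject, body, sender, recipient):
    """Build a message whose body can safely contain non-ASCII text."""
    message = EmailMessage()
    message["Subject"] = f"{subject} @Diary"  # same subject suffix as the script
    message["From"] = sender
    message["To"] = recipient
    message.set_content(body)  # string content defaults to the UTF-8 charset
    return message


# Inside send_email_to_evernote, the send call would then become something like:
#     smtp_server.send_message(
#         build_evernote_message(subject, body, smtp_sender, recipient)
#     )
```

If your transcripts are plain English this may never bite you, in which case the original sendmail call is fine as it is.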
The Launchctl Command That Loads Whisper Wrapper
This is ~/Library/LaunchAgents/com.confusionstudios.watchvoicedictationfolder.plist
NOTE: You need to adjust the paths for everything in this plist except for the /tmp outputs, which give great debugging information.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.confusionstudios.watchvoicedictationfolder</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/yourUser/scripts/whisper-wrapper</string>
    </array>
    <key>WatchPaths</key>
    <array>
        <string>/Users/yourUser/Dropbox/x-fer/voice-dictation</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <false/>
    <key>StandardOutPath</key>
    <string>/tmp/whisper-wrapper.out</string>
    <key>StandardErrorPath</key>
    <string>/tmp/whisper-wrapper.err</string>
</dict>
</plist>
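If hand-editing the paths in XML feels error-prone, the same plist can be generated with Python's standard plistlib module. This is only a sketch under my own assumptions: the function name build_launch_agent is hypothetical, and it assumes your scripts and Dropbox folders sit under your home directory in the same layout as mine.

```python
import plistlib
from pathlib import Path

LABEL = "com.confusionstudios.watchvoicedictationfolder"


def build_launch_agent(home):
    """Return the launch agent settings with paths rooted at `home`."""
    home = Path(home)
    return {
        "Label": LABEL,
        "ProgramArguments": [str(home / "scripts" / "whisper-wrapper")],
        "WatchPaths": [str(home / "Dropbox" / "x-fer" / "voice-dictation")],
        "RunAtLoad": True,
        "KeepAlive": False,
        "StandardOutPath": "/tmp/whisper-wrapper.out",
        "StandardErrorPath": "/tmp/whisper-wrapper.err",
    }


if __name__ == "__main__":
    # Prints the XML; save it as ~/Library/LaunchAgents/<label>.plist yourself.
    print(plistlib.dumps(build_launch_agent(Path.home())).decode("utf-8"))
```

Note that plistlib sorts the keys alphabetically by default, so the output will be ordered differently from the hand-written file while carrying the same settings.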
The Wrapper Script
Launchctl is missing stuff from the environment (it doesn't load your shell profile, so the exported credentials aren't there), so you need this wrapper script. Ensure it's executable with chmod +x whisper-wrapper.
#!/bin/bash
source "$HOME/.secure_env"
# Note the location of python AND the script; you might have to adjust both.
/opt/homebrew/bin/python3 "$HOME/scripts/whisper"
Also: you need to give the full path to the Python executable so that it picks up the dependencies installed for that interpreter.
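If you're unsure which Python the wrapper ends up running, a tiny diagnostic can tell you. The sketch below is my own addition rather than part of the original scripts, and the function name describe_interpreter is hypothetical. It reports the interpreter path and whether the openai package is visible to that interpreter.

```python
import importlib.util
import sys


def describe_interpreter(modules=("openai",)):
    """Report which Python is running and whether each module can be found."""
    return {
        "executable": sys.executable,
        "modules": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print(f"Python executable: {info['executable']}")
    for name, found in info["modules"].items():
        print(f"{name}: {'found' if found else 'NOT found for this interpreter'}")
```

Save it somewhere and run it with the same interpreter path the wrapper uses (for me, /opt/homebrew/bin/python3). If openai shows as not found, install it with that interpreter's pip.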
Version 0.01h