design+
code+
rant

Automated Data Entry with Python - Design Process & User Considerations.

TL;DR:

  • The task: to save time by automating a manual data-entry processing, while producing useful data for the end-user.
  • Desired output: a CSV-file containing metadata related to audio files in a specified folder. The point is to have an external source of metadata, which can then be used to enrich the related audio files as needed.
  • Automation tool: a python script.
  • Approach: extract and derive most of the data from the files themselves.
  • Considerations: some of the required data (a verbal file description) can’t easily be auto-generated and will have to be done manually (which is actually a good thing, as I will show).
  • Wins: a good 15-20 minutes saved for every CSV-file generated in this way.

Running a small business means dealing with all manner of tasks – some of which can be quite repetitive and dull. For my particular business (producing and selling sound effects), I needed to automate a certain task which comes up every time I produce a new collection of sounds: to take a bunch of audio files and make a list containing the filenames and a few other types of information, then deliver the files along with the list, to my customers.

To understand why, you need to know a little about my customers: Many of them are professional sound editors and designers working in film, TV, computer games, radio, etc. They are typically quite busy, working to tight deadlines, which means that they love saving a bit of time when they can. Sometimes these people will record their own sounds for the projects they work on, but this is time-consuming, often requires special recording gear, permits, contacts, etc. That sound professionals often lack one or more of these things, is the whole reason there is a market for pre-recorded sound effects such as I provide.

So what often happens, is that these sound-professionals will buy whatever sounds they need and add them to their personal collections, which can consist of many thousands of sounds covering every imaginable situation. This means another time-related bottle-neck: how to find just the right sound without having to listen to each and every one – every time. As a sound supplier, I can mitigate this problem through the quality of my product, by:

  • Naming the files in a way that is consistent and informative, as well as easy to read.
  • Embedding metadata in the files, which contains further information about their content.

Sound professionals typically use librarian software for browsing and tagging sounds. This software also allows them to read metadata embedded in the files. Problem is, people use different brands of librarian software, and even though most implement the industry standard on audio metadata (the Broadcast Wave Format developed by the European Broadcast Union), some only include a rudimentary set of its features, while others add proprietary functionality or simply interpret the standard in their own curious ways. Result: it becomes difficult to deliver metadata-enriched files that work for everyone.

My solution:

A Python script to ease the process of adding metadata to wav-files.

  • Embed only a bare minimum of metadata
  • Deliver the rest in an external file, which can be imported into various librarian softwares (CSV is a good candidate)

Currently, I have various ways of embedding metadata into the files, which work...OK.

Moving on to the external file. Here’s where the problem of saving time becomes my problem: how to write out a CSV-file with maybe 10 columns of data per file, for a collection of sounds with anywhere from dozens to hundreds of individual files, without going mad, falling asleep, losing one’s s#%t, etc? I used to do this by hand, or, rather, by way of the terrible text processing capabilities of spreadsheet software. Then I discovered python…

What I usually do, is name the files with a series of descriptive words in title-case and comma-delimited – like this: Wind,Breeze,Soothing.wav. This format is descriptive and suitable for chopping up into keywords, which are often used in the search engines of librarian software.

Basically I want three kinds of data:

1: Data derived from the file names.

Here, I want to:

  • Be able to extract the names of specific files as text strings,
  • then write these strings to a column named “FileName” in the CSV file,
  • then strip the file extension (“.wav”) from the strings and write the results to a column named “Keywords”,
  • write some kind of template text to a column named “Description”, which usually contains a more verbose description of the sound file.

2: File property data

In addition to the descriptive stuff, I also like to include technical data, such as duration of the file in minutes and seconds, sample rate and bit depth. This is kind of superfluous, as most librarian software would be able to read this straight off the file as is, but I like to include it, because I like to provide a track list as a form of pre-sales information about the product – a kind of declaration of contents, if you will (if a sound designer is looking for background ambiances of at least 3 minutes in length, she will want to know up front if the sounds deliver on that front).

3: Additional, recording-specific data

I usually include the name of the recordist (me), the month/year of recording, location (country), recording equipment used (some people care about that), and a few other things. This stuff can’t be extracted or derived from the files - except for the date, but I don’t need the kind of granularity you get from a timestamp, so I prefer to enter it here manually (or hard-code it if it never changes), along with anything else only I know about the recording circumstances.

Roughly three types of data but each has its purpose. What I need the script to do, then, is:

  • Extract file names and process the resulting strings
  • Extract technical data from the files, and, in some cases, convert the data to something which sound professionals will find meaningful.
  • Input certain data at script run-time (location, equipment, etc).

Python code breakdown

Without going too much in-depth, I will outline how the different parts of the script work toward my goal. I have included all my comments, since they sort of document my process. Note that this is one of the first scripts I have written from scratch for my own purpose, so it could almost certainly be refactored and improved quite a bit. For now, it does the job, though. The finished script is run from the command-line, by the way.

Setting folder path:

This could be a simple input() function, but I realized it would make for some rather crappy UX, so I added a regular expression to check that the input is correctly formatted, and if it is, that the path actually exists. If neither condition is passed, the user can either try again or just use the current working directory. Could be done in many other ways, I guess.

# Code to let the user input a folder path 
# and check that it is valid and exists
pathRegex = re.compile(r'[A-Z]:\\[a-zA-Z0-9]+')
toggle = False  
# Toggle to break out of while loop

while toggle == False:
    print('Enter folder path')   
    path = input()

    if pathRegex.match(path) and os.path.isdir(path):            
        print('Using ' + path)
        toggle = True
    else:
        print('Path is invalid or doesn\'t exist...')
        print('Use current working directory - y/n?')
        cwdVal = input()
        if cwdVal == 'y':
            path = os.getcwd()  # overwrite path with cwd
            print('Using CWD ' + path)
            toggle = True      
# end of path-checker code

Setting some variables and console messages:

Here, I initialize some of the variables later used to populate the CSV; I create a regex object I will need later and set up the console messages to help the user through the path and data-entry process at run-time.

# initializing various variables
num = 0                         # set in for loop
bitDepth = ''                   # get from file metadata
sampleRate = ''                 # get from file metadata
channels = ''                   # get from file metadata
duration = 0                    # get from file metadata
looping = ''                    # set with regex

# regex object to check if file is a loop
loopRegex = re.compile(r'\,Loop$')

# get name of parent folder and create namestring for the csv to generate
folderName = os.path.basename(path)
outFileName = folderName + '-Metadata.csv' 

# data entered by user at run-time
print('Enter recording location:')
location = input()             

print('Date of recording: Enter in preferred format:')
dateTime = input()              

recordist = 'Christian H From'  # default
rights = 'Minuscule Media'      # default

print('Enter mics used')
mics = input()               

print('Recording device: Enter device name:')
recorder = input()       

print('Getting other file data...')

Creating a CSV file

Using the os module, I create a CSV file (outputFile) at the specified path; then from the csv module, a writer object to allow writing to the file.

# create a csv file to hold the filenames and open it at path
outputFile = open(
                    os.path.normpath(path) + 
                    '\\' + 
                    outFileName, 'w', newline=''
                )
outputWriter = csv.writer(outputFile)
outputWriter.writerow([
                        'Num', 
                        'FileName', 
                        'Description', 
                        'Keywords', 
                        'Bit Depth', 
                        'Sample Rate', 
                        'Channels', 
                        'Duration', 
                        'Looping Y/N', 
                        'Location', 
                        'Date/Time', 
                        'Creator/Recordist', 
                        'Copyright', 
                        'Microphone', 
                        'Recorder' 
                        ])

For-loop pt 1: Getting file names

This is quite basic stuff, where the os module lets me get the file names from files in the specified path, and save them as text-strings, which I can then manipulate. I also give each file a number, which goes in a “File Number” column. Sometimes I edit sounds to work as seamless loops (especially favored in game audio, I hear), and I always put a “Loop” at the end of their file names. To show this in a CSV-column, I search the file name strings for “Loop” (using a little regular expression) and put an “Y” for yes in the column if it is indeed a loop. Then I create a string variable holding dummy text for the “Description” column.

# Loop over the files at path. 
# Any vars that get their values from the individual files 
# must be assigned inside the loop.
# Get only specific file type, should be wav. 
# Regex is avoided by using: os.path.join(path, '*.wav')

for file in glob.glob(os.path.join(path, '*.wav')):     

# On the files that match...     
    filename = os.path.basename(file)
	# ...give the file a number
	num = num + 1   
	# ...strip away file extension to get tags
    keywords = os.path.splitext(filename)[0]    
    
	# mark if file is looping
    if loopRegex.search(keywords):              
        looping = 'Y'        
    else:
        looping = 'N'    
    # Difficult to generalize, need other algo/interface for this
    desc = 'some description'         

For-loop pt 2: Processing wave-files

The sound files I produce are 24bit/96kHz .wav. High-definition stuff, but for certain technical reasons, this has become a bit of a standard for delivering professional sound effects. The wave module comes standard with python, and allows for various read/write methods on wave audio, so I’ll use that for extracting the technical data.

    waveFile = wave.open(file, 'r')
    bitDepth = 8 * waveFile.getsampwidth()
    sampleRate = waveFile.getframerate()
    channels = waveFile.getnchannels()
    frames = waveFile.getnframes()
    frameRate = waveFile.getframerate()
    duration = frames / frameRate
    duration = timeFormat(duration)     
    waveFile.close()
    outputWriter.writerow([
                        num, 
                        str(filename), 
                        str(desc), 
                        str(keywords), 
                        str(bitDepth), 
                        sampleRate, 
                        channels, 
                        duration, 
                        looping, 
                        location, 
                        dateTime, 
                        recordist, 
                        rights, 
                        mics, 
                        recorder
                        ])
outputFile.close()

Quirks of the wave:

It’s easy enough to get things like number of channels (getnchannels()) and sample rate (getframerate()), but bit depth and duration must be derived from the data given. Apparently, a sample-width of 2 equals 16-bit audio, which must mean that you multiply the sample-width by 8 to get the bit-rate. Duration or time-length of the file, can be found by dividing the number of frames (or samples) with the frame-rate (or sample-rate). With a sample-rate of 96 kHz, a second of audio is precisely 96.000 samples. Dividing the number of samples with the sample-rate, then, gives us the duration in seconds – as an integer. Better convert that to a more readable format, which the timeFormat() function takes care of:

def timeFormat(time):
    # seconds (ints) to hh:mm:ss formatter
    # this takes an int and formats it to the time format of hh:mm:ss
    # first calculate minutes and seconds
    time = int(time)  # round off any floats to ints
    hours = time // 3600
    time = time % 3600    # subtract whole hours from time
    minutes = time // 60  # returns whole minutes as remainder of time
    seconds = time % 60   # returns whole seconds as remainder of minutes

    hrStr = ''
    minStr = ''
    secStr = ''

    # format as hh:mm:ss string
	# using f-strings to add one leading zero when necessary
    if hours < 10:
        hrStr = f'{hours:02d}'      
    else:
        hrStr = str(hours)
        
    if minutes < 10:
        minStr = f'{minutes:02d}' 
    else:
        minStr = str(minutes)
        
    if seconds < 10:
        secStr = f'{seconds:02d}'
    else:
        secStr = str(seconds)
        
    duration = hrStr + ':' + minStr + ':' + secStr
    return duration

Closing down:

Finally, I give the user the option to open the finished CSV-file.

print('Done. Type "o" + ENTER to open CSV or type any key to exit.')
endCommand = input()
if endCommand == 'o':
    os.startfile(os.path.normpath(path) + '\\' + outFileName) 
	# using normpath to avoid having to escape backslashes
         in the path name

Some thoughts on automation

As I said in the beginning, not everything in this project lends itself to automation – at least not through a simple python script. The “Description” column is the tricky bit. It should contain a short but useful description of the sound, ideally adding to the information provided by the file name and the keywords. The whole idea behind browsing sound files by looking at textual metadata, is to avoid having to listen all they way through every single file.

A well-written description might tell me that a 5-minute recording of a quiet forest has a passage of heavy wind around the 4-minute mark, saving me the annoyance of wasting 4 minutes listening to one file. This means that a good description anticipates what the user will want to know. They might not care that a recording of busy traffic consists mostly of German cars, but they will certainly want to know if the traffic is moving; if it’s mostly mopeds, trucks or ox-carts. Not all the important information can be fitted into the file-name, and since the “Keywords”-column is derived from the file-name, it isn’t of much help.

The point here is, that a good description can only really be written by someone who has listened to the recording, and who understands what information is important or inconsequential to the end-user. This makes description-writing a creative, curatorial task, which requires experience and empathy for the user. Someday, an AI trained to my specs may be able to do this job, of course – who knows? I must admit, that I don’t enjoy writing descriptions very much, but at the moment it is a task which requires a knowledgeable human to take responsibility for the end-user experience, letting the (currently) mindless computer do the truly boring and repetitive stuff. In this day and age, I’m OK with that particular division of labor.

Technology:

  • Python 3.xx