Stroking Data

Well, massaging data, really. I’d rather just skip the massage and go directly to getting my data drunk, but that’s no way to form a lasting relationship. If you recall, I claimed at the end of the last post that my data looked a lot like this:

THEWIG~1 MP3 1,867,904 04-06-02 12:10p The Wiggles – Teddy Bear Hug.mp3
THEWIG~2 MP3 1,966,080 04-06-02 4:42p The Wiggles – The Monkey Dance.mp3
THEWIG~3 MP3 1,370,112 04-06-02 1:24p The Wiggles – Rock A Bye Bear.mp3
THEWIG~5 MP3 2,327,220 04-06-02 5:14p The Wiggles – Uncle Noah’s ark.mp3

It’s a damned lie. That data is neat, clean and ready to commit. It exists, mind you, but it’s embedded in much nastier data, stuff that looks like this:

AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan – Because I Got High.mp3
ALIENA~1 MP3 8,372,224 09-01-01 8:58a Alien Ant Farm – Smooth criminal.mp3
IRISHR~1 MP3 3,406,064 09-05-01 9:33p Irish Rovers – Finnegans Wake.mp3
SKANDA~8 MP3 5,148,967 04-03-00 12:07a Skandalous All-Stars – Radio Free Europe.mp3
JOESTR~1 MP3 4,270,834 09-06-01 10:01p Joe Strummer & The Mescaleros – Sandpaper Blues.mp3
DUBLIN~2 MP3 4,082,695 09-05-01 9:38p Dubliners & Pogues – Whiskey In The Jar.mp3
AFRIKA~6 MP3 6,153,323 03-30-00 4:17p Afrika Bambaataa & Soul Sonic Force – Planet Rock.mp3

Before, everything was lined up neatly, ready for slotting into Access. Now it almost lines up, but I’m not playing horseshoes or hand grenades. Close isn’t good enough. Access will let you import data from a text file in two different ways: you specify either that the data in the file is of a fixed width or that it is delimited in some manner. Delimited means that some character, say Buddy Ebsen, pops up between every piece of important data in a record. Each of these pieces is called a field. If the Afroman song above were delimited by Buddy Ebsen, it’d look like this:

AFROM~19 MP3Buddy Ebsen4,975,989Buddy Ebsen08-23-01 11:07pBuddy EbsenAfroManBuddy EbsenBecause I Got High.mp3
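To make the idea of a delimiter concrete, here is a minimal Python sketch (Python is my choice for illustration, not something the import itself uses) that splits the Buddy Ebsen record above into its fields. The variable names are made up for the example.

```python
# The stand-in delimiter from the example above.
DELIMITER = "Buddy Ebsen"

record = (
    "AFROM~19 MP3Buddy Ebsen4,975,989Buddy Ebsen08-23-01 11:07p"
    "Buddy EbsenAfroManBuddy EbsenBecause I Got High.mp3"
)

# Every occurrence of the delimiter marks a boundary between two fields.
fields = record.split(DELIMITER)

# Pick out the three fields of interest by position.
size, artist, song = fields[1], fields[3], fields[4]
print(size, artist, song, sep=" | ")
```

Splitting on the delimiter yields five fields here: the short name and extension, the size, the date and time, the artist, and the song.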

I’d be happy with the size of the file, 4,975,989, the name of the artist, AfroMan, and the song, Because I Got High.mp3. Since Buddy isn’t here to assist me, I’ve got to figure out some way to put a delimiter into each of the 4000+ lines of this file. Let’s examine Afroman more closely:

AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan – Because I Got High.mp3

My life is a little easier because I don’t want all the fields. I need to make the line above look like this:

4,975,989-AfroMan – Because I Got High.mp3

Once it does, the dash is the delimiter, and I am re-goldenized.

Once again, doing it by hand is not an elegant solution, so that’s out. I do know of a way to do this, I think. And, just like yesterday, it can’t be done in Windows. Or, if it can be done in Windows, I don’t know how to do it. And I’ve tried, believe me.

The first thing I do is connect to one of the UNIX servers at work. UNIX can do anything. Once I’m there, I cut and paste all 4000 or so records into an open file. I used UNIX’s vi editor to create the file, and I’ll use some of its functionality to start replacing data en masse. If you’d like a small taste of what vi is like, create a file in Notepad without ever using the mouse to do anything. This includes opening and closing Notepad. However, I can replace a lot of stuff with just a few keystrokes. These, in fact:

:1,$s/^.*MP3 // and :1,$s/ .*:… /-/ give me this:

4,975,989-AfroMan – Because I Got High.mp3

Woo-hoo!
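For anyone without a UNIX box handy, the same two substitutions can be sketched in Python with the `re` module. This is a rough equivalent, not the method used above. The second pattern is an assumption on my part: I am guessing the command matches from the first space through the colon in the time, plus the three characters and the space that follow it. The function names `massage` and `split_record` are hypothetical.

```python
import re


def massage(line):
    """Reduce one directory-listing line to 'size-artist – song.mp3'."""
    # Step 1: drop everything up to and including the uppercase "MP3 ".
    # The match is case-sensitive, so the trailing ".mp3" is left alone.
    line = re.sub(r"^.*MP3 ", "", line, count=1)
    # Step 2 (assumed pattern): replace the span from the first space
    # through the time stamp, e.g. " 08-23-01 11:07p ", with a hyphen.
    line = re.sub(r" .*:... ", "-", line, count=1)
    return line


def split_record(line):
    """Split a massaged line at the FIRST hyphen only."""
    size, _, rest = line.partition("-")
    return size, rest


raw = "AFROM~19 MP3 4,975,989 08-23-01 11:07p AfroMan – Because I Got High.mp3"
print(massage(raw))  # 4,975,989-AfroMan – Because I Got High.mp3
print(split_record(massage(raw)))
```

Splitting at only the first hyphen is deliberate in this sketch: an artist such as Skandalous All-Stars carries a hyphen of its own, and a plain split on every hyphen would cut that name in two.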

Tomorrow, why they do that.
