For a recent tiny project I had to parse the output of a recursive Windows "dir" command into a format that can easily be imported into a database for further examination. So I had to turn this:

Directory of F:\XXXX

03/03/2010 11:39 AM 47,121 XXX.pdf

into:

"F:\XXXX";"XXX.pdf";"47121";"2010-03-03"

So I wrote this very simple Perl script:

#!C:\Perl\bin\perl
use strict;
use warnings;

my $dossier;
my $fichier;
my $taille;
my $date;
my $out;
my $i = 0;

open FICHIER, "< input.txt" or die "the file does not exist";
open CSV, "> output.csv" or die "the file cannot be written";

print "opening input.txt\n";

while (my $ligne = <FICHIER>) {
    # "Directory of ..." lines give the folder the following files belong to
    if ($ligne =~ /Directory of/) {
        my @partie = split /\s/, $ligne;
        $dossier = $partie[3];
        $dossier =~ s/\\/\\\\/g;
    }
    # lines starting with a date are file entries
    if ($ligne =~ /(\d\d)\/(\d\d)\/(\d\d\d\d)/) {
        my @partie = split /\s+/, $ligne;
        $date = "$3-$1-$2";
        $taille = $partie[3];
        $taille =~ s/,//g;
        # everything after the size column is the file name
        $ligne =~ /^(.*) ($partie[3]) (.*)$/;
        $fichier = $3;
        $fichier =~ s/'/\\'/g; # escape single quotes for the database import

        $out = "\"".$dossier."\";\"".$fichier."\";\"".$taille."\";\"".$date."\"\n";
        print CSV $out;
    }
    $i++;
    if ($i % 100000 == 0) {
        print "$i\n";
    }
}
print "closing input.txt\n";
close FICHIER;
close CSV;

I also ran into a Perl limitation during this project. It seems that Perl (on my computer) has trouble reading files longer than 2 million lines: it keeps parsing, but stops producing any output. It's quite strange.

Once you have generated the CSV file, you just have to import it into any database you like. Mine is 7.2 million lines long ;-)
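
If your database happens to be SQLite, a minimal loader along these lines should do the trick. This is only a sketch under my own assumptions: the DBI/DBD::SQLite modules, the dir.db file and the fichiers table are choices I made for the example, they are not part of the script above.

#!C:\Perl\bin\perl
use strict;
use warnings;
use DBI;

# hypothetical import script: loads output.csv into a SQLite table named "fichiers"
my $dbh = DBI->connect("dbi:SQLite:dbname=dir.db", "", "",
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do("CREATE TABLE IF NOT EXISTS fichiers (dossier TEXT, fichier TEXT, taille INTEGER, date TEXT)");
my $sth = $dbh->prepare("INSERT INTO fichiers (dossier, fichier, taille, date) VALUES (?, ?, ?, ?)");

open my $csv, "<", "output.csv" or die "output.csv does not exist";
while (my $ligne = <$csv>) {
    chomp $ligne;
    # each line is dossier;fichier;taille;date, every field wrapped in double quotes
    my @champ = map { s/^"//; s/"$//; $_ } split /;/, $ligne;
    $sth->execute(@champ);
}
close $csv;

$dbh->commit;
$dbh->disconnect;

The naive split on ";" will of course break on file names that contain a semicolon or a double quote; a module like Text::CSV handles those cases properly.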