Friday, September 4, 2009

Little Mysteries

Some days have little mysteries.

I was happy to learn the bash trick ${0##*/} as a way to skip basename (no need to add dependencies), and once I learned more about bash substring removal, it made perfect sense. $0 (or ${0}) is the script name, as called, often with leading directory information. You need curly braces for substring removal, so start with ${0}. Use # for stingy prefix matching (the smallest match from the start of the variable), and ## for greedy prefix matching (the largest match). So ##*/ is the largest match of any run of characters that ends with a slash. Since / is the directory separator, that greedy match removes all leading directories from the script name ... same as basename, but it should be faster, since no external command is started.
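Here's a minimal sketch of the stingy versus greedy difference. The path is made up, and script_path is just my stand-in for $0 so the example runs the same way no matter how it is called:

```shell
#!/usr/bin/env bash
# script_path is a hypothetical stand-in for $0, as called with directories.
script_path="/usr/local/bin/myscript.sh"

# Single #: remove the SHORTEST prefix matching */ (only the leading slash here).
stingy="${script_path#*/}"

# Double ##: remove the LONGEST prefix matching */ (everything through the last slash).
greedy="${script_path##*/}"

echo "stingy: $stingy"   # usr/local/bin/myscript.sh
echo "greedy: $greedy"   # myscript.sh
```

In a real script, the same greedy form applied to $0 gives the bare script name.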

OK, so I feel like I've learned something! A small trick, but it weans me from excessive sed and awk too.

Yesterday's neat trick was shoving all of my command-line arguments into an array. Why an array? I kept losing the quotes around a string with spaces (a user comment), with unexpected script results. This is why I test my scripts! So that I don't have to worry about shift eating the script input, I'm in the habit of storing that input in a variable for safe-keeping; I'm now tinkering with an array for that purpose. (Yes, getopts only advances OPTIND instead of really shifting, and resetting OPTIND=1 rewinds it, but $ARGS is immune to other tactics like set too.)

# save the args

ARGS="$@"

# or put the args into an array, space-preserving

typeset -a ARGARRAY=("$@")
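To see the difference between the two, here's a small sketch that fakes a command line with set --. The sample arguments and the counter names are my own, picked only for illustration:

```shell
#!/usr/bin/env bash
# Simulate a command line: a flag, a quoted comment with spaces, and a file name.
set -- -m "user comment with spaces" file.txt

# Flat string: the three arguments are joined with spaces into one value.
ARGS="$@"

# Array: each original argument stays a separate element.
typeset -a ARGARRAY=("$@")

# Count what comes back out of the flat string when it is used unquoted.
flat_count=0
for word in $ARGS; do
    flat_count=$((flat_count + 1))
done

array_count=${#ARGARRAY[@]}

echo "words from ARGS:       $flat_count"
echo "elements in ARGARRAY:  $array_count"
echo "comment element:       ${ARGARRAY[1]}"
```

The flat string splits the comment into separate words, while the array keeps it as one element, provided the array is expanded inside double quotes.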

The ARGARRAY is great: although I lose the quotes from the command line, the user comment is a single element in the array so it's safe as long as I quote that variable when I use it. But back on the bash string replacement track: using the command-line input argument array, I quickly noticed that the flags starting with a dash (hyphen) disturb some string matching routines. So since I know about greedy string matching now, I thought this should work:

typeset -a UNDASHEDARGARRAY=("${@##-}")

It doesn't work. It still doesn't work when I escape the hyphen, as in UDAA=("${@##\-}"); either way, the result is just the same as the stingy removal UDAA=("${@#\-}"): only the first dash goes away. Phooey.

The only approach I've found that works is to use the substring removal twice in a row.

typeset -a UDAA=("${@##\-}")

UDAA=("${UDAA[@]##\-}") (also works with single octothorpes instead of pairs)
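A sketch of the two-pass work-around against a made-up argument list (the sample arguments are my own invention). It uses the single-octothorpe form, since each pass only ever removes one dash:

```shell
#!/usr/bin/env bash
# Simulate a command line with a long flag, a short flag, a comment, and a plain name.
set -- --long-flag -v "user comment" plain-name

# First pass: drop one leading dash from every argument that has one.
typeset -a UDAA=("${@#-}")

# Second pass: drop a second leading dash where one is still present.
UDAA=("${UDAA[@]#-}")

printf '%s\n' "${UDAA[@]}"
```

Internal hyphens survive, and so does the space inside the comment. An argument with three or more leading dashes would still keep the extras.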

What also works is the overkill of removing all hyphens, leading or trailing or internal, with either of these:

typeset -a ONE=("${@//-}") TWO=("${@//-/}")

However, internal hyphens don't mess up string matching, and might be significant. One or two leading dashes indicate a flag, an option to the command; any other dashes might be useful.
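A quick sketch of what the overkill version costs, again with made-up arguments of my own:

```shell
#!/usr/bin/env bash
# Simulate a command line whose arguments contain internal hyphens.
set -- --dry-run -v "user-supplied comment"

# Both forms delete every hyphen: an empty replacement and no replacement at all.
typeset -a ONE=("${@//-}") TWO=("${@//-/}")

printf 'ONE: %s\n' "${ONE[@]}"
printf 'TWO: %s\n' "${TWO[@]}"
```

Both arrays come out the same, and --dry-run collapses to dryrun, which is exactly the loss of possibly significant hyphens described above.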

So my little bash mystery today is why these two arrays are the same with --long-flags:

typeset -a FIRST=("${@#-}") SECOND=("${@##-}")

I don't like these mysteries, but I know when it's time to get back to work. I have a work-around, so I'll use it.

UPDATE 14:09: I looked at the strip_leading_zero2 () example function in the Advanced Bash-Scripting Guide, which strips possible leading zero(s), and came up with a dash-prefix-stripper that works in one operation:

shopt -s extglob

typeset -a UNDASHEDARGARRAY=("${@##+(-)}")
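Here's a side-by-side sketch of the plain pattern and the extglob pattern, with sample arguments I made up. As far as I understand the matching rules, the pattern - has no wildcard in it, so its shortest and longest matches are the same single character, and # and ## can't differ. The pattern +(-) means "one or more dashes", which finally gives ## something to be greedy about:

```shell
#!/usr/bin/env bash
shopt -s extglob

# Simulate a command line with a long flag, a short flag, a comment, and a plain name.
set -- --long-flag -v "user comment" plain-name

# Literal pattern: matches exactly one dash, greedy or not.
typeset -a LITERAL=("${@##-}")

# Extended glob: +(-) matches a run of one or more dashes, so ## removes them all.
typeset -a EXTENDED=("${@##+(-)}")

printf 'literal:  %s\n' "${LITERAL[@]}"
printf 'extended: %s\n' "${EXTENDED[@]}"
```

With the literal pattern, --long-flag keeps one of its dashes; with the extglob pattern both are gone in a single operation.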

I'll ponder that one. I should also check whether extglob is already set before enabling it, and unset it afterward only if I was the one who turned it on.
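One way that check might look, as a rough sketch. The function name strip_leading_dashes and the STRIPPED array are names I made up for illustration; shopt -q just reports quietly whether the option is currently on:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strips all leading dashes from each argument and
# stores the results in the global array STRIPPED, leaving extglob as it was.
strip_leading_dashes() {
    local restore=0

    # Only turn extglob on (and remember to turn it off) if it was off.
    if ! shopt -q extglob; then
        shopt -s extglob
        restore=1
    fi

    STRIPPED=("${@##+(-)}")

    if [ "$restore" -eq 1 ]; then
        shopt -u extglob
    fi
}

strip_leading_dashes --long-flag -v "user comment" plain-name
printf '%s\n' "${STRIPPED[@]}"
```

The pattern sits inside a parameter expansion, so it is interpreted when the function runs, after extglob has been switched on.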
