05 August 2003

Windows XP filename sorting

I don’t know if you’ve noticed, but Windows XP’s Windows Explorer sorts filenames differently from previous versions. I certainly hadn’t noticed until someone pointed it out to me…

Let’s say for example we have filenames which look like:

    filename 1
    filename 2 foo
    filename 10
    filename 11 foo
    filename 0
    filename -42
    

In a plain character-by-character sort, and in fact in older versions of Windows Explorer, the sorted output would be:

    filename -42
    filename 0
    filename 1
    filename 10
    filename 11 foo
    filename 2 foo
    

Whereas with Windows XP you get:

    filename -42
    filename 0
    filename 1
    filename 2 foo
    filename 10
    filename 11 foo
    

Notice how the numbers are now sorted in a more sensible order? My guess is that this is done internally by expanding the numbers out to some arbitrary fixed length, sorting, and then displaying the version from before expansion. My sample implementation is something along the lines of:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>
    
    // Expand numbers in strings so that numeric sorting is sexy
    
    int main(int argc, char *argv[])
    {
      char input[1024];
      size_t i, len;
      int num, numvalid;
    
      // We read strings in from stdin, and assume that each string is less
      // than 1024 characters long. Expanded strings are written to stdout
      while(fgets(input, sizeof(input), stdin) != NULL)
        {
          numvalid = 0;
          num = 0;
          len = strlen(input);
          for(i = 0; i < len; i++)
            {
              // isdigit() needs an unsigned char value, so cast before testing
              if(!isdigit((unsigned char) input[i]))
                {
                  // A run of digits has just ended, so write it out padded
                  if(numvalid != 0)
                    {
                      printf("%08d", num);
                      numvalid = 0;
                      num = 0;
                    }
                  printf("%c", input[i]);
                }
              else
                {
                  // Accumulate the value of the current run of digits
                  num *= 10;
                  num += input[i] - '0';
                  numvalid = 1;
                }
            }
    
          // Most lines end with a \n, which flushes any pending number above.
          // The last line of input might not, so flush here as well
          if(numvalid != 0)
            printf("%08d", num);
        }
    
      return 0;
    }
    

Which, once the expanded strings have been sorted, gives the following output:

    filename -00000042
    filename 00000000
    filename 00000001
    filename 00000002 foo
    filename 00000010
    filename 00000011 foo
    

You can find this example code here.
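
The sample above only covers the expansion step. A minimal sketch of the whole idea (expand, sort on the expanded form, then show the original name) might look like the following. The names `struct entry`, `expand_numbers`, `sort_by_expanded_key` and the `KEY_MAX` limit are made up for illustration, and the eight-digit padding is just carried over from the sample. This is my guess at the approach, not a description of what Explorer really does.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEY_MAX 256 /* arbitrary limit on the expanded key, for this sketch only */

/* Hypothetical record: the name we display plus the key we sort on */
struct entry
{
  const char *name;
  char key[KEY_MAX];
};

/* Copy src into dst, zero-padding every run of digits to eight places.
   Output that does not fit in dstsize bytes is silently truncated. */
static void expand_numbers(const char *src, char *dst, size_t dstsize)
{
  size_t used = 0;

  dst[0] = '\0';
  while (*src != '\0' && used + 1 < dstsize)
    {
      if (isdigit((unsigned char) *src))
        {
          char *end;
          unsigned long num = strtoul(src, &end, 10);
          int n = snprintf(dst + used, dstsize - used, "%08lu", num);

          if (n < 0 || (size_t) n >= dstsize - used)
            break; /* out of room: keep what fitted and stop */
          used += (size_t) n;
          src = end;
        }
      else
        {
          dst[used++] = *src++;
          dst[used] = '\0';
        }
    }
}

/* qsort callback: plain strcmp on the expanded keys */
static int compare_keys(const void *pa, const void *pb)
{
  const struct entry *a = pa;
  const struct entry *b = pb;

  return strcmp(a->key, b->key);
}

/* Build a key for every entry, then sort the entries by those keys */
void sort_by_expanded_key(struct entry *entries, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    expand_numbers(entries[i].name, entries[i].key, KEY_MAX);
  qsort(entries, count, sizeof entries[0], compare_keys);
}
```

After calling `sort_by_expanded_key` on an array of entries, printing each `name` in order gives the listing, and the padded keys never need to be shown. Numbers longer than eight digits would still sort badly with this padding, which is one reason to doubt the real thing works exactly this way.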

Feel like you're working with monkeys?

You might be.

SCO

I've so far stopped myself from commenting on the whole SCO thing because it went rapidly from being funny to just annoying. I will however back down a little, and say that GROKLAW has a nice comment to make now that RedHat has stepped in.