I am using a lambda to modify the behaviour of sort.
sorted(list, key=lambda item:(item.lower(),len(item)))
Sorting a list containing the elements A1,A2,A3,A,B1,B2,B3,B
, the result is A,A1,A2,A3,B,B1,B2,B3
.
My expected sorted list would be A1,A2,A3,A,B1,B2,B3,B
.
I've already tried to include the len(item)
for sorting, which didn't work. How to modify the lambda so that the sort result is instead?
Here is one way to do it:
>>> import functools
>>> def cmp(s, t):'Alter lexicographic sort order to make longer keys go *before* any of their prefixes'ls, lt = len(s), len(t)if ls < lt: s += t[ls:] + 'x'elif lt < ls: t += s[lt:] + 'x'if s < t: return -1if s > t: return 1return 0>>> sorted(l, key=functools.cmp_to_key(cmp))
['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']
Traditionally, lexicographic sort order longer strings after their otherwise identical prefixes (i.e. 'abc' goes before 'abcd').
To meet your sort expectation, we first "fix-up" the shorter string by adding the remaining part of the longer string plus another character to make it the longer of the two:
compare abc to defg --> compare abcgx to defg
compare a to a2 --> compare a2x to a2
The functools.cmp_to_key() tool then converts the comparison function to a key function.
This may seem like a lot of work, but the sort expectations are very much at odds with the built-in lexicographic sorting rules.
FWIW, here's another way of writing it, that might or might not be considered clearer:
def cmp(s, t):'Alter lexicographic sort order to make longer keys go *before* any of their prefixes'for p, q in zip(s, t):if p < q: return -1if q < p: return 1if len(s) > len(t): return -1elif len(t) > len(s): return 1return 0
The logic is:
- Compare character by character until a different pair is found
- That differing pair determines the sort order in the traditional way
- If there is no differing pair, then longest input goes first.
- If there is no differing pair and the lengths are equal, the strings are equal.