# Sorting Multiple Columns When Text Processing

Tags:390-7

https://app.dataquest.io/m/390/text-processing/7/sorting-on-multiple-columns

In the example given for this section, why is the output
10,1,C,2,86
in the third line (as opposed to the first or second) after running the code

``````/home/learn\$ sort -t"," -k4,5 example_data_no_header.csv
``````

I expected it to be in the first or second line since, for the fourth column, 2<236 and 2<140.

2 Likes

As explained in the content -

For example, if we run `sort -t"," -k4,5 example_data_no_header.csv` , the command will look at the characters from the fourth and fifth columns as one field only.

So,

10,1,C,2,86

is seen as

10,1,C,286

by the `sort`.

Now, `sort` sorts that column in lexicographic order.

So, if you have values like

1, 2, 3, 4, 11, 12, 23

after sorting the above you would get -

1, 11, 12, 2, 23, 3, 4

The above was also explained in the previous Mission Step

So the first sorted value would be `14022` because `1` < `2`.

Second value would be `236224` because out of all values that begin with `2`, the second character `3` is the smallest. Then third would be `286` because out of all values that being with `2`, the second character `8` is higher than the second character `3` in `236224` and so on.

3 Likes

I understand now, thank you.

Hi ,can you clarify please how the sort command orders the numbers ,I understand the -k4,5 but sorting in lexographic order seems complicated to me.thanks

Yeah, it can be confusing.

When working with strings - alphabets specifically, lexicographic order is just alphabetical order.

However, when working with numbers, itβs the numerical order when looking at the digits of the numbers from left to right.

So, if you have the following numbers -

1, 2, 11, 3, 23, 22, 15, 31

Lexicographic order will be -

1, 11, 15, 2

Notice that we look at the first smallest digit, which is `1`. Then the second digit of numbers that have a second digit, and the smallest of the second digits is `1`, thatβs why the second number above is `11`.

Once we have exhausted all numbers starting with `1`, we move on to the next smallest digit, which is `2`.

Based on that this is the final outcome -

1, 11, 15, 2, 22, 23, 3, 31

Now, if there was a 3-digit number like `112`, then that would be taken into account as well. `112` would be βhigherβ than `11` but would be smaller than `15`. So the order of numbers in that case would be -

1, 11, 112, 15, 2, 22, 23, 3, 31

There is more to it but the above should suffice. In most cases, you will have functions that will sort it as per the order above for you and itβs not something to worry about. I have, honestly, not encountered this much myself to comment on its usefulness in data science/analysis.

But it should be noted, that when numbers are represented as strings, like -

[β1β, β11β, β15β, β2β, β22β, β23β, β3β, β31β]

Sorting a list of strings as above using those commonly available sorting functions, will sort them in lexicographic order. Sorting them as normal numerical values -

[1, 2, 11, 3, 23, 22, 15, 31]

will sort them in the numerical order you expect.

2 Likes

I also wonder about using sorting parameters.
Why do these commands produce different results?
`/home/dq\$ sort -t':' -k3,3 -k4,4 -g -r characters_no_header`
`/home/dq\$ sort -t':' -k3,3 -k4,4gr characters_no_header`