Challenge: Bash the Quicksort

davidktw · Nov 28, 2022

Well I often hear leetcode challenges among some posts in this forum.
Perhaps it would be an interesting challenge to code a Quicksort implementation using Bash too ?

Other languages would be such a simple feat to implement Quicksort efficiently. Perhaps the challengers here can do just as good using Bash ?
The fun part has not arrived yet and there will be a reason why I have chosen Bash for this purpose.
Lets start simple first. Lets just implement a Quicksort in Bash first and see how far our HWZ Forum software engineers can achieve. (wink wink)

QUESTION
Implement a Quicksort algorithm using Bash where the implementation should use as much of Bash as possible. While there will be no restriction on how the implementation should be as long as it is written using Bash, challengers are free to use other utilities within the Linux ecosystem as long as the main sorting implementation is written purely in Bash. That means you should not using other programming languages or utilities that perform the actual sorting such as sort, or other programming tools like awk, perl, python etc.

INPUT

0 - N numeric values delimited by newline characters will be provided from standard IN stream.
Each numerical value can be in the integral range [-2^63,(2^63)-1]
Input is only limited to numerical values, string comparison is not required.
0 <= N <= 100000

OUTPUT

The sorted 0 - N numeric values, by ascending order (top down) will be printed to the standard OUT stream delimited by newline characters
Nothing should be printed if N == 0

SAMPLE

Code:

$ echo -ne '' | ./quicksort.sh
$ echo '' | ./quicksort.sh
$ echo '1' | ./quicksort.sh
1
$ seq 10 1 | ./quicksort.sh
1
2
3
4
5
6
7
8
9
10
$ (seq 10 1;seq 0 -10) | ./quicksort.sh
-10
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
4
5
6
7
8
9
10

PERFORMANCE
Here we are also interested in performance. As every challenger system will differs, we will try to give a gauge to how good your implementation is
I will use sort as a measure which we can run on our system

Code:

$ time sort -n <quicksort.sample.test.data.10k.txt >/dev/null

real    0m0.008s
user    0m0.008s
sys    0m0.001s

Hence the performance of your implementation will be pegged relative to the execution time of the linux sort tool using the exact command line arguments as shown above.

By running a sample set of test data of 10,000 lines of random numerical input data which you can download from
https://mega.nz/file/3DBEmBgR#ttIXpIkLFq-vtkPe4S_ZhKAOYyX0r2-mqH60Zwc2K6gand running the following command

Code:

$ time ./quicksort.sh <quicksort.sample.test.data.10k.txt >/dev/null

real    0m5.924s
user    0m5.915s
sys    0m0.008s

That would means my implementation would be approximately 740.5x of the linux sort execution time. The lower this relative value is, the better your implementation is.

Here is another run against a sample set of 100,000 lines of random numerical input data which you can download from
https://mega.nz/file/mGhHxSST#Yqc3j_zVcwXsa597TnYqHSfEUXoWp9kLlz6BTPbZx-Y
When you use the 100k lines data set on the linux sort utility as such

Code:

$ time sort -n <quicksort.sample.test.data.100k.txt >/dev/null

real    0m0.072s
user    0m0.052s
sys    0m0.020s

and upon your own Bash Quicksort implementation, you will get another relative performance.

Code:

$ time ./quicksort.sh <quicksort.sample.test.data.100k.txt >/dev/null

real    5m35.569s
user    5m35.444s
sys    0m0.080s

Mine would be 4660.68x times of linux sort utility timing
As you can see Bash is not exactly performant and as the input size get larger, it is just going to get ridiculously slower.

In case you are wondering how can Bash be so slow ?
These are the timings for the same algorithm written in Perl and ran against the same 2 test cases above.
The timings versus the linux sort utility are relatively 8.0x and 6.3x respectively

Code:

$ time <quicksort.sample.test.data.10k.txt ./quicksort.pl >/dev/null

real    0m0.064s
user    0m0.060s
sys    0m0.005s
$ time <quicksort.sample.test.data.100k.txt ./quicksort.pl >/dev/null

real    0m0.454s
user    0m0.446s
sys    0m0.008s

Perl is already slower than C, but given Perl is a JIT compiled programming language, it is much faster than Bash, which is an interpreted language, in comparison.

However what we are pursing here is creativity when working with constrained scenario and Bash have its quirks, hence it’s up to you to squeeze as much out of it as possible.

We have not hit the limits yet, lets how the basic quicksort implementation progress (^.^)

Have fun coding and hopefully we still have software engineers that remember how to implement Quicksort instead of just knowing how to use.
Invitation to @peterchan75 @ng min teck @Trader11 @diminishin @project_00 and others!

Small hint: Remember how the input may vastly affect the performance of Quicksort ? What makes the worse case complexity of Quicksort ? Take note in your implementation.

davidktw · Nov 28, 2022

Okay lets have the 2nd part to this challenge ?
Suppose you have attempted the part 1 of this challenge in the post above using recursive approach, now try implement the iterative approach.
All recursive solution must have a non-recursive counterpart.

Hint: One possible technique is you need to know how to use a particular data structure.

davidktw · Nov 28, 2022

Here I will reveal the 3rd part to the question. First part of this challenge asked for a Bash solution.
No matter how you optimise your solution, you should very quickly observed that when dealing with large data set say anything between 100k to 1m inputs array size, the timing will be extremely slow no matter what you do and you will also observe the process is going to easily maxed out at near 100%. That's when you may need to venture into concurrent programming.

In Bash, there is no multithreading. Hence your only option is multiprocessing. If you don't even know what is the difference between these 2 approaches, you will want to read up first before you attempt. Using Bash, implement a concurrent quicksort approach with multiprocessing. You are not restricted to how you implement it except that the main sorting algorithm must be implemented in Bash.

For those that would like to implement it using other programming languages, feel free to do so as long as you are not using multithreading. In my opinion, it is trivial implementing it using multithreading. Able to do it using multiprocessing is where it matters because it will scale beyond just limited to one computing node. You can have 96 cores but multithreading alone will be hard to scale beyond a single node without further multiprocessing technique.

Lets see if we have SWEs loitering in this forum capable of implementing a multiprocessing quicksort technique within a single host first.

At @peterchan75, this is where the fun begin that test on the creativity of SWEs. Knowing your basic quicksort algorithm won't cut it. Even if you consult with online literature of using parallel programming techniques, you still need more than just academic knowledge to produce efficient concurrent implementation. I won't reveal first what problem one will face while implementing it and what techniques are at your disposal to manage them. Lets see what creative approaches we will find in this forum. (Honestly, I'm not too hopeful wink wink, but I could be wrong)

Happy Coding SWEs

peterchan75 · Nov 30, 2022

If I can just do this, why am I bother with quicksort.
Other than just an academic exercise, I don't see much practical needs for it.

Code:

@ticker_report_amount = sort { $a->[1] cmp $b->[1] || $a->[3] cmp $b->[3] } @ticker_report_amount;

davidktw · Nov 30, 2022

peterchan75 said:
If I can just do this, why am I bother with quicksort.
Other than just an academic exercise, I don't see much practical needs for it.

Code:

@ticker_report_amount = sort { $a->[1] cmp $b->[1] || $a->[3] cmp $b->[3] } @ticker_report_amount;

Because you are merely focusing on the exercise. The focus is not the exercise at all. I wouldn't have chosen Bash when I could have chosen Javascript/Java/Perl/Python/C and other languages for the same task.
I am only at the beginning of the problem. Hence you are missing the point about this being academic. I can assure you there is nothing academic about it besides it is based on a widely known sorting algorithm.

In the first place, I have already demonstrated that using Bash to implement Quicksort is extremely slow. It's not just the language itself, it's the constraint that it is to work with. Otherwise I would have told you to use the linux sort utility which is way faster than what you have written and much easier.
sort -n <inputfile.txt is all that it requires.

Before you even answer if you should be bothered, maybe you should ask if you could do it properly first.

peterchan75 · Nov 30, 2022

It depends on the nature of the data. If I am not wrong, data that are "somewhat" sorted, quicksort works great.

davidktw · Nov 30, 2022

peterchan75 said:
It depends on the nature of the data. If I am not wrong, data that are "somewhat" sorted, quicksort works great.

Looks like you don't understand quicksort.
https://www.geeksforgeeks.org/can-quicksort-implemented-onlogn-worst-case-time-complexity/

davidktw · Nov 30, 2022

It would seems so many engineers have return the knowledge back to the lecturers? Still no challengers for the basic quicksort implementation?

peterchan75 · Nov 30, 2022

davidktw said:
Looks like you don't understand quicksort.
https://www.geeksforgeeks.org/can-quicksort-implemented-onlogn-worst-case-time-complexity/

This may sound lame but my CIS course was many years. It was during the end of punch card and dumb terminal on IBM370. Anyway, to set the record straight, the correct answer is insertion sort for sorted or almost sorted data.

davidktw · Nov 30, 2022

peterchan75 said:
This may sound lame but my CIS course was many years. It was during the end of punch card and dumb terminal on IBM370. Anyway, to set the record straight, the correct answer is insertion sort for sorted or almost sorted data.

Yes insertion do exhibit this behaviour. It makes a good incremental sorting algorithm where your input can arrive sparsely, suspended at anytime over a period of time, and you can still get a fully sorted data structure at all times.

Heapsort also have such behaviour as the sorting is done during the insertion operation and at most O(logN) due to the completeness of the tree at any point of time.

None of these are academic because in real industrial works you can choose between calling a full sorting algorithm all the time, or you can amortise the sorting operations across your insertion/update/delete operations.

davidktw · Nov 30, 2022

davidktw said:
Yes insertion do exhibit this behaviour. It makes a good incremental sorting algorithm where your input can arrive sparsely, suspended at anytime over a period of time, and you can still get a fully sorted data structure at all times.

Heapsort also have such behaviour as the sorting is done during the insertion operation and at most O(logN) due to the completeness of the tree at any point of time.

None of these are academic because in real industrial works you can choose between calling a full sorting algorithm all the time, or you can amortise the sorting operations across your insertion/update/delete operations.

@peterchan75

Just to provide information too. I didn't come out with this challenge, writing solutions referring to any CS literatures while doing so. My last exercise coding a quicksort implementation on Java was roughly 20 years ago. The reason why I can still do it on a totally different language which has challenges on its own is simply I care about remembering these techniques and I believe they will be applicable in time to come. That's how much I value my craft. It's disappointing that I can even hear of undergraduates(having roughly 1 year from graduation) during interviews telling me they can't describe sufficiently how different data structures and/or sorting algorithms work because they learnt it in their 1st/2nd semester.

Honestly I'm not smarter than most genuine software engineers out there, but I probably care a lot about my craft to stay relevant in the industry.

Often enough, I do ask them. "Why do you tell me that you are from this university, study these courses and got this result but you cannot tell me adequately about what you learnt from there just months ago ?"

peterchan75 · Nov 30, 2022

davidktw said:
@peterchan75

Just to provide information too. I didn't come out with this challenge, writing solutions referring to any CS literatures while doing so. My last exercise coding a quicksort implementation on Java was roughly 20 years ago. The reason why I can still do it on a totally different language which has challenges on its own is simply I care about remembering these techniques and I believe they will be applicable in time to come. That's how much I value my craft. It's disappointing that I can even hear of undergraduates(having roughly 1 year from graduation) during interviews telling me they can't describe sufficiently how different data structures and/or sorting algorithms work because they learnt it in their 1st/2nd semester.

Honestly I'm not smarter than most genuine software engineers out there, but I probably care a lot about my craft to stay relevant in the industry.

Often enough, I do ask them. "Why do you tell me that you are from this university, study these courses and got this result but you cannot tell me adequately about what you learnt from there just months ago ?"

The past may not equal to the future. My uni mate in the NYC is a lot smarter than me. One day, he sent me a picture of USB mouse jiggler. I told him I have perl script that do just that. I don't want the PC to go into sleep mode when my scripts are running. Solution is created out of necessities. Sometimes, workers with the right attitude are a lot more desirable than smart worker. Of course, smart workers with the right attitude are the most desirable. An interview or two may not be able to identity those desirable workers. Now with google, idea and code resources are readily available and with the right attitude, there is no stopping ones' quest.

Anyway, I told you before about this low code thingy. At first glance, this may seem to be a straitjacket. But it instill standard and reduced complexity.

davidktw · Nov 30, 2022

peterchan75 said:
The past may not equal to the future. My uni mate in the NYC is a lot smarter than me. One day, he sent me a picture of USB mouse jiggler. I told him I have perl script that do just that. I don't want the PC to go into sleep mode when my scripts are running. Solution is created out of necessities. Sometimes, workers with the right attitude are a lot more desirable than smart worker. Of course, smart workers with the right attitude are the most desirable. An interview or two may not be able to identity those desirable workers. Now with google, idea and code resources are readily available and with the right attitude, there is no stopping ones' quest.

Anyway, I told you before about this low code thingy. At first glance, this may seem to be a straitjacket. But it instill standard and reduced complexity.

The past may not be equal to the future, but we are still eating rice after hundreds of years.

Of course solutions are created out of necessity, that is EXACTLY why going comfortable is counterproductive at times.

If google and information are readily available like you mention, then how come it becomes the latter generation can’t even tackle a fundamental question like this? Well that is because the readily available information is normally not the issue, the issue is with attitude and aptitude as you have mentioned.

There was no stopping decades back too, in fact there are more progress that differentiate in the past than what is going on now. Now is really reinventing the wheel more often than before if you look closely.

IMHO Low code is only applicable in some scenarios normally from the end-user perspective. It doesn’t work from the vendor perspective. In fact, from a commercial perspective, you get easily locked in to the product suite. Also to quote what you mentioned about necessity, low-code remove even more of that.

I am not in favour of low-code platforms. No doubt they lower the entry bar, but they also destroy a lot of opportunities. That is what I think.

peterchan75 · Nov 30, 2022

davidktw said:
The past may not be equal to the future, but we are still eating rice after hundreds of years.

Of course solutions are created out of necessity, that is EXACTLY why going comfortable is counterproductive at times.

If google and information are readily available like you mention, then how come it becomes the latter generation can’t even tackle a fundamental question like this? Well that is because the readily available information is normally not the issue, the issue is with attitude and aptitude as you have mentioned.

There was no stopping decades back too, in fact there are more progress that differentiate in the past than what is going on now. Now is really reinventing the wheel more often than before if you look closely.

IMHO Low code is only applicable in some scenarios normally from the end-user perspective. It doesn’t work from the vendor perspective. In fact, from a commercial perspective, you get easily locked in to the product suite. Also to quote what you mentioned about necessity, low-code remove even more of that.

I am not in favour of low-code platforms. No doubt they lower the entry bar, but they also destroy a lot of opportunities. That is what I think.

You are entitled to your opinion.

But low-code platform is not static. It's moving forward too. In some MNC, there career path for staff and principle staff engineers. These people like you who can become consultant within the company. Let the engineers do their fire fighting job.

davidktw · Nov 30, 2022

peterchan75 said:
You are entitled to your opinion.
But low-code platform is not static. It's moving forward too. In some MNC, there career path for staff and principle staff engineers. These people like you who can become consultant within the company. Let the engineers do their fire fighting job.

I am quite sure any product have consultants. My company have projects on low code because the customer ask for it too. So low code projects is not new to me. However I see the danger and cons of it, so nope it is not my go-to solution for versatility. And it is certainly not my recommendation for anyone starting their career roadmap in the industry due to the lack of necessity created in the process. In fact, good chances you might even need to retrofit so that you can fit into the product constraints and caveats.

Yes this is my personal opinion.

davidktw · Dec 1, 2022

Looks like getting software engineers to solve a quicksort problem might be a greater problem than it seems.
Let me showcase a lousy implementation to start the ball rolling then

Bash:

#!/usr/bin/env bash

function qs {
  local arr=("$@")
  if [[ ${#arr[@]} -le 1 ]]; then
    echo "${arr[@]}"
  else
    local pivot
    local temp
    local left
    local right
    local leftout
    local rightout

    left=0
    right=$((${#arr[@]} - 1))
    pivot=$(($RANDOM * $RANDOM % ${#arr[@]}))
    temp=${arr[$pivot]}
    arr[$pivot]=${arr[$right]}
    arr[$right]=$temp
    pivot=$right
    right=$(($right - 1))
    while [ $left -le $right ]
    do
      if [ ${arr[$left]} -lt ${arr[$pivot]} ]; then
        left=$(($left + 1))
      else
        temp=${arr[$left]}
        arr[$left]=${arr[$right]}
        arr[$right]=$temp
        right=$(($right - 1))
      fi
    done
    temp=${arr[$left]}
    arr[$left]=${arr[$pivot]}
    arr[$pivot]=$temp

    [[ $left -gt 0 ]] && leftout=(`qs "${arr[@]:0:$left}"`) || leftout=()
    [[ $(($pivot - $left)) -gt 0 ]] && rightout=(`qs "${arr[@]:$(($left + 1)):$(($pivot - $left))}"`) || rightout=()
    echo "${leftout[@]}" "${arr[$left]}" "${rightout[@]}"
  fi
}

if [ $# -eq 0 ]; then
  readarray -t arr
  output=`qs "${arr[@]}"`
else
  output=`qs "$@"`
fi
[ ${#arr[@]} -gt 0 ] && [ "z${arr[0]}" != "z" ] && printf '%d\n' ${output[@]}

How bad it is ? Using the same system that I timed the other implementation found in post #1, here is the performance for the implementation above.
It is 1773.25x worse than the linux sort utility on the same test data set.

Code:

$ time <quicksort.sample.test.data.10k.txt ./quicksort.sh >/dev/null

real    0m14.186s
user    0m12.900s
sys    0m2.219s

Hence what can be done further to achieve a better relative performance of 816.5x ?

diminishin · Dec 2, 2022

Bro, if got time I will give it a try, but must it be bash?

I think very hard to find younger programmers who are skilled in bash...most of us just use it to deploy our apps

davidktw · Dec 2, 2022

diminishin said:
Bro, if got time I will give it a try, but must it be bash?

I think very hard to find younger programmers who are skilled in bash...most of us just use it to deploy our apps

If you really find it so difficult to program in Bash, then sure feel free to do it in other languages. We will see how it goes.

Trader11 · Dec 2, 2022

Just for fun, I ask chat.ai bot to do it and see a few techniques that I don't see common

<CODE>
#!/bin/bash

# QuickSort algorithm
# Sorts the given array in place using the QuickSort algorithm
#
# Arguments:
# $1: The array to sort (passed as an indexed array)
#
# Returns:
# None (the array is sorted in place)

function quicksort() {
# Get the array to sort
local -n arr=$1

# Check if the array has fewer than 2 elements
if (( ${#arr[@]} < 2 )); then
return
fi

# Choose the first element as the pivot
local pivot=${arr[0]}

# Initialize the left and right arrays
local -a left
local -a right

# Partition the array into left and right subarrays
for (( i=1; i<${#arr[@]}; i++ )); do
if [[ ${arr[$i]} -lt $pivot ]]; then
left+=("${arr[$i]}")
else
right+=("${arr[$i]}")
fi
done

# Recursively sort the left and right subarrays
quicksort left
quicksort right

# Concatenate the left, pivot, and right arrays to form the sorted array
arr=("${left[@]}" "$pivot" "${right[@]}")
}

</CODE>

davidktw · Dec 2, 2022

Trader11 said:
Just for fun, I ask chat.ai bot to do it and see a few techniques that I don't see common

<CODE>
#!/bin/bash

# QuickSort algorithm
# Sorts the given array in place using the QuickSort algorithm
#
# Arguments:
# $1: The array to sort (passed as an indexed array)
#
# Returns:
# None (the array is sorted in place)

function quicksort() {
# Get the array to sort
local -n arr=$1

# Check if the array has fewer than 2 elements
if (( ${#arr[@]} < 2 )); then
return
fi

# Choose the first element as the pivot
local pivot=${arr[0]}

# Initialize the left and right arrays
local -a left
local -a right

# Partition the array into left and right subarrays
for (( i=1; i<${#arr[@]}; i++ )); do
if [[ ${arr[$i]} -lt $pivot ]]; then
left+=("${arr[$i]}")
else
right+=("${arr[$i]}")
fi
done

# Recursively sort the left and right subarrays
quicksort left
quicksort right

# Concatenate the left, pivot, and right arrays to form the sorted array
arr=("${left[@]}" "$pivot" "${right[@]}")
}

</CODE>

Does it produce the same output as what my question requires, and how does it perform?

Looking at the code, I just have to give it one data set to make it work like a snail. Did the chat.ai bot tells you that too?

Can you ask your chat.ai bot about it? lol

Can’t say it is a good try, but a noted try?
I definitely am expecting more from you, not functioning just as a liaison officer for your chat.ai bot.
In any case, appreciate the participation.

Challenge: Bash the Quicksort

Arch-Supremacy Member

Arch-Supremacy Member

Arch-Supremacy Member

Supremacy Member

Arch-Supremacy Member

Supremacy Member

Arch-Supremacy Member

Arch-Supremacy Member

Supremacy Member

Arch-Supremacy Member

Arch-Supremacy Member

Supremacy Member

Arch-Supremacy Member

Supremacy Member

Arch-Supremacy Member

Arch-Supremacy Member

Senior Member

Arch-Supremacy Member

Banned

Arch-Supremacy Member