CHAPTER 11
SORTING: ORDERING IT TO SEARCH FASTER Can you imagine trying to locate a word in a dictionary that is not sorted or put in order? You would have quite a bit of difficulty finding the word. Sorting makes searching not only efficient but in many applications possible. How does a computer sort the data? It is very similar to the way you sort a series of numbers or a deck of cards. There are many ways that the data can be sorted, therefore many algorithms are there to look into and analyze. Among various existing algorithms one algorithm may perform better than another one in certain circumstances. While one sort algorithm will do the job of sorting,it is important to understand various sorting techniques, trends and circumstances so that a proper algorithm can be chosen. The complexity of a sorting algorithm is measured in terms of speed (time), amount of space required for memory storage and/orits simplicity. The three categories of sorting techniques include exchange sort, selection sort and insertion sort. These categories are based on the movement and exchange of the two next items (exchange), selecting the next item (selection), and inserting the next item in the right position (insertion). In most of the sorting techniques, the unsorted data is already placed in an array and the same array is used to hold the sorted data so as not to waste memory. There are cases where an array is not used. The data may be stored in the files or even in some other data structure such as a linked list built by pointers. SORT IT YOURSELF: DO IT YOUR WAY
Imagine that there are 10 cards sitting next to each other on your desk and your task is to put them in order (sort it) from smallest to largest (ascending order). You manage to do it right away. How did you do that? Now, try to write down each step as you did the sort. Did you continuously compare each card with the next card and then exchange (swap) the cards? If not, did you pick up the smallest card and place it at the beginning? Or did you place each card in its spot as you performed the sort? If you didn’t follow any of above, then what did you do? Did you follow the same steps over and over? If I ask you to do it over would you be able to perform the same task? Be aware that others may have different approaches than you have in sorting the cards. No matter how the intermediate work is done the result should be the same- data is sorted. However one thing is common to every one. Data is scanned over and over. This is why we need a systematic approach to programming.
Now let’s consider what would have happened if all ten cards were already sorted except one card? While there are different techniques to sort a list of data, one technique may be preferable due to circumstances. This is why consideration of the three sorting approaches (exchange, selection, insertion) may be helpful. SORT ASCENDING OR DESCENDING
Data can be sorted in ascending or increasing order from smallest to largest. Alternatively data can be arranged in descending (decreasing) order from largest to smallest. Notice data can be numeric, alphabetic, or alphanumeric, such as:
111 678 999 1234
A M X a x
BYE GOOD HELLO
123BYT G482KR xyzcabbies
When the data is not numeric, two items are compared in terms of their ASCII character values. Remember, the ASCII value for ‘0’ is 48, A is 65, and lower ‘a’ is 97.
EXCHANGE SORTS
In order to sort a series of items, two items at a time are compared, if they are not in order (for example the first item is greater than second item) they are exchanged. However, if these two items are already in order, the exchange will not take place. After the comparison is completed, a conclusion can be drawn that the second item is not the smaller one. Then the next two items are compared and so forth. For each comparison, there would be the possibility of an exchange. You may have wondered why this family of sorting is called exchange sort. Among the family of exchange sorts are bubble sorts and shell sorts. Bubble sorts compare two adjacent cells (items) while shell sorts compare elements with adistance of more than 1. By the time entire array is scanned, we can conclude that the largest element is placed at the end of the array (alternatively, the smallest item finds itself at the beginning of the array).
EXCHANGE SORT EXAMPLE: PASS ONE
Given the ten unsorted cards listed below, what would happen if you compare the two adjacent items and exchange them when necessary until all cards are covered? Going through all the data is known as a pass. Since the larger data is moved one space (or slot) to the end each time by the end of the pass, the largest data will be placed at the end. Observe the unsorted data below and the data movement from the beginning to the end of (pass one). What do you conclude from this sorting approach (algorithm)?
(8)(3) (1) (5) (7) (10) (4) (9)(6) (2) (3)(8) (1) (8) (5) (8) (7) (8) (8) (10) (4) (10) (9) (10) (6) (10) (2) (10) (3)(1) (5) (7) (8) (4) (9) (6) (2) (10)You can observe that the largest data moved to the end and other items moved one position nearer to its sorted position (bubbling up). Original data (8)(3)(1)(5)(7)(10)(4)(9)(6)(2)
After the first pass sort (3)(1)(5)(7)(8)(4)(9)(6)(2)(10)
EXCHANGE SORT PROGRAM: FIRST PASS
Let us go easy and translate the algorithm to the programming code the way that we were viewing it by comparing and swapping the two items. The items are placed in an array called slot for simplicity. You can call it any other name if you don't like slot. Remember the first element of an array starts with zero. After comparison of first two items, a user defined swap function is called. The swap function passes the array and two indices of the array that should be exchanged with each other.
if (slot[0] > slot[1] ) swap(slot, 0,1); if (slot[1] > slot[2] ) swap(slot,1,2); .............. if (slot[n-2] >slot[n-1] ) swap(slot,n-2,n-1); Let us generalize this by using an index variable such as i.
if ( slot[i] > slot[i+1]) swap(slot, i, i+1)Let us apply a loop to move through all the slots (one pass) for ( i=0; i < n-1; i++) if ( slot[i] > slot[i+1]) swap(slot, i, i+1) The size of the data is We have given n. the value 10. In C/C++, the array index begins with 0, therefore, if 10 spaces are allocated for the array they are numbered from 0 to 9. If ni is 9 and we compare i with i+1, there would be an error since the array slot 10 is nonexistent. As a result, having sets the maximum value of i < n-1 to 8.
iEXCHANGE SORT WITH ALL THE PASSES (N PASSES)
With the first pass of the exchange sort only one item is sorted (e.g. largest value). Each slot is compared with the next slot and if it is not in order a swap will occur. Similarly with the second pass, the second largest value is put into the right place. Finally as the number of passes progress, the items gradually become sorted. for (pass=1 ; pass <=n; pass ++) { for ( scan=0; scan < n; scan++) if ( slot[scan] > slot[scan+1] ) swap(slot, scan, scan+1); } EXCHANGE SORT: ROOM FOR IMPROVEMENT
The heart of the exchange algorithm is for every pass to scan through the entire list and compare two items, swapping them if it is necessary. An improvement for the above algorithm would be,that if all of the data gets sorted except one item. That item will then be in the sorted place (sound philosophy). Therefore the number of passes will be one less than the total. Instead of using pass<=n you will use pass<n. On every pass, one item will find its sorted place, e.g. at the first pass the last item will be sorted which will be placed at the end of the list. Therefore at the second pass we don’t need to scan to the end of the list. Generally if the size of data is then on each pass there will be 1 less item to be scanned from the end of the list.The improvement will be instead of nscan<n to use scan<n-pass. In our example, n is 10. Therefore we need 10 passes. Do not be surprised in 9 passes (n-1) the data gets sorted. Looking at it philosophically, the last data in the last pass is automatically in the right place since everybody else is sitting in their right place (slot). We know that in the first pass the last data is ordered. In the second pass we know that the last data is already sorted and we do not have to bother to do anything but to exclude the last number. On the first pass will be scan <n-1, on the second pass will be scan <n-2. On the third pass we will have scan <n-3. As a result we can generalize it as scan <n - pass.
Consider more improvement. If there is no swap within a pass, then the list of items are already sorted. This will be explained later. EXCHANGE SORT PROGRAM: BUBBLE SORT
The following program demonstrates an exchange sort by having two loops, one and one if. Items will be ordered in ascending order and at each pass, two items will be compared and the smaller one will be swapped to the left. At the end of each pass the largest data will be pushed to the end. This kind of exchange sort is called swap since the smaller light data are bubbling up gradually.
bubble sort1. #include<iostream> 2. using namespace std; 3. swap(int slot[],int i, int j){ 4. int temp; 5. temp=slot[i]; 6. slot[i]=slot[j]; 7. slot[j]=temp; 8. return 0; 9. }//SWAP 10. main(){ 11. int scan,pass,n=10, i; 12. int slot[ ]={8,3,1,5,7,10,4,9,6,2}; 13. for(pass=1;pass<n;pass++){ 14. for(scan =0; scan <n-pass; scan ++){ 15. if(slot[scan ]>slot[scan +1]) 16. swap(slot,scan,scan+1); 17. }//end scan 18. }//end pass 19. cout<< " SORTED ITEMS: "<<endl; 20. for(i=0;i<n;i++) 21. cout<<slot[i]<<" "; 22. cout<<endl; 23. return 0; 24. }//MAIN
Figure 11.1a - Program to show an exchange sort
SORTED ITEMS: 1 2 3 4 5 6 7 8 9 10 Figure 11.1b - Output of Figure11.1a
EXCHANGE SORTS: SINK SORT, BUBBLE SORT, OR TURTLE SORTS
In an exchange sort, each time two items are compared the larger (heavy) item moves one level down to the bottom (to the right). At the end of each pass, the largest item will be placed at the bottom. In an exchange sort, the heaviest items are gradually sinking until sunk. This is known as a . Alternatively, in exchange sort, when the light items are gradually bubbling up and the lightest item is at the top, we have a sink sort. Due to the slow movement of items, gradually one level down or one level up, sink sort and bubble sort are also known as bubble sort. Note that often sink sort is referred to as bubble sort as well.
turtle sortsPROGRAM EXECUTION TIME
Every programming instruction (language construct) that is used in statements such as computation, assignment, comparison, or repetition has its own execution time due to its own hardware architecture. A program execution can be measured by the sum of the execution time of its instructions and elapsed time (the difference between start and ending time can be used to determine the execution time). However, an execution time of a program is related to how much data is used in the program. In other words, the size of the data is important in measuring the execution time. One way to measure program execution time is to look at the major expressions such as the number of comparisons or the number of exchanges in the program. The following code would be needed to compute the elapsed time for a bubble sort program.
#include HOW LONG DOES IT TAKE TO SORT
A sort program consists of at least two loops that compare two items. Based on the result of a comparison an exchange (swap) or shuffle (shift) will take place. The number of comparisons will differ depending on the size of the data and as a result the number of swaps or shuffles will be different. For example, if you are sorting 1000 items the number of comparisons will be higher than if you were sorting 10 items. Similarly the number of swaps or shuffles will be higher as the size of the data becomes larger. Therefore the execution time of a sort depends relatively on the order of size of the data - e.g. O(N). For example if the size of the data is 10 and there are two loops (nested), there are approximately 100 comparisons. Generally the number of comparisons for the size of will be n .
n^{2}CALCULATING NUMBER OF COMPARISON AND SWAP: BUBBLE SORT
How many comparisons and how many swaps are in a bubble sort? Roughly speaking we have n data, therefore n-1 comparisons, and for each comparison we might need an exchange (worst case). Therefore, for a data size of 10 items we need 9 comparisons and 9 exchanges (swaps). This is just for the first pass. How about the second pass? We need one less comparison and one less swap. For the second pass we need n-2, in third pass n-3, and finally n-9 which is 1. Add the numbers for each pass together we will have:
n-1 + n-2 + n-3 + n-4 + n-5 + n-6 + n-7 + n-8 + n-9It seems like the sum of the first n numbers already has a mathematical formula n(n+1)/2. For above example substituting the value in the formula we will have 9*(10)/2 which is 45 comparisons needed to sort a data of size 10. Similarly 45 swaps are needed in the worst case.
CALCULATING THE SUM OF THE FIRST N NUMBERS
Mathematicians have calculated the sum of the first numbers according to the following procedures: To find the sum of a series (9 to 1): Start with adding the reverse of the series which makes all numbers the same( e.g. 10). Count the resulting numbers. There are nn number of n-1. Finally, divide the result by 2 since an additional series was added at the beginning. By observing the below computation yourself, you can figure out why the formula is n* (n+1)/2 and sum of the first 9 number is 9*(10)/2.
9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + + + + + + + + + 1 2 3 4 5 6 7 8 9 || || || || || || || || || 10 10 10 10 10 10 10 10 10 WORST CASE, AVERAGE CASE AND BEST CASE OF AN ALGORITHM
To analyze an algorithm and to have a better picture of how an algorithm function works there are three situations that one may have to observe: Worst case, Average case and Best case. For example, for a sorting algorithm the worst case would be when the number of comparisons and swaps are the maximum and the best case would be when the number of comparisons and swaps becomes the minimum. The average case is the mixture of all special cases including worst and best cases, and in several algorithms finding the average case is not an easy task to figure out. In an example of bubble sort the worst case would be when data is sorted in reverse (descending) and we want to sort it in ascending order. However a bubble sort can be performed best if the data is already sorted. If there is no swap within a pass, adding a condition to the bubble sort will terminate any further passes. Other questions such as what happens if only one data is not sorted and is in last position or in the first position. These are the questions that we need to answer in an average case since we need to look all special situations. BIG-O NOTATION OR ORDER OF MAGNITUDE
One way to compare the algorithms without executing them or even writing them is to compare their order of magnitude (Big-O) function. A Big-O notation gives the rounded-off (upper bound) worst-case performance of an algorithm. An algorithm’s worst case is a situation that we need to watch out for. For example, the Big-O notation for the bubble sort is O( ).
n^{2}SWAP FUNCTION: DONE DIFFERENTLY
When two items are not in order, a swap is necessary to sort the items. For a swap, a temporary variable is needed to hold the value of the assigned variable before it is replaced. Note that, by default, pass of parameters in C++ is done by value. Any swap by value will not have impact upon calling the function. Therefore, the name of an array has to be passed with both indices. Pass the parameters by reference or pass the parameters by pointers as illustrated below: swap (int slot[ ], int i, int j) { int temp = slot [ i ]; slot [ i ] = slot[ j ]; slot [ j ] = temp; }In the above swap function the array and the index i and j of the array are passed.
swap (int &x, int &y) { int temp = x; x = y; y = temp; }For the above function to swap, the address of each element of an array is passed. In C++ this passing is known as pass by reference. In order to use a swap function we need temporary memory. The following function demonstrates the swap without wasting memory. The function call in the main program will be as follows: swap (slot[i], slot[i+1]);
Another way to swap is to pass pointers. swap (int *x, int *y) { int temp=*x; *x=*y; *y=temp; }The above swap function receives the address of each element of the array and stored in a pointer variable (this is done traditionally in C Language). The function call in main will be: swap (&slot[i], &slot[i+1]);
SWAP WITHOUT A TEMPORARY VARIABLE
If you wonder how to swap two variables without a third variable, so you can save memory, here we have two solutions: swap (int &x, int &y) { x = x + y; y = x – y; x = x – y; }The above swap will save memory, but it will take longer due to the computation. Another way to swap two items without a temporary variable is to use Exclusive-OR as show below. Exclusive-OR is faster than regular addition and subtraction. The program below uses better computation because it performs by using Exclusive-OR (bit-wise operator). swap (int &x, int &y) { x = x ^ y; y = x ^y; x = y ^ x; }For example, x = 5, y = 6. Before the swap the binary number of x is 101; binary number of y is 110. Exclusive-OR is same OR except, when there are two 1s the answer will be 0. x = x ^ y is 011 y = x ^y is 101 x = y ^ x is 110After swap x =110, y =101. Hint: do not confuse logical OR (true / false) with bit-wise OR (binary number). SELECTION SORT
On way to sort a series of data is to select one of the data and place it its proper position (sorted position) by swapping with other data. This kind of sort is known as since its foundation is based on the selection of an item each time. The family of selection sorts includes selection sort and straight selection.
quick sortSTRAIGHT SELECTION SORT
In order to sort a set of data according to the straight selection, the first data is elected and is assumed to have the smallest value of all the data. Next it goes through the rest of the data and compares that value to each remaining data. If the selected value is not the smallest it will be reset to reflect the smallest value found. At the end of the first pass the smallest found is swapped and placed at the first position. For example, the first position of the array is selected as the minimum location that holds the smallest value. A loop through of the rest of an array will reset the smallest position if it is necessary. This process continues for the entire pass.
STRAIGHT SELECTION SORT ALGORITHM: SIMPLIFIED
<0l> STRAIGHT SELECTION SORT: THE FIRST PASS
The first pass of straight selection is shown below. At the beginning number 8 is chosen as the smallest value then number 3 becomes the smallest and then number 1, which will stay until the end of the pass. At the end of the pass number 8 will be swapped with number 3. (8)(3)(1)(5)(7)(10)(4)(9)(6)(2) (8) (3) (1) (1) (1) (1) (1) (1) (1) (1) (8)The following illustrates the original data aftermath of the first pass in selection sort Original data (8)(3)(1)(5)(7)(10)(4)(9)(6)(2) First pass (1)(3)(8)(5)(7)(10)(4)(9)(6)(2) STRAIGHT SELECTION PROGRAM: FIRST PASS
In the programming of the first pass of the straight selection, the first location is set as the smallest location ( sml=0 ). A loop scans through the second element of the array and so on comparing each element of the array ( slot[scan] ) with the element in the smallest location. This will reset the sml to the new smallest value if comparison is true. The swap function will take place after the loop (first pass). Note that the location of the smallest value is used instead of the smallest value itself since the smallest location is used as the index of an array in the swap function.
sml=0; for (scan=1; scan |