search and sort
Search and SortSearching and sorting data is one of the most common applications of programming.
This is most commonly used with arrays and arraylists.
Searching is when a program looks for the location or index of a specific element or value. This assumes the data structure is already sorted in ascending or descending order.
Sequential Search
Sequential search iterates through every single element of the data and compares each element to the desired element until the desired one is found, or it has looked through every element in the data. If the element does not exist in the data, -1 is returned.
This algorithm does not take advantage of the sorted array and is inefficient, since it may need to search every single element in the data to find out if an element is present. However, it doesn't need the data to be sorted.
The best case occurs when the desired element is the first value in the array or arraylist. The worst case occurs when the desired element is either not present or at the very end or the array or arraylist, as every single element in the data must be iterated through.
The time complexity of sequential search is O(n).
Implementation example of sequential search:
public int seqSearch(E[] arr, E element) {
for (int i = 0; i < arr.length; i ++) { // loops through arr to find E element
if (arr[i].equals(element)) {
return i; // if the element is found, its position is returned
}
}
return -1; // if the element is not in the data, return -1
}
Binary Search
Binary search is a much more efficient algorithm than sequential search. It starts by comparing the middle element of the data to the desired element and then halves the remaining amount of data needed to be searched through each time until the element is found or no data remains to be searched.
Since the data is sorted, the algorithm can efficiently reduce the range each time by getting rid of the higher/lower half of the remaining data, hence the name binary search.
This algorithm has time complexity O(log n).
There are two ways to implement this algorithm, as it can be done with iteration using a while loop or with recursion.
Iterative example:
public int binarySearch(int [] arr, int element) { // finding element in integer array arr
int x = 0, y = arr.length-1;
while (x <= y) { // while there is still data to be searched
int m = x + (y - x) / 2;
if (arr[m] == x) return m; // return the index if the element is found
if (array[m] < x) x = m - 1; // halve range accordingly
else y = m + 1;
}
return -1; // return -1 if element was not in the data
}
Recursive Example:
public int binarySearch(int arr[], int x, int a, int b) {
if (b >= a) { // if there is still data to be searched
int m = a + (b - a) / 2;
if (arr[m] == x) return m; // base case: return the index if the element is found
if (arr[m] > x) return binarySearch(array, x, a, m - 1); // halve range accordingly
else return binarySearch(arr, x, m + 1, y);
}
return -1; // return -1 if element was not in the data
}
If you are not familiar with recursion, check out my tutorial.
Sorting is when a program organizes comparable elements of the same type in ascending or descending order.
This allows for more efficient searching, as previously mentioned. There are recursive and non-recursive sorting algorithms of varying efficiencies.
Bubble Sort
Bubble Sort is a simple sorting algorithm that repeatedly iterates through data by comparing each pair of adjacent elements, and swapping them if they are in the wrong order. The algorithm gets its name from the way the elements "bubble" to the top of the list with each full pass. For each full pass, the range of data which needs to be passed through decreases by 1, until there is a pass where no swaps are made and the data is fully sorted.
The best-case scenario for this sorting algorithm would be to already have a sorted dataset, and the worst case scenario for the algorithm would be to have a dataset with its data in reverse order.
This algorithm has time complexity O(n²), since in the worst case (when the data is in reverse order), n(n-1) * 0.5 swaps must be made. Since the algorithm is in-place (it does not create any new space for sorting purposes), it has a space complexity of O(1).
This algorithm is implemented iteratively:
int[] bubbleSort (int[] arr){
int swaps = -1; // swap counter variable
while (swaps != 0){ // condition for unsorted array
swaps = 0;
for (int i = 0; i < arr.length - 1; i++){ // decreases the range which needs to be searched
for (int j = 0; j < arr.length - i - 1; j++){
if (arr[j] > arr[j + 1]){ // swap elements if needed
int temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
swaps++;
}
}
}
}
return arr; // returns the sorted array
}
Selection Sort
Selection Sort is a sorting algorithm that sorts an array by repeatedly finding the maximum element from the unsorted part of the array and swapping it with the last element of the unsorted part of the array. The range of the unsorted part decreases by 1 each time until the entire array is sorted.
This algorithm does not have a best or worst case scenario. It has time complexity O(n²) since n(n-1) * 0.5 swaps must be made. Since the algorithm is in-place (it does not create any new space for sorting purposes), it has a space complexity of O(1).
This algorithm is implemented iteratively:
int[] selectionSort (int[] arr){
for(int i = arr.length; i > 1; i--){ // decreases range of unsorted part
int max = 0; // variable for index of element with the maximum value within the range
for(int x = 1; x < i; x++){
if(arr[x] > arr[max]) max = x; // adjust max if needed
}
int temp = arr[i-1]; // swaps the max element and the i-1th element
arr[i-1] = arr[max];
arr[max] = temp;
}
return arr; // returns the sorted array
}
Interested in more sorting algorithms? Check out my project on a non-comparative sorting algorithm.
Ideally, programming algorithms should be as efficient as possible due to limits in storage, memory, and time.
To compare efficiencies, big O notation is used. The amount of time or space an algorithm takes is called its time or space complexity.
Big O notation utilizes the omega (Ω) symbol to represent the lowest or best case complexity for an algorithm, and capital O (O) to represent the worst or highest complexity for an algorithm.
The notation is in the syntax O(f(n)) where f is any function and n increases without bound. f(n) represents the space or time the algorithm takes.