Sorting Algorithms
Working with Data Structures and manipulating data.
import random
numbers = []
for i in range(10):
numbers.append(random.randint(0,100))
print("Random List")
print(numbers)
Explore
Bubble Sort What is happening with this sort?
- It repeatedly compares adjacent elements in a list and swaps them if they are in the wrong order.
- It starts by comparing the first two elements and swapping them if necessary.
- Then it compares the second and third elements, the third and fourth elements, and so on, until reaching the end of the list.
- Advantages
- time complexity of O(n logn) meaning it can sort though large arrays pretty quickly
- stable search
- Process
- it is a recursive algorithm that continuously splits an array in half until it cannot be divided any further
- merge sort is done each time the array is halved and then when all halves have been sorted, the merge operation is applied.
- merge operation takes smaller sorted arrays and combines them into a larger one
- starts with a list and finds the smallest element, and swaps it with the 1st element
- continues to do this until it gets to the largest element
- go from one number to the next and swap it if the number is smaller
- it goes from item to item in the list and checks that the number is in the right spot
Practice
[75, 17, 46, 80, 67, 45, 69, 79, 40, 0]
How would you sort this list with...
- Bubble Sort
- it compares the first and second element, then the second and third, then the third and fourth until it reaches the end of the list. While it compares the numbers, it will swap them if one number is smaller than the other.
- Selection Sort
- it will iterate through the list and find the smallest number which it will then swap with the first number in the list. It will then find the next smallest number and put it in the second spot in the list. It will keep doing this till it reaches the largest element.
[88, 39, 53, 39, 58, 43, 74, 81, 71, 51]
How would you sort this list with...
- Merge Sort
- you would halve the list again and again until the list cannot be sorted anymore. Then each time it is halved, the merge sort is done on each half so that when it can no longer be halved, everything is sorted. It then merges the halved lists back together into one list.
- Insertion Sort
- starts from the first number in the list and compares it to the next number. if the number is smaller, they swap. It keeps going until it goes through the entire list. If a number is not in the right place, the algorithm will check the list and place it in the right spot.
import nltk
import random
nltk.download('words') # Download the word list (only required once)
from nltk.corpus import words
english_words = words.words()
#print(len(english_words)) # Prints the number of words in the list
# You can now use the 'english_words' list in your code
word_list = []
for i in range(10):
word_list.append(english_words[random.randint(0,len(english_words))])
print("Random List")
print(word_list)
def bubble_sort(word_list):
n = len(word_list)
for i in range(n-1):
for j in range(n-i-1):
if word_list[j] > word_list[j+1]:
word_list[j], word_list[j+1] = word_list[j+1], word_list[j]
bubble_sort(word_list)
print("Sorted words:", word_list)
Discuss
Answer the following with your group.
- When should you use each algorithm? What makes an algorithm the right choice?
- time and space complexity
-
Given the following lists...
- [0, 2, 6, 4, 8, 10]
- [Elephant, Banana, Cat, Dog, Apple]
- [29, 13, 83, 47, 32, 78, 100, 60, 65, 15, 24, 9, 40, 68, 53, 8, 90, 58, 39, 32, 34, 91, 74, 94, 49, 87, 34, 87, 23, 17, 27, 2, 38, 58, 84, 15, 9, 46, 74, 40, 44, 8, 55, 28, 81, 92, 81, 88, 53, 38, 19, 21, 9, 54, 21, 67, 3, 41, 3, 74, 13, 71, 70, 45, 5, 36, 80, 64, 97, 86, 73, 74, 94, 79, 49, 32, 20, 68, 64, 69, 1, 77, 31, 56, 100, 80, 48, 75, 85, 93, 67, 57, 26, 56, 43, 53, 59, 28, 67, 50]
-
Select the algorithm you believe is best for each, explain.
- for the first one, you can use insertion which could swap the 6 and 4. Once it reaches the end of the list, the algorithm is finished so it is the most efficient
- for the second list, selection is the best sorting algorithm because it looks for the smallest item in the list which would be apple since it is alphabetically at a minimum. It would then swap it with elephant and so on
- for the third list, the best sorting algorithm would be merge because merge works the best on larger data sets. It can keep halving the data set, sorting, and finally merge it back into one list.
HACKS
Provided below is a Bubble Sort Algorithm sorting a list of dictionaries based off of selected key.
-
Now it's time to do some coding...
-
Run code and then research and answer these questions...
- Is a list and/or dictionary in python considered a primitive or collection type? Why?
- In Python, both lists and dictionaries are considered collection types
- A primitive type refers to data types such as integers, floats, booleans, and characters. These types hold a single value and are not composed of smaller elements. On the other hand, collection types are data structures that can hold multiple values or objects.
- Lists in Python are ordered collections of elements, where each element can be of any data type. Lists are considered collection types because they can store multiple values in a single variable.
- Is the list passed into bubble sort "pass-by-value" or "pass-by-reference? Describe why in relation to output.
- In Python, the list is passed by reference when passed into a function like bubble sort
- This means that any changes made to the list inside the function will affect the original list outside the function. The reason for this behavior is that when a list is passed to a function, its 'reference' is passed, rather than creating a new copy of the entire list.
- In the case of bubble sort, as elements are swapped within the function, the original list is modified. The modifications are reflected in the output because the function operates directly on the referenced list
- Is a list and/or dictionary in python considered a primitive or collection type? Why?
-
Implement new cell(s) and/or organize cells to do the following.
- Create your own list and sorting algorithm
- Test your list with my bubble sort
- Test my list with your new sort
- Research analysis on sorting:comparisons, swaps, time. Build this into your hacks. - Find a better way to print the data, key first, then other elements in viewable form.
Use the code below to help guide your adventure
"""
* Creator: Nighthawk Coding Society
Bubble Sort of a List with optimizations
"""
# bubble sorts a list of dictionaries, base off of provided key
def bubbleSort(list, key):
n = len(list) - 1 # list are indexed 0 to n-1, len is n
# Traverse through list with i index
for i in range(n):
swapped = False # optimize code, so it exits if now swaps on inner loop
# Inner traversal using j index
for j in range(n-i): # n-i as positions on right are in order in bubble
# Swap if the element KeyN is greater KeyN1
keyN = list[j].get(key)
keyN1 = list[j+1].get(key)
if keyN > keyN1:
swapped = True
list[j], list[j + 1] = list[j + 1], list[j] # single line swap
if not swapped: # if no swaps on inner pass, list is sorted
return # exit function
if __name__ == "__main__":``
# list/dictionary sample
list_of_people = [
{"name": "Risa", "age": 18, "city": "New York"},
{"name": "John", "age": 63, "city": "Eugene"},
{"name": "Shekar", "age": 18, "city": "San Francisco"},
{"name": "Ryan", "age": 21, "city": "Los Angeles"}
]
# assuming uniform keys, pick 1st row as source of keys
key_row = list_of_people[0]
# print list as defined
print("Original")
print(list_of_people)
for key in key_row: # finds each key in the row
print(key)
bubbleSort(list_of_people, key) # sort list of people
print(list_of_people)
Creating my own list and sorting algorithm (merge)
def mergeSort(arr, sortkey):
if len(arr) > 1: # makes sure list isn't alr sorted
left_arr = arr[:len(arr)//2] #starts at beginning and goes to mid
right_arr = arr[len(arr)//2:] #starts at middle and goes to end
# recursion for merge sort
mergeSort(left_arr, sortkey)
mergeSort(right_arr, sortkey)
# merge step
i = 0 # keep track of leftmost element in left arr
j = 0 # keep track of leftmost element in right arr
k = 0 # merged array index
while i < len(left_arr) and j < len(right_arr): # keep going until one array is at the end of the index
if compareArrItem(left_arr[i], right_arr[j], sortkey):
arr[k] = left_arr[i]
i += 1
else:
arr[k] = right_arr[j]
j += 1
k += 1 # add 1 to k each time so that next item is added to next index of arr
#checking for leftovers
while i < len(left_arr):
arr[k] = left_arr[i]
i += 1
k += 1
while j < len(right_arr):
arr[k] = right_arr[j]
j += 1
k += 1
def compareArrItem(item1, item2, sortkey): #return true if item1 is less than item2
return item1[sortkey] < item2[sortkey]
# my list is a group of users and their enjoyment rating out of 10 of DVASS(our group project) which includes uno, war, black jack, and the memory game as well as how many hours they spent playing the games
dvass_survey = [
{"name": "Nathan", "rating": 8, "hours": 7},
{"name": "Max", "rating": 10, "hours": 80},
{"name": "Kalani", "rating": 7, "hours": 8},
{"name": "Evan", "rating": 8, "hours": 5},
{"name": "Jaden", "rating": 10, "hours": 11},
{"name": "Ryan", "rating": 50, "hours": 50},
{"name": "Aniket", "rating": 100, "hours": 67}
]
mergeSort(dvass_survey, 'rating')
print(dvass_survey)
mergeSort(dvass_survey, 'hours')
print(dvass_survey)
Test given list with my new sort
mergeSort(list_of_people, 'age')
print(list_of_people)
Research analysis on sorting: comparisons, swaps, time. Build this into your hacks. Bubble Sort:
- Comparisons: Bubble Sort makes a maximum of n*(n-1)/2 comparisons in the worst case, where n is the number of elements.
- Swaps: Bubble Sort makes a maximum of n*(n-1)/2 swaps in the worst case.
- Time Complexity: Bubble Sort has an average and worst-case time complexity of O(n^2).
Insertion Sort:
- Comparisons: Insertion Sort makes a maximum of n*(n-1)/2 comparisons in the worst case.
- Swaps: Insertion Sort makes a maximum of n*(n-1)/2 swaps in the worst case.
- Time Complexity: Insertion Sort has an average and worst-case time complexity of O(n^2). However, it performs well on partially sorted or small lists.
Selection Sort:
- Comparisons: Selection Sort makes a maximum of n*(n-1)/2 comparisons in the worst case.
- Swaps: Selection Sort makes a maximum of n swaps in the worst case.
- Time Complexity: Selection Sort has an average and worst-case time complexity of O(n^2).
Merge Sort:
- Comparisons: Merge Sort makes a maximum of n*log(n) comparisons in the worst case.
- Swaps: Merge Sort requires additional memory for merging and does not involve element swaps.
- Time Complexity: Merge Sort has an average and worst-case time complexity of O(n log n).
Find a better way to print the data, key first, then other elements in viewable form.
mergeSort(dvass_survey, 'rating')
def mergePrint(list, key):
for i, user in enumerate(list):
print("#" + str(i + 1) + key)
for k, v in user.items():
print("\t" + str(k) + ":", str(v))
print("")
mergePrint(dvass_survey, 'rating')