What are some common data structures and their basic operations? How are common sorting algorithms implemented? What are the pros and cons of each algorithm? This article addresses these questions by introducing the basics of algorithms and then discussing common data structures and common sorting algorithms.
A data structure is a data organization, management, and storage format, which is used to efficiently access and modify data.
Data structures are the cornerstones of algorithms. If an algorithm is a dancer, the data structure is its stage.
Physical structures, such as arrays and linked lists, are visible and tangible, just like human flesh, blood, and bones.
Logical structures, such as queues, stacks, trees, and graphs, are invisible and intangible, just like human thoughts and spirits.
Big O notation (asymptotic time complexity): simplifies the execution time function T(n) of a program to its order of magnitude, such as n, n^2, or log n.
The following rules are used to derive time complexities:
Comparison of time complexities (from fastest to slowest growth): O(1) < O(log n) < O(n) < O(n log n) < O(n^2)
The following figure shows how the number of operations grows with the input size for each time complexity.
Constant space O(1): The storage space is fixed in size and is independent of the input size.
Linear space O(n): The allocated space is a linear collection, and the size of the collection is proportional to the input size n.
Two-dimensional space O(n^2): The allocated space is a two-dimensional array, and both the length and width of the array are proportional to the input size n.
Recursive space: Recursion is a special case. Although recursive code declares no variables or collections explicitly, the runtime allocates memory to store the method call stack. The memory required by recursion is proportional to the depth of the recursion: O(n) for a recursion of depth n, or O(log n) when each call halves the problem, as in a recursive binary search.
Stable: If a is located before b and a is equal to b, a is still located before b after sorting.
Unstable: If a is located before b and a is equal to b, a may be located after b after sorting.
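The difference can be seen on records that share a sort key. The sketch below is illustrative only: the record values and the `selection_sort` helper are made up for this example. It compares Python's built-in `sorted`, which is documented as stable, with a plain selection sort, which is not.

```python
def selection_sort(items, key):
    """Plain selection sort (hypothetical helper); long-distance swaps make it unstable."""
    a = list(items)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if key(a[j]) < key(a[smallest]):
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


# Records are (name, score) pairs; "a" and "b" have equal scores.
records = [("a", 2), ("b", 2), ("c", 1)]

stable_result = sorted(records, key=lambda r: r[1])
unstable_result = selection_sort(records, key=lambda r: r[1])

print(stable_result)    # "a" stays before "b"
print(unstable_result)  # "a" ends up after "b"
```

Both results are correctly ordered by score. Only the relative order of the two equal-score records differs.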
Specific algorithms are used to solve specific problems.
Among these algorithms, string, search, and sorting algorithms are the most basic ones.
An array is a finite, ordered collection of variables of the same type. Each variable in the array is called an element.
Basic operations on arrays are read O(1), update O(1), insert O(n), delete O(n), and expand O(n).
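A minimal sketch of where these costs come from, assuming a fixed-capacity backing store that doubles when full. The class name `DynamicArray` and the doubling policy are choices made for this example, not something the article prescribes.

```python
class DynamicArray:
    """Illustrative array wrapper; the name and the doubling policy are assumptions."""

    def __init__(self, capacity=4):
        self._data = [None] * capacity
        self._size = 0

    def _check(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)

    def get(self, index):            # read: O(1)
        self._check(index)
        return self._data[index]

    def set(self, index, value):     # update: O(1)
        self._check(index)
        self._data[index] = value

    def insert(self, index, value):  # insert: O(n), later elements shift right
        if not 0 <= index <= self._size:
            raise IndexError(index)
        if self._size == len(self._data):
            self._expand()
        for i in range(self._size, index, -1):
            self._data[i] = self._data[i - 1]
        self._data[index] = value
        self._size += 1

    def delete(self, index):         # delete: O(n), later elements shift left
        self._check(index)
        removed = self._data[index]
        for i in range(index, self._size - 1):
            self._data[i] = self._data[i + 1]
        self._data[self._size - 1] = None
        self._size -= 1
        return removed

    def _expand(self):               # expand: O(n), every element is copied
        new_data = [None] * (2 * len(self._data))
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data = new_data

    def to_list(self):
        return self._data[:self._size]


arr = DynamicArray(capacity=2)
arr.insert(0, 1)
arr.insert(1, 3)
arr.insert(1, 2)  # the array is full here, so it expands before shifting
```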
A linked list is a linear data structure, in which elements are stored at non-contiguous memory locations. It is a data structure consisting of nodes.
Each node in a single linked list contains the data and next fields. The data field stores data of the node, whereas the next field stores the address of the next node.
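Basic operations on linked lists are read O(n), update O(1), insert O(1), and delete O(1). The O(1) figures exclude the time needed to locate the node, which takes O(n) when only the head is known.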
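The following sketch of a singly linked list shows why reading by position is O(n) while inserting or deleting next to a known node is O(1). The class and method names are hypothetical and chosen only for this example.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data   # data field
        self.next = next   # reference to the next node


class SinglyLinkedList:
    """Illustrative singly linked list; method names are assumptions."""

    def __init__(self):
        self.head = None

    def push_front(self, data):          # O(1)
        self.head = Node(data, self.head)
        return self.head

    def get(self, index):                # O(n): walk from the head
        if index < 0:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError(index)
        return node

    def insert_after(self, node, data):  # O(1) once the node is known
        node.next = Node(data, node.next)
        return node.next

    def delete_after(self, node):        # O(1) once the node is known
        removed = node.next
        if removed is not None:
            node.next = removed.next
        return removed

    def to_list(self):
        values, node = [], self.head
        while node is not None:
            values.append(node.data)
            node = node.next
        return values


lst = SinglyLinkedList()
first = lst.push_front(1)
lst.insert_after(first, 3)
lst.insert_after(first, 2)  # the list is now 1 -> 2 -> 3
```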
Arrays are suitable for scenarios with more read operations and fewer insert and delete operations.
Linked lists are suitable for scenarios with more insert and delete operations and fewer read operations.
A stack is a linear logical data structure that follows the last in first out (LIFO) principle. The location where the earliest element is stored is called the stack bottom, and the location where the last element is stored is called the stack top.
A stack resembles a pipe with one end blocked and the other open. In contrast, a queue resembles a pipe with both ends open.
Array implementation
Linked list implementation
Basic operations on stacks are push O(1) and pop O(1).
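A minimal array-based sketch, assuming a Python list as the backing array, where `append` and `pop` at the end are amortized O(1). The class name `ArrayStack` is hypothetical. A linked-list version would push and pop at the head node instead.

```python
class ArrayStack:
    """Illustrative LIFO stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # The end of the list is the stack top.
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def __len__(self):
        return len(self._items)


stack = ArrayStack()
for value in (1, 2, 3):
    stack.push(value)
```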
A queue is a linear logical data structure that follows the first in first out (FIFO) principle. The exit of a queue is the head of the queue, and the entry of the queue is the tail of the queue.
Array implementation
Linked list implementation
Basic operations on queues are enqueue O(1) and dequeue O(1).
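With an array, both operations stay O(1) if the head and tail indices wrap around instead of shifting elements. The sketch below assumes a fixed capacity. The class name `CircularQueue` and the choice to raise an error when full are assumptions made for this example.

```python
class CircularQueue:
    """Illustrative fixed-capacity FIFO queue on a circular array."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0   # index of the next element to leave
        self._size = 0

    def enqueue(self, item):
        if self._size == len(self._data):
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % len(self._data)
        self._data[tail] = item
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("dequeue from empty queue")
        item = self._data[self._head]
        self._data[self._head] = None
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return item

    def __len__(self):
        return self._size


queue = CircularQueue(3)
for value in (1, 2, 3):
    queue.enqueue(value)
first_out = queue.dequeue()  # frees slot 0
queue.enqueue(4)             # the tail wraps around into slot 0
```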
A hash table is a logical data structure that can map keys to values.
Basic operations on hash tables are read O(1), write O(1), and expand O(n). The O(1) figures are averages and assume that hash collisions are rare.
A hash table is essentially an array, and an array can be accessed only by subscript, such as a[0], a[1], a[2], and a[3]. Most hash table keys, however, are strings.
A hash function converts a key of the string type or another type into an array subscript (index).
Assume that the length of an array is 8.
When the key is "001121", the index is computed as follows:
index = HashCode("001121") % Array.length = 7
When the key is "this", the index is computed as follows:
index = HashCode("this") % Array.length = 6
The subscripts obtained by a hash function for different keys may be the same. For example, the array subscripts corresponding to the 002936 and 002947 keys are both 2. This situation is called a hash collision.
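Open addressing (linear probing): on a collision, probe the following slots until an empty one is found. Java's ThreadLocal uses this approach.
Separate chaining (linked lists): each slot holds a linked list of all entries that map to it. Java's HashMap uses this approach.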
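The chaining approach can be sketched as follows. This is a simplified illustration, not how any particular library implements it: the class name, the 0.75 load factor, and the doubling policy are assumptions. It uses Python's built-in `hash`, so the computed indices will differ from the HashCode examples above.

```python
class ChainedHashTable:
    """Illustrative hash table using separate chaining."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key):
        # Same idea as HashCode(key) % Array.length.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # colliding keys share the bucket
        self._size += 1
        if self._size > 0.75 * len(self._buckets):  # assumed load factor
            self._resize()

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def _resize(self):
        # Expand: O(n), because every entry is rehashed into the new array.
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:
            for key, value in bucket:
                self._buckets[self._index(key)].append((key, value))


table = ChainedHashTable(capacity=8)
table.put(2, "a")
table.put(10, "b")  # 2 % 8 == 10 % 8, so the two keys collide
```

Small integers hash to themselves in Python, which makes the collision between 2 and 10 easy to reproduce.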
A tree is a finite set of n (n ≥ 0) nodes.
When n is 0, the tree is an empty tree. Any tree with at least one node has the following features:
(1) Depth-first search (DFS)
Pre-order traversal: root node, left subtree, and right subtree
In-order traversal: left subtree, root node, and right subtree
Post-order traversal: left subtree, right subtree, and root node
Implementation: recursion or stacks
(2) Breadth-first search (BFS)
Level order traversal: traversal by level
Implementation: queues
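The four traversal orders can be sketched on a small binary tree. The `TreeNode` class and the function names are hypothetical. The depth-first traversals use recursion, and the level order traversal uses a queue.

```python
from collections import deque


class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def pre_order(node):
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node):
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


def level_order(root):
    result = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        result.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result


# Example tree:
#       1
#      / \
#     2   3
#    / \
#   4   5
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(pre_order(root))    # [1, 2, 4, 5, 3]
print(in_order(root))     # [4, 2, 5, 1, 3]
print(post_order(root))   # [4, 5, 2, 3, 1]
print(level_order(root))  # [1, 2, 3, 4, 5]
```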
A binary tree is a special tree in which each node has at most two child nodes, that is, 0, 1, or 2 child nodes.
Each non-leaf node of a full binary tree contains two child nodes, and all leaf nodes are on the same level.
In a binary tree, all its n nodes are numbered from 1 to n by level. If all the n nodes are in the same positions as the nodes in a full binary tree of the same depth, this tree is a complete binary tree.
A binary search tree (BST) is a binary tree that meets the following conditions:
A binary heap is a special complete binary tree that is divided into two types: maximum heaps and minimum heaps.
(1) Insert: Insert the new node at the end of the binary heap. Then, the node rises until the heap property is restored.
(2) Delete: Delete the head node of the binary heap and move the tail node to the head. Then, the node sinks until the heap property is restored.
(3) Construct: Start from an unordered complete binary tree. Then, all non-leaf nodes sink one by one, starting from the last non-leaf node.
Arrays
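Because a binary heap is a complete binary tree, it fits in an array without pointers: for a node at index i (counting from 0), the children are at 2i + 1 and 2i + 2 and the parent is at (i - 1) // 2. The sketch below applies this to a minimum heap. The function names are hypothetical.

```python
def sift_up(heap, i):
    """Rise: swap the node with its parent while it is smaller than the parent."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] >= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


def sift_down(heap, i, size):
    """Sink: swap the node with its smaller child while it is larger than that child."""
    while True:
        child = 2 * i + 1
        if child >= size:
            break
        if child + 1 < size and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def heap_insert(heap, value):
    heap.append(value)             # insert at the end
    sift_up(heap, len(heap) - 1)   # then rise


def heap_pop(heap):
    top = heap[0]                  # the head node is the minimum
    last = heap.pop()
    if heap:
        heap[0] = last             # move the tail node to the head
        sift_down(heap, 0, len(heap))
    return top


def build_heap(items):
    heap = list(items)
    # Sink every non-leaf node, starting from the last one.
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i, len(heap))
    return heap


heap = build_heap([7, 1, 3, 10, 5, 2, 8, 9, 6])
heap_insert(heap, 0)
```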
Bubble sort is a simple sorting algorithm. It repeatedly steps through the list, compares two adjacent elements at a time, and swaps them if they are in the wrong order. The pass through the list is repeated until the list is fully sorted. The algorithm, which is a comparison sort, is named for the way smaller elements "bubble" to the top of the list.
It is applicable to scenarios with a small amount of data that is already mostly ordered.
1) Bubble sort continues after the list is sorted
2) The list is partially sorted, but all its elements are traversed in the next round
3) All elements must be sorted even if only one of them is out of order
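One possible way to address these three cases is sketched below. The function names are hypothetical. `bubble_sort` stops as soon as a pass makes no swaps and records the position of the last swap so that the sorted tail is skipped in the next pass. `cocktail_sort` alternates the direction of each pass, which helps when a single small element sits at the far end.

```python
def bubble_sort(items):
    a = list(items)
    boundary = len(a) - 1       # elements after this index are already in place
    while boundary > 0:
        last_swap = 0
        for i in range(boundary):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                last_swap = i
        boundary = last_swap    # no swap in this pass means boundary 0, so the loop ends
    return a


def cocktail_sort(items):
    a = list(items)
    left, right = 0, len(a) - 1
    while left < right:
        swapped = False
        for i in range(left, right):       # left to right: the largest moves to the end
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        right -= 1
        if not swapped:
            break
        swapped = False
        for i in range(right, left, -1):   # right to left: the smallest moves to the front
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                swapped = True
        left += 1
        if not swapped:
            break
    return a


print(bubble_sort([5, 8, 6, 3, 9, 2, 1, 7]))
print(cocktail_sort([2, 3, 4, 5, 1]))
```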
Merge sort is an efficient sorting algorithm based on the merge operation and a typical divide-and-conquer algorithm. It recursively splits the list into two sublists, sorts each sublist, and then merges the two sorted sublists while preserving element order to produce an ordered list.
Image source: https://www.cnblogs.com/chengxiao/p/6194356.html
Advantages:
Disadvantages:
It is applicable to scenarios where the data volume is large and stable sorting is required.
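A minimal top-down sketch of merge sort follows. The function names and the optional `key` parameter are choices made for this example. The merged result is built in a new list, which is where the extra O(n) space comes from, and taking from the left half on ties is what keeps the sort stable.

```python
def merge_sort(items, key=lambda x: x):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)    # divide
    right = merge_sort(items[mid:], key)
    return merge(left, right, key)         # conquer


def merge(left, right, key):
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which preserves stability.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([8, 4, 5, 7, 1, 3, 6, 2]))
```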
The quicksort algorithm uses the divide-and-conquer strategy: it selects a pivot element and splits the list into a sublist of elements smaller than the pivot and a sublist of elements larger than the pivot. Then, it sorts the two sublists recursively until the entire list is sorted.
Advantages:
Disadvantages:
It is applicable to scenarios where the data volume is large and sorting can be unstable.
1) The maximum or minimum element is selected as the pivot each time
2) The list contains a large amount of repeated data
3) Quicksort performance is optimized
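The sketch below shows one possible combination of remedies, offered as an illustration rather than a definitive tuning. A random pivot makes the worst case unlikely on sorted input, three-way partitioning groups all elements equal to the pivot in one pass, and recursing only into the smaller side keeps the call stack shallow. The function names are hypothetical.

```python
import random


def quicksort(items):
    a = list(items)
    _quicksort(a, 0, len(a) - 1)
    return a


def _quicksort(a, lo, hi):
    while lo < hi:
        pivot = a[random.randint(lo, hi)]   # random pivot
        # Three-way partition: a[lo:lt] < pivot, a[lt:gt + 1] == pivot, a[gt + 1:hi + 1] > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # Recurse into the smaller side and loop on the larger side.
        if lt - lo < hi - gt:
            _quicksort(a, lo, lt - 1)
            lo = gt + 1
        else:
            _quicksort(a, gt + 1, hi)
            hi = lt - 1


print(quicksort([4, 7, 6, 5, 3, 2, 8, 1]))
```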
Heapsort is a sorting algorithm designed based on heaps. A heap is a data structure that approximates a complete binary tree and satisfies the heap property: the key of each child node is always less than or equal to (in a maximum heap) or greater than or equal to (in a minimum heap) that of its parent node.
Advantages:
Disadvantages:
It is applicable to scenarios where a large amount of data is input in streaming mode.
In heapsort, after the maximum heap is built, the top element is swapped with the last element in the heap, and the new top element is then sunk to its appropriate position. Because the new top element comes from the bottom of the heap, it is usually small and sinks almost all the way back down, so many of the comparisons made along the way do little useful work. Therefore, although heapsort and quicksort (on average) both have a time complexity of O(n log n), heapsort has a larger constant factor.
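The process described above can be sketched as follows. The function names are hypothetical. The heap is built on a copy of the input, and the sorted region grows from the end of the array.

```python
def heapsort(items):
    a = list(items)
    n = len(a)
    # Build a maximum heap by sinking every non-leaf node.
    for i in range(n // 2 - 1, -1, -1):
        _sink(a, i, n)
    # Repeatedly swap the top with the last heap element, then sink the new top.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        _sink(a, 0, end)       # the heap now covers a[0:end]
    return a


def _sink(a, i, size):
    while True:
        child = 2 * i + 1
        if child >= size:
            return
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1         # pick the larger child
        if a[i] >= a[child]:
            return
        a[i], a[child] = a[child], a[i]
        i = child


print(heapsort([1, 3, 2, 6, 5, 7, 8, 9, 10, 0]))
```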
Counting sort is not a comparison-based sorting algorithm. Instead, it uses the input values as indexes into an extra count array. As a sorting algorithm with linear time complexity, counting sort requires that the input data be integers within a known range.
Advantages:
Disadvantages:
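The value of each element must be an integer. This algorithm is applicable only when k in the time complexity (the size of the value range) is small and the element values are concentrated in a narrow range.
If the values do not start from 0, a count array indexed from 0 wastes space. Size the count array by the range between the minimum and maximum values instead, and offset each value by the minimum.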
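A minimal sketch of counting sort with this offset, assuming integer input. The function name is hypothetical. The count array covers only the range from the minimum to the maximum value, and the backward pass over the input keeps equal values in their original order.

```python
def counting_sort(items):
    if not items:
        return []
    lo, hi = min(items), max(items)
    counts = [0] * (hi - lo + 1)    # k = hi - lo + 1 slots instead of hi + 1
    for x in items:
        counts[x - lo] += 1         # offset each value by the minimum
    # Prefix sums: counts[v] becomes the number of elements <= v + lo.
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]
    result = [None] * len(items)
    for x in reversed(items):       # backward pass keeps the sort stable
        counts[x - lo] -= 1
        result[counts[x - lo]] = x
    return result


print(counting_sort([95, 94, 91, 98, 99, 90, 99, 93, 91, 92]))
```

For the sample input, the count array has 10 slots rather than 100.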
Bucket sort is a generalization of counting sort. Its efficiency depends on the mapping function that assigns elements to buckets. Implementation: Assume that the input data is evenly distributed. Distribute the data into a limited number of buckets, and then sort the data in each bucket, either by applying bucket sort recursively or by using another sorting algorithm.
Advantages:
Disadvantages:
It is applicable to scenarios where data is evenly distributed.
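A minimal sketch of bucket sort, assuming numeric input. The function name, the default of one bucket per element, and the use of Python's built-in `sorted` inside each bucket are choices made for this example. The mapping function is a simple linear scaling of each value into a bucket index.

```python
def bucket_sort(items, bucket_count=None):
    if not items:
        return []
    lo, hi = min(items), max(items)
    if lo == hi:
        return list(items)          # all values are equal, nothing to distribute
    n = bucket_count or len(items)
    width = (hi - lo) / n           # range of values covered by each bucket
    buckets = [[] for _ in range(n)]
    for x in items:
        # Mapping function; the maximum value is clamped into the last bucket.
        index = min(int((x - lo) / width), n - 1)
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))   # sort each bucket with another algorithm
    return result


print(bucket_sort([4.5, 0.84, 3.25, 2.18, 0.5]))
```

With evenly distributed data, each bucket receives only a few elements, so the per-bucket sorting stays cheap. With skewed data, most elements land in a few buckets and the advantage is lost.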
Generate a random list of N numbers in the range from 0 to K. Use various algorithms for sorting and record the time required for each sorting.