Algoritmo de clasificación de montón

En este tutorial, aprenderá cómo funciona el algoritmo de ordenación del montón. Además, encontrará ejemplos prácticos de ordenación de pila en C, C ++, Java y Python.

Heap Sort es un algoritmo de clasificación popular y eficiente en programación de computadoras. Aprender a escribir el algoritmo de ordenación del montón requiere el conocimiento de dos tipos de estructuras de datos: matrices y árboles.

El conjunto inicial de números que queremos ordenar se almacena en una matriz, por ejemplo, (10, 3, 76, 34, 23, 32)y después de ordenar, obtenemos una matriz ordenada.(3,10,23,32,34,76)

La clasificación de montón funciona visualizando los elementos de la matriz como un tipo especial de árbol binario completo llamado montón.

Como requisito previo, debe conocer un árbol binario completo y una estructura de datos de pila.

Relación entre índices de matriz y elementos de árbol

Un árbol binario completo tiene una propiedad interesante que podemos usar para encontrar los hijos y los padres de cualquier nodo.

Si el índice de cualquier elemento en la matriz es i, el elemento en el índice 2i+1se convertirá en el hijo izquierdo y el elemento en el 2i+2índice se convertirá en el hijo derecho. Además, el padre de cualquier elemento en el índice i viene dado por el límite inferior de (i-1)/2.

Relación entre índices de matriz y montón

Vamos a probarlo

 Hijo izquierdo de 1 (índice 0) = elemento en (2 * 0 + 1) índice = elemento en 1 índice = 12 Hijo derecho de 1 = elemento en (2 * 0 + 2) índice = elemento en 2 índice = 9 De manera similar, Hijo izquierdo de 12 (índice 1) = elemento en (2 * 1 + 1) índice = elemento en 3 índice = 5 Hijo derecho de 12 = elemento en (2 * 1 + 2) índice = elemento en 4 índice = 6

Confirmemos también que las reglas son válidas para encontrar el padre de cualquier nodo

 Padre de 9 (posición 2) = (2-1) / 2 = ½ = 0.5 ~ 0 índice = 1 Padre de 12 (posición 1) = (1-1) / 2 = 0 índice = 1

Comprender esta asignación de índices de matriz a posiciones de árbol es fundamental para comprender cómo funciona la estructura de datos de montón y cómo se utiliza para implementar la clasificación de montón.

¿Qué es la estructura de datos del montón?

Heap es una estructura de datos especial basada en árboles. Se dice que un árbol binario sigue una estructura de datos de montón si

es un árbol binario completo
Todos los nodos del árbol siguen la propiedad de que son mayores que sus hijos, es decir, el elemento más grande está en la raíz y sus hijos y más pequeño que la raíz, etc. Este montón se llama max-heap. Si, en cambio, todos los nodos son más pequeños que sus hijos, se llama min-heap

El siguiente diagrama de ejemplo muestra Max-Heap y Min-Heap.

Max Heap y Min Heap

Para obtener más información al respecto, visite Estructura de datos de montón.

Cómo "apilar" un árbol

A partir de un árbol binario completo, podemos modificarlo para que se convierta en un Max-Heap ejecutando una función llamada heapify en todos los elementos que no son hojas del montón.

Dado que heapify usa la recursividad, puede ser difícil de comprender. Primero pensemos en cómo amontonarías un árbol con solo tres elementos.

 heapify(array) Root = array(0) Largest = largest( array(0) , array (2*0 + 1). array(2*0+2)) if(Root != Largest) Swap(Root, Largest)

Heapify casos base

El ejemplo anterior muestra dos escenarios: uno en el que la raíz es el elemento más grande y no necesitamos hacer nada. Y otro en el que la raíz tenía un elemento más grande cuando era niño y necesitábamos intercambiar para mantener la propiedad max-heap.

Si ha trabajado con algoritmos recursivos antes, probablemente haya identificado que este debe ser el caso base.

Ahora pensemos en otro escenario en el que hay más de un nivel.

Cómo agrupar el elemento raíz cuando sus subárboles ya son montones máximos

El elemento superior no es un montón máximo, pero todos los subárboles son montones máximos.

Para mantener la propiedad max-heap para todo el árbol, tendremos que seguir presionando 2 hacia abajo hasta que alcance su posición correcta.

Cómo acumular el elemento raíz cuando sus subárboles son max-montones

Thus, to maintain the max-heap property in a tree where both sub-trees are max-heaps, we need to run heapify on the root element repeatedly until it is larger than its children or it becomes a leaf node.

We can combine both these conditions in one heapify function as

 void heapify(int arr(), int n, int i) ( // Find largest among root, left child and right child int largest = i; int left = 2 * i + 1; int right = 2 * i + 2; if (left arr(largest)) largest = left; if (right arr(largest)) largest = right; // Swap and continue heapifying if root is not largest if (largest != i) ( swap(&arr(i), &arr(largest)); heapify(arr, n, largest); ) )

This function works for both the base case and for a tree of any size. We can thus move the root element to the correct position to maintain the max-heap status for any tree size as long as the sub-trees are max-heaps.

Build max-heap

To build a max-heap from any tree, we can thus start heapifying each sub-tree from the bottom up and end up with a max-heap after the function is applied to all the elements including the root element.

En el caso de un árbol completo, el primer índice de un nodo no hoja viene dado por n/2 - 1. Todos los demás nodos después de eso son nodos hoja y, por lo tanto, no necesitan ser apilados.

Entonces, podemos construir un montón máximo como

  // Build heap (rearrange array) for (int i = n / 2 - 1; i>= 0; i--) heapify(arr, n, i);

Cree una matriz y calcule i

Pasos para construir el montón máximo para ordenar el montón

Como se muestra en el diagrama anterior, comenzamos por apilar los árboles más pequeños y más bajos y gradualmente subimos hasta llegar al elemento raíz.

Si ha entendido todo hasta aquí, felicidades, está en camino de dominar el tipo de pila.

¿Cómo funciona la ordenación por pila?

Dado que el árbol satisface la propiedad Max-Heap, el elemento más grande se almacena en el nodo raíz.
Intercambiar: Elimina el elemento raíz y ponlo al final de la matriz (enésima posición) Pon el último elemento del árbol (montón) en el lugar vacante.
Eliminar: reduce el tamaño del montón en 1.
Heapify: Heapify el elemento raíz nuevamente para que tengamos el elemento más alto en la raíz.
El proceso se repite hasta que se ordenan todos los elementos de la lista.

Intercambiar, eliminar y agrupar

El siguiente código muestra la operación.

  // Heap sort for (int i = n - 1; i>= 0; i--) ( swap(&arr(0), &arr(i)); // Heapify root element to get highest element at root again heapify(arr, i, 0); )

Ejemplos de Python, Java y C / C ++

Python Java C C ++

 # Heap Sort in python def heapify(arr, n, i): # Find largest among root and children largest = i l = 2 * i + 1 r = 2 * i + 2 if l < n and arr(i) < arr(l): largest = l if r < n and arr(largest) < arr(r): largest = r # If root is not largest, swap with largest and continue heapifying if largest != i: arr(i), arr(largest) = arr(largest), arr(i) heapify(arr, n, largest) def heapSort(arr): n = len(arr) # Build max heap for i in range(n//2, -1, -1): heapify(arr, n, i) for i in range(n-1, 0, -1): # Swap arr(i), arr(0) = arr(0), arr(i) # Heapify root element heapify(arr, i, 0) arr = (1, 12, 9, 5, 6, 10) heapSort(arr) n = len(arr) print("Sorted array is") for i in range(n): print("%d " % arr(i), end='')

 // Heap Sort in Java public class HeapSort ( public void sort(int arr()) ( int n = arr.length; // Build max heap for (int i = n / 2 - 1; i>= 0; i--) ( heapify(arr, n, i); ) // Heap sort for (int i = n - 1; i>= 0; i--) ( int temp = arr(0); arr(0) = arr(i); arr(i) = temp; // Heapify root element heapify(arr, i, 0); ) ) void heapify(int arr(), int n, int i) ( // Find largest among root, left child and right child int largest = i; int l = 2 * i + 1; int r = 2 * i + 2; if (l arr(largest)) largest = l; if (r arr(largest)) largest = r; // Swap and continue heapifying if root is not largest if (largest != i) ( int swap = arr(i); arr(i) = arr(largest); arr(largest) = swap; heapify(arr, n, largest); ) ) // Function to print an array static void printArray(int arr()) ( int n = arr.length; for (int i = 0; i < n; ++i) System.out.print(arr(i) + " "); System.out.println(); ) // Driver code public static void main(String args()) ( int arr() = ( 1, 12, 9, 5, 6, 10 ); HeapSort hs = new HeapSort(); hs.sort(arr); System.out.println("Sorted array is"); printArray(arr); ) )

 // Heap Sort in C #include // Function to swap the the position of two elements void swap(int *a, int *b) ( int temp = *a; *a = *b; *b = temp; ) void heapify(int arr(), int n, int i) ( // Find largest among root, left child and right child int largest = i; int left = 2 * i + 1; int right = 2 * i + 2; if (left arr(largest)) largest = left; if (right arr(largest)) largest = right; // Swap and continue heapifying if root is not largest if (largest != i) ( swap(&arr(i), &arr(largest)); heapify(arr, n, largest); ) ) // Main function to do heap sort void heapSort(int arr(), int n) ( // Build max heap for (int i = n / 2 - 1; i>= 0; i--) heapify(arr, n, i); // Heap sort for (int i = n - 1; i>= 0; i--) ( swap(&arr(0), &arr(i)); // Heapify root element to get highest element at root again heapify(arr, i, 0); ) ) // Print an array void printArray(int arr(), int n) ( for (int i = 0; i < n; ++i) printf("%d ", arr(i)); printf(""); ) // Driver code int main() ( int arr() = (1, 12, 9, 5, 6, 10); int n = sizeof(arr) / sizeof(arr(0)); heapSort(arr, n); printf("Sorted array is "); printArray(arr, n); )

 // Heap Sort in C++ #include using namespace std; void heapify(int arr(), int n, int i) ( // Find largest among root, left child and right child int largest = i; int left = 2 * i + 1; int right = 2 * i + 2; if (left arr(largest)) largest = left; if (right arr(largest)) largest = right; // Swap and continue heapifying if root is not largest if (largest != i) ( swap(arr(i), arr(largest)); heapify(arr, n, largest); ) ) // main function to do heap sort void heapSort(int arr(), int n) ( // Build max heap for (int i = n / 2 - 1; i>= 0; i--) heapify(arr, n, i); // Heap sort for (int i = n - 1; i>= 0; i--) ( swap(arr(0), arr(i)); // Heapify root element to get highest element at root again heapify(arr, i, 0); ) ) // Print an array void printArray(int arr(), int n) ( for (int i = 0; i < n; ++i) cout << arr(i) << " "; cout << ""; ) // Driver code int main() ( int arr() = (1, 12, 9, 5, 6, 10); int n = sizeof(arr) / sizeof(arr(0)); heapSort(arr, n); cout << "Sorted array is "; printArray(arr, n); )

Complejidad de clasificación de montón

Heap Sort tiene O(nlog n)complejidades de tiempo para todos los casos (mejor caso, caso promedio y peor caso).

Entendamos el motivo. La altura de un árbol binario completo que contiene n elementos eslog n

As we have seen earlier, to fully heapify an element whose subtrees are already max-heaps, we need to keep comparing the element with its left and right children and pushing it downwards until it reaches a point where both its children are smaller than it.

In the worst case scenario, we will need to move an element from the root to the leaf node making a multiple of log(n) comparisons and swaps.

During the build_max_heap stage, we do that for n/2 elements so the worst case complexity of the build_heap step is n/2*log n ~ nlog n.

During the sorting step, we exchange the root element with the last element and heapify the root element. For each element, this again takes log n worst time because we might have to bring the element all the way from the root to the leaf. Since we repeat this n times, the heap_sort step is also nlog n.

Also since the build_max_heap and heap_sort steps are executed one after another, the algorithmic complexity is not multiplied and it remains in the order of nlog n.

Also it performs sorting in O(1) space complexity. Compared with Quick Sort, it has a better worst case ( O(nlog n) ). Quick Sort has complexity O(n^2) for worst case. But in other cases, Quick Sort is fast. Introsort is an alternative to heapsort that combines quicksort and heapsort to retain advantages of both: worst case speed of heapsort and average speed of quicksort.

Heap Sort Applications

Systems concerned with security and embedded systems such as Linux Kernel use Heap Sort because of the O(n log n) upper bound on Heapsort's running time and constant O(1) upper bound on its auxiliary storage.

Aunque Heap Sort tiene O(n log n)complejidad de tiempo incluso en el peor de los casos, no tiene más aplicaciones (en comparación con otros algoritmos de clasificación como Quick Sort, Merge Sort). Sin embargo, su estructura de datos subyacente, heap, se puede usar de manera eficiente si queremos extraer el más pequeño (o más grande) de la lista de elementos sin la sobrecarga de mantener los elementos restantes en el orden ordenado. Por ejemplo, colas de prioridad.