KMP算法简单介绍

KMP（Knuth-Morris-Pratt）算法是一种用于字符串匹配的经典算法，其目的是在一个文本串（主串）中查找一个模式串的出现位置。与朴素的字符串匹配算法相比，KMP算法具有更高的效率，特别是在处理大型文本时。

KMP算法的核心思想是利用已经匹配过的部分信息来避免不必要的比较。算法的关键在于构建一个部分匹配表（Partial Match Table，PMT），该表用于存储模式串中每个位置的最长前缀和最长后缀的匹配长度。

经典例题引入

题目描述:给定文本串 T 和模式串 P，找到模式串在文本串中的第一次出现的位置。

一般的暴力算法(朴素字符串匹配算法):

int search(char text[], char pattern[]) {
    int m = strlen(pattern);
    int n = strlen(text);

    for (int i = 0; i <= n - m; i++) {
        int j;
        for (j = 0; j < m; j++) {
            if (text[i + j] != pattern[j])
                break;
        }

        if (j == m)
            return i;  // 匹配成功，返回模式串在文本串中的起始位置
    }

    return -1;  // 没有找到匹配的位置
}

其算法时间复杂度为O(M*N);

而KMP算法可以将其时间复杂度减小到O(N)

用KMP算法解决上述例题

int* computeNextArray(char str2[], int length) {
    if (length == 1) {
        int* next = (int*)malloc(sizeof(int));
        next[0] = -1;
        return next;
    }

    int* next = (int*)malloc(sizeof(int) * length);
    next[0] = -1;
    next[1] = 0;

    int i = 2; // 目前在哪个位置上求 next 数组的值
    int cn = 0; // cn 表示前缀和后缀相等的长度

    while (i < length) {
        if (str2[i - 1] == str2[cn]) { // 配成功的时候
            next[i++] = ++cn;
        } else if (cn > 0) {
            cn = next[cn];
        } else {
            next[i++] = 0;
        }
    }

    return next;
}//用于得出next数组

int getIndexof(char text[], char pattern[]) {
    if (text == NULL || pattern == NULL || strlen(pattern) < 1 || strlen(text) < strlen(pattern)) {
        return -1;
    }

    int x = 0;
    int y = 0;
    int *next = (int *)malloc(sizeof(int) * strlen(pattern));

    computeNextArray(pattern, strlen(pattern));

    while (x < strlen(text) && y < strlen(pattern)) {
        if (text[x] == pattern[y]) {
            x++;
            y++;
        } else if (next[y] == -1) {
            x++;
        } else {
            y = next[y];
        }
    }

    free(next);

    return y == strlen(pattern) ? x - y : -1;
}