About Software: [LeetCode] 10. Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

<Solution>

這題頗困難，看了幾個分析才懂(ref1，ref2)

此題用 DP 的解法，會比較快，以下是說明

1. dp[i][j] : true if s[0..i) matches p[0..j), false otherwise

注意一點的是，dp的 size 是 (s.length() + 1) x (p.length + 1)

所以 index of s = i - 1，index of p = j - 1

2. 當 dp 的 i = 0 的時候，也就是 dp[0][1..patternLen_plus1-1] 這個範圍

代表的意思就是 s = ""，p = "XXXX"

而在這個情況下能 match，就只有 p 裡面的字元都有一個 * 在後面

例如 : s="", p="a*b*c*"

所以在 i = 0 時，檢查有沒有遇到 *，遇到就等於 dp[0][j-2] 的值

這個意思等同某個字元有跟著 *，且出現0次

dp[0][j-2] ：跳過p的上一個字元，如果是 match，就是 match(等同於現在字元出現 0 次)

3. 在 dp 其他部分的值，就要分當下在 p 的字元是不是 *

(1) 不是 *

目前 s 的字元是不是等同於目前 p 的字元，或是 p 目前字元是 .

且s和p的上一個階段，也就是 i-1, j-1 也必須是 match的

(2) 是 *

先看 dp[i][j-2] 是 true or false。如果是 true 就直接是 true，因為是 zero occurrence

如果是 false，代表目前 s 的字元有出現過，所以檢查目前s字元是否等於p上一個字元

或是 p 上一個字元是 .

如果上述檢查是 true，代表當前字元是 match

就看 s 的上一個字元的狀態(dp[i-1][j] 的值)，是否也能 match

例如 : s = "caacaa", p = "ca*"，紅色是目前要檢查的字元

s的字元: c a a c a a

match: V V V X

因為上一個字元的狀態，已經不 match 了，所以之後也無法 match 了

綜合以上，可以寫成如下的 code

	class Solution {
	public:
	bool isMatch(string s, string p) {
	const int strLen_plus1 = s.length() + 1, patternLen_plus1 = p.length() + 1;

	//> dynamic programming
	//> dp[i][j] : true if s[0..i) matches p[0..j), false otherwise
	//> note : index of s = i - 1, index of p = j - 1
	bool dp[strLen_plus1][patternLen_plus1];

	for(int i = 0; i < strLen_plus1; i++) {
	memset(dp[i], 0, patternLen_plus1);
	}

	//> initial state
	//> i.e : s = "", p = "" -> match
	dp[0][0] = true;

	//> step1
	//> dp[0][1..patternLen_plus1-1]
	//> means check s="" and p="XXXX" are matched or not
	//> obviously, match only if every char in p has suffix *
	//> for example: s="", p="abc*"
	//> so we only need to check that every char has * suffix
	//> and has 0 occurrence
	for(int j = 1; j < patternLen_plus1; j++) {
	if(p[j-1] == '*') {
	dp[0][j] = dp[0][j-2];
	}
	}

	//> setp2
	for(int i = 1, s_index = i-1; i < strLen_plus1; i++, s_index = i-1) {
	for(int j = 1, p_index = j-1; j < patternLen_plus1; j++, p_index = j-1) {
	if(p[p_index] != '*') {
	//> both current char and previous result need to be matched
	dp[i][j] = (s[s_index] == p[p_index] \|\| p[p_index] == '.') && dp[i-1][j-1];
	}
	else {
	//> zero occurrence or multi-occurence
	dp[i][j] = dp[i][j-2] \|\| ((s[s_index] == p[p_index-1] \|\| p[p_index-1] == '.') && dp[i-1][j]);
	}
	}
	}

	return dp[s.length()][p.length()];
	}
	};

view raw regularExpression.cpp hosted with ❤ by GitHub

About Software

2016年11月29日星期二

[LeetCode] 10. Regular Expression Matching

沒有留言:

張貼留言

關於我自己

2016年11月29日 星期二

[LeetCode] 10. Regular Expression Matching

沒有留言:

張貼留言

2016年11月29日星期二