I have solved this problem, but I cannot come up with an approach efficient enough to pass all the test cases. It times out on 5 of them.

Determine which sentences contain all of the words of a query phrase
0: chris and jennifer had a fight this morning
1: chris went on a holiday
2: jennifer is in prison

Query Phrases are
0: chris jennifer
1: jennifer
2: prison

The goal is to find the indexes of the matching sentences for each query, or -1 if no matching sentence exists. The order of the words does not matter.

Output :
0
0 2
2

That is, the first query has its words matched in sentence 0, the second query has matches in sentences 0 and 2, and so on.

Constraints

  • n: number of sentences
  • m: number of queries
  • n, m < 10^4
  • The number of words in any sentence or query phrase is in the range [1, 10]
  • Each word has at most 11 characters
  • No word appears in more than 10 sentences
  • Each word consists only of upper- and lowercase letters
  • Each word must match exactly, i.e. "like" and "likes" do not match.

Input format:

3
chris and jennifer had a fight this morning
chris went on a holiday
jennifer is in prison
3
chris jennifer
jennifer
prison

Each 3 indicates the number of sentences or queries that follow.
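For completeness, the count-prefixed input format above can be read with a small harness like the following (a minimal sketch; the class and method names are my own, not part of the problem):

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ParseInput {
    // Reads one count-prefixed block: a line with N, then N lines of text.
    static List<String> readBlock(Scanner sc) {
        int n = Integer.parseInt(sc.nextLine().trim());
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lines.add(sc.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "3\n"
                + "chris and jennifer had a fight this morning\n"
                + "chris went on a holiday\n"
                + "jennifer is in prison\n"
                + "3\nchris jennifer\njennifer\nprison\n";
        Scanner sc = new Scanner(new ByteArrayInputStream(sample.getBytes()));
        List<String> sentences = readBlock(sc);   // first block: sentences
        List<String> queries = readBlock(sc);     // second block: queries
        System.out.println(sentences.size() + " " + queries.size()); // prints "3 3"
    }
}
```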


Here is what I have tried...

1. My first solution:

  1. Build a HashMap for each sentence
  2. For each word split from the phrase:
    2-1. Check whether all of the words exist in the sentence's HashMap
    2-2. If so, store the sentence index
    2-3. If no sentence matches, store -1
  3. Print the results

Let p = the maximum number of words in a sentence
Let k = the maximum number of words in a query
The big-O is O(npk)

public static void textQueries(List<String> sentences, List<String> queries) { 
    List<Map<String, Integer>> sentenceMaps = createMaps(sentences); 
    String results = queryMatcher(sentenceMaps, queries); 
    System.out.println(results); 
} 
 
 
private static String queryMatcher(List<Map<String, Integer>> sentenceMaps, List<String> queries) { 
    Map<String, Integer> wordCounter = new LinkedHashMap<>(); 
    List<List<String>> results = new ArrayList<List<String>>(); 
    for (String query : queries) { 
        List<String> result = new ArrayList<>(); 
        for (int j = 0; j < sentenceMaps.size(); j++) { 
            if (isQueryFound(sentenceMaps.get(j), query, wordCounter)) { 
                result.add(j + ""); 
            } 
        } 
        results.add(result); 
    } 
    return generateResultString(results); 
} 
 
 
/* 
 * StringBuilder used to avoid the overhead of calling System.out.println() multiple times. 
 */ 
private static String generateResultString(List<List<String>> results) { 
    StringBuilder stringBuilder = new StringBuilder(); 
    for (List<String> matchingSentenceIndexes : results) { 
        if (matchingSentenceIndexes.isEmpty()) { 
            stringBuilder.append("-1\n"); 
        } else { 
            resultStringHelper(matchingSentenceIndexes, stringBuilder); 
        } 
        //stringBuilder.append("\n"); 
    } 
    return stringBuilder.toString(); 
} 
 
/* 
 * add " " for multiple indexes result 
 */ 
private static void resultStringHelper(List<String> result, StringBuilder stringBuilder) { 
    for (int i = 0; i < result.size(); i++) { 
        stringBuilder.append(result.get(i)); 
        if (i < result.size() - 1) { 
            stringBuilder.append(" "); 
        } else if (i == result.size() - 1) { 
            stringBuilder.append("\n"); 
        } 
    } 
} 
private static boolean isQueryFound(Map<String, Integer> sentenceMap, String query, Map<String, Integer> wordCounter) { 
    String[] queryTokens = query.split(" "); 
    for (String queryToken : queryTokens) { 
        if (isMoreThan10Sentences(wordCounter, queryToken)) return false; 
        if (sentenceMap.containsKey(queryToken)) { 
            wordCounter.put(queryToken, wordCounter.getOrDefault(queryToken, 0) + 1); 
        } else { 
            return false; 
        } 
    } 
    return true; 
} 
 
private static boolean isMoreThan10Sentences(Map<String, Integer> wordCounter, String token) { 
    return wordCounter.getOrDefault(token, -1) > 10; 
} 
 
private static Map<String, Integer> initMap(String[] tokens) { 
    Map<String, Integer> map = new LinkedHashMap<>(); 
    for (String token : tokens) { 
        map.put(token, 0); 
    } 
    return map; 
} 
 
private static List<Map<String, Integer>> createMaps(List<String> sentences) { 
    List<Map<String, Integer>> maps = new ArrayList<Map<String,Integer>>(); 
    for (int i = 0; i < sentences.size(); i++) { 
        String[] tokens = sentences.get(i).split(" "); 
        maps.add(initMap(tokens)); 
    } 
    return maps; 
} 

The last 5 test cases time out.

For a small test case, the benchmark on the online judge is:
map creation time: 9.23954E-4
query matching time: 3.85751E-4

Map generation is the expensive part.


2. My second attempt:

Similar logic, but with concurrency applied, since the platform supports up to 2 threads.

The multithreading is done in two places:
1. Sentence -> Map generation (concurrent map generation)
2. Query matching (concurrent matching)

public static void textQueries(List<String> sentences, List<String> queries) { 
    List<Map<String, Integer>> sentenceMaps = createMaps(sentences); 
    startTime = System.nanoTime(); 
    String results = queryMatcher(sentenceMaps, queries); 
    System.out.println(results); 
} 
 
private static String queryMatcher(List<Map<String, Integer>> sentenceMaps, List<String> queries) { 
    List<Future<String>> futures = new ArrayList<Future<String>>(); 
    int threads = Runtime.getRuntime().availableProcessors(); 
    ExecutorService executor = Executors.newFixedThreadPool(threads); 
    String[] results = new String[threads]; 
    int length = queries.size() / threads; 
    for (int i = 0; i < threads; i++) { 
        int queryStart = length * i; 
        int queryEnd = length * (i+1); 
        if (i == threads -1 && queries.size() % threads != 0) queryEnd++; 
        Callable<String> worker = new QueryMatcher(sentenceMaps, queries, queryStart, queryEnd); 
        Future<String> submit = executor.submit(worker); 
        futures.add(submit); 
    } 
 
    for (int i = 0; i < futures.size(); i++) { 
        try { 
            results[i] = futures.get(i).get(); 
        } catch (InterruptedException e) { 
            e.printStackTrace(); 
        } catch (ExecutionException e) { 
            e.printStackTrace(); 
        } 
    } 
    String returnString = concaString(results); 
    executor.shutdown(); 
    return returnString; 
} 
 
private static String concaString(String[] results) { 
    StringBuilder stringBuilder = new StringBuilder(); 
    for (int i = 0; i < results.length; i++) { 
        stringBuilder.append(results[i]); 
    } 
    return stringBuilder.toString(); 
} 
 
private static String generateResultString(List<List<String>> results) { 
    StringBuilder stringBuilder = new StringBuilder(); 
    for (List<String> matchingSentenceIndexes : results) { 
        if (matchingSentenceIndexes.isEmpty()) { 
            stringBuilder.append("-1\n"); 
        } else { 
            resultStringHelper(matchingSentenceIndexes, stringBuilder); 
        } 
        //stringBuilder.append("\n"); 
    } 
    return stringBuilder.toString(); 
} 
 
private static void resultStringHelper(List<String> result, StringBuilder stringBuilder) { 
    for (int i = 0; i < result.size(); i++) { 
        stringBuilder.append(result.get(i)); 
        if (i < result.size() - 1) { 
            stringBuilder.append(" "); 
        } else if (i == result.size() - 1) { 
            stringBuilder.append("\n"); 
        } 
    } 
} 
private static boolean isQueryFound(Map<String, Integer> sentenceMap, String query, Map<String, Integer> wordCounter) { 
    String[] queryTokens = query.split(" "); 
    for (String queryToken : queryTokens) { 
        if (isMoreThan10Sentences(wordCounter, queryToken)) return false; 
        if (sentenceMap.containsKey(queryToken)) { 
            wordCounter.put(queryToken, wordCounter.getOrDefault(queryToken, 0) + 1); 
        } else { 
            return false; 
        } 
    } 
    return true; 
} 
 
private static boolean isMoreThan10Sentences(Map<String, Integer> wordCounter, String token) { 
    return wordCounter.getOrDefault(token, -1) > 10; 
} 
 
private static boolean isQueryFound(Map<String, Integer> sentenceMap, String query) { 
    String[] queryTokens = query.split(" "); 
    //Map<String, Integer> duplicateChecker = new LinkedHashMap<String, Integer>(); 
 
    for (String queryToken : queryTokens) { 
        if (sentenceMap.containsKey(queryToken)) { 
            //if (!duplicateChecker(duplicateChecker, sentenceMap, queryToken)) 
            //return false; 
        } else { 
            return false; 
        } 
    } 
    return true; 
} 
 
/* 
 * this method checks for the case when there are duplicate words in query 
 * i.e. sentence containing 2 hello will return false of queries with 3 hello 
 */ 
private static boolean duplicateChecker(Map<String, Integer> duplicateChecker, Map<String, Integer> sentenceMap, String queryToken) { 
    if (duplicateChecker.containsKey(queryToken)) { 
        if (duplicateChecker.get(queryToken) == 0) return false; 
        duplicateChecker.put(queryToken, duplicateChecker.get(queryToken) - 1); 
    } else { 
        duplicateChecker.put(queryToken, sentenceMap.get(queryToken) - 1); 
    } 
    return true; 
} 
 
private static List<Map<String, Integer>> createMaps(List<String> sentences) { 
    List<Map<String, Integer>> maps = new ArrayList<>(); 
    int threads = Runtime.getRuntime().availableProcessors(); 
    ExecutorService executor = Executors.newFixedThreadPool(threads); 
    List<Future<List<Map<String, Integer>>>> futures = new ArrayList<Future<List<Map<String, Integer>>>>(); 
    int length = (sentences.size()) / threads; 
 
    for (int i = 0; i < threads; i++) { 
        int start = i * length; 
        int end = (i+1) * length; 
        if (i == threads - 1 && sentences.size() % threads != 0) end++; 
        List<String> splitSentence = new ArrayList(sentences.subList(start, end)); 
 
        Callable<List<Map<String, Integer>>> worker = new MapMaker(splitSentence); 
        Future<List<Map<String, Integer>>> submit = executor.submit(worker); 
        futures.add(submit); 
    } 
 
    for (int i = 0; i < futures.size(); i++) { 
        try { 
            for (Map<String, Integer> map : futures.get(i).get()) { 
                maps.add(map); 
            } 
        } catch (InterruptedException e) { 
            e.printStackTrace(); 
        } catch (ExecutionException e) { 
            e.printStackTrace(); 
        } 
    } 
    executor.shutdown(); 
    return maps; 
} 
 
private synchronized static Map<String, Integer> initMap(String[] tokens) { 
    Map<String, Integer> map = new LinkedHashMap<>(); 
    for (String token : tokens) { 
        map.put(token, 0); 
        //            map.put(token, map.getOrDefault(map.get(token), 1) + 1); 
    } 
    return map; 
} 
 
 
public static class MapMaker implements Callable<List<Map<String, Integer>>> { 
    private List<String> sentences; 
 
    @Override 
    public List<Map<String, Integer>> call() throws Exception { 
        List<Map<String, Integer>> maps = new ArrayList<Map<String,Integer>>(); 
        for (int i = 0; i < sentences.size(); i++) { 
            String[] tokens = sentences.get(i).split(" "); 
            maps.add(initMap(tokens)); 
        } 
        return maps; 
    } 
 
    public MapMaker(List<String> sentences) { 
        this.sentences = sentences; 
    } 
} 
 
public static class QueryMatcher implements Callable<String> { 
    private List<Map<String, Integer>> sentenceMaps; 
    private List<String> queries; 
    private int queryStart; 
    private int queryEnd; 
 
    @Override 
    public String call() throws Exception { 
        List<List<String>> results = new ArrayList<List<String>>(); 
        for (int i = queryStart; i < queryEnd; i++) { 
            List<String> result = new ArrayList<>(); 
            String query = queries.get(i); 
            for (int j = 0; j < sentenceMaps.size(); j++) { 
                if (isQueryFound(sentenceMaps.get(j), query)) { 
                    result.add(j + ""); 
                } 
            } 
            results.add(result); 
        } 
        return generateResultString(results); 
    } 
 
    public QueryMatcher(List<Map<String, Integer>> sentenceMaps, List<String> queries, int queryStart, int queryEnd) { 
        this.sentenceMaps = sentenceMaps; 
        this.queries = queries; 
        this.queryStart = queryStart; 
        this.queryEnd = queryEnd; 
    } 
} 

Although I hoped for some speedup on the large test cases, it still timed out on the same 5 test cases.

For small test cases, it actually increased the map generation time, because of the extra overhead of creating the thread pool.

Benchmark times:
map time: 0.007669489
query matching time: 3.22923E-4


3. My third solution: the above rewritten in C++

I suspected that Java itself was causing the timeouts.
The platform actually allots a shorter time limit for C++, so to my surprise it still produced the same 5 timeouts.


4. My fourth approach: regex

I knew it would be slower, but I tried it anyway, in vain. The big-O here is actually worse, because I need to sort the words of each sentence to avoid building a regex over all n! permutations of the query words...

public static void textQueries(List<String> sentences, List<String> queries) { 
    stringSort(sentences); 
    stringSort(queries); 
    StringBuilder stringBuilder = new StringBuilder(); 
 
    boolean isExist = false; 
    for (int index = 0; index < queries.size(); index++) { 
        String query = queries.get(index); 
        isExist = false; 
        for (int i = 0; i < sentences.size(); i++) { 
            if (Matcher(buildNaturalLanguage(query), sentences.get(i))) { 
                stringBuilder.append(i + " "); 
                isExist = true; 
            } 
        } 
        if (!isExist) stringBuilder.append("-1"); 
        if (index != queries.size() - 1) stringBuilder.append("\n"); 
    } 
    System.out.println(stringBuilder.toString()); 
} 
 
private static void stringSort(List<String> strings) { 
    for (int i = 0; i < strings.size(); ++i) { 
        String string = strings.get(i); 
        String[] stringParts = string.split(" "); 
        StringBuilder stringBuilder = new StringBuilder(); 
        Arrays.sort(stringParts); 
        for (int j = 0; j < stringParts.length; j++) { 
            stringBuilder.append(stringParts[j] + " "); 
        } 
        strings.set(i, stringBuilder.toString());  // converted back to a String for code cleanliness; you could return String[] for efficiency, but only a minor improvement. 
    } 
} 
 
private static String buildNaturalLanguage(String query) { 
    // System.out.println("query " + query); 
    String[] stringParts = query.split(" "); 
    String regular = "(([a-zA-Z])*(\\s))*"; 
    for (String word : stringParts) { 
        regular += word + "(\\s(([a-zA-Z])*(\\s))*)"; 
    } 
    return regular; 
} 
 
private static boolean Matcher(String regular, String sentence) { 
    Pattern p = Pattern.compile(regular); 
    Matcher m = p.matcher(sentence); 
    return m.find(); 
} 

Result: not only did it time out, it somehow also got 2 additional undisclosed test cases wrong (wrong answer). I have no idea why...
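One plausible cause of the wrong answers (my assumption, not confirmed by the judge): `Matcher` uses `find()`, and the pattern built by `buildNaturalLanguage` never anchors a query word to a token boundary, so matching can begin in the middle of a sentence word. A minimal reproduction of that behavior:

```java
import java.util.regex.Pattern;

public class RegexPitfall {
    public static void main(String[] args) {
        // Same shape as buildNaturalLanguage("her"):
        String regular = "(([a-zA-Z])*(\\s))*" + "her" + "(\\s(([a-zA-Z])*(\\s))*)";
        // "her" is not a word of this sentence, yet find() succeeds because
        // the match can start mid-token, inside "mother".
        boolean hit = Pattern.compile(regular).matcher("here is mother ").find();
        System.out.println(hit); // prints "true"
    }
}
```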

Ω(nm^2 + p log p)... assuming regex matching is O(m)


The only remaining possibility I can think of is filtering out some queries or sentences before running the main algorithm? (Constraint: each word appears in at most 10 sentences.)

However, my first and second solutions already implement that constraint check, so a smarter filter is probably needed.

The problem is that I think the BCR (best conceivable runtime) is O(MNP): you still need to examine every query and every sentence, and split them as well if you are not using regex.

I am completely lost here. How can I actually make this faster?

Thank you very much.

Consider the following approach:

Maintain a HashMap that maps String to Set<Integer>. The idea is to keep track of which sentences a given word appears in. We use sets rather than arrays so that the intersection of two sets can be computed efficiently.

For each input sentence:

  • Tokenize it into words, and add the index of the current sentence to the Set corresponding to each token.

For each query phrase:

  • Tokenize it into words.
  • Look up the index set corresponding to each word.
  • Take the intersection of all of these sets.

Time complexity: given that there are at most 10 words per sentence, building the HashMap costs O(10 N log N). Each query costs O(10 log N).
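The steps above can be sketched as follows (a minimal sketch; class and method names are my own):

```java
import java.util.*;

public class InvertedIndex {
    // word -> indexes of the sentences containing it
    static Map<String, Set<Integer>> buildIndex(List<String> sentences) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            for (String word : sentences.get(i).split(" ")) {
                index.computeIfAbsent(word, k -> new HashSet<>()).add(i);
            }
        }
        return index;
    }

    // Intersect the index sets of every query word; an empty result means "-1".
    static List<Integer> match(Map<String, Set<Integer>> index, String query) {
        Set<Integer> acc = null;
        for (String word : query.split(" ")) {
            Set<Integer> hits = index.getOrDefault(word, Collections.emptySet());
            if (acc == null) acc = new TreeSet<>(hits);   // TreeSet keeps output sorted
            else acc.retainAll(hits);                     // set intersection
            if (acc.isEmpty()) break;                     // no sentence can match
        }
        return new ArrayList<>(acc);
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "chris and jennifer had a fight this morning",
            "chris went on a holiday",
            "jennifer is in prison");
        Map<String, Set<Integer>> index = buildIndex(sentences);
        StringBuilder out = new StringBuilder();
        for (String q : Arrays.asList("chris jennifer", "jennifer", "prison")) {
            List<Integer> m = match(index, q);
            if (m.isEmpty()) {
                out.append("-1\n");
            } else {
                for (int i = 0; i < m.size(); i++) {
                    out.append(m.get(i)).append(i < m.size() - 1 ? " " : "\n");
                }
            }
        }
        System.out.print(out); // prints the sample output: 0 / 0 2 / 2
    }
}
```

Note that each sentence is split exactly once, and each query touches only the (at most 10-element, by the constraints) index sets of its own words, instead of scanning all N sentence maps per query as in the first two attempts.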

