Dirty words checker

Deadline
04.12.2022 18:00

The deadline for submitting solutions has passed

Time for a f*cking challenge!

We want to write a function that checks a given text for dirty words (we may use it for future posts, so do your best!).
The function find_dirty_words should accept:

  • The text to check for dirty words
  • A collection of dirty words that we have predefined

On top of that, we define some rules for how dirty a word is:

  • "Dirty" words are the words found in the collection of dirty words, but...
  • Dirty words can also be multi-word dirty "expressions", such as "да го духаш" (roughly, "suck it").
  • If dirty words are enclosed in () or [], they stop being dirty (only the dirty words themselves may be inside the brackets, i.e. if the word "гъз" ("ass") is in the collection of dirty words, (гъз) is allowed but (ти си гъз), "you are an ass", is not).
  • Words enclosed in {} are still dirty, because back in our day those braces were themselves considered dirty.
  • At the end of a sentence we are allowed to use dirty words, damn it.
  • Words with a wildcard (*) count as censored dirty words ("т***к") if they match a word from the collection of dirty words.
  • Censored dirty words "match" a word from the collection according to our most intuitive rules (not regex notation): "т***к" matches "тоник", "тупак" and "ташак" (should we decide to include any of them in the collection of dirty words).
  • If a censored word appears at the end of a sentence or in brackets, treat it as censored, not as an allowed dirty word.
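The wildcard rule can be illustrated with a minimal sketch. It assumes that every * hides exactly one character, which is one reading of the "intuitive" matching rule, and the helper names matches_censored and find_censored are hypothetical, not part of the assignment:

```python
import re


def matches_censored(censored, dirty):
    """Return True if `censored` could be a censored spelling of `dirty`."""
    # Assumption: each '*' stands for exactly one character,
    # so the two strings must have the same length.
    if len(censored) != len(dirty):
        return False
    return all(c == '*' or c == d for c, d in zip(censored, dirty))


def find_censored(text, dirty_words):
    """Collect the censored tokens in `text` that match any dirty word."""
    # A censored token is a run of word characters and '*'
    # that contains at least one '*'.
    tokens = re.findall(r'[\w*]*\*[\w*]*', text)
    return [t for t in tokens
            if any(matches_censored(t, w) for w in dirty_words)]


print(matches_censored("т***к", "тоник"))  # True
print(matches_censored("т***к", "ташак"))  # True
print(find_censored("the living sh*t out of you.", ["shit", "you"]))  # ['sh*t']
```

Comparing character by character avoids turning the censored word into a regular expression, so no other character in the token can be misread as regex syntax.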

As a result we want a dictionary shaped like this:

{
    'dirty_words': [...], # the dirty words found
    'censored_dirty_words': [...], # the censored dirty words
    'allowed_dirty_words': [...] # the dirty words that comply with the rules for using dirty words
}

Example

>>> print(find_dirty_words("I don't know whether to worship at your feet or (spank) the living sh*t out of you.",
                          ["shit", "spank", "you", "feet"])) # Christian Grey
{'dirty_words': ['feet'], 'censored_dirty_words': ['sh*t'], 'allowed_dirty_words': ['spank', 'you']}

>>> print(find_dirty_words("""Искам с теб да бъде много тайно...
                           И да свършиш в мене уж случайно...
                           Да ме [ближеш] сякаш съм от захар...
                           Блъскай силно хайде – MOTHER **CKER...""",
                           ["свършиш", "ближеш", "захар", "Блъскам", "DUCKER"])) # Азис
{'dirty_words': ['свършиш'], 'censored_dirty_words': ['**CKER'], 'allowed_dirty_words': ['ближеш', 'захар']}

Notes

  • We won't be jerks and won't give you input with misleading punctuation (for example abbreviations such as "В.Б.").
  • In other words, assume that whenever you see . ? or ! you have reached the end of a sentence.
  • The output must contain every occurrence of a word, i.e. if a dirty word occurs 4 times in the text, we expect it 4 times in your dirty dictionary.
  • Don't worry about upper and lower case; the input and the expected results will be consistent in case, so you don't need to do anything about it.
  • We won't be jerks in another respect either: dirty words (or expressions) in brackets will have no whitespace around them, so we won't expect you to catch something like ( nice tits ).
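A rough sketch of how uncensored occurrences could be split into dirty and allowed ones, under the rules and notes above. The function name classify_plain is hypothetical, and the sketch assumes that a word at the very end of the text with no punctuation after it counts as dirty:

```python
import re


def classify_plain(text, dirty_words):
    """Split uncensored occurrences into (dirty, allowed) lists."""
    dirty, allowed = [], []
    for word in dirty_words:
        # The lookarounds keep a short word such as 'as'
        # from matching inside a longer one such as 'ass'.
        pattern = rf'(?<!\w){re.escape(word)}(?!\w)'
        # finditer visits every occurrence, so a repeated word is reported repeatedly.
        for m in re.finditer(pattern, text):
            before = text[m.start() - 1] if m.start() > 0 else ''
            after = text[m.end()] if m.end() < len(text) else ''
            # Allowed only when the brackets hug the word on both sides.
            in_brackets = (before, after) in {('(', ')'), ('[', ']')}
            # Per the notes, any of . ? ! right after the word ends the sentence.
            ends_sentence = after in {'.', '?', '!'}
            (allowed if in_brackets or ends_sentence else dirty).append(word)
    return dirty, allowed


text = ("I don't know whether to worship at your feet "
        "or (spank) the living sh*t out of you.")
print(classify_plain(text, ["shit", "spank", "you", "feet"]))
# (['feet'], ['spank', 'you'])
```

Censored words are deliberately left out here; they would be collected separately, since a censored word stays censored even in brackets or at the end of a sentence.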

Disclaimer

Despite all the nonsense above, we want to point out that we are not boors in the bad sense of the word, and our cynicism has its limits. We would not talk in the extremely vulgar manner described in the challenge, except in force majeure circumstances involving traffic. Everything above is solely for the purposes of the challenge, damn it. Except maybe the thing in the last note: if you've got it, you've got it.

Very important disclaimer

Don't worry about the bad syntax highlighting that you will quite possibly see on the site after submitting your solution.
The site behaves oddly with unusual character sequences (i.e. regular expressions), but if your code works when you test it by hand, it will work with our tests too.

Solutions

Виктор
  • Correct
  • 8 passing test(s)
  • 0 failing test(s)
import re

DIRTY_RE = r'(?<=[^\(\[)]\b){}(?=\b[^\)\]\.\?!])'
ALLOWED_DIRTY_RE = r'(?<=[\(\[]\b){0}(?=\b[\)\]])|\b{0}(?=\b[\.\?!])'


def find_dirty_words(text, dirty_words):
    dirty_info = {'dirty_words': [],
                  'censored_dirty_words': [],
                  'allowed_dirty_words': []}
    censored = re.findall(r'[\w\*]*\*[\w\*]*', text)
    for word in dirty_words:
        dirty = re.findall(DIRTY_RE.format(word), text)
        dirty_info['dirty_words'].extend(dirty)
        for cens_word in censored:
            if re.match(cens_word.replace('*', '.'), word):
                dirty_info['censored_dirty_words'].append(cens_word)
        allowed_dirty = re.findall(ALLOWED_DIRTY_RE.format(word), text)
        dirty_info['allowed_dirty_words'].extend(allowed_dirty)
    return dirty_info
........
----------------------------------------------------------------------
Ran 8 tests in 0.154s

OK
Роберт Борисов
  • Incorrect
  • 4 passing test(s)
  • 4 failing test(s)
import re


def find_dirty_words(text, bad_words):
    to_be_checked = []
    allowed = []
    cens_nr = []
    censored = []
    refactor = []
    first = 'dirty_words'
    second = 'censored_dirty_words'
    third = 'allowed_dirty_words'
    finald = {first: [], second: [], third: []}
    bad = []
    for bad_word in bad_words:
        squared_brackets = bracket_regex_generator(['[', ']'], bad_word)
        to_be_checked.extend(re.findall(squared_brackets, text))
        curly_brackets = bracket_regex_generator(['{', '}'], bad_word)
        to_be_checked.extend(re.findall(curly_brackets, text))
        parantheses = bracket_regex_generator(['(', ')'], bad_word)
        to_be_checked.extend(re.findall(parantheses, text))
        reg = r'\b'
        for letter in bad_word:
            reg += re.escape(letter)
        reg += r'?(?=[ .\)\]])'
        refactor.extend(re.findall(reg, text))
        search_for = r'(?<!\S)'
        for index,letter in enumerate(bad_word):
            search_for += r'[\*' + re.escape(letter) + r']'
        search_for += r'?(?=[ \]\)\.])'
        cens_nr.extend(re.findall(search_for, text))
    for word in to_be_checked:
        if word[1:-1] in bad_words:
            if word[0] != '{' and word[-1] != '}':
                allowed.append(word[1:-1])
            else:
                bad.append(word[1:-1])
    for word in refactor:
        if word not in allowed:
            num = len(re.findall(word + r'\.{1,3}', text))
            if not re.findall(word + r'\.{1,3}', text):
                bad.append(word)
            else:
                allowed.extend([word for _ in range(num)])
    for index, word in enumerate(cens_nr):
        if '*' in word:
            censored.append(word)
    finald[first].extend(bad)
    finald[second].extend(censored)
    finald[third].extend(allowed)
    return finald


def bracket_regex_generator(type_both, word):
    result = r'' + re.escape(type_both[0])
    for letter in word:
        result += letter
    result += re.escape(type_both[1])
    return result
.FFF.F..
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  's****ed'

======================================================================
FAIL: test_censored_within_brackets_or_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'mou**'
First has 0, Second has 1:  '**arving'
First has 0, Second has 1:  'p*owl'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 4:  'me'
First has 0, Second has 1:  'thirsty'
First has 0, Second has 1:  'cankered'

======================================================================
FAIL: test_dirty_in_other_words (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 2, Second has 0:  'as'

----------------------------------------------------------------------
Ran 8 tests in 0.190s

FAILED (failures=4)
Стилиян Иванов
  • Incorrect
  • 0 passing test(s)
  • 8 failing test(s)
import re


def find_dirty_words(text, dirty_words):
    result = {'dirty_words': [], 'censored_dity_words': [], 'allowed_dirty_words': []}
    for dirty_word in dirty_words:
        if dirty_word in text:
            allowed_expression = r'\({word}\)|\[{word}\]|\b{word}[\.\?\!]'.format(word = dirty_word)
            result['allowed_dirty_words'].extend([dirty_word] * len(re.findall(allowed_expression, text)))
            if len(re.findall(allowed_expression, text)) == 0:
                dirty_expression = r'\b{word}\b'.format(word = dirty_word)
                result['dirty_words'].extend([dirty_word] * len(re.findall(dirty_expression, text)))
        else:
            censored_expression = r''.join(['[\*' + letter + ']' for letter in dirty_word])
            result['censored_dity_words'].extend(re.findall(censored_expression, text))
    return result
EEEEEEEE
======================================================================
ERROR: test_allowed_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_censored_within_brackets_or_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_dirty_everything (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_dirty_in_other_words (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_dirty_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

======================================================================
ERROR: test_dirty_with_curly_brackets (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
KeyError: 'censored_dity_words'

----------------------------------------------------------------------
Ran 8 tests in 0.134s

FAILED (errors=8)
Даниела Дишлянова
  • Incorrect
  • 5 passing test(s)
  • 3 failing test(s)
import re


def find_dirty_words(text, dirty_words):
    found_words = {
        'dirty_words': [],
        'censored_dirty_words': [],
        'allowed_dirty_words': []
    }
    for dirty_word in dirty_words:
        for matched_word in re.finditer(fr'{dirty_word}', text):
            start, end = matched_word.span()
            if re.search(fr'{dirty_word}[\.!?]', text[start:end + 1]) is not None:
                found_words['allowed_dirty_words'].append(matched_word.group())
            elif re.search(fr'[(\[]{dirty_word}[)\]]', text[start - 1:end + 1]) is not None:
                found_words['allowed_dirty_words'].append(matched_word.group())
            elif re.search(fr'(\s)+{dirty_word}(\s)+', text[start - 1:end + 1]) is not None:
                found_words['dirty_words'].append(matched_word.group())
    '''censored'''
    for censored_word in re.finditer(fr'(\w)*(\*)+(\w)*(\*)*(\w)*', text):
        censored_pattern = censored_word.group().replace('*','.')
        for dirty_word in dirty_words:
            if re.search(censored_pattern, dirty_word) is not None:
                found_words['censored_dirty_words'].append(censored_word.group())
                break
    return found_words
...F.F.F
======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 4:  'me'
First has 0, Second has 1:  'thirsty'
First has 0, Second has 1:  'cankered'

======================================================================
FAIL: test_dirty_in_other_words (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  'men'

======================================================================
FAIL: test_dirty_with_curly_brackets (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'splayed'
First has 0, Second has 1:  'feast'

----------------------------------------------------------------------
Ran 8 tests in 0.136s

FAILED (failures=3)
Радостин Маринов
  • Incorrect
  • 6 passing test(s)
  • 2 failing test(s)
from re import findall
from itertools import chain


def find_censored_dirty_words(text, dirty_words):
    return list(chain.from_iterable(
        filter(lambda word: '*' in word, findall(''.join([rf"(?:{char}|\*)" for char in dirty_word]), text)) for
        dirty_word in dirty_words))


def find_normal_dirty_words(text, dirty_words):
    return list(chain.from_iterable(
        findall(rf"((?:^|(?<=\s)){word}(?=\s)|(?<={{){word}(?=}}))", text) for word in dirty_words))


def find_allowed_dirty_words(text, dirty_words):
    return list(chain.from_iterable(
        findall(rf"((?<=\[){word}(?=])|(?<=\(){word}(?=\))|(?<=\s){word}(?=[.?!]))", text) for word in dirty_words))


def find_dirty_words(text, dirty_words):
    return {
        'dirty_words': find_normal_dirty_words(text, dirty_words),
        'censored_dirty_words': find_censored_dirty_words(text, dirty_words),
        'allowed_dirty_words': find_allowed_dirty_words(text, dirty_words)
    }
.F.F....
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  '****'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 4:  'me'
First has 0, Second has 1:  'thirsty'
First has 0, Second has 1:  'cankered'

----------------------------------------------------------------------
Ran 8 tests in 0.188s

FAILED (failures=2)
Александър Сариков
  • Correct
  • 8 passing test(s)
  • 0 failing test(s)
import re


def find_dirty_words(text_input, dirty_collection):
    result = {
        'dirty_words': [],
        'censored_dirty_words': [],
        'allowed_dirty_words': []
    }
    for dirty_word in dirty_collection:
        result['allowed_dirty_words'].extend([word for word in re.findall(rf'\(({dirty_word})\)', text_input) if word]) # inside ()
        result['dirty_words'].extend([word for word in re.findall(rf'\(.+\s+({dirty_word})\)', text_input) if word]) # inside ()
        result['dirty_words'].extend([word for word in re.findall(rf'\(({dirty_word})\s+.+\)', text_input) if word]) # inside ()
        result['allowed_dirty_words'].extend([word for word in re.findall(rf'\[({dirty_word})\]', text_input) if word]) # inside []
        result['dirty_words'].extend([word for word in re.findall(rf'\[.+\s+({dirty_word})\]', text_input) if word]) # inside []
        result['dirty_words'].extend([word for word in re.findall(rf'\[({dirty_word})\s+.+\]', text_input) if word]) # inside []
        result['dirty_words'].extend([word for word in re.findall(rf'{{({dirty_word})}}', text_input) if word]) # inside {}
        result['dirty_words'].extend([word for word in re.findall(rf'{{.+\s+({dirty_word})}}', text_input) if word]) # inside {}
        result['dirty_words'].extend([word for word in re.findall(rf'{{({dirty_word})\s+.+}}', text_input) if word]) # inside {}
        # Assuming that rules are still applied even if a word is decorated with symbols ("': etc.)
        result['dirty_words'].extend([word for word in re.findall(rf"(?:^|\s+)[^\w\[{{(]*({dirty_word})[^\w\]}}).!?]*\s+", text_input) if word]) # whole word
        result['allowed_dirty_words'].extend([word for word in re.findall(rf'\b[^\w\[{{(]*({dirty_word})[^\w\]}})]*[.!?]+', text_input) if word]) # end of sentence
        # find all censored single words
        censored_words = list(filter(lambda word: word, re.findall(r'[\w*-]*\*+[\w*-]*', text_input)))
        for possibly_dirty_word in censored_words:
            reg_expr = re.compile(possibly_dirty_word.replace('*','\w'))
            if (re.fullmatch(reg_expr, dirty_word)):
                result['censored_dirty_words'].append(possibly_dirty_word)
        # find all censored phrases
        censored_phrases = list(filter(lambda word: word, re.findall(r'[\w*-]*\*+[\w*-]*(?:\s+[\w*-]*\*+[\w*-]*)+', text_input)))
        for possibly_dirty_phrase in censored_phrases:
            reg_expr = re.compile(possibly_dirty_phrase.replace('*','\w'))
            if (re.fullmatch(reg_expr, dirty_word)):
                result['censored_dirty_words'].append(possibly_dirty_phrase)
    return result
........
----------------------------------------------------------------------
Ran 8 tests in 0.296s

OK
Харут Партамиан
  • Incorrect
  • 5 passing test(s)
  • 3 failing test(s)
from re import split, findall


def is_censored_dirty_word(censored_word, dirty_word):
    for pair in zip(censored_word, dirty_word):
        if pair[0] == '*':
            continue
        if pair[0] != pair[1]:
            return False
    return True


def find_dirty_words(sentence: str, dirty_words_input: list[str]):
    dirty_words = []
    censored_dirty_words = []
    allowed_dirty_words = []
    end_of_sentece_words = findall("([^.?!\s]+)[.?!]", sentence)
    censored_words = findall("[A-Za-z]*\*+[A-Za-z]*", sentence)
    split_original_sentence = split("[,.!?\s]+", sentence)
    print(split_original_sentence)
    for dirty_word in dirty_words_input:
        if dirty_word in split_original_sentence:
            if dirty_word in end_of_sentece_words:
                allowed_dirty_words.append(dirty_word)
            else:
                dirty_words.extend([dirty_word] * split_original_sentence.count(dirty_word))
        if f"({dirty_word})" in split_original_sentence or f"[{dirty_word}]" in split_original_sentence:
            allowed_dirty_words.extend([dirty_word] *
                                       (split_original_sentence.count(f"({dirty_word})") +
                                        split_original_sentence.count(f"[{dirty_word}]")))
        if f"({dirty_word}" in split_original_sentence or f"{dirty_word})" in split_original_sentence or \
                f"[{dirty_word}" in split_original_sentence or f"{dirty_word}]" in split_original_sentence or \
                f"{{{dirty_word}" in split_original_sentence or f"{dirty_word}}}" in split_original_sentence:
            dirty_words.extend([dirty_word] *
                               (split_original_sentence.count(f"({dirty_word}") +
                                split_original_sentence.count(f"{dirty_word})") +
                                split_original_sentence.count(f"[{dirty_word}") +
                                split_original_sentence.count(f"{dirty_word}]") +
                                split_original_sentence.count(f"{{{dirty_word}") +
                                split_original_sentence.count(f"{dirty_word}}}")
                                )
                               )
        if f"{{{dirty_word}}}" in split_original_sentence:
            dirty_words.extend([dirty_word] * split_original_sentence.count(f"{{{dirty_word}}}"))
        for censored_word in censored_words:
            if len(censored_word) == len(dirty_word) and is_censored_dirty_word(censored_word, dirty_word):
                censored_dirty_words.append(censored_word)
    return {'dirty_words': dirty_words,
            'censored_dirty_words': censored_dirty_words,
            'allowed_dirty_words': allowed_dirty_words}
.F.FF...
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'c*m*s'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 3, Second has 4:  'me'

======================================================================
FAIL: test_dirty_everything (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'ц*цата'

----------------------------------------------------------------------
Ran 8 tests in 0.142s

FAILED (failures=3)
Сузана Петкова
  • Incorrect
  • 4 passing test(s)
  • 4 failing test(s)
import re


def word_check(word, checker):
    new_word = [el for el in word]
    for i in range(len(new_word)):
        if new_word[i] == '*':
            new_word[i] = checker[i]
    if ''.join(new_word) == checker:
        return True
    else:
        return word


def find_dirty_words(text, dirty_words):
    result = {'dirty_words': [], 'censored_dirty_words': [], 'allowed_dirty_words': []}
    words_in_brackets = []
    words_in_brackets += re.findall(r'\((.*?)\)', text)
    words_in_brackets += re.findall(r'\[(.*?)\]', text)
    end_of_sentence_words = re.findall(r'(\w+)[.!?]', text)
    censored_words = re.findall(r'(\S*[*\*]\S*[^.?!\s])', text)
    for word in dirty_words:
        if word in text:
            word_occurances = text.count(word)
            if word_occurances == 1:
                if not (word in words_in_brackets or word in end_of_sentence_words):
                    result['dirty_words'].append(word)
            else:
                n = end_of_sentence_words.count(word)
                word_occurances -= n
                [result['dirty_words'].append(word) for _ in range(word_occurances)]
            if word in words_in_brackets or word in end_of_sentence_words:
                result['allowed_dirty_words'].append(word)
    format_censored_words = []
    for word in censored_words:
        word1 = re.findall(r'\[(.*?)\]', word)
        word2 = re.findall(r'\((.*?)\)', word)
        if word1:
            censored_words += word1
        if word2:
            censored_words += word2
        format_censored_words.append([el for el in word])
    for dirty_word in dirty_words:
        for word in format_censored_words:
            if len(word) == len(dirty_word):
                if word_check(word, dirty_word) == True:
                    result['censored_dirty_words'].append(''.join(word))
                else:
                    word = word_check(word, dirty_word)
    return result
.FFF.F..
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  's****ed'

======================================================================
FAIL: test_censored_within_brackets_or_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'mou**'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 5, Second has 4:  'me'

======================================================================
FAIL: test_dirty_in_other_words (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  'ass'
First has 2, Second has 0:  'side'
First has 1, Second has 0:  'men'
First has 1, Second has 0:  'not'

----------------------------------------------------------------------
Ran 8 tests in 0.146s

FAILED (failures=4)
Емилиан Спасов
  • Incorrect
  • 5 passing test(s)
  • 3 failing test(s)
import string


def is_word_allowed(word):
    if word[0] == "[" and word[-1] == "]":
        return True
    elif word[0] == "(" and word[-1] == ")":
        return True
    return False


def is_end_of_sentence(word):
    stop_tokens = ('...', '.', '?', '!', '!!!')
    for token in stop_tokens:
        if word.endswith(token):
            return True
    return False


def matches(word, dirty_word):
    if len(word) != len(dirty_word):
        return False
    for i in range(len(word)):
        if word[i] != "*" and word[i] != dirty_word[i]:
            return False
    return True


def is_dirty(word, dirty_words):
    for dirty_word in dirty_words:
        if matches(word, dirty_word):
            return True
    return False


def get_classification(word, is_last, is_allowed):
    if "*" in word:
        return "censored_dirty_words"
    elif is_last or is_allowed:
        return "allowed_dirty_words"
    return "dirty_words"


def get_cleaned_word(word):
    for character in string.punctuation:
        if character in ("*"):
            continue
        word = word.replace(character, '')
    return word


def find_dirty_words(text, dirty_words):
    result = {
        "dirty_words": [],
        "censored_dirty_words": [],
        "allowed_dirty_words": [],
    }
    for raw_word in text.split(" "):
        raw_word = raw_word.replace("\n", '')
        word = get_cleaned_word(raw_word)
        if is_dirty(word, dirty_words):
            classification = get_classification(word,
                                                is_end_of_sentence(raw_word),
                                                is_word_allowed(raw_word))
            result[classification].append(word)
    return result
F...F..F
======================================================================
FAIL: test_allowed_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  'teats'

======================================================================
FAIL: test_dirty_everything (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'ц*цата'

======================================================================
FAIL: test_dirty_with_curly_brackets (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'feast'

----------------------------------------------------------------------
Ran 8 tests in 0.114s

FAILED (failures=3)
Йоанна Кръстева
  • Incorrect
  • 5 passing test(s)
  • 3 failing test(s)
import re


def find_dirty_words(text, pattern_dirty_words):
    fucking_dictionary = {
        'dirty_words': [],
        'censored_dirty_words': [],
        'allowed_dirty_words': []
    }
    for word in pattern_dirty_words:
        if re.findall(r"[ ,-]{1}" + re.escape(word) + r"[ ,-]{1}", text):
            for _ in re.findall(r"[ ,-]{1}" + re.escape(word) + r"[ ,-]{1}", text):
                fucking_dictionary['dirty_words'].append(word)
        if re.findall(rf"{re.escape(word)}[.?!]", text):
            for _ in re.findall(rf"{re.escape(word)}[.?!]", text):
                fucking_dictionary['allowed_dirty_words'].append(word)
        if re.findall(r'(\(|\[)' + re.escape(word) + r'(\)|\])', text):
            for _ in re.findall(r'(\(|\[)' + re.escape(word) + r'(\)|\])', text):
                fucking_dictionary['allowed_dirty_words'].append(word)
        if re.findall(r'(\{)' + re.escape(word) + r'(\})', text):
            for _ in re.findall(r'(\{)' + re.escape(word) + r'(\})', text):
                fucking_dictionary['dirty_words'].append(word)
        if re.findall(r'(\(|\[|\{)[^ ,]+' + re.escape(word) + r'[^ ,]+(\)|\]|\})', text):
            for _ in re.findall(r'(\(|\[|\{)[^ ,]+' + re.escape(word) + r'[^ ,]+(\)|\]|\})', text):
                fucking_dictionary['dirty_words'].append(word)
        if re.findall(r'(\(|\[|\{)' + re.escape(word) + r'[^ ,]+(\)|\]|\})', text):
            for _ in re.findall(r'(\(|\[|\{)' + re.escape(word) + r'[^ ,]+(\)|\]|\})', text):
                fucking_dictionary['dirty_words'].append(word)
        if re.findall(r'(\(|\[|\{)[^ ,]+' + re.escape(word) + r'(\)|\]|\})', text):
            for _ in re.findall(r'(\(|\[|\{)[^ ,]+' + re.escape(word) + r'(\)|\]|\})', text):
                fucking_dictionary['dirty_words'].append(word)
        else:
            new_word = ''
            for i in range(len(word)):
                new_word += f"({word[i]}|\\*)"
            if re.findall(rf'{new_word}', text):
                for match in re.findall(rf'{new_word}', text):
                    if len(match) == len(word) and word not in fucking_dictionary['allowed_dirty_words'] \
                            and word not in fucking_dictionary['dirty_words']:
                        match = "".join(match)
                        fucking_dictionary['censored_dirty_words'].append(match)
    return fucking_dictionary
.F.F.F..
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  '****'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 3, Second has 4:  'me'

======================================================================
FAIL: test_dirty_in_other_words (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  'ass'
First has 2, Second has 0:  'side'
First has 1, Second has 0:  'not'

----------------------------------------------------------------------
Ran 8 tests in 0.281s

FAILED (failures=3)
Цветелина Чакърова
  • Incorrect
  • 4 passing test(s)
  • 4 failing test(s)
import re


def find_dirty_words(text, words):
    result = {'dirty_words': [], 'censored_dirty_words': [], 'allowed_dirty_words': []}
    for word in words:
        for _ in range(len(list(re.finditer(r'(?=(\s' + word + r'\s))', text)))):
            result['dirty_words'].append(word)
        for _ in range(len(list(re.finditer(r'(?=(\s{' + word + r'}\s))', text)))):
            result['dirty_words'].append(word)
        for _ in range(len(list(re.finditer(r'(?=([\s|{]' + word + r'[}]?[.|?|!]))', text)))):
            result['allowed_dirty_words'].append(word)
        for _ in range(len(list(re.finditer(r'(?=(\s[(|[]' + word + r'[)|\]][\s|.|?|!]))', text)))):
            result['allowed_dirty_words'].append(word)
        censored_word = ""
        for letter in word:
            censored_word += r'[{}|\*]'.format(letter)
        reg = r'\s[(|[|{]?' + censored_word + r'[)|\]|}]?[\s|.|?|!]'
        if re.search(reg, text) is not None:
            start, end = re.search(reg, text).span()
            if '*' in text[start+1:end-1]:
                print(re.search(reg, text))
                if '(' in text[start+1:end-1] or '[' in text[start+1:end-1] or '{' in text[start+1:end-1]:
                    result['censored_dirty_words'].append(text[start+2:end-2])
                else:
                    result['censored_dirty_words'].append(text[start+1:end-1])
    return result
.FFF...F
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  's****ed'

======================================================================
FAIL: test_censored_within_brackets_or_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'mou**'
First has 0, Second has 1:  '**arving'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 4:  'me'
First has 0, Second has 1:  'thirsty'
First has 0, Second has 1:  'cankered'

======================================================================
FAIL: test_dirty_with_curly_brackets (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'feast'

----------------------------------------------------------------------
Ran 8 tests in 0.227s

FAILED (failures=4)
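A side note on the censored-word patterns in the submission above: inside a regex character class, `|` is a literal character rather than alternation, so a class such as `[s|\*]` also accepts `|`. The sketch below shows that, and then a regex-free way to compare a censored word with a word from the collection. It assumes, going by the task's "т***к" examples, that a censored word matches when both words have the same length and every position is either identical or `*`. The helper name `matches_censored` is hypothetical and does not come from any submission.

```python
import re

# Inside [...] the pipe is an ordinary character, so this class
# accepts 's', '|' and '*'.
print(bool(re.fullmatch(r'[s|\*]', '|')))  # True


def matches_censored(censored, dirty_word):
    """Return True if `censored` could stand for `dirty_word`.

    Assumption: same length, at least one '*', and every position
    is either '*' or the same character as in `dirty_word`.
    """
    if '*' not in censored or len(censored) != len(dirty_word):
        return False
    return all(c == '*' or c == d for c, d in zip(censored, dirty_word))


print(matches_censored("т***к", "тоник"))  # True
print(matches_censored("sh*t", "spank"))   # False (different lengths)
```

Stripping surrounding brackets and sentence punctuation from the token before calling the helper is left out here; how that should be done depends on the tests, which are not shown on this page.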
Йордан Глигоров
  • Incorrect
  • 2 passing test(s)
  • 6 failing test(s)
import re


def find_dirty_words(text, input_dirty_words):
    result = {
        "dirty_words": [],
        "censored_dirty_words": [],
        "allowed_dirty_words": []
    }
    for word in input_dirty_words:
        result["dirty_words"].extend(find_dirty_word_in_text(text, word))
        result["allowed_dirty_words"].extend(find_allowed_dirty_word_in_text(text, word))
    text_words = text.split(' ')
    for word in text_words:
        if '*' in word:
            result["censored_dirty_words"].extend(find_censored_dirty_words_in_text(word, input_dirty_words))
    return result


def find_dirty_word_in_text(text, word):
    result = []
    temp = []
    # find the normal dirty words that aren't at the end of the string
    my_regex = r'(?<![\[\(])' + re.escape(word) + r'(?=\s)' + r'(?![\]\)\.\?!])'
    result.extend(re.findall(my_regex, text))
    # find the dirty words in () and []
    my_regex = r'\(([^)]+' + re.escape(word) + r'[^)]*)\)'
    temp.extend(re.findall(my_regex, text))
    my_regex = r'\[([^)]+' + re.escape(word) + r'[^)]*)\]'
    temp.extend(re.findall(my_regex, text))
    dirty_words_in_brackets = " ".join(temp)
    result.extend(re.findall(word, dirty_words_in_brackets))
    return result


def find_censored_dirty_words_in_text(censured_word, input_dirty_words):
    word = censured_word.replace(".", "")
    word = word.replace("*", ".")
    temp = " ".join(input_dirty_words)
    regex = r''+word+r''
    if re.match(regex, temp):
        return [censured_word]
    regex = r'.* '+word+r''
    if re.match(regex, "свършиш ближеш захар Блъскам DUCKER"):
        print("da be")
        return [censured_word.replace(".", "")]
    return []


def find_allowed_dirty_word_in_text(text, word):
    result = []
    my_regex = r'(?<=\[)' + re.escape(word) + r'(?=\])'
    result.extend(re.findall(my_regex, text))
    my_regex = r'(?<=\()' + re.escape(word) + r'(?=\))'
    result.extend(re.findall(my_regex, text))
    my_regex = re.escape(word) + r'(?=[\.\?!])'
    result.extend(re.findall(my_regex, text))
    return result
.FFFFF.F
======================================================================
FAIL: test_censored_only (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  's****ed'
First has 0, Second has 1:  '***dal'
First has 0, Second has 1:  'c*m*s'
First has 0, Second has 1:  'in**'

======================================================================
FAIL: test_censored_within_brackets_or_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 0:  'h*ir.'
First has 0, Second has 1:  'mou**'
First has 0, Second has 1:  'h*ir'
First has 0, Second has 1:  '**arving'
First has 0, Second has 1:  'p*owl'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 4:  'me'
First has 0, Second has 1:  'thirsty'
First has 0, Second has 1:  'cankered'

======================================================================
FAIL: test_dirty_everything (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'ц*цата'

======================================================================
FAIL: test_dirty_in_other_words (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 2, Second has 0:  'side'

======================================================================
FAIL: test_dirty_with_curly_brackets (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'splayed'
First has 0, Second has 1:  'feast'

----------------------------------------------------------------------
Ran 8 tests in 0.208s

FAILED (failures=6)
Никола Михайлов
  • Incorrect
  • 5 passing test(s)
  • 3 failing test(s)
import re


def find_dirty_words(text, predefined_dirty_words):
    dirty_dict = dict.fromkeys(['dirty_words', 'censored_dirty_words', 'allowed_dirty_words'], [])
    if not text or not predefined_dirty_words:
        return dirty_dict
    combine_dirty_words = '|'.join(predefined_dirty_words)
    # dirty words
    normal_dirty_words = re.findall(r"(?<!\S)({})(?!\S)".format(combine_dirty_words), text, re.I)
    curly_brackets = re.findall(r"(?:(?<=\s{{)|(?<=^{{))({})(?=}}\s)".format(combine_dirty_words), text, re.I)
    dirty_dict['dirty_words'] = normal_dirty_words + curly_brackets
    # censored
    censored_words = []
    for word in re.findall(r'(?<!\S)\w*\*+[\w|*]*', text):
        pattern_word = ''.join(['.' if char == '*' else char for char in word])
        for dirty_word in predefined_dirty_words:
            if len(word) == len(dirty_word) and re.match(r'{}'.format(pattern_word), dirty_word, re.I):
                censored_words.append(word)
                break
    dirty_dict['censored_dirty_words'] = censored_words
    # allowed
    dirty_dict['allowed_dirty_words'].extend(
        re.findall(r"(?:(?<=\s\[)|(?<=^\[))({})(?=\])".format(combine_dirty_words), text, re.I))  # square brackets
    dirty_dict['allowed_dirty_words'].extend(
        re.findall(r"(?:(?<=\s\()|(?<=^\[))({})(?=\))".format(combine_dirty_words), text, re.I))  # parentheses
    dirty_dict['allowed_dirty_words'].extend(
        re.findall(r"(?<!\S)({})(?=[.!?])".format(combine_dirty_words), text, re.I))  # end of sentence
    return dirty_dict
..FF...F
======================================================================
FAIL: test_censored_within_brackets_or_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  '**arving'
First has 0, Second has 1:  'p*owl'

======================================================================
FAIL: test_dirty_before_non_terminating_punctuation (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 1, Second has 4:  'me'
First has 0, Second has 1:  'thirsty'
First has 0, Second has 1:  'cankered'

======================================================================
FAIL: test_dirty_with_curly_brackets (test.TestDirtyWords)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/storage/deedee/data/rails/pyfmi-2022/releases/20221115154139/lib/language/python/runner.py", line 67, in thread
    raise result
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'feast'

----------------------------------------------------------------------
Ran 8 tests in 0.172s

FAILED (failures=3)
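One detail in the submission above is worth a short illustration: `dict.fromkeys(keys, [])` binds every key to the same list object. In that function two of the keys are later rebound to fresh lists, so the sharing is mostly hidden, but the early-return path hands back one list under all three keys. The sketch below only demonstrates the general behaviour and is not tied to any of the failing tests; the variable names are illustrative.

```python
keys = ['dirty_words', 'censored_dirty_words', 'allowed_dirty_words']

# All three keys reference one and the same list.
shared = dict.fromkeys(keys, [])
shared['allowed_dirty_words'].append('spank')
print(shared['dirty_words'])    # ['spank']

# A dict comprehension creates a separate list per key.
separate = {key: [] for key in keys}
separate['allowed_dirty_words'].append('spank')
print(separate['dirty_words'])  # []
```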