String Replacement Using Regular Expressions | 合同会社シノノメテクニカ

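When you use re to replace strings only a handful of times, few enough to count by eye, there is little need for special care. As the number of replacements grows, however, processing time starts to matter.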

Development environment:

Python 3.9.2

References:

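Let's use re.compile to convert a regular expression pattern into a regular expression object. In an actual comparison, skipping compilation may not be noticeable over roughly 100 iterations. Over 50,000 iterations, the difference in processing time came out as follows.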

Note: The measurement method is crude, so I plan to revise it when I get the chance.

>>> import re
>>> import time
>>> hippolyta = 'Four days will quickly steep themselves in Light. '\
... 'Four Knights will quickly dream away the time. '\
... 'And then the moon, like to a silver bow '\
... 'New-bent in heaven, shall behold the night '\
... 'Of our solemnities.'
>>> started = time.time()
>>> for i in range(0,50000):
...  result = re.sub(r'(L|Kn)ight',\
... r'night', hippolyta)
... 
>>> finished = time.time()
>>> duration = finished - started
>>> duration
1.0431718826293945
>>> 
>>> pattern = re.compile(r'(L|Kn)ight')
>>> started = time.time()
>>> for i in range(0,50000):
...  result = pattern.sub('night', hippolyta)
... 
>>> finished = time.time()
>>> duration = finished - started
>>> duration
0.3786191940307617
>>> result
'Four days will quickly steep themselves in night. Four nights will quickly dream away the time. And then the moon, like to a silver bow New-bent in heaven, shall behold the night Of our solemnities.'
>>>
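Since the timing above is described as crude, here is a minimal sketch of how the same comparison might be run with the standard library's timeit module instead of time.time(). The names TEXT, sub_uncompiled, sub_compiled, and measure are illustrative and do not come from the original session, and the text is shortened. Actual timings depend on the machine and Python version, so no expected figures are shown.

```python
import re
import timeit

TEXT = (
    'Four days will quickly steep themselves in Light. '
    'Four Knights will quickly dream away the time.'
)
RAW_PATTERN = r'(L|Kn)ight'
COMPILED = re.compile(RAW_PATTERN)


def sub_uncompiled():
    # Module-level function: re looks the pattern up in its internal cache on every call.
    return re.sub(RAW_PATTERN, 'night', TEXT)


def sub_compiled():
    # Method on a precompiled pattern object: no per-call pattern lookup.
    return COMPILED.sub('night', TEXT)


def measure(func, number=50_000, repeat=3):
    # timeit.repeat returns one total time per run; the minimum is the least noisy.
    return min(timeit.repeat(func, number=number, repeat=repeat))


if __name__ == '__main__':
    print(f're.sub:      {measure(sub_uncompiled):.4f} s')
    print(f'pattern.sub: {measure(sub_compiled):.4f} s')
```

Because the re module caches recently used patterns, re.sub does not recompile the pattern on each call. Whatever gap remains comes mostly from the per-call cache lookup and argument handling.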

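The previous post did not include a sample that writes to multiple files, so here is one that also incorporates the regular expression from this post. Each match of the regular expression object is replaced with the specified string, and the result is written back to the file.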

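import fileinput as fi  # fi is assumed to refer to the standard fileinput module
import re

def main():
    # Compile the regular expression pattern to search for
    pattern = re.compile(r'Y(AML|aml)')
    # Replacement string
    rep = 'foobarbaz'
    # Run the replacement. initialize() is not defined in this post;
    # it is assumed to return the paths of the target files.
    replace_words(initialize(), pattern, rep)

# Search the target files for the pattern and replace each match
def replace_words(targets, pattern, rep):
    # With inplace=True, print() output is written back to the file being read
    with fi.input(targets, inplace=True) as f:
        for line in f:
            print(pattern.sub(rep, line), end='')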
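
To try the in-place replacement without touching real files, the sketch below runs the same idea against two temporary files. The helper names replace_in_files and demo, and the file names a.txt and b.txt, are hypothetical and exist only for this demonstration. The pattern and replacement string are the ones from the sample above.

```python
import fileinput
import re
import tempfile
from pathlib import Path


def replace_in_files(paths, pattern, rep):
    # With inplace=True, fileinput redirects standard output into the file
    # being read, so each print() call writes the replaced line back to it.
    with fileinput.input(files=paths, inplace=True) as f:
        for line in f:
            print(pattern.sub(rep, line), end='')


def demo():
    pattern = re.compile(r'Y(AML|aml)')
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample files, created only for this demonstration.
        first = Path(tmp) / 'a.txt'
        second = Path(tmp) / 'b.txt'
        first.write_text('YAML is a format.\n')
        second.write_text('Yaml and yaml differ.\n')
        replace_in_files([str(first), str(second)], pattern, 'foobarbaz')
        # Read the files back before the temporary directory is removed.
        return first.read_text(), second.read_text()


if __name__ == '__main__':
    for content in demo():
        print(content, end='')
```

Note that the pattern is case-sensitive on the leading Y, so a lowercase yaml is left unchanged.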