
[NLP] Trying out KeyBERT

pushpush 2023. 8. 16. 21:25

First, install the library.

pip install keybert
from keybert import KeyBERT

After importing it, prepare the text to extract keywords from.

doc = """
         As we age, our once youthful, healthy skin succumbs to an enzymatic imbalance that wears away the cellular network, 
         resulting in skin thinning and aging. Combining the best of nature and cosmetic biotechnology, Bio-Active products are 
         formulated with Enzymes that gently exfoliate the skin and stimulate regeneration for a youthful glow. Benefiting from 
         fertile orchards in the Italian countryside, Bio-active formulas are rich in phytohormones, flavonoids and fatty acids 
         from active extracts in Apple and Pear Seeds ,enzymatically modified and developed especially for the care of aging skin. 
         This repairing fluid helps to nourish and firm by accelerating penetration and delivery of active principles to the skin, 
         giving it a more youthful appearance. \n\nAdvanced "Probiotic" Complex from nourishing milk proteins regains the skin\'s 
         natural equilibrium, boosts its immunities and protects it against environmental and biological stress.\n\nPeptides and 
         Ceramides help to firm and regenerate the skin by stimulating collagen production and strengthening the epidermis.\n\nA 
         Calming Botanical Complex of Hyaluronic Acid and Wheat Germ Extract hydrates and restores the skin\'s protective barriers.
         \n\nA nutritive Vitamin Complex moisturizes and protects the skin from damaging environmental factors. \n\nParacress Extract, 
         a natural alternative to cosmetic injections, limits and relaxes micro-contractions that create facial lines, producing 
         immediate and long-term smoothing of the skin.\n\nTo Use: Apply a few pumps to a clean and dried face,
         neck and décolleté.

      """

The text is the description of a single item from the Amazon Beauty dataset.

import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# I use GPU 1, so I added the line above. Uncomment it and set it before the model is loaded.

kw_model = KeyBERT()
# Extract the top 20 one- or two-word keyphrases, diversified with Max Sum Distance
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 2), use_maxsum=True, top_n=20)
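As described in KeyBERT's documentation, the core idea is to embed the document and each candidate phrase with a sentence-transformers model and rank candidates by cosine similarity to the document. The sketch below shows only that ranking step, assuming hypothetical 3-dimensional vectors in place of real embeddings; `rank_by_similarity` is an illustrative helper, not part of the KeyBERT API.

```python
import numpy as np


def rank_by_similarity(doc_emb, cand_embs, candidates, top_n=5):
    """Hypothetical helper: rank candidate phrases by cosine similarity to the document."""
    # Normalize so that a dot product equals cosine similarity
    doc = doc_emb / np.linalg.norm(doc_emb)
    cands = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = cands @ doc
    # Indices of the top_n most similar candidates, best first
    order = np.argsort(sims)[::-1][:top_n]
    return [(candidates[i], round(float(sims[i]), 4)) for i in order]


# Toy vectors standing in for real sentence-transformer embeddings (assumption)
doc_emb = np.array([1.0, 1.0, 0.0])
candidates = ["aging skin", "apple seeds", "italian countryside"]
cand_embs = np.array([
    [1.0, 0.9, 0.0],
    [0.5, 0.1, 0.8],
    [0.0, 0.1, 1.0],
])

print(rank_by_similarity(doc_emb, cand_embs, candidates, top_n=2))
```

The result has the same shape as KeyBERT's output, a list of (phrase, score) tuples, which is why the scores can be read as "how close is this phrase to the whole document".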

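Here you need to set keyphrase_ngram_range.

In natural language processing, an n-gram is a sequence of n consecutive words, and n-grams are used to analyze text while taking word order into account. Setting keyphrase_ngram_range to (1, 2) makes KeyBERT extract keywords made of one word or two words.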
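To make the (1, 2) setting concrete, here is a minimal sketch of how one- and two-word candidates can be generated. KeyBERT itself builds candidates with scikit-learn's `CountVectorizer`; this pure-Python function only approximates that behavior (lowercasing, tokens of two or more word characters, stop words dropped before n-grams are formed). The function name and the tiny stop-word set are hypothetical.

```python
import re


def ngram_candidates(text, ngram_range=(1, 2), stop_words=()):
    """Hypothetical helper: sorted unique n-grams with lo <= n <= hi words."""
    lo, hi = ngram_range
    # Lowercase, keep tokens of 2+ word characters, then drop stop words
    tokens = [t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in stop_words]
    grams = set()
    for n in range(lo, hi + 1):
        # Slide a window of n tokens over the sequence
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return sorted(grams)


text = "Peptides help to firm the skin"
stops = {"to", "the"}
print(ngram_candidates(text, (1, 1), stop_words=stops))
# → ['firm', 'help', 'peptides', 'skin']
print(ngram_candidates(text, (1, 2), stop_words=stops))
# → ['firm', 'firm skin', 'help', 'help firm', 'peptides', 'peptides help', 'skin']
```

Note "help firm": because stop words are removed before the n-grams are built, a bigram can join words that were not adjacent in the original sentence.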

Extraction results!

extract_keywords returns a list of (keyphrase, score) tuples, and they show up like this!
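The extraction call passed use_maxsum=True, which turns on Max Sum Distance: take the nr_candidates phrases most similar to the document, then pick the top_n of them whose pairwise similarities sum to the smallest value, so near-duplicate phrases are avoided. The sketch below follows that idea with hypothetical toy vectors; it is not KeyBERT's actual code, and `max_sum_distance` is an illustrative name.

```python
import itertools

import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarities between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def max_sum_distance(doc_emb, cand_embs, candidates, top_n, nr_candidates):
    """Hypothetical sketch of Max Sum Distance selection."""
    if nr_candidates < top_n:
        raise ValueError("nr_candidates must be >= top_n")
    doc_sims = cosine_matrix(doc_emb.reshape(1, -1), cand_embs)[0]
    # Keep the nr_candidates phrases most similar to the document
    pool = list(np.argsort(doc_sims)[-nr_candidates:])
    pair_sims = cosine_matrix(cand_embs[pool], cand_embs[pool])
    best, best_score = None, float("inf")
    for combo in itertools.combinations(range(len(pool)), top_n):
        # Sum of similarities between every pair inside this combination
        score = sum(pair_sims[i][j] for i, j in itertools.combinations(combo, 2))
        if score < best_score:
            best, best_score = combo, score
    return [candidates[pool[i]] for i in best]


# Toy vectors (assumption): "aging skin" and "skin aging" are near-duplicates
doc_emb = np.array([1.0, 1.0, 0.0])
candidates = ["aging skin", "skin aging", "apple seeds"]
cand_embs = np.array([
    [1.0, 0.9, 0.0],
    [0.95, 1.0, 0.0],
    [0.6, 0.4, 0.5],
])

print(max_sum_distance(doc_emb, cand_embs, candidates, top_n=2, nr_candidates=3))
```

Only one of the two near-duplicates survives. This also shows a caveat for the call in this post: KeyBERT's nr_candidates defaults to 20, so with top_n=20 there is exactly one possible combination and Max Sum simply returns the 20 most similar candidates. Passing a larger nr_candidates gives it room to diversify.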