[CodeKata] 프로그래머스: 11.21(일), 매칭 점수

Algorithm 2021. 11. 21. 17:40

🥋 Oooth More!! (Level 3)

🧮 풀이

정규식을 활용한 풀이로 접근했으나, 모두 오답이 나왔다. 아마 컨텐츠나 태그들을 판정하는 방법을 직관적으로 사용해서 에러가 난 것 같다.

모범답안을 연구하며 틀렸을만한 포인트를 짚어봐야겠다.

const getContent = (html, tagName) => {
  const tnl = tagName.length + 2;
  const start = `<${tagName}>`
  const end = `\n</${tagName}>`
  const spliter = tagName === 'head' ? '\n  ' : '\n'
  return html.slice(html.indexOf(start)+tnl, html.indexOf(end)).split(spliter).slice(1)
}

const getLinkList = (body) => {
  let list = [];
  body.forEach(code => {
    const idx = code.indexOf('<a href="');
    if (idx !== -1) {
      const start = idx + 9;
      const end = code.indexOf('">');
      list.push(code.slice(start, end));
    }
  })
  return list;
}

function solution(word, pages) {
  let pageObj = {};
  
  pages.forEach(p => {
    const head = getContent(p, 'head');
    const url = head.find(code => code.includes('og:url')).slice(33, -3)
    const body = getContent(p, 'body');
    const regex = new RegExp(word, 'gi')
    // const matchList = body.join('').match(regex);
    // const basicPoint = matchList ? matchList.length : 0;
    const basicPoint = body.join('').match(regex).length
    let linkList = getLinkList(body);
    const linkDonator = basicPoint / linkList.length;
    pageObj[url] = { basicPoint, linkList, linkDonator, linkPoint: 0 }
  })
  
  Object.keys(pageObj).forEach(key => {
    pageObj[key].linkList.forEach(link => {
      if (pageObj[link]) {
        pageObj[link].linkPoint += pageObj[key].linkDonator
      }
    })
  })
  
  return Object.values(pageObj).map((p,i) => [p.basicPoint + p.linkPoint, i].sort((a,b) => a[0] === b[0] ? a[1] - b[1] : b[0] - a[0])).shift()[1];
}

🖇 리뷰

function solution (word, pages) {
  word = word.toLowerCase();
  const REGEX_WORD = /[\d|\W]/;
  const REGEX_URL = /<a href="https:\S*"/gi;
  const META_URL = 'meta property';
  const pageInfo = new Map();
  
  pages.forEach((page, idx) => {
    const pageArr = page.split('\n');
    const urlIdx = pageArr.findIndex(el => el.includes(META_URL));
    const pageURL = pageArr[urlIdx].match(/"https:\S*"/gi)[0];
    const bodyStart = pageArr.findIndex(el => el.includes("<body>"));
    const bodyEnd = pageArr.findIndex(el => el.includes("</body>"));
    const body = pageArr.slice(bodyStart+1, bodyEnd);
    const point = body.flatMap(str => str.toLowerCase().split(REGEX_WORD)).filter(e => e === word).length;
    const outLinks = body.flatMap(str => str.match(REGEX_URL)).filter(e => e).map(e => e.substr(8, e.length));
    
    pageInfo.set(pageURL, { point, outLinks, idx, matchPoint : 0 });
  });
  
  for(const [key, value] of pageInfo) {
    const linkPoint = value.point / value.outLinks.length;
    
    for(const link of value.outLinks) {
      if(pageInfo.has(link)) {
        const origin = pageInfo.get(link);
        const calculatedPoint = origin.matchPoint 
        	? origin.matchPoint + linkPoint
        	: origin.point + linkPoint;
        pageInfo.set(link, { ...origin, matchPoint: calculatedPoint });
      }
    }
  }
  
  const answer = [];
  
  for(const [key, value] of pageInfo) {
    const { point, idx, matchPoint } = value;
    const finalPoint = matchPoint ? matchPoint : point;
    answer.push([ idx, finalPoint ]);
  };
  
  answer.sort((a, b) => a[1] === b[1] ? a[0] - b[0] : b[1] - a[1]);
  
  return answer[0][0];
}

* 출처 : https://velog.io/@longroadhome/%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%98%EB%A8%B8%EC%8A%A4-LV.3-%EB%A7%A4%EC%B9%AD%EC%A0%90%EC%88%98-JS

1. 정규식 준비

HTML 파싱을 위해 일부 정규식을 준비하면 문제를 용이하게 풀 수 있다.
우선, word(단어) 검색시 대소문자를 구분하지 않는다고 했다. 이를 소문자로 통일하기 위해 toLowerCase() 를 적용한다.
REGEX_WORD 는 숫자 & 특수문자를 찾아주는 정규식이다. 이를, 문자열의 split() 에 활용하면 문자만 남은 배열로 바꿔준다.
REGEX_URL 은 외부링크를 찾는 정규식이다. 링크의 경우 항상 <a href="https:URL..." /> 형태를 가지므로 아래 정규식을 쓴다.

// 외부링크를 찾는 정규식
// \S는 공백문자가 아닌 하나의 문자에 대응 + *는 연속 반복에 대응
// url은 특수기호 및 숫자, 영대소문자 등 다양한 문자가 가능하므로
// 공백문자가 아닌 문자로 접근하면 올바르게 접근이 가능하다.
const REGEX_URL = /<a href="https:\S*"/gi;

META_URL 은 메타태그의 내 url 파싱을 위한 정규식이다. 정규식의 목적보다는, property를 가진 <meta>태그를 찾기 위함이다.

2. HTML 파싱

파싱은 다음 4단계로 이루어진다. ( 페이지 코드라인 구분 ➡️ 본인 url 파싱 ➡️ <body> 내용 파싱 ➡️ 파싱된 내용으로 점수계산 )
pageInfo는 pages 배열의 각 페이지 정보들을 객체형태로 담기 위한 Map() 자료형이다. (Object 객체를 써도 무방)
pages 배열을 순회한다. pageArr는 각 page를 ('\n')로 split() 한 것으로, HTML 코드 각 1줄씩을 배열로 반환한다.
urlIdx는 pageArr에서 자기url을 가지고 있는 코드를 검색한다. pageURL이 url 주소이자, pageInfo에서 key값이 된다.
body는 pageArr에서 <body>, </body> 사이를 slice()한 배열이다.
point는 기본점수로, body를 flatMap()으로 이어준 다음, toLowerCase() 소문자 처리 / REGEX_WORD 숫자 및 특수문자 제거한 뒤 word와 같은 값들의 숫자를 세준다.
outLinks는 body에서 <a>태그를 찾아 링크값만 걸러낸 배열이다. REGEX_URL 매칭 / undefined filter() 한 배열을 태그를 잘라낸 값으로 map() 매핑한다.
pageInfo의 각 값은 { point: 기본점수, outLinks: 링크목록, idx: 인덱스, matchPoint: 매칭점수 } 로 설정해준다.

3. 점수계산

pageInfo를 순회하며, 점수(matchPoint)를 계산한다. 기본점수는 point로, 링크점수(linkPoint)는 (기본점수 / outLinks 수) 를 현재링크와 연결된 사이트들에 더해줘야한다.
현재 value.outLinks를 순회하며 pageInfo에 해당 link가 존재하는 경우, matchPoint에 현재 페이지의 linkPoint 값을 더해준다.

4. 정답 반환

정답으로 요구한 조건 (가장 큰 matchPoint, 상대적으로 작은 인덱스) 를 반환하기 위해 배열형태로 정리한다.
pageInfo를 순회하며 점수를 다시 정리한다. matchPoint가 0점이라면 기본점수(point)를, 존재한다면 matchPoint를 최종점수로 반영한다.
이를, answer 배열에 [인덱스, 점수] 형태로 담고, matchPoint(1번째) 내림차순, idx(0번째) 오름차순으로 sort() 한다.
이것의 맨 앞의 값에서, 0번째 값이 인덱스이므로 이를 정답으로 최종 반환하면 된다.

'Algorithm' 카테고리의 다른 글

[CodeKata] 프로그래머스: 12.12(일), 카드 짝 맞추기 (0)	2021.12.13
[CodeKata] 프로그래머스: 11.28(일), 단속 카메라 (0)	2021.11.28
[CodeKata] 프로그래머스: 11.14(일), 외벽 점검 (0)	2021.11.14
[CodeKata] 프로그래머스: 11.6(토), 모두 0으로 만들기 (0)	2021.11.06
[CodeKata] 프로그래머스: 10.31(일), 징검다리 건너기 (0)	2021.10.31

ABOUT ME

ttaeng_99 's 개발의 흔적들 ttaeng_99 's 개발의 흔적들

🥋 Oooth More!! (Level 3)

🧮 풀이

🖇 리뷰

3. 점수계산

4. 정답 반환

'Algorithm' 카테고리의 다른 글

티스토리툴바

ABOUT ME

🥋 Oooth More!! (Level 3)

🧮 풀이

🖇 리뷰

3. 점수계산

4. 정답 반환

'Algorithm' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바