Conversation

github-classroom bot commented Sep 30, 2024

👋! GitHub Classroom created this pull request as a place for your teacher to leave feedback on your work. It will update automatically. Don’t close or merge this pull request, unless you’re instructed to do so by your teacher.
In this pull request, your teacher can leave comments and feedback on your code. Click the Subscribe button to be notified if that happens.
Click the Files changed or Commits tab to see all of the changes pushed to the default branch since the assignment started. Your teacher can see this too.

Notes for teachers

Use this PR to leave feedback. Here are some tips:

  • Click the Files changed tab to see all of the changes pushed to the default branch since the assignment started. To leave comments on specific lines of code, put your cursor over a line of code and click the blue + (plus sign). To learn more about comments, read “Commenting on a pull request”.
  • Click the Commits tab to see the commits pushed to the default branch. Click a commit to see specific changes.
  • If you turned on autograding, then click the Checks tab to see the results.
  • This page is an overview. It shows commits, line comments, and general comments. You can leave a general comment below.
    For more information about this pull request, read “Leaving assignment feedback in GitHub”.

Subscribed: @cukminseo @Sujinkim-625 @yeseoLee @hsmin9809 @koreannn @gayeon7877

github-classroom bot and others added 30 commits September 30, 2024 04:29
as-is: lr and epoch run with their default values.
to-be: add code so that lr, epoch, etc. can be customized

#5
[FEAT] Simplify baseline code #4
Change transformers.set_seed -> utils_qa.set_seed
Unify the seed by making it a constant and relying on the default argument value of the set_seed function

[FEAT] Simplify baseline code #4
evaluation_strategy -> eval_strategy

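[FEAT] Simplify baseline code #4
- Handle the seed as a training argument instead of managing it as a constant
- (The seed will be managed through the arguments of the run script)
- Add a deterministic argument to the set_seed function
[FEAT] Simplify baseline code #4
- Apply a different return_token_type_ids per model
#4
Feat/Simplify baseline code - wrap-up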
nevertmr and others added 26 commits October 22, 2024 19:18
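- Add an option to integration_pipeline that switches train->test automatically by changing the retriever_file name
- Fix typos
Exp/rerank: fix rerank-related errors and add transfer-learning arguments
Exp/analysis retrieval: analyze Retrieval and Reranker performance
[feat] Implement GPT-based ensemble code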

It would be good to add .vscode as well.

How about starting from the Python gitignore that GitHub provides as a base?

https://github.com/github/gitignore/blob/main/Python.gitignore

--model_name_or_path klue/bert-base \
--config_name None \
--tokenizer_name None \
\

The lone \ in the middle looks unnecessary.

@@ -0,0 +1,70 @@
# How to run

Dependency installation is missing. Please also record the Python version, CUDA version, and so on.
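
One way to capture those versions is a small script whose output can be pasted into the README. This is only a sketch and not part of the PR: the function name `describe_environment` is hypothetical, and the CUDA lookup assumes PyTorch is installed (it appears in the project's requirements). If PyTorch is missing, the script reports that instead of failing.

```python
import platform


def describe_environment():
    """Collect the interpreter version and, if available, PyTorch/CUDA versions."""
    info = {"python": platform.python_version()}
    try:
        import torch  # optional: only needed for the CUDA lookup
    except ImportError:
        info["torch"] = "not installed"
        info["cuda"] = "unknown"
    else:
        info["torch"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds
        info["cuda"] = torch.version.cuda or "cpu-only build"
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version}")
```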

## Run with the integrated pipeline
```bash
# train-eval-inference integrated pipeline
python integration_pipline.py ./config/integration.json
```

Suggested change
- python integration_pipline.py ./config/integration.json
+ python integration_pipeline.py ./config/integration.json

@@ -0,0 +1,37 @@
{
"model_name_or_path": "klue/roberta-small",

These values look somewhat different from the ones recorded in the actual report.

Comment on lines +151 to +152
if manual_mode:
    return

  • It would be good to emit a lightweight log message saying that the code is running in manual_mode.
  • The code uses print throughout; using a logger is recommended.
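
Both points can be sketched together with the standard `logging` module. The function `run_step` below is a hypothetical stand-in for the function that contains the early return, and it returns a boolean only so the two paths are easy to tell apart; the code under review returns nothing.

```python
import logging

# Module-level logger; handlers and levels are configured once by the entry point.
logger = logging.getLogger(__name__)


def run_step(manual_mode: bool) -> bool:
    """Hypothetical stand-in for the function with the manual_mode early return."""
    if manual_mode:
        # Leave a trace so the early return is visible in the run log.
        logger.info("manual_mode is enabled; skipping the automatic step.")
        return False
    logger.info("Running the automatic step.")
    return True


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    run_step(manual_mode=True)
```

Unlike print, the logger lets the run script choose the verbosity and the output destination without touching the library code.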

# Elasticsearch server setup
def connect(self):
    es = Elasticsearch(
        "http://localhost:9200", timeout=30, max_retries=10, retry_on_timeout=True

"http://localhost:9200" -> please store this value in something like self.host when the object is first initialized.
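
A minimal sketch of that change follows. The class name `ElasticRetriever` and the constructor parameters are hypothetical; only `connect` and the keyword arguments come from the snippet under review, and those keyword names may differ across Elasticsearch client versions.

```python
class ElasticRetriever:  # hypothetical class name, not taken from the PR
    DEFAULT_HOST = "http://localhost:9200"

    def __init__(self, host=DEFAULT_HOST, timeout=30, max_retries=10):
        # Connection settings live on the instance instead of being
        # hard-coded inside connect().
        self.host = host
        self.timeout = timeout
        self.max_retries = max_retries

    def connect(self):
        # Imported lazily so the class can be constructed (and tested)
        # without the client library installed.
        from elasticsearch import Elasticsearch

        # Keyword arguments mirror the snippet under review.
        return Elasticsearch(
            self.host,
            timeout=self.timeout,
            max_retries=self.max_retries,
            retry_on_timeout=True,
        )
```

With this shape, pointing the retriever at another server is a constructor argument, for example `ElasticRetriever(host="http://es.internal:9200")`, where the URL is a made-up example.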

print("Index already exists.")
return False

with open(self.setting_path, "r") as f:

Suggested change
- with open(self.setting_path, "r") as f:
+ with open(self.setting_path, "r", encoding="utf-8") as f:
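
The reason for the suggestion: without an explicit `encoding`, `open` falls back to the platform's locale encoding, which is often not UTF-8 on Windows, so a settings file containing Korean text may fail to decode. A small round-trip sketch, assuming the settings file is JSON (the helper name `load_settings` and the sample content are made up for illustration):

```python
import json
import os
import tempfile


def load_settings(path):
    """Read a JSON settings file, always decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Write a small settings file with non-ASCII content, then read it back.
settings = {"description": "한국어 위키 문서"}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "setting.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f, ensure_ascii=False)
    loaded = load_settings(path)
```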

Comment on lines +49 to +50
ngram_range=(1, 2),
max_features=50000,

Please pull these out as arguments of __init__.
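
A sketch of what that could look like. The class name `SparseRetriever` is hypothetical, and the use of scikit-learn's `TfidfVectorizer` is an assumption based on the parameter names; the surrounding code is not visible in this hunk.

```python
class SparseRetriever:  # hypothetical class name, not taken from the PR
    def __init__(self, ngram_range=(1, 2), max_features=50000):
        # The former hard-coded values become defaults that callers can override.
        self.ngram_range = ngram_range
        self.max_features = max_features

    def build_vectorizer(self):
        # Assumption: the values feed scikit-learn's TfidfVectorizer.
        # Imported lazily so the class can be constructed without scikit-learn.
        from sklearn.feature_extraction.text import TfidfVectorizer

        return TfidfVectorizer(
            ngram_range=self.ngram_range,
            max_features=self.max_features,
        )
```

Experiments can then vary the settings from the config or run script, for example `SparseRetriever(ngram_range=(1, 3), max_features=100000)`, without editing the class.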

Comment on lines +1 to +18
# Data manipulation and analysis
pandas
numpy==1.24.1

# Machine Learning libraries
scikit-learn
torch==1.13
datasets==2.15.0
transformers==4.25.1
faiss-gpu

# Progress and logging utilities
tqdm
wandb

# Information retrieval libraries
rank_bm25
elasticsearch

Please make sure to pin the version of every library.
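
To see which entries still need a version, the file can be scanned for lines without `==`. This is a rough sketch: the helper name `find_unpinned` is hypothetical, and the check treats anything other than an exact `==` pin as unpinned.

```python
import re

# The requirements under review, copied from lines +1 to +18.
REQUIREMENTS = """\
# Data manipulation and analysis
pandas
numpy==1.24.1

# Machine Learning libraries
scikit-learn
torch==1.13
datasets==2.15.0
transformers==4.25.1
faiss-gpu

# Progress and logging utilities
tqdm
wandb

# Information retrieval libraries
rank_bm25
elasticsearch
"""


def find_unpinned(requirements_text):
    """Return the package names that have no exact '==' version pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if "==" not in line:
            # Keep only the package name, dropping any other specifier.
            name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
            unpinned.append(name)
    return unpinned


print(find_unpinned(REQUIREMENTS))
# ['pandas', 'scikit-learn', 'faiss-gpu', 'tqdm', 'wandb', 'rank_bm25', 'elasticsearch']
```

The versions to pin should come from the environment that produced the reported results, for example from the output of `pip freeze` in that environment.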

7 participants