Topic: establishing a thematically-framed corpus: HOW?

Jess
« on: May 01, 2018, 05:01:54 AM »
hi fellow linguists,

Say I want to build a corpus of ca. 200 headlines about global warming, taken from various popular science magazines (New Scientist, MIT Technology Review, etc.). Which collection method do you recommend, besides the tedious option of collecting them "by hand"? Would you recommend using a web crawler?

any help is welcome!
cheers, jess

Daniel
Re: establishing a thematically-framed corpus: HOW?
« Reply #1 on: May 01, 2018, 09:24:06 AM »
200? Just collect them by hand. You won't save any time writing code to do it for you.
200,000? Then you would save time.

Headlines are also much easier to collect than full text articles, and probably easier to deal with in terms of copyright/access issues.
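
For what it's worth, if the number did grow into the thousands, the automated route might look roughly like the sketch below. It assumes each magazine publishes an RSS 2.0 feed (not checked here), and the function name, the keyword list, and the sample feed are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword list; adjust it to however you define the theme.
KEYWORDS = ("global warming", "climate change")


def filter_headlines(rss_xml, keywords=KEYWORDS):
    """Return the <title> text of every RSS <item> that mentions a keyword."""
    root = ET.fromstring(rss_xml)
    headlines = []
    # Only <item> elements are articles; the channel's own <title> is skipped.
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        if any(k in title.lower() for k in keywords):
            headlines.append(title)
    return headlines


# Made-up sample feed standing in for one downloaded from a magazine.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Magazine</title>
<item><title>Global warming is speeding up ice loss</title></item>
<item><title>A new telescope sees first light</title></item>
<item><title>How climate change reshapes coastlines</title></item>
</channel></rss>"""

if __name__ == "__main__":
    for headline in filter_headlines(SAMPLE_FEED):
        print(headline)
```

With a real feed you would download the XML first (for example with `urllib.request.urlopen(url).read()`) and pass the result to the function. Feeds typically list only recent items, so going further back would probably mean crawling archive pages instead, which is where the copyright/access questions start to matter.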

Regardless, a "corpus of 200 headlines" is a very limited amount of data, at least compared to how the term "corpus" is usually used in research. So I'm not sure I fully understand the question.