
60 commits

Author SHA1 Message Date
bact
4c9cde0749 Add Thai stopwords (#669)
* Add Thai stopwords from stopwordsiso

* add "th" to language_dict

* add unit test and test data files for Thai language

* - add pythainlp to requirements.txt
- sort requirements.txt

* Update and sort supported language list

* sort the language list

* update language list in docs/index.rst
2019-03-16 20:53:04 -04:00
ekaterinasmarp
11cbf3a303 Ignoring http pages depending on their content-type (#658)
* Ignoring http pages depending on their content-type, PDFs are ignored by default

* Code review fixes

* Code review fixes

* Code review fixes

* Code review fixes
2018-12-27 07:06:06 -08:00
Evaldas Kazlauskis
7f388b37a7 Adding lithuanian language support (#639) 2018-10-27 15:09:09 -07:00
sfi-dannybrady
2dea0097c0 Added Japanese language support. (#584)
* Added Japanese language support.

* Investigate tinysegmenter requirement
2018-08-27 00:38:04 -07:00
viymak
c09da44c9e Add Belarusian support (#607)
* added belarusian stopwords

* added Belarusian to README.rst, index.rst, quickstart.rst, utils.py
2018-08-26 11:39:02 -07:00
froessler
ed09bb7e49 added estonian stopwords (#523) 2018-02-28 01:57:14 -08:00
Teodor Ivanov
c0eed1a571 Add support for Croatian, Romanian, Slovenian and Serbian languages (#517) 2018-02-15 22:12:18 -08:00
Teodor Ivanov
52bc92a07a Add Bulgarian stopwords and update documentation (#510) 2018-01-25 20:17:56 -08:00
codelucas
1f0aeffeff Remove fork image, making site https 2018-01-24 01:23:19 -08:00
codelucas
279ffa7587 More changes to readthedocs 2018-01-24 01:00:01 -08:00
codelucas
651e670b9d Update readthedocs guides 2018-01-24 00:48:24 -08:00
Andrew Aslinger
aab09d8808 Adds Persian and Swahili Language Support (#495)
* Adds Swahili language Support

* populate Swahili stop words

* Adds Persian Language Support

* fixes unit test due to missing language code

* bump build due to travis
2018-01-04 18:40:14 -08:00
Oleg Deribas
9ee97c51e3 Add Ukrainian language support 2017-09-02 23:24:58 +02:00
Lucas Ou-Yang
563840a638 Merge pull request #397 from tramwaj29/master
Added Polish language support
2017-07-11 20:12:12 -07:00
JJ
377863df43 Extended user guide for adding new language support 2017-07-11 03:28:48 +03:00
JJ
51111d0557 Added Polish language support 2017-07-11 03:02:13 +03:00
Logan Head
308469838f added meta refresh redirect support 2016-06-10 15:03:15 -04:00
Lucas Ou-Yang
89848c976b Merge pull request #194 from yprez/num-sentences
Adding setting for num of sentences in summary
2016-02-20 11:24:40 -08:00
Kenneth Reitz
f160ffeb96 Cleanup Scripts
These docs were messing up my analytics! :)

----------------

Cool project, btw!
2016-02-13 20:06:29 -05:00
Yuri Prezument
4e3c66f23b Docs for MAX_SUMMARY_SENT setting 2016-02-01 12:03:24 +02:00
Lucas Ou-Yang
0ef32a66be Merge pull request #188 from alon7/hebrew-support
Added Hebrew stop words for language support
2016-01-30 01:03:58 -08:00
Yuri Prezument
10e1a0c854 Fail on error http responses
Fixes #142
2016-01-27 17:54:31 +02:00
alon7
8a36b90201 Added Hebrew stop words for language support 2016-01-27 01:47:35 +02:00
Yuri Prezument
ed0cd3e0ce Installation docs - change according to README
* Note about Python2 and 3.
* Change package name in pip install example to newspaper3k.
2016-01-21 11:02:59 +02:00
Lucas Ou-Yang
3c72358e2d Bump version to 0.1.2 2015-01-01 00:08:31 -08:00
Lucas Ou-Yang
e53134f2ce Deprecate parser_class config option
After integrating UnicodeDammit there is no need to let users customize parsers to BeautifulSoup over lxml, lxml is faster and UnicodeDammit gives us the encoding recognition win from BeautifulSoup
2014-12-31 22:08:53 -08:00
Paul English
1579c10002 Update documentation examples 2014-11-11 20:11:15 -07:00
Paul English
4ad825c364 Migrate to python3 2014-11-11 15:56:19 -07:00
Lucas Ou-Yang
b9e4c877e2 Modify install.rst on web site docs 2014-10-12 17:05:42 -07:00
Lucas Ou-Yang
00a481af1b Reflect new installation directions in the site user docs 2014-10-12 16:45:59 -07:00
Lucas Ou-Yang
6a0a365694 fixed formatting bug in docs, this is terrible 2014-06-17 03:34:54 -07:00
Lucas Ou-Yang
38fdfc0d48 fixed docs, added link to examples on how to add non-latin languages 2014-06-17 03:32:02 -07:00
Lucas Ou-Yang
d5d532cbcd fix spacing in docs 2014-06-17 03:27:17 -07:00
Lucas Ou-Yang
f961091679 fix docs 2014-06-17 03:18:48 -07:00
Lucas Ou-Yang
d0081bee68 update and finalize docs 2014-06-17 02:58:20 -07:00
Lucas Ou-Yang
55efa5f4e1 added stopwords and support for indonesian and vietnamese languages 2014-06-15 02:32:00 -07:00
Jacopo Notarstefano
e92c5319f9 Fix small typo in documentation about newspaper.languages() 2014-06-01 11:06:38 +02:00
Lucas Ou-Yang
c7fce705c6 add new languages to readme 2014-02-02 14:10:34 -08:00
Lucas Ou-Yang
b6455ae64b completely revamped docs, added 6 new languages 2014-02-02 14:04:23 -08:00
Lucas Ou-Yang
f9aafc4a0c updated contributors, added goose licensing info 2014-01-31 12:01:18 -08:00
WheresWardy
d9bead061e Replace instances of 'Portugease' with 'Portuguese' 2014-01-25 15:58:48 +00:00
Lucas Ou-Yang
997857f744 update readme and docs 2014-01-18 10:52:50 -08:00
Lucas Ou-Yang
bcb834ba5e slight update on docs and readme 2014-01-18 10:48:41 -08:00
Lucas Ou-Yang
7eb808fb3d trim chinese example in readme and docs 2014-01-09 21:02:21 -08:00
Lucas Ou-Yang
ef49c755e6 fixed docs format err 2014-01-09 03:23:37 -08:00
Lucas Ou-Yang
11b295d8e8 adding contributors, history, updating setup.py, updating docs 2014-01-09 03:02:31 -08:00
Lucas Ou-Yang
23c2d912c6 fix indenting 2014-01-06 02:31:09 -08:00
Lucas Ou-Yang
a66e5c8601 fix indenting 2014-01-06 02:28:42 -08:00
Lucas Ou-Yang
86d69ff4a9 adding installation instructions for ubuntu users 2014-01-06 02:21:39 -08:00
Lucas Ou-Yang
7fa675dd73 tidying up docs 2013-12-31 12:56:42 -08:00