Newspaper: working with articles

NicolasWeb · #1 2016-06-13 09:28:18

To revive this section of the forum a little, here is a small Python library I'm currently testing: https://github.com/codelucas/newspaper

The goal is to extract information from web pages and "strip out" whatever isn't relevant.
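To make the idea concrete, here is a toy sketch using only the standard library's `html.parser`. It keeps the text of `<p>` elements and drops anything nested inside `script`, `style`, `nav`, `header`, `footer` or `aside`. The tag list and the names `ParagraphExtractor` and `extract_text` are my own assumptions for illustration; this is not how newspaper works internally, and its extractor is considerably more elaborate.

```python
from html.parser import HTMLParser

# Tags treated as page chrome rather than article content (an assumption for this toy).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ParagraphExtractor(HTMLParser):
    """Collects the text of <p> elements that are not nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0       # > 0 while inside a skipped element
        self.in_paragraph = False
        self.current = []         # text fragments of the paragraph being read
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.current.append(data)


def extract_text(html: str) -> str:
    """Return the kept paragraphs, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

Real pages rarely mark up their chrome this cleanly, which is exactly why a dedicated library is useful.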

Examples:

>>> from newspaper import Article
>>> url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
>>> article = Article(url)
>>> article.download()
>>> article.html
'<!DOCTYPE HTML><html itemscope itemtype="http://...'
>>> article.parse()
>>> article.authors
['Leigh Ann Caldwell', 'John Honway']
>>> article.publish_date
datetime.datetime(2013, 12, 30, 0, 0)
>>> article.text
"Washington (CNN) -- Not everyone subscribes to a New Year's resolution..."
>>> article.top_image
'http://someCDN.com/blah/blah/blah/file.png'
>>> article.movies
['http://youtube.com/path/to/link.com', ...]
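To reuse the steps of this session in a script, a minimal sketch could wrap them in a function that returns the parsed fields as a dict. It assumes the Python 3 package `newspaper3k` (which provides the `newspaper` module) is installed. The function name `fetch_article_summary` and the `article_factory` parameter are hypothetical, added only so the function can be exercised offline with a stub object instead of hitting the network.

```python
from datetime import datetime
from typing import Callable, Optional


def fetch_article_summary(url: str, article_factory: Optional[Callable] = None) -> dict:
    """Download and parse one article and return the fields used in the session above.

    `article_factory` is a hypothetical hook (not part of newspaper) for swapping in
    a stub during tests; by default it is newspaper's Article class.
    """
    if article_factory is None:
        # Imported lazily so a stub can be used without newspaper installed.
        from newspaper import Article
        article_factory = Article

    article = article_factory(url)
    article.download()  # fetch the raw HTML
    article.parse()     # fills authors, publish_date, text, top_image, movies

    published: Optional[datetime] = article.publish_date  # may be None if not found
    return {
        "url": url,
        "authors": list(article.authors),
        "publish_date": published.isoformat() if published else None,
        "text": article.text,
        "top_image": article.top_image,
        "movies": list(article.movies),
    }
```

In newspaper3k, a failed download usually surfaces as an `ArticleException` when `parse()` is called, so a script processing many URLs would probably want to catch it around each call.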

Worth a try.
