An AI-powered RSS workflow that collects Saudi (general, business, tech) and global tech headlines, extracts the article text, and summarizes each story into one Arabic sentence.
Live JSON is served at https://c3ziz.github.io/saudi-news-ai-rss/.
How it works
- Fetch RSS feeds defined in aggregator.py using a browser-like requests.Session to avoid bot blocking.
- Skip recently-seen links by reading past JSON files (get_seen_ids).
- Download each article with newspaper3k, cap extracted text at 2,000 characters, then summarize via Gemini (summarize_with_ai).
- Rate-limit gently and surface safety blocks; only take a couple of items per source to stay light (fetch_feed).
- Write results to a date-stamped JSON file and mirror it to api/latest.json for easy consumption.
- The hosted feed on GitHub Pages auto-refreshes daily around 8:00 AM Saudi time (UTC+3).
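The dedupe step above can be sketched in a few lines. This is a minimal illustration, not the actual get_seen_ids from aggregator.py: the real file layout and field names may differ.

```python
import json
from pathlib import Path


def get_seen_ids(output_dir):
    """Collect article ids (URLs) from past JSON output files.

    Sketch of the history scan: unreadable or partially written
    files are skipped rather than failing the run.
    """
    seen = set()
    for path in Path(output_dir).glob("*.json"):
        try:
            items = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # ignore unreadable or partial files
        for item in items:
            if isinstance(item, dict) and "id" in item:
                seen.add(item["id"])
    return seen


def filter_new(entries, seen):
    """Keep only feed entries whose link was not seen in past runs."""
    return [e for e in entries if e["link"] not in seen]
```

Keeping only a short window of past files (rather than a database) is what makes the "short history" approach cheap to host on a static site.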
Why it is built this way
- Bot evasion: Browser-like headers and per-source throttling reduce the chance that RSS feeds and article pages block requests.
- Duplication control: Short history prevents reposting the same link across runs.
- Graceful degradation: If AI or extraction fails, the JSON still ships with errors noted rather than breaking the run.
- Static API: Output is plain JSON files, easy to serve from any static host or CDN.
API usage
Example fetch:
curl -s https://c3ziz.github.io/saudi-news-ai-rss/api/latest.json | jq '.[0]'
Fields per item: id (URL), title, link, source, category, published, summary_ai.
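A minimal Python consumer of an item with the fields above might look like this. The sample values are illustrative only, not real feed output:

```python
import json

# Illustrative item matching the documented fields; values are made up.
sample = json.loads("""
{
  "id": "https://example.com/story",
  "title": "Example headline",
  "link": "https://example.com/story",
  "source": "Example Source",
  "category": "tech",
  "published": "2024-01-01T08:00:00+03:00",
  "summary_ai": "\\u0645\\u0644\\u062e\\u0635 \\u0645\\u0646 \\u062c\\u0645\\u0644\\u0629 \\u0648\\u0627\\u062d\\u062f\\u0629."
}
""")


def format_item(item):
    """Render one feed item as a single display line."""
    return f'[{item["category"]}] {item["title"]} - {item["summary_ai"]}'
```

In a real consumer you would fetch api/latest.json (e.g. with requests or curl) and map format_item over the resulting list.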
Notes
- Ensure your host serves the api/ directory as static files.
- Gemini summaries require the GEMINI_API_KEY environment variable; without it, summary_ai contains an error message instead of a summary.
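The missing-key fallback can be sketched as follows. This is only an illustration of the graceful-degradation path; the exact error text, and the real summarize_with_ai (which actually calls Gemini), are assumptions:

```python
import os


def summarize_with_ai(text):
    """Sketch of the fallback path: without GEMINI_API_KEY, return an
    error message in place of a summary instead of raising, so the
    JSON output still ships for the run.
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        # Hypothetical message text; the real one may differ.
        return "ERROR: GEMINI_API_KEY is not set"
    # ... call Gemini with api_key here and return the
    # one-sentence Arabic summary ...
    return "(summary)"
```

Returning an error string rather than raising is what lets a run with a misconfigured key still publish titles and links.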
Setup
1) Install Python deps:
pip install -r requirements.txt
2) Export your Gemini API key:
export GEMINI_API_KEY="YOUR_KEY"
3) Run the collector:
python aggregator.py