The current scraper implementation only fetches events from the 100 most active tech groups, due to the default sort and pagination imposed by the Meetup website. Hence @danielepolencic suggested fetching newer groups and adding them to the DB first, then fetching events based on the groups in the DB.
This issue should be addressed with the following solution:
Step-by-step description
Current Implementation:
- Fetch the 100 most active groups and their RSS URLs
- Parse the RSS URLs to get the relevant event URLs
- Fetch event details from the event URLs
- If events don't already exist in the events table, add them to the events table; otherwise update the state of the existing events
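The "add or update" step above is the core of both the current and the proposed flow. The repo's language isn't shown here, so this is a minimal sketch in Python, using an in-memory dict as a stand-in for the events table; the function and field names (`upsert_events`, `id`, `state`) are hypothetical.

```python
def upsert_events(events_table, fetched_events):
    """Insert events not yet in the table; refresh the state of known ones."""
    for event in fetched_events:
        existing = events_table.get(event["id"])
        if existing is None:
            events_table[event["id"]] = event       # new event: insert
        else:
            existing["state"] = event["state"]      # known event: update state
    return events_table


table = {}
upsert_events(table, [{"id": "e1", "state": "upcoming"}])
upsert_events(table, [{"id": "e1", "state": "cancelled"},
                      {"id": "e2", "state": "upcoming"}])
# e1 is updated in place, e2 is inserted
```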
Proposed New Implementation:
Getting groups
- Get the 100 newest groups
- If groups don't already exist in the groups table, add them to the groups table; otherwise update the state of the existing groups
- If any already exist, stop this task (since the list is newest-first, the remaining groups are already in the DB)
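The group-sync steps above can be sketched as follows. This assumes the Meetup listing is strictly newest-first, so hitting a known group means every later group in the list was stored on a previous run; the names (`sync_newest_groups`, `id`, `state`) are hypothetical, and a dict stands in for the groups table.

```python
def sync_newest_groups(groups_table, newest_groups):
    """Walk a newest-first group list, inserting unseen groups.

    On the first group that already exists, refresh its state and stop:
    all older groups are assumed to be in the table already.
    """
    for group in newest_groups:
        if group["id"] in groups_table:
            groups_table[group["id"]]["state"] = group["state"]
            break  # everything after this is older and already stored
        groups_table[group["id"]] = group
    return groups_table


groups = {"g1": {"id": "g1", "state": "active"}}
sync_newest_groups(groups, [
    {"id": "g3", "state": "active"},   # newest, unseen -> inserted
    {"id": "g2", "state": "active"},   # unseen -> inserted
    {"id": "g1", "state": "active"},   # known -> updated, loop stops
])
```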
Getting events
- Based on the groups table, get the RSS URLs and parse them to get the relevant event URLs
- Fetch event details from the event URLs
- If events don't already exist in the events table, add them to the events table; otherwise update the state of the existing events
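The event-collection steps above can be sketched like this. To keep the sketch offline and language-agnostic, the RSS parser and event fetcher are injected as callables rather than real HTTP/RSS calls; all names (`collect_events`, `rss_url`, `id`) are hypothetical.

```python
def collect_events(groups_table, parse_rss, fetch_event):
    """Build an events table from the stored groups.

    parse_rss(rss_url) -> iterable of event URLs
    fetch_event(event_url) -> event dict with an "id" key
    Known events are updated in place; unseen events are inserted.
    """
    events_table = {}
    for group in groups_table.values():
        for event_url in parse_rss(group["rss_url"]):
            event = fetch_event(event_url)
            # insert if new, otherwise merge the fresh details over the old ones
            events_table.setdefault(event["id"], {}).update(event)
    return events_table


# Usage with stub callables standing in for the real RSS/HTTP layer:
groups = {"g1": {"id": "g1", "rss_url": "https://example.org/g1.rss"}}
events = collect_events(
    groups,
    parse_rss=lambda url: ["https://example.org/events/e1"],
    fetch_event=lambda url: {"id": "e1", "state": "upcoming", "url": url},
)
```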