Content Scraping Principle
The src/contents/scraper.ts file defines the scraping logic that retrieves a webpage's content so it can be published as an article.
Similarly, the scraper listens for messages from the Options page. When a user clicks the Get Content button in the Article tab, the Options page sends this message, and the listener calls the scrapeContent function to retrieve the webpage content.
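The message flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of scraper.ts: the action name EXAMPLE_SCRAPE_CONTENT, the ScrapeRequest and ScrapeResponse shapes, and the helpers isScrapeRequest and runScrape are all hypothetical, since the real message format is not given here.

```typescript
// Hypothetical action name; the real one is not stated in this document.
const SCRAPE_ACTION = "EXAMPLE_SCRAPE_CONTENT";

// Hypothetical message and response shapes, for illustration only.
interface ScrapeRequest {
  action: string;
}

interface ScrapeResponse {
  ok: boolean;
  data?: unknown;
  error?: string;
}

// Type guard: true only for messages that ask for a scrape.
function isScrapeRequest(message: unknown): message is ScrapeRequest {
  return (
    typeof message === "object" &&
    message !== null &&
    (message as { action?: unknown }).action === SCRAPE_ACTION
  );
}

// Runs the given scrape function and wraps the outcome in a response object,
// so a failure is reported to the caller instead of being lost.
async function runScrape(
  scrapeContent: () => Promise<unknown>,
): Promise<ScrapeResponse> {
  try {
    return { ok: true, data: await scrapeContent() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// In a content script, the two helpers could be wired up along these lines:
//
// chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
//   if (!isScrapeRequest(message)) return false;
//   runScrape(scrapeContent).then(sendResponse);
//   return true; // keeps the channel open for the asynchronous response
// });
```

Keeping the request check and the scrape call in plain functions, separate from the browser listener, makes them easy to exercise without loading the extension.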
By default, scraping goes through the defaultScraper function, which decides which site-specific scraper function to use based on the webpage URL.
For example, pages under https://blog.csdn.net/ are handled by the scrapeCSDNContent function.
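One way to picture this URL-based selection is a small rule table checked in order, with a generic fallback. The sketch below assumes prefix matching on the URL. The names ScraperRule, selectScraper and scrapeGenericContent are hypothetical, and the scraper bodies are stubs rather than the real implementations.

```typescript
// A scraper is modeled here as an async function producing some result T.
type Scraper<T> = () => Promise<T>;

// Hypothetical rule shape: a URL prefix and the scraper that handles it.
interface ScraperRule<T> {
  urlPrefix: string;
  scrape: Scraper<T>;
}

// Returns the scraper of the first rule whose prefix matches the URL,
// or the fallback when no rule matches.
function selectScraper<T>(
  url: string,
  rules: ScraperRule<T>[],
  fallback: Scraper<T>,
): Scraper<T> {
  const rule = rules.find((r) => url.startsWith(r.urlPrefix));
  return rule ? rule.scrape : fallback;
}

// Stub scrapers standing in for the real functions.
const scrapeCSDNContent: Scraper<string> = async () => "csdn article";
const scrapeGenericContent: Scraper<string> = async () => "generic article";

const scraperRules: ScraperRule<string>[] = [
  { urlPrefix: "https://blog.csdn.net/", scrape: scrapeCSDNContent },
];

// Example: a CSDN article URL resolves to the CSDN scraper.
const chosen = selectScraper(
  "https://blog.csdn.net/someone/article/details/1",
  scraperRules,
  scrapeGenericContent,
);
```

With a table like this, supporting another site would only require one more rule rather than another branch in the selection logic.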
Taking CSDN as an example, scrapeCSDNContent works in three steps: it uses the Readability library to extract the main content of the page, runs the preprocessor function over that content, and finally uses site-specific selectors to pick out the article's title, author, cover, content, summary, and other fields. The selectors differ from site to site because each type of website structures its pages differently.
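The selector step can be sketched as follows, under several assumptions. The selector strings are placeholders, not CSDN's real markup. ReadabilityLike models only the title, content and excerpt fields of a Readability parse result. The helper names firstText, firstAttr and buildArticle are hypothetical, and the preprocessor step is left out because its behavior is not described here.

```typescript
// Minimal structural types so the sketch runs without a browser;
// a browser Document and Element should satisfy them.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

interface RootLike {
  querySelector(selector: string): ElementLike | null;
}

// Assumed subset of a Readability parse result.
interface ReadabilityLike {
  title: string;
  content: string;
  excerpt: string;
}

// Hypothetical per-site selector lists, tried in order.
interface SiteSelectors {
  title: string[];
  author: string[];
  cover: string[];
}

// Returns the trimmed text of the first selector that matches a non-empty node.
function firstText(root: RootLike, selectors: string[]): string {
  for (const selector of selectors) {
    const text = root.querySelector(selector)?.textContent?.trim();
    if (text) return text;
  }
  return "";
}

// Same idea for an attribute value, such as the src of a cover image.
function firstAttr(root: RootLike, selectors: string[], attr: string): string {
  for (const selector of selectors) {
    const value = root.querySelector(selector)?.getAttribute(attr)?.trim();
    if (value) return value;
  }
  return "";
}

// Combines site-specific selectors with the Readability result:
// selectors win where they match, and Readability fills in the rest.
function buildArticle(
  root: RootLike,
  selectors: SiteSelectors,
  readable: ReadabilityLike,
) {
  return {
    title: firstText(root, selectors.title) || readable.title,
    author: firstText(root, selectors.author),
    cover: firstAttr(root, selectors.cover, "src"),
    content: readable.content,
    digest: readable.excerpt,
  };
}
```

Trying several selectors per field is one way to cope with a site that serves more than one page layout; whether the real scraper does this is not stated here.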