You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 
FYReader/source/LocalSource.md

7.3 KiB

风月读书内置书源说明

  • 如何自行制作并添加书源.

    • 基于面向接口开发的思想,对于书源我设计了如下接口:

      • // 这个接口位于xyz.fycz.myreader.webapi.crawler.base包下
        public interface ReadCrawler {
            String getSearchLink();  // 书源的搜索url
            String getCharset(); // 书源的字符编码
            String getSearchCharset(); // 书源搜索关键字的字符编码,和书源的字符编码就行
            String getNameSpace(); // 书源主页地址
            Boolean isPost(); // 是否以post请求搜索
            String getContentFormHtml(String html); // 获取书籍内容规则
            ArrayList<Chapter> getChaptersFromHtml(String html); // 获取书籍章节列表规则
            ConcurrentMultiValueMap<SearchBookBean, Book> getBooksFromSearchHtml(String html); // 搜索书籍规则
        }
        
    • 了解上述接口的方法,我们就可以开始创建书源了

      • 第一步:创建一个书源类实现上述接口,下面以笔趣阁44为例进行说明

        • // 注意:如果搜索书籍页没有图片、最新章节、书籍简介等信息,可以通过实现BookInfoCrawler接口,从书籍详情页获取
          public class BiQuGe44ReadCrawler implements ReadCrawler, BookInfoCrawler {
              //网站主页地址
              public static final String NAME_SPACE = "https://www.wqge.cc";
              /*
              	搜索url,搜索关键词以{key}进行占位
              	如果是post请求,以“,”分隔url,“,”前是搜索地址,“,”后是请求体,搜索关键词同样以{key}占位
              		例如:"https://www.9txs.com/search.html,searchkey={key}"
              */
              public static final String NOVEL_SEARCH = "https://www.wqge.cc/modules/article/search.php?searchkey={key}";
              // 书源字符编码
              public static final String CHARSET = "GBK";
              // 书源搜索关键词编码
              public static final String SEARCH_CHARSET = "utf-8";
              @Override
              public String getSearchLink() {
                  return NOVEL_SEARCH;
              }
          
              @Override
              public String getCharset() {
                  return CHARSET;
              }
          
              @Override
              public String getNameSpace() {
                  return NAME_SPACE;
              }
              @Override
              public Boolean isPost() {
                  return false;
              }
              @Override
              public String getSearchCharset() {
                  return SEARCH_CHARSET;
              }
          
              /**
               * 从html中获取章节正文
               * @param html
               * @return
               */
              public String getContentFormHtml(String html) {
                  Document doc = Jsoup.parse(html);
                  Element divContent = doc.getElementById("content");
                  if (divContent != null) {
                      String content = Html.fromHtml(divContent.html()).toString();
                      char c = 160;
                      String spaec = "" + c;
                      content = content.replace(spaec, "  ");
                      return content;
                  } else {
                      return "";
                  }
              }
          
              /**
               * 从html中获取章节列表
               *
               * @param html
               * @return
               */
              public ArrayList<Chapter> getChaptersFromHtml(String html) {
                  ArrayList<Chapter> chapters = new ArrayList<>();
                  Document doc = Jsoup.parse(html);
                  String readUrl = doc.select("meta[property=og:novel:read_url]").attr("content");
                  Element divList = doc.getElementById("list");
                  String lastTile = null;
                  int i = 0;
                  Elements elementsByTag = divList.getElementsByTag("dd");
                  for (int j = 9; j < elementsByTag.size(); j++) {
                      Element dd = elementsByTag.get(j);
                      Elements as = dd.getElementsByTag("a");
                      if (as.size() > 0) {
                          Element a = as.get(0);
                          String title = a.text() ;
                          if (!StringHelper.isEmpty(lastTile) && title.equals(lastTile)) {
                              continue;
                          }
                          Chapter chapter = new Chapter();
                          chapter.setNumber(i++);
                          chapter.setTitle(title);
                          String url = readUrl + a.attr("href");
                          chapter.setUrl(url);
                          chapters.add(chapter);
                          lastTile = title;
                      }
                  }
                  return chapters;
              }
          
              /**
               * 从搜索html中得到书列表
               * @param html
               * @return
               */
              public ConcurrentMultiValueMap<SearchBookBean, Book> getBooksFromSearchHtml(String html) {
                  ConcurrentMultiValueMap<SearchBookBean, Book> books = new ConcurrentMultiValueMap<>();
                  Document doc = Jsoup.parse(html);
                  Elements divs = doc.getElementsByTag("table");
                  Element div = divs.get(0);
                  Elements elementsByTag = div.getElementsByTag("tr");
                  for (int i = 1; i < elementsByTag.size(); i++) {
                      Element element = elementsByTag.get(i);
                      Book book = new Book();
                      Elements info = element.getElementsByTag("td");
                      book.setName(info.get(0).text());
                      book.setChapterUrl(NAME_SPACE + info.get(0).getElementsByTag("a").attr("href"));
                      book.setAuthor(info.get(2).text());
                      book.setNewestChapterTitle(info.get(1).text());
                      book.setSource(BookSource.biquge44.toString());
                      // SearchBookBean用于合并相同书籍
                      SearchBookBean sbb = new SearchBookBean(book.getName(), book.getAuthor());
                      books.add(sbb, book);
                  }
                  return books;
              }
          
              /**
               * 获取书籍详细信息
               * @param book
               */
              public Book getBookInfo(String html, Book book){
                  Document doc = Jsoup.parse(html);
                  Element img = doc.getElementById("fmimg");
                  book.setImgUrl(img.getElementsByTag("img").get(0).attr("src"));
                  Element desc = doc.getElementById("intro");
                  book.setDesc(desc.getElementsByTag("p").get(0).text());
                  Element type = doc.getElementsByClass("con_top").get(0);
                  book.setType(type.getElementsByTag("a").get(2).text());
                  return book;
              }
          
          }
          
      • 第二步:注册书源信息。

        • 在app/src/main/resources/crawler.properties配置文件中添加书源类信息,例如:

          • // biquge44书源的命名,与BookSource中的命名一致,xyz.fycz.myreader.webapi.crawler.read.BiQuGe44ReadCrawler是书源类的完整路径
            biquge44=xyz.fycz.myreader.webapi.crawler.read.BiQuGe44ReadCrawler
            
      • 第三步:添加书源到数据库。