源码家

  • Language: Java
  • Source size: 0.26M
  • Category: Java language basics
  • File format: .zip

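Source Code Introduction

【Overview】WebMagic 0.4.0 released, a Java crawler framework
1. Fixed the connection pool not taking effect in version 0.3.2 and earlier (#30). This release uses the new connection pool mechanism in HttpClient 4.3.1 to reuse connections; in testing, download speed improved by roughly 90%. Test code: Kr36NewsModel.java. 2. Added a synchronous crawling API. For small-scale crawling...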
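
The connection-reuse idea behind the first item can be illustrated with a small, self-contained sketch. This is not WebMagic's or HttpClient's code: `ConnectionPoolSketch`, `FakeConnection`, and `connectionsOpened` are hypothetical names invented for the illustration, and the "connection" is just an object whose creation stands in for an expensive network handshake. The sketch only counts how many connections get opened with and without returning them to a pool.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration of connection reuse; NOT WebMagic or HttpClient code.
class ConnectionPoolSketch {

    // Hypothetical stand-in for a connection that is expensive to open.
    static final class FakeConnection {
        final int id;
        FakeConnection(int id) { this.id = id; }
    }

    private final Deque<FakeConnection> idle = new ArrayDeque<>();
    private int created = 0;

    // Hand out an idle connection if one exists; otherwise open a new one.
    synchronized FakeConnection acquire() {
        FakeConnection c = idle.pollFirst();
        if (c == null) {
            created++;
            c = new FakeConnection(created);
        }
        return c;
    }

    // Return the connection so that a later request can reuse it.
    synchronized void release(FakeConnection c) {
        idle.addFirst(c);
    }

    synchronized int createdCount() {
        return created;
    }

    // Runs `requests` sequential requests and reports how many connections were opened.
    static int connectionsOpened(int requests, boolean reuse) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch();
        for (int i = 0; i < requests; i++) {
            FakeConnection c = pool.acquire();
            if (reuse) {
                pool.release(c); // without this, every request opens a fresh connection
            }
        }
        return pool.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("without reuse: " + connectionsOpened(10, false));
        System.out.println("with reuse:    " + connectionsOpened(10, true));
    }
}
```

When connections are never returned, ten sequential requests open ten connections; when each one is released back to the pool, a single connection serves all ten. A pool that silently "does not take effect" behaves like the first case, which is consistent with the speed-up the release note reports once reuse works.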

【Core Source】
File list
└── webmagic-master
    ├── en_docs
    │   └── README.md
    ├── pom.xml
    ├── README.md
    ├── release-note.md
    ├── user-manual.md
    ├── webmagic-core
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   ├── java
    │       │   │   └── us
    │       │   │       └── codecraft
    │       │   │           └── webmagic
    │       │   │               ├── downloader
    │       │   │               │   ├── Downloader.java
    │       │   │               │   ├── HttpClientDownloader.java
    │       │   │               │   ├── HttpClientGenerator.java
    │       │   │               │   └── package.html
    │       │   │               ├── package.html
    │       │   │               ├── Page.java
    │       │   │               ├── pipeline
    │       │   │               │   ├── CollectorPipeline.java
    │       │   │               │   ├── ConsolePipeline.java
    │       │   │               │   ├── FilePipeline.java
    │       │   │               │   ├── package.html
    │       │   │               │   ├── Pipeline.java
    │       │   │               │   └── ResultItemsCollectorPipeline.java
    │       │   │               ├── processor
    │       │   │               │   ├── example
    │       │   │               │   │   ├── BaiduBaikePageProcesser.java
    │       │   │               │   │   ├── GithubRepoPageProcesser.java
    │       │   │               │   │   └── OschinaBlogPageProcesser.java
    │       │   │               │   ├── package.html
    │       │   │               │   ├── PageProcessor.java
    │       │   │               │   └── SimplePageProcessor.java
    │       │   │               ├── Request.java
    │       │   │               ├── ResultItems.java
    │       │   │               ├── scheduler
    │       │   │               │   ├── package.html
    │       │   │               │   ├── PriorityScheduler.java
    │       │   │               │   ├── QueueScheduler.java
    │       │   │               │   └── Scheduler.java
    │       │   │               ├── selector
    │       │   │               │   ├── AndSelector.java
    │       │   │               │   ├── BaseElementSelector.java
    │       │   │               │   ├── CssSelector.java
    │       │   │               │   ├── ElementSelector.java
    │       │   │               │   ├── Html.java
    │       │   │               │   ├── OrSelector.java
    │       │   │               │   ├── package.html
    │       │   │               │   ├── PlainText.java
    │       │   │               │   ├── RegexResult.java
    │       │   │               │   ├── RegexSelector.java
    │       │   │               │   ├── ReplaceSelector.java
    │       │   │               │   ├── Selectable.java
    │       │   │               │   ├── Selector.java
    │       │   │               │   ├── Selectors.java
    │       │   │               │   ├── SmartContentSelector.java
    │       │   │               │   ├── XpathSelector.java
    │       │   │               │   └── XsoupSelector.java
    │       │   │               ├── Site.java
    │       │   │               ├── Spider.java
    │       │   │               ├── Task.java
    │       │   │               └── utils
    │       │   │                   ├── EnvironmentUtil.java
    │       │   │                   ├── Experimental.java
    │       │   │                   ├── FilePersistentBase.java
    │       │   │                   ├── NumberUtils.java
    │       │   │                   ├── package.html
    │       │   │                   ├── ThreadUtils.java
    │       │   │                   └── UrlUtils.java
    │       │   └── resources
    │       │       └── log4j.xml
    │       └── test
    │           ├── java
    │           │   └── us
    │           │       └── codecraft
    │           │           └── webmagic
    │           │               ├── downloader
    │           │               │   └── HttpClientDownloaderTest.java
    │           │               ├── HtmlTest.java
    │           │               ├── scheduler
    │           │               │   └── PrioritySchedulerTest.java
    │           │               ├── selector
    │           │               │   ├── ExtractorsTest.java
    │           │               │   └── RegexSelectorTest.java
    │           │               ├── SpiderTest.java
    │           │               └── utils
    │           │                   ├── EnvironmentUtilTest.java
    │           │                   └── UrlUtilsTest.java
    │           └── resources
    │               └── log4j.xml
    ├── webmagic-extension
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   └── java
    │       │       └── us
    │       │           └── codecraft
    │       │               └── webmagic
    │       │                   ├── downloader
    │       │                   │   └── FileCache.java
    │       │                   ├── example
    │       │                   │   ├── BaiduBaike.java
    │       │                   │   ├── GithubRepo.java
    │       │                   │   └── OschinaBlog.java
    │       │                   ├── model
    │       │                   │   ├── AfterExtractor.java
    │       │                   │   ├── annotation
    │       │                   │   │   ├── ComboExtract.java
    │       │                   │   │   ├── ExtractBy.java
    │       │                   │   │   ├── ExtractByUrl.java
    │       │                   │   │   ├── Formatter.java
    │       │                   │   │   ├── HelpUrl.java
    │       │                   │   │   ├── package.html
    │       │                   │   │   └── TargetUrl.java
    │       │                   │   ├── ConsolePageModelPipeline.java
    │       │                   │   ├── Extractor.java
    │       │                   │   ├── FieldExtractor.java
    │       │                   │   ├── formatter
    │       │                   │   │   ├── BasicTypeFormatter.java
    │       │                   │   │   ├── DateFormatter.java
    │       │                   │   │   ├── ObjectFormatter.java
    │       │                   │   │   └── ObjectFormatters.java
    │       │                   │   ├── HasKey.java
    │       │                   │   ├── ModelPageProcessor.java
    │       │                   │   ├── ModelPipeline.java
    │       │                   │   ├── OOSpider.java
    │       │                   │   ├── package.html
    │       │                   │   ├── PageModelCollectorPipeline.java
    │       │                   │   └── PageModelExtractor.java
    │       │                   ├── MultiPageModel.java
    │       │                   ├── pipeline
    │       │                   │   ├── CollectorPageModelPipeline.java
    │       │                   │   ├── FilePageModelPipeline.java
    │       │                   │   ├── JsonFilePageModelPipeline.java
    │       │                   │   ├── JsonFilePipeline.java
    │       │                   │   ├── MultiPagePipeline.java
    │       │                   │   └── PageModelPipeline.java
    │       │                   ├── scheduler
    │       │                   │   ├── FileCacheQueueScheduler.java
    │       │                   │   └── RedisScheduler.java
    │       │                   ├── selector
    │       │                   │   └── JsonPathSelector.java
    │       │                   └── utils
    │       │                       ├── DoubleKeyMap.java
    │       │                       ├── ExtractorUtils.java
    │       │                       └── MultiKeyMapBase.java
    │       └── test
    │           ├── java
    │           │   └── us
    │           │       └── codecraft
    │           │           └── webmagic
    │           │               ├── downloader
    │           │               │   └── FileCacheTest.java
    │           │               ├── formatter
    │           │               │   └── DateFormatterTest.java
    │           │               ├── MockDownloader.java
    │           │               ├── MockPageModelPipeline.java
    │           │               ├── MockPipeline.java
    │           │               ├── model
    │           │               │   └── GithubRepoTest.java
    │           │               ├── processor
    │           │               │   └── GithubRepoProcessor.java
    │           │               ├── scheduler
    │           │               │   └── RedisSchedulerTest.java
    │           │               └── selector
    │           │                   └── JsonPathSelectorTest.java
    │           └── resouces
    │               └── log4j.xml
    ├── webmagic-lucene
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       └── main
    │           ├── java
    │           │   └── us
    │           │       └── codecraft
    │           │           └── webmagic
    │           │               └── pipeline
    │           │                   └── LucenePipeline.java
    │           └── test
    │               └── java
    │                   └── us
    │                       └── codecraft
    │                           └── webmagic
    │                               └── lucene
    │                                   └── OschinaBlog.java
    ├── webmagic-samples
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   ├── java
    │       │   │   └── us
    │       │   │       └── codecraft
    │       │   │           └── webmagic
    │       │   │               ├── main
    │       │   │               │   └── QuickStarter.java
    │       │   │               ├── model
    │       │   │               │   └── samples
    │       │   │               │       ├── Blog.java
    │       │   │               │       ├── GithubRepo.java
    │       │   │               │       ├── IteyeBlog.java
    │       │   │               │       ├── Kr36NewsModel.java
    │       │   │               │       ├── News163.java
    │       │   │               │       ├── OschinaAnswer.java
    │       │   │               │       └── OschinaBlog.java
    │       │   │               └── samples
    │       │   │                   ├── DiandianBlogProcessor.java
    │       │   │                   ├── HuxiuProcessor.java
    │       │   │                   ├── InfoQMiniBookProcessor.java
    │       │   │                   ├── IteyeBlogProcessor.java
    │       │   │                   ├── NjuBBSProcessor.java
    │       │   │                   ├── OschinaBlogPageProcesser.java
    │       │   │                   ├── OschinaPageProcesser.java
    │       │   │                   ├── QzoneBlogProcessor.java
    │       │   │                   ├── scheduler
    │       │   │                   │   ├── DelayQueueScheduler.java
    │       │   │                   │   ├── LevelLimitScheduler.java
    │       │   │                   │   └── ZipCodePageProcessor.java
    │       │   │                   ├── SinaBlogProcesser.java
    │       │   │                   └── TianyaPageProcesser.java
    │       │   └── resources
    │       │       └── log4j.xml
    │       └── test
    │           └── java
    │               └── us
    │                   └── codecraft
    │                       └── webmagic
    │                           ├── model
    │                           │   └── ProcessorBenchmark.java
    │                           ├── processor
    │                           │   └── SinablogProcessorTest.java
    │                           ├── samples
    │                           │   └── scheduler
    │                           │       └── DelayQueueSchedulerTest.java
    │                           └── SpiderTest.java
    ├── webmagic-saxon
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   └── java
    │       │       └── us
    │       │           └── codecraft
    │       │               └── webmagic
    │       │                   └── selector
    │       │                       └── Xpath2Selector.java
    │       └── test
    │           └── java
    │               └── us
    │                   └── codecraft
    │                       └── webmagic
    │                           └── selector
    │                               └── XpathSelectorTest.java
    ├── webmagic-selenium
    │   ├── pom.xml
    │   ├── README.md
    │   └── src
    │       ├── main
    │       │   └── java
    │       │       └── us
    │       │           └── codecraft
    │       │               └── webmagic
    │       │                   └── downloader
    │       │                       └── selenium
    │       │                           ├── SeleniumDownloader.java
    │       │                           └── WebDriverPool.java
    │       └── test
    │           └── java
    │               └── us
    │                   └── codecraft
    │                       └── webmagic
    │                           ├── downloader
    │                           │   ├── selenium
    │                           │   │   ├── SeleniumDownloaderTest.java
    │                           │   │   └── WebDriverPoolTest.java
    │                           │   └── SeleniumTest.java
    │                           └── samples
    │                               └── HuabanProcessor.java
    └── zh_docs
        ├── README.md
        └── us
            └── codecraft
                └── webmagic
                    ├── downloader
                    │   ├── Destroyable-cmnt.xml
                    │   ├── Downloader-cmnt.xml
                    │   ├── FileDownloader-cmnt.xml
                    │   ├── HttpClientDownloader-cmnt.xml
                    │   ├── HttpClientPool-cmnt.xml
                    │   └── package.cmnt
                    ├── model
                    │   ├── AfterExtractor-cmnt.xml
                    │   ├── annotation
                    │   │   ├── ComboExtract-cmnt.xml
                    │   │   ├── ExtractBy2-cmnt.xml
                    │   │   ├── ExtractBy2.Type-cmnt.xml
                    │   │   ├── ExtractBy3-cmnt.xml
                    │   │   ├── ExtractBy3.Type-cmnt.xml
                    │   │   ├── ExtractBy-cmnt.xml
                    │   │   ├── ExtractByRaw-cmnt.xml
                    │   │   ├── ExtractByRaw.Type-cmnt.xml
                    │   │   ├── ExtractBy.Type-cmnt.xml
                    │   │   ├── ExtractByUrl-cmnt.xml
                    │   │   ├── HelpUrl-cmnt.xml
                    │   │   ├── package.cmnt
                    │   │   └── TargetUrl-cmnt.xml
                    │   ├── ConsolePageModelPipeline-cmnt.xml
                    │   ├── HasKey-cmnt.xml
                    │   ├── OOSpider-cmnt.xml
                    │   ├── package.cmnt
                    │   └── PageModelPipeline-cmnt.xml
                    ├── package.cmnt
                    ├── Page-cmnt.xml
                    ├── PagedModel-cmnt.xml
                    ├── pipeline
                    │   ├── ConsolePipeline-cmnt.xml
                    │   ├── FilePipeline-cmnt.xml
                    │   ├── JsonFilePageModelPipeline-cmnt.xml
                    │   ├── JsonFilePipeline-cmnt.xml
                    │   ├── package.cmnt
                    │   ├── PagedPipeline-cmnt.xml
                    │   └── Pipeline-cmnt.xml
                    ├── processor
                    │   ├── package.cmnt
                    │   ├── PageProcessor-cmnt.xml
                    │   └── SimplePageProcessor-cmnt.xml
                    ├── Request-cmnt.xml
                    ├── ResultItems-cmnt.xml
                    ├── scheduler
                    │   ├── FileCacheQueueScheduler-cmnt.xml
                    │   ├── package.cmnt
                    │   ├── QueueScheduler-cmnt.xml
                    │   ├── RedisScheduler-cmnt.xml
                    │   └── Scheduler-cmnt.xml
                    ├── selector
                    │   ├── AndSelector-cmnt.xml
                    │   ├── CssSelector-cmnt.xml
                    │   ├── Html-cmnt.xml
                    │   ├── JsonPathSelector-cmnt.xml
                    │   ├── OrSelector-cmnt.xml
                    │   ├── package.cmnt
                    │   ├── PlainText-cmnt.xml
                    │   ├── RegexSelector-cmnt.xml
                    │   ├── ReplaceSelector-cmnt.xml
                    │   ├── Selectable-cmnt.xml
                    │   ├── Selector-cmnt.xml
                    │   ├── SelectorFactory-cmnt.xml
                    │   ├── SmartContentSelector-cmnt.xml
                    │   └── XpathSelector-cmnt.xml
                    ├── Site-cmnt.xml
                    ├── Spider-cmnt.xml
                    ├── Task-cmnt.xml
                    └── utils
                        ├── DoubleKeyMap-cmnt.xml
                        ├── FilePersistentBase-cmnt.xml
                        ├── MultiKeyMapBase-cmnt.xml
                        ├── package.cmnt
                        ├── ThreadUtils-cmnt.xml
                        └── UrlUtils-cmnt.xml

134 directories, 232 files