QueryList API Documentation

QueryList get($url,$args = null,$otherArgs = [])


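HTTP GET plugin for fetching web pages with minimal effort. It is built on GuzzleHttp and accepts the same request options: $url is the page to fetch, $args holds the URL query parameters (as an array or a query string), and $otherArgs takes GuzzleHttp request options such as headers, cookies, proxy and timeout.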

GuzzleHttp manual: http://guzzle-cn.readthedocs.io/zh_CN/latest/request-options.html

Usage


Basic usage

$ql = QueryList::get('http://httpbin.org/get?param1=testvalue');
echo $ql->getHtml();

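This is roughly equivalent to the following (the plugin additionally sends browser-like headers such as User-Agent, as the output further down shows):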

$html = file_get_contents('http://httpbin.org/get?param1=testvalue');
$ql = QueryList::html($html);
echo $ql->getHtml();

Passing URL query parameters

// Pass the query parameters as an array...
$ql->get('http://httpbin.org/get',[
    'param1' => 'testvalue',
    'params2' => 'somevalue'
]);

// ...or, equivalently, as a query string
$ql->get('http://httpbin.org/get','param1=testvalue&params2=somevalue');

echo $ql->getHtml();

Output:

{
  "args": {
    "param1": "testvalue",
    "params2": "somevalue"
  },
  "headers": {
    "Connection": "close",
    "Host": "httpbin.org",
    "Referer": "http://httpbin.org/get",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
  },
  "origin": "112.97.*.*",
  "url": "http://httpbin.org/get?param1=testvalue&params2=somevalue"
}
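The url field in the output shows the request URL that was actually sent: the parameters are encoded as key=value pairs joined by &, whichever form was passed. GuzzleHttp performs this encoding itself; purely as an illustration, the stdlib-only sketch below reproduces that URL with http_build_query. buildRequestUrl is a hypothetical name, not a QueryList or GuzzleHttp function, and it assumes the base URL carries no query string of its own.

```php
<?php
// Hypothetical helper, not part of QueryList or GuzzleHttp: appends query
// parameters (an array or a ready-made query string) to a base URL.
// Assumes $url has no query string of its own.
function buildRequestUrl(string $url, $args = null): string
{
    if ($args === null || $args === [] || $args === '') {
        return $url; // nothing to append
    }
    // Arrays become key=value pairs joined by '&';
    // strings are used as-is apart from a leading '?'
    $query = is_array($args) ? http_build_query($args, '', '&') : ltrim($args, '?');
    return $url . '?' . $query;
}

// Both forms from the example above produce the same URL
echo buildRequestUrl('http://httpbin.org/get', [
    'param1'  => 'testvalue',
    'params2' => 'somevalue',
]), PHP_EOL;
echo buildRequestUrl('http://httpbin.org/get', 'param1=testvalue&params2=somevalue'), PHP_EOL;
```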

Scraping login-protected pages with cookies

  • Example 1
// Scrape a Sina Weibo page that is only accessible after logging in

$ql = QueryList::get('http://weibo.com',[],[
    'headers' => [
        // Paste the cookie string copied from your browser
        'Cookie' => 'SINAGLOBAL=546064; wb_cmtLike_2112031=1; wvr=6;....'
    ]
]);

//echo $ql->getHtml();

echo $ql->find('title')->text();
// Output: 我的首页 微博-随时随地发现新鲜事 (page title: "My Home, Weibo: discover what's new anytime, anywhere")
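  • Example 2: The HTTP plugin enables cookie handling by default, but you can also manage cookies yourself by supplying a cookie jar; see the GuzzleHttp documentation for details.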
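// A CookieJar stores cookies set by responses and sends them back
// on later requests that are given the same jar
$cookieJar = new \GuzzleHttp\Cookie\CookieJar();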

$ql = QueryList::get('https://www.baidu.com/',[],[
    'cookies' => $cookieJar
]);
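Example 1 pastes the raw Cookie string from the browser, while Example 2 works with a cookie jar. To bridge the two, a hypothetical helper (parseCookieString is an invented name, not part of QueryList) can split the copied string into name/value pairs. The resulting array could then be handed to GuzzleHttp's \GuzzleHttp\Cookie\CookieJar::fromArray($cookies, $domain), with $domain being the site the cookies belong to, and the jar passed through the cookies option as in Example 2.

```php
<?php
// Hypothetical helper: turns a Cookie header copied from the browser
// ("name1=value1; name2=value2") into a name => value array.
function parseCookieString(string $cookie): array
{
    $cookies = [];
    foreach (explode(';', $cookie) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue; // skip empty segments and pairs without '='
        }
        // Split on the first '=' only, since cookie values may contain '='
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

print_r(parseCookieString('abc=111;xxx=222'));
```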

Spoofing browser request headers

$ql->get('http://httpbin.org/get',[
    'param1' => 'testvalue',
    'params2' => 'somevalue'
],[
    'headers' => [
        'Referer' => 'https://querylist.cc/',
        'User-Agent' => 'testing/1.0',
        'Accept'     => 'application/json',
        'X-Foo'      => ['Bar', 'Baz'],
        'Cookie'    => 'abc=111;xxx=222'
    ]
]);
echo $ql->getHtml();

Output:

{
  "args": {
    "param1": "testvalue",
    "params2": "somevalue"
  },
  "headers": {
    "Accept": "application/json",
    "Connection": "close",
    "Cookie": "abc=111;xxx=222",
    "Host": "httpbin.org",
    "Referer": "https://querylist.cc/",
    "User-Agent": "testing/1.0",
    "X-Foo": "Baz"
  },
  "origin": "112.97.*.*",
  "url": "http://httpbin.org/get?param1=testvalue&params2=somevalue"
}
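Because httpbin answers with JSON rather than HTML, CSS selectors via find() are of little use on this response; decoding the string returned by getHtml() with PHP's built-in json_decode is the simpler way to check that the spoofed headers arrived. In the sketch below a literal string, trimmed to a few fields, stands in for the live response so it runs offline.

```php
<?php
// Stand-in for the string returned by $ql->getHtml() in the example above,
// trimmed to a few fields so the sketch runs without network access.
$body = '{"headers": {"Referer": "https://querylist.cc/", "User-Agent": "testing/1.0"},'
      . ' "url": "http://httpbin.org/get?param1=testvalue&params2=somevalue"}';

// Decode the JSON body into an associative array
$data = json_decode($body, true);
if (!is_array($data)) {
    throw new RuntimeException('Response is not valid JSON: ' . json_last_error_msg());
}

// Confirm that the spoofed headers reached the server
echo $data['headers']['User-Agent'], PHP_EOL;
echo $data['headers']['Referer'], PHP_EOL;
```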

Using an HTTP proxy

$ql->get('http://httpbin.org/get',[
    'param1' => 'testvalue',
    'params2' => 'somevalue'
],[
    'proxy' => 'http://222.141.11.17:8118',
    // Timeout in seconds
    'timeout' => 30,
    'headers' => [
        'Referer' => 'https://querylist.cc/',
        'User-Agent' => 'testing/1.0',
        'Accept'     => 'application/json',
        'X-Foo'      => ['Bar', 'Baz'],
        'Cookie'    => 'abc=111;xxx=222'
    ]
]);
echo $ql->getHtml();

Output:

{
  "args": {
    "param1": "testvalue",
    "params2": "somevalue"
  },
  "headers": {
    "Accept": "application/json",
    "Connection": "close",
    "Cookie": "abc=111;xxx=222",
    "Host": "httpbin.org",
    "Proxy-Connection": "Keep-Alive",
    "Referer": "https://querylist.cc/",
    "User-Agent": "testing/1.0",
    "X-Foo": "Baz"
  },
  "origin": "222.141.11.17",
  "url": "http://httpbin.org/get?param1=testvalue&params2=somevalue"
}
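For comparison with the file_get_contents equivalence shown earlier, here is a hypothetical plain-PHP sketch (makeProxyContext is an invented name) that configures the same proxy, timeout and headers through PHP's http stream context options instead of GuzzleHttp. The two are not identical: the stream wrapper expects the proxy as tcp://host:port, and its timeout is a read timeout, whereas GuzzleHttp's timeout option limits the whole request. The proxy address is copied from the example and may no longer be live.

```php
<?php
// Hypothetical plain-PHP counterpart of the proxy example, using PHP's
// http stream wrapper instead of GuzzleHttp.
function makeProxyContext(string $proxy, float $timeout, array $headers)
{
    $headerLines = [];
    foreach ($headers as $name => $value) {
        $headerLines[] = $name . ': ' . $value;
    }
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'proxy'           => $proxy,   // stream wrapper format: tcp://host:port
            'request_fulluri' => true,     // send the absolute URI, as HTTP proxies usually expect
            'timeout'         => $timeout, // read timeout in seconds
            'header'          => implode("\r\n", $headerLines),
        ],
    ]);
}

$context = makeProxyContext('tcp://222.141.11.17:8118', 30, [
    'Referer'    => 'https://querylist.cc/',
    'User-Agent' => 'testing/1.0',
]);

// Performing the request needs network access and a working proxy:
// $body = file_get_contents('http://httpbin.org/get?param1=testvalue&params2=somevalue', false, $context);
```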

More HTTP features

GuzzleHttp is a very powerful HTTP client that covers almost any HTTP feature you are likely to need. For more usage, see the GuzzleHttp documentation: http://guzzle-cn.readthedocs.io/zh_CN/latest/request-options.html