BatHome (批处理之家) - Powered by Discuz! Board

Title: [Other] Is there a third-party command-line tool that can convert the redirect links in Baidu search results into the actual URLs?

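Author: 我来了  Time: 2016-8-31 12:03  Title: Is there a third-party command-line tool that can convert the redirect links in Baidu search results into the actual URLs?

Last edited by pcl_test on 2016-9-3 19:28

For example:

http://www.baidu.com/link?url=Wu73A5XbmmXbZhGlXCtulGo9VW6nUhnXwxMI4cjO5_j9mVXde7r9LcP5h1GF0qR1&wd=&eqid=8f503133000038ff0000000257c655f0 =====> http://www.bathome.net
Is there a command-line tool that can convert a link like this into the real URL?

The trouble is that I don't know the right keywords for this kind of thing, or any related tools, so I'm asking for help here!

Thanks. I've tried many keywords without finding a way.

Additional details: the search-engine results from the first pass have to be filtered by hand, in a targeted way (pasted into a TXT file)...
and then read in the next step, so ideally this should be done with a command-line tool.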

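Author: happy886rr  Time: 2016-8-31 12:21

Last edited by happy886rr on 2016-8-31 12:35

Reply to #1 我来了
I don't think there is one. Besides, your URL contains a lot of special characters: http://...qR1&wd=&eqid=8f503... contains two "&" characters, which are ambiguous on the command line (cmd treats "&" as a command separator). I don't believe such a tool exists; this is hard to handle with cmd.
And these are all redirect links that carry no keyword.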

Author: 523066680  Time: 2016-8-31 12:45

Hmm... judging from other threads, it doesn't seem to be decodable directly.

http://bbs.125.la/forum.php?mod= ... 6%E6%90%9C%E7%B4%A2

Author: codegay  Time: 2016-8-31 12:57

You have to actually visit the link, with wget, curl or something similar.
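As codegay says, the real address only shows up once the link is actually requested. A minimal Python sketch using only the standard library, assuming the link server answers either with a 3xx redirect carrying a Location header or with a 200 page that embeds the target (as in the capture further down the thread): request the link once without following redirects and read the Location header. The names fetch_no_redirect and redirect_target are hypothetical, and the User-Agent value is an arbitrary assumption.

```python
import http.client
from urllib.parse import urlsplit


def fetch_no_redirect(url, timeout=10):
    """GET `url` once without following redirects.

    Returns (status, location_header_or_None, body_text).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # Assumption: a browser-like User-Agent; the exact value is arbitrary
        conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, resp.getheader("Location"), body
    finally:
        conn.close()


def redirect_target(status, location):
    """Return the redirect target for a 3xx answer, otherwise None."""
    if 300 <= status < 400 and location:
        return location
    return None
```

Typical use: status, location, body = fetch_no_redirect(link), then redirect_target(status, location); when that returns None, the target has to be read out of body instead. When a link like this is passed to any program from cmd, wrap it in double quotes so its "&" characters are not treated as command separators.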

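Author: 我来了  Time: 2016-8-31 13:10

You have to actually visit the link, with wget, curl or something similar.
codegay posted on 2016-8-31 12:57

Since there's no such tool, never mind. I'll set up a virtual machine, open a browser, and use 按键精灵 (a keyboard/mouse macro tool) to read the results.

It's a clumsy approach, but as long as it works. Thanks, everyone, for taking the time to reply~~

Go ahead and lock the thread, Moderator~~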

Author: pcl_test  Time: 2016-8-31 13:42

mshta http://bathome.net/s/hta "web('http://www.baidu.com/link?url=Wu73A5XbmmXbZhGlXCtulGo9VW6nUhnXwxMI4cjO5_j9mVXde7r9LcP5h1GF0qR1&wd=&eqid=8f503133000038ff0000000257c655f0').match(/URL='([^']+)'/)[1]"

Author: wskwfkbdn  Time: 2016-8-31 14:05

A plain GET returns the page source, which already contains the target URL. I captured this with a packet sniffer:

https://www.baidu.com/link?url=e ... f7b0000000457c67115

<meta content="always" name="referrer"><script>try{if(window.opener&&window.opener.bds&&window.opener.bds.pdc&&window.opener.bds.pdc.sendLinkLog){window.opener.bds.pdc.sendLinkLog();}}catch(e) {};var timeout = 0;if(/bdlksmp/.test(window.location.href)){var reg = /bdlksmp=([^=&]+)/,matches = window.location.href.match(reg);timeout = matches[1] ? matches[1] : 0};setTimeout(function(){window.location.replace("http://www.bathome.net/")},timeout);</script>
<noscript><META http-equiv="refresh" content="0;URL='http://www.bathome.net/'"></noscript>
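The captured page carries the target twice: in the JavaScript call window.location.replace("...") and in the noscript meta refresh URL='...'. pcl_test's one-liner reads the second one with /URL='([^']+)'/. A minimal Python sketch of the same idea, trying the meta refresh first and falling back to the script call; it assumes the page keeps exactly this shape, which Baidu may change at any time, and extract_target is a hypothetical helper name.

```python
import re

# Patterns modelled on the captured page above (an assumption about its format)
META_REFRESH = re.compile(r"URL='([^']+)'", re.IGNORECASE)
LOCATION_REPLACE = re.compile(r'location\.replace\("([^"]+)"\)')


def extract_target(html):
    """Return the real URL embedded in a Baidu redirect page, or None."""
    for pattern in (META_REFRESH, LOCATION_REPLACE):
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None


page = ('<script>setTimeout(function(){window.location.replace('
        '"http://www.bathome.net/")},timeout);</script>\n'
        '<noscript><META http-equiv="refresh" '
        'content="0;URL=\'http://www.bathome.net/\'"></noscript>')
print(extract_target(page))  # http://www.bathome.net/
```

Checking both spots means the helper still works if one of them is dropped from the page, but any larger change to the markup would need new patterns.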

Author: 我来了  Time: 2016-8-31 15:34

pcl_test posted on 2016-8-31 13:42

Moderator, I'm a bit slow and don't quite get it. What if it's an arbitrary link, such as
https://www.baidu.com/link?url=q ... 04e0000000657c68765
?

Author: 我来了  Time: 2016-8-31 15:39

mshta http://bathome.net/s/hta "web('https://www.baidu.com/link?url=XXR7bWJ1n0S0rbKpvT6FWixt863bF3sIUEEC4inUGJRPDj2OAiUva4c5vIcULrdE&wd=&eqid=82a228fa00000c1a0000000257c686b3').match(/URL='([^']+)'/)[1]"
Got it: just change the part highlighted in pink.

And if I don't want a popup, how do I write the result to a file instead?

Author: codegay  Time: 2016-8-31 17:18

Last edited by codegay on 2016-8-31 17:20

Python has a library called selenium, mainly used for web testing:

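from selenium import webdriver

# Needs Chrome plus a matching chromedriver on PATH
driver = webdriver.Chrome()
try:
    # The browser follows Baidu's redirect page by itself
    driver.get('https://www.baidu.com/link?url=qjLLUDJLIlOx0TUSg_xpz-Zcjnos1TprMc6_3H0XyXcD0OQD5RSeaPwKMtkzDUC1_G2uXfkzF2bos7uoPclSda&wd=&eqid=be17ee7a0000a04e0000000657c68765')
    print(driver.title)        # title of the final page
    print(driver.current_url)  # the real URL after the redirect
    input("Paused, press Enter to quit")
finally:
    driver.quit()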

Output:

jQuery页面滚动图片等元素动态加载实现 jquery.scrollLoading.js的运用-天云网络
http://www.itiyun.com/jquery-scrollloading-js.html

Author: happy886rr  Time: 2016-8-31 18:43

Reply to #10 codegay
It raises an error under Python 3.5. Which version of Python are you using?

Author: codegay  Time: 2016-8-31 21:15

Reply to #11 happy886rr

Python 3.4.
Install chromedriver
and add it to PATH.

Author: codegay  Time: 2016-8-31 21:17

Reply to #11 happy886rr

A similar tool is ghost.py.

Install PyQt4,
then run pip install ghost.py
https://github.com/jeanphix/Ghost.py

from ghost import Ghost
ghost = Ghost()

with ghost.start() as session:
    page, extra_resources = session.open("http://jeanphix.me")
    assert page.http_status == 200 and 'jeanphix' in page.content

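Author: happy886rr  Time: 2016-8-31 22:51

Reply to #13 codegay
These modules are quite a hassle to install. I kept failing with "cannot find Chrome binary" and all sorts of conflicts. Also, my browser is 360 Browser; I don't know whether it's supported.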

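Author: codegay  Time: 2016-8-31 23:17

Try installing phantomjs as well.
The documentation didn't say it was needed, so I hadn't installed it either, and in the end I didn't use from selenium.webdriver import PhantomJS.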

http://phantomjs.org/download.html

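Author: codegay  Time: 2016-8-31 23:21

Reply to #14 happy886rr

ghost.py seems to run in the background, so it probably doesn't need a browser.
Selenium's webdriver features use Chrome as a container to run the page's code, so Chrome is required.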

Author: 523066680  Time: 2016-8-31 23:56

Last edited by 523066680 on 2016-8-31 23:59

So either way, the page has to be fetched online, right?
Perl with LWP::UserAgent:

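use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $url = "http://www.baidu.com/link?url=Wu73A5XbmmXbZhGlXCtulGo9VW6nUhnXwxMI4cjO5_j9mVXde7r9LcP5h1GF0qR1&wd=";

# get() always returns an HTTP::Response object, so check it explicitly
my $res = $ua->get($url);
die $res->status_line, "\n" unless $res->is_success;

# The target sits inside window.location.replace("...")
print "$1\n" if $res->decoded_content =~ /\("(.*?)"\)/;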

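Perl with LWP::Simple:

use strict;
use warnings;
use LWP::Simple;

my $url = "http://www.baidu.com/link?url=Wu73A5XbmmXbZhGlXCtulGo9VW6nUhnXwxMI4cjO5_j9mVXde7r9LcP5h1GF0qR1&wd=";

# get() returns undef on failure
my $content = get($url);
die "Request failed\n" unless defined $content;

# The target sits inside window.location.replace("...")
print "$1\n" if $content =~ /\("(.*?)"\)/;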

Author: happy886rr  Time: 2016-9-1 14:37

Reply to #16 codegay
Indeed, it only works with Google Chrome and Firefox.
That ghost.py is nice; it runs.

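Author: 我来了  Time: 2016-9-3 17:51

Reply to #17 523066680

I remember that the last time I used 火车头采集 (a web-scraping tool) on Bing results,
once it was set up it returned the links from the first few pages.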

Author: 我来了  Time: 2016-9-3 19:38

HwndEx = Plugin.Window.Search("cmd.exe")
MyArray = Split(HwndEx, "|")
If UBound(MyArray)>=0 Then
Delay 50
MessageBox "没运行批处理"
Else
RunApp "C:\Users\wcc\Desktop\test.cmd"
End if
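I finally understand why the script moves on to the following steps before the batch file has finished: cmd.exe stays in the process list the whole time,
which throws off the code below: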

Rem 下一个
读第1行 = Lib.文件.读取指定行文本内容("C:\Users\wcc\Desktop\要读取的原始链接.txt", 1)
Delay 100
Call Lib.文件.删除指定行文本内容("C:\Users\wcc\Desktop\要读取的原始链接.txt", 1)
Call Plugin.File.DeleteFile("C:\Users\wcc\Desktop\test.cmd")   

MessageBox 读第1行
批处理 = "mshta http://bathome.net/s/hta " & """web('"&读第1行&"').match(/URL='([^']+)'/)[1]"" " & ">C:\Users\wcc\Desktop\转换后的要读取的.txt"
MessageBox 批处理

Call Plugin.File.WriteFileEx("C:\Users\wcc\Desktop\test.cmd", 批处理)
Delay 1000
RunApp "C:\Users\wcc\Desktop\test.cmd"






/////////////////////


Text = Plugin.File.ReadFileEx("C:\Users\wcc\Desktop\转换后的要读取的.txt")
MessageBox Text
///////////////////////
Rem 判断是否生成转换后的要读取的文本
IsFile = Plugin.File.IsFileExit("C:\Users\wcc\Desktop\转换后的要读取的.txt")
If IsFile = True Then 
    Delay 100
    goto 写入文件
Else 
    Delay 100
    Goto 判断是否生成转换后的要读取的文本
End If

Rem 写入文件

Call Plugin.File.WriteFileEx("C:\Users\wcc\Desktop\转换后的要读取的.txt", 内容)
RunApp "F:\PowerPro_4.9n7\配置文件夹\nircmd.exe clipboard addfile C:\Users\wcc\Desktop\转换后的要读取的.txt"
Goto 下一个
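The macro above has to poll for cmd.exe and for the output file because RunApp returns before the batch file has finished. If the lookup runs inside the same process, each call simply returns when it is done and the loop needs no Delay/IsFileExit polling. A minimal Python sketch of the read, convert, write loop, assuming one link per line in a UTF-8 text file; convert_lines, convert_file and the resolve callable passed to them are hypothetical, and resolve stands for any function that turns a Baidu link into the real URL (for example one that fetches the redirect page and parses it).

```python
import http.client
from pathlib import Path


def convert_lines(links, resolve):
    """Resolve each non-empty link; failures become "FAILED <link>".

    `resolve` is any callable mapping a Baidu link to the real URL or None.
    """
    results = []
    for raw in links:
        link = raw.strip()
        if not link:
            continue  # skip blank lines
        try:
            target = resolve(link)
        except (OSError, http.client.HTTPException):  # network / HTTP errors
            target = None
        results.append(target if target else f"FAILED {link}")
    return results


def convert_file(in_path, out_path, resolve):
    """Read links (one per line, UTF-8) and write one result per line."""
    links = Path(in_path).read_text(encoding="utf-8").splitlines()
    results = convert_lines(links, resolve)
    Path(out_path).write_text("".join(r + "\n" for r in results), encoding="utf-8")
    return len(results)
```

Keeping failed links in the output, marked with FAILED, keeps the result file line-aligned with the input so nothing is silently dropped.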

Author: 我来了  Time: 2016-9-4 08:47

Reply to #17 523066680

What if I only want the search-result links from the first 10 pages of
https://www.baidu.com/s?ie=utf-8 ... u&wd=bbflash%20破解版&oq=bbflash%20破解版&rsv_pq=bbfb65b30000572f&rsv_t=a97b52zNwocqQUEwcVGZv9VfGnHAnmsfVWHxKE730OXajWQLS1y5GApJnbo&rqlang=cn&rsv_enter=0
Can that be done with a batch file as well?

Author: pcl_test  Time: 2016-9-4 15:18

Last edited by pcl_test on 2016-9-4 17:01

Getting the links from Baidu search results (a batch/JScript hybrid; save it as a .bat file):

//&cls&cscript -nologo -e:jscript "%~f0"&pause&exit

var keyword = 'bbflash 破解版';  // search keyword
var n=10;  // total number of result pages to fetch
for(var i=0;i<(n*10);i+=10)GetURL(i, keyword);

function GetURL(page, str){
    var url = 'https://www.baidu.com/s?pn='+page+'&ie=utf-8&wd='+encodeURIComponent(str);
    try{var http = new ActiveXObject('MSXML2.XMLHTTP')}
    catch(e){var http = new ActiveXObject('WinHttp.WinHttpRequest.5.1')}
    http.open('GET', url, false);
    http.send();
    var htmltext = http.ResponseText;
    var reg = new RegExp('<h3\\s?[^<]*?>[\\s\\S]*?<a\\s?[^<]*?(data-click=[^<]+?)?href\\s*?=\\s*?"([^"]+)"[^<]+?>([\\s\\S]+?)<\\/a>', 'ig');
    var result='', m=0, t=page/10+1;
    WSH.echo('-----------------------第'+t+'页搜索结果-----------------------');
    while((result = reg.exec(htmltext)) != null){
        m++;
        WSH.echo(t+'-'+m+'、'+result[3].replace(/<\/?em>/ig, '')+'\r\n百度加密链接:'+result[2]);
        http.open('GET', result[2]+'&wd=', false);
        http.send();
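        // Skip results whose redirect page has no URL='...' (match() returns null)
        var target = http.ResponseText.match(/URL='([^']+)'/);
        if(target) WSH.echo('原链接:'+target[1]+'\r\n');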
    }
    http=null;
}

Output:

-----------------------第1页搜索结果-----------------------
1-1、BBFlashBack破解版|BB FlashBack Pro(屏幕录像机) v5..._xp系统之家
百度加密链接:http://www.baidu.com/link?url=So ... *N_r58Sw7gQkUF5HOfa
原链接:http://www.xp510.com/xiazai/media/Recording/11348.html

1-2、BB FlashBack Pro下载v5.12.0 汉化破解版_五星级屏幕捕捉和记录...
百度加密链接:http://www.baidu.com/link?url=TE ... oPQggExnkdbOkIukP5P
原链接:http://www.cr173.com/soft/17292.html
……
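pcl_test's script walks the result pages through the pn parameter (0, 10, 20, ...) and pulls each h3 title and link out of the HTML with a regular expression. A minimal Python sketch of those two steps: the URL layout is copied from the script, while the simplified regex assumes each result is an h3 element that directly contains one a href="..." link, which is an assumption about Baidu's markup at the time and may no longer hold. page_urls and parse_results are hypothetical names.

```python
import re
from urllib.parse import quote

# Simplified from the script's regex: an <h3>, then the first <a href="..."> inside it
RESULT = re.compile(r'<h3[^>]*>\s*<a[^>]*?href\s*=\s*"([^"]+)"[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def page_urls(keyword, pages=10):
    """Search URLs for the first `pages` result pages (10 results per page)."""
    q = quote(keyword, safe="")  # same escaping as encodeURIComponent for this text
    return [f"https://www.baidu.com/s?pn={pn}&ie=utf-8&wd={q}"
            for pn in range(0, pages * 10, 10)]


def parse_results(html):
    """Return (title, link) pairs; <em> highlight tags are stripped from titles."""
    return [(re.sub(r"</?em>", "", title, flags=re.IGNORECASE).strip(), href)
            for href, title in RESULT.findall(html)]
```

Each link found this way is still a baidu.com/link?url=... redirect, so it has to be requested once more to get the real address, just as the script does with its second http.open call.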

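Author: 我来了  Time: 2016-9-4 17:03

Last edited by pcl_test on 2016-9-4 17:15

Reply to #22 pcl_test

I changed it as follows; could someone passing by please help me out? Thanks.
I kept only one output statement:
WSH.echo(''+http.ResponseText.match(/URL='([^']+)'/)[1]+'\r\n'); How should this line be changed so that the output goes to c:\1.txt?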

Welcome to BatHome (http://www.bathome.net/)