Are there any updated IP ranges for Baidu and Yandex? I want to block them.

ihotVPS · May 17, 2013, 11:28am

This is what I'm blocking at the moment:

61.135.192.0/18 # cn-youdaobot
77.88.0.0/18 # ru-yandex
77.91.224.0/24 # ru-webalta
87.250.224.0/19 # ru-yandex
92.241.182.0/24 # ru-webalta
93.158.128.0/18 # ru-yandex
95.108.128.0/17 # ru-yandex
119.63.192.0/21 # jp-baidu
123.125.64.0/18 # cn-baidu
178.154.128.0/17 # ru-yandex
180.76.0.0/16 # cn-baidu
193.47.80.0/24 # fr-exabot
213.180.192.0/19 # ru-yandex
220.181.0.0/18 # cn-chinanet-baidu
220.181.108.0/24 # cn-chinanet-baidu
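
A list like this can be checked against addresses from an access log before it goes into a firewall. The following is only a minimal sketch using Python's standard `ipaddress` module; the helper names `parse_blocklist` and `match` are made up for illustration, and the sample addresses in the usage note are arbitrary.

```python
import ipaddress

# Blocklist copied from the post above, in "CIDR # label" form.
BLOCKLIST = """\
61.135.192.0/18 # cn-youdaobot
77.88.0.0/18 # ru-yandex
77.91.224.0/24 # ru-webalta
87.250.224.0/19 # ru-yandex
92.241.182.0/24 # ru-webalta
93.158.128.0/18 # ru-yandex
95.108.128.0/17 # ru-yandex
119.63.192.0/21 # jp-baidu
123.125.64.0/18 # cn-baidu
178.154.128.0/17 # ru-yandex
180.76.0.0/16 # cn-baidu
193.47.80.0/24 # fr-exabot
213.180.192.0/19 # ru-yandex
220.181.0.0/18 # cn-chinanet-baidu
220.181.108.0/24 # cn-chinanet-baidu
"""


def parse_blocklist(text):
    """Return (network, label) pairs parsed from 'CIDR # label' lines."""
    entries = []
    for line in text.splitlines():
        cidr, _, label = line.partition("#")
        cidr = cidr.strip()
        if not cidr:
            continue  # skip blank or comment-only lines
        entries.append((ipaddress.ip_network(cidr), label.strip()))
    return entries


def match(ip, entries):
    """Return the label of the first range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, label in entries:
        if addr in network:
            return label
    return None


ENTRIES = parse_blocklist(BLOCKLIST)
```

For example, `match("180.76.5.10", ENTRIES)` reports the `cn-baidu` range, while an address outside every range gives `None`.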

pizzaman911 · May 17, 2013, 11:45am

Here is the information straight from the source:

6. How can I prevent Baiduspider from crawling my site?

Baiduspider works on the robots.txt protocol. You can prevent Baiduspider from crawling your entire site or the specific contents by specifying them in robots.txt. Please note that by doing this, the pages of your site will not be found in Baidu search results and in any other the search results which is provided by Baidu. For details of setting a robots.txt, please see How to create a robots.txt

You can set different rules towards different user-agents. (Please note Baiduspider-video does not support the rules currently). If you prefer to prevent all the user-agents of Baidu, you can simply block Baiduspider.

Below robots command will block all the crawling from Baidu.
User-agent: Baiduspider
Disallow: /

Below robots command will allow Baiduspider-image only to crawl the directory of /image/
User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-image
Allow: /image/

Please note that the pages that crawled by Baiduspider-cpro will not be built into the index and Baiduspider-cpro works on the agreement that set with customers. In this case, Baiduspider-cpro will not work on the records set by robots.txt. If you are not comfortable with Baiduspider-cpro, please contact union1@baidu.com. Baiduspider-ads will not be built into the index and Baiduspider-ads works on the agreement that set with customers. In this case, Baiduspider-ads will not work on the records set by robots.txt. If you are not comfortable with Baiduspider-ads, please contact your customer service representative.
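
To confirm what the first rule set quoted above actually says, it can be fed to Python's standard `urllib.robotparser`. This is just a sketch: it only evaluates the rules as written and says nothing about whether a given crawler honors them. `Googlebot` and `/index.html` are arbitrary stand-ins for "some other agent" and "some page".

```python
from urllib.robotparser import RobotFileParser

# The "block all Baidu crawling" rules from the quoted FAQ.
RULES = """\
User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# What the rules permit for Baiduspider and for an unrelated agent.
baidu_allowed = parser.can_fetch("Baiduspider", "/index.html")
other_allowed = parser.can_fetch("Googlebot", "/index.html")

print("Baiduspider allowed:", baidu_allowed)
print("Other agent allowed:", other_allowed)
```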

ton1 · May 17, 2013, 2:58pm

That won't hold it back. Baiduspider doesn't pay attention to robots.txt.

360 · May 17, 2013, 3:02pm

Baidu draws Chinese IPs to the server.

Yandex draws Russian IPs to the server.

I still haven't seen any benefit from these two bots.

These are the ranges I've added recently:

180.76.5.0/24
180.76.6.0/24

You can look up more at

https://ipdb.at/org/Beijing_Baidu_Netcom_Science_and_Technology_Co.,_L

Last updated 16/05/2013.
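
One thing worth checking before adding new ranges is whether an existing entry already covers them. As a small sketch with Python's standard `ipaddress` module, the two /24 ranges in this post can be compared with the 180.76.0.0/16 entry from the first post's list:

```python
import ipaddress

# Range already present in the first post's list.
existing = ipaddress.ip_network("180.76.0.0/16")
# Ranges mentioned in this post.
added = [ipaddress.ip_network(c) for c in ("180.76.5.0/24", "180.76.6.0/24")]

# True for each added range that lies entirely inside the existing one.
covered = [net.subnet_of(existing) for net in added]
print(covered)

# collapse_addresses drops ranges that are contained in a larger one.
merged = list(ipaddress.collapse_addresses([existing, *added]))
print(merged)
```

If every entry in `covered` is true, the /16 alone is enough and the /24 lines add nothing to the block list.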

kke · May 17, 2013, 4:14pm

When these bots show up, any site whose database isn't properly indexed can see its server go down.

pizzaman911 · May 17, 2013, 9:29pm

I've seen people on English-language forums discussing this too. Some say the bot does obey robots.txt, but because we block its IPs it can't read the file, and so on.

I don't know what ton1 has actually run into.

ton1 · May 18, 2013, 11:16am

In my experience Baiduspider ignores robots.txt. Even with a Disallow rule it still comes crawling.
So I block it at the caching layer (Varnish): if the User-Agent matches Baiduspider|baiduspider,
it gets a 403 error right away. There's no other way to cope. Every time it visits, the load on the system is very heavy.

domainxhosting · May 18, 2013, 11:18am

Could you share a how-to for the caching (Varnish) configuration?
Thanks in advance.

ton1 · May 18, 2013, 11:26am

Add this to the vcl_recv section:

# Block Baidu Spider
# (Varnish 3 syntax; Varnish 4 and later use return (synth(403, "...")) instead of error)
if (req.http.user-agent ~ "(Baiduspider|baiduspider)") {
    error 403 "Forbidden, specified in robots.txt. You should know better.";
}

domainxhosting · May 18, 2013, 11:28am

Thank you.

ihotVPS · May 18, 2013, 11:46am

Is there a version for Nginx? I'd like to have it. Thanks.

icez · May 18, 2013, 11:46am

# ~* is a case-insensitive regex match on the User-Agent header
if ($http_user_agent ~* "baiduspider|yandex") {
    return 403;
}
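
Before deploying a User-Agent rule like this, it can help to try the pattern against a few User-Agent strings from the access log. The sketch below mirrors the case-insensitive, unanchored match in Python; the function name `is_blocked` is made up for illustration, and the sample strings are the commonly published Baiduspider and YandexBot User-Agents plus an ordinary browser string.

```python
import re

# Same alternation as the Nginx rule, matched case-insensitively anywhere
# in the header value.
BLOCKED_UA = re.compile(r"baiduspider|yandex", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the User-Agent would hit the 403 rule."""
    return BLOCKED_UA.search(user_agent or "") is not None


samples = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

for ua in samples:
    print(is_blocked(ua), ua)
```

Keep in mind that a User-Agent header can be changed by the client, so this kind of rule only stops bots that identify themselves.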