GIGJ.COM
vBulletin Spiders Directory (for updated spiders_vbulletin.xml files)
Published by: jack 2009-01-07

  • Radio and TV Player - Page 3 - vBulletin.org Forum::
    16 posts - Last post: Jan 22, 2008. vBulletin Spiders Directory (download updated spiders_vbulletin.xml files and submit the spiders you know)
    http://www.vbulletin.org/forum/showthread.php?t=167567&page=3
    Heya,

    I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.

    Hope this helps everyone.

    http://spiderlist.codeforgers.com

    ================================================

    FYI, this is a nice mod made by Paul M. to keep track of your daily spiders visits:

    http://www.vbulletin.org/forum/showthread.php?t=167278


  • Added spider Internet for learning.

    Ah good, wasn't sure whether that was enough information. :) New to this spider identification.


  • Here are all reported spiders so far: http://www.vbulletin.com/forum/showthread.php?t=76662
    Feel free to compare it against your list.


  • Thought of making a .htaccess version (or similar) for Apache, Solaris, BSD, Linux, Windows, Mac. It should not be that hard to make a list of bad bots. :)
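
As a rough sketch of the .htaccess idea above: given a list of unwanted user-agent idents, the rules could be generated with a few lines of Python. The ident `ExampleBadBot` and the function name `htaccess_rules` are made up for illustration (the thread does not name any bad bots), and the directives assume Apache 2.2-style access control, so treat this as a starting point rather than a tested blocklist.

```python
def htaccess_rules(bad_idents):
    """Build Apache 2.2-style rules that deny requests whose User-Agent matches an ident."""
    # SetEnvIfNoCase treats the pattern as a case-insensitive regular expression,
    # so idents containing regex metacharacters would need escaping first.
    lines = ['SetEnvIfNoCase User-Agent "%s" bad_bot' % ident for ident in bad_idents]
    lines += ["Order Allow,Deny", "Allow from all", "Deny from env=bad_bot"]
    return "\n".join(lines)


# "ExampleBadBot" is a hypothetical ident, not one reported in this thread.
rules = htaccess_rules(["ExampleBadBot"])
print(rules)
```

The output can be pasted into a .htaccess file. Whether a given bot should be blocked at all is a separate decision that this sketch does not make.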


  • Here's one for you to add to the list:


    Accoona
    209.212.73.133
    accoona-a133.client.pins.net


  • 2 more guests and one of them is the Accoona again. Accoona keeps showing up as a guest.


    server1932015481.serverpool.info

    Mozilla/5.0 (compatible; http://www.whoisde.de/2.1; +http://www.whoisde.de)



    accoona-a133.client.pins.net

    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1


  • I've been trying to 'watch' my guest ips lately to add some to your list, but I really don't get too many other than yahoo, google, and msnbot (which I already have listed on my site).


  • This is excellent, now I can see the actual names instead of a forum packed full of hungry guests.

    Also: I submitted a spider, it's called "Visions".


  • Did you submit the user agent? The "ident" field is not the IP, it's the user agent. You can get that on the who's online page.


  • This thread seems to be picking up a couple IM,ing feeds.. :)


  • Thank you Dream

    I totally forgot about the spider list :D

    This is really good

    Thanks again ;)


  • Wow just updated my xml and caught 10 Yeti spiders already. I think they are from www.naver.com, but I can't be sure as the site is in chinese.


  • Yes I use that too. Were you using the latest XML?


  • I didn't want to add everything because some spiders don't even exist anymore.

    I ask your help to submit the spiders you know exist to the system.


  • Added GurujiBot.
    Thanks :)

    A little question about livebot spider.


    Livebot

    Isn't this a computer with the .NET Framework installed?


  • Added another:
    TinEye/1.1 (http://tineye.com/crawler.html)


  • I don't understand what you mean.


  • MR K, I'm not sure what's up, the list should have detected the bot with one of those two idents used... You said they were being detected before? Did you upload the xml to the correct folder?

    Yes, absolutely. It runs, because it detects the other bots like Google or Yahoo ... I put the file into the /includes/xml folder ...


  • updated


  • Thanks, added as eBingBong.


  • There's no way to detect it by IP address?


  • No one congratulated me for my leet captcha skills :(

    Thanks Floris, I only didn't add all of those because I think most don't exist anymore. I could use some help checking the ones that do exist and registering them.


  • Thanks, approved both.


  • That's weird, omgili is already on the list.


  • Very needed, thank you.

    _V


  • No Yahoo! Slurps are being detected? You should post this in the support forums.


  • :rolleyes:

    The ident couldn't be used...


  • I think the help in the spider site has info on that.


  • Hello,

    The latest file, "Thu 26th Jun 2008", doesn't work.
    No spiders are shown. I think the file is not correct.
    Thank you.

    They show fine for me. Maybe you got a bad copy.


  • the spider was added

    if you submit more, just put in the ident field the string that is unique to the spider, no need for those "Mozilla 4.0" etc, and I'm not sure but I think that can change. I'm gonna do a FAQ now and add this to it.


  • Hello,

    The latest file, "Thu 26th Jun 2008", doesn't work.
    No spiders are shown. I think the file is not correct.
    Thank you.


  • I use spiders as random "foes" appearing on my forum. MSN as an Orc, Yahoo as a Goblin, etc. :cool:
    http://www.comicguide.net/images/smilies/lolrot.gif


  • Three weeks and not one submission :(

    And you probably won't get any until they see an update of some sort released. That seems to get people motivated.


  • Just to let you know why I like this project so much:
    I show adsense adverts to guests only.
    I have the hack that shows spiders separately on the home page.
    By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.

    I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.
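
A minimal sketch of the "weekly, not daily" check such a scheduled task might use. The download itself is left out because the thread does not give a direct URL for the XML file; `needs_refresh` and `REFRESH_INTERVAL` are illustrative names, not part of vBulletin.

```python
from datetime import datetime, timedelta

# Assumed refresh period, following the "weekly" suggestion in the post above.
REFRESH_INTERVAL = timedelta(days=7)


def needs_refresh(last_fetched, now, interval=REFRESH_INTERVAL):
    """Return True if the spider list was never fetched or is at least one interval old."""
    if last_fetched is None:
        return True
    return now - last_fetched >= interval


last = datetime(2008, 6, 26)
print(needs_refresh(last, datetime(2008, 6, 29)))  # three days old
print(needs_refresh(last, datetime(2008, 7, 3)))   # seven days old
```

Keeping the interval check separate from the download makes it easy to lengthen the period later if traffic to the list server becomes a concern.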


  • Twiceler is already on the list.


  • Excellent work, thank you! :)

    You are welcome :)

    Dream, the snapshots spiders aren't detected ... (48 hours are gone) i know this cause i perfectly know its IP ... :)
    I added your Snapbot, see if you can detect them now :)


  • You are the man! Excellent job! ;)

    Why is the xml file so much smaller than the other one?

    I noticed that, because it doesn't have the spider type I think.


  • Removed, problem solved.


  • Added spider Najdi.si.


  • Thanks, added.


  • Sorry I didn't notice that was the IP.

    There's one Accoona in the xml, but I never said it was duplicated, did I? Anyway, when you see it as a guest in your forums again, paste the User Agent for me ok? So I can fix it.

    edit: oh yes, someone else submitted it for you actually.


  • I really don't care if Dream added it or not.... it's not a spider, period.

    If he chooses to add it, or not add it - despite the glaring evidence it is NOT a spider, then that's totally up to him.

    A massive part of my job involves checking log files and I stand to be corrected only if you can post me URLs proving this to be a spider.

    As far as I am concerned, I've proved it isn't - if you wish to call black white, then please post the proof.

    You ought to learn to read closer. The last spider I submitted was not the FunWebProducts that you are referring to. That issue is done. The spider I posted, I posted the link for the proof.


  • Nope, I added it, thanks.


  • I noticed that, because it doesn't have the spider type I think.

    Shouldn't the type be in there?


  • i've just found another spider that isn't recognized (from yahoo) ... check the attachment ...


  • Ok, I apologise.... I must have mis-interpreted the conversation.

    Here are some regular bots on my boards:-

    85.225.137.240
    Mozilla/4.0 (BejiBot Crawler 1.2a)

    88.131.106.7
    Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)

    82.80.252.110
    BoardTracker (http://www.boardtracker.com/spider.html) (Mozilla/4.0 compatible; MSIE 6.0; Linux Cent

    61.247.217.36
    Yeti/0.01 (nhn/1noon, yetibot@naver.com, check robots.txt daily and follow it)

    209.11.177.198
    Mozilla/4.0 (compatible; BOTW Spider; +http://botw.org)

    142.166.3.122
    R6_CommentReader(www.radian6.com/crawler (http://www.radian6.com/crawler))

    thanks

    added BejiBot, Bot W, Radian6 Comment Reader, Radian6 FeedFetcher and Yeti

    the others were already there


  • 1,000 downloads :)


  • Here are all reported spiders so far: http://www.vbulletin.com/forum/showthread.php?t=76662
    Feel free to compare it against your list.

    That one hasn't been updated for a while though. Is there anything newer?


  • Thanks :), I added it.


  • Exactly. I have no idea why it showed up as a Guest, but it did, sir. ;).


  • Here is a list I use.

    http://www.botsvsbrowsers.com/category/1/index.html

    It's the biggest I've seen so far.

    First. is that fairly accurate? And second, wanna share the xml file for it? ;)


  • Another one not showing up ...

    38.98.19.67
    Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9


  • yep
    Cool, I had no idea there were this many bots out there. Awesome, thanks for making it. So do you just import it somehow into vbulletin?


  • A second Yeti bot was added.


  • I just submitted the ident string for the Accoona spider that keeps showing up as a guest.


  • Added

    Internet Research Institute UK
    Scrubby


  • There has to be a way around this even if it means us coming up with a hack to do it. Are you game?

    Sorry this goes beyond my interest in the problem, but if you get someone to do it I'll add IPs to the system.

    Whatever you guys decide is good by me, let me know if I have to remove the spider.


  • I never used it before doing this project either.


  • I thought about making a mailing, I'll do it eventually. Just subscribe this thread for now.


  • not sure of the email or website, but i submitted this ...

    livebot-65-55-165-84.search.live.com
    livebot-65-55-165-114.search.live.com

    65.55.165.84 (http://www.gangroomforum.com/online.php?do=resolveip&ipaddress=65.55.165.84) & 65.55.165.114 (http://www.gangroomforum.com/online.php?do=resolveip&ipaddress=65.55.165.84)
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

    Is this what you're looking for? Thanks for all the work you do with the spider list. It's greatly appreciated. :)


  • Thanks, added.


  • Hi,

    I'd like to add that a few spiders/bots are not being removed using your list together with the mod I described above. See the attached picture.

    Maybe Yeti isn't being removed because it identifies itself as Yeti/1.0, see screenshot.

    Cheers,
    Gabriel.


  • Very nice! I'll be using this.

    Heya,

    I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.

    Hope this helps everyone.

    http://spiderlist.codeforgers.com


  • ok removed mrsputnik and added google mobile spider
  • Show Search Engines on the Home Page (sprits) Help ::
    $spider = $vbulletin->wol_spiders['agents']["$agent"]; .. Upload the Upload\includes\xml\spiders_vbulletin.xml file to your forum/includes/xml/ folder.
    http://www.hanemiz.com/ana-sayafada-arama-motorlari-gozuksun-sprits-yardim-t73176.html


  • I'm just the spider list janitor :P


  • Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.

    So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p


  • Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.

    So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p
    You deserve a hug. give me a hug.

    Regards


  • The ident or user agent is the ID of the spider, that shows in the who's online when you choose to show the user agent, for example

    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

    In this case, this is the Google spider with the ident Googlebot.

    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

    This is the Yahoo! spider with ident Yahoo! Slurp.
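
To make the ident idea concrete, here is a small sketch of matching an ident against a user agent. It assumes a plain case-insensitive substring test, which is how the posts in this thread describe the ident working; it is not taken from vBulletin's source, and `find_spider` is an illustrative name.

```python
def find_spider(user_agent, idents):
    """Return the name of the first spider whose ident occurs in user_agent, else None."""
    ua = user_agent.lower()
    for ident, name in idents.items():
        if ident.lower() in ua:
            return name
    return None


# Idents taken from the two examples above.
IDENTS = {
    "Googlebot": "Google",
    "Yahoo! Slurp": "Yahoo!",
}

google_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
yahoo_ua = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
firefox_ua = "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"

print(find_spider(google_ua, IDENTS))
print(find_spider(yahoo_ua, IDENTS))
print(find_spider(firefox_ua, IDENTS))
```

An ordinary Firefox user agent matches neither ident, so that visitor would keep showing up as a guest.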


  • Spider TinEye added.


  • This is a good list, but again you would need to click through or do an internet search to make sure these spiders are still active. I would still like to be able to ban a spider, as some do just take your images.
    http://www.user-agents.org/


  • just added another msnbot for the verification ...
    That one is not MSNBOT, it's users who have .NET installed, it was discussed earlier in the thread.


  • Thanks, added.


  • Maybe, I don't know for sure.

    You are right:

    http://www.webmasterworld.com/forum11/2715.htm

    .net clr 1.1.4322 will be present in any IE (90%+ of the market) that is on a machine with .net FrameWork installed. I think this framework became part of Service Pack 2, so its very widespread -- you will ban innocent people.

    I removed Livebot from the list.


  • http://dream.epicfailed.us/ ?


  • Someone submitted yours I think, I thought it was you honestly.

    You just look at the spider list and see if the spider is already there. If not you can submit it.

    The list is updated whenever there are new spiders to approve.

    Is there a way to send out a notice when it is updated by chance?

    And I will keep an eye out for spiders and report them to you as I have some pesky ones show up every now and then.


  • I am not familiar at all with spider lists, so please forgive my newbie questions: is this spider list suitable for all types of sites?

    I suppose what it does is get your site spidered by the spiders that are listed in the .xml file? Wouldn't these spiders normally find your sites by themselves?

  • Pool spiders on Yedda - People. Sharing. Knowledge.::
    vBulletin Spiders Directory (for updated spiders_vbulletin.xml files). Heya, I made a system where people can submit spiders and download updated
    http://yedda.com/questions/Pool_spiders_1498151176856/


  • Here's a strange one:

    38.104.58.118 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=38.104.58.118)
    panscient.com
    (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=38.104.58.118)
    The panscient.com was the full User Agent string, believe it or not.


  • Well then tell me so in the note, or you expect I discover "T312461" is an ident? Why didn't you submit only "T312461"?

    If you don't like my work, then use something else. I don't have to put up with demands and anger as I'm not being paid to do this. And you talk about my attitude.


  • You don't owe me anything, use whatever you like ;)

    To get the User Agent, in the Who's Online page there's an option at the bottom "show user-agent: yes / no". It's that simple.


  • Of course I'm not aiming to have India website cloning bots Jose...


  • Nothing, it doesn't work ... however, Dream, I submitted the 'new' agent of the snapbot with its IPs ... check it out ... thx


  • I quoted one already and there are many more on the net. Use the search like I did. MRSPUTNIK is spammers, no matter how you try to justify it. Are you one of them?


  • Great to hear that :)


  • No problem. As I said, I need the User Agent of the spider, not the IP. This is a sample user agent:

    Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

    I don't know how the old xml can get more spiders, as this one has all spiders the old one has. But it's your call and you may be right, if you find out why please tell me.

    Also, the Omgili spider was added and the file updated.

    Why does the Accoona one keep showing up as a guest now?

    I'll try your file again. I owe you that much.

    How do I go about getting the user agent stuff? When I resolve it it doesn't come up with that stuff.


  • I thought it was already in there but it came in as a guest so I thought I should report it.


  • Here you go.

    WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru


  • Before you roll your eyes in that sarcastic tone, maybe you ought to check this out and learn how to use an IDENT properly.

    http://www.botsvsbrowsers.com/details/46560/index.html

    T312461 is what you need.


  • added Attributor and YandexBlog


  • rofl :)

    Means he recognizes his fault and went: EPIC FAIL

    Nice job though Dream, I am looking forward to an updated version.


  • I installed your list of spiders about a week or so ago and now I see all sorts of the things that I had never heard about before! "Long, thin, slimy ones; Short, fat, juicy ones, Itsy, bitsy, fuzzy wuzzy spiders." Ok, ok, that is supposed to be regarding worms, but I thought it was appropriate here. :)


  • is mail.ru a spam site or it's "ok"?

    mail.ru is one of the biggest webmail services in Russia, just like hotmail.com or yahoo.com in the US.


  • Here's one for you to add to the list:

    Someone else submitted it, and I approved it.


  • if i see it, ill let you know. :)


  • Ok, added EnaBall spider.


  • Many thanks Dream!


  • It's hard to believe that the spiders only like my site. ;)
    I don't know how you notice them all!


  • Will do.

    Here is another one:


    64.13.138.6 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=64.13.138.6)
    Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)


  • Just submitted this one

    Charlotte
    Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/) (2549df004ae664faef17dce174913cea

    http://www.searchme.com/support/pages/spider.php


    info@searchme.com


  • This is pretty much why I do not care for having this renewed, as you think you have good bots and don't care that some bots do harm, for example by using any images you have and then posting them somewhere else.

    I was thinking more something like this, but it would seem vBulletin is moving away from using or adding an area to show names of banned spiders coming to your board.
    http://www.vbulletin.com/forum/showpost.php?p=1496224&postcount=616


  • Thanks, I added it to the list.


  • New one not picked up.

    *submitting*

    c0c.entireweb.com
    Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)


  • Cool :) don't forget to submit some spiders if you know of any.

    Dream could offer a spider.xml file for the FAQ for admins to import. And instructions on how to change a phrase to link from who's online to the faq explanation. Might be cool for some sites. (just brainstorming)

    I might not be the best person to create a FAQ about spiders, I'm just the guy who coded the system. I have to confess I don't know exactly what the "ident" is, if anyone would be so kind to explain.

    Also, those spider types (rss, search) the old xml has: does anyone know if vBulletin uses that info?


  • I am not familiar at all with spider lists, so please forgive my newbie questions: is this spider list suitable for all types of sites?

    I suppose what it does is get your site spidered by the spiders that are listed in the .xml file? Wouldn't these spiders normally find your sites by themselves?

    Can't help you with the first question, but to answer your second one ...
    it doesn't help get your site spidered, it simply identifies the spiders/bots that are coming to your site. :)


  • Sorry I was really busy this week.

    I updated the list, but deleted the toolbars and plugins as they aren't spiders.

    Boofo, the accoona user agents you are submitting only contain common stuff that any user agent can have. For vbulletin to be able to detect a spider the Ident must have just the string that is unique to the spider. Like this:

    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)

    The detection ident is

    Deepnet Explorer
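
One way to sanity-check a submitted ident is to see whether it also appears in ordinary browser user agents. The sketch below does that against two user agents quoted elsewhere in this thread; `is_too_generic` and the sample list are illustrative and far from exhaustive.

```python
# Ordinary browser user agents quoted elsewhere in this thread.
ORDINARY_BROWSERS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
]


def is_too_generic(ident, browser_agents=ORDINARY_BROWSERS):
    """Return True if the ident also occurs in a user agent sent by a normal browser."""
    needle = ident.lower()
    return any(needle in ua.lower() for ua in browser_agents)


print(is_too_generic("Deepnet Explorer"))
print(is_too_generic(".NET CLR 1.1.4322"))
print(is_too_generic("Mozilla/4.0"))
```

Only "Deepnet Explorer" passes. The other two strings would flag regular visitors as spiders, which is the problem described for the Livebot entry.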


  • Accoona does the same thing. They have figured out how to squeeze by. ;)


  • MR K, I'm not sure what's up, the list should have detected the bot with one of those two idents used... You said they were being detected before? Did you upload the xml to the correct folder?


  • Here are two that are HAMMERING MY SITE.:


    c138.cyan.fastwebserver.de <--This one the prefix changes color ( purple.fastwebserver.de..etc etc)

    and


    ns.km31707.keymachine.de


    I have no idea what they are but I have blocked the IP addresses and they STILL keep hammering away.


    HELP!!



    -Dave


  • The other one's ident is only 'omgilibot'.


  • Great idea, and job!!! I will definitely be downloading and using them, :) Thanks.
    It would be nice if this thread was stickied.

    I use spiders as random "foes" appearing on my forum. MSN as an Orc, Yahoo as a Goblin, etc. :cool:


  • I won't bother replying that. Re-submit your spider and I'll add it.


  • I already am subscribed. ;)

    Let me know if I can help in any way.


  • Actually, the ones you are submitting appear on my site too :)

    But yeah, I'm a very lazy guy :p


  • I have no idea what this is for. Forgive my ignorance, but what does this do for a forum? Does it just help vbulletin list the correct bot that is sucking content from the forums?


  • sweet thanks!


  • If people expect I code and host the system AND add 200 spiders the project dies here.

    If Stadler had had that attitude it never would have started in the first place. If it dies, it dies. I was just letting you know what it takes to get people motivated.


  • Well, I did not know that. How do I know if it is already been submitted? Better to submit it than not, right?

    How often do you update the spiders xml for download?


  • The types are: blog crawler, RSS crawler or crawler etc.

    See the definitions in the XML file in this thread for more information http://www.vbulletin.com/forum/showpost.php?p=565415


  • just updated the spider.xml nothing is changed ... (ok i'll wait a bit) however check the pic attached ... it's a weird '+' before the 'http' ...


  • That's arguable, people may want to see how many people using that service are there

    Then I think MRSPUTNIK needs to be re-added as I originally reported. As you can see, the link I posted says what it is. His argument pretty well re-enforces my original report.


  • Well, I decided it shouldn't, for some reason at the start of the project.

    Most info on that was wrong I have a hunch too. I think the ones he didn't know were classified searchspiders. I considered later adding spider type, but haven't got to it, and thought having the spider website was enough.

    Also I'm not sure vbulletin uses the spider type field.


  • I think it's because I have no real life except for catching spiders.


  • Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.

    So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p

    You are the man! Excellent job! ;)

    Why is the xml file so much smaller than the other one?


  • Here's another one for you. This one came in as a Guest.

    Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Dealio Toolbar 3.1 Firef

    this one doesn't look like a spider


  • Ok, added


  • I do apologise, I use the spiders.xml to tell my guests who have visited plugin (http://www.vbulletin.org/forum/showthread.php?t=131314) which visitors are spiders and as this one doesn't identify as a spider, I assumed that it wasn't in the latest list I downloaded.


  • Here's another one:

    93.103.33.238 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=93.103.33.238)
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19

    And here is a link talking about what it is:

    http://www.webhostingtalk.com/showthread.php?t=660662

    It's wrong to include this user agent in the spiders list because this is a normal user with the mail.ru Agent application installed (something like Google Talk). Many people from Russian-speaking countries install mail.ru Agent. Please remove it.


  • Here is a list I use.

    http://www.botsvsbrowsers.com/category/1/index.html

    It's the biggest I've seen so far.


  • The bad thing is I just caught another spider posing as a Guest. The IDENT string was normal, but after following the link and resolving it, it turned out to be a spider site. I posted it in the thread where you were asking about the spider IP. We have got to figure out a way to catch the ones that slip past the IDENT string check.

    With the type of site I have with such low traffic, it is easier to catch them on there than it is on a much busier site. Most of them have no business there as to what the site is about.


  • Just to let you know why I like this project so much:
    I show adsense adverts to guests only.
    I have the hack that shows spiders separately on the home page.
    By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.

    I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.

    I'd be interested in seeing what you come up with for that hack. If you need any help, let me know. And good to see you again, sir. ;)


  • Actually I meant no one seems to care about this so far. I thought the first half-hour after I posted this would go:

    - dream you rock!
    - YAY!!!

    But it's fine, I only lost one night's sleep over this, so no biggie.
    Never mind, Thanks are not the only reason behind why we do work.


  • Oh ok :) cheers


  • Did you bother to check the link and this? MRSPUTNIK

    That is no regular user.


  • Was it different than what was already in there? I didn't look.


  • Yeti/1.0 added.


  • You sure it's a spider? Could be someone on the offices where the IP is used I think. Say an employee from accoona is surfing. Maybe unlikely though, depending on what your site is.


  • Never mind for the type of the crawler, or never mind as I shouldn't mind people not coming into this thread?


  • Then I think MRSPUTNIK needs to be re-added as I originally reported. As you can see, the link I posted says what it is. His argument pretty well re-enforces my original report.
    Then we should add that MSN .NET service too by that logic...


  • Excellent work, thank you! :)


  • Three weeks and not one submission :(


  • Google Spider
    Searching Forums
    User: Beerman1 (http://www.monstermayhem.org/forums/member.php?u=79)

    Yahoo! Slurp Spider
    Viewing Who Posted
    sweet find (http://www.monstermayhem.org/forums/showthread.php?t=4065)



    Why do Google spiders show up with a physical user of the site? The others only show the spider and what thread it is picking up which is what I assume is the way it is supposed to work.


  • Well, where's Stadler? I'm not motivated to add 200 spiders alone, and I'm just letting you know. If you think my attitude is wrong so be it.


  • This just came in as a guest:

    194.90.190.48
    omgilibot/0.3 +http://www.omgili.com/Crawler.html


  • Added spider Panscient, thanks.

    In my forums it shows as Google Spider. Are you sure the user name isn't on the field "activity"? As in, searching for posts of user XX?

    This was a simple cut and paste from the Who's Online list.


  • Could someone please provide me with info on how to gather the data needed to submit a spider? I am using the most up to date xml file and when I view guests on my forum, I see a list of IP addresses. Once I click on some of them, they resolve to names with the word "spider" in them.

    With that said, I'm assuming they are not listed within the XML file I am using and they are a bot.

    How do I gather enough info on a spider to submit it?



    Thanks guys


  • I forgot about that on the Who's Online page. I never use it. ;)


  • You are welcome Alfa1 :)


  • No you just upload it to includes/xml/.

    Just don't forget to not overwrite it when you upgrade your vb.
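
For anyone who prefers to script the upload, a small sketch follows. It copies a downloaded list into includes/xml/ and keeps a .bak copy of whatever was there before. The function name and the throwaway demo directory are illustrative; only the includes/xml/spiders_vbulletin.xml location comes from this thread.

```python
import shutil
import tempfile
from pathlib import Path


def install_spider_list(new_file, forum_root):
    """Copy new_file to <forum_root>/includes/xml/spiders_vbulletin.xml, keeping a .bak of any old copy."""
    target = Path(forum_root) / "includes" / "xml" / "spiders_vbulletin.xml"
    target.parent.mkdir(parents=True, exist_ok=True)
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        shutil.copy2(target, backup)  # preserve the list that is about to be replaced
    shutil.copy2(new_file, target)
    return target, backup


# Demo in a throwaway directory standing in for a real forum root.
with tempfile.TemporaryDirectory() as tmp:
    forum_root = Path(tmp) / "forum"
    old = forum_root / "includes" / "xml" / "spiders_vbulletin.xml"
    old.parent.mkdir(parents=True)
    old.write_text("old list")
    new = Path(tmp) / "downloaded.xml"
    new.write_text("new list")

    target, backup = install_spider_list(new, forum_root)
    installed_text = target.read_text()
    backup_text = backup.read_text()

print(installed_text, "/", backup_text)
```

If a vBulletin upgrade replaces the file with the stock one, running the same step again with the downloaded list puts the updated list back.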


  • You told me to submit the IDENT. That is what I did. All I did to find out it was a legit spider was to do a Google search for T312461. Aren't you even doing that?

    I don't like your work because you don't bother to check them out. Or if you are, then you need to find another source. You whined because no one was submitting spider info. I start submitting them and you reject them. What sense does that make? Just because you are doing a spider list, doesn't make you an expert or any better than the rest of us. Come down off of your high horse and join the rest of the world.


  • Just submitted this one

    Charlotte
    Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/) (2549df004ae664faef17dce174913cea

    http://www.searchme.com/support/pages/spider.php


    info@searchme.com
    This was submitted a few posts back.


  • Ok, but that ident has no unique string from Accoona. vBulletin won't be able to detect it with that.


  • Another one.

    EnaBot/1.2 (http://www.enaball.com/crawler.html)


  • Another one:


    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)


  • Just added three:
    Windows RSS Windows-RSS-Platform/1.0
    Windows RSS Windows-RSS-Platform/2.0
    Mozilla/4.0 (vBSEO; http://www.vbseo.com)

    I've taken the windows platform off of the Windows RSS ident string as that would unnecessarily bloat the number of idents needed and I'm not sure anyone would care whether it's XP or Vista that the RSS requests are coming from.
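
The trimming described above can be sketched as stripping the trailing parenthesised section from the user agent. The two full user agent strings below are assumed examples of what the RSS platform sends on XP and Vista; they are not quoted from this thread, and `strip_platform` is an illustrative name.

```python
import re


def strip_platform(user_agent):
    """Drop a trailing parenthesised platform section, keeping the product token."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", user_agent)


# Assumed example strings; only the "Windows-RSS-Platform/1.0" part appears in the post above.
xp_ua = "Windows-RSS-Platform/1.0 (MSIE 7.0; Windows NT 5.1)"
vista_ua = "Windows-RSS-Platform/1.0 (MSIE 7.0; Windows NT 6.0)"

print(strip_platform(xp_ua))
print(strip_platform(vista_ua))
```

Both inputs reduce to the same ident, so one entry covers every Windows version.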


  • Just to let you know why I like this project so much:
    I show adsense adverts to guests only.
    I have the hack that shows spiders separately on the home page.
    By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.

    I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.
    I would think Google has a list of spiders of their own, and don't show adverts to those spiders on googlesyndication.com, but I'm not sure.


  • If people expect I code and host the system AND add 200 spiders the project dies here.


  • Not that I know of, sorry. Stadler's XML had IP entries, but I'm not sure if vBulletin can detect spiders by IP.

    To be honest, I'm curious now and I'm gonna ask it in the support forums. If vBulletin does detect spiders by IP, I'll add spiders IPs to the system.
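
Whether vBulletin itself matches on IP is left open above. As a standalone illustration, an IP lookup could look like the sketch below. The addresses are the single hosts reported in this thread; real crawlers usually use whole ranges, which the thread does not list, so the table is an assumption for demonstration only.

```python
import ipaddress

# Single addresses reported in this thread, stored as /32 networks.
KNOWN_SPIDER_NETWORKS = {
    "Accoona": [ipaddress.ip_network("209.212.73.133/32")],
    "SnapPreviewBot": [ipaddress.ip_network("38.98.19.67/32")],
    "ScoutJet": [ipaddress.ip_network("64.13.138.6/32")],
}


def spider_for_ip(ip, networks=KNOWN_SPIDER_NETWORKS):
    """Return the spider name whose network contains ip, else None."""
    addr = ipaddress.ip_address(ip)
    for name, nets in networks.items():
        if any(addr in net for net in nets):
            return name
    return None


print(spider_for_ip("209.212.73.133"))
print(spider_for_ip("192.0.2.1"))  # documentation-only address, matches nothing
```

This would catch a visitor like the Accoona one that sends an ordinary Firefox user agent, at the cost of having to keep the address table current.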


  • I think it's very cool that you did this, Dream. Don't forget that it is the weekend and people may be out and about at the time.


  • ok removed mrsputnik and added google mobile spider

    Thank you.

    Another user agent to be removed (IMHO) is Google Wireless Transcoder. In fact it's not a crawler, but a service (http://google.com/gwt/n). So, even if the connection is coming from Google's IP, there's a user browsing forum thru this service.


  • Sorry this goes beyond my interest in the problem, but if you get someone to do it I'll add IPs to the system.

    No problem, I'll take care of it on my own.

    Here are some regular bots on my boards:-


    Did you already add these to the spiders list then?


  • Thanks, added.


  • Ok I added it, wasn't it on there already though? It was there with a different ident, it wasn't being picked up?


  • I added some now, hope I filled out everything correctly.
    I think you should add some on your own as well...
    I approved the 4 you sent

    I'm adding the ones I find in my forum


  • Thanks, yes maybe you are right :)

    I use the old spiders list (http://www.vbulletin.com/forum/showpost.php?p=565415) on my forums, and I'm adding them to the new list as I see them appear on my forums, just so you know. Please only submit spiders from the old list if you know they still exist.


  • Add this to the top of your robots.txt file:

    I'd rather just checkmark a spider than do the tedious task of typing it out.

    I do know about that method and .htaccess; I'd just rather have it shown in the AdminCP, where I could choose which spider can access which area and block any spider from the whole site.


    First. is that fairly accurate? And second, wanna share the xml file for it? ;)

    Some are dead links and so goes for what I have posted...


  • This accoona-a133.client.pins.net is showing up as a Guest. It was added in your last update as Accoona, which I am using. It is not showing as a Spider. The IP address is 209.212.73.133. Maybe taking that extra code, as you call it, out of the old spiders xml might not have been such a good idea? I don't have my setup configured to resolve IP addresses.


  • Added spider Internet for learning.


  • Thanks, that bot was in the list, I just updated the Ident.


  • Sorry I was really busy this week.

    I updated the list, but deleted the toolbars and plugins as they aren't spiders.

    Boofo, the Accoona user agents you are submitting only contain common stuff that any user agent can have. For vBulletin to be able to detect a spider, the Ident must contain just the string that is unique to that spider. Like this:

    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)

    The detection ident is

    Deepnet Explorer

    Sorry, but I gave you the IDENT string for the Accoona one straight from Who's Online. Maybe that's how it slips by the robots.txt file.
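To illustrate the point about unique idents, here is a small sketch. It assumes detection is a case-insensitive substring match of each ident against the full user agent, which is what the explanation above implies; the table and function name are made up for the example and are not vBulletin's own code.

```python
# Hypothetical ident table: unique substring -> display name.
SPIDER_IDENTS = {
    "deepnet explorer": "Deepnet Explorer",
    "omgilibot": "Omgili",
    "yodaobot": "Yodao",
}

def detect_spider(user_agent):
    """Return the name of the first ident found inside the user agent, else None."""
    ua = user_agent.lower()
    for ident, name in SPIDER_IDENTS.items():
        if ident in ua:
            return name
    return None

deepnet = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
           "Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)")
firefox = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) "
           "Gecko/20061204 Firefox/2.0.0.1")

print(detect_spider(deepnet))   # matched through the unique substring
print(detect_spider(firefox))   # nothing unique to match on
```

Under this assumption, an ident such as "Mozilla/5.0" or "Windows NT 5.1" would match nearly every visitor, which is why a submission made only of common tokens cannot be used as an ident.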


  • thank you dream for the latest lists. :)


  • Thanks zappsan :)

    I think "woriobot" catches both "woriobot heritrix" and "woriobot", so I removed the heritrix one. I'm not 100% sure of that though, so if you see it again please let me know.
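If idents really are matched as substrings (an assumption, as the post itself says it is not 100% certain), then redundant entries can be found mechanically. A sketch with a hypothetical helper name:

```python
def prune_redundant(idents):
    """Drop idents that already contain another, shorter ident as a substring."""
    # Shortest first, so a short ident is kept before the longer ones it covers.
    ordered = sorted({i.lower() for i in idents}, key=lambda s: (len(s), s))
    kept = []
    for ident in ordered:
        if not any(short in ident for short in kept):
            kept.append(ident)
    return kept

print(prune_redundant(["woriobot heritrix", "woriobot", "omgilibot"]))
```

Here "woriobot heritrix" is dropped because any user agent containing it also contains "woriobot".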


  • http://dream.epicfailed.us/ ?
    What do you mean?


  • Well, all I know is I get 4 to 5 guests at a time and I know they are spiders as they don't do anything. What spiders they are, I don't know. As for any in the listing that no longer exist, it won't hurt to leave them in there for now. But someone needs to update the list, even if only a few at a time.


  • yep


  • Here's another one:

    93.103.33.238 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=93.103.33.238)
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19

    And here is a link talking about what it is:

    http://www.webhostingtalk.com/showthread.php?t=660662


  • You ought to learn to read closer. The last spider I submitted was not the FunWebProducts that you are referring to. That issue is done. For the spider I posted, I posted the link as proof.

    Ok, I apologise.... I must have mis-interpreted the conversation.

    Here are some regular bots on my boards:-

    85.225.137.240
    Mozilla/4.0 (BejiBot Crawler 1.2a)

    88.131.106.7
    Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)

    82.80.252.110
    BoardTracker (http://www.boardtracker.com/spider.html) (Mozilla/4.0 compatible; MSIE 6.0; Linux Cent

    61.247.217.36
    Yeti/0.01 (nhn/1noon, yetibot@naver.com, check robots.txt daily and follow it)

    209.11.177.198
    Mozilla/4.0 (compatible; BOTW Spider; +http://botw.org)

    142.166.3.122
    R6_CommentReader(www.radian6.com/crawler)


  • http://www.cuill.com/twiceler/robot.html


  • I added some new ones which I came across today.
    I also resubmitted one, it had a different ident than the first time I've seen it (I've explained it in the notes field).


  • Added

    Begun Robot Crawler
    Mail.Ru


  • Not sure if this has been reported ....

    crawl2.nat.svl.searchme.com
    Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/)


  • And added this one too:
    NetNewsWire/3.1b4 (Mac OS X; Lite; http://www.newsgator.com/Individuals/NetNewsWire/)


  • I quoted one already and there are many more on the net. Use the search like I did.

    You quoted a thread with a link to the mail.ru Agent, nothing more. Also, as I previously explained, MRSPUTNIK is a string inserted into the browser user-agent when mail.ru Agent software is installed (it's not spyware, it's not malware, it's not a virus, and it's not spamware). Just FYI: mail.ru is a Russian top-10 portal, getting about 15,000,000 unique visitors per month. And here is the home page of mail.ru Agent (http://agent.mail.ru/en/) (in English).

    MRSPUTNIK is spammers, no matter how you try to justify it. Are you one of them?

    Hmmm, some people are so brilliant, nothing can escape them :rolleyes:


  • See my cut and paste at the top of my post. Google Bot shows as user "Beerman". I was wondering why it lists as a user and not like the Yahoo Bot that is just viewing.


  • This is sweet. Thanks for the hard work, man.


  • this one doesn't look like a spider

    I wasn't really sure on that one but I thought it better to report it than take the chance it might be a spider.


  • Just added these two:
    Yodao Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )
    Mac OS X RSS Apple-PubSub/59


  • Well I do, it keeps with the setting, :)

    Spiders appear anyways, so I figured I should put them to good use, :cool:
    I just remembered when I first started separating out the guests and spiders count on my forum and people were wondering what spiders were. I'm trying to imagine what they would think if I changed the spider names to "goblins" and "orcs". Sounds like some fun for October though.


  • The livebot spider entry seems to be catching anonymous proxies. I confirmed this by visiting my site through a webpage that offers an anonymous browsing service and, sure enough, it identified me as a livebot spider. I'm also afraid that the ns.km31707.keymachine.de spider that was submitted a few posts back is actually a spambot.


  • This is a good list, but again you would need to click through or do an internet search to make sure these spiders are still active. I would still like to be able to ban a spider, as some just take your images.
    http://www.user-agents.org/

    Add this to the top of your robots.txt file:

    User-Agent: Googlebot-Image
    Disallow: /
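One way to sanity-check a rule like this before uploading it is Python's standard `urllib.robotparser`. This is only a sketch: the path is a made-up example, and robots.txt is advisory, so it only affects crawlers that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the post above.
rules = """\
User-Agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "/images/photo.jpg" is a made-up path used only for this check.
image_bot_allowed = parser.can_fetch("Googlebot-Image", "/images/photo.jpg")
other_bot_allowed = parser.can_fetch("Yahoo! Slurp", "/images/photo.jpg")

print(image_bot_allowed)   # the image crawler is shut out of the whole site
print(other_bot_allowed)   # crawlers with no matching rule are unaffected
```

The rule only names Googlebot-Image, so every other crawler falls through to the default of being allowed.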


  • nice job Dream :)

    I have a little question about this: yesterday I upgraded vB and unfortunately forgot to delete the newer file from the vB package, so today I can't detect the spiders correctly. I have just uploaded your latest file, but I saw that it doesn't recognize some spiders, like 'snapshots', that your older files did recognize. So the question is: how much time does it take before the system runs 'correctly'?


  • Another one:


    Deepnet Explorer

    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)


  • Got another one that just hit the site along with 25 yahoo spiders.


    69.90.42.67 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=69.90.42.67)
    Mozilla/5.0 (compatible; OWPBot/0.3; http://www.openwhitepages.com/)


  • That's arguable; people may want to see how many people are using that service.


  • A second Yeti bot was added.

    You are the spider/bot God :cool:


  • just added another msnbot for the verification ...


  • That last one, Boofo, is just some dodgy IE plugin, a bit like mywebsearch or some other nasty. Definitely not a spider ;)

    I checked out their site and it looked like a spider to me. You may be right, I don't know. Better to be safe than sorry in reporting it. ;)

    Is anyone still updating the list?


  • Maybe, I don't know for sure.

    You are right:

    http://www.webmasterworld.com/forum11/2715.htm



    I removed Livebot from the list.

    oops ... never mind about the last post. :D


  • I changed the Ident from http://www.omgili.com/Crawler.html to omgilibot, tell me if it still doesn't work.


  • Another one.


    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; BCD2000; SV1; FunWebProducts)


  • added soso


  • Added GurujiBot.


  • Here's another one that showed up as a Guest:

    194.90.190.48 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=194.90.190.48)
    omgilibot/0.3 +http://www.omgili.com/Crawler.html


  • Congratulations! Although there are many repeat ones, I'm sure. ;)


  • Did you bother to check the link and this? MRSPUTNIK

    That is no regular user.

    Yes, I did. If MRSPUTNIK is included in the spiders list we will get wrong results, because it's not really a bot, but a regular user just browsing the forum with mail.ru Agent (and some other products from mail.ru) installed.


  • The spider I was referring to in the post you quoted was just added by Dream. ;)

    I really don't care if Dream added it or not.... it's not a spider, period.

    If he chooses to add it, or not add it - despite the glaring evidence it is NOT a spider, then that's totally up to him.

    A massive part of my job involves checking log files and I stand to be corrected only if you can post me URLs proving this to be a spider.

    As far as I am concerned, I've proved it isn't - if you wish to call black white, then please post the proof.


  • Sorry I didn't notice that was the IP.

    There's one Accoona in the xml, but I never said it was duplicated, did I? Anyway, when you see it as a guest in your forums again, paste the User Agent for me, ok? So I can fix it.

    edit: oh yes, someone else submitted it for you actually.

    And nobody said you said any such thing. What is that all about?

    Here is another one for you, but this time only the IP showed and it was showing as a Guest. It is explained in the quote box.


    omgilibot

    http://www.omgili.com/Crawler.html

    Here is the IP that shows:
    194.90.190.48

    I got the Ident info from doing a var_dump for something else. The IP is all that showed for this and it was showing as a guest. The IP did NOT resolve to anything other than itself. The ident came from the var_dump.


    In these next ones, the first one shows up fine. The next three show up as guests. And your accoona-a133.client.pins.net is still showing up as a guest.


    livebot-65-55-209-98.search.live.com - MSNBot Spider
    livebot-65-55-165-117.search.live.com - http://search.msn.com/msnbot.htm
    livebot-65-55-165-52.search.live.com - http://search.msn.com/msnbot.htm
    livebot-65-55-165-42.search.live.com - http://search.msn.com/msnbot.htm


    I think I'm going back to Stadler's version as it caught a lot more spiders than this version does. I'll just add them to that as I find them. Good luck!


  • Thanks, added.


  • Added SiteVibeBot

    Can anyone confirm this one gets detected? I'm trying to understand vB's regex for the IDENT string (because if I ask in the How To forum no one will know).


  • So, I had 12 other regular people using mail.ru that same day, at the same time, on my little 65-member site? Not likely.


  • This is what I submitted and you said it had already been submitted. It is in the current xml file.

    accoona-a133.client.pins.net

    but it shows up as a guest, not a spider.


  • Dream could offer a spider.xml file for the FAQ for admins to import, and instructions on how to change a phrase to link from Who's Online to the FAQ explanation. Might be cool for some sites. (just brainstorming)


  • thanks


  • It has CAPTCHA! I'm so proud of myself :P


  • Thanks buro9, your submissions were added and are greatly appreciated.


  • Yeah I agree with you, it's most likely a spider, depending on the occurrence and URL it's visiting.

    There's not much I can do but to whine for them to add IP recognition in the suggestions forum though.

    There has to be a way around this even if it means us coming up with a hack to do it. Are you game?


  • Submitted dragonfly

    ebingbong#playstarmusic.com (though they say the # is an @?)
    http://www.ebingbong.com/help/ourRobot.php


  • Yes I use that too. Were you using the latest XML?

    Downloaded again last night to double check and the searchme spider still shows as a regular guest. :) Also upgraded to the latest release of product to make sure that it wasn't at fault, but made no difference.

    This of course has not too much to do with this thread and I would like to say how much I appreciate you keeping the spider list updated. :)


  • Added spider Panscient, thanks.

    Google Spider
    Searching Forums
    User: Beerman1 (http://www.monstermayhem.org/forums/member.php?u=79)

    Yahoo! Slurp Spider
    Viewing Who Posted
    sweet find (http://www.monstermayhem.org/forums/showthread.php?t=4065)



    Why do Google spiders show up with a physical user of the site? The others only show the spider and what thread it is picking up which is what I assume is the way it is supposed to work.

    See my cut and paste at the top of my post. Google Bot shows as user "Beerman". I was wondering why it lists as a user and not like the Yahoo Bot that is just viewing.

    In my forums it shows as Google Spider. Are you sure the user name isn't on the field "activity"? As in, searching for posts of user XX?


  • Trust me, it's a spider. Check out my site URL and tell me what you think then.

    http://www.fathers-rights-forums.com/forums/

    Then you and I will form a plan to tackle this dilemma. ;)


  • Another one.

    That last one, Boofo, is just some dodgy IE plugin, a bit like mywebsearch or some other nasty. Definitely not a spider ;)


  • It should detect as soon as you upload the new file.


  • Cool. Sorry, IM as in instant messenger feeds?


  • No problem. As I said, I need the User Agent of the spider, not the IP. This is a sample user agent:

    Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

    I don't know how the old xml can get more spiders, as this one has all spiders the old one has. But it's your call and you may be right, if you find out why please tell me.

    Also, the Omgili spider was added and the file updated.


  • Do you want all spiders or just the good ones on that list of yours?


  • the second :)


  • Great idea Dream, I'll be using this for other projects rather than just vB :)


  • Actually I meant no one seems to care about this so far. I thought the first half-hour after I posted this would go:

    - dream you rock!
    - YAY!!!

    But it's fine, I only lost one night's sleep over this, so no biggie.

    You may add:
    - the type of the crawler.
    - how many spiders your file contains.
    - when last updated.

    Regards

    I added both, but as for the type of the crawler, I'm not sure what you mean.

    If you have more suggestions let me know.
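As an illustration of the "how many spiders" suggestion, here is a sketch that counts entries in a spider list. The XML layout below (a `spider` element with an `ident` attribute and a `name` child) is an assumption made for this example and may not match the real spiders_vbulletin.xml schema; the sample entries are ones mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Assumed layout, for illustration only; check it against your actual file.
SAMPLE = """\
<searchspiders>
  <spider ident="omgilibot"><name>Omgili</name></spider>
  <spider ident="yodaobot"><name>Yodao</name></spider>
</searchspiders>
"""

def summarize(xml_text):
    """Return (count, [(ident, name), ...]) for every spider element found."""
    root = ET.fromstring(xml_text)
    spiders = [(s.get("ident"), s.findtext("name")) for s in root.iter("spider")]
    return len(spiders), spiders

count, spiders = summarize(SAMPLE)
print(count)
for ident, name in spiders:
    print(ident, "->", name)
```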


  • Here's another one for you. This one came in as a Guest.


    Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Dealio Toolbar 3.1 Firef


  • Well, according to what I have read, MRSPUTNIK is spammers. It's up to Dream, it is his list.


  • Yeah I agree with you, it's most likely a spider, depending on the occurrence and URL it's visiting.

    There's not much I can do but to whine for them to add IP recognition in the suggestions forum though.


  • That is why I am using the latest one he did. It may be old and outdated on a few Spiders but it does recognize a lot of them. That is better than nothing in my book. It isn't like the old days when everyone jumped in for a common purpose. No one said you had to add 200 spiders. All I said was give them an update, even if only with 10 new spiders, to whet their appetite. If you aren't willing to do that, then scrapping it is probably your best bet.


  • That's a pretty good idea. I've always been looking for an updated spiders list.

    I might not be the best person to create a FAQ about spiders, I'm just the guy who coded the system. I have to confess I don't know exactly what the "ident" is, if anyone would be so kind to explain.

    I'm not sure about the ident either. What would I have to put in there? It says user-agent, should I just put the info here which is displayed when I choose to display the user agent?

    I am willing to help if I'm sure about what I should put into the fields.


  • That last spider I submitted IS a spider. I checked out their web site. No sense in submitting them if they are going to get ignored.

    Boofo, I beg to differ greatly. This is the scumware that is mywebsearch IE toolbars and their many variants. Why else would they identify in many different IE versions?

    I would love to see the source for your information...

    Here are my sources:-
    http://www.webmasterworld.com/forum39/1510.htm
    http://www.seroundtable.com/archives/001430.html

    There are even lots of pages across the net detailing how to get rid of said 'scumware'. Here is just one of them:-
    http://www.liamdelahunty.com/tips/fun_web_products.php


  • Well, according to what I have read, MRSPUTNIK is spammers. It's up to Dream, it is his list.

    Well, please quote your sources. Anyway, it's possible that guys running browsers with this user-agent string spam some forums, but that's true for any user-agent string.


  • Another: MLBot (www.metadatalabs.com)


  • Heya,

    I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.

    Hope this helps everyone.

    http://spiderlist.codeforgers.com

    thank you great work mate :D


  • You may add:
    - the type of the crawler.
    - how many spiders your file contains.
    - when last updated.

    Regards


  • The spider I was referring to in the post you quoted was just added by Dream. ;)


  • That was quick. And you're welcome. ;)


  • So, I had 12 other regular people using mail.ru that same day, at the same time, on my little 65-member site? Not likely.

    Boofo, I don't know why there were 12 regular people browsing your little site, but I know for sure that this user-agent string belongs to regular users. If you bother to check user-agents.org for MRA and MRSPUTNIK now, you will see that both were dropped from the listing.

    Update: user-agents.org still reports MRA, but as a regular browser


  • I added some now, hope I filled out everything correctly.
    I think you should add some on your own as well...


  • Someone submitted yours I think, I thought it was you honestly.

    You just look at the spider list and see if the spider is already there. If not you can submit it.

    The list is updated whenever there are new spiders to approve.


  • is mail.ru a spam site or it's "ok"?


  • It's hard to believe that the spiders only like my site. ;)


  • Yes I can't find a good spider list even in Google.


  • Excellent, that was exactly what I was looking for.

    By the way. It is recommended to install this mod:

    http://www.vbulletin.org/forum/showthread.php?t=152321

    To remove spiders from the "Currently Active Users" list.

    Cheers,
    Gabriel.


  • I'm hoping for all existing ones.


  • That last spider I submitted IS a spider. I checked out their web site. No sense in submitting them if they are going to get ignored.


  • Ok, I added it exactly how you submitted it.


  • It should detect as soon as you upload the new file.

    Dream, the snapshots spiders aren't detected (48 hours have passed). I know this because I know its IP perfectly well ... :)


  • http://www.comicguide.net/images/smilies/lolrot.gif

    Well I do, it keeps with the setting, :)

    Spiders appear anyways, so I figured I should put them to good use, :cool:




