Webmaster Console
Gain better control over data collection from your website
What is the Bright Data Webmaster Console?
Webmasters can configure a collectors.txt file to inform Bright Data about information important to data collectors, such as the presence of personal information and the interactive endpoints on their websites.
The Webmaster Console offers a practical and informative solution for managing Bright Data traffic on your website.
- User-friendly control panel
- Round-trip time (RTT) statistics for website health tracking
What is a collectors.txt?
BrightBot enforces the robots.txt guidelines; however, it’s important to note that robots.txt was initially designed to guide search engine crawlers, not public web data collectors. There is a wealth of additional information that responsible data collectors should be aware of to ensure proper and respectful data interaction with your website.
Key considerations include the presence of personal information, which should be handled in compliance with applicable privacy laws. Furthermore, many public endpoints on your website may have limited resources. By communicating these limits, you can help prevent the unintentional overloading of those resources.
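To illustrate how a reported limit might be honored, the sketch below paces requests so that a collector stays under a webmaster-declared per-minute rate. This is a hypothetical example, not Bright Data's implementation: the `max_per_minute` value is assumed to come from a webmaster-reported load directive, and `fetch` stands in for any request function.

```python
import time

def throttled_fetch(urls, fetch, max_per_minute=100):
    """Call fetch(url) for each URL while respecting a declared rate limit.

    max_per_minute is assumed to come from a webmaster-reported load
    directive (e.g. "load: example.com:100:min"); fetch is any callable.
    """
    interval = 60.0 / max_per_minute   # minimum spacing between requests
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(interval)           # simple fixed-interval pacing
    return results
```

Fixed-interval pacing is the simplest scheme; a production collector would more likely use a token bucket, but the point is the same: the declared rate, not public load statistics, drives the spacing.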
Bright Data will review collectors.txt information prior to implementation, with the exception of authentication tokens from partner cybersecurity companies. Whether to accept a given website's collectors.txt is at Bright Data's discretion; Bright Data is not obligated to accept any request, nor will it be liable for any consequences arising from unapproved requests.
- Enhance transparency by monitoring how Bright Data interacts with your website.
- Utilize a collectors.txt file to fine-tune access to specific sections of your website.
Webmasters can help BrightBot, operated by Bright Data, access their website more efficiently by providing access guidelines in a collectors.txt file via the Webmaster Console. This file may contain the following information:
| Inputs | Description | Format |
|---|---|---|
| Personal information | Endpoints containing information related to an identified or identifiable natural person. | URL / Document Object |
| Disallow | Interactive endpoint patterns such as ad links, likes, reviews, and posts. This instruction enables BrightBot to block these endpoints, in line with our guidelines prohibiting data collection from these areas. | URL / Document Object |
| Load | Your organic traffic load on specific domains or subdomains over specific time frames. BrightBot uses this information instead of public load statistics when deciding how to rate-limit itself. | URL / Document Object; Rate; Time frame |
| Traffic peak time | Time slots of peak organic traffic, during which data collection is reduced. | URL / Document Object; Date, Weekday, or Any; Start time / End time |
How it works
1. Create a Webmaster Console
2. Authenticate your websites
3. Build a collectors.txt for each site
Examples & Format
collectors.txt

```
ignore: robots.txt
pii: /personal_info_1
pii: /personal_info_2
// endpoints containing information related to an identified or identifiable natural person
disallow: /disallow_1
disallow: /disallow_2
// interactive endpoint patterns such as ad links, likes, reviews, and posts
load: example.com:100:min
load: /endpoint_1:4500:day
load: /endpoint_2:20000:month
// organic traffic per domain or subdomain per timeframe, as reported by the webmaster, to be considered by BrightBot
```
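For readers who want to experiment with the format, below is a minimal parser sketch assuming the line-oriented `directive: value` syntax shown in the example, with `//` comments and a colon-separated `target:rate:timeframe` shape for load directives. The directive names are taken from the example above; the parser itself is illustrative and is not Bright Data's implementation.

```python
def parse_collectors_txt(text):
    """Parse collectors.txt lines into a dictionary of directives."""
    result = {"ignore": [], "pii": [], "disallow": [], "load": []}
    for raw in text.splitlines():
        line = raw.split("//")[0].strip()   # drop // comments and whitespace
        if not line or ":" not in line:
            continue                        # skip blank and malformed lines
        key, _, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        if key == "load":
            # load directive: <domain-or-path>:<rate>:<timeframe>
            target, rate, timeframe = (p.strip() for p in value.rsplit(":", 2))
            result["load"].append(
                {"target": target, "rate": int(rate), "timeframe": timeframe}
            )
        elif key in result:
            result[key].append(value)
    return result
```

Splitting load values from the right (`rsplit(":", 2)`) keeps any colons in the target intact, and lowercasing the directive name tolerates the mixed capitalization seen in real-world files.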