TABLE OF CONTENTS
1. MISSION
2. WHY WE ARE DOING THIS
3. VISION STATEMENT
4. DEFINITIONS (CONCEPT & METRIC)
5. DISCUSSION (Q & A)
6. IMPACT
7. UPCOMING TOPICS
8. COMMITTEE MEMBERS
9. GLOSSARY
1. MISSION
To establish voluntary guidelines for the measurement of “comparable” data as it relates to online advertising. These voluntary guidelines include but are not limited to data necessary for the reporting of purchased media and for the selling of advertising opportunities as they represent themselves on the Web.
2. WHY WE ARE DOING THIS
Not all measurements are alike. Until now, there have been different definitions for the same terms, a lack of comparability, and completely unique systems that do not allow for scalable auditing. The result has been mistrust and hindered growth.
3. VISION STATEMENT
Our goal is to provide a common set of metrics for the measurement of advertising on the Internet. Widespread adoption of these metrics and the resulting comparability across Web sites will make advertising on the Web easier and more meaningful for both advertisers and publishers. It is important to note that for true comparability to exist, we need to define both the concepts and the metrics themselves as well as the methodology sites should use to generate those metrics. Third parties must use these same definitions when verifying site statistics or their results will not be comparable across Web sites.
4. DEFINITIONS (CONCEPT & METRIC)
Ad Request
Concept
“An opportunity to deliver an advertising element to a Web site visitor.”
Advertisements can appear in numerous forms. The most popular form is the “banner” at the top/bottom of a page. Many sites also offer other forms of advertising such as buttons or vertical opportunities. In addition, recent advances in technology, such as the development of Java applets, offer a visitor the opportunity to participate with or experience the advertisement. Others, like those that employ push technology, use “passive technology” to display advertisements regardless of visitor interaction. An ad request, on the other hand, is the measure of “active technology” which requires the user to interact with the site before a new advertisement will appear.
Metric
“The request of an advertising element as a direct result of a visitor’s action, as recorded by the advertisement server software.”
This metric is independent of content and what is actually being displayed to the visitor. An ad request does not guarantee that a visitor actually viewed an ad; it only measures the opportunity for an ad to have been delivered to the visitor. This means that an ad request will be considered valid regardless of the visitor’s ability to view graphics, and whether or not the HTML document containing the ad loads to completion. In practice, an ad request will be recorded when a Web server or Ad server engages in the technical process of an advertisement insertion.
Click
Concept
“When a visitor interacts with an advertisement.”
The mainstay of the “visitor interaction” category has been the “click,” where the visitor clicks on an advertisement and is sent to an advertiser’s site or internal buffer page. An extension to the click paradigm is the “download,” where the click initiates a download of software instead of a transfer to another location on the Web. This second area is likely to grow fast with new technologies like Java, Shockwave, ActiveX and Enliven as these technologies will provide additional ways for the visitor to interact with an advertising element.
Metric
“The opportunity for a visitor to be transferred to a location by clicking on an advertisement, as recorded by the server.”
A click does not guarantee that a visitor actually arrives at the requested target URL; it only measures the opportunity for the visitor to be transferred to the target URL. This means that a click will be considered valid even if the visitor hits “stop” or otherwise aborts before arriving at the target URL. The click will also be valid if the target URL is busy or not available. In practice, a click will typically be recorded when a Web server or Ad server executes a program designed to redirect the visitor to a target URL.
Click Rate
Concept
“A percentage of response.”
Metric
“Clicks divided by ad requests.”
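As an illustration only, the click rate metric can be computed as in the minimal Python sketch below. The function name click_rate and the choice to express the result as a percentage (following the "percentage of response" concept) are assumptions made for this example, not part of the guidelines.

```python
def click_rate(clicks: int, ad_requests: int) -> float:
    """Return clicks divided by ad requests, expressed as a percentage."""
    if ad_requests <= 0:
        # The ratio is undefined when no ad requests were recorded.
        raise ValueError("ad_requests must be positive")
    return 100.0 * clicks / ad_requests


# Hypothetical figures: 25 clicks recorded against 1,000 ad requests.
print(click_rate(25, 1000))  # 2.5
```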
Page Request
Concept
“An opportunity for an HTML document to be displayed within a browser window, which may contain text, images, media objects (e.g. Java, Shockwave, Real Audio) or other online elements.”
“Page” is used to represent the visitor’s view of a Web site through the browser window.
Metric (Non-framed page)
“The opportunity for an HTML document to appear in a browser window as a direct result of a visitor’s interaction with a Web site.”
A page request does not guarantee that a visitor actually viewed a page; it only measures the opportunity for an HTML document to have been delivered to the visitor. This means that a page request will be valid even if the HTML document does not load to completion.
Metric (Framed page)
“A single page request (see non-framed) will be recorded when an HTML document is requested that will replace the entire window or a portion of the window that was present at the time of the request.”
If the request for an HTML document causes a series of frames to be loaded that make up one virtual “page”, it is to be counted as one. Additional HTML documents that are targeted to be displayed within the present window (e.g. the visitor clicks on a hypertext link) will be recorded as additional page requests. Today’s employed methods to differentiate frames into categories of “frames that make up a single page request” and “frames that generate a new page request” are limited and rely on some customization. The goal here is to be comparable with the non-frame metric and thus fit the mold of a one click, one page paradigm.
*This is an area in which we intend to make recommendations for technology changes to make the counting of pages on a framed site more intuitive. Recommendations will be made to the formal Technology Committee within the IAB.
Visitor Characteristics: Audience and Behavior
A. Audience
Concepts
“A breakdown of Audience make up as defined by one or all of the following measures.”
- Browser: The type of Web browser being used to request pages from the Web server. In addition to the software vendor’s name, a version or release number is typically available as well. Unfortunately there are no standards for browser strings, making this a very dynamic and often difficult list to maintain.
- Platform: Some Web browsers include information about the type of computer system (e.g. MAC, PC or UNIX) the visitor employs with each request. In addition, specialty browsers such as WebTV and SEGA may be more accurately called “platforms” rather than “operating systems” (e.g. Windows or MAC OS). They can be identified by information provided along with the URL request.
- Domain: (1st and 2nd level) Every browser on the Internet has an associated IP address which uniquely identifies it to the rest of the network. In most cases, there are mnemonic names associated with these IP addresses called domain names. The “1st and 2nd level domain” names refer to the most general, and second most general elements of the domain name (1st level: .com, .mil, .edu — and 2nd level: prodigy, ibm, netcom).
- Referral Link: The referring page, or referral link, is a place from which the visitor clicked to get to the current page. In other words, since a hyperlink connects one URL to another, in clicking on a link the browser moves from the referring URL to the destination URL.
- National vs. International: Traffic summarized by the physical location of the visitor or ISP, aggregated by country. National shall be the country of origin and International will be all other. The country of origin shall always be noted.
- Regionality: A measure of server requests aggregated by the visitor’s or ISP’s location. Much like “National vs. International” only with narrower classifications.
- Unresolved IP addresses: Those IP addresses that do not identify their 1st or 2nd level domain. Unresolved IP addresses shall be aggregated and reported as such and should not be classified in any category other than their own.
Technical Subcommittee comments:
The easiest information to gather is the description of the browser types, operating system, and where the visitor came from or “Referral Link.” In some cases, the type of system or “platform” may be available as well. It is worth noting however, that neither the amount and kind of descriptive information, nor the format of the information has been standardized. This can lead to difficulty in extracting browser type, operating system, or platform data. However, the other audience description goals, “Domain (1st and 2nd level domain)” and “National vs. International” can currently be measured with existing technology.
The IP address associated with a URL request can be converted to a domain name, but there are some technical problems with doing so. The process of finding a domain name from an IP address, called a DNS lookup, can cause up to 30 second delays in service. Sites concerned about the performance of their Web servers almost certainly will configure their servers not to perform DNS lookups for each browser request. High traffic sites that need to make use of domain names in delivering Web content or ad creative may build private databases to avoid DNS lookup delays. Sites that only need domain names after the fact don’t face the same operational problems, but will need to expend resources to gather the domain names while analyzing traffic data for reporting purposes.
The “1st level” of a domain name identifies the country of record for that domain. These codes can be used to aggregate traffic data by country or even to collapse data to National vs. International requests. But the country indicated may not accurately reflect the location of the visitor, so these numbers should be used cautiously. The country information is largely of bureaucratic significance. Its purpose is to define which country has jurisdiction of the registration, not to indicate the physical location of the visitor. The currently available technology does not allow servers to gather actual locations of visitors browsing a site.
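The aggregation by 1st level domain described above, with unresolved IP addresses kept in their own category, might look like the following Python sketch. It assumes the hostnames have already been resolved (or left as raw IP addresses) in the traffic data; the function names and sample hosts are hypothetical, and no DNS lookup is performed.

```python
import ipaddress
from collections import Counter


def first_and_second_level(host):
    """Return (1st level, 2nd level) of a domain name, or None if unresolved."""
    try:
        ipaddress.ip_address(host)
        return None  # still a bare IP address, so it is unresolved
    except ValueError:
        pass  # not an IP address; treat it as a domain name
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        return None  # too short or malformed to yield both levels
    return labels[-1], labels[-2]


def summarize_by_first_level(hosts):
    """Count requests per 1st level domain, with an 'unresolved' bucket."""
    counts = Counter()
    for host in hosts:
        levels = first_and_second_level(host)
        counts["unresolved" if levels is None else levels[0]] += 1
    return counts


# Hypothetical request sources taken from a traffic report.
hosts = ["dialup12.netcom.com", "www.ibm.com", "gw.example.de", "192.0.2.17"]
summary = summarize_by_first_level(hosts)
print(dict(summary))
```

As the comments above caution, a count per 1st level domain reflects the country of registration, not necessarily the visitor's physical location.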
The most difficult goal listed is traffic reporting by geographic region. While some sites are already providing regional services, that task is much easier than reporting site usage by the visitor’s geographic area. As with the “National vs. International” goal, the problem is that it is currently impossible to tell where the user is located from the request submitted to the Web server. And in the case of arbitrary geographic regions, there are currently no indicators that can be used to approximate an answer.
B. Behavior
Concepts
“A breakdown of Audience behavior is defined by one or all of the following.”
a. Visitor
b. Visit
c. Return Visits
d. Time
e. Average Time
Metric
a. Visitor
“As identified by one of the following methods.”
- Unique Registration
- Unique Cookie
- Unique URL Tagging
- Unique IP Address w/ heuristic
Unique Registration: Where unique individuals who visit a site identify themselves. This requires the user to take some action, usually completing a survey on the first visit, and then entering a password on subsequent visits. Sites that register visits should have no problem determining the page requests that belong to the same visitor.
Unique Cookie: Where a web server stores a small piece of information with a browser which uniquely identifies that browser. While cookies only identify unique computers – as opposed to individuals – the inactivity constraint on the calculation of visits, i.e. 30 minutes, should make it relatively safe to use cookies to determine the page requests associated with one visit. One caveat is caching: reportedly, some online services are caching the cookies, thus requesting pages for multiple visitors. Another is that when you count visits just by cookies, you will end up with a batch of page requests from visitors without cookies. You must use one of the other methods to estimate the number of visitors that created this batch of requests.
Unique URL Tagging: The process of embedding Unique Identifiers into URLs contained in HTML content. These identifiers are identified by web servers on subsequent browser requests. Identifying visitors through information in the URLs should also allow for an acceptable calculation of visits, if caching is avoided.
Unique IP Addresses: A collection of HTTP requests from an IP address grouped together to form a visit. The process of grouping requests to form visits from IP addresses associated with a visitor yields information that guides the grouping of requests to form visits from IP addresses associated with multiple users (e.g. proxies). Visits shall NOT be calculated by assuming that all page requests from one IP were shown to one individual, unless such IP has been identified as not serving more than one visitor, i.e., not being a gateway or proxy machine. If this methodology is employed, it should be explained by the site.
b. Visit
“A series of page requests by a visitor without 30 consecutive minutes of inactivity.”
Given the current stateless nature of the Web, a “visit” is an intrinsically arbitrary definition.
*Technical consideration: It is a non-trivial matter to determine whether several page requests were performed by the same individual or not. The methods for determining a visitor are also those by which a visit would be qualified as unique. See Registration, Cookies, URL Tagging and IP Addresses.
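A minimal Python sketch of the visit definition follows. It assumes the page requests have already been attributed to a single visitor by one of the methods above, and it treats a gap of exactly 30 minutes as ending the visit; that boundary choice and the name split_into_visits are assumptions of this example.

```python
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(minutes=30)


def split_into_visits(timestamps):
    """Group one visitor's page-request times into visits."""
    visits = []
    for ts in sorted(timestamps):
        # Continue the current visit only if fewer than 30 minutes
        # have passed since the previous page request.
        if visits and ts - visits[-1][-1] < INACTIVITY_LIMIT:
            visits[-1].append(ts)
        else:
            visits.append([ts])
    return visits


# Hypothetical page requests by one visitor on one morning.
requests = [
    datetime(1997, 9, 1, 9, 0),
    datetime(1997, 9, 1, 9, 10),
    datetime(1997, 9, 1, 9, 45),  # 35 minutes after the previous request
    datetime(1997, 9, 1, 9, 50),
]
visits = split_into_visits(requests)
print(len(visits))  # 2
```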
c. Return Visits
“The average number of times a visitor returns to a site over a period of time.”
Relies on having a registration method in place.
d. Time
“The elapsed time from the first to the last page request that constitutes a visit, plus the average time per page request for that visit.”
e. Average Time Per Page Request
“The elapsed time from the first to the last page request that constitutes a visit, divided by one less than the number of page requests in that visit.”
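The two time metrics can be worked through with the short Python sketch below. It reads the divisor as the number of page requests minus one, and it returns None for a visit with a single page request, where the average is undefined; both the reading and the function names are assumptions of this example. Times are given in seconds.

```python
def average_time_per_page(request_times):
    """Elapsed time of a visit divided by (number of page requests - 1)."""
    n = len(request_times)
    if n < 2:
        return None  # no interval between requests can be observed
    elapsed = max(request_times) - min(request_times)
    return elapsed / (n - 1)


def visit_time(request_times):
    """Elapsed time of a visit plus the average time per page request."""
    average = average_time_per_page(request_times)
    if average is None:
        return None
    elapsed = max(request_times) - min(request_times)
    # The average stands in for the unobserved time spent on the last page.
    return elapsed + average


# Hypothetical visit: four page requests at 0, 60, 180 and 300 seconds.
times = [0, 60, 180, 300]
print(average_time_per_page(times))  # 100.0
print(visit_time(times))             # 400.0
```

Adding the average to the elapsed time compensates for the last page of the visit, whose viewing time cannot be observed because no later request follows it.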
5. DISCUSSION (Q & A)
Q.
Advertisers want to know if their image appeared, not if an opportunity existed to deliver it. Why does it make sense to measure from the request vs. the delivery?
A.
We are very aware that advertisers want to know the number of actual views of their advertisement. However, it is impossible to measure completely and comparably the number of actual images that appeared before a viewing audience. With other mediums, e.g. print, you send out the “book” with the ads and you pay on a CPM for the potential viewership. Although the Web could never be that unmeasurable, it is an example that planners are used to buying on and that Web costs are compared against.
What we know for a fact is that a request for a page was made and the items (images, technology, etc.) MAY have been delivered to the visitor by a number of different locations on the Web. Let us explain in more detail:
- A visitor makes a request
- An HTML page is delivered with many embedded requests for images, technology, etc.
- The requested elements can be fulfilled from a number of locations:
  - the visitor’s local cache,
  - the ISP’s proxy server,
  - the publisher’s Web site,
  - or not at all (images turned off, connection terminated, etc.)
- The visitor can interact with any or none of the elements
The reason we propose counting at the “request” level instead of at the delivery level is that only a percentage of the images are “logged” or recorded by the publisher. In the steps above, you can see many different places along the process where a visitor can get the requested elements and only one of them is from the publisher. This also happens to be the order in which a visitor can receive the elements.
The visitor receives the page and then the browser looks to the visitor’s local cache first. If the image is there, it goes no further and does NOT report back to the publisher, even though the image appeared before the visitor. If the image is not in the local cache, it will look to the proxy server. If it finds the image there, it will look no further and NOT report back to the publisher, even though the image appeared before the visitor. However, if the image is not found in the visitor’s cache or in a proxy location, then the image will be served from the publisher and “logged” or recorded by the publisher.
As you can see, the publisher has no way of counting how many images actually appeared before a visitor. All they can report is how many image requests came back to them, NOT how many images appeared before a visitor. The many locations from which you can receive an image offer another reason why image counts ARE NOT comparable.
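The lookup order described in this answer can be simulated with the simplified Python sketch below. It assumes that every cache stores each image it passes along and never expires it, which real browsers and proxy servers do not guarantee; the function name and the image path are hypothetical.

```python
def fetch_image(url, local_cache, proxy_cache, publisher_log):
    """Return where the image came from; only publisher hits are logged."""
    if url in local_cache:
        return "local cache"      # nothing reaches the publisher
    if url in proxy_cache:
        local_cache.add(url)
        return "proxy server"     # nothing reaches the publisher
    publisher_log.append(url)     # the only case the publisher records
    proxy_cache.add(url)
    local_cache.add(url)
    return "publisher"


# Two visitors behind the same ISP proxy each see the same ad twice.
proxy, log = set(), []
visitor_a, visitor_b = set(), set()
ad = "/ads/banner.gif"
sources = [
    fetch_image(ad, visitor_a, proxy, log),
    fetch_image(ad, visitor_a, proxy, log),
    fetch_image(ad, visitor_b, proxy, log),
    fetch_image(ad, visitor_b, proxy, log),
]
print(sources)
print(len(sources), "appearances,", len(log), "logged by the publisher")
```

Under these assumptions the image appears four times but the publisher logs it once, and the ratio would change with every change in cache or proxy behavior.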
Q.
Why are images not a comparable measure?
A.
Images are not a comparable measure because of the impact ‘environmental’ Web factors have on the request. Environmental factors such as proxy servers, visitor caches, visitor settings (images on or off), the route on the Internet, dropped packets, etc. impact the recording of image data. Not only do these factors impact the recording, they have a different level of impact across each publisher, visitor, site, page, etc. We are unable at this time to measure the impact of each factor because they are not static and lie in the control of visitors (local cache size & image settings), the ISPs & online services (proxy servers), and bandwidth (patience or hitting the stop button). The slightest change in audience across a single publisher’s variety of Web sites can change the impact level of any of the above elements on any given day. For this reason, we cannot use images as a comparable measure across Web sites or sections of a single site.
Q.
Why did you choose the term “page requests” rather than “page impressions?”
A.
Our objective was to use a term which accurately reflects what is being measured at this time. In addition, we did not want to foster confusion by redefining existing terms.
The term “page requests” is reflective of what is currently being measured: a request by a browser to a server for a file which is defined to be a page. The term “page impressions” is currently used by different people with different meanings. As technology continues to develop more and more accurate methods of measuring, actual page impressions are likely to be developed. As this occurs, the term page impression will continue to evolve. Instead of constantly redefining “page impressions” to reflect new technology, as new technology becomes available and is adopted, the IAB will define new measures to reflect the capabilities of that technology.
We therefore envisage that for the time being when asked “how do you define page impressions?” a publisher will reply “according to the IAB’s definition of page requests.” Next year the answer may be “according to the IAB’s definition of pages viewed” and the year after it may be “according to the IAB’s definition of page interactions.”
The term “page requests” was chosen because it is reflective of the technology in use today and leaves room for the definition of “page impressions” to evolve with technology without having to be constantly “officially redefined.”
Q.
How will framed vs. non framed pages be impacted by these voluntary guidelines?
A.
Non-framed sites should not be affected in terms of page counting, as their current methods should largely suffice. Framed sites have more issues that depend largely on the current design. In the best case, a simple change in the counting algorithm is all that will be necessary. In the worst case, we may be unable to count pages based on the IAB recommendations without some site redesign. Depending on the counting method used, it is possible that a framed site’s page count numbers could be reduced.
Q.
How difficult will these definitions and measures be to implement into my existing system?
A.
For sites using commercial systems, this should not be a problem as most of the larger advertisement systems already count on the methods being proposed by the IAB. Actually we are unaware of any commercial advertising system that counts based on gifs/log files.
Sites using “home grown” systems need to check how they do their counting. In many cases there might be little or no work, but in some cases a new method for counting will have to be implemented. The advantage for those currently counting by using log files will be a great reduction in processing time and reduction in the complexity of generating the numbers (no more log file transfers, large disk space requirements, etc.).
Systems moving from counting images to the IAB voluntary guidelines will probably see an increase in ad impressions when they convert. The rate of increase will depend highly on what “caching defeating” mechanisms are currently in place.
6. IMPACT
Pros
By adopting the voluntary guidelines for measurement of online advertising as set forth herein, the Web advertising community will have comparable data by which to accurately plan and execute their online campaigns. They will also have the highest level of measurement against any other medium.
The Web publishers will have a reliable measure by which to deliver to their paying advertisers. They will be able to accurately compare themselves to other publishers. For the first time, publishers and advertisers will have a way to measure and be measured for their ad performance on a level playing field.
Auditors will now be able to measure ‘real’ performance without the underlying impact that unmeasurable ‘environmental’ Web factors may have on the data they are reporting.
The industry will now have an advertising medium that has meaningful measures for advertisers, publishers, auditors and researchers.
Cons
Advertisers have been given data at a certain level, images. They are now going to be given data at the request level. At first glance advertisers may perceive this method of measurement to be less accurate. However, as explained in this document, it is quite the opposite. Advertisers have been using what equates to sample data for their actions. Now they can get what they have been needing: complete, accurate and comparable data that they can rely on across a variety of Web sites.
Some publishers will have to make adjustments to their advertising systems. Most will not.
Some counting companies may have to adjust their methods, especially those which only report from logs.
7. UPCOMING TOPICS
- Browser Strings
- Accept Types (Plugins, Languages turned on)
- Offline Browsers
- Robots
- “Look Aheads”
- Digital Certificates
8. COMMITTEE MEMBERS
Celia De Benedetti, @Home Network
Chris Evans, Accipiter
Tom Dubois, Accrue Software
Philip Werner, Art Technology Group
Peter Black, BPA Interactive
Kate Everett-Thorp, CNET: The Computer Network
Paul Hart, CNET: The Computer Network
Tom Hyland, Coopers & Lybrand
Mark Esiri, Cyber Dialogue
Kevin Mabley, Cyber Dialogue
John Megahed, Discovery Communications
Elizabeth Hobby, Discovery Communications
Jim Jones, Discovery Communications
Teri Shaffer, Ernst & Young
Geoffrey Turner, Ernst & Young
Doug Weaver, Firefly Network, Inc.
Tim Reed, I/Pro
Ariel Polar, I/Pro
Susan Fry, LinkExchange
Bill Townsend, Lycos
Robert Kapela, Marketwave
Scott Chalfant, MatchLogic
Scott Moore, MSNBC
David Doyle, Nando.net
Paul Grand, Netcount
James Conaghan, Newspaper Association of America
Douglas McFarland, PC Meter
Susan Feigenbaum, Playboy Enterprises, Inc
Paul Lewis, Prodigy
Allen Goldberg, Relevant Knowledge
Patrick Brem, Starwave
Rich LeFurgy, Starwave
Stanley Wong, Yahoo
Alan Phillips, ZDNet
Scot Kerr, Ziff Davis Inc.
Jim Manning, Ziff Davis, Inc.
9. GLOSSARY
Ad Request
The request of an advertisement as a direct result of a visitor’s action, as recorded by the advertisement server software.
Average Time Per Page Request
The elapsed time from the first to the last page request that constitutes a visit, divided by one less than the number of page requests in that visit.
Browser
A program that allows users to access documents on the World Wide Web (WWW). Browsers can be either text or graphic. They read HTML coded pages that reside on a server and interpret the coding into what we see as Web pages. Netscape Navigator and Microsoft Internet Explorer are examples of Web browsers.
Cache
Caches come in many types, but they all work in the same way: They store information where you can get to it fast. A Web browser cache stores the page’s HTML code as well as any graphics and multimedia elements embedded within it. That way, when you go back to the page, everything doesn’t have to be downloaded all over again. Since hard disk access is much faster than Internet access, this speeds things up. Hard disk access however is slower than RAM, which is why there is disk caching, which stores information you might need from your hard disk in faster RAM.
Click
The opportunity for a visitor to be transferred to a location by clicking on an advertisement, as recorded by the server.
Click Rate
Clicks divided by ad requests. (see also click and ad request)
Domain
The “address” or URL of a particular Web site. This is also how you describe the name that is at the right of the @ sign in an Internet address. “netlingo.com” is the domain name of an Internet dictionary. There is an organization called InterNIC that registers domain names for a small fee and keeps two people from registering the same name.
IP Address
Internet Protocol Address – The numeric address that is translated into a domain name by the Domain Name Server. (see also domain)
ISP
Internet Service Provider – A company that provides access to the Internet. Before you can connect to the Internet you must first establish an account with an Internet Service Provider.
HTML
Hypertext Markup Language – The coding method used to format documents for the World Wide Web.
Log
A file that keeps track of network connections.
National vs. International
Traffic summarized by the physical location of the visitor or ISP, aggregated by country. National shall be the country of origin and International will be all other. The country of origin shall always be noted.
Page Request
The opportunity for an HTML document to appear in a browser window as a direct result of a visitor’s interaction with a Web site.
Platform
The type of computer or operating system on which a software application runs. For example, some common platforms are PC, Macintosh, Unix, and NeXT.
Proxy Server
A technique used to cache information on a Web server and act as an intermediary between a Web client and that Web server. It basically holds the most commonly and recently used content from the World Wide Web for users in order to provide quicker access and to increase server security. This is common for an ISP especially if they have a slow link to the Internet.
Referral Link
The referring page, or referral link, is a place from which the visitor clicked to get to the current page. In other words, since a hyperlink connects one URL to another, in clicking on a link the browser moves from the referring URL to the destination URL.
Regionality
A measure of server requests aggregated by the visitor’s or ISP’s location. Much like “National vs. International” only with narrower classifications.
Return Visits
The average number of times a visitor returns to a site over a period of time.
Time
The elapsed time from the first to the last page request that constitutes a visit, plus the average time per page request for that visit.
Unresolved IP addresses
Those IP addresses that do not identify their 1st or 2nd level domain. Unresolved IP addresses shall be aggregated and reported as such and not be placed in any section other than their own. (see also domain)
Unique Registration
Where unique individuals who visit a site identify themselves. This requires the user to take some action, usually completing a survey on the first visit, and then entering a password on subsequent visits. Sites that register visits should have no problem determining the page requests that belong to the same visitor.
Unique Cookie
Where a web server stores a small piece of information with a browser which uniquely identifies that browser. While cookies only identify unique computers – as opposed to individuals – the inactivity constraint on the calculation of visits, i.e. 30 minutes, should make it relatively safe to use cookies to determine the page requests associated with one visit. One caveat is caching: reportedly, some online services are caching the cookies, thus requesting pages for multiple visitors. Another is that when you count visits just by cookies, you will end up with a batch of page requests from visitors without cookies. You must use one of the other methods to estimate the number of visitors that created this batch of requests.
Unique URL Tagging
The process of embedding Unique Identifiers into URLs contained in HTML content. These identifiers are identified by web servers on subsequent browser requests. Identifying visitors through information in the URLs should also allow for an acceptable calculation of visits, if caching is avoided.
Unique IP Addresses
A collection of HTTP requests from an IP address grouped together to form a visit. The process of grouping requests to form visits from IP addresses associated with a visitor yields information that guides the grouping of requests to form visits from IP addresses associated with multiple users (e.g. proxies). Visits shall NOT be calculated by assuming that all page requests from one IP were shown to one individual, unless such IP has been identified as not serving more than one visitor, i.e., not being a gateway or proxy machine. If such a technique is employed, its methodology should be explained by the site. (see also IP Address)
Visit
A series of page requests by a visitor without 30 consecutive minutes of inactivity.
Web site
A location on the Internet or World Wide Web. The term Web site refers to the all encompassing body of information as a whole, for a particular domain name. A location made up of Web pages.
Contact:
Marla Nitke IAB