On The Media

James Logan

Web users ‘getting more selfish’

May 27th, 2008 by Administrator

Web users are getting more ruthless and selfish when they go online, reveals research.

The annual report into web habits by usability guru Jakob Nielsen shows people are becoming much less patient when they go online.

Instead of dawdling on websites many users want simply to reach a site quickly, complete a task and leave.

Most ignore efforts to make them linger and are suspicious of promotions designed to hold their attention.

Search rules

Instead, many are “hot potato” driven and just want to get a specific task completed.

Success rates measuring whether people achieve what they set out to do online are now about 75%, said Dr Nielsen. In 1999 this figure stood at 60%.

There were two reasons for this, he said.

“The designs have become better but also users have become accustomed to that interactive environment,” Dr Nielsen told BBC News.

Now, when people go online they know what they want and how to do it, he said.

This makes them very resistant to highlighted promotions or other editorial choices that try to distract them.

“Web users have always been ruthless and now are even more so,” said Dr Nielsen.

“People want sites to get to the point, they have very little patience,” he said.

“I do not think sites appreciate that yet,” he added. “They still feel that their site is interesting and special and people will be happy about what they are throwing at them.”

Web users were also getting very frustrated with all the extras, such as widgets and applications, being added to sites to make them more friendly.

Such extras are only serving to make pages take longer to load, said Dr Nielsen.

There has also been a big change in the way that people get to the places where they can complete pressing tasks, he said.

In 2004, about 40% of people visited a homepage and then drilled down to where they wanted to go and 60% used a deep link that took them directly to a page or destination inside a site. In 2008, said Dr Nielsen, only 25% of people travel via a homepage. The rest search and get straight there.

“Basically search engines rule the web,” he said.

But, he added, this did not mean that the search engines were doing a perfect job.

“When you watch people search we often find that people fail and do not get the results they were looking for,” he said.

“In the long run anyone who wants to beat Google just has to make a better search,” said Dr Nielsen.

Story from BBC NEWS:
http://news.bbc.co.uk/go/pr/fr/-/1/hi/technology/7417496.stm

Filed under Search

Encouraging people to contribute knowledge

December 17th, 2007 by Administrator

From the GoogleBlog on the new “knol” knowledgebase idea.

12/13/2007 06:01:00 PM
Posted by Udi Manber, VP Engineering

The web contains an enormous amount of information, and Google has helped to make that information more easily accessible by providing pretty good search facilities. But not everything is written nor is everything well organized to make it easily discoverable. There are millions of people who possess useful knowledge that they would love to share, and there are billions of people who can benefit from it. We believe that many do not share that knowledge today simply because it is not easy enough to do that. The challenge posed to us by Larry, Sergey and Eric was to find a way to help people share their knowledge. This is our main goal.

Earlier this week, we started inviting a selected group of people to try a new, free tool that we are calling “knol”, which stands for a unit of knowledge. Our goal is to encourage people who know a particular subject to write an authoritative article about it. The tool is still in development and this is just the first phase of testing. For now, using it is by invitation only. But we wanted to share with everyone the basic premises and goals behind this project.

The key idea behind the knol project is to highlight authors. Books have authors’ names right on the cover, news articles have bylines, scientific articles always have authors, but somehow the web evolved without a strong standard to keep authors’ names highlighted. We believe that knowing who wrote what will significantly help users make better use of web content. At the heart, a knol is just a web page; we use the word “knol” as the name of the project and as an instance of an article interchangeably. It is well-organized, nicely presented, and has a distinct look and feel, but it is still just a web page. Google will provide easy-to-use tools for writing, editing, and so on, and it will provide free hosting of the content. Writers only need to write; we’ll do the rest.

A knol on a particular topic is meant to be the first thing someone who searches for this topic for the first time will want to read. The goal is for knols to cover all topics, from scientific concepts, to medical information, from geographical and historical, to entertainment, from product information, to how-to-fix-it instructions. Google will not serve as an editor in any way, and will not bless any content. All editorial responsibilities and control will rest with the authors. We hope that knols will include the opinions and points of view of the authors who will put their reputation on the line. Anyone will be free to write. For many topics, there will likely be competing knols on the same subject. Competition of ideas is a good thing.

Knols will include strong community tools. People will be able to submit comments, questions, edits, additional content, and so on. Anyone will be able to rate a knol or write a review of it. Knols will also include references and links to additional information. At the discretion of the author, a knol may include ads. If an author chooses to include ads, Google will provide the author with substantial revenue share from the proceeds of those ads.

Once testing is completed, participation in knols will be completely open, and we cannot expect that all of them will be of high quality. Our job in Search Quality will be to rank the knols appropriately when they appear in Google search results. We are quite experienced with ranking web pages, and we feel confident that we will be up to the challenge. We are very excited by the potential to substantially increase the dissemination of knowledge.

We do not want to build a walled garden of content; we want to disseminate it as widely as possible. Google will not ask for any exclusivity on any of this content and will make that content available to any other search engine.

As always, a picture is worth a thousand words, so an example of a knol is below (click on the image twice to see the page in full). The main content is real, and we encourage you to read it (you may sleep better afterwards!), but most of the meta-data (like reviews, ratings, and comments) are not real, because, of course, this has not been in the public eye as yet. Again, this is a preliminary version.

source: http://googleblog.blogspot.com/2007/12/encouraging-people-to-contribute.html

Technorati Front Page - aggregator replaces streaming blog

December 5th, 2007 by Administrator

Techcrunch reports on changes at Technorati:

The recently changed home page, just three months old, is gone. In place of the streaming blog posts is a news aggregator that, like TechMeme and the New York Times’ Blogrunner, uses linking behavior on news sites to determine headline news.

Blogs and mainstream media are separated. Blog headlines are on the left; MSM is on the right. Below each headline is a cluster of blogs that have linked to and discussed the story.

The news aggregator complements Technorati’s core strength as a blog search engine, Carroll says. Sometimes users want to search. Other times, they want to discover and browse. The news aggregator helps them see what bloggers and journalists are talking about right now, all over the world.

The topics feature that Technorati launched in September (front page, business, entertainment, lifestyle, politics, sports, technology) is now highlighted directly via navigation tabs on the home page.

This is something Technorati experimented with in the past (see our 2005 coverage of Technorati Explore, which never made it out of the lab), but it never dedicated meaningful resources (or the home page) to finding news patterns in blog posts. Now, the company is dedicating those resources to making it work.

“Blogger Central” and “Today In Photos”

In addition to the Front Page news aggregator, Technorati is making two other big additions to the site.

The first is a resource page for bloggers called, fittingly, Blogger Central. It shows blog posts about blogging (clustered using the news aggregator engine) as well as popular blog tags at any given time. The page also has top blogs by links and popularity.

The second is a new product called “Today In Photos”. Like AOL’s new Mgnet product, it shows popular news via the photos and images included in those news items. People like to see and click on images. This page will show them what’s hot, visually. Users can reach the page by clicking on the grouping of images on the bottom of every page.

source: http://www.techcrunch.com/2007/12/04/exclusive-technorati-relaunches-to-focus-on-core-blogging-audience/

The Telegraph - “show period”, pushing out content in thin silos, verticals and channels to come next

December 4th, 2007 by Administrator

Will Lewis, editor-in-chief of The Telegraph (www.telegraph.co.uk), talking to MediaGuardian about his newspaper’s digital integration, considers where the newspaper goes next with its website (a “show period”: pushing out content in thin silos, verticals and channels, to self-publishing sites):

“…For Lewis though, the revolution is only halfway there, although the next phase that started with integrating the Sunday, daily and website business section will proceed at a much more leisurely pace. “There is no big panic. Everyone will find different solutions. We are not asking everyone to become a Dalek who can do everything. Specialisms will emerge.”

Lewis has already planned for a future when the spectacular newspaper website growth fuelled by broadband slows. The next phase after home pages and social networking will be the “show period” - pushing out content in thin silos, verticals and channels, to self-publishing sites and places such as US gossip website glam.com, which has 23 million global unique users.

“That is what we are basing our future on. We have got to have our stuff housed in other people’s self-publishing experience. Help yourselves please.” But also, and Lewis concedes he is “way ahead of myself here”, papers eventually “will go back to the future. In a world of multiple confusion and specialisation and email alerts, once a day people are going to ask, ‘can a bunch of really clever people tell me what they think I should know about and the order in which I should know about it’?”

original interview: http://www.guardian.co.uk/media/2007/dec/03/mondaymediasection.pressandpublishing

ACAP (Automated Content Access Protocol) aims to end Publisher / Search Engine conflict

December 3rd, 2007 by Administrator

The new, non-proprietary, open standard ACAP (Automated Content Access Protocol), which is set to put an end to publisher-search engine legal clashes, was unveiled and showcased in New York today, 29 November 2007, at a conference opened by World Association of Newspapers President Gavin O’Reilly and addressed by keynote speaker AP CEO Tom Curley.

ACAP has been developed at the initiative of the World Association of Newspapers, the International Publishers Association and the European Publishers Council in close collaboration with search engines to protect the intellectual property of anyone wishing to make content available on the worldwide web. ACAP is the result of an intense 12-month pilot project which has resulted in a unique communications tool that will open the door to more and more high level content, giving all content owners the confidence to make their content available on the worldwide web.

From today, publishers globally will be encouraged to implement ACAP version 1 which will allow publishers, broadcasters and indeed any other publisher of content on the network to express their individual access and use policies in a language that search engine robots and similar automated tools can read and understand. ACAP is set to become a universal standard. Click on the following link for instructions on how to implement ACAP: http://www.the-acap.org/implement-acap.php.

Yesterday, the Times Online became the first to implement ACAP.

Politicians and business leaders have lent support to ACAP: EU Commissioner Reding spoke via video to the conference saying: “Media companies have not yet fully adapted their business models to new distribution technologies, which cut across national borders and traditionally separated sectors. The uncertainties associated with the shift to digital technologies inhibit the development of many potential online services.

The Commission is following the ACAP project closely, since it offers possibilities for a win-win situation for all stakeholders.”

Gavin O’Reilly said: “We can overcome this obstacle to development thanks to ACAP. ACAP will give the content industry worldwide the incentive to innovate, create and disseminate. Newspapers, magazines, books, journals, directory publishers: anyone involved in digital publishing can now adopt a standard that will protect their interests and will make them masters of their own content.”

Gavin continued: “ACAP has been the huge beneficiary of input, technical know-how and quiet wisdom of all of the major search engines, albeit in an “informal” way. So some 5 months on, I want to recognise this publicly, with our sincere thanks. And to demonstrate how collaborative, open and inclusive ACAP is, I am delighted to be able to welcome the very large number of representatives from Yahoo, Microsoft and Google who have joined us here today.”

Further use cases for different business models, including for the audiovisual sector will be considered during the next phase of ACAP’s development.

ACAP Project Manager Mark Bide of Rightscom Ltd said: “Unprecedented industry support and commitment to the ACAP pilot must now be followed by a huge effort to roll ACAP out to the widest possible audience in the shortest possible time so that the digital publishing sector can reap the benefits of all the hard work to date.”

source: http://www.the-acap.org/conference.php

Filed under Google, News, Search

Internet advertising expected to overtake magazines by 2010

December 3rd, 2007 by Administrator

In The Times, Amanda Andrews writes on the latest advertising projections from ZenithOptimedia:

The internet will overtake magazines to become the world’s third-largest advertising medium by 2010, according to Steve King, chief executive of ZenithOptimedia. The head of the media-buying firm also believes that China will become the fourth-largest market for advertisers.

Mr King, who will be speaking at the 35th UBS annual media conference in New York today, will present an optimistic view of the future for advertising. He expects internet advertising to be worth $36 billion (£17.5 billion) this year, $5 billion more than predicted in December 2006. Global online advertising will increase by 24 per cent in 2008 and 69 per cent over the next three years (2007-10), reaching $61 billion in 2010, he will say.

A spokesman for Zenith said: “We predict global internet advertising to pass three milestones in the next three years: to overtake radio advertising in 2008; to attain a double-digit share of global advertising in 2009; and to overtake magazine advertising in 2010, with 11.5 per cent of total ad spend.”

Television will continue to dominate the global advertising market in 2010, with a predicted $198.89 billion in revenues, with newspapers next at $134.82 billion. Zenith forecasts that in 2010 the internet will account for $60.88 billion of total advertising spend, while magazines will account for $60.58 billion of expenditure.

Britain is one of the world’s most mature internet advertising markets, as it is one of only four places where internet advertising accounts for 15 per cent or more of total spend.

Zenith forecasts that by 2010 the internet will account for more than 20 per cent in the same four markets – which include Denmark, Norway and Sweden – and more than 15 per cent of advertising spend in ten other countries.

Mr King predicts that worldwide advertising expenditure will grow by 6.7 per cent in 2008, up from 5.3 per cent this year, as developing markets compensate for slow growth in developed countries.

The Olympic Games in Beijing and the US presidential election will contribute to the growth.

A spokesman for Zenith said: “While the credit squeeze is dampening economic growth around the world, we do not expect the advertising market to follow suit.”

source: http://business.timesonline.co.uk/tol/business/industry_sectors/media/article2988002.ece

Filed under Advertising

MySpace challenges Google for ads

November 12th, 2007 by Administrator

from: Jemima Kiss in Guardian Unlimited, Monday November 5 2007

MySpace is to launch a big push for its advertising offering, including plans for a DIY service that will rival Google’s AdSense system.

The News Corporation-owned social networking website is also opening up an extensive targeting trial to new brands in the US.

MySpace has been running a targeted trial since July, which categorises users according to the information on their profiles and assigns advertising accordingly.

Advertisers can now use Hypertargeting by MySpace to direct their ads to more than 100 groups and sub-groups of MySpace users including gaming, sports, travel, consumer electronics and music.

MySpace has also announced plans for a self-serve advertising system that would allow users to design, launch and analyse their own advertising campaigns on the site.

Travis Katz, the MySpace international managing director, admitted that the system is based on the same concept as Google’s lucrative AdSense system, but said it is still a different product.

“AdSense was the first targeted advertising product that was open to everyone, and tapped the long tail,” Mr Katz added.

“It showed that you don’t have to be a huge company to buy media space, you could be a small start-up, a pizza place or a band. We’re taking the same concept but it’s not just text based.”

Launching early next year, SelfServe by MySpace will allow advertisers to create ads using their own logos, graphics and images, and will provide a built-in analytics system to measure the performance of the ad.

Meanwhile, more than 50 advertisers have signed up to the first phase of the Hypertargeting by MySpace platform including Ford, Toyota and Procter & Gamble. The platform is being rolled out in the US today and will be extended to English-speaking territories in January.

MySpace claims that brands involved in the targeted advertising trial have seen response rates increase by 300% for some campaigns.

Mr Katz said the company has had more than 100 people working on the system for the past year, which was now “delivering the promise”.

“It’s a very sophisticated technical engine that looks at all the publicly available data on users, groups and the interests of their friends,” he added.

“For advertisers, it’s [delivering on] the promise that internet advertising has always been, and users, who have been involved in the testing, like targeted advertising better than generic ads. They don’t like untargeted ads because they feel more intrusive, whereas if ads are relevant and of interest to them they enjoy them.”

The targeting of advertising on social networks has caused concern among some users, who do not want their personal data used by advertisers and are concerned about privacy.

Mr Katz said he believed only a small number of people are concerned about targeted ads, and added the system will allow users to opt out.

MySpace’s UK office now has more than 130 staff but, Mr Katz said, the company is “hiring so fast I can’t keep count”.

Source: http://www.guardian.co.uk/technology/2007/nov/05/myspace.advertising

IBM Predicts the End of Advertising as We Know It

November 11th, 2007 by Administrator

ARMONK, NY - 08 Nov 2007: IBM (NYSE: IBM) Global Business Services unveiled its new report, “The End of Advertising as We Know It,” forecasting greater disruption for the advertising industry in the next five years than occurred in the previous 50.

To examine the factors influencing advertising and explore future scenarios, IBM surveyed more than 2,400 consumers and 80 advertising executives globally. The IBM report shows increasingly empowered consumers, more self-reliant advertisers and ever-evolving technologies are redefining how advertising is sold, created, consumed and tracked.

Traditional advertising players risk major revenue declines as budgets shift rapidly to new, interactive formats, which are expected to grow at nearly five times the rate of traditional advertising. To survive in this new reality, broadcasters must change their mass audience mind-set to cater to niche consumer segments, and distributors need to deliver targeted, interactive advertising for a range of multimedia devices. Advertising agencies must experiment creatively, become brokers of consumer insights, and guide allocation of advertising dollars amid exploding choices. All players must adapt to a world where advertising inventory is increasingly bought and sold in open exchanges vs. traditional channels.

“Digital entertainment is experiencing faster adoption than anyone had previously anticipated. The advertising community needs to dramatically re-orient its business to serve consumers who increasingly access content in non-linear formats,” said Bill Battino, Communications Sector managing partner, IBM Global Business Services. “Companies must re-look at how they serve content to consumers with business models based much more on engaging consumers in a relationship.”

The report observes four change drivers tipping the advertising industry balance of power: control of attention, creativity, measurement, and advertising inventories. As shown in IBM’s global digital media and entertainment consumer survey released in August, consumers’ attention has shifted, with personal Internet time rivaling TV time. Consumers have tired of interruption advertising, and are increasingly in control of how they interact, filter, distribute, and consume their content, and associated advertising messages. IBM’s survey findings demonstrated that half of DVR owners watch 50 percent or more of programming on re-play, and that traditional video advertising doesn’t translate online: 40 percent of respondents found ads during an online video segment more annoying than any other format. Amateurs and semi-professionals are increasingly creating low cost advertising content that threatens to bypass creative agencies, while publishers and broadcasters are broadening their own creative roles. Advertisers are demanding accountability and more specific individual consumer measurements across advertising platforms. Self-service advertising exchanges are attracting revenues that were once exclusively sold through proprietary channels or transactions.

Advertising Experts’ Expectations in Line with Global Consumer Trends
IBM’s research found that advertising experts recognize the changing nature of consumers and also anticipate dramatic changes on the horizon. More than half of ad professionals polled by IBM expect that in the next five years open advertising exchanges (currently led by companies like Google, Yahoo, AOL) will take 30 percent of current revenues now commanded by traditional broadcasters and media. Nearly half of the advertising survey respondents anticipate a significant (greater than 10%) revenue shift away from the 30-second spot within the next five years, and almost 10 percent of respondents thought there would be a dramatic (greater than 25 percent) shift. Two-thirds of advertising experts surveyed by IBM expect 20 percent of advertising revenue to move from impression-based to impact-based formats within three years.

Saul Berman, IBM Media & Entertainment Strategy and Change practice leader, said, “Advertising remains integral to pop culture and continues to fund a significant portion of entertainment around the world. But it needs to morph into new formats and offer more intrinsic value to consumers, who will have more choices. The wealth of new advertising outlets means consumer analytics will have a more prominent role than ever regardless of where you reside in the value chain. Young people in particular have grown accustomed to not paying for content. Despite greater consumer control over content and advertising, we envision a world where consumers will continue to prefer to view advertising rather than pay for content directly.”

The report indicates by 2012, the landscape of the industry will change so profoundly that to survive, advertising industry players need to take aggressive steps to innovate in three key areas:

* Consumers: making micro-segmentation and personalization paramount in marketing;
* Business models: how and where advertising inventory is sold, the structure and forms of partnerships, revenue models and advertising formats;
* Business design and infrastructure: All players need to redesign organizational and operating capabilities across the advertising lifecycle to support consumer and business model innovation: consumer analytics, channel planning, buying/selling, creation, delivery and impact reporting.

IBM believes that all players will need to invest heavily in consumer analytics and automation to gain more insights about the consumer and how to reach them. For example, interactive advertising paired with consumer analytics provides compelling knowledge of who viewed and acted on an ad rather than estimates of impressions, allowing advertisers to maximize revenue and yield management. Industry players will also need to examine if they have the right resources and capacity to handle increased marketing promotions and integrated advertising sales. Finally, IBM observes that the dramatic increase in both the number and variety of promotions is leading to greater investment in tools to digitally transform and reduce the cost of companies’ workflows including content management, creative development, production and sign-off processes.

The complete report with detailed recommendations for broadcasters, distributors and advertising agencies can be found at: www.ibm.com/media/endofadvertising

Filed under Advertising

NY Times - subscription requests for non-animated ads

September 10th, 2007 by Administrator

Interesting feedback from a survey of NY Times readers: subscribers using Times Reader asked that animated ads be limited or removed to improve the reading experience. A new format has been suggested which would use static images with subtle animation.

< --- original post --->

Update On Times Reader: Mac Version En Route; Ditto For New Ad Format
By Staci D. Kramer - Sun 09 Sep 2007 09:34 PM PST

A few nuggets from an NYT (NYSE: NYT) First Look post about Times Reader by Rob Larson, Vice President, Product Management, NYTimes.com, reporting the results of a survey taken by more than 4,500 users:

– The Mac version of the digital reader designed with Microsoft (NSDQ: MSFT) is still a few months away. The first version won’t have all the Windows features so will be a free beta. The paper plans a 2008 version that will be offered as a subscription service and, as is the case with the Windows version, included with home delivery subscriptions. (Sorry, Rob, as a home sub with a recent price increase, getting hard for me to think of it as “free.”) Times Reader currently runs $14.95 a month.

– Survey participants asked for a limit on advertising with rapid animation. Larson: “We agree with you. We designed Times Reader to be a quiet reading experience, and the animated ads, which are so prevalent on the Web, detract from that experience.” In response, they’re drawing up specs for a new ad format to strike the right balance, one that will encourage the use of static images (much as you see in the printed paper) but will allow subtle animation. Those who choose to interact with the ad will see full animation and interactivity.

– The #1 request was for better breaking news access. The Times Reader defaults to 30-minute syncs; they’re hoping to offer a 15-minute option, which still won’t be fast enough for breaking news. They’re now working on a way to broadcast news alerts through Times Reader at the same time as they post on NYTimes.com.

Source: http://www.paidcontent.org/entry/419-update-on-times-reader/

YouTube architecture

August 2nd, 2007 by Administrator

Interesting article on the growth problems and solutions that YouTube had to go through. From highscalability.com:

< -- original post -->

YouTube Architecture
Tue, 07/17/2007 - 20:20 — Todd Hoff

YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google?

Information Sources

* Google Video

Platform

* Apache
* Python
* Linux (SuSe)
* MySQL
* psyco, a dynamic python->C compiler
* lighttpd for video instead of Apache

What’s Inside?

The Stats

* Supports the delivery of over 100 million videos per day.
* Founded 2/2005
* 3/2006 30 million video views/day
* 7/2006 100 million video views/day
* 2 sysadmins, 2 scalability software architects
* 2 feature developers, 2 network engineers, 1 DBA

Recipe for handling rapid growth

while (true)
{
    identify_and_fix_bottlenecks();
    drink();
    sleep();
    notice_new_bottleneck();
}

This loop runs many times a day.

Web Servers

* NetScaler is used for load balancing and caching static content.
* Run Apache with mod_fastcgi.
* Requests are routed for handling by a Python application server.
* Application server talks to various databases and other information sources to get all the data and formats the HTML page.
* Can usually scale web tier by adding more machines.
* The Python web code is usually NOT the bottleneck, it spends most of its time blocked on RPCs.
* Python allows rapid flexible development and deployment. This is critical given the competition they face.
* Usually less than 100 ms page service times.
* Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to optimize inner loops.
* For high CPU intensive activities like encryption, they use C extensions.
* Some pre-generated cached HTML for expensive to render blocks.
* Row level caching in the database.
* Fully formed Python objects are cached.
* Some data are calculated and sent to each application so the values are cached in local memory. This is an underused strategy. The fastest cache is in your application server and it doesn’t take much time to send precalculated data to all your servers. Just have an agent that watches for changes, precalculates, and sends.
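The last point, an agent that watches for changes, precalculates, and sends, can be sketched as follows. This is only an illustration of the idea, not YouTube's code: the names `AppServer`, `PrecalcAgent`, `top_three` and the `top_videos` key are hypothetical, and the in-process method call stands in for whatever network push a real deployment would use.

```python
from typing import Any, Callable, Dict, List


class AppServer:
    """Hypothetical application server with a purely local, in-memory cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_cache: Dict[str, Any] = {}

    def receive(self, key: str, value: Any) -> None:
        # The agent pushes ready-made values; the server never computes them.
        self.local_cache[key] = value

    def render(self, key: str) -> Any:
        # Reads are plain dictionary lookups, with no RPC or database call.
        return self.local_cache.get(key)


class PrecalcAgent:
    """Hypothetical agent: watches source data, precalculates, and sends."""

    def __init__(self, servers: List[AppServer],
                 compute: Callable[[Dict[str, int]], Any]) -> None:
        self.servers = servers
        self.compute = compute
        self.last_seen: Dict[str, Dict[str, int]] = {}

    def poll(self, key: str, source: Dict[str, int]) -> bool:
        """Recompute and broadcast only if the source data changed."""
        if self.last_seen.get(key) == source:
            return False
        self.last_seen[key] = dict(source)   # snapshot for the next comparison
        value = self.compute(source)
        for server in self.servers:
            server.receive(key, value)
        return True


def top_three(view_counts: Dict[str, int]) -> List[str]:
    # Stand-in for an expensive calculation: rank videos by views, highest first.
    ranked = sorted(view_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:3]]


servers = [AppServer(f"app{i}") for i in range(3)]
agent = PrecalcAgent(servers, top_three)
views = {"a": 10, "b": 50, "c": 30, "d": 5}

agent.poll("top_videos", views)               # first poll: computes and sends
unchanged = agent.poll("top_videos", views)   # nothing changed: no work done
views["d"] = 99
agent.poll("top_videos", views)               # change detected: recompute, resend
print(servers[0].render("top_videos"))
```

The design choice here is that the calculation runs once in the agent rather than once per server, and each request on an application server costs only a local lookup.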

Video Serving

* Costs include bandwidth, hardware, and power consumption.
* Each video hosted by a mini-cluster. Each video is served by more than one machine.
* Using a cluster means:
- More disks serving content which means more speed.
- Headroom. If a machine goes down others can take over.
- There are online backups.
* Servers use the lighttpd web server for video:
- Apache had too much overhead.
- Uses epoll to wait on multiple fds.
- Switched from single process to multiple process configuration to handle more connections.
* Most popular content is moved to a CDN (content delivery network):
- CDNs replicate content in multiple places. There’s a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
- CDN machines mostly serve out of memory because the content is so popular there’s little thrashing of content into and out of memory.
* Less popular content (1-20 views per day) uses YouTube servers in various colo sites:
- There’s a long tail effect. A video may have a few plays, but lots of videos are being played. Random disk blocks are being accessed.
- Caching doesn’t do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product caching won’t always be your performance savior.
- Tune RAID controller and pay attention to other lower level issues to help.
- Tune memory on each machine so there’s not too much and not too little.
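
The long tail point can be illustrated with a small simulation. This is only a sketch under stated assumptions: requests are drawn uniformly at random, the cache is a plain LRU, and the catalog and cache sizes are made-up numbers, not YouTube's.

```python
import random
from collections import OrderedDict
from typing import List


def lru_hit_rate(requests: List[int], cache_size: int) -> float:
    """Replay requests through an LRU cache and return the fraction of hits."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(requests)


def make_requests(num_videos: int, num_requests: int, rng: random.Random) -> List[int]:
    """Assumption: every video in the catalog is equally likely to be requested."""
    return [rng.randrange(num_videos) for _ in range(num_requests)]


rng = random.Random(42)
# Popular content: a small catalog that fits entirely in the cache.
popular = make_requests(num_videos=100, num_requests=20_000, rng=rng)
# Long tail: a catalog far larger than the cache, each video rarely played.
long_tail = make_requests(num_videos=100_000, num_requests=20_000, rng=rng)

popular_rate = lru_hit_rate(popular, cache_size=1_000)
long_tail_rate = lru_hit_rate(long_tail, cache_size=1_000)
doubled_rate = lru_hit_rate(long_tail, cache_size=2_000)
```

Under these assumptions the popular workload is served almost entirely from cache, while the long tail workload rarely hits, and doubling the cache barely changes that. This is why tuning the disks and RAID controller matters more for this tier.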

Serving Video Key Points

* Keep it simple and cheap.
* Keep a simple network path. Not too many devices between content and users. Routers, switches, and other appliances may not be able to keep up with so much load.
* Use commodity hardware. The more expensive the hardware gets, the more expensive everything else gets too (support contracts). You are also less likely to find help on the net.
* Use simple common tools. They mostly use tools built into Linux and layer on top of those.
* Handle random seeks well (SATA, tweaks).

Serving Thumbnails

* Surprisingly difficult to do efficiently.
* There are about 4 thumbnails for each video, so there are a lot more thumbnails than videos.
* Thumbnails are hosted on just a few machines.
* Saw problems associated with serving a lot of small objects:
- Lots of disk seeks and problems with inode caches and page caches at OS level.
- Ran into per directory file limit. Ext3 in particular. Moved to a more hierarchical structure. Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
- A high number of requests/sec, as web pages can display 60 thumbnails on a page.
- Under such high loads Apache performed badly.
- Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
- Tried using lighttpd, but in single-threaded mode it stalled. Ran into problems with multiprocess mode because each process would keep a separate cache.
- With so many images setting up a new machine took over 24 hours.
- After rebooting a machine it took 6-10 hours for the cache to warm up enough to stop going to disk.
* To solve all their problems they started using Google’s BigTable, a distributed data store:
- Avoids small file problem because it clumps files together.
- Fast, fault tolerant. Assumes it’s working on an unreliable network.
- Lower latency because it uses a distributed multilevel cache. This cache works across different colocation sites.
- For more information on BigTable take a look at Google Architecture, GoogleTalk Architecture, and BigTable.
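
The intermediate step of moving to "a more hierarchical structure" to get around the per-directory file limit can be sketched like this. The hashing scheme, the `/thumbs` root, and the file naming are assumptions made for illustration; the source does not say how the directories were actually laid out.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(video_id: str, index: int, root: str = "/thumbs") -> PurePosixPath:
    """Map a thumbnail to a two-level hashed directory (hypothetical layout).

    Two levels of 256 directories each give 65,536 leaf directories, so no
    single directory has to hold more than a small share of the files.
    """
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    level1, level2 = digest[:2], digest[2:4]
    return PurePosixPath(root) / level1 / level2 / f"{video_id}_{index}.jpg"
```

All thumbnails for one video land in the same leaf directory because the hash depends only on the video ID. This spreads files across directories but does not remove the seek and inode-cache costs of many small files, which is the problem BigTable addressed.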

Databases

* The Early Years
- Use MySQL to store metadata like users, tags, and descriptions.
- Served data off a monolithic RAID 10 volume with 10 disks.
- Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
- They went through a common evolution: a single server, then a single master with multiple read slaves, then a partitioned database, and finally a sharding approach.
- Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single threaded and usually run on lesser machines and replication is asynchronous, so the slaves can lag significantly behind the master.
- Updates cause cache misses, which go to disk, where slow I/O causes slow replication.
- Using a replicating architecture you need to spend a lot of money for incremental bits of write performance.
- One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video so that function should get the most resources. The social networking features of YouTube are less important so they can be routed to a less capable cluster.
* The later years:
- Went to database partitioning.
- Split into shards with users assigned to different shards.
- Spreads writes and reads.
- Much better cache locality which means less IO.
- Resulted in a 30% hardware reduction.
- Reduced replica lag to 0.
- Can now scale database almost arbitrarily.
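
Assigning users to shards can be sketched as below. The source does not say how YouTube maps users to shards, so the hash-modulo placement, the `shard_for_user` function, and the `ShardedStore` class (with dictionaries standing in for separate database servers) are all assumptions for illustration.

```python
import zlib
from typing import Dict, List, Optional


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Pick a shard index for a user (assumed hash-modulo placement)."""
    # crc32 is stable across processes, unlike Python's built-in str hash.
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


class ShardedStore:
    """Toy store where each dict stands in for one database shard."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def _shard(self, user_id: str) -> Dict[str, dict]:
        return self.shards[shard_for_user(user_id, len(self.shards))]

    def put(self, user_id: str, record: dict) -> None:
        # A write touches exactly one shard, so writes spread across machines.
        self._shard(user_id)[user_id] = record

    def get(self, user_id: str) -> Optional[dict]:
        # A read for one user also touches one shard, which keeps its cache hot.
        return self._shard(user_id).get(user_id)
```

Modulo placement is the simplest choice, but changing the number of shards moves most users. A lookup table from user to shard is a common alternative when shards need to be added over time.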

Data Center Strategy

* Used managed hosting providers at first. Living off credit cards so it was the only way.
* Managed hosting can’t scale with you. You can’t control hardware or make favorable networking agreements.
* So they went to a colocation arrangement. Now they can customize everything and negotiate their own contracts.
* Use 5 or 6 data centers plus the CDN.
* Videos come out of any data center. Not closest match or anything. If a video is popular enough it will move into the CDN.
* Video is bandwidth dependent, not really latency dependent. It can come from any colo.
* For images latency matters, especially when you have 60 images on a page.
* Images are replicated to different data centers using BigTable. Code looks at different metrics to know who is closest.
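
The difference between how videos and images are routed can be sketched as follows. The source only says the code "looks at different metrics", so using measured latency as the single metric is an assumption, and the function names and site names are hypothetical.

```python
import random
from typing import Dict, List, Optional


def pick_image_site(latency_ms: Dict[str, float]) -> str:
    """Images are latency sensitive: pick the site with the lowest latency.

    Assumption: one latency figure per site stands in for the "different
    metrics" the source mentions.
    """
    return min(latency_ms, key=lambda site: latency_ms[site])


def pick_video_site(sites: List[str], rng: Optional[random.Random] = None) -> str:
    """Videos are bandwidth bound, so any data center will do."""
    chooser = rng if rng is not None else random
    return chooser.choice(sites)
```

For example, `pick_image_site({"colo-a": 80.0, "colo-b": 12.5, "colo-c": 40.0})` selects `colo-b`, while `pick_video_site` may return any of the three.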

Lessons Learned

* Creative and risky tricks can help you cope in the short term while you work out longer term solutions.

* Know what’s essential to your service and prioritize your resources and efforts around those priorities.

* Pick your battles. Don’t be afraid to outsource some essential services. YouTube uses a CDN to distribute their most popular content. Creating their own network would have taken too long and cost too much. You may have similar opportunities in your system. Take a look at Software as a Service for more ideas.

* Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems.

* Sharding helps to isolate and constrain storage, CPU, memory, and IO. It’s not just about getting more write performance.

* Constant iteration on bottlenecks:
- Software: DB, caching
- OS: disk I/O
- Hardware: memory, RAID

* Have a good cross discipline team that understands the whole system and what’s underneath the system. People who can set up printers, machines, install networks, and so on. With a good team all things are possible.


Source: http://highscalability.com/youtube-architecture

Filed under Technology, Video