


Jan 06

Attack Surface: Comparing Products’ Relative Security

One of the recurrent questions in security is which product is the most secure. Windows XP or Debian Linux? Firefox or Internet Explorer? Often these comparisons are based on subjective opinions or on vulnerability counts. A third way exists: attack surface analysis.

As a security specialist, I am often asked by friends and students which product is the more secure. That is a tough question, because how do you measure a product’s security?

Current Comparison Methodologies

Authoritative answer

Of course you can go for the argument from authority: X (very famous) has said it. However, science is about questioning and experimenting, so every time I hear this type of answer I wonder how this X knows that the product is more secure.

Statistical Analysis

The other well-known method is to count the vulnerabilities that each product has suffered from in the past. This method is based on statistics, and at first sight it seems more reasonable. For example, here is the result of Nilotpal’s study comparing Vista to Ubuntu Dapper. (A similar study comparing OS X and Vista can be found on Larry’s ZDNet blog.)

[Figure: Ubuntu vs Vista]

While more scientific, this methodology still has several flaws that make it quite unreliable:

First, it assumes that you can predict the future from the past. The basis of this approach is: if Y vulnerabilities have been discovered over the last X months, then a similar number will probably be discovered in the next X months. Statistics work really well on a subject that follows a pattern, such as temperature, river water level or shop sales, because these have patterns and cycles. Temperature, for example, follows a pattern driven by the Earth’s revolution around the Sun, so you have a clear 12-month cycle. But for system security there is no such cycle.

Moreover, the only well-known cycle for software is the product life cycle: interest in a product decreases with time, and ultimately it is going to be superseded by a new version or a new product. The same goes for vulnerability analysis. When a product is released, many people focus on finding its holes. As time goes by, the number of people interested in finding holes may decrease as they move on to other products or versions.

ProductLifeCycle

The second flaw in the approach is that in regular statistical analysis you are able to say: out of X customers per month, Y did this or that. Here you don’t actually know how many people have looked at the code to find holes or how much time they spent on it. So maybe more flaws are found simply because more people are looking at this particular product.

Thirdly, a more subtle flaw appears when two products are compared over the last X months. This is not a fair comparison, because the products were released at different moments and are therefore not at the same stage of their life cycle. How can comparing a product in its maturity stage against one in its introduction stage be objective?

That is why we need a more objective measure to compare products: something that relies not on the past or on some oracle but on facts. This is the attack surface.

Attack Surface

An attack surface is a relative measure of product security. We say it is relative because it exists only in comparison to other products. For instance, a spoon can be viewed as small only because there is some bigger spoon: this is a relative measure. Similarly, a product is more secure than another (relative).

Absolute versus relative measure

An absolute measure is not possible because we can’t prove that a product is absolutely safe or has no bugs. This is related to the halting problem and Rice’s theorem. If you are interested in bug detection, take a look at ASTREE, the static analyzer made at the ENS.

The ultimate goal of attack surface analysis is to be able to say “product X is safer than product Y because it has a smaller attack surface”.

Intuitively, the attack surface aims at measuring how many attack vectors are available for each product. It does not measure whether these entry points are actually used as attack vectors, but evaluates the potential. A way to view this is to think about mountain climbing: one route is relatively easier than another because it has more holds for your feet and hands. That does not tell you whether the more difficult route is unusable; it just tells you that the easier one is more likely to get you to your goal. The same goes for products: a product with a larger attack surface is more likely to be vulnerable than one with a smaller one.

Attack surface history

Attack surface has been around in research since 2003. I believe that it was Michael Howard of Microsoft who informally defined the notion of Relative Attack Surface Quotient (RASQ). The first paper on the subject, “Measuring Relative Attack Surfaces”, was published at a workshop by Michael Howard, Jon Pincus, and Jeannette M. Wing in Dec. 2003. Since then, the “Attack Surface Measurement” project has been hosted at Carnegie Mellon.

How to measure it?

So how is an attack surface measured? Well, that is the big challenge! There are several ongoing efforts, but the basic idea is much the same in all of them.

Three parts define the attack surface:

  1. Target
  2. Enabler
  3. Vector

Targets are the attacker’s objectives: a root shell is the most obvious one, leaking sensitive data is another, and so on.

Enablers are the set of processes and services that allow the attacker to reach his goal, for example a running HTTP server.

Vectors can be viewed as the medium used to reach enablers and targets. It can be a socket, shared memory, a pipe…

So roughly, an attack surface is the product Target × Enabler × Vector (this is not totally accurate and depends on the formalism, but it should give you the idea). Stephen Northcutt’s intuitive definition is very bright (check his post on attack surface):

“We can define attack surface as our exposure, the reachable and exploitable vulnerabilities that we have. The best word picture I know of is the depiction of the Spartan Phalanx depicted in Warner Brothers’ tale of the Battle of Thermopylae, based on Frank Miller’s ‘300’.”
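
As a rough illustration of the Target × Enabler × Vector idea, here is a minimal Python sketch. The `Product` class, the `attack_surface` and `smaller_surface` functions, and the two example products are hypothetical names and made-up values chosen for illustration only; real formalisms are more elaborate than a plain count.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical description of a product for a toy surface estimate."""
    name: str
    targets: set = field(default_factory=set)   # attacker objectives
    enablers: set = field(default_factory=set)  # processes and services
    vectors: set = field(default_factory=set)   # channels used to reach them


def attack_surface(product: Product) -> int:
    """Toy score: number of (target, enabler, vector) combinations."""
    return len(product.targets) * len(product.enablers) * len(product.vectors)


def smaller_surface(a: Product, b: Product) -> Product:
    """Return the product with the smaller toy score (a relative measure)."""
    return a if attack_surface(a) <= attack_surface(b) else b


# Made-up example values, for illustration only.
minimal = Product("minimal", {"root shell"}, {"sshd"}, {"socket"})
full = Product("full", {"root shell", "data leak"},
               {"sshd", "httpd", "ftpd"}, {"socket", "pipe"})

print(attack_surface(minimal), attack_surface(full))
print(smaller_surface(minimal, full).name)
```

The score only makes sense when two products are compared with the same inventory rules, which is exactly the relative nature of the measure.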

Some other criteria can be used to derive the attack surface; for instance you can use the LOC (Lines of Code) index. The idea behind this index is: the more lines of code there are, the more likely there is a bug. However, this rule of thumb also has counterexamples. For instance, the iPhone bootloader code is smaller than the baseband code, yet so far bugs have only been found in the bootloader.

Current uses

Some products for attack surface analysis are already on the market, such as Holodeck:

attack-surface-gui-large

You can also find pretty good attack surface analyses that try to evaluate the potential security of a product. The most famous is probably the “Windows Vista Network Attack Surface Analysis” by Symantec.

You will also find mentions of attack surface in many Windows 2008 previews, such as those on ZDNet, 4sysops, and windows2008blog.

Conclusion

Attack surface is currently the most scientifically grounded method to compare product security. It is intuitive and simple in concept but very complex to model and implement. This metric helps to answer important questions such as whether the new version of product X is better than Y from a security perspective.

See you next Sunday, and a happy new year to you!

Dec 07

Blog trackback Spam analysis

This Friday, I present my analysis of a botnet that spams blogs through the trackback/pingback mechanism. The spammers abuse the blog trackback mechanism to improve their ranking on search engines. I have been able to collect data about this botnet for around a year because it is targeting my personal website.

Analyzing the collected data is useful because it shows how spam evolves over time and how the spammers operate. This is quite a hot topic, and other blogs (see here and here) have entries on the subject. However, to the best of my knowledge, this post is the first to provide a complete technical analysis based on a vast amount of data.

I tried to analyze every aspect of this spam, from the daily activity I monitor, to the type of machines involved, to the type of sites being promoted. Without spoiling all the fun, I even had to do a binary analysis of the file they try to install on your PC.

Before getting into details, let’s take a look at the context of this spam. It is a very specific type of spam because it targets blogs and not mail. Therefore the input and output of this spam are quite different from what you observe in email.

One key specificity of blogs is that they aim at creating interaction between users but also between blogs. That is the “blogosphere”. A common mechanism to allow interaction between blogs is the trackback/pingback mechanism.

halo wordpress

The trackback Flood

The trackback mechanism is used to notify an author that you have made a link to one of their documents. It enables authors to keep track of who is linking to, or referring to, their articles. It is also used to let visitors easily navigate between posts related to the same subject.

For example, if you have a blog and write about this article, then you can send a trackback to let the author know about it. Your trackback will be added to the list of sites that refer to it.

The trackback specification is due to Six Apart, which implemented it in its Movable Type in 2002. Since 2006 it has been the subject of an IETF working group and may one day become a standard.

Finally, trackback helps generate traffic and improve site rankings. It will also make the author of the post happy to see that people find his work useful :)

Spammers found this latter functionality very “useful”. They use it to improve their search engine ranking and drag traffic to their sites. It is appealing to them because the trackback mechanism allows them to inject a link pointing to their site into other blogs in an automated fashion.

While anti-spam techniques are interesting, I will not detail them in this post because they are a little out of scope. If I get requests on the subject, I will write a detailed post about it.

Version 1 of the dotclear blog system implements the trackback mechanism in the file tb.php. A trackback is done by calling this file with the post id.

For example, the URL www.mysite.com/tb.php?id=18 is used to add a trackback to post 18.

More technically, a trackback is an HTTP POST request (not an HTTP GET, as is commonly written) that submits four variables: the title, the url, the excerpt and the blog_name.
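
To make the shape of such a request concrete, here is a minimal Python sketch that builds (but does not send) a trackback POST with the four variables. The function name `build_trackback_request` and the example values are assumptions made for illustration; the target URL reuses the tb.php example above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_request(tb_url, title, url, excerpt, blog_name):
    """Build a form-encoded HTTP POST carrying the four trackback fields."""
    fields = {
        "title": title,
        "url": url,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    # method="POST" is explicit: the fields travel in the body, not the URL.
    return Request(tb_url, data=body, headers=headers, method="POST")


# Illustrative values only; nothing is sent over the network here.
request = build_trackback_request(
    "http://www.mysite.com/tb.php?id=18",
    title="A post about trackback spam",
    url="http://example.org/my-post",
    excerpt="Some thoughts on the article...",
    blog_name="Example blog",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Only the post id is part of the URL; everything a spammer wants to inject (title, link, excerpt) sits in the request body, which is why a plain access log does not show it.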

Here is a sample of a spammed trackback gathered by my trap page:

[title] => Cis transgender wiki
[url] => http://cis-transgender-wiki.spacenow ronomy.yyyy/index.html
[excerpt] => wiki Cis transgender wiki …
[blog_name] => Cis transgender wiki

My personal site bursztein.eu has been the target of this type of spam for more than a year. I have kept track of every request since the beginning in an Apache combined log. The file now exceeds 700MB of data. This gives you an idea of how intense the spam flood can be.
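
For readers who want to dig into such a log themselves, here is a minimal Python sketch that parses one line in the Apache combined format and tells whether it is a POST to tb.php. The names `parse_line` and `is_trackback` are hypothetical, the sample line is made up (documentation IP address, invented user agent version), and the regular expression assumes fields do not contain escaped quotes.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-log line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def is_trackback(entry):
    """True when the request line is a POST to the tb.php script."""
    method, _, rest = entry["request"].partition(" ")
    return method == "POST" and rest.startswith("/tb.php")


# Made-up sample line, for illustration only.
sample = ('192.0.2.10 - - [04/Dec/2007:10:15:32 +0100] '
          '"POST /tb.php?id=18 HTTP/1.0" 200 512 "-" "WordPress/2.1.2"')
entry = parse_line(sample)
print(entry["ip"], entry["agent"], is_trackback(entry))
```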

More recently, I have also set up a trap page in order to log spam requests. The trap page logs as much information as possible in an SQL backend. One of my friends calls it a honeyblog :) And that is pretty much what it is: a honeypot for blog trackbacks. It gives spammers the impression that their trackbacks are really inserted into my web page, but instead I record and analyze them. I have even created an incomplete live report of the spam activity.

More specifically, I have created this honeyblog for two main purposes:

  1. To be able to analyze the content of trackbacks, since Apache logs do not record request content, and maybe write a publication on it.
  2. To be able to generate a spammer IP list that can be used by anyone.
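
The two purposes above can be sketched in a few lines of Python. This is not the actual honeyblog code: it is a minimal sketch that assumes an SQLite backend, and the table layout and the names `open_store`, `log_trackback` and `spammer_ips` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS trackback (
    id INTEGER PRIMARY KEY,
    received_at TEXT DEFAULT CURRENT_TIMESTAMP,
    ip TEXT NOT NULL,
    user_agent TEXT,
    post_id INTEGER,
    title TEXT,
    url TEXT,
    excerpt TEXT,
    blog_name TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the log database; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_trackback(conn, ip, user_agent, post_id, fields):
    """Record the request content instead of publishing the trackback."""
    conn.execute(
        "INSERT INTO trackback "
        "(ip, user_agent, post_id, title, url, excerpt, blog_name) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (ip, user_agent, post_id, fields.get("title"), fields.get("url"),
         fields.get("excerpt"), fields.get("blog_name")),
    )
    conn.commit()


def spammer_ips(conn):
    """Purpose 2: the list of distinct source addresses seen so far."""
    rows = conn.execute("SELECT DISTINCT ip FROM trackback ORDER BY ip")
    return [ip for (ip,) in rows]


# Made-up records using documentation IP addresses.
store = open_store()
log_trackback(store, "192.0.2.2", "WordPress/2.1.2", 18, {"title": "spam A"})
log_trackback(store, "192.0.2.1", "WordPress/2.1.2", 18, {"title": "spam B"})
log_trackback(store, "192.0.2.2", "WordPress/2.1.2", 18, {"title": "spam C"})
print(spammer_ips(store))
```

Parameterized queries matter here: the logged fields are attacker-controlled, so they must never be pasted into the SQL string.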

So how did my personal site end up being targeted by this kind of spam? Well, it is subject to this attack because at one point I used dotclear for a couple of months before I decided to switch to WordPress. During this period, my site was probably added to the spammers’ list of target blogs, and since then it has experienced spam attempts on a regular basis.

Monthly activity

At this point, you probably wonder how bad the situation is. If it were only a couple of requests every day, it would not be a big deal. Well, it is a big deal! Look at the chart below, which reports the spam activity against my site for the last 12 months:

spamattemps

Note that the number of spam attempts for December is a forecast. It seems quite accurate if you take a look at the current live report.

Before trying to analyze the spammers’ objectives, let’s take a closer look at an ordinary day of spamming to understand how the spamming platform runs. I chose the 4th of December 2007, which was the first day of my honeyblog.

Daily activity

On the 4th of December, 45798 different IPs tried to add 60148 trackback spams to my site. This means that my server had to generate 60148 pages for nothing. Load analysis shows that before the installation of the honeyblog, spam at its peak consumed up to 30% of my server’s CPU power.

Here are some of the most interesting statistics computed during the analysis.

User agent

First of all, the user agent distribution. The user agent is a variable sent by the client to indicate which software it uses. Back in 2006, spammer user agents were very standard: Internet Explorer, Opera… But today they only use a blog-specific user agent: WordPress. This makes sense because they try to impersonate real blog trackbacks. Here is a little chart of the user agent distribution for the 4th of December:

useragent
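
A distribution like this one can be computed with a simple counter. The sketch below is illustrative: the function name `agent_distribution` is hypothetical and the user agent strings are made-up sample values, not the ones observed on the 4th of December.

```python
from collections import Counter


def agent_distribution(agents):
    """Return (user agent, count, percentage) tuples, most frequent first."""
    counts = Counter(agents)
    total = sum(counts.values())
    return [(agent, n, 100.0 * n / total) for agent, n in counts.most_common()]


# Made-up sample values, for illustration only.
observed = ["WordPress/2.1.2"] * 3 + ["WordPress/2.0"]
for agent, count, share in agent_distribution(observed):
    print(f"{agent:20s} {count:5d} {share:5.1f}%")
```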

As you can see, they are not using the current version of WordPress (2.3.x). This makes me believe that they are not continuously updating their software. It seems that a company specializes in writing this type of spam software. I won’t link to them, for obvious reasons, but if you wish, check out the geek and fly blog: Adam has a nice post about this software.

trackback submitter comment spam blog mass link software

Spam activity

Next I was interested in the flood behavior. At the beginning I thought that if I graphed spam activity by hour I would find a “loop”. This is consistent with the idea that they flood one site, then another, and when the list is exhausted they start again at the beginning of the list. Since my blog is only one entry in their database, I should have observed activity peaks. However, if you look at the chart below, you see that I was wrong.

spamdayactivity

Indeed, there seems to be a period of 7 hours in the activity, but it never drops below 1700 trackback posts per hour. The only explanation for this distribution is that the entire set of flooding computers is coordinated to spread out its activity. This allows us to infer that the entire spam (or at least most of it) is the result of a single individual’s or company’s activity.
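
The hourly view discussed above boils down to bucketing request timestamps by hour of day. Here is a minimal Python sketch, assuming timestamps in the Apache log format; the function name `hourly_counts` and the three sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout used in Apache logs, e.g. 04/Dec/2007:10:15:32 +0100
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def hourly_counts(timestamps):
    """Return a list of 24 counters, one per hour of the day."""
    counts = Counter(
        datetime.strptime(stamp, APACHE_TIME).hour for stamp in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up sample values, for illustration only.
stamps = [
    "04/Dec/2007:10:15:32 +0100",
    "04/Dec/2007:10:59:59 +0100",
    "04/Dec/2007:23:00:00 +0100",
]
per_hour = hourly_counts(stamps)
print(per_hour)
print("quietest hour count:", min(per_hour))
```

On real data, the minimum of this list is the figure to watch: a flooder that cycles through a target list would leave some hours almost empty.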

To observe the spam cycle more precisely, it might be possible to try to isolate one period of activity. That could then allow us to infer the size of the spammers’ target database. But this is more for a research paper than for a blog post, I guess.

Spam platform

I wanted to confirm the hypothesis that the spam was coordinated by a single entity. If this hypothesis were true, then the platform (the set of computers) should be quite homogeneous. To check that, I ran a couple of tests. Three of the most interesting results are the geolocation of the computers, the type of OS run on these computers, and their uptime.

Platform geolocation

First I wanted to know in which country the platform was located. To find its origin, I ran the list of spammer IPs against an IP geolocation database. I used the webnet77 free database. As I expected, all 45798 IPs belong to a single country: Russia.

computerorigin
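
Running a list of IPs against a range-based geolocation database is essentially a sorted-range lookup. The sketch below is a minimal Python illustration under stated assumptions: it does not reproduce the file format of the database used here, the names `build_index` and `country_of` are hypothetical, the ranges are documentation address blocks, and "AA" and "BB" are placeholder country codes.

```python
import bisect
import ipaddress


def build_index(ranges):
    """ranges: iterable of (first_ip, last_ip, country) strings.

    Assumes the ranges do not overlap.
    """
    rows = sorted(
        (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), country)
        for first, last, country in ranges
    )
    starts = [row[0] for row in rows]
    return starts, rows


def country_of(index, ip):
    """Return the country of the range containing ip, or None."""
    starts, rows = index
    value = int(ipaddress.ip_address(ip))
    position = bisect.bisect_right(starts, value) - 1
    if position >= 0 and rows[position][0] <= value <= rows[position][1]:
        return rows[position][2]
    return None


# Placeholder ranges and country codes, for illustration only.
index = build_index([
    ("198.51.100.0", "198.51.100.255", "BB"),
    ("192.0.2.0", "192.0.2.255", "AA"),
])
print(country_of(index, "192.0.2.77"))
print(country_of(index, "203.0.113.5"))
```

The binary search keeps each lookup logarithmic in the number of ranges, which matters when tens of thousands of addresses have to be resolved.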

This is an additional good hint that the flood is performed by a single entity. A good and unanswered question is why a spam platform located in Russia promotes Chinese sites.

Platform OS

This result also raises an additional question: how many computers are really behind this set of IPs? At this point it was quite likely that there were in fact few computers with many IPs to bypass spam filters. To validate this new hypothesis I probed 100 random IPs with nmap.

osrepatition

As you see, the platform OSes are pretty consistent and unusual. I never expected that it would be FreeBSD. For those who don’t know: FreeBSD is a UNIX but way less popular than Linux (I am not discussing OS merits here). You have probably heard of it because the core of OS X is based on it. At this point I had a strong suspicion that there were only a few computers in the platform.

Computer uptime

To be sure, I ran another test: I measured the uptime of the computer behind each IP. See the distribution below:

computeruptime

As you see, there are only 6 different uptimes for these 100 IPs. Of course this measure needs to be refined and extended to many more IPs to be sure, but it really tends to confirm that there are few computers, because it is very unlikely that two different computers have the same uptime. This distribution also indicates that the platform is pretty solid, given the long uptime of some computers. It also suggests that the spam runs 24 hours a day.
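
The reasoning above (same uptime, probably same machine) can be sketched as a grouping step. This is a minimal Python illustration, not the procedure actually used: the function name `group_by_uptime`, the `tolerance` parameter, the choice of seconds as the unit, and the sample values are all assumptions.

```python
def group_by_uptime(uptimes, tolerance=0):
    """Group IPs whose uptimes are close to each other.

    uptimes: dict mapping IP -> uptime in seconds (assumed unit).
    A new group starts whenever the gap to the previous uptime, in
    sorted order, exceeds `tolerance` seconds.
    """
    groups = []
    previous = None
    for ip, uptime in sorted(uptimes.items(), key=lambda item: item[1]):
        if previous is None or uptime - previous > tolerance:
            groups.append([])
        groups[-1].append(ip)
        previous = uptime
    return groups


# Made-up measurements using documentation IP addresses.
measured = {
    "192.0.2.1": 100,
    "192.0.2.2": 100,
    "192.0.2.3": 105,
    "192.0.2.4": 5000,
}
print(len(group_by_uptime(measured)))
print(len(group_by_uptime(measured, tolerance=10)))
```

A small tolerance is useful in practice because remote uptime estimates are approximate, so two probes of the same machine rarely return exactly the same value.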

Spam content

Of course I ran a couple of basic analyses on the spam content. For this purpose I used a corpus of 1139 spam samples. I used standard text analysis techniques to determine the prominent characteristics of the spam. In this already too long post I only detail the results for the title variable, because the analysis of the other variables does not differ very much.

In these 1139 titles there were 3936 words.

Statistical breakdown

Number of different words : 2041
Complexity factor (Lexical Density) : 51.9%
Readability (Gunning-Fog Index) : (6-easy 20-hard) 2.9
Total number of characters : 28942
Number of characters without spaces : 22949
Average Syllables per Word : 1.86
Average title length (words) : 3.68
Max title length (words) : 11 ( cu*m in here mouth she will spit it back in yours)
Min sentence length (words) : 1 ( sexyimages)
Readability : (100-easy 20-hard, optimal 60-70) 45.9

Well, as you see, titles are short and have a pretty large lexical density. I never thought that the sex lexicon was so large :) A very important analysis is the top words. It shows that blacklisting words will not work well, because titles do not reuse the same sentence again and again. If you look at the word occurrence frequency ranking below, you will see that the most frequent word accounts for only 2.7% of all words, and beyond the first 6 ranks this percentage drops below 1%. (I have added extra * to words to avoid being flagged as a p*or*n site.)

Word Occurrences Frequency Rank

Word Occurrence % Rank
nud*e 106 2.7% 1
fr*ee 80 2% 2
se*x 76 1.9% 3
por*n 56 1.4% 4
pics 51 1.3% 5
na*ked 50 1.3% 5
video 42 1.1% 6
girls 31 0.8% 7
se*xy 31 0.8% 7
he*ntai 31 0.8% 7

2 word phrases frequency

Word Occurrences %
free 36 0.9%
video 26 0.6%
po*rn 25 0.6%
nu*de 24 0.6%
pics 23 0.5%
na*ked 19 0.5%
se*x 19 0.5%
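
Statistics of this kind are easy to reproduce. The sketch below is a minimal Python illustration, not the tool used for the figures above: it takes lexical density to be distinct words divided by total words (which is consistent with 2041 / 3936 ≈ 51.9%), and the names `words`, `title_stats` and `bigrams`, the tokenization rule and the sample titles are assumptions.

```python
import re
from collections import Counter


def words(title):
    """Lowercase tokens; '*' is kept so masked words stay in one piece."""
    return re.findall(r"[a-z0-9*']+", title.lower())


def title_stats(titles, top=10):
    """Total and distinct word counts, lexical density and top words."""
    tokens = [word for title in titles for word in words(title)]
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "total_words": total,
        "distinct_words": len(counts),
        "lexical_density": 100.0 * len(counts) / total if total else 0.0,
        "top_words": [(word, n, 100.0 * n / total)
                      for word, n in counts.most_common(top)],
    }


def bigrams(titles):
    """Count two-word phrases, never spanning two different titles."""
    counts = Counter()
    for title in titles:
        tokens = words(title)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up sample titles, for illustration only.
sample_titles = ["free pics", "free video", "free"]
stats = title_stats(sample_titles)
print(stats["total_words"], stats["distinct_words"], stats["lexical_density"])
print(stats["top_words"][0])
print(bigrams(sample_titles).most_common())
```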

The money

Finally, one question remains: how do they make money? These people do not do this for the beauty of the art, they do it for money. At first I thought they sold SEO (Search Engine Optimization) products, but I was wrong. I went to one of the URLs submitted in the trackbacks (do not do this at home, it can be dangerous for your computer!). For those who wonder, here is what you see on this site:

Screenshot

It was a pass-through to a video site (por*no) (no link here, again) that offers you a plugin to download in order to view the video. Of course this plugin is spyware (told you not to go there). I know this because I ran two tests on it.

First I used VirusTotal. This is a cool service that lets you upload a binary and runs many antivirus engines on it. Some of them found that the binary is in fact a spyware downloader:

File setup.exe received on 2007.12.13 15:35:37 (CET)
Antivirus Version Last update Result
AhnLab-V3 2007.12.13.10 2007.12.12 -
AntiVir 7.6.0.40 2007.12.13 DR/Zlob.Gen
Authentium 4.93.8 2007.12.13 -
Avast 4.7.1098.0 2007.12.12 -
AVG 7.5.0.503 2007.12.13 Downloader.Zlob.LI
BitDefender 7.2 2007.12.13 -
CAT-QuickHeal 9.00 2007.12.12 -
ClamAV 0.91.2 2007.12.13 Trojan.Dropper-2529
DrWeb 4.44.0.09170 2007.12.13 Trojan.Popuper.origin
eSafe 7.0.15.0 2007.12.12 -
eTrust-Vet 31.3.5373 2007.12.13 -
Ewido 4.0 2007.12.13 -
FileAdvisor 1 2007.12.13 -
Fortinet 3.14.0.0 2007.12.13 -
F-Prot 4.4.2.54 2007.12.12 -
F-Secure 6.70.13030.0 2007.12.13 -
Ikarus T3.1.1.15 2007.12.13 -
Kaspersky 7.0.0.125 2007.12.13 -
McAfee 5184 2007.12.12 -
Microsoft 1.3007 2007.12.13 TrojanDownloader:Win32/Zlob.gen!dll
NOD32v2 2721 2007.12.13 -
Norman 5.80.02 2007.12.12 -
Panda 9.0.0.4 2007.12.13 -
Prevx1 V2 2007.12.13 -
Rising 20.22.32.00 2007.12.13 -
Sophos 4.24.0 2007.12.13 Troj/Zlobar-Fam
Sunbelt 2.2.907.0 2007.12.13 -
Symantec 10 2007.12.13 -
TheHacker 6.2.9.157 2007.12.12 -
VBA32 3.12.2.5 2007.12.10 -
VirusBuster 4.3.26:9 2007.12.12 -
Webwasher-Gateway 6.6.2 2007.12.13 Trojan.Dropper.Zlob.Gen
 
Additional information
File size: 80367 bytes
MD5: 644801d4594f665cbf2f4b0ebf76b490
SHA1: ea6b20b7aedb2016930cf0f1aaf69a82ff04c2d0
PEiD: -
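
The size, MD5 and SHA1 values in the report above identify the exact binary that was analyzed. As a minimal sketch, such digests can be recomputed locally with Python’s standard `hashlib` module; the function name `file_digests` is hypothetical, and the demonstration hashes a small temporary file rather than the sample itself.

```python
import hashlib
import os
import tempfile


def file_digests(path, chunk_size=65536):
    """Return size, MD5 and SHA1 of a file, reading it chunk by chunk."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    return {"size": size, "md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


# Demonstration on a harmless temporary file containing b"abc".
with tempfile.NamedTemporaryFile(delete=False) as demo:
    demo.write(b"abc")
digests = file_digests(demo.name)
os.unlink(demo.name)
print(digests)
```

Comparing digests is a safe way to check whether a file matches a known sample without ever executing it.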

Then I ran the Anubis spyware analyzer. This is a research tool made by the seclab of TU Wien. It is a terrific tool that performs a dynamic analysis of a binary. It lets you know what the spyware would do on your computer without installing it. As you can see in the report, this spyware indeed installs many files on the computer.

Conclusion

First of all, if you made it this far: thank you! I know this is a very long post, but I didn’t feel that taking shortcuts was an option. What started as a little experiment turned out to be quite an interesting investigation; it might even turn into a research paper at some point. Even if it does not give all the ins and outs of trackback spam, I hope it has given you an insight into how blog spam is used today to make a profit.

See you next Friday for another post that will possibly be shorter :)