GIGJ.COM
statistics, measure of significance
Published by: mike 2010-03-10

  • I have two lists of numbers that represent a diverse set of statistics taken at two different times (let's say network performance). The lists are large, and I want to highlight the differences that are most likely to be significant or interesting. I do not have a large historical sample to base this on; it can only be a function of the two data points. A straight difference is not good because large-value stats have larger differences than small-value stats (a change from 1000 to 1100 appears more significant than 1 to 50). A percent difference is not good because small-value stats have erratic changes that are big in relative (percentage) terms, e.g. a change from 2 to 4 looks more significant than 170,000 to 180,000. I probably don't know how to state this properly, but large-magnitude values tend to hover around a typical value (like the size of raindrops), while small values can go from 0 to other small values, such as 5, fairly easily. I was thinking of something like using the larger of the two values as the assumed magnitude, by which the significance of a difference is scaled down. Am I making any sense? If so, there must be a very standard statistical way of saying this properly. I would prefer a single value that can be scaled to a fixed range, e.g. from 0 to 100, so that I can adjust the threshold of "interesting". I just want to help highlight the values in these long lists that are most worthy of inspection.


  • I'm not sure I fully understand the question and I may be repeating what has already been said, but... Are you looking for what is known as a standard deviation? This is basically a measure of how much a set of data vary around the mean. It's fairly simple to compute; I'm sure there are computer programs that do it. You might even be able to use Excel - I don't know. If you want more info on standard deviation, do a Google search. You are bound to find more than you'd ever care to know! Good luck.


  • mathtalk, your point is well taken about this not being truly "statistical", and about needing a model to have truly meaningful answers. I realize I'm asking for something a little bogus and magical by insisting on comparing only two points and by insisting on no knowledge of the meaning of the data. But having said that: an absolute difference does do that magical job, albeit poorly, and a relative difference does an even better job, bordering on good enough. I'd like to know if there's a next logical improvement that has better properties than either of these by themselves, without putting more demands on the front end of this whole process. This simple expression does a so-so job: (larger+1)/(smaller+1) (the +1 is there because there are 0s in the data). All this really is, is a prioritization of which of these differences I may select as "interesting" and go on to model as the next step.
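The one-line expression (larger+1)/(smaller+1) from the post above can be tried out directly. The following is only a sketch: the function name `ratio_score`, the stat names, and the sample pairs (taken from the numbers mentioned in the question) are illustrative assumptions, not part of the thread.

```python
def ratio_score(before, after):
    """Return (larger + 1) / (smaller + 1); the result is always >= 1.0."""
    larger, smaller = max(before, after), min(before, after)
    return (larger + 1) / (smaller + 1)


# Hypothetical (before, after) readings, reusing the numbers from the question.
pairs = {
    "stat_a": (1000, 1100),
    "stat_b": (1, 50),
    "stat_c": (2, 4),
    "stat_d": (170_000, 180_000),
}

# Rank the stats from most to least "interesting" under this score.
ranked = sorted(pairs, key=lambda name: ratio_score(*pairs[name]), reverse=True)
for name in ranked:
    print(name, round(ratio_score(*pairs[name]), 3))
```

Because this is a pure ratio, it behaves like a relative difference: the 2 to 4 change still outranks the 170,000 to 180,000 change, which is the weakness the poster describes.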


  • Use a statistical computer program (such as SPSS) to convert the raw scores into z scores. You can then compare means and standard deviations, and do other statistical analysis. You can get a trial version of SPSS at www.spss.com. Good luck! -Rebekah. PS - Here are instructions on exactly how to do this using SPSS: http://www.uoguelph.ca/~psystats/raw_to_z-score_conversions.htm
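For readers without SPSS, the raw-score to z-score conversion mentioned above is short enough to sketch by hand. This is a minimal illustration using Python's standard `statistics` module; the function name `z_scores` and the sample data are made up, and the population standard deviation is an assumption (swap in `statistics.stdev` for the sample version).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to z scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(z_scores(raw))
```

Note that a z score needs a whole set of values for one statistic; with only two readings per statistic, as in the original question, it says very little.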


  • Hi, lusus: I've had to address a similar issue with "regression" testing of a computer software application. Running the software against a large number of inputs (benchmark test cases) before and after a change to the software would normally produce some expected and some unexpected changes in the output. Since the output was much too extensive for a human being to compare reliably, an automated comparison was made to identify "big" changes. My suggestion is that you go through the various categories of measurements in your problem and assign to them some "modelling" labels:
    - absolute versus relative: Should the threshold of "big" change be defined for this category in absolute or relative terms, i.e. X(1) - X(2) or the percentage difference of X(2) from X(1)?
    - primary versus secondary: Is the category a primary indicator, something central to the business purpose to be monitored (e.g. "downtime" or perhaps "dropped connections"), or is it secondary, either in the sense of being of peripheral importance or of being a kind of intermediate/explanatory value that might be ignored unless related primary indicators demonstrate a "big" change?
    This is the sort of thing I meant by saying it is a "modelling" question rather than a "statistical" one. As the "domain expert" you would need to take the lead in assigning these categories. I could certainly provide guidance on how to implement an automated review of the two data sets, using your guidelines. Is this the sort of help you are interested in? regards, mathtalk-ga
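The labelling scheme described above (absolute versus relative, primary versus secondary) might be automated along the following lines. This is a sketch under stated assumptions: the category names, thresholds, and the `RULES` layout are all hypothetical, and the rule "report secondary changes only when some primary indicator changed" is one possible reading of the post.

```python
# Hypothetical per-category rules: how "big" is defined, and whether the
# category is a primary indicator.
RULES = {
    "downtime_minutes": {"mode": "absolute", "threshold": 5, "primary": True},
    "dropped_connections": {"mode": "relative", "threshold": 0.20, "primary": True},
    "pages_swapped": {"mode": "relative", "threshold": 0.50, "primary": False},
}


def is_big_change(name, x1, x2, rules=RULES):
    """Apply the category's own definition of a "big" change."""
    rule = rules[name]
    diff = abs(x2 - x1)
    if rule["mode"] == "absolute":
        return diff > rule["threshold"]
    # Relative mode: change as a fraction of the first measurement.
    if x1 == 0:
        return x2 != 0  # any move away from zero counts as big
    return diff / abs(x1) > rule["threshold"]


def review(before, after, rules=RULES):
    """List big changes; secondary ones appear only if a primary one changed."""
    flagged = [n for n in rules if is_big_change(n, before[n], after[n], rules)]
    primary = [n for n in flagged if rules[n]["primary"]]
    secondary = [n for n in flagged if not rules[n]["primary"]]
    return primary + (secondary if primary else [])


before = {"downtime_minutes": 3, "dropped_connections": 100, "pages_swapped": 400}
after = {"downtime_minutes": 4, "dropped_connections": 150, "pages_swapped": 900}
print(review(before, after))
```

The thresholds are where the domain expert's judgement goes; the code only applies them consistently.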


  • PPS - This program will also let you make graphs, which will help you see those "outliers" (or values you think might be significant enough to look at for further investigation). Makes it a lot easier. -Rebekah


  • From a true statistical point of view, no, it does not make sense. Let me make sure I understand the setup. At two different points in time, you take a large set of "measurements" on a complex system (network). For example, there might be a count of users logged in, the number of files open on a file server, the number of memory pages swapped out on a database server, etc. Almost all of the corresponding numbers at the two points in time differ. You ask for a way to know which differences are most likely to be worth noticing. It is not much of a statistical problem, insofar as statistics deals with repeated measurements, because each distinct measurement is only taken twice. What you have is a modelling problem. You need a model or "hypothesis" relating all these varied measurements to help formulate a notion of whether a difference in measurements is significant or not. Distinguishing variations that are relatively large or small from ones that are absolutely large or small is probably a good first step, but it is far from the whole story. Let's turn the question around and ask this. Suppose someone with a Crystal Ball could tell you unequivocally that this pair of measurements exhibits the most significant difference. What would you do with that information? How would you proceed? What "investigation" concerning the difference between those two numbers would be possible? Or worthwhile? Those are the sorts of issues that a "model" addresses. A model of human physiology, for example, tells us that a 10 percent variation in blood temperature is more significant than a 10 percent variation in blood sugar, and that a one-unit change in blood pH is more significant than a one-unit change in blood volume. So we would need to know more about the "context" of your measurements to decide whether a variation has significance or not, or more to the point, whether a pattern or "constellation" of changes in measurement indicates an underlying event of importance (e.g. a "viral" attack either in the human patient or on a network). regards, mathtalk-ga
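To make the idea of "a model" concrete, here is one very simple example, offered only as an illustration and not as anything proposed in the thread. It assumes every statistic is a count whose random noise grows like the square root of its size (as for Poisson counts), which is a strong assumption that will not hold for every kind of measurement. The function name `poisson_style_score` and the +1 guard against division by zero are likewise assumptions.

```python
import math


def poisson_style_score(before, after):
    """Difference measured in rough "noise units": |after - before| / sqrt(before + after + 1).

    Assumes count-like data whose variance is about equal to its mean;
    the +1 only avoids dividing by zero when both readings are 0.
    """
    return abs(after - before) / math.sqrt(before + after + 1)


# Hypothetical pairs, reusing the numbers from the original question.
for before, after in [(1000, 1100), (1, 50), (2, 4), (170_000, 180_000)]:
    print(before, after, round(poisson_style_score(before, after), 2))
```

Under this assumption, 2 to 4 scores lowest and 170,000 to 180,000 scores highest, which matches the intuition in the question that small values jump around easily. A different model of the data would give a different ranking, which is the point of the post above.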


  • oh, and I am processing this with a program, but it's probably not going to be worth the effort in my situation if the expression is more than a single line. I'd like to be able to set an arbitrary threshold and say "show me the differences that rank > 80 out of a possible score of 100."
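One way to get a "rank > 80 out of 100" threshold from any one-line score is to rescale by rank rather than by value. The sketch below assumes the raw scores have already been computed by some expression; the names `to_rank_0_100` and `interesting`, and the sample scores, are hypothetical.

```python
from bisect import bisect_left


def to_rank_0_100(raw):
    """Map each raw score to 0..100 by its position among all scores.

    The lowest score maps to 0 and the highest to 100; tied scores share
    the rank of the lowest position in the tie.
    """
    ordered = sorted(raw.values())
    denom = max(len(ordered) - 1, 1)  # avoid dividing by zero for a single score
    return {name: 100.0 * bisect_left(ordered, s) / denom for name, s in raw.items()}


def interesting(raw, threshold=80):
    """Names whose rank exceeds the threshold, highest rank first."""
    ranks = to_rank_0_100(raw)
    hits = [name for name, r in ranks.items() if r > threshold]
    return sorted(hits, key=ranks.get, reverse=True)


# Hypothetical raw scores, e.g. from (larger+1)/(smaller+1).
raw = {"a": 1.10, "b": 25.5, "c": 1.67, "d": 1.06, "e": 3.0, "f": 1.0}
print(interesting(raw, threshold=80))
print(interesting(raw, threshold=50))
```

Ranking this way always flags roughly the same share of the list at a given threshold, regardless of how the raw score is distributed, which may or may not be what is wanted.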


  • As stated, since you only want to compare two numbers, this is not a question of statistics but of a discrete derivative. Your first attempt, a_{i+1} - a_i, was not sufficient. Your second attempt, 200*(a_{i+1} - a_i)/(a_{i+1} + a_i + 1), worked fairly well, but gave too much weight to small values of a_i and not enough weight to large values of a_i. If you want the difference to lie in [0,100] and to depend on the relative size of a_i, a global maximum is needed. Then you could use: 20000 * (2*max - a_{i+1} - a_i)/max * (a_{i+1} - a_i)/(a_{i+1} + a_i + 1). You could of course vary the size weighting by raising (2*max - a_{i+1} - a_i)/max to either a fractional power (e.g. 1/2) or a larger power (e.g. 2), with a higher power applying more weight to size and a lower power applying less. It should be noted that the constant 20000 would have to be changed if the power is changed.
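The size-weighted expression in the post above can be sketched as follows, reading the relative term as (a_{i+1} - a_i)/(a_{i+1} + a_i + 1). Two things here are assumptions rather than part of the post: the absolute value is taken so that the score is non-negative, and instead of a hand-tuned constant such as 20000 the scores are rescaled so that the largest one equals 100. The function name `weighted_scores` and the sample lists are illustrative.

```python
def weighted_scores(before, after, power=1.0):
    """Size-weighted relative differences, rescaled to the range 0..100.

    before, after: equal-length lists of non-negative readings (a_i, a_{i+1}).
    power: exponent applied to the size weighting (e.g. 0.5 or 2).
    """
    global_max = max(max(before), max(after))
    if global_max == 0:
        return [0.0 for _ in before]  # every reading is zero
    raw = []
    for a_i, a_next in zip(before, after):
        rel = (a_next - a_i) / (a_next + a_i + 1)
        # Weighting from the post: between 0 and 2, and never negative
        # because no reading exceeds global_max.
        weight = ((2 * global_max - a_next - a_i) / global_max) ** power
        raw.append(abs(weight * rel))
    peak = max(raw)
    if peak == 0:
        return [0.0 for _ in raw]  # no reading changed
    # Assumption: rescale by the largest score instead of a fixed constant.
    return [100.0 * r / peak for r in raw]


before = [1000, 1, 2, 170_000]
after = [1100, 50, 4, 180_000]
print([round(s, 2) for s in weighted_scores(before, after)])
```

Note that with this weighting, pairs whose values are close to the global maximum are scaled toward zero, so it is worth checking on real data whether that matches what "interesting" should mean before settling on a power.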





    Copyright© 2008gigj.com All Rights Reserved