<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>INSIGHTS from Pistoia Alliance</title>
	<atom:link href="http://www.pistoiaalliance.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pistoiaalliance.com/blog</link>
	<description></description>
	<lastBuildDate>Fri, 04 May 2012 13:30:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1.2</generator>
		<item>
		<title>Application Security: Getting the Most from Your Assessment</title>
		<link>http://www.pistoiaalliance.com/blog/2012/05/application-security-getting-the-most-from-your-assessment/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/05/application-security-getting-the-most-from-your-assessment/#comments</comments>
		<pubDate>Fri, 04 May 2012 13:30:22 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[The life science cloud]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[ethical hacking]]></category>
		<category><![CDATA[proofs of concept]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[sequence services]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=310</guid>
		<description><![CDATA[Once again we hear from Michael Klepper, who headed up the security assessment for the Sequence Services Phase 2 proofs of concept. In his last entry, Michael discussed the nature of modern security threats. In this entry, he discusses AT&#38;T’s &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/05/application-security-getting-the-most-from-your-assessment/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>Once again we hear from Michael Klepper, who headed up the security assessment for the Sequence Services Phase 2 proofs of concept. In his last entry, Michael discussed the <a href="http://www.pistoiaalliance.org/blog/2012/04/application-security-how-modern-hackers-hack/" target="_blank">nature of modern security threats</a>. In this entry, he discusses AT&amp;T’s general approach to assessing application security and how organizations can ensure they get the most value out of security assessments and ethical hacking.</em></p>
<div id="attachment_306" class="wp-caption alignleft" style="width: 90px"><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/mike-klepper-thumbnail.jpg"><img class="size-full wp-image-306" title="mike-klepper thumbnail" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/mike-klepper-thumbnail.jpg" alt="Mike Klepper" width="80" height="80" /></a><p class="wp-caption-text">AT&amp;T&#39;s Mike Klepper</p></div>
<p>Our standard approach to security testing covers two sets of common vulnerabilities compiled by two industry-leading organizations: the <a href="https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project" target="_blank">Open Web Application Security Project</a> (OWASP) and the <a href="http://www.sans.org/top25-software-errors/" target="_blank">SANS top 25</a>. While these two lists overlap considerably, they take different views of security threats: SANS focuses more on compiled applications, while OWASP centers exclusively on web applications. We usually start by giving the systems we’re evaluating a solid initial once-over against these criteria and then provide recommendations for further testing.</p>
<p>Strikingly, the vast majority of applications we assess are found to contain significant security flaws. In particular, applications that have not been developed with strong security controls and processes as part of the development life cycle tend to have a high number of issues. In such cases, once we have found a significant number of issues, the focus of the engagement tends to shift away from finding additional bugs and toward developer awareness and education, so that developers can fix the issues.</p>
<p>Not only are secure applications rare, but organizations tend to make the same mistakes over and over. That’s because developers do what they do the way they know how to do it—and they’ll keep doing it that way until they are told or taught to do something differently. Additionally, it’s quite common for developers and their organizations to view our assessments as “bug fix” suggestions. They fix what we found, but don’t address the underlying processes and assumptions that led to those flaws existing in the first place. Consequently, the next time we get an application to evaluate from that organization, we’ll find many of the same flaws again.  Organizations need to take a strategic and programmatic approach to the application security problem in order to be successful.</p>
<p>Organizations get more from a security assessment if they can use the findings to provide feedback to and drive change in development. Rather than viewing our report as a list of tactical things to fix, organizations should consider the strategic impact of our suggestions. Application security is constantly evolving, with bad guys constantly finding new and interesting ways to hack in. So if you were vulnerable to issues A, B, and C the last time, you want to shore up your development so that your next application is secure against those—putting you in a better position to respond to issues D, E, and F.</p>
<p>Additionally, security assessments can provide specific feedback on development processes that can help organizations be more efficient. Certain findings may reveal process or educational gaps to address in your development organization. For instance, some vulnerabilities can indicate change control problems, such as issues caused by missing patches on servers or by the way network routers are configured. Other vulnerabilities might indicate a problem in code development, prompting an organization to talk to developers about their processes or redirect resources based on a development team’s expertise or track record.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/05/application-security-getting-the-most-from-your-assessment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Application Security: How Modern Hackers Hack</title>
		<link>http://www.pistoiaalliance.com/blog/2012/04/application-security-how-modern-hackers-hack/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/04/application-security-how-modern-hackers-hack/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 16:22:59 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[The life science cloud]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[ethical hacking]]></category>
		<category><![CDATA[proofs of concept]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[sequence services]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=305</guid>
		<description><![CDATA[Security testing has been an important step in both Phase 1 and Phase 2 of the Sequence Services project. Last year, AT&#38;T Consulting application security services evaluated the four Phase 1 proofs of concept, and we called on them again &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/04/application-security-how-modern-hackers-hack/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>Security testing has been an important step in both Phase 1 and Phase 2 of the <a href="http://www.pistoiaalliance.org/workinggroups/sequence-services.html" target="_blank">Sequence Services</a> project. Last year, AT&amp;T Consulting application security services evaluated the four Phase 1 proofs of concept, and we called on them again this year to evaluate the three systems developed for next-generation sequencing by HP, Constellation/Genestack, and Eagle Genomics/Cycle Computing. The results were presented at the <a href="http://www.pistoiaalliance.org/Past-Event-Highlights/24-april-2012-pistoia-alliance-annual-conference.html" target="_blank">Pistoia Alliance Conference</a> on April 24, but I thought it would be interesting to hear from Michael Klepper, who is leading the security testing, about the value of security assessment and AT&amp;T’s specific approach. In this first of two blog entries, Michael explains the nature of current security threats—they aren’t what you might expect!</em></p>
<p><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/mike-klepper-thumbnail.jpg"><img class="alignleft size-full wp-image-306" title="mike-klepper thumbnail" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/mike-klepper-thumbnail.jpg" alt="Mike Klepper" width="80" height="80" /></a>When organizations think about the security of their data and applications, they often consider things like firewalls and antivirus filters and intrusion detection. But for applications like the ones developed for Sequence Services Phase 2 to have value, people have to be able to reach them, which means they have to get past those protections. Applications, though, are like Mr. Spock (<a href="http://www.youtube.com/watch?v=wlMegqgGORY" target="_self">well, most of the time</a>). They operate on pure logic, doing what they are told with no feelings, no second thoughts, and limited ability to undo what they’ve been asked to do. And that’s what the bad guys are working to exploit today. They use flaws in application design and development and abuse the trust organizations have to put in their users to manipulate applications in order to expose data that organizations assume is protected.</p>
<p>Ten years ago, hackers got satisfaction by writing worms or viruses to take advantage of an unpatched security flaw. They’d take down a host or underlying web server, which would take down a website, and then they’d launch other attacks from there. This type of attack isn’t as prevalent today. The penalties for hacking are so severe that for a hacker to hack, the reward has to balance the risk. For a modern hack to pay off, it has to be monetized.</p>
<p>Hackers today capitalize on the inherent value data has to organizations. They’ll aim to fraudulently obtain IP to sell to a competitor. They’ll leverage an attack to blackmail an organization. Or they’ll attack an organization’s customers or users directly using the website, gaining personal information that they can sell elsewhere or use to commit fraud.</p>
<p>They get pretty creative. For instance, hackers will attack a call center by setting up a denial of service on the phone lines. When customers phone the center, they get a busy signal. Then the hacker will call and demand a certain amount of cash to free up the phone lines.</p>
<p>From these examples you can see that where once we mainly heard about credit card companies getting hacked, today hackers have moved from credit to debit to ACH transactions to medical and drug fraud. Just by collecting personal and business information from users, maybe by throwing up a screen on a website requesting certain data, hackers can set up fake clinics or labs and file false insurance claims. And it works because these industries aren’t as prepared. Security isn’t as front of mind there, which means more options (and easier money) for a hacker.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/04/application-security-how-modern-hackers-hack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sequence Squeeze Entrants: Finite-context modeling</title>
		<link>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-finite-context-modeling/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-finite-context-modeling/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 02:39:40 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[compression algorithms]]></category>
		<category><![CDATA[next-generation sequencing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sequence squeeze]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=299</guid>
		<description><![CDATA[While many of the Sequence Squeeze entries came from individuals, one contribution came from an institution: the department of electronics, telecommunications, and informatics and the Institute of Electronics and Telematics Engineering (IEETA) at the University of Aveiro, Portugal. The IEETA &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-finite-context-modeling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>While many of the Sequence Squeeze entries came from individuals, one contribution came from an institution:  the department of electronics, telecommunications, and informatics and the Institute of Electronics and Telematics Engineering (IEETA) at the University of Aveiro, Portugal. The IEETA team comprised Armando Pinho, associate professor and director of the IEETA, Diogo Pratas (doctoral student), João Rodrigues (assistant professor), and Paulo Ferreira (full professor). Their code included parts already used in other image and DNA sequence coding projects (developed by Armando), parts that all four helped develop, and an arithmetic coder module written by John Carpinelli, Wayne Salamonsen, Lang Stuiver and Radford Neal in 1995.</em></p>
<p><em>And just a reminder: We&#8217;ll be announcing the winner of the competition next week at the Pistoia Alliance Conference in Boston! Thanks to all the entrants who submitted blog posts on their work.<br />
</em></p>
<div id="attachment_300" class="wp-caption alignleft" style="width: 210px"><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/IEETA_small.jpg"><img class="size-full wp-image-300" title="IEETA_small" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/IEETA_small.jpg" alt="" width="200" height="141" /></a><p class="wp-caption-text">The IEETA team (from left) Diogo, João, Armando, and Paulo. They haven&#39;t yet tried running their algorithm on the Mac in the corner.</p></div>
<p>Last October, a colleague brought the Pistoia Alliance Competition to my attention. While I decided right away that we should participate and discussed the matter with colleagues interested in computational biology, we only started working on code in January. Then, when we finally submitted our first entry—a very simple encoder—on February 23, 2012, it failed. And so did the second, third, and fourth tries! We were starting to despair, but at last, our fifth submission was accepted: 0.2209 compression ratio, with a very long compression time.</p>
<p>Right from the beginning, our strategy was to use finite-context modeling and arithmetic coding. We had experience with these models and had obtained very good results with them, and we didn’t have time to pursue other solutions. Although we were confident that our approach could compress DNA bases well, we expected it to have trouble handling the other two components, i.e., headers and quality scores.</p>
<p>In our <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0021588" target="_blank">previous work on DNA sequence compression</a> we had combined several finite-context models of different depths, due to the highly non-stationary nature of sequence data. We used several models to competitively encode a given block of data, typically one hundred bases. The idea was to try the models on each block and pick the one that gave the lowest bitrate. To avoid having to transmit the ID of the model to the decoder, our more recent work employed a backwards adaptive strategy: instead of choosing the best model and having to send additional model information, we blend the probability estimates of all models. To properly weigh each model according to its recent performance, we have been using an exponentially decaying forgetting function.</p>
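<p>The blending idea described above can be illustrated with a minimal sketch. It is not the IEETA team's code: the class and function names, the model orders, the add-one smoothing, and the forgetting factor <code>gamma</code> are all assumptions chosen for illustration. Each model's weight is raised to the power <code>gamma</code> (so older performance decays exponentially) and multiplied by the probability that model assigned to the symbol that actually occurred. The function returns the ideal code length in bits that an arithmetic coder driven by the blended estimate would approach.</p>

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain DNA bases only


class ContextModel:
    """Order-k finite-context model with add-one smoothed symbol counts."""

    def __init__(self, order):
        self.order = order
        # Every context starts with a count of 1 per symbol (add-one smoothing).
        self.counts = defaultdict(lambda: [1] * len(ALPHABET))

    def _context(self, history):
        # The last `order` symbols seen so far (empty string for order 0).
        return history[-self.order:] if self.order else ""

    def probs(self, history):
        row = self.counts[self._context(history)]
        total = sum(row)
        return [c / total for c in row]

    def update(self, history, symbol):
        self.counts[self._context(history)][ALPHABET.index(symbol)] += 1


def code_length_bits(sequence, orders=(1, 2, 4), gamma=0.9):
    """Ideal code length of `sequence` under a backward-adaptive mixture."""
    models = [ContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        history = sequence[:i]
        s = ALPHABET.index(symbol)
        per_model = [m.probs(history) for m in models]
        # Blend the estimates; the decoder can repeat this without side info.
        mixed = sum(w * p[s] for w, p in zip(weights, per_model))
        total_bits += -math.log2(mixed)
        # Exponentially decaying memory of each model's recent performance.
        raw = [(w ** gamma) * p[s] for w, p in zip(weights, per_model)]
        norm = sum(raw)
        weights = [r / norm for r in raw]
        for m in models:
            m.update(history, symbol)
    return total_bits
```

<p>Because the weights depend only on symbols already coded, a decoder can recompute them in lockstep, which is why no model ID needs to be transmitted.</p>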
<p>The first version of our FASTQ encoder used four finite-context models combined with the aforementioned blending procedure and applied these to the file as a whole, i.e., without separating the different sources of data. The second version, with source separation, was submitted ten days after the first one, but it did not finish executing, reportedly due to excessive use of memory. This memory issue has been puzzling us ever since. According to our calculations, we should have been using about one fourth of the memory reported by the automated judging system. We had a number of hypotheses about what might be going on, including the existence of symbols other than &#8220;ACGTN&#8221;. By reducing the size of the deepest context for the bases component to 14 (it had been 15), we finally succeeded and achieved a compression ratio of 0.1730. The deadline was just five days away.</p>
<p>To compress the headers and the quality scores we also used mixtures of finite-context models. To exploit the strong correlation between the header of a given read and that of the next one, our context configuration gathered information not only from the header of the current read but also from the previous one. Also, because of the position-dependent statistics of both the headers and quality scores, we implemented what we call multi-state finite-context models, where the state is correlated with the position along the read. This allowed us to reduce the compression ratio from an initial 0.1813 (entry number 65, still without a working decoder) to 0.1730.</p>
<p>However, we knew that better compression could be attained using a context of size 15 for modeling the bases. Because the encoder ran on our laptops at depth 15, we suspected that the testing set contained SOLiD files (with the corresponding size-nine alphabet, i.e., &#8220;ACGT0123.&#8221;). Therefore, we added support for SOLiD files too, but the failures persisted. We obtained FASTQ files from several different sources and used them to test our encoder. Much additional work enabled us to process files with variable-size reads and files where the Ns cannot be recovered from the quality scores. Finally, after some more adjustments and disabling the use of the hash tables, we managed to break the 0.17 barrier, just a few hours before the deadline: 0.1695 compression ratio.</p>
<p>The four-fold discrepancy in memory usage reported by the automated judging process was annoying (because it led us to waste time trying to solve non-existent problems) but also fruitful (because it made our encoder much more robust). Our encoder can now handle most, if not all, FASTQ files. We did not optimize the code for speed. It will never be the fastest solution, but it can be made faster than it is now. As an experimental codec, there are lots of parameters to play with. Overall, this competition was rewarding and a lot of fun for us!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-finite-context-modeling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sequence Squeeze Entrants: fqzcomp and sam_comp</title>
		<link>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-fqzcomp-and-sam_comp/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-fqzcomp-and-sam_comp/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 14:47:33 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[compression algorithms]]></category>
		<category><![CDATA[next-generation sequencing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sequence squeeze]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=295</guid>
		<description><![CDATA[James Bonfield of the Wellcome Trust Sanger Institute has had an interest in data compression since his teenage years working on a “trusty BBC Micro.” More recently, he developed the ZTR file format for compressed ABI capillary traces and the &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-fqzcomp-and-sam_comp/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>James Bonfield of the Wellcome Trust Sanger Institute has had an interest in data compression since his teenage years working on a “trusty BBC Micro.” More recently, he developed the ZTR file format for compressed ABI capillary traces and the SRF format for compressed Illumina traces. His Sequence Squeeze entry built on some 2010 work for Sanger.</em></p>
<div id="attachment_296" class="wp-caption alignleft" style="width: 90px"><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/jkb_small.jpg"><img class="size-full wp-image-296" title="jkb_small" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/jkb_small.jpg" alt="" width="80" height="110" /></a><p class="wp-caption-text">James Bonfield</p></div>
<p>My first entry in the Sequence Squeeze Competition was based on some compression code that I released on the Sanger ftp site. At the time, it received limited interest. So it was gratifying to see this competition announced—it gave me a chance to put that code to use and, in fact, that code had one of the highest decompression speeds among the entries, though it’s weak on compression ratio.</p>
<p>A common thread across all my entries (and indeed other people’s) is to break the data down into names, sequences, and qualities and apply specific algorithms to each. This thinking moved me beyond using zlib for compression to finally utilizing statistical modeling techniques coupled with a fast range coder from Eugene Shelwien&#8217;s public domain coders6c2.rar (a simple but effective demonstration of modeling and coding, although in the end I replaced all except the 90 lines of range coder itself).</p>
<p>From the start, though, I felt that FASTQ wasn&#8217;t the ideal starting point.  Optimal compression of sequence fragments involves understanding how they fit together, which requires producing a sequence assembly or describing differences between one sequence and a known reference. With enough time I could implement these tools, but they already exist and have been written by cleverer people than myself. Why repeat this work?</p>
<p>Also, if we have made the effort to assemble or align data, we should care about preserving this information. Therefore we should offer to provide the alignments in addition to the raw FASTQ output. Enter the SAM/BAM formats: feature-rich formats for storing sequence alignments with wide community support.</p>
<p>This revelation led to a split in my entries. I used fqzcomp as a raw FASTQ compressor and sam_comp as a SAM/BAM compressor.  For the latter I chose bowtie2 as a sequence aligner, but my code should work equally well with other programs or even de novo sequence assemblers. Sam_comp stores names and qualities in much the same way fqzcomp does—it removes common prefixes in names and employs statistical modelling for the rest of the information, with previous names and quality values acting as a context to improve predictions.</p>
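<p>The prefix removal for read names can be pictured with a small sketch. This is not the fqzcomp or sam_comp code: it assumes, for illustration only, that each name is compared with the name immediately before it, and the function names are hypothetical. Each name is stored as the number of leading characters it shares with the previous name plus the remaining suffix, and the suffix is what a statistical model would then go on to code.</p>

```python
import os


def delta_encode_names(names):
    """Replace each name by (shared prefix length, remaining suffix)."""
    previous = ""
    records = []
    for name in names:
        # os.path.commonprefix compares strings character by character.
        shared = len(os.path.commonprefix([previous, name]))
        records.append((shared, name[shared:]))
        previous = name
    return records


def delta_decode_names(records):
    """Rebuild the original names from (shared length, suffix) records."""
    previous = ""
    names = []
    for shared, suffix in records:
        name = previous[:shared] + suffix
        names.append(name)
        previous = name
    return names
```

<p>Names emitted by a sequencing instrument usually differ only in a few trailing fields, so most of each name collapses into a single small integer.</p>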
<p>Where the two tools differ substantially is with the DNA sequence itself.  Fqzcomp used simple statistical modelling, allowing for reverse complement matches. Sam_comp instead stored the chromosome, position and any differences between the called bases and their corresponding reference bases.  As expected, reference-based compression is a clear winner for file size, but whether it is the &#8220;best&#8221; or &#8220;correct&#8221; solution largely depends on the scenario.</p>
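<p>A minimal sketch of the reference-based idea follows. It is a simplification rather than sam_comp itself: it assumes a single reference string, reads that lie entirely within it, and substitutions only (no insertions, deletions, or reverse-complement handling). The function names are hypothetical. A read that matches the reference closely reduces to a position, a length, and a short list of differing bases.</p>

```python
def encode_read(read, reference, position):
    """Describe `read` as a position plus its mismatches against `reference`."""
    window = reference[position:position + len(read)]
    diffs = [
        (offset, base)
        for offset, (base, ref_base) in enumerate(zip(read, window))
        if base != ref_base
    ]
    return position, len(read), diffs


def decode_read(record, reference):
    """Rebuild the read by copying the reference and applying the mismatches."""
    position, length, diffs = record
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)
```

<p>The decoder needs the same reference, which is one reason the "best" approach depends on the scenario.</p>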
<p>Sam_comp originally sorted data into genomic order, building one model per position with the reference acting as a prior. This meant that the reference was optional and, on deep data, largely irrelevant to the final compression ratios. For later submissions I reverted to storing data in the original order output by the instrument, as it reduced the data size, mainly because sequence names get larger and more complex when ordered by genomic position. Personally I feel the original position-sorted sam_comp offered more flexibility, so I would consider it the better tool, but your mileage may vary.</p>
<p>I can confidently say my entries benefited from the open nature of the contest. Had the competition been closed, with a score table only being visible after the submission deadline, I might have sat back and waited for the results. Instead, seeing an entry beaten spurred me to improve my submissions, and at the end we had quite a lot of one-upmanship happening. It was rather exciting!</p>
<p>Coupled with this, the <a href="http://encode.ru/threads/1409-Compression-Competition-15-000-USD" target="_blank">encode.ru forums</a> had numerous surprisingly open discussions on ideas, particularly from Matt Mahoney, who went as far as to post code snippets. Given the number of views of that thread and downloads of various SourceForge projects, it’s clear the open nature of the competition raised the quality of all the entries.</p>
<p>Taking a retrospective look at the leaderboard, I think <a href="http://www.pistoiaalliance.org/blog/2012/03/sequence-squeeze-entrants-quip/" target="_blank">Daniel</a> had better compression of quality values while on deeply sequenced data <a href="http://www.pistoiaalliance.org/blog/2012/04/sequence-squeeze-entrants-zpaq/" target="_blank">Matt&#8217;s</a> DNA encoding is often ahead.  I suspect the true best program out there will be a combination from multiple authors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-fqzcomp-and-sam_comp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sequence Squeeze Entrants: ZPAQ</title>
		<link>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-zpaq/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-zpaq/#comments</comments>
		<pubDate>Tue, 03 Apr 2012 20:01:38 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[compression algorithms]]></category>
		<category><![CDATA[next-generation sequencing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sequence squeeze]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=287</guid>
		<description><![CDATA[One of our Sequence Squeeze entrants is a veteran of compression competitions. Matt Mahoney’s context-mixing compression algorithm implemented in the PAQ series of open source compressors moved to the top of several benchmarks in 2004 and won the Calgary compression &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-zpaq/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>One of our Sequence Squeeze entrants is a veteran of compression competitions. Matt Mahoney’s context-mixing compression algorithm, implemented in the PAQ series of open source compressors, moved to the top of several benchmarks in 2004 and won the </em><a href="http://mailcom.com/challenge/" target="_blank"><em>Calgary compression challenge</em></a><em>. He has also established his own </em><a href="http://mattmahoney.net/dc/text.html" target="_blank"><em>competition for large text compression</em></a><em>. He has focused on data compression as a means of studying AI and believes that the key to compression is understanding what the data means, something at which computers should be able to excel.</em></p>
<div id="attachment_288" class="wp-caption alignleft" style="width: 90px"><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/matt_small.jpg"><img class="size-full wp-image-288" title="matt_small" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/04/matt_small.jpg" alt="" width="80" height="95" /></a><p class="wp-caption-text">Matt Mahoney</p></div>
<p>I introduced ZPAQ in 2009 (writes Mahoney) as a way to solve the problem of incompatibility between PAQ versions, so I was interested in showing that ZPAQ could be used to solve novel problems like the one presented in the Sequence Squeeze Competition. I started with the Wikipedia definition of the FASTQ format and opted to focus on the Sanger variant (Phred+33). I also made some assumptions that were specific to the test files. Among these were that N had a quality score of 0; A,C,G,T were always 2.38; and there were no other bases. Such assumptions aren&#8217;t guaranteed to hold but are unfortunately necessary to get the best compression. I did look at some FASTQ files from other sources, such as the ABI SOLiD (color space) files from the 1000 Genomes Project. Unfortunately my program won&#8217;t support these as-is, but I don&#8217;t think any of the other top submissions will either.</p>
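<p>For readers unfamiliar with the Sanger (Phred+33) convention mentioned above, here is a minimal sketch of how a FASTQ quality line maps to scores. The function names are hypothetical. Each quality character encodes a Phred score as its ASCII code minus 33, and a score Q corresponds to an estimated error probability of 10 to the power of -Q/10.</p>

```python
def phred33_scores(quality_line):
    """Convert a Sanger-style FASTQ quality string to integer Phred scores."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(score):
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)
```

<p>Under this encoding the character "!" is a score of 0 and "I" is a score of 40.</p>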
<p>I first heard about this competition from <a href="http://encode.ru/threads/1409-Compression-Competition-15-000-USD" target="_blank">http://encode.ru/threads/1409-Compression-Competition-15-000-USD</a>. Honestly, there are probably only about 100 people in the world who know how to develop and implement novel data compression algorithms, and probably half of them are on this message board. There was quite a bit of discussion, including posting of approaches, algorithms, code, and experimental results by competitors, especially me and James Bonfield. You might think that such openness would put you at a disadvantage by revealing your secrets to competitors, but I found quite the opposite. I know more about compression and less about DNA sequencing than James does, so we each benefited from the other&#8217;s knowledge.</p>
<p>Interestingly, James and I both wrote pre-processors to split the data into headers, sequences, quality, and alignment to an external reference genome (if any). Mine differs in that I wrote my own alignment algorithm using a simple hash table of 32-base segments. He used bowtie, so he got better (but slower) compression, probably by modeling insertions and deletions and by using quality data, which mine doesn&#8217;t do. Also, I used my own entropy coding based on context mixing models implemented in ZPAQ, which I had developed earlier. I believe he used the public domain PPMd. I did some testing of competing programs but didn&#8217;t really look at the source code to see how it worked.</p>
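<p>The hash-table alignment idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the submitted program: it indexes every 32-base segment of the reference in a Python dictionary, keeps only the first position of each segment, and looks for an exact match of any 32-base window of the read. The function names and the exact lookup strategy are hypothetical.</p>

```python
def build_index(reference, k=32):
    """Map each k-base segment of the reference to its first position."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], pos)
    return index


def locate(read, index, k=32):
    """Return the implied reference position of the read start, or None."""
    for offset in range(len(read) - k + 1):
        hit = index.get(read[offset:offset + k])
        # Skip hits that would place the read start before the reference.
        if hit is not None and hit - offset >= 0:
            return hit - offset
    return None
```

<p>Trying several windows lets a read with a mismatch near its start still be placed, as long as some 32-base stretch matches exactly. A scheme like this does not model insertions or deletions.</p>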
<p>ZPAQ allows you to specify the arrangement of context models and the code to compute their contexts. It is very general, but difficult to use. I did a lot of experiments, first testing with general purpose compressors, then analyzing the statistical properties of the data, then experimenting with different model variations of my own design. It is a process I have used to develop other custom compressors. I had at first experimented with models that did not pre-process, but added that later mainly as a speed optimization. My original goal was to output a single ZPAQ archive that could be read by any ZPAQ-compliant decoder, but this did not work out because ZPAQ doesn&#8217;t support post-processing from multiple blocks. (I may have to add that feature later.)</p>
<p>I wrote a detailed description of my algorithm <a href="https://docs.google.com/document/pub?id=1f-8C-ZfCUTEsO-EqvlcTXQ0M5aYM61Aet902dA8QZZk">here</a>, for those who want to read more.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/04/sequence-squeeze-entrants-zpaq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sequence Squeeze Entrants: Quip</title>
		<link>http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-entrants-quip/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-entrants-quip/#comments</comments>
		<pubDate>Fri, 30 Mar 2012 18:09:15 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[compression algorithms]]></category>
		<category><![CDATA[next-generation sequencing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sequence squeeze]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=281</guid>
		<description><![CDATA[As a preview to the announcement of the winner of the Sequence Squeeze Competition, I’ve asked some of the participants to share the rationale and approach of their algorithms. Our first blog comes from Daniel Jones, a graduate student in &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-entrants-quip/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>As a preview to the announcement of the winner of the Sequence Squeeze Competition, I’ve asked some of the participants to share the rationale and approach of their algorithms. Our first blog comes from Daniel Jones, a graduate student in computer science and engineering at the University of Washington. Jones’s work centers on developing methods to leverage RNA sequencing data to study host response to viral infection. He says a colleague turned him onto the competition because it sounded “right up his alley,” in that it involved both software development and next-generation sequencing. </em></p>
<div id="attachment_282" class="wp-caption alignleft" style="width: 90px"><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/03/dJones_small.jpg"><img class="size-full wp-image-282 " title="dJones_small" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/03/dJones_small.jpg" alt="" width="80" height="107" /></a><p class="wp-caption-text">Daniel Jones</p></div>
<p>To truly wring the most compression possible from FASTQ data (writes Jones), we need an approach that takes advantage of its unique structure. From a data compression perspective, the most interesting property of sequencing data is that it consists of random fragments of a larger sequence: the genome or transcriptome, for example. There are two obvious ways we could exploit this. If we already know what the larger sequence is, we can store the reads as positions within it, rather than storing each read. If we don&#8217;t know the larger sequence, we can try to assemble it from scratch.</p>
<p>My entry to the Sequence Squeeze competition, an adorable little C program called Quip, takes the latter approach. In cases where we know the reference sequence, we can often achieve better compression. De novo assembly brings with it some significant advantages, though. It is simple to use, completely self-contained, and applicable to any sequencing data. We do not need to know the genome sequence, and we do not need to perform alignment sensitive to splicing. These are especially useful properties for metagenomics in which a wide variety of species might be sampled in one lane, or in RNA-Seq, where spliced transcripts are often not well annotated and can be difficult to discover.</p>
<p>I began to think about assembly shortly after I heard about the competition, assuming that no one else would be dumb enough to try it (incidentally, a correct prediction). The first step of most assembly algorithms is to count the number of occurrences of short subsequences, or k-mers, in the sequenced reads. The primary reason no one assembles a genome on a laptop during a coffee-break is that counting billions of k-mers requires a lot of time and memory. At first glance, that makes it an impractical option for compression. Switching from gzip to Quip is enough of a hassle without having to upgrade to a supercomputer to do so.</p>
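<p><em>For readers unfamiliar with k-mer counting, a minimal Python sketch of the exact, hash-table-based version follows. The function name <code>count_kmers</code> is hypothetical and this is not taken from any assembler; it only shows why memory grows with the number of distinct k-mers, since every one of them becomes a key.</em></p>
<pre>
```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (exact, hash-table based)."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read, one base at a time.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts
```
</pre>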
<p>Even if only a subset of the reads are assembled, I knew I would need to find a way to count k-mers more efficiently than the hash-table-based approaches used by existing assemblers. After a bit of research (i.e., reading all the data structure articles on Wikipedia), I hit upon just the thing: the counting Bloom filter.</p>
<p>Though common in some applications&#8211;for example, the Bloom filter variant used in Quip was invented by researchers at Cisco to make routers route faster&#8211;it has seen almost no use in computational biology. This is partly due to its obscurity, and partly because, put simply, the Bloom filter is just weird. It works very much like a hash table, but with the bizarre property that occasionally, at random, it mixes up its k-mers and reports the wrong answer. But, if we put up with its eccentricity, we get a data structure that takes up vastly less space than a typical hash table and makes assembling reads a practical approach to compression. You don&#8217;t need to buy a supercomputer. In fact, I did nearly all of the development and testing of Quip on my humble little MacBook!</p>
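<p><em>The trade-off described above can be sketched in a few lines of Python. This is a textbook counting Bloom filter, not the specific variant mentioned in the post and not Quip&#8217;s implementation; the class name, the cell count, and the way cell indices are derived from one BLAKE2 digest are all assumptions chosen for brevity (the scheme as written supports at most eight hashes).</em></p>
<pre>
```python
import hashlib


class CountingBloomFilter:
    """Approximate counter: may overcount on collisions; cells saturate at 255."""

    def __init__(self, num_cells=1024, num_hashes=3):
        self.cells = bytearray(num_cells)  # one 8-bit counter per cell
        self.num_hashes = num_hashes

    def _slots(self, kmer):
        # Derive several cell indices from one digest (illustrative scheme).
        digest = hashlib.blake2b(kmer.encode(), digest_size=8 * self.num_hashes).digest()
        for i in range(self.num_hashes):
            chunk = digest[8 * i:8 * (i + 1)]
            yield int.from_bytes(chunk, "little") % len(self.cells)

    def add(self, kmer):
        for slot in self._slots(kmer):
            self.cells[slot] = min(255, self.cells[slot] + 1)

    def count(self, kmer):
        # The smallest of the k-mer's counters is the tightest estimate.
        return min(self.cells[slot] for slot in self._slots(kmer))
```
</pre>
<p><em>Memory is fixed by <code>num_cells</code> no matter how many distinct k-mers are added, and the k-mers themselves are never stored. The price is that two k-mers sharing cells can inflate each other&#8217;s counts, which is the occasional wrong answer the post refers to.</em></p>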
<p>Assembling a subset of the reads and storing many reads as alignments saves significant space, especially in RNA-Seq where there is less to assemble. The rest of the compression gains came mostly by exploiting rather simple statistical modeling of the data with arithmetic coding, an extremely powerful compression method that has until recently been encumbered by a deluge of frivolous patents. Building individual, specialized models for read IDs, quality scores, and nucleotide sequences gives huge gains over general compression programs that just lump everything together. The net result is a program which is leaps ahead of gzip in terms of compression, but does not sacrifice speed.</p>
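<p><em>The idea of modeling each kind of data separately can be illustrated with a short Python sketch that splits FASTQ records into three streams. It assumes simple four-line records, the function names are hypothetical, and <code>zlib</code> merely stands in for the specialized models and arithmetic coder described above; it is not how Quip is implemented.</em></p>
<pre>
```python
import zlib


def split_fastq(text):
    """Separate four-line FASTQ records into IDs, bases, and quality scores."""
    lines = text.strip().split("\n")
    ids = lines[0::4]     # '@' header lines
    seqs = lines[1::4]    # nucleotide lines
    quals = lines[3::4]   # quality lines (lines[2::4] are the '+' separators)
    return ids, seqs, quals


def compress_streams(text):
    """Compress each stream on its own; zlib stands in for a per-stream model."""
    return [zlib.compress("\n".join(stream).encode()) for stream in split_fastq(text)]
```
</pre>
<p><em>Each stream has its own statistics (a four-letter alphabet for bases, slowly varying values for qualities, near-identical text for IDs), so a coder tuned to one stream does not have to spend bits adapting to the others.</em></p>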
<p>With an exponential increase in the amount of sequencing data being generated, even moderate improvements in compression will translate into a significant reduction in the cost of storage and management over time. Choosing to adopt new compression algorithms like Quip may not be easy, but it’s well worth the reward.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-entrants-quip/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Do Biomarker Exchange Standards Exist?</title>
		<link>http://www.pistoiaalliance.com/blog/2012/03/273/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/03/273/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 20:47:06 +0000</pubDate>
		<dc:creator>John Wise</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Standards & deliverables]]></category>
		<category><![CDATA[assays]]></category>
		<category><![CDATA[biomarker exchange]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=273</guid>
		<description><![CDATA[Last week I was part of a very interesting email exchange on the topic of biomarker exchange standards. The correspondents were Sandor Szalma of Johnson &#38; Johnson, who is heading up the embryonic working group on this subject, and James &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/03/273/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/01/johnwise.jpg"><img class="alignleft size-full wp-image-234" title="johnwise" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/01/johnwise.jpg" alt="John Wise head shot" width="80" height="80" /></a>Last week I was part of a very interesting email exchange on the topic of biomarker exchange standards. The correspondents were Sandor Szalma of Johnson &amp; Johnson, who is heading up the <a href="http://www.pistoiaalliance.org/blog/2012/01/kicking-off-new-working-groups/" target="_blank">embryonic working group on this subject</a>, and James McGurk, director of informatics at Daiichi Sankyo Pharma Development. Because for every person brave enough to ask a question there are often many others silently asking it, I thought I&#8217;d provide the exchange in full. We welcome further questions on this topic; please leave a comment.</p>
<p><strong>James McGurk</strong>: I think Sandor Szalma’s presentations about biomarker data exchange have been extremely good and stimulated much-needed discussion.  It is clear that this is a pressing need in both drug discovery and development.  Since I spent many years in discovery informatics before moving to development I am familiar with the different approaches these organizations have regarding the exchange and use of data. One thing that has puzzled me about this entire discussion is this:  It appears that we are, primarily, discussing exchange of experimental results: subject/patient/organism phenotype data, laboratory tests, genetic and genomics results, bio-images, etc. If that is the case I am wondering why we are focusing on data exchange standards. These types of results are routinely exchanged as part of development projects. Standards organizations such as CDISC and HL7 have deep experience developing and maintaining operational systems precisely for this purpose. And although the exchange standards are far from perfect, most or all of the key elements and infrastructure are already in place and in widespread use. It seems to me we run the risk of trying to solve problems that have already been solved.  I think it would behoove us to explore whether these existing systems might be fit-for-purpose.  If they are not an exact fit, it is possible that slight modifications will allow them to serve.</p>
<p>I recognize, based on my experiences in discovery informatics, that most discovery organizations have internally developed data management systems which can be very idiosyncratic. This means that data exchange with outside collaborators or vendors can be quite difficult. As biomarker work is increasingly outsourced even in discovery, this may become untenable. Organizations will have to move to adopt more generic data models. This has been the path in development for the past decade. Unless (as is certainly possible) I do not understand fully what others mean by “biomarkers” I think we might benefit from consideration of paths other than the creation <em>de novo</em> of another standard. I certainly can see the need to develop systems to integrate different data types into a “biomarker signature”, the creation of controlled vocabularies for metadata which might work as part of existing standards, or the development of other meta-standards. At this point, though, I think it’s important to clarify whether or not biomarkers are different enough from the types of data we routinely manage and exchange that they require their own set of standards.</p>
<p><strong>Sandor Szalma</strong>: To address several of your points above. Indeed, the various experimental results are routinely exchanged between organizations, but how this exchange occurs varies from organization to organization. Since pharma is now more focused on managing large numbers of parallel collaborations, managing all of these data without standards is a problem. Yes, certain organizations have taken on standards development, but data outside of clinical trials safety/efficacy and not directly involved in health care currently do not have an appropriate organization trying to standardize them. So this is an opportunity. Even if, as you state, exchange standards for some of these are in place, that has not been my experience within Janssen and it seems that many people share this experience with me in other pharmas, too. It is possible that the problem is solved, but that the solution is not widely known or adopted. Part of this workstream&#8217;s task is to explore the existing standard space and perhaps find a way to bootstrap this project by simple modification of existing standards if they indeed exist or are available. We do not intend to create a new standard where there is already an existing one or where a simple modification could do the job. It is not &#8220;standards for standards&#8217; sake&#8221; but a much more utilitarian approach&#8211;we have a business problem and we need it solved (yesterday).</p>
<p>As you state in your last sentence, a key first step would be to determine if biomarker data is different enough from other data we know how to manage and exchange. I&#8217;m not aware of the existence of a generic data exchange standard supporting biological assays&#8211;of which biomarkers are a special case. Am I wrong on this point? Do you know of such a standard?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/03/273/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sequence Squeeze Countdown</title>
		<link>http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-countdown/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-countdown/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 23:00:49 +0000</pubDate>
		<dc:creator>Simon Thornber</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Sequencing & omics]]></category>
		<category><![CDATA[compression algorithms]]></category>
		<category><![CDATA[next-generation sequencing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[sequence squeeze]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=267</guid>
		<description><![CDATA[If you haven&#8217;t been tracking the Sequence Squeeze Competition, you&#8217;ll want to watch the leaderboard over the next few days. Entries have been flying in since the weekend, and only today, the long-standing leading algorithm was pushed off the top &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-countdown/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/02/SeqSqueeze_smaller.jpg"><img class="alignleft size-full wp-image-246" title="SeqSqueeze_smaller" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/02/SeqSqueeze_smaller.jpg" alt="Sequence Squeeze Competition logo" width="250" height="77" /></a>If you haven&#8217;t been tracking the Sequence Squeeze Competition, you&#8217;ll want to watch the <a href="http://www.sequencesqueeze.org/" target="_blank">leaderboard</a> over the next few days. Entries have been flying in since the weekend, and only today, the long-standing leading algorithm was pushed off the top slot!</p>
<p>It&#8217;s just 48 hours until the competition closes on March 15. Remember that all entries are subject to adjudication by our judging panel: Yingrui Li of BGI, Guy Coates of the Wellcome Trust Sanger Institute, Tim Fennell of the Broad Institute, and Nick Lynch of the Pistoia Alliance. They&#8217;ll be evaluating not just how the algorithms perform, but the usability and novelty of the code, as described <a href="http://www.sequencesqueeze.org/terms/index.html" target="_blank">here</a>.</p>
<p>Everyone is welcome to attend the <a href="http://www.pistoiaalliance.org/2012-Events/24-april-2012-pistoia-alliance-annual-conference.html" target="_blank">Pistoia Alliance Conference</a> in Boston, where we&#8217;ll be announcing the competition&#8217;s winner. In the meantime, coders, keep on coding!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/03/sequence-squeeze-countdown/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another Embryonic Working Group: Standards for Exchanging Screening Data</title>
		<link>http://www.pistoiaalliance.com/blog/2012/03/262/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/03/262/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 20:58:52 +0000</pubDate>
		<dc:creator>John Wise</dc:creator>
				<category><![CDATA[Collaboration & community]]></category>
		<category><![CDATA[Standards & deliverables]]></category>
		<category><![CDATA[CROs]]></category>
		<category><![CDATA[information ecosystem]]></category>
		<category><![CDATA[screening data]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=262</guid>
		<description><![CDATA[There have been some exciting developments in the half-month since the Pistoia Alliance face-to-face meeting of the board and operational team. Just today, the fledgling working group on Biomarker Exchange Standards held its third meeting. And next Wednesday another working &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/03/262/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/01/johnwise.jpg"><img class="alignleft size-full wp-image-234" title="johnwise" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2012/01/johnwise.jpg" alt="John Wise head shot" width="80" height="80" /></a>There have been some exciting developments in the half-month since the Pistoia Alliance face-to-face meeting of the board and operational team. Just today, the fledgling working group on Biomarker Exchange Standards held its third meeting. And next Wednesday another working group spawned from the <a href="../../../../Previous-2011-Events/infoecoworkshop.html">Information Ecosystem Workshop</a> last October in Hannover will have its kickoff meeting. This group is aimed at developing standards for exchanging screening data, and we already have over 90 people <a href="http://screeningstandardstc1.eventbrite.com/">signed up for the teleconference</a>.</p>
<p>I asked David Kniaz, director of business architecture at Merck and the one spearheading this effort, to share a bit of information on the topic.</p>
<p><strong>What problem is driving this topic? </strong></p>
<p>To facilitate drug development and lower the cost and risk associated with launching new drugs, pharmaceutical companies are virtualizing and externalizing R&amp;D through various types of partnerships.  Screening externalization is a particular area of focus within drug discovery programs. Yet no established standards exist supporting consistency of data formats and naming conventions in this area. As a result, significant one-off effort is required to determine how data and physical samples can be exchanged—effort that slows discovery project cycle time, increases cost, and reduces efficiency and data quality.</p>
<p><strong>What is this group’s aim?</strong></p>
<p>The proposed project, which the Alliance will be devoting resources to accelerate, is targeted to improve discovery project efficiency through the development of standards to simplify and improve internal and external screening data exchange. The initial proposed focus areas include &#8220;routine&#8221; primary and secondary in vitro pharmacology assays typically outsourced by large pharmaceutical organizations to CROs. CRO data exchange is clearly important, but it’s not the only aspect to consider. As one enthusiastic member of this initial team put it, “I can’t even get my assay data management system to play in the sandbox with my ELN!”</p>
<p><strong>Who should be involved? </strong></p>
<p>I’m thrilled at the response to the teleconference so far. Clearly, this issue is resonating in the community. We’ll require a range of subject-matter expert resources to provide input into the scope of the project and develop the proposal and business case. We’re also looking for opportunities to leverage existing standards where possible and to partner with other organizations such as the Society for Lab Automation and Screening (SLAS).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/03/262/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting Things Done at the Pistoia Alliance F2F</title>
		<link>http://www.pistoiaalliance.com/blog/2012/02/getting-things-done-at-the-pistoia-alliance-f2f/</link>
		<comments>http://www.pistoiaalliance.com/blog/2012/02/getting-things-done-at-the-pistoia-alliance-f2f/#comments</comments>
		<pubDate>Fri, 24 Feb 2012 15:07:35 +0000</pubDate>
		<dc:creator>Nick Lynch</dc:creator>
				<category><![CDATA[Alliance news & milestones]]></category>
		<category><![CDATA[Collaboration & community]]></category>
		<category><![CDATA[bioitworld]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[meetings]]></category>
		<category><![CDATA[membership]]></category>

		<guid isPermaLink="false">http://www.pistoiaalliance.org/blog/?p=256</guid>
		<description><![CDATA[So what is the face-to-face meeting of the Pistoia Alliance Board and Operational Team that Michael referred to in his latest entries? It’s ironic—when we held the first such meeting in 2010, it garnered a meaty story in BioITWorld. We &#8230; <a href="http://www.pistoiaalliance.com/blog/2012/02/getting-things-done-at-the-pistoia-alliance-f2f/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2011/05/nicklynch-e1321556194472.jpg"><img class="alignleft size-full wp-image-118" title="nicklynch" src="http://www.pistoiaalliance.org/components/com_wordpress/wp/wp-content/uploads/2011/05/nicklynch-e1321556194472.jpg" alt="" width="112" height="100" /></a>So what is the face-to-face meeting of the Pistoia Alliance Board and Operational Team that Michael referred to in his <a href="../../../../blog/2012/02/nick-lynch-receives-first-pistoia-alliance-leadership-award/">latest </a><a href="../../../../blog/2012/02/dragons%e2%80%99-den-roars-at-royal-society-of-chemistry/">entries</a>? It’s ironic—when we held the first such meeting in 2010, it garnered a <a href="http://www.bio-itworld.com/issues/2010/jan/first-base.html">meaty story in <em>BioITWorld</em></a>. We were flattered, but a bit perplexed about why that meeting is what put us on BioITWorld’s radar. To us, as a virtual organization, it seemed completely natural and incredibly important that our board and operational team officers get together face-to-face. Yes, all of us know each other and frequently meet up at conferences and other industry events. But the face-to-face offers a unique time for all of us to get in the same room, plan and strategize around Pistoia’s future delivery, and come to consensus.</p>
<p>What’s probably bigger news (are you listening, <a href="https://twitter.com/#%21/search/users/%40bioiteditor">@BioITEditor</a>?) is that these meetings have continued to happen. Last year, the face-to-face resulted in the board establishing a formal mission statement and several project plans that we showcased at our <a href="../../../../Previous-2011-Events/12-april-2011-pistoia-alliance-conference.html">first Pistoia Conference and Members’ Meeting</a> last April and pushed through to the <a href="../../../../workinggroups/sequence-services.html">Sequence Services Phase 2</a> and <a href="http://www.sequencesqueeze.org/">Sequence Squeeze</a> workstreams that are completing now.</p>
<p>This year’s meeting had palpable energy and action. The ability to tack to the mission statement kept the discussion aligned and focused, and we are exploring a variety of new projects that have the potential to have an enormous impact on our two big aims: lowering barriers to innovation and enabling interoperable business processes. And everyone in the group agreed that they keep coming back because of the quality of the people at the table and what we can do together. The Alliance isn’t just one segment of the industry talking to itself—we have a mix of people offering different perspectives representing key components of the ecosystem. Most importantly, our sequence-oriented projects are proving that the Alliance is not another talking shop, and this action in turn gives us a real opportunity to effect change and bring other necessary core knowledge to the table for future projects.</p>
<p>I’m excited about what we have in the works. In particular, we’ve begun planning in earnest for the second annual <a href="../../../../2012-Events/24-april-2012-pistoia-alliance-annual-conference-and-members-meeting.html" target="_blank">Pistoia Alliance Conference</a>, which will be held in Boston on April 24 before the BioITWorld Conference and Expo. Check out our <a href="http://www.pistoiaalliance.org/Event-Detail-Pages/boston-2012-agenda.html" target="_blank">tentative agenda</a>. <a href="http://pistoia2012.eventbrite.com/">Registration is open</a> as well, so we hope to see a good turnout in Boston.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pistoiaalliance.com/blog/2012/02/getting-things-done-at-the-pistoia-alliance-f2f/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

