The Journey Begins with your microbiome

Thanks for joining me!

This is a companion site to the analysis site at: 

This blog is moving, please update your links to

Most of the content was originally posted on with the pages on the left being a restructuring of selected posts from over a thousand posts on that site.

Also a PodCast:

Recommended Site For Testing

If you have ME/CFS or another financially disastrous condition, there is always a nasty cost factor for testing. My usual recommendation is the cheapest high-quality provider whose results can be uploaded to my analysis site. Some sites provide a mountain of additional information, but the benefit from that extra information is almost nothing (and it adds $$$$ and complexity).

  • is shutting down. This had been my personal usual site because using a variety of techniques, the cost was $25/sample. Don’t order from there.
  • Thryve is what I am starting to use. Their reports may be processed here for independent suggestions. I would also recommend


Details on Bacteria Selection

People have asked why different suggestion choices give different results – sometimes contradictory ones! The suggestions are determined by which bacteria are selected to be altered. There is no magic way to select the bacteria. The site gives you a variety of choices/methods reflecting various requests expressed. This post attempts to explain these choices. Remember that a typical microbiome result may list 600 bacteria – picking a dozen bacteria at random will give different suggestions every time. Many of the bacteria are ‘noise’ with no health impact in most cases.

Quick Suggestions

This looks only at the bacteria in Dr. Jason Hawrelak’s criteria for a healthy gut. If you are outside of his ranges, the suggestions attempt to increase low values and decrease high values.

Number of Bacteria Considered: 15

If any other published author cares to provide their criteria and grant permission to use them, they can be added.

Medical Conditions

When you click on one of the items on the “Adjust Condition A Priori” link there is no microbiome to refer to so one is synthetically created. This is done by looking at the reported shifts and computing one.

From the Autism Profile

We apply some fuzzy logic here.

  • If just one report, we run with that value
  • If equal number of high and low we ignore.
  • If different number of high and low, we compute the difference, and deem the winner to be included

We then create a profile using the 12%ile value for low and the 87%ile value for high.
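The voting rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the 'high'/'low' strings and the taxon labels are made up here, and only the three rules and the 12/87 percentiles come from the text.

```python
from collections import Counter

# Percentiles taken from the text; everything else is illustrative.
LOW_PERCENTILE = 12
HIGH_PERCENTILE = 87


def resolve_shift(reports):
    """Return 'high', 'low', or None (ignored) for one bacterium.

    `reports` is a list of 'high'/'low' strings, one per published report.
    """
    counts = Counter(reports)
    highs, lows = counts["high"], counts["low"]
    if highs == lows:
        return None  # equal number of high and low reports: ignore
    # A single report, or an unequal split: the majority direction wins.
    return "high" if highs > lows else "low"


def synthetic_profile(reported_shifts):
    """Map each included bacterium to the percentile used in the synthetic microbiome."""
    profile = {}
    for taxon, reports in reported_shifts.items():
        direction = resolve_shift(reports)
        if direction == "high":
            profile[taxon] = HIGH_PERCENTILE
        elif direction == "low":
            profile[taxon] = LOW_PERCENTILE
    return profile


# Hypothetical reports for three placeholder taxa.
example = {
    "Taxon A": ["high"],                # one report: run with it
    "Taxon B": ["high", "low"],         # tie: ignored
    "Taxon C": ["low", "low", "high"],  # low wins 2 to 1
}
print(synthetic_profile(example))
```

Taxon B drops out because its reports cancel, which is why a condition with many contradictory studies can end up with a short list of bacteria.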

The resulting synthetic microbiome is then processed using the 50%ile as our reference for scaling. The data to be processed may look like this:

Example for Autism

Number of Bacteria Considered: depending on condition: 5- 300 bacteria, typically 30

Advance Suggestions

This is the workhorse which gives many options to both increase and decrease bacteria included. It takes all bacteria (regardless of possible medical significance) as a starting point and adjusts them.

Add in all those I am missing that are seen in % of other samples

No bacteria is seen in every sample. Some people have none of some bacteria and they are concerned about this. This allows you to include very common bacteria that you are missing, with a zero value. It is questionable whether this philosophical belief has any significance. The most common bacteria are listed below.

Limit to Taxonomy Rank of ….         

It appears that often the items with real health significance are at the lowest level of the bacteria hierarchy. There are good Lactobacillus strains and there are bad Lactobacillus strains (which have been reported to be fatal). This allows you to focus only on the bottom levels. The more levels you include, the more bacteria are targeted – and the greater the chance that ‘noise’ may hide what is significant.

A simple analogy: a kid at a school did some vandalism, and you have a vague idea of who (the Species). Do you proceed to punish him along with all of his friends (i.e. the Genus)? Do you punish those in the classes that he is in (the Family), keeping all of those classes in for detention? Do you punish the entire grade in that school (the Order)? The entire school (the Class)? Morale and school performance will change for those impacted.

Number of Taxa at different ranks seen in at least 10% of uploads

Bacteria Selection Choices

This option was an attempt to filter to outliers before [My Taxa View], which allows hand selection, was created. The philosophical reasoning is that very high and very low values are the most probable cause of health issues. This discards bacteria that are in the middle range. You specify if you want to focus only on:

  • top/bottom 6% – Example Count: 6
  • top/bottom 12% – Example Count: 35
  • top/bottom 18% – Example Count: 66
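As a rough illustration of this kind of outlier filter, the sketch below keeps only taxa whose percentile rank falls in the chosen tail. The function name and the sample percentiles are hypothetical; the 6/12/18 cut-offs are the ones listed above.

```python
def select_outliers(percentiles, tail_percent):
    """Keep taxa whose percentile rank is in the bottom or top `tail_percent`."""
    low_cut = tail_percent
    high_cut = 100 - tail_percent
    return {
        taxon: rank
        for taxon, rank in percentiles.items()
        if rank <= low_cut or rank >= high_cut
    }


# Hypothetical percentile ranks for one sample.
sample = {
    "Taxon A": 3.0,
    "Taxon B": 10.0,
    "Taxon C": 50.0,
    "Taxon D": 85.0,
    "Taxon E": 97.0,
}

for tail in (6, 12, 18):
    print(tail, sorted(select_outliers(sample, tail)))
```

Widening the tail pulls in more bacteria, which matches the growing example counts in the list above.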

Filter by High Lactic Acid/Lactate Producers

This was a special early request from a reader. It filters to those bacteria that are lactic acid producers where the values are above the 50%ile. Everything else is excluded. This functionality has since been improved on by using EndProducts Explorer and hand picking the taxa (thus you can do it for any end product in our system). Values are scaled from the difference to the median value.

This was retained because lactic acid issues often result in cognitive impairment, hence a simple route for those people.

Deprecated: Filtering by….

Filtering by medical conditions and symptoms has been deprecated and replaced by hand picked taxa. This allows unlimited combinations of conditions and symptoms to be handled.

Where to go to pick shifts matching end products, symptoms, medical conditions

My Biome View

This is for people who wish to ‘eye-ball’ the choice of bacteria. This shows the relative ranking/percentile and how many samples have it. A bacteria that is seen in only a dozen samples (like Legionellaceae below), or with a count of 100 or less, is unlikely to have any significance.

How are Hand Pick Taxa Handled?

We maintain the same pattern: What is the difference from the median/50%ile (NEVER the average) and we then scale it and feed those values into the suggestion engine.

How are Lab Results handled

The original approach of giving every bacteria equal weight has been updated recently. Like with Medical Conditions above, we create a synthetic microbiome using

  • 1 down arrow for 18%ile value
  • 2 down arrows for 12%ile value
  • 3 down arrows for 6%ile value
  • 1 up arrow for 82%ile value
  • 2 up arrows for 88%ile value
  • 3 up arrows for 94%ile value
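A minimal sketch of this arrow-to-percentile step follows. It assumes a symmetric pattern (one, two and three arrows map to 18/12/6 going down and 82/88/94 going up) and encodes down arrows as negative numbers; the encoding and the names are illustrative, not the site's actual code.

```python
# Assumed symmetric mapping: negative = down arrows, positive = up arrows.
ARROWS_TO_PERCENTILE = {-3: 6, -2: 12, -1: 18, 1: 82, 2: 88, 3: 94}


def lab_results_to_profile(arrows_by_taxon):
    """Convert lab arrows into percentile positions; taxa without arrows are skipped."""
    return {
        taxon: ARROWS_TO_PERCENTILE[arrows]
        for taxon, arrows in arrows_by_taxon.items()
        if arrows in ARROWS_TO_PERCENTILE
    }


# Hypothetical lab report: two down arrows, in range, three up arrows.
lab_report = {"Taxon A": -2, "Taxon B": 0, "Taxon C": 3}
print(lab_results_to_profile(lab_report))
```

Taxa reported as in range (no arrows) are left out, so only the flagged bacteria feed into the synthetic microbiome.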

Bottom Line

We always use a provided range (Jason Hawrelak) or the difference from the 50%ile/median. We never use the average, and feel that any lab that references averages does not really understand the data and lacks adequate statistical staff. Once upon a time, in the early days, we used the average, but as we got familiar with the data we realized how wrong that approach was: the data is not a bell curve/normal distribution. A simple example is below.

Almost 80% of people have below average counts.
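To see why an average misleads on skewed data, here is a small sketch with made-up counts (not real microbiome data). One large value drags the average far above what most of the samples show, while the median stays in the middle.

```python
from statistics import mean, median

# Made-up, skewed counts: most values are small, one is very large.
counts = [1, 2, 2, 3, 3, 4, 5, 6, 8, 120]

avg = mean(counts)
mid = median(counts)
share_below_avg = sum(c < avg for c in counts) / len(counts)
share_below_mid = sum(c < mid for c in counts) / len(counts)

print(f"average={avg}, median={mid}")
print(f"below average: {share_below_avg:.0%}, below median: {share_below_mid:.0%}")
```

In this toy data nine of the ten values sit below the average, so "below average" would flag nearly everyone as low.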

A series of posts looking at the microbiome over time

While the medical condition is autism, the same approach may be applied to other conditions.

  • Technical Study on Autism Microbiome – comparing citizen science to published science. There is little agreement between published studies, but citizen science agrees with some published studies.
  • Child Autism microbiome over time – Part 1 – Using the bacteria taxa identified above, we look at 11 samples over 2 years to see how these key taxa varied.
  • Child Autism microbiome over time – Part 2 – We look at the predicted symptoms for each of these 11 samples and how certain bacteria cluster that are associated with autism
  • End Products and Autism, etc – We look at citizen science identification of end product shifts associated with autism. Often the pattern is not too high or too low but too high and too low; that is, out of balance.
  • Child Autism microbiome over time – Part 3 – we examine the end products over the two years and saw that Camel Milk with L.Reuteri made a significant change in the microbiome. A side effect was that Eubacteriaceae started to climb and kept climbing until it was very extreme. This bacteria produces formic acid which alters the pH of the gut and is hostile to many bacteria, including Bifidobacterium.

Distribution Charts by Lab/Source

This is the next step of dealing with the Taxonomy Nightmare before Christmas. On the taxa detail pages, people can now view the distributions by specific labs. For illustration I will be using Lachnospiraceae because it is reported in almost all sources.

You will see a new drop down

Log of Values

20% below 12
15% below 12
40% below 12
55% below 12
68% below 12

Actual Values

We will use Ruminococcaceae, http://localhost:42446/library/details?taxon=541000 . Again, something everyone reports

Because most are uBiome, then the shapes above and below are similar
The highest value found was still below the average values of other tests

Bottom Line

There are oddities with some taxa between labs. These charts will help you better determine whether your readings are atypical or not.

Diets to change Microbiome are suspect…

This 2019 review, Is a vegan or a vegetarian diet associated with the microbiota composition in the gut? Results of a new cross-sectional study and systematic review, concluded:

“No consistent association between a vegan diet or vegetarian diet and microbiota composition compared to omnivores could be identified. Moreover, some studies revealed contradictory results. This result could be due to high microbial individuality, and/or differences in the applied approaches. Standardized methods with high taxonomical and functional resolutions are needed to clarify this issue.”

I have seen the same thing when extracting facts into the database. While diet (based on these studies) is still on the suggestions list, I do not recommend using it. Specific foods are a very different question. Diets tend to be nebulous collections of foods, which leaves things very undefined.

FastQ interpretation between providers

I recall reading reviews of the differences between reports by bloggers who took two samples from the same stool and sent them to different analysis labs. There are a dozen possible explanations for those differences.

Due to the demise of uBiome, a number of former users downloaded their FASTQ data files and processed that data through different providers that will determine the bacteria taxonomy from FastQ files. Most of us naively believed that the reports would be similar – after all it is digital data in and thus similar taxonomy would be delivered… It appears that things are a lot more complex than that.

From Standards seekers put the human microbiome in their sights, 2019

What is in a FastQ File

A taxonomy download may be 20,000-30,000 bytes. It contains the bacteria name and hopefully the taxon number, with the percentage or count out of a million. The FastQ file is the result of a machine reading the DNA bits of bacteria in your microbiome. It is a lot bigger. DNA bits are represented by 4 characters (A, T, C, G). A typical file would be 170,000,000 bytes (170 Megs).

If you examine the text, yes text, you will see line after line with:


These strings have been matched to certain bacteria, just like your DNA would match to you (and other people closely related). If you go over to the US National Library of Medicine, you will find information on these sequences, like this for Bacillus subtilis, a common probiotic.
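For readers who want to peek inside a FastQ file themselves, here is a minimal sketch. It assumes the common layout of four lines per read (an identifier line starting with "@", the sequence, a "+" separator, and a quality string) and does not handle sequences wrapped over several lines. The two reads are made up for illustration.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


# Two made-up reads, not real sequencing data.
sample_text = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nGGGTTTAACC\n+\nIIIIIHHHHH\n"
)

records = list(read_fastq(io.StringIO(sample_text)))
base_counts = Counter("".join(seq for _, seq, _ in records))
print(len(records), dict(base_counts))
```

With a real file you would pass `open("yourfile.fastq")` instead of the `io.StringIO` wrapper; the file name here is a placeholder.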

So, the process is matching up to a reference set. At this point of time we walk into the time trap!

A firm like uBiome may have gotten the latest values when it was started. I suspect a business decision was made not to constantly update them. Why you ask? The answer is simple, to maintain consistency and comparability from sample to sample over time. If they use newer ones, then they should reprocess the old ones to be consistent, but then reports will change in minor or major ways — resulting in support emails and phone calls. Support can be a major expense. So keep to what we started with. I suspected that with uBiome Plus, they were working on using new reference values, after all it was a different test!!

Each provider has a different set of reference sequences. Their sequences may be proprietary (not on the public site above). This means that to compare results, you need to use the same reference sequences to match with your FastQ microbiome data. If not, it may result in a “bible” made by taking page 1 from the King James Bible, page 2 from the Vulgate, page 3 from Tyndale’s translation, etc. Things become a hash.

Another issue also arises: bacteria get renamed or refined. The names used in an older reference library may not match the names in a later reference library.

For myself, I have the FastQ files for all of my uBiome tests and my Thryve Inside tests. I will continue to require these FastQ files from testing firms so I can keep the ability to compare samples to each other over time by running them through the same provider.

I have created a page to allow comparison between FastQ files processed to taxonomy by different providers. The button to get to it is at the top of the Samples Page – “FastQ Results Comparison”


This takes you to a list of all of your samples. Note that I have 4 samples with the same date below. It is actually just 1 FastQ file interpreted by four different providers. There are additional providers.


This produces a report showing the normalized count (scaled to be per million). I also have the raw count on the page as tool tips over each number.
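Scaling to "per million" can be sketched as below. The assumption here is that each raw count is divided by the total of the counts in that provider's report; the provider names and numbers are invented.

```python
def normalize_per_million(raw_counts):
    """Scale raw counts so that they sum to one million (assumes total > 0)."""
    total = sum(raw_counts.values())
    return {taxon: count * 1_000_000 / total for taxon, count in raw_counts.items()}


# The same FastQ file as interpreted by two hypothetical providers.
provider_a = {"Taxon A": 500, "Taxon B": 1500}
provider_b = {"Taxon A": 3000, "Taxon B": 1000}

norm_a = normalize_per_million(provider_a)
norm_b = normalize_per_million(provider_b)

for taxon in sorted(set(norm_a) | set(norm_b)):
    print(taxon, norm_a.get(taxon, 0.0), norm_b.get(taxon, 0.0))
```

Normalizing first matters because providers keep different numbers of reads, so raw counts are not directly comparable.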


Who has the right numbers?

Without full disclosure by all of the providers, it is difficult to tell.

All things being equal, the provider that you are currently getting samples processed through would be the first choice. Why? It allows you to do immediate comparisons. This is not that critical because both will convert a FastQ file to a taxonomy in less than an hour.

What about Research Findings?

Fortunately, researchers use the same process for each study. That means that the results are relatively independent of the process used. It does mean that Study A may find some bacteria are high or low and this is NOT reported in Study B. The why may be very simple: that bacteria was never looked for. Things get fuzzy. With the distribution of bacteria known for a particular method, we can determine if it is high or low… but that means having sufficient samples with that method. With uBiome, we had a large number of samples from this one provider and that allowed us to make some good citizen science progress.

Bottom Line on why the difference

  • Different reference libraries
  • Change in bacteria classifications (same sequence, different name)
  • Bugs in software

Open Source: The challenge of picking what bacteria to alter

A rich 16S taxonomy report may contain thousands of species. Every documented modifier increases and decreases dozens of species. While we have 1800 potential modifiers, finding a perfect modifier is very hard, if not impossible.

This post describes a variety of approaches that could be taken. In the ideal site, all of these choices should be available for a very knowledgeable consumer or medical professional to select the best candidate.

Axiomatic Approach

This means that some expert says what they believe, usually based on clinical experience, to be a healthy microbiome. Typically this has the risk of being a regionalized definition of a healthy microbiome, where the diet and the DNA of the region are intrinsically included.

One example is Jason Hawrelak in Hobart, Tasmania, Australia. If you are from Chennai, India and a vegetarian, many of his proportions may not apply. See this post for the discussion on how the microbiome varies by country, DNA and even longitude.

Ideal species proportions Example
Bacteroides spp: <=20%
Faecalibacterium prausnitzii: >=10-15%
Eubacterium spp: <=15%
Roseburia spp: 5-10%
Ruminococcus spp: <=15%
Blautia spp: 5-10%
Total butyrate producers: >=40%
Bifidobacterium spp: >=2.5-5%
Akkermansia spp: 1-3%
Lactobacillus spp: 0.01-1%
Escherichia coli: <0.1%
Methanobrevibacter spp: ~0.01%
Bilophila wadsworthia: <0.01%
Desulfovibrio spp: <0.01%
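A sketch of how an axiomatic range table like this one could be applied to a sample follows. Only four rows are included. Two choices in it are assumptions made for illustration: reading ">=10-15%" as a lower bound of 10%, and treating a taxon missing from the sample as 0%.

```python
# (low, high) bounds in percent; None means no bound on that side.
# Assumption: for ">=10-15%" the lower figure (10) is used as the bound.
RANGES = {
    "Bacteroides spp": (None, 20.0),
    "Faecalibacterium prausnitzii": (10.0, None),
    "Akkermansia spp": (1.0, 3.0),
    "Lactobacillus spp": (0.01, 1.0),
}


def suggest_direction(sample_percent):
    """Return 'increase' or 'decrease' for each taxon outside its range."""
    suggestions = {}
    for taxon, (low, high) in RANGES.items():
        # Assumption: a taxon missing from the sample counts as 0%.
        value = sample_percent.get(taxon, 0.0)
        if low is not None and value < low:
            suggestions[taxon] = "increase"
        elif high is not None and value > high:
            suggestions[taxon] = "decrease"
    return suggestions


# Hypothetical sample values in percent.
sample = {
    "Bacteroides spp": 35.0,
    "Faecalibacterium prausnitzii": 4.0,
    "Akkermansia spp": 2.0,
}
print(suggest_direction(sample))
```

Taxa inside their range (Akkermansia here) produce no suggestion at all, which keeps the list of bacteria to alter short.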

Filter by medical conditions or symptoms

We may have 1000 species. From published studies, we know that the average for people with a condition, compared to controls, may be high or low (and occasionally both). Some of the people with the condition may have a normal value; it is the group as a whole that has a low or high value. These values may be connected to the diet, age or other confounders of the group being studied.

Repeatability of results

We may have 20 studies on the microbiome associated with Facebookitis. Some studies report the same species; other species are reported in just one study. I could assume that the more studies mention a species, the more reliable its association is. So we may have Facebookamina found in 12 studies and Twitteramina in just 1. We have a multitude of choices: use only the species over some threshold (say the median number), create a new weighting from the number of studies (for example, Log(Number of Citations)), etc. This is one of the challenges of building your algorithm.
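The two options mentioned (a median threshold, or a log weighting) might look like this in Python. The study counts are invented, and "Instagramella" is a third made-up species added only so the median is meaningful.

```python
import math
from statistics import median

# Hypothetical number of studies reporting each (made-up) species.
study_counts = {"Facebookamina": 12, "Twitteramina": 1, "Instagramella": 4}

# Option 1: keep only species at or above the median number of studies.
threshold = median(study_counts.values())
kept = sorted(t for t, n in study_counts.items() if n >= threshold)

# Option 2: weight every species by the natural log of its study count.
weights = {t: math.log(n) for t, n in study_counts.items()}

print(threshold, kept)
print({t: round(w, 2) for t, w in weights.items()})
```

Note that log(1) is 0, so under the log weighting a species reported in a single study carries no weight at all unless you add an offset.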

Of the 1000 species, perhaps 30 are reported high or low in studies for Facebookitis. In our sample of 1000 species we find that we have 20 of them. Of these 20, we find that 12 have matching shifts to the studies. It is these 12 that are our candidates to shift. We ignore everything else.
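The narrowing down described here (reported in studies, present in the sample, shifted the same way) is a pair of filters. A small sketch with placeholder taxa, not real study data:

```python
# Direction reported in studies for the condition (placeholder taxa).
reported = {"Taxon A": "high", "Taxon B": "low", "Taxon C": "low"}

# Direction actually seen in this person's sample.
sample_shift = {"Taxon A": "high", "Taxon B": "high", "Taxon D": "low"}

# Step 1: reported taxa that are present in the sample.
present = {taxon for taxon in reported if taxon in sample_shift}

# Step 2: of those, keep the ones shifted the same way as the studies report.
candidates = sorted(t for t in present if reported[t] == sample_shift[t])

print(sorted(present), candidates)
```

Taxon D is out of range in the sample but is ignored because no study links it to the condition.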

Of the 10 we do not have, 3 are low. Do we deem this to be a low? If these three are only seen in 4% of the population do we still deem them to be a low? There is only a 12% chance that these will be reported. Is this noise or significant?

We could do this process for multiple other conditions that we have. I tend to avoid tossing in every condition because conditions often are interconnected. My preference is to always start with the most annoying condition or symptoms.

Weighting of one bacteria shift to another

There are many ways of weighting, that is, giving a value to the amount of shift. The weight is important when we try finding modifiers because we are trying to estimate the net expected benefit of each modifier so we can select the best modifiers.

The classic approach is naive: how many do you have compared to the average? I do not recommend this approach. Let us consider some factors involved:

We find that only 60% of people have any of this bacteria. We can compute the average two ways:

  • Average over those who have it: Say 0.5 %
  • Average over everyone (so those who do not have it are counted as zero). This means that the average is now 0.3%

If your value is 0.48%, are you high or normal?

Suppose the average for a different bacteria (say a genus) is 20% and your value is 22%, while for the prior bacteria you are at 0.6% (using the 0.5% average). At a per million level you are 20,000 high for one and 1,000 high for the other. Do you give weights of 20,000 and 1,000? But one is 10% higher and the other is 20% higher. Surely the one with the bigger relative shift should get the greater weight? Picking the weighting is another step of developing the algorithm.
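The arithmetic in this section can be checked with a few lines. The numbers are the ones used in the text; the function and variable names are just for illustration.

```python
share_with_taxon = 0.60   # 60% of people have any of this bacteria
avg_among_carriers = 0.5  # percent, averaged over those who have it
avg_over_everyone = avg_among_carriers * share_with_taxon  # percent, zeros included

my_value = 0.48  # percent
print(f"average over everyone: {avg_over_everyone:.1f}%")
print("below carriers' average:", my_value < avg_among_carriers)
print("above everyone's average:", my_value > avg_over_everyone)


def shifts(value_pct, reference_pct):
    """Return (absolute shift in counts per million, relative shift as a fraction)."""
    absolute_per_million = (value_pct - reference_pct) * 10_000  # 1% = 10,000 per million
    relative = (value_pct - reference_pct) / reference_pct
    return absolute_per_million, relative


genus = shifts(22.0, 20.0)
rare = shifts(0.6, 0.5)
print(f"genus: {genus[0]:.0f} per million, {genus[1]:.0%} higher")
print(f"rare:  {rare[0]:.0f} per million, {rare[1]:.0%} higher")
```

The same 0.48% reading is low against one average and high against the other, and the two bacteria swap places depending on whether you rank by absolute or relative shift.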

If one of the bacteria happens to be Clostridium difficile, you likely want it to be zero. This seems like an exception to any logic you developed above.

Hand Picking

The above methods are mechanical. People often have experience or beliefs. Hand picking means going thru and selecting the species one by one after looking at the literature and association for each species that are outside of the expected range.

Expected Range

The expected range can be computed many ways. The classic lab approach is to compute the average and then the standard deviation. The normal range becomes mean – 2 std dev to mean + 2 std dev. IF THIS WAS A TRUE BELL CURVE, this means that about 5% fall outside the range: roughly 2.5% above and 2.5% below. I strongly recommend against this approach.

I have moved onto actual percentiles of the labs. So if you want to use the 5% criteria, you look it up against the actual data.
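The difference between the two approaches shows up quickly on skewed data. The sketch below uses made-up counts and Python's standard `statistics` module; the 5%/95% cut points are one possible choice, not a recommendation.

```python
from statistics import mean, stdev, quantiles

# Made-up, skewed counts: most values are small, one is very large.
values = [1, 2, 2, 3, 3, 4, 5, 6, 8, 120]

# Classic lab approach: mean plus or minus two standard deviations.
m, s = mean(values), stdev(values)
classic_low, classic_high = m - 2 * s, m + 2 * s

# Percentile approach: read the cut points off the actual data.
cuts = quantiles(values, n=20, method="inclusive")  # 19 cut points: 5%, 10%, ..., 95%
pct_low, pct_high = cuts[0], cuts[-1]

print(f"mean +/- 2 sd: {classic_low:.1f} to {classic_high:.1f}")
print(f"5th to 95th percentile: {pct_low:.2f} to {pct_high:.2f}")
```

On this data the classic lower bound comes out negative, which is impossible for a count, so nothing could ever be flagged as low. The percentile bounds always stay inside the observed values.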

My own preference is 10% with the additional criteria that a strongly supporting (correlated) species must also be 10%. The goal is to identify a species-conspiracy and address (arrest) them as a whole.

This is one more decision that you need to make in developing an algorithm.


Modifiers are the same situation as diseases and symptoms. Multiple studies with different results. Existing diet, DNA, etc may be confounders of the published studies.

Rather than repeat the discussions above (how many studies reported the same thing etc.), just re-read above. A major confounder is that different studies may not have tested for the same things — a no change report will often be omitted from the studies…. leaving what was tested for being uncertain.

Bottom Line

Where are we:

  • We have a collection of bacteria shifts which may be due solely to diet or DNA with no association to any condition or symptoms | given diet and DNA.
  • We need to identify which bacteria to change (a simple true/false)
  • We need to give a value/weight for how relatively important each bacteria is to change.
  • We have modifiers which are likely to impact these bacteria (a simple true/false)
  • We need to give a value/weight for how certain each bacteria is to be changed.

Now we need to optimise across all of these variables to get the optimal suggestions. The item to be optimized is the estimated weighted shift of taking a set of modifiers against a set of bacteria. The key word is estimated.
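One very simplified way to picture the "estimated weighted shift" is to score each modifier separately and rank them. This is an illustrative sketch only: it scores modifiers one at a time rather than optimising over sets of modifiers, and every name, weight and confidence value in it is invented.

```python
# Desired change per taxon: the sign is the direction, the size is the importance weight.
desired = {"Taxon A": +2.0, "Taxon B": -1.0}

# modifier -> taxon -> (direction of effect, confidence between 0 and 1)
modifiers = {
    "Modifier X": {"Taxon A": (+1, 0.8), "Taxon B": (+1, 0.5)},
    "Modifier Y": {"Taxon B": (-1, 0.9)},
}


def estimated_benefit(effects):
    """Sum of weight x direction x confidence over the taxa a modifier touches."""
    total = 0.0
    for taxon, (direction, confidence) in effects.items():
        # Positive when the modifier pushes the taxon the way we want.
        total += desired.get(taxon, 0.0) * direction * confidence
    return total


ranking = sorted(modifiers, key=lambda m: estimated_benefit(modifiers[m]), reverse=True)
for name in ranking:
    print(name, round(estimated_benefit(modifiers[name]), 2))
```

Modifier X still ranks first even though it pushes Taxon B the wrong way, because its benefit on the more heavily weighted Taxon A outweighs that cost. That trade-off is exactly why the weights matter.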

Understanding the impact of your medicines

I just pushed out an update on that may help you understand what various prescription, over the counter and some supplements may be doing to your microbiome.

Select any of the links highlighted below

The next page will show some choices at the top:

Compare Impact

This is intended to allow you to better choose between alternatives – for example Aspirin versus Paracetamol (acetaminophen). I am sure people will find more uses for it.

The process is simple, search for each item, and put a check beside it. Select the Compare Impact radio button and then click the submit button below it.

This will take you to a page listing the impacts side by side. In this case we see that their impacts are similar, but differ on a few items. At the family level there are a few differences.

If a family that is important to you is shifted the wrong way, you may wish to consider the better one.


This is intended for when you are prescribed drugs to treat some condition and wish to reduce their impact on the microbiome by counteracting the shifts that the drug or drugs cause.

For this example, we pick lovastatin (a statin) and famotidine (Pepcid AC).

We may wish to first see how much impact they have together (do they reinforce or counteract each other)

Bad news — they reinforce each other in decreasing many families

Just pressing back, changing the radio button, and clicking submit produces suggestions.

The suggestions are done by creating a virtual microbiome report based on the above shifts and running that through our AI engine.

The suggestion page is the new format with the long lists hidden until you ask to see them.

The Take or Avoid list defaults to 100 items (which is one reason that I toggle visibility). Remember – none of these items are guaranteed to work, nor do you need to take all of them. Each item increases your odds.

The avoid list values are a lot higher, and thus you may wish to start by reducing any of these items that you are taking.

Automatic Upload and Login from 3rd Party Sites

An upload from a 3rd party site may be done by posting json to

By uploading, you consent to allow your microbiome data and symptoms to be made available to citizen scientists for further discoveries.

Required consent is cited above. The 3rd party is responsible for obtaining consent.

Json Structure

The structure is simple:

  • The key is issued by us and identifies where the data is coming from (“source”)
  • logon and password are the authentication pair that you generate. These are used for logging on. Logon and Password should be the same for all samples from the same user (so we can display on a timeline).


The taxonomy uses the official taxon numbers and the percentage.
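As a rough idea of what building such an upload could look like, here is a sketch that assembles and serialises a payload. The exact field names and nesting are assumptions made for illustration ("key", "logon" and "password" follow the description above, while "taxonomy", "taxon" and "percent" are hypothetical), so check them against the actual structure before use. No request is sent.

```python
import json

# Hypothetical payload; field names and nesting are assumptions, not the real schema.
payload = {
    "key": "ISSUED-KEY-PLACEHOLDER",  # issued by the receiving site, identifies the source
    "logon": "whatever",              # same logon/password for every sample of a user
    "password": "whatever",
    "taxonomy": [
        # Taxon numbers with percentages; 1578 and 816 are the NCBI ids
        # commonly given for Lactobacillus and Bacteroides.
        {"taxon": 1578, "percent": 0.5},
        {"taxon": 816, "percent": 21.0},
    ],
}

body = json.dumps(payload)
print(body)
```

Reusing one logon and password pair for all of a user's samples is what lets the samples line up on a timeline afterwards.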


On your site, create a page that does a post to /email/logon3rd with two elements:

<form method="post"
action=""><input type="hidden" name="logon" value="whatever" />
<input type="hidden" name="password" value="whatever" />
<input type="submit" value="Logon to MicrobiomePrescription" />
</form>

What does probability mean on suggestions

Probability is an estimate whether something may help, not how much it can help. The relative help between two items is rarely found in any study.

Probability is based on the number of studies reporting that something shifts a bacteria in the desired way. A single report that Blue Cheese reduces Xeonella may get a value of 0.1 while 10 reports that barley reduces Boozella would get a value of 1.0 (if 10 reports were the maximum number of reports reported). 

Studies often contradict each other – typically caused by a confounder that the study ignored. To address this, we aggregate the number of reports with scaling. For example:

  • 6 reports showing desired changes
  • 2 reports showing undesired changes

This could be computed as 6-2=4. Because contradictory reports increase the uncertainty, we also use other methods, for example:

  • exp (log(6/8) + log(2/8))

Once we compute all of these numbers, we then scale them so the maximum value is +1.
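The calculations described in this section can be written out as below. The function names are made up, and this only reproduces the arithmetic given in the text; it is not the engine's actual code.

```python
import math


def simple_confidence(reports, max_reports):
    """Share of the maximum report count, e.g. 1 of 10 gives 0.1."""
    return reports / max_reports


def net_reports(desired, undesired):
    """Plain difference between agreeing and contradicting reports."""
    return desired - undesired


def product_of_shares(desired, undesired):
    """exp(log(d/n) + log(u/n)), i.e. the product of the two shares.

    Both counts must be above zero, since log(0) is undefined.
    """
    total = desired + undesired
    return math.exp(math.log(desired / total) + math.log(undesired / total))


def scale_to_max(scores):
    """Rescale so that the largest score becomes +1."""
    top = max(scores.values())
    return {name: value / top for name, value in scores.items()}


print(simple_confidence(1, 10), simple_confidence(10, 10))
print(net_reports(6, 2), round(product_of_shares(6, 2), 4))
print(scale_to_max({"barley": 4, "blue cheese": 1}))
```

After scaling, only the ordering and relative sizes of the values carry meaning, which fits the point that probability estimates whether something may help, not how much.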

UI Improvement on Taxonomy Time Lines

People who track their microbiome over time (much wiser than doing a one-time test and expecting that to be a magic cure) know that I support timeline analysis on

How to get to timeline analysis
On the next page you will see this button, click it after selecting the samples.

Three Views: Value, Log(Value), Percentile

At the top you will see two select choices and a link to the associated library page. All of the data is downloaded so you can quickly explore the patterns without having to wait for the next bacteria to load.

Percentile – the spike in 2019 was the ME/CFS Relapse
Actual values — showing that I am finally getting Lactobacillus. Note the “No Value” is shown when nothing was reported.
Using Log(Value)