Poupou's blog<br> http://pages.infinit.net/ctech/poupou.html<br> Looking for perfect security? Try a wireless brick.<br> Otherwise you may find some unperfect stuff here...<br> Copyright &copy; 2003-2006 S&eacute;bastien Pouliot. All rights reserved.<br> Thu, 24 Mar 2011 13:36:33 GMT<br> spouliot@videotron.ca<br> Blog-e-ho!
<h1>The Full Price of FullName</h1>
<p>http://pages.infinit.net/ctech/20110301-0810.html<br> Tue, 01 Mar 2011 20:10:23 GMT</p>
<p>Everyone using even a small part of <a href="http://www.mono-project.com/Cecil" target="_blank">Cecil</a> knows it's amazing. Still, there can be some inconveniences in using <i>part of</i> something - because the other parts can rarely (if ever) be totally ignored. In <a href="http://www.mono-project.com/Gendarme" target="_blank">Gendarme</a>'s case it only uses the <b>reading</b> side of Cecil - but the latter would not be so useful without support for <b>writing</b> as well.</p>
<p>This leads to a few things that are not optimal, from Gendarme's point of view, inside Cecil. The <i>biggest</i> issue is that a feature like <b>writing support</b> removes a lot of <b>caching</b> possibilities. This can be seen in Cecil's <code>FullName</code> properties (often used in <code>ToString</code> overrides): most of them re-generate the full name (i.e. allocate a new string) on each call.</p>
<p>This is not something new - it's actually been that way since Cecil gained write-ability long ago (GSoC 2006). I hoped the situation would be better with cecil-light (and maybe it is to some extent) but the new <a href="http://www.mono-project.com/" target="_blank">Mono</a> <a href="http://www.mono-project.com/Profiler" target="_blank">Log Profiler</a> (re)opened my eyes to this issue recently.</p>
<p>Using the log profiler made it easy to see all the string allocations caused (in part) by the <code>FullName</code> properties. 
Running mono with <b>--profile=log</b> will enable the log profiler (see <b>man mono</b> for more options) and the resulting <b>output.mlpd</b> (default name) file contains the results. To generate a text report you then use the <b>mprof-report</b> tool. E.g.</p>
<pre>
mono --profile=log bin/gendarme.exe --config rules/rules.xml --set self-test \
	--log self-test.log --ignore=self-test.ignore --severity=all --confidence=all \
	bin/gendarme.exe bin/gendarme-wizard.exe bin/Gendarme.*.dll
mprof-report --traces --maxframes=8 output.mlpd > report
</pre>
<p>The resulting <b>report</b> file contains lots of details but, for this blog entry, I'll focus solely on the <b>Total memory allocated</b> line.</p>
<h2>Phase 1: Caching and avoiding the cache</h2>
<p>Removing the duplicate allocations is simple. A new extension method, <code>GetFullName</code>, was added to call and cache <code>FullName</code>. However this saved memory at the expense of hitting the cache very often (i.e. extra lookup time). While updating Gendarme's code base it quickly became apparent that, in most cases, the full name was not really needed, i.e. a check of the <code>Namespace</code> and <code>Name</code> properties (both available without the string allocation cost) was enough. So, to avoid the cache, another new extension method, <code>IsNamed</code>, was added. The first results were significant:</p>
<pre>
before  Total memory allocated: 84653632 bytes in 1014775 objects
after   Total memory allocated: 73385936 bytes in  905715 objects
diff                            11267696 bytes      109060 objects
                                    13.3 %            10.7 %
</pre>
<h2>Phase 2: API</h2>
<p>Some of Gendarme's framework APIs also promoted the use of the full name. Again, to avoid hitting the cache, they were changed, one by one, to use separate namespace and name parameters.</p>
<h3>Inherits</h3>
<pre>
before  Total memory allocated: 73385936 bytes in 905715 objects
after   Total memory allocated: 73422560 bytes in 906391 objects
</pre>
<p>Notice that memory usage actually grew after fixing the API. 
That's because: <ul> <li>nothing more, allocation wise, is saved (i.e. phase 1 has all the gains);</li> <li>some rules need a bit more data to work with the split namespace/name versus the full name.</li> </ul> Still we avoid retrieving the full names from the cache when they were (definitely) not required.</p>
<h3>Implements</h3>
<pre>
before  Total memory allocated: 73422560 bytes in 906391 objects
after   Total memory allocated: 73425864 bytes in 906958 objects
</pre>
<p>Again a small memory increase, for the same reasons as <code>Inherits</code> above.</p>
<h3>HasAttribute</h3>
<pre>
before  Total memory allocated: 73425864 bytes in 906958 objects
after   Total memory allocated: 71064328 bytes in 807886 objects
</pre>
<p>Here a bit of duplicated code was removed, leading to less code to analyze (i.e. it's a self-test, running Gendarme on Gendarme), in turn requiring a bit less memory.</p>
<h3>Contain[Any]Type</h3>
<pre>
before  Total memory allocated: 71064328 bytes in 807886 objects
after   Total memory allocated: 71050024 bytes in 807698 objects
</pre>
<p>Another small drop. Some (now) unused extension methods and unnecessary <code>GetFullName</code> usage were removed.</p>
<h3>Cecil's HasTypeReference</h3>
<p>Cecil itself uses the <code>FullName</code> properties (JB removed a few cases recently, he had early access to my data ;-) and has some APIs that require their use, e.g. <code>ModuleDefinition.HasTypeReference</code>. That could be worked around by using <code>ModuleDefinition.GetTypeReferences</code> and some, nice looking, LINQ-y replacements.</p>
<pre>
before  Total memory allocated: 71064328 bytes in 807886 objects
after   Total memory allocated: 72072960 bytes in 834140 objects
</pre>
<p>Memory goes up again! Why? For the same reason as <code>FullName</code>, i.e. Cecil's <code>ModuleDefinition.GetTypeReferences</code> and <code>GetMemberReferences</code> allocate a new array on each call. 
Again this provides no useful value to read-only applications, like Gendarme, so the results were (again) cached.</p>
<pre>
before  Total memory allocated: 72072960 bytes in 834140 objects
after   Total memory allocated: 71285616 bytes in 817145 objects
</pre>
<p>So the final increase (the cached arrays) is a lot smaller than my first attempt :-)</p>
<h2>Conclusion</h2>
<p>I'll let the numbers speak for themselves:</p>
<pre>
original  Total memory allocated: 84653632 bytes in 1014775 objects
final     Total memory allocated: 71285616 bytes in  817145 objects
diff                              13368016 bytes      197630 objects
                                      15.8 %            19.5 %
</pre>
<p>There are likely a few other, indirect uses of <code>FullName</code> or <code>GetFullName</code> that could be avoided. I suspect most will be found (and fixed) before 2.12 is released - anyway there's no <i>real</i> harm in them.</p>
<p>Besides confirming <i>old</i> suspects, another fun aspect of profiling is that you'll notice a lot of things when reading the logs, some of them surprising because they challenge/defy your expectations... and give real, nice ideas for further optimizations. Stay tuned :-)</p>
<h1>Easy to (mis)use API</h1>
<p>http://pages.infinit.net/ctech/20110304-0315.html<br> Fri, 04 Mar 2011 15:15:05 GMT</p>
<p>Here's another take at reducing string allocations inside <a href="http://www.mono-project.com/Gendarme" target="_blank">Gendarme</a> using the new <a href="http://www.mono-project.com/Profiler" target="_blank">Log Profiler</a>. This time I focused on a very helpful, but easy to abuse, API: <code>StreamReader.ReadLine</code>. Similar methods suffer similar fates.</p>
<p>The .NET framework has quite a few helpers like this one. They work great when quickly hacking a solution but they also have serious limitations in the real world. E.g. how long is a line? From a <code>Stream</code> it could be infinite, eventually leading to an <code>OutOfMemoryException</code>. 
Same goes for <code>ReadToEnd</code> wrt file size, <code>ReadAllLines</code>... (that sounds like a rule in itself ;-)</p>
<p>Even if you control the line/file size there's still a price to pay: each line becomes a new string. Now that's not a big deal if you actually need each line <i>as is</i>. However if you read lines then parse each/most of them (a pretty common pattern) you get a lot of <i>extra</i> allocations.</p>
<h2>make self-test</h2>
<p>When doing a <b>make self-test</b> Gendarme reads two text files to find which known defects should be ignored (i.e. not reported). E.g.</p>
<pre>
-rw-r--r-- 1 poupou users  3169 2011-01-05 15:32 <a href="https://github.com/mono/mono-tools/raw/master/gendarme/mono-options.ignore" target="_blank">mono-options.ignore</a>
-rw-r--r-- 1 poupou users 55154 2011-02-28 18:54 <a href="https://github.com/mono/mono-tools/raw/master/gendarme/self-test.ignore" target="_blank">self-test.ignore</a>
</pre>
<p>So, that's 58323 bytes for fewer than 700 lines (including blanks and comments). However the (very simple) file format requires splitting each non-comment line in two parts: <ol> <li>an indicator (is this a <b>R</b>ule, <b>A</b>ssembly, <b>T</b>ype, <b>M</b>ethod or a <b>#</b> comment); and</li> <li>a (rule / assembly / type / method) full name</li> </ol> This means that the original string, returned from <code>ReadLine</code>, is often short-lived.</p>
<p>So what if we were reading this into a re-usable <code>char[]</code> buffer? Could we cut the allocations in half? It was worth a try and <a href="https://github.com/mono/mono-tools/raw/master/gendarme/framework/Gendarme.Framework.Helpers/StreamLineReader.cs" target="_blank">StreamLineReader</a> was born. 
Here are the total allocations before and after <a href="https://github.com/mono/mono-tools/raw/master/gendarme/console/IgnoreFileList.cs" target="_blank">IgnoreFileList</a> was updated.</p>
<pre>
before  Total memory allocated: 71512640 bytes in 823879 objects
after   Total memory allocated: 71322520 bytes in 823084 objects
                                  190120 bytes in    795 objects
</pre>
<p>Ok, 190,120 bytes may not be a huge gain (that's about 0.27% of the allocations required for a <b>self-test</b>). Still it represents about 3.26 bytes saved for each byte read from the files (a good ratio) because other allocations, string and non-string, are now avoided as well.</p>
<h2>Why bother?</h2>
<p><a href="https://github.com/mono/mono-tools/raw/master/gendarme/console/IgnoreFileList.cs" target="_blank">IgnoreFileList</a> was not very high in the profiler logs. However <a href="https://github.com/spouliot/gendarme/wiki/Gendarme.Rules.Portability.MonoCompatibilityReviewRule%282.10%29" target="_blank">MonoCompatibilityReviewRule</a> is at the <b>top</b>, for the same reason, since it downloads (from the MoMA web service), uncompresses, then <b>reads</b> three text files. 
Here's an extract of the logs:</p>
<pre>
Allocation summary
     Bytes      Count  Average Type name
  25515184     153796      165 System.String
    11693296 bytes from:
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Read (System.IO.TextReader)
        System.IO.StreamReader:ReadLine ()
        (wrapper managed-to-managed) string:.ctor (char[],int,int)
        string:CreateString (char[],int,int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
    1365136 bytes from:
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Read (System.IO.TextReader)
        System.IO.StreamReader:ReadLine ()
        System.Text.StringBuilder:set_Length (int)
        System.Text.StringBuilder:InternalEnsureCapacity (int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
    1164952 bytes from:
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Read (System.IO.TextReader)
        System.IO.StreamReader:ReadLine ()
        System.Text.StringBuilder:Append (char[],int,int)
        System.Text.StringBuilder:InternalEnsureCapacity (int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
    1030624 bytes from:
        Gendarme.ConsoleRunner:Initialize ()
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:ReadWithComments (System.IO.TextReader)
        string:Substring (int,int)
        string:SubstringUnchecked (int,int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
    966576 bytes from:
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:ReadWithComments (System.IO.TextReader)
        System.IO.StreamReader:ReadLine ()
        (wrapper managed-to-managed) string:.ctor (char[],int,int)
        string:CreateString (char[],int,int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
</pre>
<p>We see the <code>System.IO.StreamReader:ReadLine</code> and also the <code>string:Substring</code> - a clear hint that (some) lines are being parsed. Changing the rule to use the <code>StreamLineReader</code> shows how much memory can be saved.</p>
<pre>
before  Total memory allocated: 71322520 bytes in 823084 objects
after   Total memory allocated: 68936880 bytes in 816067 objects
                                 2385640 bytes in   7017 objects
                                     3.3 %           0.9 %
</pre>
<p>That's much better percentage wise. However the ratio (wrt file size) is much lower because two of the three files used by the rule do not require parsing the lines, i.e. what <code>ReadLine</code> returned was usable "as-is" and kept in a <code>HashSet</code>. Only <b>monotodo.txt</b>, which has an optional text message, needs some extra parsing - even if only to remove the '-' at the end of the line.</p>
<p>Newer logs show, more clearly, that most allocations are done on the <i>unparsed</i> files - i.e. 
the optimization did not reach them:</p>
<pre>
  22888072     145733      157 System.String
    13264152 bytes from:
        Gendarme.ConsoleRunner:Initialize ()
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Read (Gendarme.Framework.Helpers.StreamLineReader)
        (wrapper managed-to-managed) string:.ctor (char[],int,int)
        string:CreateString (char[],int,int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
    1131424 bytes from:
        Gendarme.ConsoleRunner:Initialize ()
        Gendarme.Framework.Runner:Initialize ()
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:Initialize (Gendarme.Framework.IRunner)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:LoadDefinitions (string)
        Gendarme.Rules.Portability.MonoCompatibilityReviewRule:ReadWithComments (Gendarme.Framework.Helpers.StreamLineReader)
        (wrapper managed-to-managed) string:.ctor (char[],int,int)
        string:CreateString (char[],int,int)
        (wrapper managed-to-native) string:InternalAllocateStr (int)
</pre>
<p>Right now my options to further reduce string usage are a bit limited - at least without changing the file format, which we inherit from MoMA. E.g. the file <b>missing.txt</b> has more than 55000 lines because it covers every assembly shipped with MS.NET 4.0. Gendarme could easily read (and allocate) <b>only</b> the entries needed by the assemblies referenced by the code being analyzed - if that data was available.</p>
<p>This will become important because I expect (or at least wish) for similar rules (e.g. something similar to <a href="http://msdn.microsoft.com/en-us/library/cc667408.aspx" target="_blank">CA1903:UseOnlyApiFromTargetedFramework</a>) to be added to Gendarme in the next releases. 
Yet more planning is needed (other rules' requirements) before changing the format.</p>
<p>Still it's nice to know the <a href="http://www.mono-project.com/Profiler" target="_blank">tooling</a> needed to guide such work is available and simply waiting for time / hackers :-)</p>
<h1>News flash: Cheap certificates are cheap!</h1>
<p>http://pages.infinit.net/ctech/20110324-0943.html<br> Thu, 24 Mar 2011 09:43:17 GMT</p>
<p>In case you missed the <a href="http://blogs.comodo.com/category/it-security/" target="_blank">news</a> or <a href="http://www.microsoft.com/technet/security/advisory/2524375.mspx">update</a> (as many other things have been happening lately) <a href="http://www.comodo.com/" target="_blank">some CA</a> has, <a href="http://groups.google.com/group/mozilla.dev.tech.crypto/browse_thread/thread/9c0cc829204487bf" target="_blank">not for the first time</a>, issued <i>bad</i> certificates, e.g. issuing SSL certificates to somebody (or <a href="http://www.computerworld.com/s/article/9214998/Firm_points_finger_at_Iran_for_SSL_certificate_theft" target="_blank">some country</a>) that is not the owner of the domain. If you trust that CA root (and it has many), then everything that it signed is, by default and until revoked/updated, trusted.</p>
<p>How does this affect <a href="http://www.mono-project.com/" target="_blank">Mono</a>?</p>
<h2>Framework</h2>
<p>Mono, <b>by default</b>, does not trust any CA root - this is <b>not</b> the job of a framework (it provides the plumbing, not the water).</p>
<p>There are several reasons for that (see the <a href="http://www.mono-project.com/FAQ:_Security#Why_doesn.27t_Mono_includes_root_certificates_from_X.2C_Y_and_Z_.3F" target="_blank">FAQ</a>) but honestly there are very few people who need to trust all <b>139</b> (as of today) CA roots (that <a href="http://www.go-mono.org/docs/index.aspx?link=man%3Amozroots%281%29" target="_blank">mozroots</a> could install). 
In fact the probability of having false certificates issued grows with the number of CAs, and the more of them you trust, the more likely you'll be affected.</p>
<h2>Applications</h2>
<p>For applications the main challenge is that they cannot (or at least should not) totally depend on the state of the user/machine certificate store(s).</p>
<p><b>Why?</b> It could be empty (e.g. Mono's default), it could have a lot of junk (e.g. old self-signed, test certificates) or it could include every CA known to this (and likely other) world(s).</p>
<p>OTOH (and unlike frameworks) applications generally know what they want/need in order to execute properly. E.g. if you're using SSL to check for new mail on GMail then you can (and should) easily add a few checks and refuse certificates that are not related to the job.</p>
<p>Now, since I just told you not to (totally) depend on the certificate store(s), you might feel you have something to do wrt the above issue. Have a look at this <a href="http://www.mono-project.com/UsingTrustedRootsRespectfully" target="_blank">wiki page</a> for some approaches. But if you're already dealing with (and checking for) specific hosts then you're likely ok - unless your host(s) match one of the fraudulent certificates.</p>
<p>If your code is more general (e.g. it can connect to anything) then you can resort to the browser solution: a blacklist. 
Here's some sample code that will refuse the known-to-be fraudulent certificates.</p>
<code><xmp class="code-csharp">
// requires: using System.Net.Security;
//           using System.Security.Cryptography.X509Certificates;

// to be used as the RemoteCertificateValidationCallback of an SslStream
public static bool ValidateServerCertificate (object sender, X509Certificate certificate,
	X509Chain chain, SslPolicyErrors sslPolicyErrors)
{
	// refuse anything the platform already considers invalid
	if (sslPolicyErrors != SslPolicyErrors.None)
		return false;

	// blacklist of known-to-be fraudulent certificates
	switch (certificate.GetSerialNumberString ()) {
	case "009239D5348F40D1695A745470E1F23F43": // addons.mozilla.org
	case "00D8F35F4EB7872B2DAB0692E315382FB0": // Global Trustee
	case "00B0B7133ED096F9B56FAE91C874BD3AC0": // login.live.com
	case "00E9028B9578E415DC1A710A2B88154447": // login.skype.com
	case "392A434F0E07DF1F8AA305DE34E0C229": // login.yahoo.com_2
	case "3E75CED46B693021218830AE86A82A71": // login.yahoo.com_3
	case "00D7558FDAF5F1105BB213282B707729A3": // login.yahoo.com
	case "047ECBE9FCA55F7BD09EAE36E10CAE1E": // mail.google.com
	case "00F5C86AF36162F13A64F54F6DC9587C06": // www.google.com
		return false;
	default:
		return true; // or your own existing logic
	}
}
</xmp></code>
<h2>Users</h2>
<p>How many people have the keys to your home? Scrap that - no matter the number, the fewer the better. Now, curious about how many CAs you're currently trusting?</p>
<pre>
~ @ certmgr -m -list -c Trust | grep "Unique Hash" | wc -l
0
~ @ certmgr -list -c Trust | grep "Unique Hash" | wc -l
140
</pre>
<p>Yep, I executed <a href="http://www.go-mono.org/docs/index.aspx?link=man%3Amozroots%281%29" target="_blank">mozroots</a> to see how many certificates would be added but it won't stay that high very long ;-). 
More details in the <a href="http://www.go-mono.org/docs/index.aspx?link=man%3Acertmgr%281%29" target="_blank">certmgr</a> documentation on how to remove (all/some) of them.</p>
<p><b>Update: </b><a href="http://pages.infinit.net/ctech/20110324-0133.html" target="_blank">Instructions</a> for using certmgr to remove the CA root that signed the bad certificates.</p>
<h1>But but but...</h1>
<p>http://pages.infinit.net/ctech/20110324-0133.html<br> Thu, 24 Mar 2011 13:33:15 GMT</p>
<p>Since all the <a href="http://pages.infinit.net/ctech/20110324-0943.html" target="_blank">previously mentioned</a> certificates were issued by a single certificate authority you also have the option of removing <i>only</i> this CA from those <a href="http://www.go-mono.org/docs/index.aspx?link=man%3Amozroots%281%29" target="_blank">mozroots</a> installed. Note that this: <ul> <li>does not solve the root (pun intended) issue. The same situation can occur with other CAs (from the same or a different company);</li> <li>will remove the trust from all certificates signed (past and future) by this CA.</li> </ul>
<h2>Instructions</h2>
<p>First check how many certificates you have installed in your <b>Trust</b> store:</p>
<pre>
~ @ certmgr -list -c Trust | grep "Unique Hash" | wc -l
140
</pre>
<p>Next remove the CA root certificate that signed all those bad certificates:</p>
<pre>
~ @ certmgr -del -c Trust 89B5351EC11451D06E2F95B5F89722D527A897B9
</pre>
<p>Finally validate that the certificate was removed.</p>
<pre>
~ @ certmgr -list -c Trust | grep "Unique Hash" | wc -l
139
~ @ certmgr -list -c Trust | grep "UTN-USERFirst-Hardware"
</pre>
<p>If the number decreased by one and the string <b>UTN-USERFirst-Hardware</b> can't be found anymore then <b>this batch</b> of bad certificates won't affect you.</p>
<p>Note: Repeat the above steps with <b>-m</b> if you installed root certificates on the machine store.</p>
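<p>The three steps above could be wrapped in a small script. This is only a sketch: the <code>certmgr</code> invocations, the hash and the <b>UTN-USERFirst-Hardware</b> name are the ones shown above, while the function names (<code>count_roots</code>, <code>root_absent</code>, <code>remove_bad_root</code>) are made up for this example. It only covers the user store.</p>

```shell
#!/bin/sh
# Sketch only: automates the manual steps from the post (user store).
# Hash and name are copied from the post; function names are hypothetical.

BAD_ROOT_HASH=89B5351EC11451D06E2F95B5F89722D527A897B9
BAD_ROOT_NAME="UTN-USERFirst-Hardware"

# Count certificates in a `certmgr -list` listing read from stdin.
# Relies on each entry having one "Unique Hash" line, as the post's grep does.
count_roots() {
    grep -c "Unique Hash" || true
}

# Succeed only if the listing read from stdin does not mention name $1.
root_absent() {
    ! grep -q "$1"
}

# Count, delete, count again, then check both conditions from the post.
remove_bad_root() {
    before=$(certmgr -list -c Trust | count_roots)
    certmgr -del -c Trust "$BAD_ROOT_HASH"
    after=$(certmgr -list -c Trust | count_roots)
    echo "Trust store: $before -> $after certificates"

    if [ "$after" -eq $((before - 1)) ] &&
       certmgr -list -c Trust | root_absent "$BAD_ROOT_NAME"; then
        echo "CA root removed"
    else
        echo "CA root still listed, or it was not installed" >&2
        return 1
    fi
}
```

<p>Nothing is executed until <code>remove_bad_root</code> is called, so the two helper functions can be tried on saved <code>certmgr -list</code> output first.</p>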