<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Adventures in CUDA Path Tracing: Part 1</title>
	<atom:link href="http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/</link>
	<description>It works on my machine.</description>
	<lastBuildDate>Mon, 19 Jul 2010 09:55:01 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Daniel</title>
		<link>http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/comment-page-1/#comment-5194</link>
		<dc:creator>Daniel</dc:creator>
		<pubDate>Mon, 28 Dec 2009 16:19:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.sjbrown.co.uk/?p=295#comment-5194</guid>
		<description>Hi Simon,

I wonder how you get such clean GI results using path tracing. Do you also calculate the direct light at each hit point of the path? Otherwise most paths would never reach the small area light source, resulting in dark pixels. Or do you use some kind of importance sampling?
My other question is how you use the CUDA threads. Are you using bucket rendering or some kind of image partitioning? Or do you use other techniques to avoid the watchdog time-out? Do you sum the sample values in your kernel, or outside it on the CPU?

Thanks in advance.

Daniel</description>
		<content:encoded><![CDATA[<p>Hi Simon,</p>
<p>I wonder how you get such clean GI results using path tracing. Do you also calculate the direct light at each hit point of the path? Otherwise most paths would never reach the small area light source, resulting in dark pixels. Or do you use some kind of importance sampling?<br />
My other question is how you use the CUDA threads. Are you using bucket rendering or some kind of image partitioning? Or do you use other techniques to avoid the watchdog time-out? Do you sum the sample values in your kernel, or outside it on the CPU?</p>
<p>Thanks in advance.</p>
<p>Daniel</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Brown</title>
		<link>http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/comment-page-1/#comment-3303</link>
		<dc:creator>Simon Brown</dc:creator>
		<pubDate>Sun, 23 Aug 2009 09:01:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.sjbrown.co.uk/?p=295#comment-3303</guid>
		<description>Hey Marco, nice to hear from you!

Yep, it wouldn&#039;t be the persistent threads themselves that would reduce register pressure; I would need to do less work in each job too.  Of course this increases bandwidth requirements, atomic serialisation costs, etc., since I must now save/restore state for each job as the lists get executed, so I have no idea how much of a win this will be in practice.</description>
		<content:encoded><![CDATA[<p>Hey Marco, nice to hear from you!</p>
<p>Yep, it wouldn&#8217;t be the persistent threads themselves that would reduce register pressure; I would need to do less work in each job too.  Of course this increases bandwidth requirements, atomic serialisation costs, etc., since I must now save/restore state for each job as the lists get executed, so I have no idea how much of a win this will be in practice.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marco Salvi</title>
		<link>http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/comment-page-1/#comment-3295</link>
		<dc:creator>Marco Salvi</dc:creator>
		<pubDate>Sun, 23 Aug 2009 00:08:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.sjbrown.co.uk/?p=295#comment-3295</guid>
		<description>Nice work Simon!

Switching to a persistent-threads-based system probably won&#039;t help your register pressure issues, though.
Registers are statically allocated when a kernel is launched on a multiprocessor, so every job handled by persistent threads will pay the cost of your maximum register usage (and perhaps even more).</description>
		<content:encoded><![CDATA[<p>Nice work Simon!</p>
<p>Switching to a persistent-threads-based system probably won&#8217;t help your register pressure issues, though.<br />
Registers are statically allocated when a kernel is launched on a multiprocessor, so every job handled by persistent threads will pay the cost of your maximum register usage (and perhaps even more).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Brown</title>
		<link>http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/comment-page-1/#comment-3141</link>
		<dc:creator>Simon Brown</dc:creator>
		<pubDate>Sun, 16 Aug 2009 15:48:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.sjbrown.co.uk/?p=295#comment-3141</guid>
		<description>According to Wikipedia, the 9500 GT has around 1/8 as many shader cores as a GTX 275 (which I&#039;m tempted to buy), so upgrading should give me an immediate 8x speedup.  A GTX 275 (being compute capability 1.3) would also have double the register count per multiprocessor, and since I&#039;m massively register bound I&#039;d expect a further 2x speedup.

The kernel is tiny: maybe a couple of hundred lines of C.  Took a few evenings to get this far, but I have plenty of CPU reference code to copy from.</description>
		<content:encoded><![CDATA[<p>According to Wikipedia, the 9500 GT has around 1/8 as many shader cores as a GTX 275 (which I&#8217;m tempted to buy), so upgrading should give me an immediate 8x speedup.  A GTX 275 (being compute capability 1.3) would also have double the register count per multiprocessor, and since I&#8217;m massively register bound I&#8217;d expect a further 2x speedup.</p>
<p>The kernel is tiny: maybe a couple of hundred lines of C.  Took a few evenings to get this far, but I have plenty of CPU reference code to copy from.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin</title>
		<link>http://www.sjbrown.co.uk/2009/08/15/cuda-path-tracing/comment-page-1/#comment-3138</link>
		<dc:creator>Kevin</dc:creator>
		<pubDate>Sun, 16 Aug 2009 14:19:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.sjbrown.co.uk/?p=295#comment-3138</guid>
		<description>Looks good. Performance is kind of disappointing, but then maybe the latest beast from NVIDIA would blow your socks off? I have no idea. Did this take much time to implement? Is it a lot of CUDA code?</description>
		<content:encoded><![CDATA[<p>Looks good. Performance is kind of disappointing, but then maybe the latest beast from NVIDIA would blow your socks off? I have no idea. Did this take much time to implement? Is it a lot of CUDA code?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
