Quad Core
Page 2 of 2 [44 Posts]
BobTheDog



Joined: Feb 28, 2005
Posts: 4044
Location: England
Audio files: 32
G2 patch files: 15

Posted: Sun Jun 14, 2009 6:42 am

Well, there is the old saying that Microsoft products expand to fill the space available.

It isn't just Office: I use Visual Studio, and the 2008 version is half the speed of the 2005 version, which in turn is half the speed of the 2003 version.

How they managed to make a C++ compiler run four times slower with five years of work, I have no idea.

Actually I do, programmers are getting much worse.
jksuperstar



Joined: Aug 20, 2004
Posts: 2503
Location: Denver
Audio files: 1
G2 patch files: 18

Posted: Sun Jun 14, 2009 9:31 pm

BobTheDog wrote:
Actually I do, programmers are getting much worse.


Amen.
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Sat Jul 04, 2009 6:58 pm

Antimon wrote:
jksuperstar wrote:
I just wish programmers would all go through some form of embedded programming training. Reducing the bloat in software would go a LONG way to making the cache, and CPU power, a lot more efficient. Most likely far more so than the advances that modern cache architectures could attain themselves.


Not to mention that that would make the applications perform better. I am baffled every time I see that half-second latency between key press and added character in MS Word.

/Stefan

Ah, a subject so near and dear to my heart!

Here's the title of a National Science Foundation lab equipment proposal I am writing this month:

"MRI-R2: Acquisition of Computer Music Instrumentation for Research into Real-time, Reactive, Embedded Processing Architectures"

Wish me luck! (Any formal references to back up the grant justification would be much appreciated!)

Also, if anybody has pointers to an embedded card with a uP and some floating-point DSPs, with a small-footprint monitor (an O.S. lite) worth looking at, I'd really appreciate that. A reasonable cross-compilation target would fit into this lab; maybe fast uPs with some DSP instructions would be good enough. I'd just like to have a target platform without the layers-of-shit between the audio dataflow apps and the I/O. The base would be an efficient implementation of GStreamer, including a goodly collection of signal-processing libraries, and cross-compiling to that seems tractable. Whether they run on DSPs or fast mainstream processors is maybe a secondary consideration.

At this point my proposal is mostly for algorithms and application architecture research using off-the-shelf components (Macs, a couple Pacaranas, other COTS hardware and software), but with a reasonable hardware target, doing some minimal-size O.S. work would be good.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
DrJustice



Joined: Sep 13, 2004
Posts: 2114
Location: Morokulien
Audio files: 4

Posted: Sun Jul 05, 2009 5:57 am

Acoustic Interloper wrote:
"MRI-R2: Acquisition of Computer Music Instrumentation for Research into Real-time, Reactive, Embedded Processing Architectures"

Wish me luck!

I wish you the best of luck with your proposal!

Quote:
Also, if anybody has any pointers to some embedded card with a uP and some floating point DSPs with a small footprint monitor (an O.S. light) worth looking at, I'd really appreciate that.

I'm also always on the lookout for the ultimate DSP platform. Can you use PCI cards? Stuffing a PCI card into a Mini-ITX board with Linux seems like one possibility for making it embedded'ish. The multi-DSP PCI cards out there are way out of my price range though. I'd love to roll my own, but that too costs quite a bit and takes time.

If I had a year to spend I might go out and get, e.g.:
Mini-ITX board + Raggedstone FPGA + twin TigerSHARCs
Or the ITX could be dropped and a CPU in the FPGA used instead.
Sigh, I actually have something similar (ITX + PCI FPGA + ADSP-21364 EZ-KIT Lite), but not enough time...

jksuperstar wrote:
I just wish programmers would all go through some form of embedded programming training. Reducing the bloat in software would go a LONG way to making the cache, and CPU power, a lot more efficient. Most likely far more so than the advances that modern cache architectures could attain themselves.

Well put. It's quite shocking how little 'modern programmers' know about the underlying architecture and how to use it efficiently. In fact it seems that many go out of their way to spend resources on layers and layers of crutches, soft pillows, sweet scents, fair promises and so on, only to end up with slow, crashing, dysfunctional applications. /rant.

DJ
--
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Sun Jul 05, 2009 8:23 am

DrJustice wrote:
I wish you the best of luck with your proposal!

thanks
Quote:
If I had a year to spend I might go out and get, e.g.:
Mini-ITX board + Raggedstone FPGA + twin TigerSHARCs
Or the ITX could be dropped and a CPU in the FPGA used instead.
Sigh, I actually have something similar (ITX + PCI FPGA + ADSP-21364 EZ-KIT Lite), but not enough time...

Thanks for the pointers as well. Time is, indeed, the biggest problem. Also, with the students I have, I am sure that applications-level work will be more enticing than OS work. Part of the study/research plan will be along the lines of how to use multi-threaded cores effectively within applications, but of course these OSes get in the road. If I run across a grad student with a taste for OS work, that would be the enabler.
Quote:
In fact it seems that many go out of their way to spend resources on layers and layers of crutches

After the Kernighan and Plauger books on Software Tools came out in the early '80s, I gave a talk entitled "Software Shims" at an AT&T training center down in Princeton. It was only partly tongue-in-cheek.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Antimon



Joined: Jan 18, 2005
Posts: 4145
Location: Sweden
Audio files: 371
G2 patch files: 100

Posted: Sun Jul 05, 2009 8:30 am

DrJustice wrote:

Well put. It's quite shocking how little 'modern programmers' know about the underlying architecture and how to use it efficiently. In fact it seems that many go out of their way to spend resources on layers and layers of crutches, soft pillows, sweet scents, fair promises and so on, only to end up with slow, crashing, dysfunctional applications. /rant.

DJ
--


Hmm, I work as a programmer, often in environments with a lot of concurrency (lots of EJBs running, database access, web GUIs etc). I realise that I haven't made that big an effort to find out what to keep in mind if you want to make the most of an underlying multi-processor/core architecture. What do I need to know? Is there some buzzword I could google on, or some nice web-tutorial?

I've worked my way up from assembler on Commodore 64s and Amigas, x86s and SPARCs, then C then Modula 3 then Java and loads of other stuff. I feel I have a good grip on the essentials of most computer components. What I don't know is how a computer finds out that I have made a module in a program that is safe to keep running separately on a processor, apart from the other modules in the same program. Or a way to give the processor this kind of information in said program.

I'm not sure I need to know this, because the stuff I work with is more likely to suffer from bottlenecks in disk or network access. I feel that it's more CPU-intensive stuff like games or music apps that needs to work well with processor multiplicity.

Edit: a significant "not" got lost

/Stefan

_________________
Antimon's Window
@soundcloud @Flattr home - you can't explain music
DrJustice



Joined: Sep 13, 2004
Posts: 2114
Location: Morokulien
Audio files: 4

Posted: Sun Jul 05, 2009 11:48 am

Antimon wrote:
DrJustice wrote:

Well put. It's quite shocking how little 'modern programmers' know about the underlying architecture and how to use it efficiently. In fact it seems that many go out of their way to spend resources on layers and layers of crutches, soft pillows, sweet scents, fair promises and so on, only to end up with slow, crashing, dysfunctional applications. /rant.

DJ
--
Hmm, I work as a programmer, often in environments with a lot of concurrency (lots of EJBs running, database access, web GUIs etc). I realise that I haven't made that big an effort to find out what to keep in mind if you want to make the most of an underlying multi-processor/core architecture. What do I need to know? Is there some buzzword I could google on, or some nice web-tutorial?

Firstly, I don't think you really are in the category that doesn't know how it hangs together and works at the lower levels...

Quote:
I've worked my way up from assembler on Commodore 64s and Amigas, x86s and SPARCs, then C then Modula 3 then Java and loads of other stuff.

See what I mean? :D

There's no magic bullet to search out, only knowledge, experience and hard work. Hence my support for jk's original notion of starting out at a lower level. In the flying business they have this sussed: you start on a Cessna and work through the grades before you can fly a Jumbo.

Quote:
I'm not sure I need to know this, because the stuff I work with is more likely to suffer from bottlenecks in disk or network access. I feel that it's more CPU-intensive stuff like games or music apps that needs to work well with processor multiplicity.

So you've identified at least one performance bottleneck (I/O often is one, of course). Not in the target group then. Sorry ;)

But seriously, if your box is I/O bound and you're the only one using it, it may not be that critical. Even so, an I/O-bound app can often run better by causing less swapping, fewer context switches, a higher degree of locality, less memory bandwidth usage, etc. As you say, games and music applications are more critical in that one respect.

As I learned in my dot-com days, there are situations where even knowledgeable programmers probably can't escape dragging in huge libraries and massive subsystems, simply because you have to deliver something "completely new" all the time, in a lighter shade of blue this round, while complying with this week's hippest data exchange protocols and so on (say, some ephemeral ring-tone delivery site, Mad Cow themed, since the old Crazy Frog one is -- well, old). And by next month your app will be phased out anyway. Needless to say, I don't fit well into such companies, due to incompatibilities in values. :twisted:

DJ
--

Last edited by DrJustice on Sun Jul 05, 2009 11:53 am; edited 2 times in total
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Sun Jul 05, 2009 11:48 am

Antimon wrote:
What I don't know is how a computer finds out that I have made a module in a program that is safe to keep running separately on a processor, apart from the other modules in the same program. Or a way to give the processor this kind of information in said program.

Short of running a parallelizing compiler for a special purpose architecture, I think the main approach is very loosely coupled multithreading, where loose coupling means as little data sharing and accompanying synchronization overhead between threads as possible.

I used multiple Windows threads in the pipelined MIDI-finger-picking parser of this paper, Figure 1. Testing was on an Alienware Windows XP machine with a dual-threaded core. The good news is that, if any complex processing caused new samples to arrive before the last stages of processing old samples had completed, the second hardware thread would kick in and process the newly incoming sample. Lost samples were less common.

The bad news is that a pipeline like this is no good for latency, because each stage still takes the same amount of time that it would in a single-threaded app, plus there is some synchronization overhead between the stages. In this particular architecture, the Stage 3 score matching pipeline stage is usually the bottleneck. Pipelining like this can increase throughput but it does not help latency.
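
To make the trade-off concrete, here is a minimal two-stage pipeline sketch in C++ (a generic illustration using std::thread, not the code from the paper): stage 1 pushes samples through a small lock-protected queue to stage 2. The stages overlap, which helps throughput, but every sample still pays the full per-stage processing time plus the queue synchronization, so end-to-end latency is not reduced.

Code:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// One-way queue between two pipeline stages; the mutex/condition-variable
// pair is the inter-stage synchronization overhead mentioned above.
struct SampleQueue {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void push(int v) {
        { std::lock_guard<std::mutex> lk(m); q.push(v); }
        cv.notify_one();
    }
    bool pop(int& v) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !q.empty() || done; });
        if (q.empty()) return false;   // producer finished, nothing left
        v = q.front(); q.pop();
        return true;
    }
    void finish() {
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_all();
    }
};

int main() {
    SampleQueue queue;
    std::thread stage1([&] {           // e.g. feature extraction
        for (int i = 0; i < 10; ++i) queue.push(i * i);
        queue.finish();
    });
    std::thread stage2([&] {           // e.g. score matching
        int v;
        while (queue.pop(v)) std::cout << "stage 2 got " << v << "\n";
    });
    stage1.join();
    stage2.join();
    return 0;
}

(Build with g++ -std=c++11 -pthread. Both stages run concurrently, but a given sample still traverses every stage in sequence.)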

What I need to do next is redesign the Stage 3 bottleneck, first the algorithm in general (which is a brute force algorithm), then also looking at thread-parallel search within that stage. I haven't had time yet, but the plan is basically to divide the score-search-space into independent regions that can be matched by parallel threads, in cases where local matching fails.

I have my programming students write multi-threaded versions of algorithms like quicksort, which, after partitioning data, work on the data independently. Divide-and-conquer problems where the data partitions can be accessed separately after partitioning are good candidates for multithreading.

Of course, there is a tradeoff against caching there, since caching usually argues for locality of reference, i.e., trying not to splatter memory references across the address space. Multi-threaded data partitioning does make for a lot of disjoint memory references. If each hardware thread gets a portion of the cache to work with, this may not be such a problem. Obviously, if you have many threads working in parallel on disjoint data, they have to have a way to feed themselves data.

As long as single-threaded or few-threaded processors dominate, there is a negative incentive for factoring applications across multiple threads, because then the thread synchronization overhead adds cost. You need the hardware and software together.

For synthesis it seems that the voices could be legitimately generated in separate threads, and then mixed. Fine grain individual voice generation is not the level at which voices need to interact. But they should make the job of downstream mixing as simple as possible, e.g., by scaling amplitude before the mixer, since the mixer is intrinsically serial and therefore single threaded.
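
As a hedged illustration of that voice-per-thread idea (a sketch only; the voice count, frequencies and buffer length below are made up), each worker thread renders and amplitude-scales its own private buffer, so the only serial step left is the final sum in the mixer:

Code:
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int kVoices = 4, kFrames = 512;
    const double kPi = 3.14159265358979323846;
    const double kSampleRate = 44100.0;

    // One private buffer per voice: no sharing, no locking between voices.
    std::vector<std::vector<double>> buffers(kVoices, std::vector<double>(kFrames));
    std::vector<std::thread> workers;

    for (int v = 0; v < kVoices; ++v) {
        workers.emplace_back([&, v] {
            double freq = 220.0 * (v + 1);   // stand-in for a real voice
            double gain = 1.0 / kVoices;     // scale before the mixer
            for (int n = 0; n < kFrames; ++n)
                buffers[v][n] = gain * std::sin(2.0 * kPi * freq * n / kSampleRate);
        });
    }
    for (auto& t : workers) t.join();

    // The mixer itself is intrinsically serial: a plain single-threaded sum.
    std::vector<double> mix(kFrames, 0.0);
    for (int v = 0; v < kVoices; ++v)
        for (int n = 0; n < kFrames; ++n)
            mix[n] += buffers[v][n];

    std::printf("mixed sample 100 = %f\n", mix[100]);
    return 0;
}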

I put in a grant proposal last month for some slightly used (but still fully warrantied and free for colleges) Sun "CoolThreads" servers, which have heavily multithreaded processors. One of our IT staff warned me that, unless the apps are written to take advantage of the hardware threads, these processors run slower for conventional single-threaded apps than conventional CPUs do. That's perfect for teaching students the tradeoffs in homework exercises.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Antimon



Joined: Jan 18, 2005
Posts: 4145
Location: Sweden
Audio files: 371
G2 patch files: 100

Posted: Sun Jul 05, 2009 12:31 pm

It bugs me a bit that there seems to be something a bit magical about this multicore/processor thing, when it should be pretty simple. From a programmer's point of view, I mean. So Dale, when you write a loosely coupled multithreaded app, will this be noticed by the runtime for the chosen programming language or the OS in some way? How can it be noticed?

I work mostly with Java professionally, and I've heard that Java's threading is implemented inside the VM, not using thread support from the native OS. I guess it's not possible to use multiprocessing in a Java program, short of starting up several VMs...

I'll check out that MIDI pipeline paper, thanks for sharing.

/Stefan

_________________
Antimon's Window
@soundcloud @Flattr home - you can't explain music
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Sun Jul 05, 2009 1:28 pm

Antimon wrote:
It bugs me a bit that there seems to be something a bit magical about this multicore/processor thing, when it should be pretty simple. From a programmer's point of view, I mean. So Dale, when you write a loosely coupled multithreaded app, will this be noticed by the runtime for the chosen programming language or the OS in some way? How can it be noticed?

I work mostly with Java professionally, and I've heard that Java's threading is implemented inside the VM, not using thread support from the native OS. I guess it's not possible to use multiprocessing in a Java program, short of starting up several VMs...

I'll check out that MIDI pipeline paper, thanks for sharing.

/Stefan

As long as the OS maps POSIX threads to hardware threads where possible, and the JVM maps its threads to OS threads (more below), this should do it. Just have an initial thread partition the problem into (hopefully big) non-overlapping data spaces, start concurrent threads to solve the subproblems, and sync the results together when the subpieces are done. Divide and conquer, just like throwing a bunch of people at a task. The application's use of threads must reflect this structure in order to utilize hardware multithreading within the application.

The project that I gave my students was to write a functional version of quicksort and, in addition, if the size of a partition after the divide phase was greater than some threshold, to spawn a thread to sort one side while the original thread sorted the other. The original thread would synchronize by doing a join, so that when the child thread exited, the original thread would have an entirely sorted array. A more efficient approach would have been to keep a pool of worker threads and manage the pool instead of doing the join: thread creation and termination have overhead, so rather than repeatedly create and later join dying threads, we could have pooled them for reuse.
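
For anyone who wants to play with this, here is a minimal sketch of that spawn-above-threshold structure in C++ with std::thread (this is not the course assignment code, and the threshold and array size are arbitrary):

Code:
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

const size_t THRESHOLD = 1 << 20;   // below this size, stay single-threaded

// Hoare-style partition around the middle element.
size_t hoare_partition(std::vector<int>& a, size_t lo, size_t hi) {
    int pivot = a[lo + (hi - lo) / 2];
    size_t i = lo, j = hi;
    while (true) {
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i >= j) return j;
        std::swap(a[i++], a[j--]);
    }
}

void qsort_threaded(std::vector<int>& a, size_t lo, size_t hi) {
    if (lo >= hi) return;
    size_t mid = hoare_partition(a, lo, hi);
    if (hi - lo > THRESHOLD) {
        // Child thread sorts the left half while this thread sorts the
        // right half; the join guarantees the whole range is sorted.
        std::thread child(qsort_threaded, std::ref(a), lo, mid);
        qsort_threaded(a, mid + 1, hi);
        child.join();
    } else {
        qsort_threaded(a, lo, mid);
        qsort_threaded(a, mid + 1, hi);
    }
}

int main() {
    std::vector<int> data(4 << 20);
    for (int& x : data) x = std::rand();
    qsort_threaded(data, 0, data.size() - 1);
    std::printf("sorted: %s\n",
                std::is_sorted(data.begin(), data.end()) ? "yes" : "no");
    return 0;
}

As noted above, a pool of worker threads would avoid the repeated create/join overhead; this sketch spawns only a handful of short-lived threads because the threshold is large relative to the array.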

Because quicksort is recursive, the recursive threads may spawn other threads in a tree structure reflecting the recursion. The original assignment was in Python and is posted here. Python shows no speedup because the Python interpreter only allows 1 thread inside itself at a time -- a limitation in Python known as the GIL (global interpreter lock).

However, one of the students in this class rewrote it in Java and found that, on a dual-threaded PC, she got a speedup not too far from 2x. It wasn't quite 2x because of serial startup and completion. I don't have the exact numbers, but there was a definite knee in processing time at 2 threads. Beyond that there were no more hardware threads, and so additional multithreading added only costs.

So pthreads under C/C++ and Java threads using the Hotspot VM should make use of hardware threads. I may get my advanced data structures class to do an assignment along these lines this fall. I can post the results here.

Sun added java.util.concurrent library classes to get rid of some of the overhead that comes with the language's synchronized sections and methods as the default synchronization mechanism, albeit at the cost of some safety (you have to be more careful -- the language gives you more rope with which to hang yourself in the interest of efficiency).

There is some discussion of Hotspot Multithreading on Solaris here, maybe a little dated. They don't talk about Windows or OSX, but I'd guess all implementations of the VM use hardware threads where available. It would be worth a test.

Even if the CPU supports multiple hardware threads, there are tradeoffs beyond the number of hardware threading units; for example, each thread takes its own stack space. Ideally there should be application config parameters for setting the number of threads, data thresholds, stack sizes, etc., rather than hard-coded values, in order to take the most benefit from the hardware. This definitely takes some intentional engineering of the application. The Sun tutorial on this stuff is informative.

If you are interested, I can dig up my student's Java code and attach it here. I can't get at it at the moment because they changed the password protection scheme on that machine and I can't log in. So it goes.

EDIT 1: This book has quite a bit of discussion on writing Java applications to utilize multithreaded hardware. I don't have the full textbook.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Antimon



Joined: Jan 18, 2005
Posts: 4145
Location: Sweden
Audio files: 371
G2 patch files: 100

Posted: Sun Jul 05, 2009 1:47 pm

Wow, thanks! Great explanation. :) Looks like Java is cleverer than I thought, at least on a PC. I feel like doing some experimenting of my own.

My brain tires quickly when following technical discussions in my spare time these days, but I find this multiprocessing thing intriguing for some reason. Don't dig out your student's code if you don't want to; my brain may draw the line at reading code. ;)

/Stefan

_________________
Antimon's Window
@soundcloud @Flattr home - you can't explain music
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Sun Jul 05, 2009 1:51 pm

Antimon wrote:
Don't dig out your student's code if you don't want to; my brain may draw the line at reading code. ;)

/Stefan

I took a quick look at my "save all CD" from last semester and this code isn't there, so I may not have it. She may be back in India right now.

I'll almost certainly do some benchmarking with my advanced data structures class this fall. I will post the best solution. Hopefully it'll be mine!

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Tue Jul 07, 2009 2:57 pm    Post subject: some pthreads code

I used the attached pthreads code to do the multithreaded divide-and-conquer quicksort with POSIX threads. I may do Java later. The Sun/Solaris machine showed no improvement, so seemingly it's an old single-threaded machine.

I ran it on my 3 GHz Windows XP tower, which claims to have 2 CPUs, whatever that means. The first number on the command line is the number of pseudo-random integers to generate and sort, and the second number is a threshold. The lower the threshold, the more threads. A threshold that is greater than the number of ints uses only the main thread. My PC readings below start from this condition (main thread only) and then lower the threshold with each test run. There is no guarantee that halving the threshold doubles the number of threads; it all depends on how good my quicksort partitioning is. It's probably OK.

As far as terminology goes, my working exposure to this stuff goes back to my DSP software tools work in Bell Labs in the last decade and beginning of this one. Multi-core meant separate CPUs (or DSPs) with at least partially separate address spaces. You could not have mapped these onto a single process with multiple threads because of the disjoint address spaces. On the other hand, a multithreaded (as contrasted to multicore) CPU runs multiple hardware threads within a single address space, and can exploit its parallelism only via multithreaded code.

I guess that multiple cores could support multithreading in a single process if a memory management unit maps all the cores to a single address space. That almost appears to be what Windows is doing when I watch the CPU meter. It shows 2 CPUs (whatever that means). The initial test condition, where my app runs only one thread, shows one CPU maxed out while the other is concurrently performing some work. When I run this CPU-intensive test with multiple threads in a single process, both CPU load graphs max out at 100%. So it appears that my app uses both cores even though it is a single process, and it appears that when it is running single-threaded, the other core is running other apps. I can't think of a way to do both things unless the cores can be shared within a single process/address space or alternatively distributed to multiple processes. Must use the MMU, I think.

I'll post my Mac Laptop Pro numbers in a little while. Here are the windoze numbers. I internally call time() to get the elapsed time. I originally was calling clock() to get CPU time, but it wasn't clear which CPU(s) were being counted. Wall time has slop, but at least it's what we ultimately care about.
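
For reference, the wall-clock measurement described above amounts to something like this sketch (illustrative only, not the attached code): time() brackets the work and reports elapsed seconds across all threads, whereas clock() would report CPU time whose meaning is murky once several cores are involved.

Code:
#include <cstdio>
#include <ctime>

int main() {
    std::time_t start = std::time(nullptr);
    volatile double x = 0.0;                        // stand-in for the threaded sort
    for (long i = 0; i < 200000000L; ++i) x += 0.5;
    std::time_t stop = std::time(nullptr);
    std::printf("SORT TIME IN SECS = %ld\n", (long)(stop - start));
    return 0;
}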

The attached program builds and runs under Cygwin using g++, by the way. Its output should say SECS, not uSECS -- it is reporting elapsed seconds for the sort only.

$ ./thread_qsort 104857600 200000000 12345
SORT TIME IN uSECS = 44

$ ./thread_qsort 104857600 100000000 12345
SORT TIME IN uSECS = 37

$ ./thread_qsort 104857600 50000000 12345
SORT TIME IN uSECS = 32

$ ./thread_qsort 104857600 25000000 12345
SORT TIME IN uSECS = 31

$ ./thread_qsort 104857600 12500000 12345
SORT TIME IN uSECS = 31

It stops improving at 31 seconds, and would start to grow when the overhead of starting and stopping threads becomes significant.

The fans run like crazy, by the way!


31/44 = 70%, or about a 30% improvement in going from one hardware thread to two. If they are doing this with distinct cores mapped via the MMU to a single address space, that's not bad. I assume that the reason it passes through the 37-second measurement on the first split is that my quicksort partitioning is a little lopsided.

My wife is calling me for a snack -- PRIORITY INTERRUPT! Be back with some Mac numbers later.


Attachment: threadqsort.zip (72.22 KB)


_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Tue Jul 07, 2009 3:50 pm    Post subject: Mac laptop Pro

Here are the Mac Laptop Pro numbers for a machine that says this about itself:

Hardware Overview:

Model Name: MacBook Pro
Model Identifier: MacBookPro5,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.53 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 6 MB
Memory: 4 GB
Bus Speed: 1.07 GHz

The "Activity Monitor" appears similar to the PC. When the first, single-threaded test runs, one so-called core is at 100% ate the other near 0%. There is some activity on the other core, but I am beginning to think that maybe these are hardware threads and you cannot run two apps with separate address spaces concurrently. I'll test that later, but for now:

Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 200000000 12345
SORT TIME IN uSECS = 33
Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 100000000 12345
SORT TIME IN uSECS = 30
Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 50000000 12345
SORT TIME IN uSECS = 19
Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 25000000 12345
SORT TIME IN uSECS = 19
Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 12500000 12345
SORT TIME IN uSECS = 18

18/33 is about 55%, for a 45% improvement.

The second measurement almost certainly reflects lopsided partitioning by quicksort. The Activity Monitor shows one "CPU" starting right away but then quitting much earlier than the other.

This jargon:

Number Of Processors: 1
Total Number Of Cores: 2

really implies two hardware threads within a single core (as we used to call cores in Bell Labs). That's the next test -- I'll run two single-threaded processes concurrently and see what happens.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Tue Jul 07, 2009 3:58 pm    Post subject: two separate CPU intensive processes

This test on the Mac, with two separate single-threaded CPU-bound processes running concurrently, shows that both cores are used. The Activity Monitor shows both being busy as well.

Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 200000000 12345 &
[1] 410
Dale-Parsons-MBP:threadqsort dparson$ ./thread_qsort 104857600 200000000 12345 &
[2] 411
Dale-Parsons-MBP:threadqsort dparson$ wait
SORT TIME IN uSECS = 34
[1]- Done ./thread_qsort 104857600 200000000 12345
SORT TIME IN uSECS = 34
[2]+ Done ./thread_qsort 104857600 200000000 12345
Dale-Parsons-MBP:threadqsort dparson$

Looks like the cores can be mapped either to separate processes/address spaces or to a single process/address space. Memory management unit at play. (Not in our good old DSPs!)

So you can get an advantage either with a single multithreaded process or with multiple load-balanced processes.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Tue Jul 07, 2009 4:05 pm    Post subject: back to windoze

Windows XP on the old clunky eMachines box shows similar two-core usage on two separate, single-threaded processes, although with noticeable degradation:

$ ./thread_qsort 104857600 200000000 12345 &
[1] 1184

$ ./thread_qsort 104857600 200000000 12345 &
[2] 2956

$ wait
SORT TIME IN uSECS = 60
[1]- Done ./thread_qsort 104857600 200000000 12345
SORT TIME IN uSECS = 60
[2]+ Done ./thread_qsort 104857600 200000000 12345

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.
Jason



Joined: Aug 12, 2004
Posts: 466
Location: Los Angeles, CA. USA

Posted: Tue Jul 07, 2009 8:29 pm    Post subject: Re: Quad Core

Jason wrote:
Anyone using any new Intel quad cores (duos or new Quad Xeons) or AMD's for that ........


Having started this thread almost two years ago, I thought I would chime in, since I have since upgraded my computer to an Intel Core 2 Quad Q9550 at 2833 MHz; my older system was a P4 @ 3.4 GHz.
Still on a 32-bit OS, I am not taking full advantage of all the features, though I need more RAM before I make a change to any 64-bit OS, DDR3 or not...

With regard to audio applications and my workflow, I am much happier.
Some of the apps I use claim multi-core support, but I'm not sure to what extent. Regardless, I can run much larger projects with more plugins and at higher resolutions.

I find other, more day-to-day operations such as web browsing and other apps as slow as ever. The CPU is just one component; having up-to-date hard drives and other components is essential for maximum throughput. Drive fragmentation alone can bog a system down.
jksuperstar



Joined: Aug 20, 2004
Posts: 2503
Location: Denver
Audio files: 1
G2 patch files: 18

Posted: Thu Jun 10, 2010 9:46 pm

Jason,

For "overall speedup", you might look at a faster storage solution, such as an SSD. There is plenty of tuning that came out of people using SSDs, and getting even more speed/efficiency from a PC once the disk I/O isn't as slow.

Here's a link that might help you take more advantage of your new system (I assume you're still running XP if it's still 32-bit). It minimizes disk I/O and pushes what unnecessary stuff is left into a RAM disk. It's meant for SSD users, but it's good overall tuning advice.
http://www.ocztechnologyforum.com/forum/showthread.php?43460-Making-XP-pro-SSD-friendly
Acoustic Interloper



Joined: Jul 07, 2007
Posts: 2067
Location: Berks County, PA
Audio files: 89

Posted: Thu Jul 21, 2011 5:41 am    Post subject: Java outperforms optimized C++

I managed to get our comp. sci. department three multiprocessor servers via a Sun grant in summer 2009, before Oracle gobbled Sun up. Two are SPARC and one is AMD Opteron. I spent last summer learning how to use them, and taught a grad course in the spring.

One of the more surprising discoveries is that, for some application architecture benchmarks (I mostly wrote my own), Java outperforms optimized C++. I can think of 3 reasons so far.

1. Sun put all their good compiler people on Java.
2. The JVM's just-in-time, dynamic optimizer is better at finding hotspots and using run-time profiling to make optimizations than the static approach of C++ compilers.
3. Java's very explicit memory model allows compiler writers more precision with respect to cache consistency across multiple processors.

The SPARC machines are harder to optimize than the AMD, largely because the cores have very small L1 caches shared by multiple threads, whereas each AMD core is one thread with a big cache. But there are benchmarks where the SPARCs outrun the AMD; apparently those benchmarks are not cache-limited.

How all this applies to audio processing is a complicated topic. As I see it, for an audio processing chain, you can multithread two things.

A. The pipeline of signal transforms. Assigning a hardware thread to each pipeline stage improves throughput, but it doesn't reduce latency (which is what we typically want) unless the uniprocessor solution was starved for CPU cycles. Starving would make the latency get worse, because pipeline stages would be waiting for CPU instead of waiting for data to flow.

B. An individual bottleneck stage in a pipeline could be "parallelized across." This gets into more complicated algorithm work, but it could definitely knock down latency; a small sketch follows below.
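
Here is a hedged sketch of what "parallelizing across" one stage could look like (illustrative only; the gain-and-offset pass stands in for real DSP work): one block of samples is split into slices that independent threads process in place, so the stage's own latency shrinks with the number of hardware threads.

Code:
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

// Process one slice of the block in place; slices do not overlap,
// so no locking is needed.
void process_slice(std::vector<float>& block, size_t begin, size_t end) {
    for (size_t n = begin; n < end; ++n)
        block[n] = 0.5f * block[n] + 0.1f;
}

int main() {
    const size_t kFrames = 1 << 20;
    const unsigned kThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<float> block(kFrames, 1.0f);

    std::vector<std::thread> pool;
    const size_t chunk = kFrames / kThreads;
    for (unsigned t = 0; t < kThreads; ++t) {
        size_t begin = t * chunk;
        size_t end = (t + 1 == kThreads) ? kFrames : begin + chunk;
        pool.emplace_back(process_slice, std::ref(block), begin, end);
    }
    for (auto& t : pool) t.join();

    std::printf("block[0] = %f using %u threads\n", block[0], kThreads);
    return 0;
}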

Papers are starting to appear on using GPUs for audio signal processing -- filters, FFTs, etc. They are good for crunching the parallel arrays of data typical of signal processing, and are also finding their way into other non-graphics specialty application areas such as network packet processing. You can get hundreds of GPU threads on a card at a fraction of the cost of a similar number of conventional multicore CPUs, but the trick is to integrate your code onto them effectively. I expect we'll see GPU solutions for electro-music applications in the not-too-distant future.

_________________
When the stream is deep
my wild little dog frolics,
when shallow, she drinks.